WP Machine Learn Hadoop

Embed Size (px)

Citation preview

  • 8/10/2019 WP Machine Learn Hadoop

    1/2

    White Paper

    Machine Learning in the Enterprise Hadoop

    2014 Alethe Labs All rights reserved. Alethe Labs and the Alethe Labs logo are trademarks or registered trademarks of Alethe Labs. All other

    trademarks are the property of their respective companies. Information is subject to change without notice.

    1.Abstract

    Over the past few years organizations

    are storing huge amounts of data sets

    and using this Big Data as their

    competitive advantage. The challengelies when a human has to intervene in

    every scenario of processing and

    analyzing Big Data. The ideal case is

    made when machine can collaborate

    and help humans in decision-making.

    This white paper examines the role of

    machine learning in the most popular

    Big Data platform, Hadoop.

    This is one of the many white papers we

    plan to publish and understand the roleof machine learning with Hadoop to

    enhance business processes.

    2. Key Words

    Hadoop, Big Data, Machine Learning

    3. Introduction

    Organizations today generate huge

    amount of data. This data includes bothunstructured and structured data that

    is stored in silos of databases and

    archive solutions. Managing and

    analyzing this data using legacy

    systems is challenging and sometimes

    impractical. Companies today focus on

    generating business benefits out of their

    huge data sets. Hadoop is unparalleled

    as a high efficiency Big Data platform to

    store, process and analyze data in cost

    effective model.

    The challenge lies in acquiring

    knowledge base from the experts to

    understand the processing of data sets,

    the data interpretation techniques and

    creating value that empowers business

    decisions. Either we can create different

    processes for knowledge base or we can

    code the learning technique inside the

    machine i.e. Machine Learning.

    4.Apache Hadoop & Big Data

    Apache Hadoop is an open source

    distributed software platform for

    storing and processing data running on

    multiple servers. Hadoop is written inJava. Hadoop implements a

    computational paradigm named

    Map/Reduce, where the application is

    divided into many small fragments of

    work, each of which may be executed or

    re-executed on any node in the cluster.

    In addition, it provides a distributed file

    system (DFS) that stores data on the

    compute nodes, providing very high

    aggregate bandwidth across the cluster.

    Using Hadoop, you can store petabytesof data reliably on tens of thousands of

    servers while scaling performance cost-

    effectively by merely adding inexpensive

    nodes to the cluster.

    Following is the Hadoop physical

    architecture with the master & slave

    nodes and the Hadoop logical

    architecture with Map Reduce.

  • 8/10/2019 WP Machine Learn Hadoop

    2/2