
    www.blinksis.com Blinksis Technologies, Inc.

    Big Data: What Is It and Where Did It Come From?

    PRMIA Event on Big Data

    January 15, 2013

    Michael Di Stefano


    www.blinksis.com | January 15, 2013 | PRMIA Event on Big Data

    Agenda

    What Is Big Data

    What Makes Big Data Different

    Why New Big Data Technologies

    Big Data in Financial Services

    Performance is Relevant


    What is Big Data

    Wikipedia: "a collection of datasets so large and complex that it becomes difficult to process using on-hand database management tools"

    Strata (O'Reilly): "Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."

    Microsoft Research: "Businesses today operate on the monitor-mine-manage (M3) cycle: they monitor and archive large amounts of data, which they mine to derive insights such as models. The models are used during the manage phase to add value to the business."


    Mention Big Data to Someone

    Hadoop (framework built around the HDFS distributed file system)

    MapReduce (programming model for processing large datasets)

    HBase (db on top of Hadoop)

    Cassandra (db)

    MongoDB (db)

    Pig (analyze large data sets)

    Dremel (scalable, interactive ad-hoc query system for analysis of read-only nested data)

    Hive (data warehouse on top of Hadoop, queried with HiveQL)

    RHadoop (R interface to Hadoop)

    Flume (collecting, aggregating, and moving large amounts of log data to a centralized data store)

    YARN (NextGen MapReduce)

    Avro (data serialization)

    ZooKeeper (distributed configuration, synchronization service, and naming registry)


    There's More

    A Familiar Diagram


    Why Did All This Come To Be

    Internet Search/Indexing

    Web Logs

    Machine Generated Data

    Social Data/Networks

    Science Research (astronomy, atmospheric science, genomics, biogeochemical, biological, ...)

    Military/Surveillance

    Medical records

    E-Commerce


    What Makes Big Data Different

    Remember the Definition of Big Data:

    Data large and complex

    Difficult to process using on-hand database management tools

    Too big

    Too fast

    Doesn't fit the strictures of your database architectures

    Gain value

    Monitor and archive large amounts of data

    Mine to derive insights; add value to the business


    The Basics: How Is It Done?

    How to store 100s of terabytes?

    Database server cluster? Lots of commodity disks?

    [Diagram: a master server coordinating Server 1, Server 2, ..., Server n]

    Hadoop
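The storage answer on this slide is the Hadoop approach: spread the data over many commodity servers and let a master track where everything lives. Below is a minimal sketch of that idea in the style of HDFS, assuming a tiny block size, a replication factor of 3, hypothetical server names, and a simple round-robin placement rule (HDFS's real placement policy is rack-aware and more involved). The file is cut into fixed-size blocks, each block is copied to several servers, and the master stores only the block-to-server map, so a read can fall back to another replica when a server is down.

```python
from dataclasses import dataclass, field

# Hypothetical settings: real systems use blocks of tens of megabytes.
BLOCK_SIZE = 4   # bytes per block
REPLICATION = 3  # copies kept of each block (must not exceed the server count)


@dataclass
class Master:
    """Keeps only metadata: which servers hold each block of each file."""
    disks: dict[str, dict[tuple[str, int], bytes]]  # server name -> blocks on its local disk
    block_map: dict[str, list[list[str]]] = field(default_factory=dict)

    def store(self, name: str, data: bytes) -> None:
        servers = sorted(self.disks)
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        placements = []
        for idx, block in enumerate(blocks):
            # Round-robin placement: REPLICATION consecutive servers starting at idx.
            chosen = [servers[(idx + r) % len(servers)] for r in range(REPLICATION)]
            for server in chosen:
                self.disks[server][(name, idx)] = block
            placements.append(chosen)
        self.block_map[name] = placements

    def read(self, name: str, down: frozenset[str] = frozenset()) -> bytes:
        parts = []
        for idx, replicas in enumerate(self.block_map[name]):
            # Read each block from the first replica whose server is still up.
            live = next(s for s in replicas if s not in down)
            parts.append(self.disks[live][(name, idx)])
        return b"".join(parts)


master = Master(disks={f"server{i}": {} for i in range(1, 5)})
master.store("ticks.csv", b"AAPL,100;MSFT,50;")
print(master.read("ticks.csv", down=frozenset({"server2"})))  # b'AAPL,100;MSFT,50;'
```

Replicating every block is what lets cheap commodity disks stand in for an expensive database cluster: losing one server loses no data, at the cost of storing each byte several times.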


    The Basics: How Is It Done?

    How to query, index, and run simple calculations over 100s of terabytes?

    MapReduce

    [Diagram: Start → six parallel Map tasks → Reduce]
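The MapReduce flow (Start, parallel Map tasks, Reduce) can be read as three steps: each Map task turns its slice of the input into (key, value) pairs, the framework groups the pairs by key (the "shuffle"), and Reduce combines each group. The sketch below runs that flow in a single Python process to total traded quantity per symbol. The "symbol,quantity" record format, the sample data, and the function names are hypothetical; threads stand in for cluster nodes, whereas Hadoop distributes the same steps across many machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator

# Hypothetical input: "symbol,quantity" records split into chunks,
# as if each chunk were a block stored on a different server.
CHUNKS = [
    ["AAPL,100", "MSFT,50"],
    ["AAPL,25", "IBM,10"],
    ["MSFT,5", "AAPL,75"],
]


def map_chunk(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit (symbol, quantity) for every record in one chunk."""
    for line in lines:
        symbol, qty = line.split(",")
        yield symbol, int(qty)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all mapped values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def map_reduce(chunks: list[list[str]]) -> dict[str, int]:
    # Run one Map task per chunk in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(lambda chunk: list(map_chunk(chunk)), chunks))
    all_pairs = (pair for chunk_pairs in mapped for pair in chunk_pairs)
    return dict(reduce_group(k, v) for k, v in shuffle(all_pairs).items())


print(map_reduce(CHUNKS))  # {'AAPL': 200, 'MSFT': 55, 'IBM': 10}
```

Because each Map task only sees its own chunk and each Reduce only sees one key's values, both stages can be spread over as many machines as there are chunks or keys.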


    The Basics: How Is It Done?

    Steps Start to Finish?

    [Diagram: Get Data → Save Data (spread across many nodes) → Query Data (Start → parallel Map tasks → Reduce)]


    Big Data in Financial Services


    The 5th Dimension - Time


    When and Where to Use It

    What Is the Balance Between the Value of the Result and Time?

    Time to store data

    Time to query/analyze data

    Complexity of the query/analysis

    [Diagram: Get Data → Save Data → Query/Analyze Data (Start → parallel Map tasks → Reduce)]
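The three factors on this slide (time to store, time to query/analyze, complexity of the analysis) can be put in rough numbers. The sketch below estimates a lower bound on time-to-result when the data is written once and then read a number of times, with the work split evenly across nodes. The data size, per-node throughput, node counts, and the use of "number of passes" as a proxy for analysis complexity are all hypothetical assumptions, and the model ignores network, skew, and coordination overhead.

```python
TB = 10**12  # bytes (decimal terabyte)
MB = 10**6   # bytes (decimal megabyte)


def time_to_result(data_bytes: int, nodes: int, write_bps: int, read_bps: int,
                   passes: int) -> float:
    """Seconds to store the data once, then read it `passes` times.

    Assumes perfectly even parallelism across nodes (no network, skew, or
    coordination overhead), so the result is a lower bound, not a prediction.
    """
    store = data_bytes / (nodes * write_bps)
    analyze = passes * data_bytes / (nodes * read_bps)
    return store + analyze


# Hypothetical scenario: 100 TB, 100 MB/s per node for both reads and writes.
for nodes, passes in [(1, 1), (1000, 1), (1000, 5)]:
    secs = time_to_result(100 * TB, nodes, 100 * MB, 100 * MB, passes)
    print(f"{nodes:>5} nodes, {passes} pass(es): {secs / 3600:8.1f} h")
```

Under these assumed numbers, one node needs about 556 hours to store and scan the data once, 1,000 nodes need about 33 minutes, and a five-pass analysis on 1,000 nodes takes about 100 minutes. Whether that is fast enough depends on how quickly the value of the answer decays.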


    Actionable Analytics: Big Data

    [Diagram: Start → parallel Map tasks → Reduce over stored data]

    In-Depth Analysis/Reporting

    In-Time Analytics: Act As Soon As Data Is Available
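The two paths on this slide differ mainly in when the work happens. Batch reporting re-reads the stored history after the fact; in-time analytics updates a result incrementally as each record arrives, so the answer is ready the moment new data lands. A minimal sketch, using a running volume-weighted average price (VWAP) as a stand-in metric; the tick format, sample ticks, and class name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningVWAP:
    """Incrementally maintained volume-weighted average price."""
    notional: float = 0.0  # sum of price * quantity seen so far
    volume: int = 0        # sum of quantity seen so far

    def update(self, price: float, qty: int) -> float:
        # Constant work per tick; assumes qty > 0 so volume is never zero.
        self.notional += price * qty
        self.volume += qty
        return self.notional / self.volume


def batch_vwap(ticks: list[tuple[float, int]]) -> float:
    """Batch equivalent: recompute from the full stored history."""
    return sum(p * q for p, q in ticks) / sum(q for _, q in ticks)


ticks = [(100.0, 200), (101.0, 100), (99.5, 300)]
live = RunningVWAP()
for price, qty in ticks:
    current = live.update(price, qty)  # available right after each tick
print(current, batch_vwap(ticks))
```

Both functions give the same answer; the difference is that the running version never revisits old ticks, which is what makes acting "as soon as data is available" affordable.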


    Can Big Data Help Wall Street?

    New and Improved Trading Strategies

    Tick Data

    News

    Social Media (Twitter)

    Social Media (Facebook)

    Social Media (Others)

    Does Social Media (Market Sentiment) Indicate Trends in Market Direction?

    Is Increased Traffic in Social Media a Leading Market Indicator?

    Does Social Media Content Indicate Consumer Likes and Dislikes That Will Affect a Stock's Value?
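One simple way to frame "Is increased traffic in social media a leading market indicator?" as a test is to correlate message volume on day t with the stock's return on day t + 1. The sketch below computes that lead-lag Pearson correlation. The sample counts and returns are made up purely for illustration, and the function names are hypothetical; a real study would need far more data, significance testing, and controls, and a correlation alone would not show that sentiment drives prices.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lead_lag_correlation(mentions: list[float], returns: list[float], lag: int = 1) -> float:
    """Correlate social-media volume on day t with the return on day t + lag."""
    if not 1 <= lag < len(mentions):
        raise ValueError("lag must be in [1, len(mentions) - 1]")
    # Drop the last `lag` days of mentions and the first `lag` days of returns
    # so that each mention count lines up with a later return.
    return pearson(mentions[:-lag], returns[lag:])


# Hypothetical daily data: messages mentioning a stock, and its daily returns.
mentions = [120, 340, 150, 500, 180, 90]
returns = [0.001, -0.002, 0.004, -0.001, 0.006, 0.0005]
print(round(lead_lag_correlation(mentions, returns), 3))
```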


    Can Big Data Help Wall Street?

    Knight Capital (8/2012)

    JPMC Rogue Trader (4/2012)

    Dow 1000-Point Glitch (5/2010)

    ??????

    Trading Systems Outpacing Risk/Surveillance/Compliance Systems
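The closing point, trading systems outpacing risk and surveillance systems, is at heart a latency problem: a check that runs in an end-of-day batch cannot stop an order stream that misbehaves within minutes. A minimal sketch of a check that runs inline with each order, rejecting any order that would push a symbol's net position past a limit. The limit, the signed-quantity order format, and the class name are hypothetical, and real pre-trade risk controls cover far more than a single position limit.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PositionLimitCheck:
    """Inline pre-trade check: reject orders that breach a per-symbol net position limit."""
    limit: int
    positions: dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def accept(self, symbol: str, signed_qty: int) -> bool:
        # signed_qty > 0 is a buy, < 0 a sell (hypothetical order representation).
        projected = self.positions[symbol] + signed_qty
        if abs(projected) > self.limit:
            return False  # reject before the order reaches the market
        self.positions[symbol] = projected
        return True


check = PositionLimitCheck(limit=1_000)
orders = [("XYZ", 400), ("XYZ", 400), ("XYZ", 400), ("XYZ", -300)]
decisions = [check.accept(sym, qty) for sym, qty in orders]
print(decisions)  # [True, True, False, True]
```

The check does constant work per order, so it can keep pace with the order flow it guards; reconciling positions in a later batch would only report the breach after the orders had already been sent.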


    Open Discussion


    Thank You

    Michael Di Stefano

    (732) [email protected]