  • Big Data Analytics @ Munich Re

    SAS Global Forum Executive Program - Orlando

    Wolfgang Hauner, Chief Data Officer, Munich Re

    Marc Wewers, IT Architect, Munich Re

    Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. Gridded Population of the World, Version 4 (GPWv4): Population Count. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://dx.doi.org/10.7927/H4X63JVC

  • Agenda

    1. Data Analytics Framework

    2. Technology

    3. Method Example: AI

    4. Case Study: Cross Selling

  • Big Data in Trend Radar

    Location-based services
    Smart Home
    Telematics
    Virtual Assistant Systems
    Haptic Technologies
    Integrated Systems
    Autonomous Systems and Devices
    Automated Decision Taking
    Cloud/Client Architecture
    New Payment Models
    Big Data
    Internet of Things
    Cybersecurity
    Digitalization
    Computing Everywhere
    Robotics/Drones
    Wearable Devices
    Risk-based Security
    Context-aware Computing
    Open Data
    Collaborative Consumption
    Predictive Analytics
    Industrialization 4.0
    Web 4.0
    Web-Scale IT
    Software-defined Anything
    Crowdsourcing
    Mobile Health Services
    3D Printing
    Augmented and virtual worlds
    Citizen Development
    User Centered Design
    Digital Identity
    On-Demand-Everything

    Big Data

    Digitization

    Internet of Things

  • When does it become BIG Data

    40,000,000,000,000,000,000,000

    Scale: Byte, Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte

    Examples along the scale:
    Yes or No
    4 KB: Commodore VC 20
    3.5 inch floppy disk
    Data contained in a library floor
    4 TB: in-memory Big Data Platform MR
    Petabyte storage: big data platform (Google, Facebook, Microsoft, ...)
    All words ever spoken by humans

    43 zettabytes of data will probably be generated by 2020, 300 times the volume in 2005.

    Source: IBM
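    As a rough check on the figures on this slide, the following Python sketch converts between byte units. It assumes decimal (SI) units, where each step on the scale is a factor of 1000; the slide does not state this. The names `to_bytes` and `humanize` are illustrative only.

```python
# Unit names as listed on the slide, smallest to largest.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte",
         "Terabyte", "Petabyte", "Exabyte", "Zettabyte"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes (assumption: 1000 per step)."""
    return value * 1000 ** UNITS.index(unit)


def humanize(n_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    idx = 0
    while idx < len(UNITS) - 1 and n_bytes >= 1000 ** (idx + 1):
        idx += 1
    return n_bytes / 1000 ** idx, UNITS[idx]


# The large number shown on the slide.
headline = 40_000_000_000_000_000_000_000

# "43 zettabytes ... by 2020, 300 times the volume in 2005"
forecast_2020 = to_bytes(43, "Zettabyte")
implied_2005 = forecast_2020 // 300

print(humanize(headline))
print(humanize(implied_2005))
```

    Under these assumptions the headline number corresponds to 40 zettabytes, and the implied 2005 volume is on the order of 143 exabytes.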

  • Big Data Analytics is a Combination of Methods, Technology, Data and People

    Methods: Regression Models, Machine Learning Models, Text Mining

    Technology: Hardware (Compute power), Software (SAS, R, Spark, ...)

    Data: Internal Data, External Data, Structured Data, Unstructured Data

    People: Data Scientists, Data Engineers, Business People, IT Architects

  • Building the Team, and the Environment

    Programming
    Story-telling
    Statistics
    Visualization
    System Implementation
    DB Administration
    Maths
    Modelling
    Data Storage
    Business/Domain knowledge

    Business Units
    IT

  • Building the infrastructure

    Environments: BI Lab, Production

    In each environment: SAS, HANA, Hadoop Stack, each with its own User Interface

    Data Lake (HDFS): Long term unstructured and structured data

    A2P

    Industry icon, tool box and image: used under license from shutterstock.com

    Other icons: Munich Re

  • Wolfgang Hauner / Big Data Analytics & Artificial Intelligence. Source: Munich Re

  • Which topics drive our clients?

    Up-/Cross-Selling
    Data Sources
    Text Mining
    Churn Analysis
    Supply Chain
    Social Media Analysis
    Fraud Detection
    Big Data Technology
    Predictive UW
    Telematics
    Sensor Data/IoT
    Geospatial

  • Big Data use cases in insurance

    Make the uninsurable insurable: Diabetics, Wind Energy

    Consolidate the information and process: Automated underwriting, Risk management platform

    Artificial Intelligence supported workflow: Early Loss Detection, Visual Loss Adjustment

    Images: dpa Picture Alliance; Getty Images; used under license from shutterstock.com

  • Agenda

    1. Data Analytics Framework

    2. Technology

    3. Method Example: AI

    4. Case Study: Cross Selling

  • Design principles for Big Data & Analytics Platform

    SAS & Hortonworks
    Self-Service
    Multi Tenancy
    One Central Datalake
    DevOps
    On-Prem & Cloud
    Continuous improvement
    Automation
    Hybrid

  • Roadmap to Production via Lab environment

    Timeline: Q2 2015, Q3 2015, Q4 2015

    Phases: Design; Setup / Build; Run; Enhance / Optimize

    Setup of first BI-Lab Hadoop Cluster
    Setup of new BI-Lab Hadoop Cluster
    On-boarding & support of Big Data & Analytics pilots
    Stabilization of BI-Lab Hadoop cluster
    Authentication & Security
    Automation
    New BI-Lab Hadoop Cluster available
    Large shared cluster
    Dedicated clusters
    Single-Node cluster
    Pilot SAS Hadoop Integration

  • Building the Big Data & Analytics Platform

    Production Environment

    Timeline: 2016 to 2017

    Milestones: Start setup platform; Release v1.0; Release v2.0; Release v3.0; Release v4.0

    Phases: Design; Setup / Build; Run; Enhance / Optimize (2-week iterations with Rolling Upgrades)

    SAS components: SAS 9.4 M3; SAS Visual Analytics (VA); Self-Service Data Upload; SAS Embedded Process for Hadoop; SAS Enterprise Guide (EG); SAS MS Office Add-in; Data Access to SAP HANA, Oracle & MS SQL-Server; SAS Enterprise Miner; SAS Contextual Analysis; SAS Mobile BI iOS App

    Hadoop components: HDP 2.3; Hue; Hive; Ambari; Ranger with LDAP; Sqoop; Pig; Spark 1.4; Oozie; HDP 2.4.2; Ambari Views; Spark 1.6; Solr Cloud; Tesseract

    Further items: SAS VA Row Level Security; SAS HA; HDP 2.5; Atlas; Zeppelin; Data Catalogue Tool; Data Lineage; Compliance & Security

  • Big Data & Analytics Production Environments

    IT and Business Deployment

    Environments: Sandbox, Integration, Production

    Self-Service Ad-hoc Analytics

    Scheduled Analytics

    Business Deployment

    IT Deployment

  • Big Data & Analytics Production Environments

    Scalability

    Environments: Sandbox, Integration, Production

    [Diagram: SAS (LASR, EP) and HWX (Hive, YARN) building blocks, repeated across the environments.]

  • Simplified Server Architecture SAS and HDP

    Data nodes: Data Node 1, Data Node 2, Data Node 3, ..., Data Node x, Data Node x+1, Data Node x+2, ..., Data Node y (x < y)

    On every data node: SAS EP, HDFS, Hive, YARN, Solr, Spark

    SAS In-Memory LASR: shown on four of the seven depicted data nodes

    Separate servers (SERVER / OS): SAS Mgmt & Metadata; Hadoop Mgmt & Metadata; Hadoop Frontend

    Ambari Views, Zeppelin, ...

    EP = Embedded Process: bring calculation to data
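    The "bring calculation to data" idea behind the Embedded Process can be illustrated with a toy Python sketch. This is not the SAS Embedded Process API; the node names, the values and both function names are hypothetical. The sketch only contrasts shipping all rows to a central server with computing partial results where the data lives.

```python
# Toy model: each "data node" holds a local partition of numeric values.
# All names and numbers here are made up for illustration.
partitions = {
    "node1": [120.0, 80.0, 45.5],
    "node2": [300.0, 10.0],
    "node3": [60.0, 60.0, 60.0, 20.0],
}


def pull_data_then_compute(parts):
    """Move every row to a central place, then compute the mean there."""
    moved = [v for rows in parts.values() for v in rows]
    return sum(moved) / len(moved), len(moved)


def compute_at_data(parts):
    """Compute (sum, count) on each node and move only those two numbers."""
    partials = [(sum(rows), len(rows)) for rows in parts.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count, 2 * len(partials)


mean_central, moved_central = pull_data_then_compute(partitions)
mean_local, moved_local = compute_at_data(partitions)

print(mean_central, moved_central)
print(mean_local, moved_local)
```

    Both approaches give the same mean, but the second moves a fixed two values per node regardless of how many rows each node stores, which is the point of running the calculation next to the data.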

  • Lessons learned

    1. Make use of Lab environment

    2. Enable Self-Service

    3. Agile IT-Project management

    4. Automation

    5. Security

    6. YARN queue management

  • Agenda

    1. Data Analytics Framework

    2. Technology

    3. Method Example: AI

    4. Case Study: Cross Selling

  • Artificial Intelligence (AI) is coming

    1st Machine Age: Automation of physical tasks

    2nd Machine Age: Automation of cognitive tasks

    Images: used under license from shutterstock.com; dpa Picture Alliance

  • AI Evolution

    1997: IBM's Deep Blue defeats world chess champion. Purely rule based.

    2011: IBM's Watson AI system wins Jeopardy match against human players. Mixed machine learning and rule based.

    2016: Google's DeepMind defeats top-ranked Go player (Lee Se-dol). Purely machine learning based.

    Progression: Rule based, Hybrid, Learning based

    Images: dpa Picture Alliance / Stan_Honda; dpa Picture Alliance / Seth Wenig; dpa Picture Alliance / Lee Jin-man

  • Insurance specific AI

    Munich Re as industry leader in insurance-specific AI

    Insurance-specific AI

    General AI: Google, Facebook, Microsoft, OpenAI