1ICPAS Metro Chapter Barrett Peterson September 25, 2013
BUSINESS INTELLIGENCE & ADVANCED ANALYTICS
The Search for Patterns, Waldo, and Black Swans
Barrett Peterson, C.P.A.
ICPAS Chicago Metro Chapter, September 25, 2013
WHY BUSINESS INTELLIGENCE?
Information
Good Data
Good Analysis
BIG DATA AND ANALYTICS - WHY
PREDICTION and PATTERN IDENTIFICATION
• Digitization, datafication
• Correlation, more than causality
• Reduced emphasis on sampling
• “Messy” data usable for many applications, but not all
BIG DATA AND ANALYTICS – CRITICAL ATTRIBUTES
• Reduced privacy and handling “private” data
• Overreliance on, and overconfidence in, data and analysis
• Currency – correlations can change over time
• Predictions are hard to make, especially about the future. - Niels Bohr [Not Yogi Berra].
BIG DATA AND ANALYTICS - RISKS
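The currency risk can be illustrated with a minimal Python sketch. All figures are made up for illustration, and the names ad_spend, sales, and pearson are assumptions, not part of the presentation. It computes the Pearson correlation separately for two periods and shows that a relationship found in one period need not hold in the next.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Synthetic monthly figures, purely illustrative (not real data).
ad_spend = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
sales = [50, 54, 59, 63, 68, 72, 71, 69, 66, 64, 61, 59]

# Correlate each half-year separately instead of the whole year at once.
first_half = pearson(ad_spend[:6], sales[:6])
second_half = pearson(ad_spend[6:], sales[6:])

print(f"months 1-6:  r = {first_half:+.2f}")
print(f"months 7-12: r = {second_half:+.2f}")
```

A model fitted only on the first six months would assume a strong positive relationship that has reversed by the second half, which is why correlations should be re-checked on current data.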
HISTORY AND BACKGROUND
• Computer-based business intelligence is a middle-aged idea, about 40 years old. Previously described as:
  – Decision support systems [DSS]
  – Executive information systems [EIS]
  – Management information systems [MIS]
A LITTLE BACKGROUND
HISTORY
A trip down memory lane
• Internet Development
  – ARPANET and others – 1960s
  – Internet Protocols – 1982, presumably by Al Gore
• IBM researcher Edgar Codd is credited with developing relational database theory in 1970.
• IBM’s Donald Chamberlin and Raymond Boyce developed Structured Query Language [SQL] in the early 1970s to manipulate and retrieve data from IBM’s early relational database management system.
• World Wide Web and first web browser invented by Tim Berners-Lee in 1990 by combining the internet, hypertext mark-up language, and the Uniform Resource Locator [URL] system. The browser later became Nexus.
• Mosaic, co-designed by Marc Andreessen, led to the first commercial web browser [Netscape].
• Development of big-data-enabling database designs and high-speed processing during the last 15 years.
A LITTLE BACKGROUND
History
Important Technology Inventions
• Development of the primary infrastructure
  – Database design
  – Processing and Storage Hardware
  – Server Development and Massively Parallel Processing
• Improved telecommunications speed
• Hardware miniaturization, capacity, and speed
  – Memory [RAM] capacity
  – Storage capacity and transfer speed
  – Bus speed
  – Video processing capacity and speed
• Increased hardware speed and capacity
• Digital formats for sensors, cameras, RFID, and other data collection sources
• Mobile computing
• “Cloud” capability exploits many of these developments
A LITTLE BACKGROUND
History
Drivers Enabling BI and Advanced Analytics
• Analytics
• Business Intelligence
• Knowledge Management
• Content Management
• Data Mining
• Big Data
• Data Integration
• Datafication
• Gamification
• Blob [Binary Large Object]
A LITTLE BACKGROUND
TERMINOLOGY
A consultant’s collection of confusing names - a sampler
• CPU speed and power
  – Moore’s law
  – Multi-core chips
  – Solid State Memory
• Storage improvement and cost reduction
  – Greatly increased capacity – petabytes and more; IBM’s first hard drive in 1956 was 3.75MB
  – Greatly increased access/transfer speed
  – Greatly reduced cost
• Data collection from a wide range of devices
• Data communications – speed and volume
• Database management techniques and software
• Application speed and power
A LITTLE BACKGROUND
Drivers And Enablers of Big Data
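To put the capacity growth in perspective, the short Python calculation below compares the 3.75MB first hard drive with one petabyte. It assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes); the constant names are illustrative only.

```python
import math

FIRST_DRIVE_BYTES = 3.75e6   # 3.75 MB, decimal units assumed
PETABYTE_BYTES = 1e15        # 1 PB, decimal units assumed

# How many first-generation drives would hold one petabyte?
ratio = PETABYTE_BYTES / FIRST_DRIVE_BYTES

# How many successive doublings of capacity does that growth represent?
doublings = math.log2(ratio)

print(f"One petabyte equals about {ratio:,.0f} of those first drives")
print(f"That is roughly {doublings:.1f} doublings of capacity")
```

Expressing the growth as a count of doublings ties the storage figures to the Moore's-law style of reasoning listed under CPU speed and power.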
BUSINESS INTELLIGENCE AND ADVANCED ANALYTICS DEFINED
A system composed of “computer” hardware, storage hardware, operating system, database software, file systems, and application software to:
• Collect, “clean”, filter, “tag”, and integrate data
• Store data [hardware and software]
• Provide knowledge management, analytical, and presentation tools to translate data into decision-useful information
TONIGHT’S CRITICAL DEFINITIONS
Business Intelligence
• Prehistoric – Mainframe Era
  – DSS, EIS, MIS
  – Hierarchical Master Data Files
• The Current Era [Primarily] – Business Intelligence
  – Primarily “structured” data [data that can be represented in relational/dimensional tables or flat files], and BLOB [binary large object] formats
  – Analysis of “known”, defined patterns
  – Presented in tables, simple charts, and dashboards
• Emerging – Big Data and Advanced Analytics
  – To discover new, changing, or variable patterns
  – A wide variety of “unstructured” digital data formats added to “structured” data
  – Emerging storage structures
  – “Exploratory” analytics – Zoomable User Interfaces [ZUIs]
  – Solid State Memory and Solid-State Drives
TONIGHT’S CRITICAL DEFINITIONS
Business Intelligence Generations
THE HARDWARE AND SOFTWARE ELEMENTS OF BUSINESS INTELLIGENCE
• Computer – CPU, Memory, and Operating System Software
• Data Collection
  – Master Data Management
  – Collection Processes and Devices
  – Data Cleansing Processes and Software
• Data Storage – Petabyte capable
  – Physical Devices and Storage Management Software
  – Data Management and Integration
  – Database Software Storage
    • Relational – Traditional ERP/Transaction systems
    • Dimensional – Traditional Data Warehouse, including associated BLOB
    • Distributed, Multiple Server, Storage Systems
    • NoSQL [Not Only SQL] Distributed Operational Stores
    • Apache Hadoop for Highly Parallel Processing and certain Intensive Data Analytics Applications
    • DBMS: Apache Cassandra; Amazon Dynamo
• Middleware Software
• High Speed Data Communications – Petaflop capable
• Business Intelligence Application Software
  – OLAP, Dashboard, and Chart Reports
  – Statistical Analysis and Presentation Tools
BUSINESS INTELLIGENCE ELEMENTS
Principal Components for Maximum Application
• Data Governance and Management
  – Uniform terminology
  – Uniform meaning
  – Uniform units of measure
  – Metadata
• Data Structure and Attributes
  – Structured – Relational/Dimensional
  – Unstructured
  – Rate of change, context, and other attributes
• Data Collection and Preparation
  – Filtering, particularly “Big Data”, and “tagging”
  – Extract, Transform, Load [ETL] for “structured” data
• Database File Systems
• Data Storage and Retrieval
  – Capacity
  – Access/Retrieval speed
BUSINESS INTELLIGENCE ELEMENTS
DATA ISSUES: THE CORNERSTONE
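The Extract, Transform, Load [ETL] step named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the CSV content, the column names, the unit codes, and the sales table are all hypothetical. The transform step applies the governance ideas from the same slide, uniform terminology and a uniform unit of measure.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: inconsistent names and mixed units.
RAW_CSV = """customer,amount,unit
 Acme ,1200,USD
Globex,3.5,USD_thousands
acme,800,USD
"""

def extract(text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: uniform customer names and a single unit of measure (USD)."""
    factor = {"USD": 1.0, "USD_thousands": 1000.0}
    return [
        (row["customer"].strip().title(), float(row["amount"]) * factor[row["unit"]])
        for row in rows
    ]

def load(records):
    """Load: insert the cleaned records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount_usd REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()
    return conn

conn = load(transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT customer, SUM(amount_usd) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Without the transform step, " Acme " and "acme" would be reported as two customers and the Globex amount would be understated by a factor of one thousand.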
• Metadata management
  – Business definitions, rules, sources
  – Technical attributes, such as type, scale, transformation methods
  – Processing requirements – filtering, tagging, ETL, aggregation, summarization
• Data Definitions and data dictionaries
  – Name
  – Unit(s) of measure
• Data collection and filtering or transforming requirements
  – Sources – internal and external
  – Context addition/filtering requirements
• Data integration specifications
  – Multiple platforms and applications
  – Mapping to intermediate data marts
• Privacy requirements
  – Personal Identifying data
  – Laws: HIPAA, Privacy Act
BUSINESS INTELLIGENCE ELEMENTS
MASTER DATA GOVERNANCE AND MANAGEMENT
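A data dictionary of the kind described above can be represented directly in code. The Python sketch below is one possible shape, offered as an assumption rather than a standard: the FieldDefinition class, its attributes, and the three sample entries are hypothetical. It records name, unit of measure, technical type, source, and a privacy flag, and then checks incoming records against those definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldDefinition:
    """One data dictionary entry: business and technical metadata for a field."""
    name: str
    unit: str                # unit of measure, e.g. "USD" or "count"
    data_type: type          # technical attribute
    source: str              # system of record, internal or external
    personal: bool = False   # marks personally identifying data

# Hypothetical entries, for illustration only.
DATA_DICTIONARY = {
    d.name: d
    for d in (
        FieldDefinition("invoice_amount", "USD", float, "ERP"),
        FieldDefinition("units_shipped", "count", int, "ERP"),
        FieldDefinition("customer_email", "n/a", str, "CRM", personal=True),
    )
}

def validate(record):
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for field, value in record.items():
        definition = DATA_DICTIONARY.get(field)
        if definition is None:
            problems.append(f"{field}: not in data dictionary")
        elif not isinstance(value, definition.data_type):
            problems.append(f"{field}: expected {definition.data_type.__name__}")
    return problems

def personal_fields(record):
    """List the fields in a record that the dictionary marks as personal data."""
    return sorted(
        f for f in record if f in DATA_DICTIONARY and DATA_DICTIONARY[f].personal
    )

issues = validate({"invoice_amount": 120.0, "units_shipped": "3", "region": "IL"})
print(issues)
```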
• Data Structures
  – “Structured” data, principally text and numbers capable of incorporation in relational or dimensional tables
  – “Unstructured” data, not suitable for relational tables, much of it in newer data formats, including images
• Big Data Attributes
  – Both “structured” and “unstructured”
  – The four major “Vs” of big data
    • Volume – huge
    • Velocity – fast changing, unlike structured
    • Variety – format and content
    • Variability – lacks the consistency, and perhaps precision, of structured data
BUSINESS INTELLIGENCE ELEMENTS
Data Structures and Attributes Are Critical Drivers
• Content Structure – Traditional Financial Data
  – Numerical
  – Sign/Debit or Credit
  – Text Descriptions
• Database Management Structures
  – Legacy Systems: Hierarchical and Network
  – Transaction Systems: Relational
    • Relations [Tables], Attributes [Columns], Instances [Rows]
    • Rules: no duplicate rows; single value for attributes
  – Warehouse Systems: Dimensional
    • Facts [data items, usually a dollar amount or unit count]
    • Measures – dollar or count for facts
    • Dimensions – groups of hierarchies and descriptors of various aspects or context for the facts/measures
  – Big Data Databases: Unstructured
    • Microsoft Office and Similar File Formats
    • Photography and Art
BUSINESS INTELLIGENCE ELEMENTS
Data Structures IT Lingo
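The relational and dimensional vocabulary above can be made concrete with a small example. The Python sketch below uses the standard sqlite3 module; the table names, column names, and sample values are hypothetical and chosen only for illustration. It builds one fact table with two measures and two dimension tables, rolls the measures up along a dimension, and shows the "no duplicate rows" rule being enforced by a key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: descriptive context for the facts
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    -- Fact table: sales_usd and units are the measures
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        sales_usd  REAL,
        units      INTEGER,
        PRIMARY KEY (product_id, date_id)  -- one row per product and period
    );
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Frozen"), (2, "Snacks")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(10, 2013, "Q1"), (11, 2013, "Q2")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 10, 100.0, 4), (1, 11, 150.0, 6), (2, 10, 80.0, 8)],
)

# Roll the measures up along one dimension (product category).
by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_usd), SUM(f.units)
    FROM fact_sales AS f JOIN dim_product AS p USING (product_id)
    GROUP BY p.category ORDER BY p.category
""").fetchall()
print(by_category)

# A second row with the same key violates the "no duplicate rows" rule.
try:
    conn.execute("INSERT INTO fact_sales VALUES (1, 10, 999.0, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate rejected:", duplicate_rejected)
```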
RELATIONAL TABLE ILLUSTRATION
“Tuple” is borrowed from mathematics and set theory. In database design it refers to the set of attribute values that describes one item [row] of the kind the table is about. Examples of such items include customers, vendors, orders, and product SKUs.
Business Intelligence Elements
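The term maps closely onto Python's own tuples, which makes for a compact illustration. In this sketch the Customer type and its three attributes are hypothetical. Each row is a tuple of attribute values, and the relation is modeled as a set of tuples, so an exact duplicate row cannot be stored twice.

```python
from typing import NamedTuple

class Customer(NamedTuple):
    """One tuple [row]; the field names are the attributes [columns]."""
    customer_id: int
    name: str
    city: str

# A relation modeled as a set of tuples: exact duplicates collapse to one row.
customers = {
    Customer(1, "Acme Supply", "Chicago"),
    Customer(2, "Globex", "Peoria"),
    Customer(1, "Acme Supply", "Chicago"),  # duplicate of the first row
}

attributes = Customer._fields
chicago_names = sorted(c.name for c in customers if c.city == "Chicago")

print(attributes)
print(len(customers), "distinct rows")
print(chicago_names)
```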
BUSINESS INTELLIGENCE ELEMENTS
MATH CAN BE COMPLICATED
• Numbers and words/letters
  – Relational/Dimensional
  – Spreadsheets
  – Word Processing documents
• Sound and Music
• Photo
• Video
• Video Game
• CAD Design
• Graphical
  – PDF
  – Raster, Vector Graphics
  – Statistical Visualization
• Scientific
• Signal
• XML [Web based mark-up formats]
• Geo-Location
• Web Logs
BUSINESS INTELLIGENCE ELEMENTS
DATA FILE TYPE CATEGORIES, ALMOST ENGLISH
• Collection
  – Company transaction/ERP systems
  – Purchased, such as Nielsen, IRI
  – Vendor supplied, such as bank transactions
  – Sensor readings
  – Cameras
  – Mobile device traffic – Phones, Tablets
• Filtering
  – Adding context such as date or location
  – Eliminating “chatter” from high volume data
  – Error correction
• Aggregation & Integration
BUSINESS INTELLIGENCE ELEMENTS
DATA COLLECTION AND PREPARATION
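The three filtering steps listed above can be sketched for a stream of sensor readings. This Python example is illustrative only: the sensor ID, the location lookup, the valid range, and the readings themselves are assumptions. Note that the "error correction" here is the simplest possible form, dropping readings outside a plausible range.

```python
from datetime import datetime, timezone

# Hypothetical raw readings: (sensor_id, epoch_seconds, temperature_c)
raw = [
    ("dock-1", 1380067200, 21.4),
    ("dock-1", 1380067201, 21.4),    # unchanged value: "chatter"
    ("dock-1", 1380067202, -999.0),  # sentinel left by a failed read
    ("dock-1", 1380067203, 21.9),
]
LOCATIONS = {"dock-1": "Chicago warehouse"}  # context to attach
VALID_RANGE = (-40.0, 60.0)                  # plausible temperatures

def prepare(readings):
    """Drop bad values and repeats, then attach date and location context."""
    cleaned, last_value = [], {}
    for sensor, ts, value in readings:
        if not VALID_RANGE[0] <= value <= VALID_RANGE[1]:
            continue  # error handling: discard out-of-range readings
        if last_value.get(sensor) == value:
            continue  # chatter: same value as the last kept reading
        last_value[sensor] = value
        cleaned.append({
            "sensor": sensor,
            "location": LOCATIONS.get(sensor, "unknown"),
            "time_utc": datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(),
            "temperature_c": value,
        })
    return cleaned

cleaned = prepare(raw)
for row in cleaned:
    print(row)
```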
DATA COLLECTION - RFID
[Images: RFID tag; RFID tag reader]
DATA COLLECTION
[Images: various sensors; surveillance camera]
DATA FILTERING AND CLEANSING IS IMPORTANT
• Relational – SQL
• Dimensional – SQL, OLAP
• Binary Large Object [BLOB] – binary data, most often photos, video, audio, or PDF files
• Massively Parallel-Processing [MPP]
• Apache Hadoop Distributed File System [HDFS] – Java
  – Google File System [GFS], used solely by Google
  – Google MapReduce
• Amazon S3 filesystem [used by Amazon]
• NoSQL, MySQL
• Storm
• Resource Description Framework [RDF] Databases, like Big Data
BUSINESS INTELLIGENCE ELEMENTS
DATABASE FILE SYSTEMS
BUSINESS INTELLIGENCE ELEMENTS
SELECT BIG DATA DATABASE MANAGEMENT SYSTEMS
• Significant Originators
  – Google MapReduce
  – Google File System [GFS]
  – Amazon S3 filesystem
• Continuing Developments
  – Apache Software Foundation
    • Apache Cassandra distributed database management system
    • Apache Hadoop software framework to support data-intensive distributed applications
    • Apache Hive, a data warehouse structure built on Hadoop
    • Pig – high-level programming language for creating MapReduce programs with Hadoop
  – Significant to Technology Development
    • Facebook [uses MySQL as a DBMS, with Memcache]
    • Yahoo
    • LinkedIn [Project Voldemort]
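Since MapReduce recurs throughout this list, a toy version of the idea may help. The Python sketch below runs in a single process and uses none of the Hadoop or Google APIs; the function names and the two input strings are assumptions for illustration. It shows the three phases of the pattern, map, shuffle, and reduce, applied to the classic word-count problem.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (key, value) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Two input splits; in a real cluster each would be mapped on a different node.
splits = ["big data gets bigger", "Big data and analytics"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call looks only at its own split and each reduce call looks only at one key, both phases can be spread across many servers, which is the property the distributed frameworks exploit.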
• Convergence aspect of mainframes and servers
• Massively parallel, multiple server, distributed processing, in multiple data centers – grid computing
• Multi-core, high capacity, lower power consumption CPUs
• Memory servers for RAM employing DRAM composed of Fully Buffered Dual In-line Memory Modules [FB-DIMM]
• Solid state flash drive storage
• Greatly improved, and less costly, hard drive storage
BUSINESS INTELLIGENCE ELEMENTS
COMPUTER HARDWARE CONSIDERATIONS
BI CONFIGURATION SIZES
Small – BI, but not Big Data capable
Medium
Large – IBM Sequoia at Livermore Labs
• Data Storage Terminology
  – Memory – CPU direct connected, often called RAM
  – Storage – not directly connected to the CPU
• Data Storage Device Types
  – Memory
    • DRAM-based
    • Flash memory-based Solid-State Drives [SSDs]
  – Storage
    • Hard Disk Drives [HDD]
    • Optical Drives – CDs, DVDs
• Data Storage Systems
  – Direct Attached
  – Network Attached Storage [NAS]
  – Storage Area Network [SAN]
  – pNFS – Parallel Network File System
BUSINESS INTELLIGENCE ELEMENTS
DATA STORAGE HARDWARE/SOFTWARE
• Traditional Reporting Systems
  – ERP systems, including extract and presentation tools
  – Downloads to Excel and similar programs for analysis using functions and pivot tables
• Presentation Tools
• Specialized Analytics
  – IBM InfoSphere BigInsights and InfoSphere Streams
  – IBM Netezza
  – ParAccel Analytic Database
  – EMC Greenplum
  – SAS High Performance Computing
  – Information Builders WebFocus
• Exploratory Tools, like IBM SPSS [originally Statistical Package for the Social Sciences]
  – Data mining with specialized algorithms
  – Statistical analysis and related charting software
BUSINESS INTELLIGENCE ELEMENTS
BI APPLICATION SOFTWARE
• BI Reporting
• Predictive Analytics
• Data Exploration - correlation
• Data Visualization - graphical
• Instrumentation Analytics
• Content Analytics
• Web Analytics
• Functional Applications
• Industry Applications
• Location Tracking
BUSINESS INTELLIGENCE ELEMENTS
ADVANCED ANALYTICS APPLICATION TYPES
BUSINESS INTELLIGENCE ELEMENTS
USE STATISTICAL TECHNIQUES APPROPRIATELY
ALGORITHMS CAN BE TREACHEROUS
DATA MODELS HAVE LIMITS
BI AND ADVANCED ANALYTICS OUTPUT ILLUSTRATIONS
EXAMPLES OF USES
• Sales and Operations Planning
• Financial Instruments Modeling
• Production Control
• Online Retail
• Economics and Policy Development
• Agriculture/Farming
• Weather Analysis/Prediction
• Environmental Impact Assessment
• Healthcare Diagnosis and Records Management
• Genomic Analytics and Pharmaceutical and Medical Research
• Natural Resource Exploration
• Research Physics
• Road, Rail Traffic Management
• Security Surveillance: Business, Government
• Astronomy
• Logistics Management, Including GPS Tracking
• Electrical and Telecommunications Grids Mgmt
• Social Media – Facebook, LinkedIn, Google+, Twitter, YouTube, Pinterest
• TV shows – Star Trek, Person of Interest
• Cloud Services – Computing, Storage
• Credit Scoring
SELECTED EXAMPLES OF USES
• Retail
  – Amazon
  – Dell
  – Delta Sonic Car Washes
• Data Services
  – IBM
  – Google
  – Amazon
• Financial Services
• Manufacturing
  – McCain Foods – Frozen foods
  – Boeing
• Transportation and Logistics
  – Logistics – UPS, FedEx
  – Rail – UP, CSX, TTX
  – Air – United, AMR, Southwest
• Social Media
  – LinkedIn
  – Facebook
• Government
  – NSA PRISM and other tools
  – CIA – Palantir Software
• Medicine and Health
  – Centers for Disease Control and Prevention (CDC)
  – J. Craig Venter Institute
• Science
  – Livermore Labs
SELECTED USERS
• Technical Elements
  – Direct on-line access
  – Amazon specialized “Big Data” database
  – Distributed and extremely large data centers
  – Highly automated, high technology warehouses
  – High supplier [vendor] integration
• User Benefits
  – Favorable prices
  – Suggested associated purchases
  – Individual interest advertising
SELECTED EXAMPLES OF USE
AMAZON
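The "suggested associated purchases" benefit can be illustrated with a simple co-occurrence count. This Python sketch is not Amazon's method, which the presentation does not describe; it is a generic technique shown under stated assumptions, and the order baskets and the suggest function are hypothetical. It counts how often pairs of items appear in the same order and suggests the items most often bought with a given one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order baskets, for illustration only.
orders = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"lantern", "batteries"},
    {"tent", "lantern"},
]

# Count how often each pair of items appears in the same order.
pair_counts = Counter()
for basket in orders:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1

def suggest(item, top_n=2):
    """Return the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    # Highest count first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked][:top_n]

print(suggest("tent"))
print(suggest("batteries"))
```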
• Technical Elements
  – Web driven order entry and custom purchase configuration
  – Tracking of sales correspondence with promotional offers
  – Supplier re-order integration
• User Benefits
  – Ability to customize purchase
  – Reasonable cost
  – Prompt delivery
SELECTED EXAMPLES OF USE
DELL
• Technical components
  – Shared component and assembly designs
  – More detailed quality specifications and product tolerances
  – Control of assembly schedule
  – “Real time” exchange of technical information
  – Dissemination of best practices
• Customer benefits
  – Faster deliveries
  – Increased product quality
  – Reduced defects
SELECTED EXAMPLES OF USE
BOEING
• Techniques employed
  – Collects cellphone and GPS signals, traffic camera feeds, and roadside sensor data
  – Identifies accidents, traffic jams, and road damage
  – Enables dispatch of emergency vehicles
  – Updates traffic websites
  – Sends messages to drivers’ GPS devices and cellphones
  – Uses supercomputers running the INRIX application
• Benefits
  – Eliminates traffic congestion faster
  – Provides more timely relief for accident victims
  – Facilitates road paving scheduling
SELECTED EXAMPLES OF USE
NEW JERSEY DEPARTMENT OF TRANSPORTATION
• Technical Elements
  – General LinkedIn Structure
    • Personal Profile
    • Individual Connections
    • Groups
    • Company and Other Searches
    • Endorsements
    • Attached application partners
  – Slideshare, owned by LinkedIn
• User Benefits
  – Networking with professional contacts
  – Personal branding capabilities
  – Business Development
  – Job Search enhancement
SELECTED EXAMPLES OF USE
LINKEDIN PROFILE PAGE SAMPLE
Facebook Page Sample
TRENDS
• More, bigger, faster – big data gets bigger
• Cloud services continue to expand
• Mobile computing expands
• Hadoop becomes more common
• Interactive data visualization will expand
• Social media type platforms will increase their prominence
• Analytics skills demands will increase
• Privacy issues will become prominent
RESOURCES
• Books
  – Competing on Analytics, Davenport & Harris
  – Analytics at Work, Davenport, Harris, & Morison
  – The Data Asset, Fisher
  – Data Strategy, Adelman, Moss, Abai
  – Big Data, Cukier, Mayer-Schonberger
• Websites
  – The Data Warehousing Institute – tdwi.org
  – IBM data analytics: www.ibm.com, smarter planet
SUMMARY
WHY USE BI AND ADVANCED ANALYTICS
INSIGHT FROM DATA