Zhangxi Lin, Texas Tech University
ISQS 6339, Data Mgmt & BI
ISQS 6339, Data Management & Business Intelligence
Introduction
What is Business Intelligence
A Simple Definition: The applications and technologies transforming Business Data into Action
Business intelligence (BI) is a business management term that refers to the applications and technologies used to gather, provide access to, and analyze data and information about a company's operations.
Business intelligence systems can help companies gain more comprehensive knowledge of the factors affecting their business and make better business decisions.
YouTube: What is BI? (2’); Microsoft Business Intelligence Surface Demo (6’34”)
Data, information, and knowledge
Data – a collection of raw value elements or facts used for calculating, reasoning, or measuring.
Information – the result of collecting and organizing data in a way that establishes relationships between data items, thereby providing context and meaning.
Knowledge – the understanding of information based on recognized patterns, in a way that provides insight into the information.
Online Video: What is business intelligence? (10’36”); Retail and Big Data Revolution (2’12”); Big data (7’12”); Big data terms (31’19”)
Driving force - Big Data
A collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools.
Difficulties include capture, storage, search, sharing, analysis, and visualization.
The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data.
8/14/2012 Copyright 2012
Zettabyte (ZB)
A quantity of information or information storage capacity equal to 10^21 bytes or 1,000 exabytes.
As of April 2012, no storage system has achieved one zettabyte of information.
The combined space of all computer hard drives in the world was estimated at approximately 160 exabytes in 2006.
Seagate reported selling 330 exabytes worth of hard drives during the 2011 Fiscal Year.
As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes. This is a half zettabyte.
1,000,000,000,000,000,000,000 bytes = 1000^7 bytes = 10^21 bytes
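The unit arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the `UNITS` list and the `to_bytes` helper are names chosen for illustration, and it assumes decimal (power-of-1000) prefixes, as the slide does.

```python
# Decimal (SI) storage units; each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(value, unit):
    """Convert a value expressed in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)

zettabyte = to_bytes(1, "ZB")

# The identities stated on the slide.
assert zettabyte == 10 ** 21 == 1000 ** 7
assert zettabyte == to_bytes(1000, "EB")

# The 2009 estimate for the World Wide Web, as a fraction of a zettabyte.
web_2009 = to_bytes(500, "EB")
print(web_2009 / zettabyte)
```

The same helper can be used to compare the other figures on the slide, for example the 160 exabytes of hard-drive space estimated for 2006.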
Market
"Big data" has increased the demand for information management specialists; major companies have spent more than $15 billion in this area.
This industry is worth more than $100 billion and is growing at almost 10% a year.
There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet.
The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007.
It is predicted that the amount of traffic flowing over the internet will reach 667 exabytes annually by 2013.
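To get a feel for how fast the telecommunication capacity figures grew, one can compute the implied compound annual growth rate between the first and last data points. The slide does not do this calculation; the function name and the choice of a compound-growth model are assumptions made for illustration.

```python
PB = 10 ** 15  # petabyte in bytes (decimal prefix)
EB = 10 ** 18  # exabyte in bytes (decimal prefix)

def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1

# Figures from the slide: 281 PB in 1986 and 65 EB in 2007.
rate = compound_annual_growth(281 * PB, 65 * EB, 2007 - 1986)
print(f"Implied growth: {rate:.1%} per year")
```

Under this assumption the capacity grew by a little under 30% per year over the 21-year span.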
Approach - Cloud Computing
Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams. Cloud computing entrusts remote services with a user's data, software and computation.
Buzzword: SaaS/IaaS/PaaS
Distributed business intelligence
Deal with big data – the open & distributed approach
LAMP
Hadoop
MapReduce
HDFS
NoSQL
ZooKeeper
Storm
ISQS7339, Fall 2012
Apache Hadoop
An open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.
The Apache Hadoop framework is composed of the following modules:
Hadoop Common - contains libraries and utilities needed by other Hadoop modules.
Hadoop Distributed File System (HDFS).
Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
Hadoop MapReduce - a programming model for large-scale data processing.
Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
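The MapReduce programming model can be illustrated with a single-process word count. This is a sketch of the idea only: it does not use Hadoop's API, it runs on one machine, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values collected for one key."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big insight", "data into action"])
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data across the network; here all three steps are ordinary function calls so the data flow is easy to follow.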
Hadoop 2: Big data's big leap forward
The new Hadoop is the Apache Software Foundation's attempt to create a whole new general framework for the way big data can be stored, mined, and processed.
The biggest constraint on scale has been Hadoop’s job handling. In Hadoop 1, all jobs are run as batch processes through a single daemon called JobTracker, which creates a scalability and processing-speed bottleneck.
Hadoop 2 uses an entirely new job-processing framework built using two daemons: ResourceManager, which governs all jobs in the system, and NodeManager, which runs on each Hadoop node and keeps the ResourceManager informed about what's happening on that node.
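The division of labor between the two daemons can be pictured with a toy model. The classes below borrow the daemon names from the slide but are not Hadoop's API: the heartbeat format, the notion of "free slots", and the pick-the-least-loaded-node rule are all assumptions made for illustration, and real YARN scheduling is considerably more involved.

```python
class NodeManager:
    """Toy per-node daemon: knows only about its own node."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots

    def heartbeat(self):
        # Report the node's current state to the central manager.
        return {"node": self.name, "free_slots": self.free_slots}


class ResourceManager:
    """Toy central daemon: keeps a cluster-wide view built from heartbeats."""

    def __init__(self):
        self.cluster_state = {}

    def receive(self, report):
        self.cluster_state[report["node"]] = report["free_slots"]

    def pick_node(self):
        # Assumed policy: choose the node reporting the most free slots.
        node = max(self.cluster_state, key=self.cluster_state.get, default=None)
        if node is None or self.cluster_state[node] == 0:
            return None
        return node


nodes = [NodeManager("node-a", 2), NodeManager("node-b", 4)]
manager = ResourceManager()
for node in nodes:
    manager.receive(node.heartbeat())

chosen = manager.pick_node()
print(chosen)
```

The point of the sketch is only that the central manager decides using reports from the nodes, rather than tracking every task itself.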
The process of BI
Data -> information -> knowledge -> actionable plans
Data -> information: the process of determining what data is to be collected and managed and in what context
Information -> knowledge: The process involving the analytical components, such as data warehousing, online analytical processing, data quality, data profiling, business rule analysis, and data mining
Knowledge -> actionable plans: The most important aspect in a BI process
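The chain above can be traced on a tiny example. Everything here is hypothetical: the purchase records, the "strictly declining spend" pattern, and the retention-offer action are invented to show the three transitions, not taken from the course material.

```python
from collections import defaultdict

# Data: raw facts (hypothetical records of customer, month, amount spent).
purchases = [
    ("alice", 1, 120.0), ("alice", 2, 80.0), ("alice", 3, 30.0),
    ("bob", 1, 50.0), ("bob", 2, 55.0), ("bob", 3, 60.0),
]

def spend_by_customer(records):
    """Data -> information: organize amounts per customer in month order."""
    history = defaultdict(list)
    for customer, _month, amount in sorted(records, key=lambda r: r[1]):
        history[customer].append(amount)
    return dict(history)

def declining_customers(history):
    """Information -> knowledge: recognize a pattern (spend falls every month)."""
    return [
        customer
        for customer, amounts in history.items()
        if len(amounts) > 1
        and all(later < earlier for earlier, later in zip(amounts, amounts[1:]))
    ]

def plan(customers):
    """Knowledge -> actionable plan: attach an action to each finding."""
    return {customer: "send retention offer" for customer in customers}

actions = plan(declining_customers(spend_by_customer(purchases)))
print(actions)
```

Only the last step changes anything in the business, which is why the slide singles it out as the most important one.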
Actionable Knowledge
An information asset retains its value only if the converted knowledge is actionable.
Need some methods for extracting value from knowledge
This is not a technical issue but an organizational one – need empowered individuals in the organization to take the action
There is an issue of Return on Investment (ROI)
BI Problems
Structured
Detecting credit card fraud
Setting loan parameters
Market segmentation/Mass customization
Deciding marketing mix
Customer churn
Reducing employee turnover
Improving quality/efficiency
…
Unstructured
Data exploration
Utilization of resources (stored knowledge) to maximum effectiveness
…
BI Applications
Customer Analytics
Customer profiling
Targeted marketing
Personalization
Collaborative filtering
Customer satisfaction
Customer lifetime value
Customer loyalty
Sales Channel Analytics
Marketing
Sales performance and pipeline
BI Applications (2)
Supply Chain Analytics
Supplier and vendor management
Shipping
Inventory control
Distribution analysis
Behavior Analysis
Purchasing trends
Web activity
Fraud and abuse detection
Customer attrition
Social network analysis
The Evolution of Business Intelligence
1st Generation – Traditional analytics (query and reporting)
2nd Generation – Traditional analytics (OLAP, data warehousing)
2.5 Generation – New traditional analytics
3rd Generation - Advanced analytics
Rules, predictive analytics and real-time data mining
Stream analytics
Business Intelligence Classifications
Traditional Analytics
1st Generation Analytics (Query & Reporting)
2nd Generation Analytics (OLAP, Data Warehousing)
Advanced Analytics/Optimization
Rules
Predictive Analytics
Real-time and traditional Data Mining
Stream Analytics*
Real-time, continuous, sequential analysis (ranging from basic to advanced analytics)
* In lieu of stream analytics, “embedded analytics,” although architecturally different, could potentially play the same role
3rd-Generation BI
Legacy BI
“New Traditional” Analytics
“2.5-Gen” Analytics (In-Memory OLAP, Search-Based)
Source: Bill O’Connell, IBM, Aug 2007
Business Intelligence Use Cases
Traditional Analytics
1st Generation Analytics (Query & Reporting)
2nd Generation Analytics (OLAP, Data Warehousing)
Advanced Analytics/Optimization
Rules
Predictive Analytics
Real-time and traditional Data Mining
Stream Analytics*
Real-time, continuous, sequential analysis (ranging from basic to advanced analytics)
* In lieu of stream analytics, “embedded analytics,” although architecturally different, could potentially play the same role
“New Traditional” Analytics
“2.5-Gen” Analytics (In-Memory OLAP, Search-Based)
Example Target Solutions: Fraud Detection / Risk, CRM Analytic, Supply Chain Optimization, RFID / Spatial Data, Other High-Volume
Focus on what is happening RIGHT NOW
Real-Time Threshold
Focus on what will happen
Analytic applications that apply statistical relationships in the form of RULES
Focus on what did happen
Turning data into information is limited by the relationships which the end-user already knows to look for.
Data mining to determine why something happened by unearthing relationships that the end-user may not have known existed.
Source: Bill O’Connell, IBM, Aug 2007
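The contrast between "what did happen" and "what is happening RIGHT NOW" can be sketched in a few lines. The card transactions, field names, and the fixed amount limit are invented for illustration; the slide does not specify any particular rule.

```python
# Hypothetical stored card transactions (values invented for illustration).
history = [
    {"card": "A", "amount": 40.0},
    {"card": "B", "amount": 15.0},
    {"card": "A", "amount": 60.0},
]

def total_by_card(records):
    """What did happen: a report computed over data already stored."""
    totals = {}
    for record in records:
        totals[record["card"]] = totals.get(record["card"], 0.0) + record["amount"]
    return totals

def flag_as_they_arrive(events, limit):
    """What is happening right now: apply a rule to each event on arrival."""
    for event in events:
        if event["amount"] > limit:  # assumed rule, not from the slides
            yield event

report = total_by_card(history)

# Events are consumed one at a time, as they would be from a live feed.
incoming = iter([{"card": "B", "amount": 20.0}, {"card": "A", "amount": 900.0}])
alerts = list(flag_as_they_arrive(incoming, limit=500.0))

print(report)
print(alerts)
```

The report answers a question the analyst already knew to ask, while the generator reacts to each event without waiting for the data to be stored and queried.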