Innovation in Data and Information Mining
MBA Technology Conference, March 28, 2007
Linda C. Simmons, IBM Global Business Services
Who is IBM Research?
Unparalleled -- the largest private research institution in the world
Annual budget of almost $5B
Eight labs across the world on all continents
Over 3,000 researchers
5 Nobel Prize winners, 4 US National Medals of Technology, 3 National Medals of Science, 19 memberships in the National Academy of Sciences and more than 47 members of the National Academy of Engineering
Skills in mathematics, computer science, physics, operations research and many more
Over 30,000 US patents since 1993
Data Mining Research Goals
• Developing effective tools and techniques for enabling a wide variety of Business Intelligence applications and solutions.
• Techniques for extracting actionable insights from structured (data) and unstructured (text) information.
• Enabling analytics for data and text within large-scale data and computing infrastructure environments.
• Work with clients to drive our research agenda for developing novel data mining solutions.
• To have data mining impact business and industry problem-solving in new and unique ways.
Current Activities
• Basic Research
  – Cost-Sensitive Learning, Active Learning, Reinforcement Learning, Regularization Methods.
• Systems Research: Developing highly scalable and fully automated predictive modeling capabilities
  – Data-parallel architectures for leveraging database systems
  – Compute-parallel architectures for leveraging grid computing
• Solutions and Services
  – Customer Insights
  – Business Forecasting
  – Risk Management
  – Etc.
Multi-faceted Approach to our Data Mining Research Agenda
Customer Interaction
Software Development
Theory Advancement
Database Management, Knowledge Discovery and Data Mining, Knowledge Management, Natural Language Processing, Information Retrieval
Retail, Manufacturing, Banks and Insurance, Travel and Transport, Government, IBM
Service and maintenance, manufacturing, procurement, distribution, product design, forecasting, pricing, and fulfillment
Parallel Databases, OLAP Analytics, Parallel Data Mining, Unstructured Information Management, Text Analytics and Mining
Academic Community, Professional Societies
Industries, GBS - ODIS
IBM Businesses
Software Group, Open Source
Emerging Innovation from Research
Supply chain solutions – Optimize, plan, model and analyze supply chain and transportation processes.
Advanced call center automation – Design and help deploy natural language voice-recognition and voice-mining solutions.
Advanced networking services – Apply cutting-edge models, algorithms, software and expertise to help design, monitor and optimize enterprise networks and networked applications, e.g. storage area networks and IP telephony.
Business optimization and analytics – Optimize, plan, model, analyze and transform businesses to on demand models.
Collaboration – Realize the value of collaboration through a skilled assessment of the current environment for collaboration, methodologies that document end-user requirements for collaboration, strategic design for visualizing future collaborative states, and tools that support human communication.
Security and privacy – Assess, design and implement enhanced security processes and tools.
Emerging Innovation in Research (2)
e-business systems and architecture – Design and help deploy applications, middleware and Web content.
Grid and autonomic solutions – Apply cutting-edge models, software, designs and expertise to help quickly and accurately evaluate, design, pilot and optimize grid and autonomic capability in client distributed-computing systems.
Information mining and management – Gain business insight from structured and unstructured data, text, voice, video and more.
Mobile enablement – Apply new wireless and pervasive technology to improve security, reliability and integration.
Product lifecycle management – Improve product development processes through better tools, methodologies and collaboration.
Technology-based learning – Deploy prototype learning technology that can help improve learning effectiveness, increase accountability and boost productivity.
Continual Optimization
Client: Big City Coach, a high-end car service company, has a few hundred cars and drivers (more drivers than cars) and may service 1,000 rides/day in several big cities nationwide.
Challenge: This ground transportation leader wanted to increase vehicle and driver utilization, push customer service to new levels, and lower operating costs. The mathematical optimization concept came from discussions with our Research Math team.
Solution: Developed a Fleet Optimization System (FOS) which gathers off-line and real-time information from a variety of internal and external sources and produces a near-optimal staffing plan. FOS made possible real-time adjustment of schedules and resource allocation.
Benefits:
- Increased vehicle utilization through better visibility of scheduling information
- Increased efficiency: less downtime for drivers, more effective use of partner resources
- Improved customer service and satisfaction due to real-time reallocation of cars and drivers
- Better resource management, especially during peak traffic times, bad weather, and delays
Text Analytics for a Financial Communicator
Client: A leading financial communications powerhouse which prides itself on providing an unequaled mix of electronic trading, data, analytics, calculation engines, and straight-through processing.
Challenge: The company was interested in validating its hypotheses around text analytics, which enable computers to read documents and derive value from the output. The intent was to use text analytics to automate the data collection and analysis process.
Solution: Strategy and Change Consulting, powered by computational linguists, performed in-depth analyses around the new technologies and solutions that the firm had been evaluating.
Benefits: The firm now has validated and enhanced new product plays that it can leverage; in addition, it is realizing staff efficiencies that enable it to do more with the same number of people. Other benefits include data quality and time-to-market improvements. Overall, it can now better compete in the marketplace.
Underwriting Profitability Analysis
Client: Famous Group – A subsidiary of A Big Finance Company
Challenge: Automatic discovery of all credible and actionable risk groups in auto insurance policyholders to improve premium pricing, underwriting rules, and new business development.
Solution: A data warehouse was put together that stored four years of 300 historical factors on 2 million policyholders, claims, and insured assets (autos). A new predictive modeling technology was developed that was optimized for discovering homogeneous risk groups from this data. The generated models were represented as if-then rules.
Benefits: Of all the rules that were generated, 43 were statistically significant and not known before. Marketing benefits analysis of 6 of these 43 discoveries suggested a $2 million profit enhancement over a base of 2 million policyholders.
Customer Insight: Personalization of Product Recommendations
Client: A Big UK Grocery
Challenge: Cross-Sell / Up-Sell services to consumers with handheld PDAs for anytime / anywhere shopping
Solution: A solution was developed in which recommendations are generated by matching products to customers based on the expected appeal of the product and the previous spending of the customer. A combination of associations mining in the product domain and clustering in the customer domain is used for developing customer-specific recommendations.
Benefits: In a pilot program with several hundred customers, a 1.8% boost in revenue was observed as a result of purchases made directly from the list of recommended products.
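The combination described in this case study (associations in the product domain, grouping in the customer domain) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy purchase history, the function names, the crude "top product" stand-in for real clustering, and the scoring formula. It is not the method used in the pilot.

```python
from collections import Counter
from itertools import permutations

# Hypothetical data: each customer's past baskets (sets of product names).
history = {
    "alice": [{"pasta", "sauce"}, {"pasta", "sauce", "wine"}],
    "bob":   [{"pasta", "sauce"}, {"pasta", "bread"}],
    "carol": [{"wine", "cheese"}, {"wine", "cheese", "bread"}],
    "dave":  [{"wine", "cheese"}, {"wine"}],
}

def association_confidence(baskets):
    """Product domain: confidence(a -> b) = P(b in basket | a in basket)."""
    item_n, pair_n = Counter(), Counter()
    for basket in baskets:
        item_n.update(basket)
        pair_n.update(permutations(basket, 2))
    return {(a, b): n / item_n[a] for (a, b), n in pair_n.items()}

def segment(customer):
    """Customer domain: crude stand-in for clustering, keyed on the top product."""
    counts = Counter(p for basket in history[customer] for p in basket)
    return min(counts, key=lambda p: (-counts[p], p))

def recommend(customer, top_n=2):
    all_baskets = [b for bs in history.values() for b in bs]
    conf = association_confidence(all_baskets)
    owned = set().union(*history[customer])
    peers = [c for c in history if c != customer and segment(c) == segment(customer)]
    # Appeal within the segment: how often peers bought each product.
    appeal = Counter(p for c in peers for b in history[c] for p in b)
    scores = {}
    for (a, b), c_ab in conf.items():
        if a in owned and b not in owned:
            # Association strength, boosted by segment-level appeal.
            scores[b] = max(scores.get(b, 0.0), c_ab * (1 + appeal[b]))
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]

print(recommend("bob"))
```

The sketch only suggests products the customer has not bought before; a production system would presumably use proper clustering and weigh spending amounts, which this toy version ignores.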
Customer Insight: Lifetime Value Management
Client: A Fifth Avenue Retailer
Challenge: Optimize cross-channel customer messaging to maximize customer lifetime value
Solution: A reinforcement learning based methodology was developed to model enterprise-customer interactions. The developed methodology discovers customer responses on one channel as a result of a contact on another channel. The technology is highly scalable so it could address the large volumes of data that are typically available in a cross-channel scenario.
Benefits: The system was benchmarked against the retailer’s current methodology for customer relationship management in the direct mail and store channels. Initial results suggest a 7-8% increase in store revenues.
Passenger-Based Airline No-show Prediction
Client: Air Elsewhere
Challenge: Using detailed information on each passenger, predict the number of passengers who will not show for a flight. Accurate no-show forecasts are an essential input to airline revenue-management systems.
Solution: Two different predictive models were built using passenger-based features extracted from over 1M passenger records. The first model used a segmented Naïve Bayes approach (ProbE) to estimate each passenger’s probability of not showing. The second model predicted the no-show fraction directly using a novel aggregation method for an ensemble of probabilistic models.
Benefits: Various evaluation metrics demonstrated that the passenger-based models are more accurate than conventional history-based statistical models. A simple revenue model suggested that use of these models could produce between 0.4% and 3.2% revenue gain over the conventional model.
[Figure: lift chart. x-axis: fraction of booked PNRs (sorted by no-show probability), 0 to 1; y-axis: fraction of PNR no-shows, 0 to 1. Curves: Passenger-Level [ProbE], Passenger-Level [APMR], Passenger-Level [C4.5], Historical Model [Statistical], Random.]
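The passenger-level idea can be sketched as follows: estimate each passenger's no-show probability, then sum the probabilities to get the expected number of no-shows on a flight. This is a plain Naïve Bayes with Laplace smoothing, not the segmented ProbE model or the ensemble aggregation method named on the slide, and the feature names and records are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training records: (features, no_show flag). Feature names are assumptions.
train = [
    ({"ticket": "refundable", "trip": "one_way"}, 1),
    ({"ticket": "refundable", "trip": "round"}, 1),
    ({"ticket": "refundable", "trip": "round"}, 0),
    ({"ticket": "nonrefundable", "trip": "round"}, 0),
    ({"ticket": "nonrefundable", "trip": "round"}, 0),
    ({"ticket": "nonrefundable", "trip": "one_way"}, 0),
]

def fit(records):
    class_n = Counter(label for _, label in records)
    value_n = defaultdict(Counter)   # (feature, label) -> counts of values
    values = defaultdict(set)        # feature -> distinct values seen
    for feats, label in records:
        for f, v in feats.items():
            value_n[(f, label)][v] += 1
            values[f].add(v)
    return class_n, value_n, values

def p_no_show(model, feats):
    class_n, value_n, values = model
    total = sum(class_n.values())
    joint = {}
    for label in (0, 1):
        p = class_n[label] / total
        for f, v in feats.items():
            # Laplace smoothing avoids zero probabilities for unseen values.
            p *= (value_n[(f, label)][v] + 1) / (class_n[label] + len(values[f]))
        joint[label] = p
    return joint[1] / (joint[0] + joint[1])

model = fit(train)
flight = [
    {"ticket": "refundable", "trip": "one_way"},
    {"ticket": "nonrefundable", "trip": "round"},
]
probs = [p_no_show(model, pax) for pax in flight]
expected_no_shows = sum(probs)  # expected count is the sum of per-passenger probabilities
print(probs, expected_no_shows)
```

Sorting booked passengers by these probabilities is what produces a lift curve of the kind shown in the figure.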
Call Center Text Mining
Call Center Text TAKMI Analysis
Analysts
Customers
The Business
Information about customers’ experiences with products or services
Better Products and Services; Increased Customer Sat.
The Business
What customers are asking about; what they need to know
Better self-service; lower costs
Actionable Information
Notes taken by CSRs
Vast amounts of textual data: Internal Reports, Patents, Customers’ messages, etc.
Knowledge Acquisition: Hidden regularities/facts, Trends in contents, Features of specific topics, Relationship with other knowledge
IBM Research leads in Speech Recognition, Natural Language Understanding, Dialog Management, Language Generation and Speech Synthesis. Our approach combines the use of advanced statistical and machine learning techniques with sophisticated grammars, digital dictionaries and encyclopedias.
We have new approaches to Segmentation-based predictive modeling
[Figure: example segmentation tree with interior nodes and leaf nodes. Splits shown: Rec < 6m, Spend < $150, Rec < 3m, #delinq < 2, #kids < 2, Ret < $20.]
Tree structure is obtained by a recursive procedure using the best univariate splits at each stage
The leaf nodes define a non-overlapping, exhaustive partition of the input space
Final model is a collection of segments with their associated segment model in each leaf node
Splitting condition is based on minimizing the negative log-likelihood using search algorithms
Final tree is determined by a stopping condition based on test set or cross-validation error
Out-of-memory row-scan based procedure
Data-partitioned parallelism
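One stage of the recursive procedure, choosing the best univariate split by minimizing negative log-likelihood, might look like the sketch below. It assumes a binary target and the simplest possible segment model (a constant response rate per segment); the slide does not say which segment models are used, and the data, field names, and function names are hypothetical.

```python
import math

def nll(labels):
    """Negative log-likelihood of binary labels under the segment's own rate."""
    n, k = len(labels), sum(labels)
    if k == 0 or k == n:
        return 0.0  # a pure segment is fit perfectly
    p = k / n
    return -(k * math.log(p) + (n - k) * math.log(1 - p))

def best_split(rows, labels):
    """Search every feature and threshold for the split 'x[f] < t' with lowest NLL."""
    best = (nll(labels), None, None)  # (score, feature, threshold); no split yet
    for f in rows[0]:
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            score = nll(left) + nll(right)
            if score < best[0]:
                best = (score, f, t)
    return best

# Hypothetical customers: recency in months and spend in dollars; label 1 = responded.
rows = [
    {"rec": 1, "spend": 200}, {"rec": 2, "spend": 90}, {"rec": 4, "spend": 300},
    {"rec": 7, "spend": 120}, {"rec": 9, "spend": 250}, {"rec": 12, "spend": 60},
]
labels = [1, 1, 1, 0, 0, 0]
print(best_split(rows, labels))
```

A full tree would apply `best_split` recursively to each child and stop when held-out or cross-validated error no longer improves, as the slide describes.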
Traditional Predictive Mining Process
• Mine historical data to train patterns/models that can predict future behavior
  – Behaviors: Response to Direct Mail, Product Quality (Defects), Declining Activity, Credit Risk, Delinquency, Likelihood to buy specific products, Profitability, etc.
• Score with models to reflect likelihood to exhibit the modeled behavior
• Act to optimize business objectives based on these scores.
Current Predictive Mining Research
• Automated methods for embedding in solutions
• Integrating structured and unstructured data
• Absorbing new ideas from learning theory and computational statistics for addressing typical issues with business data
  – Missing values, Data sparseness, High Dimensionality
  – Support Vector Machines, Predictive Rule Induction, Regularization Techniques
• Streaming data mining
  – Online and incremental mining of streaming data
• Outlier Detection
  – Detecting anomalies and abnormalities in data
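Two of the research themes listed for this slide, incremental mining of streaming data and outlier detection, can be combined in a small sketch. It uses Welford's online algorithm for a running mean and variance and flags values far from the running mean. The threshold, warm-up length, and sample stream are assumptions chosen for illustration, not anything stated in the presentation.

```python
import math

class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def flag_outliers(stream, threshold=3.0, warmup=5):
    """Yield values lying more than `threshold` standard deviations from the running mean."""
    stats = RunningStats()
    for x in stream:
        if stats.n >= warmup and stats.std() > 0:
            if abs(x - stats.mean) / stats.std() > threshold:
                yield x
                continue  # keep flagged points out of the baseline
        stats.update(x)

# Hypothetical stream of daily transaction amounts with one anomaly.
amounts = [100, 102, 98, 101, 99, 103, 97, 500, 100, 101]
print(list(flag_outliers(amounts)))
```

Each value is seen once and then discarded, which is the property that makes this style of computation suitable for streams too large to store.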
Security and Privacy Initiatives
Security and Privacy Initiatives: Financial Services
• Secure Hardware Embedded Analytics
  – Leveraging cryptographic secure processing technology
• Sovereign Information Integration
  – Need-to-know information sharing
• Privacy Preserving Data Mining
  – Assumes no trusted third party.
Secure Federated Mining Architecture
• Secure processor → Ultimate data security.
• Memory-light data mining → Sophisticated analytics can run inside processor.
• Memory-light DB2 → Secure data federation and query processing capabilities across multiple data sources.
[Figure: the databases of Enterprise 1 through Enterprise N send encrypted data to a secure processor running memory-light data mining and memory-light DB2. Data are only decrypted inside the processor, which enables data analysis inside the secure processor.]
Intra-bank Service Center Scenarios
Anti-Money Laundering, Credit Risk Rating, CRM, ….
[Figure: LOB 1 through LOB N send encrypted data to the Intra-Bank Data Centralizer, which performs Secure Federated Mining.]
• Guarantees confidentiality
  1. Analyzing data from different LOBs together to know customers.
  2. Legislation limiting data sharing among LOBs.
• Guarantees that data will only be used for specialized purposes.
  – Customers are more likely to allow banks to share their data among LOBs with this condition.
• Data federation allows multiple LOBs to share data without having a central data warehouse.
EPAL – Enterprise Privacy Authorization Language
Implementing Privacy Management Using EPA
The Enterprise Privacy Authorization Language (EPAL) is a formal language to specify fine-grained enterprise privacy policies. It concentrates on the core privacy authorization while abstracting from all deployment details such as data model or user-authentication.
[Figure: privacy management architecture. Components: Audit Manager, Log Data, Privacy Management Server, E-P3P Policy, Consent, Obligations Queue, Privacy Management Submission Monitor, Legacy Applications, Privacy Management Enforcement Monitors, Web Data, Legacy Data, CPO, Privacy Management Console, Customer, Enterprise Employee.]
http://www.zurich.ibm.com/security/enterprise-privacy/epal
► EPAL specs published (07/2003)
► Java ref implementation of EPAL & XACML
  ■ On alphaWorks: http://www.alphaworks.ibm.com/tech/dpm
► P3P ↔ EPAL mapping
► WS Privacy specs and bindings: ongoing
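To make "fine-grained privacy authorization" concrete, here is a simplified sketch of how a rule-based privacy policy could be evaluated. EPAL policies are written in XML and support features (conditions, category hierarchies) that are omitted here; the `Rule` class, the `authorize` function, and all category and obligation names are hypothetical and do not reproduce the EPAL syntax or its reference implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    ruling: str            # "allow" or "deny"
    user_category: str
    action: str
    data_category: str
    purpose: str
    obligations: tuple = ()

# Hypothetical policy; the category and obligation names are illustrative only.
POLICY = [
    Rule("deny", "marketing", "read", "medical-record", "direct-marketing"),
    Rule("allow", "physician", "read", "medical-record", "treatment",
         obligations=("log-access",)),
]

def authorize(policy, user_category, action, data_category, purpose, default="deny"):
    """Return (ruling, obligations) from the first rule matching the request."""
    request = (user_category, action, data_category, purpose)
    for rule in policy:
        if (rule.user_category, rule.action, rule.data_category, rule.purpose) == request:
            return rule.ruling, rule.obligations
    return default, ()

print(authorize(POLICY, "physician", "read", "medical-record", "treatment"))
```

The request carries no user identity or storage details, which mirrors the slide's point that the policy abstracts from deployment concerns such as data model and user authentication.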
Hippocratic Database
• Vision: Database systems that take responsibility for the privacy and ownership of data they manage, while not impeding the flow of information.
• Architectural principles derived from principles behind current legislations.
[Figure: Hippocratic database architecture. Components: Privacy Policy, Data Collection, Queries, Privacy Metadata Creator, Store, Privacy Constraint Validator, Data Accuracy Analyzer, Audit Info, Audit Trail, Query Intrusion Detector, Attribute Access Control, Privacy Metadata, Other, Data Retention Manager, Record Access Control, Encryption Support, Data Collection Analyzer.]
#   Name      Age   Phone
1   Adams     10    111-1111
3   -         -     333-3333
4   Daniels   40    -
[Figure: query execution time (seconds) versus application selectivity (0.01 to 1) for Original Queries and Rewritten Queries. Table Size: 10 million, no index.]
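The small table on this slide (row 2 absent, some cells shown as "-") is consistent with record-level and attribute-level access control being applied to a query result. A minimal sketch of that effect follows. The chart on the slide compares original and rewritten queries, which suggests the real system enforces this inside the database; this sketch filters in application code instead, and the hidden values and opt-in choices are invented for illustration.

```python
COLUMNS = ("name", "age", "phone")

# Hypothetical stored rows; the hidden values are invented for illustration.
patients = {
    1: {"name": "Adams", "age": 10, "phone": "111-1111"},
    2: {"name": "Brown", "age": 20, "phone": "222-2222"},
    3: {"name": "Carter", "age": 30, "phone": "333-3333"},
    4: {"name": "Daniels", "age": 40, "phone": "444-4444"},
}

# Hypothetical privacy metadata: columns each record's owner disclosed for this purpose.
opt_in = {
    1: {"name", "age", "phone"},
    2: set(),
    3: {"phone"},
    4: {"name", "age"},
}

def privacy_view(rows, allowed):
    """Apply record-level, then attribute-level, access control to every row."""
    view = []
    for key, row in sorted(rows.items()):
        visible = allowed.get(key, set())
        if not visible:
            continue  # record access control: nothing disclosed, drop the row
        # attribute access control: mask each cell that was not disclosed
        view.append((key,) + tuple(row[c] if c in visible else "-" for c in COLUMNS))
    return view

for line in privacy_view(patients, opt_in):
    print(*line)
```

Records with no entry in the privacy metadata are treated as fully hidden, so the default is non-disclosure.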
Thank you!
Contact Information:
Linda C Simmons
IBM Global Business Services
[email protected]
904.491.0410
Mobile 904.610.3723