The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining

Paper authored by: Menzies & Kocaganeli (Lane Dept of CS/EE, WVU); Bird, Zimmerman, & Schulte (Microsoft Research)

Presentation by: Ebeid Soliman & Mason Schoolfield

Motivation

• This paper reflects the authors’ applied data mining work and their discussions with researchers and software engineering practitioners.

• Documents methods and experience from industrial practitioners

• The principal question is: what characterizes the difference between academic and industrial data mining?

• Motivation: successful data mining projects in industry

Inductive Software Engineering

• “A branch of software engineering that focuses on the delivery of data mining based software applications to users”

• Understand user goals in order to inductively generate the models that matter most to the user

• Industrial practitioners are focused on users, whereas academic data mining research is focused on algorithms

Industrial Data Mining: 7 Principles

• Users before algorithms

• Plan for scale

• Early feedback

• Be open-minded

• Do smart learning

• Live with the data you have

• Broad skill set, big toolkit

Users before algorithms

• Guiding principle: users before algorithms

• Mining algorithms are only useful if users fund their use in real-world applications

Users before Algorithms

Hallmarks of good interaction meetings

• Users bring senior management to the meetings

• Users keep interrupting (you or each other) and debating your results

• This indicates that the users understand your explanation of the results and that your results touch on issues that concern them

• Users begin to offer more data sources for analysis

• Users invite you to their workspace to show how to do part of the analysis

Plan for scale: Knowledge Discovery in Databases (KDD)

• KDD: Knowledge Discovery in Databases

• The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data

• Repetition required

[Figure: Steps that compose the KDD process (Fayyad 1996)]

Plan for scale

• Most data mining is data pre-processing

• Gaining access to databases in business groups is time consuming

• To ensure repeatability, automate as many KDD steps as possible

• Data mining methods are repeated multiple times: to answer user questions, to enhance the data mining method or fix bugs, and to deploy to different user groups
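A minimal sketch of what "automate as many KDD steps as possible" might look like, assuming a toy list-of-dicts data set. Each step is a plain function, so the whole chain can be re-run with one call whenever users ask a new question, a bug is fixed, or a new user group needs a deployment. The step functions, the hypothetical `loc` and `label` fields, and the 100-line threshold are illustrative placeholders, not taken from the paper.

```python
from statistics import mean

# Hypothetical KDD steps; the "loc" and "label" fields are placeholders.

def select(rows):
    # Selection: drop rows that have no label.
    return [r for r in rows if r["label"] is not None]

def preprocess(rows):
    # Pre-processing: replace missing "loc" values with the mean of the known ones.
    fill = mean([r["loc"] for r in rows if r["loc"] is not None])
    return [{**r, "loc": fill if r["loc"] is None else r["loc"]} for r in rows]

def transform(rows):
    # Transformation: derive a boolean "large module" feature (threshold assumed).
    return [{**r, "large": r["loc"] > 100} for r in rows]

def mine(rows):
    # Mining: a deliberately trivial model, "large modules are defective".
    return lambda r: r["large"]

def evaluate(model, rows):
    # Evaluation: accuracy of the model on the prepared rows.
    return sum(model(r) == r["label"] for r in rows) / len(rows)

def run_pipeline(raw_rows):
    # One call re-runs every step, which is what makes the analysis repeatable.
    rows = transform(preprocess(select(raw_rows)))
    return evaluate(mine(rows), rows)
```

On four toy rows where one label and one `loc` value are missing, `run_pipeline` drops the unlabeled row, imputes the missing size, and scores the model in a single repeatable call.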

Plan for scale

• Observed Phases

• Scout: rapid prototyping; apply many methods to the data, explore a range of hypotheses, and gain user interest (get feedback)

• Survey: experiment to find stable models, focusing on user goals

• Build: integrate models into a deployment framework suitable for the target user base

• Team size doubles after scouting and doubles again after surveying, which has time implications!

Early feedback

• Simplicity first: before conducting very elaborate studies, try applying very simple tools to gain rapid early feedback

• Get Feedback Early and Often

• Discretize continuous attributes (determine what is ignorable)
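Here is one quick way to get early feedback from discretization, sketched under stated assumptions. Split a continuous attribute into equal-width bins and compare the class distribution in each bin. If the share of a class barely changes from bin to bin, the attribute may be ignorable. Equal-width binning, `k=3`, and the hypothetical `spread` score are illustrative choices, not the paper's method.

```python
from collections import Counter, defaultdict

def equal_width_bins(values, k=3):
    # Map each value to one of k equal-width bins numbered 0..k-1.
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), k - 1) for v in values]

def class_distribution_by_bin(values, labels, k=3):
    # For each bin, the fraction of its rows carrying each label.
    counts = defaultdict(Counter)
    for b, y in zip(equal_width_bins(values, k), labels):
        counts[b][y] += 1
    return {b: {y: n / sum(c.values()) for y, n in c.items()}
            for b, c in sorted(counts.items())}

def spread(values, labels, target, k=3):
    # Hypothetical score: how much the share of `target` varies across bins.
    # A value near 0 suggests the attribute may be ignorable.
    dist = class_distribution_by_bin(values, labels, k)
    shares = [d.get(target, 0.0) for d in dist.values()]
    return max(shares) - min(shares)
```

An attribute whose bins range from "no bugs" to "all bugs" scores 1.0. An attribute whose bins all show the same bug rate scores 0.0 and is a first candidate to ignore.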

Be open-minded

• Avoid a fixed hypothesis

• Avoid a fixed approach, particularly for data that has not been mined before

• Initial results are important and can change goals

Smart Learning

• Inductive agents, human or otherwise, make errors

• Don’t torture the data to meet preconceptions, but it can be OK to go “fishing”

• Important outcomes are riding on your conclusions: check and validate!

• Check the variance before drawing a conclusion; the result may be due to statistical noise

• Check conclusion stability against different sample sizes

• Check conclusion support to avoid conclusions based on a small percentage of the data
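The checks above could be automated along these lines. This sketch assumes rows are dicts and that `score` is any function turning a sample of rows into a number. It re-scores many random subsamples at several sample sizes and reports the mean and standard deviation, then measures what fraction of the rows a conclusion covers. The function names, sample fractions, and repeat count are hypothetical.

```python
import random
from statistics import mean, stdev

def stability_report(rows, score, fractions=(0.25, 0.5, 0.75), repeats=20, seed=1):
    # For each sample fraction, score `repeats` random subsamples drawn without
    # replacement; return {fraction: (mean score, standard deviation)}.
    # `repeats` must be at least 2 so that stdev is defined.
    rng = random.Random(seed)
    report = {}
    for frac in fractions:
        n = max(2, int(len(rows) * frac))
        scores = [score(rng.sample(rows, n)) for _ in range(repeats)]
        report[frac] = (mean(scores), stdev(scores))
    return report

def support(rows, condition):
    # Fraction of all rows that satisfy the condition behind a conclusion.
    return sum(1 for r in rows if condition(r)) / len(rows)
```

A conclusion deserves more scrutiny before it reaches users if its mean drifts with sample size, if its standard deviation is large relative to the effect, or if its support is only a few percent.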

Smart Learning

• Prevent spurious conclusions by carefully controlling data collection and focusing on a small space of hypotheses (IF YOU CAN)

• Rule learners such as RIPPER and INDUCT check each rule against randomly generated alternatives (if the rule's probability is the same as that of a random alternative, the rule can be deleted)
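The idea of checking a rule against random alternatives can be illustrated with a generic permutation test. This is an assumed stand-in, not the exact procedure used inside RIPPER or INDUCT. It shuffles the class labels many times and counts how often the same rule does at least as well on the shuffled data. A high value suggests the rule is no better than chance and is a candidate for deletion. The function names and the trial count are hypothetical.

```python
import random

def precision(rows, labels, condition, target):
    # Share of the rows matched by the rule whose label is `target`.
    hits = [y for r, y in zip(rows, labels) if condition(r)]
    return sum(y == target for y in hits) / len(hits) if hits else 0.0

def chance_p_value(rows, labels, condition, target, trials=1000, seed=0):
    # Permutation test: how often does the same rule do at least as well
    # when the labels are randomly shuffled? (+1 smoothing avoids p = 0.)
    rng = random.Random(seed)
    observed = precision(rows, labels, condition, target)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if precision(rows, shuffled, condition, target) >= observed:
            at_least_as_good += 1
    return (at_least_as_good + 1) / (trials + 1)
```

A rule that tracks real structure in the data almost never matches its own precision on shuffled labels, so its value is close to 1 / (trials + 1). A rule that only reflects the base rate is matched or beaten in a large share of the shuffles.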

Live with the data you have

• Collecting data comes at a cost!

• Go mining with the data you have, not the data you hope to have at a later date

• Remove spurious data - conduct instance or feature selection studies

• 80 to 90% of the rows, and all but the square root of the number of columns, can be deleted without compromising the performance of the learned model

• Be respectful of, but doubtful about, all user-suggested domain hypotheses
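A minimal sketch of the feature and instance selection mentioned above. It assumes a numeric table stored as a dict of columns, ranks the columns by absolute Pearson correlation with the target, keeps the top ceil(sqrt(N)), and keeps a random 15% of the rows. Correlation ranking and the 15% figure are assumptions chosen for brevity. Any reduction like this should be validated by comparing the learned model's performance before and after.

```python
import math
import random

def abs_corr(xs, ys):
    # Absolute Pearson correlation; 0.0 if either column is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return abs(cov / (sx * sy)) if sx and sy else 0.0

def select_features(table, target):
    # Keep the ceil(sqrt(N)) non-target columns most correlated with the target.
    cols = [c for c in table if c != target]
    keep = math.ceil(math.sqrt(len(cols)))
    ranked = sorted(cols, key=lambda c: abs_corr(table[c], table[target]), reverse=True)
    return ranked[:keep]

def sample_rows(n_rows, fraction=0.15, seed=0):
    # Keep a random `fraction` of row indices, i.e. delete the other ~85%.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), max(1, round(n_rows * fraction))))
```

With nine candidate columns, `select_features` keeps three. On a 100-row table, `sample_rows` keeps 15 row indices.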

Broad skill set, big toolkit

• Try multiple inductive technologies

• Inductive Engineers generate novel and insightful feedback for users

• Researchers can work to perfect a single algorithm

• Big ecology: Use tools supported by a large ecosystem of developers who are constantly building new modules (e.g. R, WEKA, MATLAB)
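In the spirit of trying multiple inductive technologies (and of the scouting phase), here is a sketch that runs several learners on the same data and ranks them by leave-one-out accuracy. The two learners, a majority-class baseline and 1-nearest-neighbour, are illustrative stand-ins for the much larger toolkits found in environments such as R or WEKA. All function names are hypothetical.

```python
from collections import Counter

def zero_r(train, x):
    # Baseline: ignore x and predict the most common training label.
    return Counter(y for _, y in train).most_common(1)[0][0]

def one_nn(train, x):
    # 1-nearest-neighbour using squared Euclidean distance.
    return min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[1]

def leave_one_out(learner, data):
    # Accuracy when each (features, label) row is predicted from all the others.
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += learner(train, x) == y
    return correct / len(data)

def scout(data, learners):
    # Run every learner and rank them by accuracy, best first.
    scores = {name: leave_one_out(f, data) for name, f in learners.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Keeping a trivial baseline in the list is deliberate. If a more sophisticated learner cannot beat it, that is useful early feedback in itself.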

What does this mean for Industry?

• Implications for Project Management

• Scouting takes weeks, surveying takes months, and building takes years

• Implications for Training

• Communication skills

• Results briefing

• Scripting

Research to help Industry

• Research themes to benefit industrial data mining

• Analysis patterns for inductive engineers (like design patterns for developers)

• Design patterns for data miners

• Optimizations of learning algorithms

• Anomaly detectors

• Business-aware learners

Final Notes

• Conclusion: be user-focused and keep these principles in mind

• Hopefully these generalities will be helpful

• Share your experiences and knowledge so that Industrial Inductive Engineering can mature