Bionic Info Pro: New Takes on an Old Theme. Machine Learning, Taxonomy Creation, Big Data, Competitive Intelligence, and the Human Element. Elaine M. Lasda Bergman, Annual Conference, Special Libraries Association, Vancouver, BC, Canada, Monday, June 9, 2014.

Bionic Info Pro - Taxonomies and Machine Learning SLA 2014


Description: Presentation for the Special Libraries Association on machine-assisted taxonomy creation and the human element.


Page 1

Bionic Info Pro: New Takes on an Old Theme

Machine Learning, Taxonomy Creation, Big Data, Competitive Intelligence, and the Human Element

Elaine M. Lasda Bergman

Annual Conference

Special Libraries Association, Vancouver, BC, Canada

Monday, June 9, 2014

Page 2

Overview

• A little bit about Machine Learning

• A little bit about Taxonomies

• A little bit about Big Data

• A little bit about Hybrid Techniques

Page 3

NOT NEW: Machine Learning for CI

Mena, Jesus (1996). Data Mining for Competitive Intelligence. Competitive Intelligence Review, 7(4): 18-25.

Page 4

Refinement of Machine Learning

• Decision Trees/Classification

• Clustering

• Anomaly Detection
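Of the three techniques listed, anomaly detection is the quickest to illustrate. The sketch below flags values that sit unusually far from the mean using z-scores, which is only one simple approach and not one the presentation prescribes; the function name, the threshold of 2.0, and the daily mention counts are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Return the indices of values whose |z-score| exceeds threshold.

    Uses the population standard deviation; the default threshold is an
    illustrative assumption, not a recommended setting.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily counts of news mentions of a competitor.
daily_mentions = [12, 15, 11, 14, 13, 12, 80, 14]
print(zscore_anomalies(daily_mentions))  # → [6], the spike
```

A single extreme value also inflates the standard deviation it is measured against, so with small samples a median-based (robust) score is often a safer choice than the plain z-score shown here.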

Page 5

Refinement of Machine Learning

• Support Vector Machines – Predictive Classification

• Association Rules – Market basket analysis

• Natural Language Processing – Sentiment Analysis
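Market basket analysis rests on two measures: support (the share of baskets that contain every item in a rule) and confidence (how often the rule's right-hand side appears when its left-hand side does). The sketch below computes both and finds frequent item pairs by counting every pair directly, rather than running a full Apriori search; the function names, the baskets, and the 0.6 minimum support are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    lhs = set(antecedent)
    both = lhs | set(consequent)
    n = len(transactions)
    lhs_count = sum(1 for t in transactions if lhs <= t)
    both_count = sum(1 for t in transactions if both <= t)
    support = both_count / n
    confidence = both_count / lhs_count if lhs_count else 0.0
    return support, confidence


def frequent_pairs(transactions, min_support):
    """Item pairs whose support meets min_support (brute-force pair counting)."""
    n = len(transactions)
    counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical shopping baskets, each a set of items.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
support, confidence = rule_metrics(baskets, {"diapers"}, {"beer"})
print(f"diapers -> beer: support={support:.2f}, confidence={confidence:.2f}")
print(frequent_pairs(baskets, min_support=0.6))
```

Counting every pair is fine for a handful of baskets; real market basket tools prune candidates (as Apriori does) because the number of possible itemsets grows combinatorially.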

Page 6

Getting up to Speed

• http://efytimes.com

• 6 Video Tutorials and Playlists on Machine Learning (January 2014)

Page 7

NOT NEW: Taxonomies in Information Retrieval

http://comsaad.blogspot.com/p/old-computer-photos.html

http://commons.wikimedia.org/wiki/File:A_Library_Primer_illustration_Joined_Hand.jpg

Page 8

Need for Taxonomic Structures

http://farm9.staticflickr.com/8262/8673326413_4492b5dc68_o.jpg

Page 9

NOT NEW: Datasets

http://www.conceptdraw.com/solution-park/resource/images/solutions/entity-relationship-diagram-(erd)/Diagramming-Crow's-Foot-ERD-Sample60.png

Page 10

Enter BIG DATA

http://commons.wikimedia.org/wiki/File:DARPA_Big_Data.jpg

Page 11

Big Data Sources and Analysis

| Data Type | Qualities | Analysis Tools | Result |
|---|---|---|---|
| Social Media | Demographics | API integration | More profiles of like-minded users |
| “Social Influencers” | User reviews | NLP, text analysis | Sentiment readings |
| “Internet of Things” | Logs/sensors/check-ins | Parsing | Usage and behavior patterns |
| SaaS | Cloud/web-based/subscription software | Distributed data integration/in-memory caching technology/API integration | Usage behavior patterns, customer data, etc. |
| Public Data | e.g., Amazon Data Market, World Bank, Wikipedia | All of the above (depends on data structure) | Depends on the dataset (and there are LOTS of them!) |
| Hadoop/MapReduce | Volume! | Parallel processing/parsing/reduction | Big patterns, correlations, needles in haystacks |
| Data Warehouses | Internal transactional data | Likely same as above | Correlations, market basket, etc. |
| NoSQL/Columnar | Volume! | Fills gaps in parallel processing tools | Real-time activity and patterns |
| In-Stream Monitoring | Network traffic (streaming videos, system outages) | Packet evaluation, distributed query processing | Network/stream usage patterns |
| Legacy Data | Usually PDFs and documents/semi-structured | Transformation tools (e.g., Xenos d2e) + above | Depends on content (could be all) |

http://www.zdnet.com/top-10-categories-for-big-data-sources-and-mining-technologies-7000000926/
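The Hadoop/MapReduce row refers to the map, shuffle, and reduce pattern. The single-process Python sketch below walks the classic word-count example through those three phases; it does not use Hadoop or any of its APIs, and the function names and sample documents are hypothetical. On a real cluster the framework runs many map and reduce tasks in parallel and performs the shuffle between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def map_reduce(documents):
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


docs = ["Big data big patterns", "needles in haystacks", "big haystacks"]
print(map_reduce(docs))
```

Because each map call sees one document and each reduce call sees one key, both phases can be split across machines independently, which is what lets the pattern scale to the "Volume!" the table mentions.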

Page 12

Why “Concept Hierarchies” in an Unstructured Environment?

Page 13

Advantages

• When a term is too low in the hierarchy to have enough support to appear in frequent itemsets/rule sets

• Create more interesting rules using more general, aggregated concepts [DVD, wheat bread, home electronics, electronics, food]

Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to Data Mining.
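The first advantage can be made concrete: extending each basket with the ancestors of its items lets an aggregated concept reach a support threshold that its individual members miss. The hierarchy, baskets, helper names, and 0.6 threshold below are hypothetical, built from the example terms on the slide.

```python
from collections import Counter

# Hypothetical concept hierarchy: child -> parent.
PARENT = {
    "DVD": "home electronics",
    "TV": "home electronics",
    "home electronics": "electronics",
    "wheat bread": "bread",
    "white bread": "bread",
    "bread": "food",
}


def ancestors(item):
    """Walk up the hierarchy from item to the root."""
    result = []
    while item in PARENT:
        item = PARENT[item]
        result.append(item)
    return result


def extend(transaction):
    """Return the transaction plus every ancestor of every item in it."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def item_support(transactions):
    """Support of each item (at every level) after extending the baskets."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in extend(t))
    return {item: c / n for item, c in counts.items()}


baskets = [{"DVD", "wheat bread"}, {"TV", "white bread"}, {"DVD"}, {"TV", "wheat bread"}]
support = item_support(baskets)
frequent = sorted(item for item, s in support.items() if s >= 0.6)
print(frequent)  # → ['bread', 'electronics', 'food', 'home electronics']
```

No individual product clears the threshold here (DVD, TV, and wheat bread each sit at 0.5), while their parents do. Extending every basket also makes each transaction larger, which is where the extra computation time comes from.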

Page 14

Disadvantages

• How low and how high in the hierarchy do you set the threshold?

• Increased computation time

• If the threshold is too high, you get redundant rules: rules for more specific terms can be summarized by rules using more general terms

Page 15

Hybrid Taxonomic Development

• Understand your auto-classification model

• Work with domain experts to create basic taxonomy

• Test Taxonomy in the Model

• Rinse, repeat

Wendy Pohs, ASIS&T Bulletin, 12/1/13
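The loop above (draft with domain experts, test in the model, repeat) can be sketched with a deliberately simple keyword matcher standing in for a real auto-classification model. The taxonomy, keywords, headlines, and function names are all hypothetical; the point is that the documents no category catches are what go back to the experts for the next pass.

```python
import re


def classify(doc, taxonomy):
    """Return every category whose keywords appear in the document."""
    words = set(re.findall(r"[a-z]+", doc.lower()))
    return sorted(cat for cat, keywords in taxonomy.items() if words & keywords)


def evaluate_taxonomy(docs, taxonomy):
    """Run the taxonomy over docs; return assignments and unmatched docs."""
    assigned = {doc: classify(doc, taxonomy) for doc in docs}
    unmatched = [doc for doc, cats in assigned.items() if not cats]
    return assigned, unmatched


# Hypothetical starter taxonomy drafted with domain experts: category -> keywords.
taxonomy = {
    "pricing": {"price", "discount", "cost"},
    "product launch": {"launch", "release", "unveil"},
}
headlines = [
    "Rival cuts price on flagship model",
    "Competitor to unveil new tablet",
    "CEO interview on hiring plans",
]

# Pass 1: test the taxonomy in the model; unmatched items go back to the experts.
_, unmatched = evaluate_taxonomy(headlines, taxonomy)
print(unmatched)

# Pass 2 (rinse, repeat): experts add a category and the test runs again.
taxonomy["talent"] = {"hiring", "recruiting"}
_, unmatched = evaluate_taxonomy(headlines, taxonomy)
print(unmatched)
```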

Page 16

Domain Knowledge and Thick Data

• Thick Data analysis primarily relies on human brain power to process a small “N” while big data analysis requires computational power (of course with humans writing the algorithms) to process a large “N”.

• Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points. Big Data delivers numbers; thick data delivers stories. Big data relies on machine learning; thick data relies on human learning.

http://ethnographymatters.net/blog/2013/05/13/big-data-needs-thick-data/ (Tricia Wang)

Page 17

Data-Driven CI Is Meaningless Without Human/Domain Knowledge

http://www.wired.com/2014/04/your-big-data-is-worthless-if-you-dont-bring-it-into-the-real-world/

Page 18

Recap

• Data Mining for CI is not new

• Refinement and Improvement

• Bigger, Weirder Data

Page 19

Recap

• Where it’s at: Hybrid Schemas

• Thick Data, not just Big Data

• HUMAN ELEMENT IS ESSENTIAL

Page 20

Questions?

Elaine Lasda Bergman, University at Albany

http://www.slideshare.net/librarian68


@ElaineLibrarian