
Data Mining and Machine Learning - in a nutshell
Arizona State University Data Mining and Machine Learning Lab
Collective Intelligence

DATA MINING AND MACHINE LEARNING IN A NUTSHELL

COLLECTIVE INTELLIGENCE PART I

Mohammad-Ali Abbasi
http://www.public.asu.edu/~mabbasi2/

SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING
ARIZONA STATE UNIVERSITY

http://dmml.asu.edu/


About Collective Intelligence

• Definition of collective intelligence
  – Examples happening around us

• What constitutes collective intelligence?
  – Groups, number of members, variety, etc.

• How can one improve collective intelligence?
  – What are the necessary conditions to achieve CI?
  – A case in data mining and machine learning?

• What can one do with collective intelligence in the age of social media?
  – Opportunities for data mining


Definitions of Collective Intelligence

• Wikipedia
  – Collective intelligence is a shared or group intelligence that emerges from the collaboration and competition of many individuals.

• MIT Center for Collective Intelligence
  – Groups of individuals doing things collectively that seem intelligent.

• Toby Segaran, in Programming Collective Intelligence
  – Combining the behavior, preferences, or ideas of a group of people to create novel insights.

• Unknown source
  – Collective intelligence is any intelligence that arises from, or is a capacity or characteristic of, groups and other collective living systems.


Examples of collective intelligence - Wikipedia

• Wikipedia
  – Thousands of contributors from across the world have collectively created the world's largest encyclopedia, with almost no centralized control.


Examples of collective intelligence - PageRank

• PageRank, the link-analysis algorithm used by Google to rank web pages
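The slide names PageRank without describing it. A minimal sketch of the textbook power-iteration formulation follows, to show how it turns the linking choices of many independent page authors into a single ranking (each link acts roughly as a vote). The damping factor of 0.85, the even redistribution of rank from pages with no out-links, and the toy graph are assumptions for illustration; this is not a description of Google's production ranking system.

```python
def pagerank(links, damping=0.85, iterations=200, tol=1e-12):
    """Textbook power-iteration PageRank (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop early once the scores have converged
            break
    return rank
```

For example, `pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})` ranks C highest, since three of the four pages link to it, while D, which no page links to, keeps only the baseline share (1 - 0.85) / 4 = 0.0375.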


Examples of collective intelligence - CAPTCHA

• CAPTCHA
  – Completely Automated Public Turing test to tell Computers and Humans Apart
  – A reverse Turing test: a machine judges whether the respondent is human, instead of a human judging whether the respondent is a machine

• reCAPTCHA: a CAPTCHA service that helps digitize books, newspapers, and old-time radio shows
  – About 200 million CAPTCHAs are solved by humans around the world every day
  – More than 150,000 hours of work each day


Vark.com

1. Send a question

2. Aardvark finds the person best suited to answer it

3. Get their response


Kasparov vs. the World

• Kasparov versus the World was a chess match held in 1999, in which world champion Garry Kasparov played against "the World", whose moves were determined by a majority vote, over the Internet, of anyone who wanted to participate.

• Kasparov eventually won, but he said it was the hardest game he had ever played.
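The slide says only that the World's moves were chosen by majority vote; the exact counting rules used in 1999 are not described here. The sketch below assumes simple plurality voting over moves written as strings, with ties going to the move that was voted for first. Both rules, and the function name `choose_move`, are hypothetical.

```python
from collections import Counter


def choose_move(votes):
    """Return the move with the most votes (plurality rule, hypothetical).

    votes: list of move strings, one per participant, in the order cast.
    Ties go to the move that appears first in `votes`, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not votes:
        raise ValueError("no votes cast")
    move, _count = Counter(votes).most_common(1)[0]
    return move
```

With votes `["e4", "d4", "e4", "Nf3", "e4", "d4"]`, the World would play e4, which received three of the six votes.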


Examples of collective intelligence - Threadless

• Threadless.com
  – Anyone can design a T-shirt, submit the design to a weekly contest, and vote for their favorite designs
  – The company harnesses the collective intelligence of a community of over 500,000 people to design and select T-shirts


Examples of collective intelligence - Google Image Labeler

• A feature of Google Image Search, in the form of a game, that lets users label random images to help improve the quality of Google's image search results


Examples of collective intelligence - Ant Societies

• If we measure intelligence in terms of technology, ant societies exhibit more intelligence than any animal other than humans.

• Ant societies practice agriculture, in fact several different forms of it.

• Some ant societies keep livestock of various kinds: for example, some ants keep and care for aphids for "milking" (collecting the honeydew the aphids secrete), and leafcutter ants cultivate fungi, carrying leaves to feed them.


Examples of collective intelligence - Games

• Games such as WorldCraft, The Sims, Halo, or Second Life are designed to be non-linear and to depend on collective intelligence for their expansion.

• This way of sharing is gradually evolving and influencing the mindset of current and future generations.


Principles of Collective Intelligence

• Collective intelligence is a form of mass collaboration. Four principles promote the creativity needed for collective intelligence to emerge:
  – Openness
  – Peering
  – Sharing
  – Acting globally


Openness

• Traditionally, people and companies are reluctant to share ideas and intellectual property, because these resources give them an edge over competitors.

• Over time, however, openness has grown as people and companies loosen their hold on these resources and reap more benefits from doing so.

• Openness lets products gain significant improvement and scrutiny through transparent collaboration.


Peering

• A form of horizontal organization with the capacity to create information technology and physical products.

• One example is the "opening up" of Linux, where users are free to modify and develop it provided that they make their changes available to others.

• Participants in this form of collective intelligence may have different motivations for contributing, but the results they achieve serve to improve a product or service.

• "Peering succeeds because it leverages self-organization – a style of production that works more effectively than hierarchical management for certain tasks."


Sharing

• Research has shown that more and more companies have started to share some of their intellectual property while maintaining some degree of control over the rest, such as potential and critical patent rights.

• This is because companies have realized that by locking up all of their intellectual property, they shut out possible opportunities.

• Sharing some of it has allowed them to expand their markets and bring out products faster.


Acting Globally

• Advances in communication technology have prompted the rise of global companies and of e-commerce, which has allowed individuals to set up businesses with low or almost no overhead costs.

• Because the influence of the Internet is widespread, a globally integrated company has no geographical boundaries but does have global connections, giving it access to new markets, ideas, and technology.

• It is therefore important for firms to stay up to date and remain globally competitive, or they will face a declining client base.


Types of Collective Intelligence


Elements of Collective Intelligence

• Staffing
  – Who is performing the task?

• Incentives
  – Why are they doing it?

• Goal
  – What is being accomplished?

• Structure, process
  – How is it being done?


Elements of Collective Intelligence

• Who?
  – Hierarchy
  – Crowd

• Why?
  – Money
  – Love
  – Glory

• What?
  – Create
  – Decide

• How?
  – Collection
  – Collaboration


Mapping the collective intelligence elements for Wikipedia


Issues with Crowd Wisdom

• Questions
  – Why can the crowd be smarter than any individual in the crowd?
  – Is this guaranteed? If not, under what conditions can the crowd make the best decisions?
  – How can one gauge the reliability of crowd wisdom? Is crowd wisdom valid, trustworthy, and verifiable?
  – How can one find a crowd, its leader/influencer, or its average opinion?
  – How is each member influenced by others?
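The first two questions above have a commonly cited partial answer, offered here as an assumption rather than as the author's position: when individual errors are independent and unbiased, they tend to cancel out in an average, so the crowd's mean estimate can beat most of its members; an error shared by everyone does not cancel. The simulation below illustrates this with Gaussian noise; the function name, its parameters, and the noise model are all hypothetical.

```python
import random
import statistics


def crowd_vs_individuals(true_value, n_people, noise_sd, shared_bias=0.0, seed=0):
    """Simulate a crowd guessing a quantity (hypothetical noise model).

    Each estimate = true_value + shared_bias + independent Gaussian noise.
    Returns (absolute error of the crowd's mean estimate,
             median absolute error of the individual estimates).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    estimates = [
        true_value + shared_bias + rng.gauss(0.0, noise_sd)
        for _ in range(n_people)
    ]
    crowd_error = abs(statistics.fmean(estimates) - true_value)
    individual_errors = [abs(e - true_value) for e in estimates]
    return crowd_error, statistics.median(individual_errors)
```

With independent noise only, the error of the mean shrinks roughly as noise_sd / sqrt(n_people), so a crowd of 1,000 is far more accurate than its typical member; adding a `shared_bias` of 30 leaves the crowd's error near 30 however many people are added.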


Collective Intelligence and Societies

• Society is the basis of every kind of CI

• CI in traditional societies
  – Families, companies, countries, and armies are all groups of individuals doing things collectively that, at least sometimes, seem intelligent

• CI in Web-based societies: social networking sites
  – The Internet, and especially Web 2.0 applications, provide a platform for communication and for building societies

Collective Intelligence and the Internet

• Web 2.0
• Social Computing


Web Impacts on CI

• The ability of new media to store and retrieve information easily, predominantly through databases and the Internet, allows that information to be shared without difficulty.

• Thus, through interaction with new media, knowledge passes easily between sources, resulting in another form of collective intelligence.


WEB 2.0 and Many Variants


Elements of WEB 2.0


Web 2.0: Evolution Towards a Read/Write Platform

• Web 1.0 (1993-2003)
  – Pretty much HTML pages viewed through a browser

• Web 2.0 (2003-beyond)
  – Web pages, plus a lot of other “content” shared over the web, with more interactivity; more like an application than a “page”

Web 1.0          | Aspect                  | Web 2.0
-----------------+-------------------------+--------------------------------
“Read”           | Mode                    | “Write” & contribute
“Page”           | Primary unit of content | “Post / record”
“Static”         | State                   | “Dynamic”
Web browser      | Viewed through...       | Browsers, RSS readers, anything
“Client-server”  | Architecture            | “Web services”
Web coders       | Content created by...   | Everyone
“Geeks”          | Domain of...            | “Mass amateurization”


CI in Social Media

• Crowd members assign different weights to individual inputs based on their relationship with the people who provided them, and then make individual decisions
  – Blogosphere
  – Facebook
  – YouTube
  – Epinions.com
  – Amazon
  – eBay
  – Digg
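One simple way to model the weighting described above is a trust-weighted average. In the sketch below, the `trust` weights standing in for the strength of each relationship, the source names, and the function name are hypothetical; the sites listed above use their own mechanisms, which are not described here.

```python
def weighted_opinion(inputs, trust):
    """Combine numeric inputs (e.g. ratings), weighted by trust in each source.

    inputs: dict source -> value, e.g. a 1-5 star rating.
    trust:  dict source -> non-negative weight for the relationship with that
            source (hypothetical); sources missing from `trust` get weight 0.
    """
    total_weight = sum(trust.get(source, 0.0) for source in inputs)
    if total_weight == 0:
        raise ValueError("no trusted inputs to combine")
    weighted_sum = sum(value * trust.get(source, 0.0) for source, value in inputs.items())
    return weighted_sum / total_weight
```

With ratings of 5, 2, and 1 from a close friend, an acquaintance, and a stranger, weighted 0.6, 0.3, and 0.1, the result is about 3.7, higher than the unweighted mean of about 2.67 because the friend's opinion counts most.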


Blogging is the Most Recognized Example of Web 2.0


Wikipedia is a Collaborative Encyclopedia Being Edited in Real Time by Anyone


Alive At ASU


WEB 2.0 Technologies

• APIs

• RSS (Really Simple Syndication)
  – Content syndication

• Web services
  – Open data

• AJAX (Asynchronous JavaScript and XML)

• CSS (Cascading Style Sheets)
  – Content with style
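RSS feeds are plain XML, which is what makes content syndication easy to automate. The sketch below uses Python's standard `xml.etree.ElementTree` module to pull item titles and links out of an RSS 2.0 document. The sample feed and the function name are hypothetical, and namespaced formats such as Atom would need a different parser.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):  # <item> elements live under <channel>
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed</description>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""
```

A feed reader or aggregator would poll such feeds periodically and merge the items from many blogs into one view.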


WEB 2.0, Summing Up

• Web 2.0 is hard to define, but it is very far from just hype
  – Culmination of a number of Web trends

• Importance of open data
  – Allows communities to assemble unique, tailored applications

• Importance of users
  – Seek and create network effects

• Browser as application platform
  – Huge potential for new kinds of Web applications


Programming Collective Intelligence


Crawl the web


Spiders (Robots/Bots/Crawlers)

• Start with a comprehensive set of root URLs from which to begin the search.

• Follow all links on these pages recursively to find additional pages.

• Index each newly found page in an inverted index as it is encountered.

• May allow users to submit pages directly to be indexed (and crawled from).
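An inverted index maps each term to the pages that contain it, so a query only has to look up its own terms instead of scanning every page. Below is a minimal in-memory sketch, assuming a crude lower-case alphanumeric tokenizer and boolean AND queries; the class and method names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of URLs whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, url, text):
        """Record that every term in `text` occurs on `url`."""
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query):
        """Return the URLs that contain every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get avoids creating empty entries in the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

A spider would call `add_page` once for each page it downloads, and a search front end would call `search` for each user query.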


Search Strategies

Breadth-first Search


Search Strategies (cont)

Depth-first Search


Search Strategy Trade-offs

• Breadth-first explores uniformly outward from the root page, but requires memory of all nodes on the previous level (exponential in depth). It is the standard spidering method.

• Depth-first requires memory of only depth times branching factor (linear in depth), but gets "lost" pursuing a single thread.

• Both strategies can be implemented using a queue of links (URLs): breadth-first appends new links to the end of the queue, while depth-first adds them to the front.
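The trade-offs above show up when both strategies are run over the same frontier. In the sketch below, a single deque holds the URLs still to visit: breadth-first appends new links at the back, depth-first pushes them onto the front. The toy link graph and the function name are hypothetical.

```python
from collections import deque


def crawl_order(links, root, strategy="bfs"):
    """Return the order in which pages are visited starting from `root`.

    links: dict page -> list of linked pages (a hypothetical toy web).
    strategy: "bfs" appends new links to the back of the queue,
              "dfs" pushes them onto the front.
    """
    if strategy not in ("bfs", "dfs"):
        raise ValueError("strategy must be 'bfs' or 'dfs'")
    frontier = deque([root])
    seen = set()
    visited = []
    while frontier:
        url = frontier.popleft()  # both strategies take from the front
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)
        new_links = [u for u in links.get(url, []) if u not in seen]
        if strategy == "bfs":
            frontier.extend(new_links)
        else:
            # extendleft reverses its argument, so reverse first to keep
            # the page's original link order at the front of the deque.
            frontier.extendleft(reversed(new_links))
    return visited
```

On the graph `{"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}`, breadth-first visits root, a, b, a1, a2, b1, while depth-first visits root, a, a1, a2, b, b1.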


Spidering Algorithm

• Initialize the queue (Q) with the initial set of known URLs.

• Until Q is empty or the page or time limit is exhausted:
  – Pop URL, L, from the front of Q.
  – If L does not point to an HTML page (.gif, .jpeg, .ps, .pdf, .ppt…), continue the loop.
  – If L has already been visited, continue the loop.
  – Download page, P, for L.
  – If P cannot be downloaded (e.g. 404 error, robot excluded), continue the loop.
  – Index P (e.g. add it to the inverted index or store a cached copy).
  – Parse P to obtain a list of new links, N.
  – Append N to the end of Q.
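The steps above can be turned into a runnable sketch using only Python's standard library. To keep it testable offline, downloading and indexing are passed in as functions. The `fetch` function is hypothetical: it should return a page's HTML, or `None` when the page cannot be downloaded. A real version might build it on `urllib.request` and check `urllib.robotparser` for robot exclusion. Only the page limit is implemented; a time limit would be one more check in the loop condition.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# File types the slide lists as "not an HTML page" (plus .jpg).
NON_HTML_EXTENSIONS = (".gif", ".jpeg", ".jpg", ".ps", ".pdf", ".ppt")


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(seed_urls, fetch, index, max_pages=100):
    """Crawl breadth-first from seed_urls; return the set of visited URLs.

    fetch(url) -> HTML string, or None if the page cannot be downloaded.
    index(url, html) is called once per successfully downloaded page.
    max_pages caps the number of URLs attempted, including failed downloads.
    """
    queue = deque(seed_urls)                 # Q, initialized with known URLs
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()                # pop L from the front of Q
        if urlparse(url).path.lower().endswith(NON_HTML_EXTENSIONS):
            continue                         # L is not an HTML page
        if url in visited:
            continue                         # L was already visited
        visited.add(url)
        html = fetch(url)                    # download P for L
        if html is None:
            continue                         # e.g. 404 error or robot excluded
        index(url, html)                     # e.g. add P to an inverted index
        parser = LinkExtractor()
        parser.feed(html)                    # parse P for new links N
        # Resolve relative links against L, then append N to the end of Q.
        queue.extend(urljoin(url, link) for link in parser.links)
    return visited
```

For a quick offline run, a dict of hypothetical pages can stand in for the Web: its `get` method already returns `None` for missing URLs, which the spider treats as a failed download.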


Keeping Spidered Pages Up to Date

• The Web is very dynamic: many new pages, updated pages, deleted pages, etc.

• Periodically check spidered pages for updates and deletions:
  – Just look at header information (e.g. the HTTP Last-Modified header, or META tags on last update) to determine whether a page has changed, and reload the entire page only if needed.

• Track how often each page is updated, and preferentially return to pages that have historically been more dynamic.

• Preferentially update pages that are accessed more often, to optimize the freshness of the more popular pages.
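The last two rules can be combined into a single revisit priority. The sketch below is one hypothetical heuristic, not a standard formula. It estimates each page's change rate from past checks, using add-one smoothing (an assumption), multiplies that by how often the page is accessed, and then re-checks the highest-scoring pages first within a fixed budget.

```python
import heapq


def revisit_schedule(pages, budget):
    """Pick which spidered pages to re-check first (hypothetical heuristic).

    pages: dict url -> (changes_observed, checks_made, accesses)
    budget: maximum number of pages to re-check in this round.
    Returns up to `budget` URLs, highest priority first.
    """

    def priority(stats):
        changes, checks, accesses = stats
        # Add-one smoothing so never-checked pages get a nonzero change rate.
        change_rate = (changes + 1) / (checks + 2)
        # Favor pages that change often AND are accessed often.
        return change_rate * (1 + accesses)

    return heapq.nlargest(budget, pages, key=lambda url: priority(pages[url]))
```

For example, a news page that changed on 9 of 10 checks and was accessed 500 times would be re-checked before a blog that changed on 5 of 10 checks, which in turn comes before a rarely changing "about" page.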


Mohammad-Ali Abbasi (Ali) is a Ph.D. student at the Data Mining and Machine Learning Lab, Arizona State University. His research interests include data mining, machine learning, social computing, and social media behavior analysis.

http://www.public.asu.edu/~mabbasi2/