Towards Web 3.0: Harnessing Collective Intelligence of Humans for Knowledge Acquisition and Web Accessibility
Presenter: Deepti Aggarwal
Advisors: Prof. Venkatesh Choppella and Prof. Vasudeva Varma
Reviewers: Dr. Raghu Reddy and Dr. Priyanka Srivastava

Masters thesis defense talk


Page 2: Evolution of the Web

Web 1.0: Web as an information portal. Focus on ownership. Static web pages. Meaning is dictated.
Web 2.0: Web as a social platform. Focus on community. User generated content. Meaning is socially constructed.
Web 3.0: Web as a personalized portable web. Focus on individual. Semantic and portable pages. Meaning is socially constructed and contextually reinvented.

Timeline markers: 1990, 2003, 2010, 2020

The World Wide Web is a large collection of interconnected documents.

Page 3: Getting to Web 3.0: Major hurdles

1. Scattered data: Where should I look for the data?
2. Excess of data: Which data is best for me?
3. Understanding data: How can I understand the available data?

Page 4: Getting to Web 3.0: Through context and semantics

• A web where every piece of data carries its own semantics, and the context of the content is defined by the data.
• A web that is capable of reading and understanding the user's context.
• CONTEXT refers to why the content is relevant and to whom.
• SEMANTICS refers to the meaning of data and how it is relevant to a given context.

Page 5: Web 3.0: Possible applications

• Personalized web: content and advertising that match user preferences and choices.
• Data on demand: no need for browsing when all databases are semantically connected to each other.
• Multi-lingual web: easy access to sources available in varied languages.

Knowledge acquisition (Extraction and Validation)
Accessibility (Re-narration)

Page 6: Getting to Web 3.0: Methodology & contributions

Methodology (Human Computer Interaction):
• Research through design (Zimmerman 2007).
• Prototyping – User studies – Analysis – Discussions.

Contributions:
• Three prototypes and their studies.
  Power of Friends, uPick: extracting and validating information.
  Alipi: making the web more accessible through re-narration.

Page 7: Problem: Extract and validate information

Exploration 1: Power of Friends, an online friendsourcing game.

Page 8: Problem: Extracting & Validating Community-related Information

Friends on social networks possess a variety of information about each other.
Applications: personalizing one’s browsing and targeted advertising.
Issues: information is scattered, and no one is an expert.

Page 9: Existing approaches

Task: Extract information about a person X.
Approach 1: Ask X. (21 Questions)
Approach 2: Ask X’s friends. (Bernstein et al. 2008)
Problem: these approaches involve the social awkwardness of revealing the truth.

Page 10: Motivation

Looking-Glass Self Theory
Cultural Consensus Theory
Secure Multi-party Computation
Power of Ten

Make it fun
Ensure privacy
Ask everyone
Get opinion of friends

Page 11: Our approach: Crowd Consensus

Our approach: Ask X’s friends to guess the opinion of X’s other friends.
Benefits: Tackles social awkwardness in an engaging and fun way.
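One way to picture the aggregation behind Crowd Consensus is a simple tally: each friend submits a guess about what X's other friends would say, and the most frequent guess is treated as the group's answer. The sketch below is an illustration under that assumption only. The function name `consensus_answer`, the tie handling, and the sample guesses are hypothetical and are not taken from the game's implementation.

```python
from collections import Counter


def consensus_answer(guesses):
    """Return (answer, share); answer is None when the input is empty or tied for first place."""
    if not guesses:
        return None, 0.0
    counts = Counter(guesses).most_common()
    top_answer, top_count = counts[0]
    share = top_count / len(guesses)
    # A tie for first place means the group has no clear consensus.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None, share
    return top_answer, share


# Hypothetical guesses from five friends about "Does X like cooking?"
guesses = ["yes", "no", "yes", "yes", "no"]
answer, share = consensus_answer(guesses)
print(answer, share)
```

Returning the share alongside the answer makes it possible to treat weak majorities differently from near-unanimous ones, which is a design choice of this sketch rather than something stated in the slides.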

Page 12: Power of Friends: Our proposed game

A single-player, asynchronous social game.

Page 13: User study of Power of Friends

• Seven communities, 67 participants (40 female).
• Questions related to community members: 10 in each game play.
• Questions related to the likes, hobbies and daily activities of community members.
• Task: play the game online.
• Four sessions: demographic information and questions about bonding, game demonstration, game play, and interview.

Page 14: Results of the study

Community Id | Questions correctly identified
C1 | 6/10
C2 | 8/10
C3 | 5/10
C4 | 7/10
C5 | 6/10
C6 | 8/10
C7 | 7/10

Communities C2 and C6 were the most accurate (8/10 each).
The performance of a community correlated with the bonding level among its members.
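As a quick check of the figures above, the per-community scores can be turned into accuracies and the best-performing communities picked out. This is a minimal sketch: the scores are the ones reported on the slide, while the variable names are illustrative. Bonding levels are not listed in the slides, so no correlation is computed here.

```python
from statistics import mean

# Correct answers out of 10 questions, as reported on the slide.
correct = {"C1": 6, "C2": 8, "C3": 5, "C4": 7, "C5": 6, "C6": 8, "C7": 7}
QUESTIONS_PER_GAME = 10

# Convert raw scores to accuracy fractions.
accuracy = {cid: n / QUESTIONS_PER_GAME for cid, n in correct.items()}

# Communities sharing the highest accuracy.
best = max(accuracy.values())
top_communities = sorted(cid for cid, a in accuracy.items() if a == best)

print(top_communities)
print(f"mean accuracy: {mean(accuracy.values()):.3f}")
```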

Page 15: Study Findings

• It is challenging: “It requires a lot of thinking. I wish I knew my coworkers better.”
• It creates a social impact: “It is not possible that my friend … knows cooking, I think she hates it. I have to ask her.”
• It explores social awkwardness of answering a given question: “It is a cool way of giving my answer ... No one knows my answer except me.”

Page 16: Study Findings (contd.)

• It creates a sense of connectedness among people: “It’s kind of fun to see how accurately my thinking aligns with my friends.”
• 25% of the participants got confused while playing and needed help to recall the game strategy.
• 30% recommended a multi-player setting, 10% a time-based challenge, and 60% publishing the game on Facebook.

Page 17: Design Themes

• Identify the level of bonding among friends, as it impacts their performance in the game.
• Include questions about every group member.
• Select the questions carefully, keeping the interests of the members in mind.
• Allow participants to generate questions.

Page 18: Discussions and Future Work

• Exploring an indirect mode of interaction for larger communities. (IRB approved)
• A comparative study between direct and indirect modes of answering questions is planned.
• Publishing the game on Facebook. (Social media interaction)

Personalized web: content and advertising that match user preferences and choices.

Page 19: Problem: Extracting and validating information

Exploration 2: uPick, a crowdsourcing system for extracting Named Entities.

Page 20: Problem scenario: Acquiring accurate and up-to-date information about Sachin from various web sources.

Page 21: Problem: Extracting useful data on demand

Page 22: Difficulty in processing the English language

“You see sir, I can talk English, I can walk English, I can laugh English, I can run English, because English is such a funny language.”
Amitabh Bachchan in the movie Namak Halaal

Page 23: Other Problems

Sachin Tendulkar was born in Bombay. He studied in Sharadashram...
Sachin Tendulkar was born in Bombay. Master Blaster is …
Sachin remembered his father last night … He said he loved poems.
Sachin Tendulkar was born in Bombay. Tendlya is …

Co-reference, Ambiguity, Acronym, Abbreviations

Page 24: Constructs of a sentence: Named Entity and relations

• A named entity is an atomic element in a body of text.
• Types: person, organization, location etc.
• Different named entities, when linked together, form a relation.

Example: Sachin Tendulkar was born in Bombay
Subject: “Sachin Tendulkar”, NE of type ‘Person’
Relation: “was born in”, NE of type ‘Verb’
Object: “Bombay”, NE of type ‘Location’
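The subject, relation, object structure from the example sentence can be written down as a small data type. This is a sketch for illustration only; the class and field names (`NamedEntity`, `Relation`, `kind`) are assumptions and do not come from uPick itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedEntity:
    text: str
    kind: str  # entity type as used on the slide, e.g. "Person", "Verb", "Location"


@dataclass(frozen=True)
class Relation:
    subject: NamedEntity
    predicate: NamedEntity
    obj: NamedEntity

    def as_sentence(self) -> str:
        """Join the three parts back into a readable statement."""
        return f"{self.subject.text} {self.predicate.text} {self.obj.text}"


# The example from the slide, expressed as one relation.
relation = Relation(
    NamedEntity("Sachin Tendulkar", "Person"),
    NamedEntity("was born in", "Verb"),
    NamedEntity("Bombay", "Location"),
)
print(relation.as_sentence())
```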

Page 25: Extracting relationships among NEs: Required process

• Identify part-of-speech constructs: noun, verb, adjective etc.
• Determine co-references, abbreviations and acronyms.
• Connect them together to form a relationship.
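To make the first and last steps concrete, the toy rule below splits an already tagged sentence into a subject, a relation and an object by looking at runs of proper nouns. It is a deliberately simplified sketch: the part-of-speech tags are assigned by hand (Penn Treebank style), the function name `extract_triple` is hypothetical, and co-reference, abbreviation and acronym resolution are not attempted. It is not the rule set used in the thesis.

```python
def join_words(tagged):
    """Join the words of a list of (word, tag) pairs."""
    return " ".join(word for word, _ in tagged)


def extract_triple(tagged):
    """Split a tagged sentence into (subject, relation, object) using proper-noun runs."""
    is_proper = [tag.startswith("NNP") for _, tag in tagged]
    # Subject: the leading run of proper nouns.
    start = 0
    while start < len(tagged) and is_proper[start]:
        start += 1
    # Object: the trailing run of proper nouns.
    end = len(tagged)
    while end > start and is_proper[end - 1]:
        end -= 1
    # The toy rule needs a non-empty subject, relation and object.
    if start == 0 or end == len(tagged) or start == end:
        return None
    return join_words(tagged[:start]), join_words(tagged[start:end]), join_words(tagged[end:])


# Hand-tagged example sentence from the earlier slide.
sentence = [
    ("Sachin", "NNP"), ("Tendulkar", "NNP"),
    ("was", "VBD"), ("born", "VBN"), ("in", "IN"),
    ("Bombay", "NNP"),
]
print(extract_triple(sentence))
```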

Page 26: Existing approach: Automated techniques

• Natural Language Processing based: rule based.
• Machine Learning based: supervised and unsupervised learning.
• Other methods: vocabulary based.
• Hybrid: NLP and vocabulary based.
• Issues: dependency, scalability.

Page 27: uPick: Our proposed system

A crowdsourcing system to extract Named Entity relationships from documents.

Page 28: uPick Working

• Step 1: Extract NEs and relations using a POS tagger and the relation extraction rules proposed by Chen.
• Step 2: Present the extracted relations to a crowd in the form of a game (challenge).
• Step 3: Collect the generated responses.
• Step 4: Filter the relations by collecting the majority votes and comparing them against the expert-filtered relations.

Page 29: Processing of the generated data

• With the help of human experts, we collected the valid relations for each document from the automatically generated relations (step 1). These relations form a ground truth dataset for further validation.
• We compare the responses collected from each game against the expert-corrected facts stored in the database and filter out erroneous response data.
• The relation instances receiving a majority of votes are taken as true facts corresponding to the document.
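The majority-vote filtering and the comparison against expert ground truth described above can be sketched as follows. This is a minimal illustration: the votes and the expert set are made-up sample data, the strict-majority threshold is an assumption (the slides do not say how ties are handled), and `majority_accepted` is a hypothetical name.

```python
def majority_accepted(votes_by_relation):
    """Return the relations that a strict majority of players marked as valid."""
    accepted = set()
    for relation, votes in votes_by_relation.items():
        yes = sum(1 for vote in votes if vote)
        if yes > len(votes) / 2:
            accepted.add(relation)
    return accepted


# Hypothetical player votes (True means "this relation is valid").
votes = {
    ("Sachin Tendulkar", "was born in", "Bombay"): [True, True, True, False],
    ("Bombay", "was born in", "Sachin Tendulkar"): [False, False, True, False],
    ("Sachin Tendulkar", "studied in", "Sharadashram"): [True, False],
}
# Hypothetical expert-validated ground truth for the same document.
expert_valid = {
    ("Sachin Tendulkar", "was born in", "Bombay"),
    ("Sachin Tendulkar", "studied in", "Sharadashram"),
}

accepted = majority_accepted(votes)
missed = expert_valid - accepted    # valid per experts, rejected by the crowd
spurious = accepted - expert_valid  # accepted by the crowd, invalid per experts
print(len(accepted), len(missed), len(spurious))
```

With this sample data the tied relation is not accepted, which shows why the tie rule matters when only a few players have voted.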

Page 30: User study of uPick

• Supervised laboratory study, 12 participants (8 female).
• Three sessions: training, game play and interview.
• Four documents: Ashoka Maurya, Sachin Tendulkar, Shahrukh Khan, and Sonia Gandhi.
• Procedure: Read the given text and select the relations from the given list.

Page 31: Study Results

Measure | D1 | D2 | D3 | D4
Total number of presented relations | 37 | 39 | 40 | 33
Valid relations correctly identified as valid | 19 | 18 | 19 | 15
Valid relations incorrectly identified as invalid | 5 | 6 | 4 | 1
Invalid relations correctly identified as invalid | 12 | 12 | 16 | 15
Invalid relations incorrectly identified as valid | 1 | 3 | 1 | 2
Accuracy (correctly identified relations / total relations) | 84% | 77% | 87% | 91%
Accuracy using automated techniques only (valid relations / total relations) | 65% | 61% | 57% | 49%
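The two accuracy rows follow from the counts in the table using the formulas given in the row labels. The sketch below recomputes them; the function and variable names are illustrative. The recomputed values agree with the reported percentages to within about one percentage point, with the small differences presumably due to rounding.

```python
# Counts per document, copied from the table:
# (valid kept, valid rejected, invalid rejected, invalid kept)
counts = {
    "D1": (19, 5, 12, 1),
    "D2": (18, 6, 12, 3),
    "D3": (19, 4, 16, 1),
    "D4": (15, 1, 15, 2),
}


def crowd_accuracy(valid_kept, valid_rejected, invalid_rejected, invalid_kept):
    """Share of presented relations that the crowd classified correctly."""
    total = valid_kept + valid_rejected + invalid_rejected + invalid_kept
    return (valid_kept + invalid_rejected) / total


def automated_accuracy(valid_kept, valid_rejected, invalid_rejected, invalid_kept):
    """Share of automatically extracted relations that were valid to begin with."""
    total = valid_kept + valid_rejected + invalid_rejected + invalid_kept
    return (valid_kept + valid_rejected) / total


for doc, row in counts.items():
    print(doc, f"{crowd_accuracy(*row):.1%}", f"{automated_accuracy(*row):.1%}")
```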

Page 32: Discussions and future work

• Helpful in remembering facts related to a text, so could be used in online education systems.
• Turn it into an engaging game play.
• Leaderboards and persistent scoring.

Data on demand: no need for browsing when all databases are semantically connected to each other.

Page 33: Problem: Making the web accessible

Exploration 3: Alipi, an online crowdsourcing system for re-narration.

Page 34: Problem scenario: Accessibility of the web content

How can a person who does not know English understand web pages on fire safety? Solution: Re-narration

A webpage on fire safety is re-narrated in Hindi.

Page 35: Why are the existing approaches not sufficient?

• Single point of control and authority.
• Author forced to anticipate target audience.
• Transferring authorship is difficult.

Page 36: Alipi: A re-narration framework (Dinesh et al. 2012)

• Users rewrite different sections of a web page.
• Distribution of the point of control from author to users.
• A step from target audience to target communities.
• Follows the principle of “the best content for each one”.

Page 37: Alipi Architecture

Page 38: Alipi Architecture: Creating and storing the re-narrations

Page 39: Alipi Architecture: Displaying a re-narrated page to the user

Page 40: Alipi Prototype

1. Open the website http://alipi.us. Enter the page of interest, here http://iiit.ac.in
2. Click on the “Re-narrate” button.

Page 41: Alipi Prototype: Steps to re-narrate a page

3. Select a section of the web page. Re-narrate the element.
4. Publish your re-narration by providing the target community.

Page 42: Alipi Prototype: Steps to see the available re-narrations

3. After clicking the “Re-narrations” button, choose a re-narration from the available list.
4. The queried page is updated with the re-narrated element.
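The publish and view steps above can be pictured with a small data model: a re-narration records which page and element it rewrites and for which community, and viewing swaps the selected element for the re-narrated content. This is a sketch under stated assumptions. The class `Renarration`, the `element_id` field, the in-memory `store` list, and the modelling of a page as a dictionary are all hypothetical and do not describe Alipi's actual architecture or storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Renarration:
    page_url: str
    element_id: str  # hypothetical identifier for the selected section
    community: str   # target community supplied at publish time
    content: str     # the re-narrated content


def available_renarrations(store, page_url):
    """List the re-narrations published for one page."""
    return [r for r in store if r.page_url == page_url]


def apply_renarration(page, renarration):
    """Return a copy of the page with the selected element replaced."""
    updated = dict(page)
    if renarration.element_id in updated:
        updated[renarration.element_id] = renarration.content
    return updated


# A page modelled as element id -> text, with placeholder content.
page = {"intro": "Original introduction", "contact": "Original contact details"}
store = [
    Renarration("http://iiit.ac.in", "intro", "Hindi readers", "(re-narrated introduction)"),
]

choices = available_renarrations(store, "http://iiit.ac.in")
shown = apply_renarration(page, choices[0])
print(shown["intro"])
```

Returning a modified copy rather than editing the page in place mirrors the idea that the original content stays with its author while readers see the version chosen for their community.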

Page 43: My contribution: Testing feasibility of Alipi

• IIIT-H R&D showcase: 70 participants (45 male).
• Objective: to find out what motivates users to use Alipi, and for what sorts of tasks.
• Task: to re-narrate a web page (the IIIT-H webpage, a page on Indian culture, or any other page) and later to check the available re-narrations.
• Four phases: demographics, training, system experience and questionnaire.

Page 44: Findings of the study

• Participants appreciated both roles, re-narrator and reader; their preference varied between known and unknown domains.
• Re-narrators preferred text-based re-narrations over video and audio re-narrations, to avoid setting up a camera and bandwidth issues.
• Readers preferred re-narrations in mixed media, to get a rich experience.
• The majority wanted to re-narrate for their friends and see re-narrations from known people, because their preferences are known.
• Participants found the interface design non-intuitive and hard to follow, but found the system very useful for sharing information.

Page 45: My contribution 2: Alipi browser plugin

• Allows dynamic filtration based on user profile.
• Bypasses the URL http://alipi.us
• Decentralized and editable user profile.

Page 46: Discussions and future work

• How can we check the credibility of a re-narration: filtration of noisy re-narrations, ranking based on public voting?
• How can we improve our selection algorithm to incorporate rapidly growing online communities, dialects of a geographical location, and the vicinity of the user-mentioned region?
• What could be the security implications of the Alipi architecture?

Multi-lingual web: easy access and interoperability among contents between different languages.

Page 47: Summary and the way ahead

Page 48: Summary of my work

• Personalized web: content and advertising that match user preferences and choices.
• Data on demand: no need for browsing when all databases are semantically connected to each other.
• Multi-lingual web: easy access and interoperability among contents between different languages.

Knowledge acquisition (Extraction and Validation): Power of Friends, uPick
Accessibility (Re-narration): Alipi

Page 49: Future Work

• Can the proposed Crowd Consensus framework be useful to reduce the number of iterations required for crowdsourcing tasks?
• Using the belief modality, can we develop a mathematical model to check the accuracy of answers generated using the Crowd Consensus approach and to determine the conditions under which the accuracy may deviate?
• Can the proposed uPick approach be useful in enhancing the experience of students while reading textbooks?
• How can we check the relatedness of a re-narration (generated with the Alipi tool) to the original document as well as to other available re-narrations of the same web page?

Page 50: References

• C. Cooley. Human Nature & Social Order - Ppr. Social Science Classics Series. Transaction Pub, 1964.
• M. S. Bernstein, D. Tan, G. Smith, M. Czerwinski, and E. Horvitz. Personalization via friendsourcing. ACM Trans. Comput.-Hum. Interact., 17(2):6:1–6:28, May 2008.
• P.-S. Chen. English sentence structure and entity-relationship diagrams. Information Sciences, 29(2-3):127–149, 1983.
• S. C. Weller. Cultural consensus theory: Applications and frequently asked questions. Field Methods, 19(4):339–368, 2007.

Page 51: References (contd.)

• I. Tuomi. Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory. J. Manage. Inf. Syst., 16(3):103–117, Dec. 1999.
• S. Sekine. Named Entity: History and Future. 2004.
• W. Du and M. J. Atallah. Secure multi-party computation problems and their applications: a review and open problems. In Proceedings of the 2001 workshop on New security paradigms, NSPW ’01, pages 13–22, New York, NY, USA, 2001. ACM.
• Z. Syed, E. Viegas, and S. Parastatidis. Automatic discovery of semantic relations using mindnet. LREC, 2010.

Page 52: References (contd.)

• 21 Questions. http://apps.facebook.com/twentyoneq/.
• Mindnet. http://research.microsoft.com/apps/pubs/default.aspx?id=69647.
• Power of 10. http://en.wikipedia.org/wiki/Power_of_10.
• Stanford POS tagger. http://nlp.stanford.edu/software/tagger.shtml.

Page 53: Related Publications

• D. Aggarwal, R. A. Khot, and V. Choppella. Power of Friends: When Friends Guess About their Friends’ Guess. In Proceedings of the SIGCHI conference on Human factors in computing systems, CHI ’13, Paris, France, 2013, ACM.
• D. Aggarwal, R. A. Khot, V. Varma, and V. Choppella. UPICK: Crowdsourcing Based Approach to Extract Relations Among Named Entities. In Proceedings of IndiaHCI, Pune, India, 2012 (Accepted as full paper).
• T. B. Dinesh, S. Uskudrali, S. Sastry, D. Aggarwal, and V. Choppella. Alipi: A framework for re-narrating web pages. In Proceedings of the International Cross-Disciplinary Conference on Web Accessibility, W4A ’12, pages 22:1-4, Lyon, France, 2012, ACM.
• D. Aggarwal, R. A. Khot, A. K. Dey, and V. Choppella. Crowd Consensus: Friendsourcing based approach to generate cultural beliefs. In preparation.

Page 54: Public Demonstrations

• Presented “Alipi: Making the web Inclusive and Accessible for All” in IIIT-Hyderabad R&D Showcase, Hyderabad, India, 2013.
• Presented “Crowdsourcing Based Approach to Extract Relations Among Named Entities” in OpenData Camp-Hyderabad Meet, Hyderabad, India, 2012.
• Poster presentations on “Power of Friends: Rethinking Games With a Purpose”, and “Alipi: A renarration Web” in IIIT-Hyderabad R&D Showcase, Hyderabad, India, 2012.

Page 55: Special Thanks

My family
Reviewers
Prof. Anind Dey
Prof. Vasudeva Varma
Prof. Venkatesh Choppella
Dr. T. B. Dinesh
IIIT-H Faculty
Study participants
Friends

Page 56: Thank you!

For more details: [email protected]
http://pascal.iiit.ac.in/~deepti.aggarwal

Web 3.0… Web of opportunities! This is just the beginning!