Towards Web 3.0: Harnessing Collective Intelligence of Humans for Knowledge Acquisition and Web Accessibility
Presenter: Deepti Aggarwal
Advisors: Prof. Venkatesh Choppella and Prof. Vasudeva Varma
Reviewers: Dr. Raghu Reddy and Dr. Priyanka Srivastava
1
+Evolution of the Web
Web 1.0: Web as an information portal. Focus on ownership. Static web pages. Meaning is dictated.
Web 2.0: Web as a social platform. Focus on community. User-generated content. Meaning is socially constructed.
Web 3.0: Web as a personalized, portable web. Focus on the individual. Semantic and portable pages. Meaning is socially constructed and contextually reinvented.
Timeline: 1990, 2003, 2010, 2020
2
The World Wide Web is a large collection of interconnected documents.
+Getting to Web 3.0: Major hurdles
1. Scattered data: Where should I look for the data?
2. Excess of data: Which data is best for me?
3. Understanding data: How can I understand the available data?
3
+Getting to Web 3.0: Through context and semantics
- A web where every piece of data owns its semantics, and the context of the content is defined by the data.
- A web that is capable of reading and understanding the user's context.
- CONTEXT refers to why the content is relevant and to whom.
- SEMANTICS refers to the meaning of data and how it is relevant to a given context.
4
+Web 3.0: Possible applications
- Personalized web: content and advertising that match user preferences and choices.
- Data on demand: no need for browsing when all databases are semantically connected to each other.
- Multi-lingual web: easy access to sources available in various languages.
5
Knowledge acquisition
(Extraction and Validation)
Accessibility (Re-narration)
+Getting to Web 3.0: Methodology & contributions
Methodology (Human-Computer Interaction):
- Research through design (Zimmerman 2007).
- Prototyping, user studies, analysis, discussions.
Contributions:
- Three prototypes and their studies.
- Power of Friends, uPick: extracting and validating information.
- Alipi: making the web more accessible through re-narration.
6
+
Exploration 1: Power of Friends, an online friendsourcing game.
Problem: Extract and validate information
7
+Problem: Extracting & Validating Community related Information
Friends on social networks possess a variety of information about each other.
Applications: to personalize one’s browsing and targeted advertising.
Issues: information is scattered, and no one is an expert.
8
+Existing approaches
Task: Extract information about a person X.
Approach 1: Ask X. (21 questions)
Approach 2: Ask X’s friends. (Bernstein et al. 2008)
Problem: these approaches involve the social awkwardness of revealing the truth.
9
+Motivation
Looking-Glass Self Theory: get the opinion of friends.
Cultural Consensus Theory: ask everyone.
Secure Multi-party Computation: ensure privacy.
Power of Ten: make it fun.
10
+Our approach: Crowd Consensus
Our approach: ask X's friends to guess the opinions of X's other friends.
Benefits: tackles social awkwardness in an engaging and fun way.
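One way to read the Crowd Consensus idea is as a majority-vote aggregation: each friend submits a guess of what the other friends will say about X, and the most common guess is taken as the community's answer. The sketch below assumes that aggregation rule; the function names and the sample guesses are hypothetical and are not taken from the game itself.

```python
from collections import Counter

def consensus(guesses):
    """Return the most common guess and the share of players who gave it.

    `guesses` maps a player name to that player's guess of what
    the other friends believe about X (assumed format).
    """
    counts = Counter(guesses.values())
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(guesses)

def agreeing_players(guesses, answer):
    """List the players whose guess matches the consensus answer."""
    return sorted(player for player, guess in guesses.items() if guess == answer)

# Hypothetical question: "Does X like cooking?"
guesses = {"asha": "yes", "ravi": "yes", "meena": "no", "john": "yes"}
answer, share = consensus(guesses)
print(answer, share)
print(agreeing_players(guesses, answer))
```

No player reveals a personal opinion about X directly; each only predicts what the group thinks, which is the property the approach relies on to reduce awkwardness.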
11
+Power of Friends: Our Proposed game
A single-player, asynchronous social game.
12
+User study of Power of Friends
- Seven communities, 67 participants (40 female).
- Ten questions about community members in each game play.
- Questions related to the likes, hobbies and daily activities of community members.
- Task: play the game online.
- Four sessions: demographic information and questions about bonding, game demonstration, game play, and interview.
13
+Results of the study
Community ID | Number of questions correctly identified
C1 | 6/10
C2 | 8/10
C3 | 5/10
C4 | 7/10
C5 | 6/10
C6 | 8/10
C7 | 7/10
Results of the study: communities C2 and C6 were the most accurate (8/10 each).
The performance of a community correlated with the level of bonding among its members.
14
+Study Findings
- It is challenging: “It requires a lot of thinking. I wish I knew my coworkers better”.
- It creates a social impact: “It is not possible that my friend … knows cooking, I think she hates it. I have to ask her.”
- It explores the social awkwardness of answering a given question: “It is a cool way of giving my answer ... No one knows my answer except me.”
15
+Study Findings (contd.)
- It creates a sense of connectedness among people: “Its kind of fun to see how accurately my thinking aligns with my friends.”
- 25% of the participants got confused while playing and needed help remembering the game strategy.
- 30% recommended a multi-player setting, 10% a time-based challenge, and 60% publishing the game on Facebook.
16
+Design Themes
- Identify the level of bonding among friends, as it affects their performance in the game.
- Include questions about every group member.
- Select the questions carefully, keeping the interests of the members in mind.
- Allow participants to generate questions.
17
+Discussions and Future Work
- Exploring an indirect mode of interaction for larger communities. (IRB approved)
- A comparative study between the direct and indirect modes of answering questions is planned.
- Publishing the game on Facebook. (Social media interaction)
18
Personalized web: content and advertising that match user preferences and choices.
+
Exploration 2: uPick, a crowdsourcing system for extracting Named Entities.
Problem: Extracting and validating information
19
+Problem scenario: Acquiring accurate and up-to-date information about Sachin from various web sources.
20
+Problem: Extracting useful data on demand
21
+Difficulty in Processing English language
“You see sir, I can talk English, I can walk English, I can laugh English, I can run English, because English is such a funny language.”
Amitabh Bachchan in the movie Namak Halaal
22
+Other Problems
Sachin Tendulkar was born in Bombay. He studied in Sharadashram...
Sachin Tendulkar was born in Bombay. Master Blaster is …
Sachin remembered his father last night … He said he loved poems.
Sachin Tendulkar was born in Bombay. Tendlya is …
Problem types: co-reference, ambiguity, acronyms, abbreviations.
23
+Constructs of a sentence: Named Entity and relations
- A named entity (NE) is an atomic element in a body of text.
- Types: person, organization, location, etc.
- Different named entities, when linked together, form a relation.
Example: Sachin Tendulkar was born in Bombay
Subject: “Sachin Tendulkar”, NE of type ‘Person’
Relation: “was born in”, NE of type ‘Verb’
Object: “Bombay”, NE of type ‘Location’
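The subject, relation and object structure in the example can be held in a small data type. This is only an illustrative sketch: the class and field names are assumptions, and the relation is stored as a plain string rather than as a typed entity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NamedEntity:
    text: str      # surface form as it appears in the sentence
    ne_type: str   # e.g. "Person", "Organization", "Location"

@dataclass(frozen=True)
class Relation:
    subject: NamedEntity
    predicate: str  # the linking verb phrase, e.g. "was born in"
    obj: NamedEntity

    def as_triple(self):
        """Return the relation as a (subject, predicate, object) tuple of strings."""
        return (self.subject.text, self.predicate, self.obj.text)

born = Relation(
    subject=NamedEntity("Sachin Tendulkar", "Person"),
    predicate="was born in",
    obj=NamedEntity("Bombay", "Location"),
)
print(born.as_triple())
```

Frozen dataclasses are hashable, so relations of this shape can be placed in sets or used as dictionary keys when votes are counted later.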
24
+Extracting relationships among NEs: Required process
- Identify part-of-speech constructs: noun, verb, adjective, etc.
- Determine co-references, abbreviations and acronyms.
- Connect them together to form a relationship.
25
+Existing approach: Automated techniques
- Natural Language Processing based: rule-based.
- Machine Learning based: supervised and unsupervised learning.
- Other methods: vocabulary-based.
- Hybrid: NLP and vocabulary-based.
- Issues: dependency, scalability.
26
+uPick: Our Proposed System
A crowdsourcing system to extract named entity relationships from documents.
27
+uPick Working
- Step 1: Extract NEs and relations using a POS tagger and the relation extraction rules proposed by Chen (1983).
- Step 2: Present the extracted relations to a crowd in the form of a game (challenge).
- Step 3: Collect the generated responses.
- Step 4: Filter the relations by collecting majority votes and comparing them against the expert-filtered relations.
28
+Processing of the generated data
- With the help of human experts, we collected the valid relations for each document from the automatically generated relations (step 1). These relations form a ground truth dataset for further validation.
- We compare the responses collected from each game against the expert-corrected facts stored in the database and filter out erroneous response data.
- The relation instances that receive a majority of votes are taken as true facts for the document.
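The majority-vote filter and the comparison against expert-validated relations can be sketched as follows. This is a minimal illustration under stated assumptions: a relation counts as accepted when more than half of the players select it, and the function names, relation identifiers and sample responses are all hypothetical.

```python
from collections import Counter

def majority_relations(responses, n_players):
    """Keep the relations selected by more than half of the players.

    `responses` holds one list of selected relation ids per player.
    """
    votes = Counter(rel for selected in responses for rel in set(selected))
    return {rel for rel, n in votes.items() if n > n_players / 2}

def compare_with_experts(accepted, presented, expert_valid):
    """Count crowd decisions against the expert ground truth."""
    presented = set(presented)
    expert_valid = set(expert_valid) & presented
    accepted = set(accepted) & presented
    return {
        "valid_accepted": len(accepted & expert_valid),
        "valid_rejected": len(expert_valid - accepted),
        "invalid_rejected": len(presented - expert_valid - accepted),
        "invalid_accepted": len(accepted - expert_valid),
    }

# Hypothetical data: four presented relations, three players.
presented = ["r1", "r2", "r3", "r4"]
expert_valid = {"r1", "r2"}
responses = [["r1", "r3"], ["r1", "r2"], ["r1"]]

accepted = majority_relations(responses, n_players=len(responses))
print(accepted)
print(compare_with_experts(accepted, presented, expert_valid))
```

The four counts returned by `compare_with_experts` correspond to the four outcome categories a study of this kind would report: valid relations accepted or rejected, and invalid relations rejected or accepted.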
29
+User study of uPick
- Supervised laboratory study, 12 participants (8 female).
- Three sessions: training, game play and interview.
- Four documents: Ashoka Maurya, Sachin Tendulkar, Shahrukh Khan, and Sonia Gandhi.
- Procedure: read the given text and select the relations from the given list.
30
+Study Results
Measure | D1 | D2 | D3 | D4
Total number of presented relations | 37 | 39 | 40 | 33
Valid relations correctly identified as valid | 19 | 18 | 19 | 15
Valid relations incorrectly identified as invalid | 5 | 6 | 4 | 1
Invalid relations correctly identified as invalid | 12 | 12 | 16 | 15
Invalid relations incorrectly identified as valid | 1 | 3 | 1 | 2
Accuracy (correctly identified relations / total relations) | 84% | 77% | 87% | 91%
Accuracy using automated techniques only (valid relations / total relations) | 65% | 61% | 57% | 49%
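The two accuracy rows can be recomputed from the counts in the table. The sketch below assumes that "valid relations" in the automated-only row means all expert-valid relations, whether or not the crowd identified them; the dictionary keys are hypothetical names. Recomputed percentages may differ from the reported ones by about a point, depending on how the figures were rounded.

```python
# Counts copied from the study results table (documents D1 to D4).
results = {
    "D1": {"total": 37, "valid_ok": 19, "valid_missed": 5, "invalid_ok": 12, "invalid_missed": 1},
    "D2": {"total": 39, "valid_ok": 18, "valid_missed": 6, "invalid_ok": 12, "invalid_missed": 3},
    "D3": {"total": 40, "valid_ok": 19, "valid_missed": 4, "invalid_ok": 16, "invalid_missed": 1},
    "D4": {"total": 33, "valid_ok": 15, "valid_missed": 1, "invalid_ok": 15, "invalid_missed": 2},
}

def crowd_accuracy(r):
    """Correctly identified relations (valid and invalid) over all presented relations."""
    return (r["valid_ok"] + r["invalid_ok"]) / r["total"]

def automated_accuracy(r):
    """Share of automatically extracted relations that the experts marked valid."""
    return (r["valid_ok"] + r["valid_missed"]) / r["total"]

for doc, r in results.items():
    print(f"{doc}: crowd {crowd_accuracy(r):.1%}, automated only {automated_accuracy(r):.1%}")
```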
31
+Discussions and future work
- uPick helps in remembering facts related to a text, so it could be used in online education systems.
- Turn it into an engaging game.
- Add leaderboards and persistent scoring.
32
Data on demand: no need for browsing when all databases are semantically connected to each other.
+
Exploration 3: Alipi, an online crowdsourcing system for re-narration.
Problem: Making the web accessible
33
+Problem scenario: Accessibility of the web content
A webpage on Fire Safety is re-narrated in Hindi
34
How can a person who does not know English understand web pages on fire safety? Solution: re-narration.
+Why are the existing approaches not sufficient?
- Single point of control and authority.
- The author is forced to anticipate the target audience.
- Transferring authorship is difficult.
35
+Alipi: A re-narration framework (Dinesh et al. 2012)
- Users rewrite different sections of a web page.
- The point of control is distributed from the author to the users.
- A step from target audience to target communities.
- Follows the principle of “the best content for each one”.
36
+Alipi Architecture
37
+Alipi Architecture: Creating and Storing the re-narrations
38
+Alipi Architecture: Displaying a re-narrated page to the user
39
+Alipi Prototype
1. Open the website http://alipi.us. Enter the page of interest, here, http://iiit.ac.in
2. Click on the button “Re-narrate”
40
+Alipi Prototype: Steps to re-narrate a page
3. Select a section of the web-page. Re-narrate the element.
4. Publish your re-narration by providing the target community.
41
+Alipi Prototype: Steps to see the available re-narrations
3. After clicking the “Re-narrations” button, choose a re-narration from the available list.
4. The queried page will change with the re-narrated element.
42
+My contribution: Testing the feasibility of Alipi
- IIIT-H R&D showcase: 70 participants (45 male).
- Objective: to find out what motivates users to use Alipi, and for what sorts of tasks.
- Task: to re-narrate a web page (the IIIT-H webpage, a page on Indian culture, or any other page) and later to check the available re-narrations.
- Four phases: demographics, training, system experience and questionnaire.
43
+Findings of the study
- Participants appreciated both roles, re-narrator and reader; their preference varied between known and unknown domains.
- Re-narrators preferred text-based re-narrations over video and audio re-narrations, to avoid setting up a camera and bandwidth issues.
- Readers preferred re-narrations in mixed media, to get a rich experience.
- The majority wanted to re-narrate for their friends and to see re-narrations from people they know, whose preferences are known.
- Participants found the interface design non-intuitive and hard to follow, but found the system very useful for sharing information.
44
+My contribution 2: Alipi browser plugin
- Allows dynamic filtration based on the user profile.
- Bypasses the URL http://alipi.us.
- Decentralized and editable user profile.
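Filtering re-narrations by user profile could look like the sketch below. It is purely illustrative: the `Renarration` and `UserProfile` types, their fields, and the matching rule (same page, a language the user reads, and a community the user belongs to) are assumptions and do not describe the plugin's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Renarration:
    url: str         # page that was re-narrated
    language: str    # language of the re-narration
    community: str   # target community chosen by the re-narrator
    text: str

@dataclass
class UserProfile:
    languages: list = field(default_factory=list)
    communities: list = field(default_factory=list)

def filter_renarrations(renarrations, profile, url):
    """Return re-narrations of `url` that match the user's languages and communities.

    An empty community list in the profile is treated as "any community".
    """
    return [
        r for r in renarrations
        if r.url == url
        and r.language in profile.languages
        and (not profile.communities or r.community in profile.communities)
    ]

# Hypothetical stored re-narrations and user profile.
stored = [
    Renarration("http://iiit.ac.in", "Hindi", "students", "Hindi text"),
    Renarration("http://iiit.ac.in", "Telugu", "students", "Telugu text"),
    Renarration("http://iiit.ac.in", "Hindi", "farmers", "Hindi text for farmers"),
]
profile = UserProfile(languages=["Hindi"], communities=["students"])
matches = filter_renarrations(stored, profile, "http://iiit.ac.in")
print([m.text for m in matches])
```

Because the profile is a plain, user-editable record, changing its language or community lists changes which re-narrations are shown without touching the stored re-narrations.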
45
+Discussions and future work
- How can we check the credibility of a re-narration: by filtering noisy re-narrations, or by ranking based on public voting?
- How can we improve our selection algorithm to account for rapidly growing online communities, the dialects of a geographical location, and the vicinity of the user's stated region?
- What could be the security implications of the Alipi architecture?
46
Multi-lingual web: easy access and interoperability among contents between different languages.
+Summary and the way ahead
47
+
- Personalized web: content and advertising that match user preferences and choices.
- Data on demand: no need for browsing when all databases are semantically connected to each other.
- Multi-lingual web: easy access and interoperability among contents between different languages.
48
Summary of my work:
Knowledge acquisition (Extraction and Validation): Power of Friends, uPick
Accessibility (Re-narration): Alipi
+Future Work
- Can the proposed Crowd Consensus framework reduce the number of iterations required for crowdsourcing tasks?
- Using the belief modality, can we develop a mathematical model to check the accuracy of the answers generated by the Crowd Consensus approach, and to determine the conditions under which the accuracy may deviate?
- Can the proposed uPick approach enhance the experience of students while reading textbooks?
- How can we check the relatedness of a re-narration (generated with the Alipi tool) to the original document, as well as to other available re-narrations of the same web page?
49
+References
- C. Cooley. Human Nature & Social Order - Ppr. Social Science Classics Series. Transaction Pub, 1964.
- M. S. Bernstein, D. Tan, G. Smith, M. Czerwinski, and E. Horvitz. Personalization via friendsourcing. ACM Trans. Comput.-Hum. Interact., 17(2):6:1–6:28, May 2008.
- P.-S. Chen. English sentence structure and entity-relationship diagrams. Information Sciences, 29(2-3):127–149, 1983.
- S. C. Weller. Cultural consensus theory: Applications and frequently asked questions. Field Methods, 19(4):339–368, 2007.
50
+References (contd.)
- I. Tuomi. Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory. J. Manage. Inf. Syst., 16(3):103–117, Dec. 1999.
- S. Sekine. Named Entity: History and Future. 2004.
- W. Du and M. J. Atallah. Secure multi-party computation problems and their applications: a review and open problems. In Proceedings of the 2001 workshop on New security paradigms, NSPW ’01, pages 13–22, New York, NY, USA, 2001. ACM.
- Z. Syed, E. Viegas, and S. Parastatidis. Automatic discovery of semantic relations using mindnet. LREC, 2010.
51
+References (contd.)
- 21 Questions. http://apps.facebook.com/twentyoneq/.
- Mindnet. http://research.microsoft.com/apps/pubs/default.aspx?id=69647.
- Power of 10. http://en.wikipedia.org/wiki/Power_of_10.
- Stanford POS tagger. http://nlp.stanford.edu/software/tagger.shtml.
52
+Related Publications
- D. Aggarwal, R. A. Khot, and V. Choppella. Power of Friends: When Friends Guess About their Friends’ Guess. In Proceedings of the SIGCHI conference on Human factors in computing systems, CHI ’13, Paris, France, 2013, ACM.
- D. Aggarwal, R. A. Khot, V. Varma, and V. Choppella. UPICK: Crowdsourcing Based Approach to Extract Relations Among Named Entities. In Proceedings of IndiaHCI, Pune, India, 2012 (Accepted as full paper).
- T. B. Dinesh, S. Uskudrali, S. Sastry, D. Aggarwal, and V. Choppella. Alipi: A framework for re-narrating web pages. In Proceedings of the International Cross-Disciplinary Conference on Web Accessibility, W4A ’12, pages 22:1-4, Lyon, France, 2012, ACM.
- D. Aggarwal, R. A. Khot, A. K. Dey, and V. Choppella. Crowd Consensus: Friendsourcing based approach to generate cultural beliefs. In preparation.
53
+Public Demonstrations
- Presented “Alipi: Making the web Inclusive and Accessible for All” at the IIIT-Hyderabad R&D Showcase, Hyderabad, India, 2013.
- Presented “Crowdsourcing Based Approach to Extract Relations Among Named Entities” at the OpenData Camp-Hyderabad Meet, Hyderabad, India, 2012.
- Poster presentations on “Power of Friends: Rethinking Games With a Purpose” and “Alipi: A renarration Web” at the IIIT-Hyderabad R&D Showcase, Hyderabad, India, 2012.
54
+Special Thanks
Prof. Venkatesh Choppella, Prof. Vasudeva Varma, Prof. Anind Dey, Dr. T. B. Dinesh
Reviewers, IIIT-H faculty, study participants
My family and friends
55
Thank you!
For more details: [email protected]
http://pascal.iiit.ac.in/~deepti.aggarwal
Web 3.0… Web of opportunities! This is just the beginning!
56