
A Game for Crowdsourcing Data to Build, Correct, and Label Ontologies

Group 5:

Simin A. Karvigh

Sasan Tavakkol

Omid Davtalab

Instructor:

Professor Dennis McLeod

Spring 2015


CS 586 – Final Project Report – Group 5


A Game for Crowdsourcing Data to Build, Correct, and Label Ontologies

Introduction

The growing need to construct ontologies in specific fields of interest has long motivated researchers to look for reliable, fast, and cost-effective approaches. Despite the many existing automatic algorithms for this purpose, human knowledge has always been an invaluable source for ontology learning and for validating computed results. However, employing experts for this purpose is expensive, time consuming, and error prone, and the reliability of the resulting product is unknown. In this project, we propose an Android game that employs micro-task crowdsourcing to construct ontologies. Similar approaches have been taken by Markotschi and Völker (2010) and Thaler et al. (2011); however, our approach has unique features, such as labeling the gathered relationships with the age, gender, location, and other profile information of the users. In the next section we elaborate on the problem and our proposed methodology.

Problem Statement

The contribution of human knowledge to ontology construction can lead to precise and valuable models. For instance, an automatic ontology builder may suggest a strong relationship between a general word such as “have” and any other specific word. Although this obvious example can be avoided by removing stop words, similar errors still occur in the automatic construction of ontologies. Moreover, the relationship structure of an ontology may require modification over time. Another issue with current methods for constructing ontologies is that they ignore the profile of the users, such as their age and location. For instance, “football” means different things in the USA and in Europe, and a word from a field like music may have a different meaning among teenagers than among adults. These kinds of errors and limitations can be addressed by our approach. We want to use our Android app to correct existing ontologies, construct ontologies in specific domains, and correlate meanings with user features such as age and location.


Figure 1 - Login Scene of ONTOOWL Game - Designed in Unity Game Engine

Methodology

Our methodology is based on an Android game in which users must log in with their Facebook account (as shown in Figure 1), so that we can access their general profile information such as age, gender, location, and occupation.

Each user’s profile is then taken into account in later analysis of the data that user generates. The information for all users is stored in a database. Table 1 illustrates a prototype of this database.

Key#    Name    Age    Gender    Location    …
User1   Simin   27     F         CA
User2   Omid    27     M         CA
User3   Sasan   27     M         CA

Table 1 – User Information Database Sample


[Diagram labels: Data from Crowdsourcing (Game); Analysis (XML – OWL)]

Figure 3 - Unity Game Engine Interface - ONTOOWL Main Scene

The game offers the user a choice of game worlds (ontology areas), such as Electrical Engineering, Basketball, and Psychology. The user’s Facebook account information can be used to calculate a reliability ratio for that user in the chosen domain. For instance, a basketball coach is a more reliable source of information for a Basketball ontology than an engineer. Finally, each user can save the game under his or her name, which produces an XML file containing all the information that the user has created. The final goal is to analyze all of these XML files and combine them into an OWL file to construct the ontology.

Figure 2 – Data Analysis Process Overview

Each ontology is constructed from a set of relevant words called “seeds”. Seeds can be derived, as terms related to a specific ontology, from search engines (for example through the Google API) or from encyclopedias such as Wikipedia. We introduce a feature in our game that grows a tree from these seeds with contributions from users.

Game Scheme

The team decided to develop the game in Unity, a powerful game engine in which many well-known games such as Angry Birds are designed. Unity Technologies developed this engine for 2D/3D video games targeting different platforms such as PCs, mobile devices, and websites. Figure 3 shows the main scene of the ONTOOWL game in the Unity interface.


The game world consists of nodes as concepts (e.g. FC Barcelona) and edges as relationships (e.g. plays-in). In the first stage, nodes, each representing a concept, are spread out on the game map (Figure ). The user must connect related concepts with an appropriate relationship. To build this graph, the user can take a few actions on the map. The user can remove a concept by striking through its node, which means that the concept does not belong to this ontology; for instance, the user should remove the node “Laptop” from the soccer world. The user can drag and drop a concept, say A, onto another one, say B, which means that concept A is part of concept B and has no relationship with any other concept at the same level as B; for instance, the user should drag L. Messi and drop it on Barcelona. The user can also move a concept upward in the world by dragging and dropping it on the up-arrow sign in the upper right corner of the game, which means that the concept does not belong to this subpage of the world. For instance, if C. Ronaldo appears on the Barcelona page, the user must move Ronaldo upward, because he does not play for Barcelona; the user can later drag and drop Ronaldo onto Real Madrid. The idea of subpages is very similar to organizing files into folders and subfolders, and it helps keep the number of nodes on a map small. Figure 3 is a screenshot of the game developed for this project.

Figure 3 - ONTOOWL Game Screenshot - Spheres represent the seeds/nodes, red ropes represent “IS-PART-OF” relations/edges, and yellow ropes represent “IS-A” relations/edges
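The three map actions described above (striking through a concept, dropping one concept onto another, and moving a concept upward) can be illustrated with a minimal Python sketch. This is not the game's C# code: the class name GameWorld, its methods, and the parent-pointer representation of subpages are assumptions made only for illustration.

```python
class GameWorld:
    """Hypothetical model of one game world; names are illustrative only."""

    def __init__(self, concepts):
        # parent[c] is the concept whose subpage contains c; None is the top page.
        self.parent = {c: None for c in concepts}

    def remove(self, concept):
        # Strike-through: the concept does not belong to this ontology.
        # This sketch assumes the removed concept has no children.
        self.parent.pop(concept)

    def drop_onto(self, a, b):
        # Drag A onto B: A is part of B and now lives on B's subpage.
        self.parent[a] = b

    def move_up(self, concept):
        # Up arrow: the concept does not belong to the current subpage.
        current = self.parent[concept]
        self.parent[concept] = None if current is None else self.parent[current]

    def page(self, concept=None):
        # Concepts shown on the subpage of `concept` (top page if None).
        return sorted(c for c, p in self.parent.items() if p == concept)


world = GameWorld(["Barcelona", "Real Madrid", "L. Messi", "C. Ronaldo", "Laptop"])
world.remove("Laptop")                       # not a soccer concept
world.drop_onto("L. Messi", "Barcelona")     # Messi is part of Barcelona
world.drop_onto("C. Ronaldo", "Barcelona")   # suppose Ronaldo starts on the wrong page
world.move_up("C. Ronaldo")                  # he does not belong on the Barcelona page
world.drop_onto("C. Ronaldo", "Real Madrid")
print(world.page("Barcelona"), world.page("Real Madrid"), world.page())
```

Storing only a parent pointer per concept mirrors the folder and subfolder analogy: each subpage is simply the set of concepts that share the same parent.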

Each node has a list of properties. The user can tap on a concept and fill in its properties. The first property is the concept’s type. For example, the user can tap on the concept Atletico Madrid and set its type to “team”. The user can also add synonyms to a concept or remove them. For instance, a user can tap on C. Ronaldo and add CR7 to its synonyms but remove “The Phenomenon”, because the latter nickname belongs to the Brazilian Ronaldo. The synonyms property of each concept effectively handles the “same-as” relationship. When a user taps on a concept, he or she has the option to look up that term in either Google or Wikipedia, which opens in the browser. This helps users easily learn more about a concept and add that knowledge to our ontology. The initial game map prototype is shown in Figure 4.


Figure 4 - Initial game map for soccer world.

Users must connect concepts on the same page with appropriate edges (relationships). The user can choose the relationship to add from the right sliding menu. The relationships are different for each world in the game. For the soccer ontology, relationships can be “plays for”, “is rival of”, and “is in the league”. These relations can be selected from a list provided in the game, as shown in Figure 5.

Figure 5 - ONTOOWL Game Feature: Various Relationships available to choose from

After selecting a relationship type, the user can connect two concepts on the same page by tapping on them. In the early days of an ontology world there will also be an “other” option for the relationship name. If the user selects the “other” relationship, he or she must label it appropriately. The labels given to the “other” relationship by all users in a specific ontology world will be analyzed and, if appropriate, added to the available options. Suppose that the relationship “is the coach of” is not available at the beginning of the game, but many users suggest this relationship between a few concepts in the soccer world. This relationship will then be added to the options based on the analysis of the “other” labels. When the ontology world becomes mature enough, the “other” option will be removed and only a few pre-defined options will remain available. This feature enables the framework to identify the relationships that exist in a specific ontology world automatically. Users can remove an edge by tapping twice on it.
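One simple way to analyze the “other” labels is to count how often each label is suggested and promote the frequent ones. The Python sketch below illustrates that idea only; the function name promote_other_labels, the normalization of labels to lower case, and the threshold min_count=3 are assumptions, since the report does not specify how labels are judged “appropriate”.

```python
from collections import Counter


def promote_other_labels(other_labels, available, min_count=3):
    """Return the option list extended with frequently suggested labels.

    min_count is a hypothetical threshold, not a value from the report.
    """
    # Normalize so that "Is the coach of" and "is the coach of " count together.
    counts = Counter(label.strip().lower() for label in other_labels)
    promoted = [label for label, n in counts.items()
                if n >= min_count and label not in available]
    return available + sorted(promoted)


options = ["plays for", "is rival of", "is in the league"]
suggested = ["is the coach of", "Is the coach of", "is the coach of ", "owns"]
new_options = promote_other_labels(suggested, options)
print(new_options)
```

In this run “is the coach of” is suggested three times and is promoted, while “owns” is suggested once and stays out of the menu.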

The user can also suggest adding a certain word to the map. Suggested words are not added to the ontology world immediately; instead, they are inserted into a pool. The words in the pool can be upvoted or downvoted by the users. If a word receives enough positive votes, it is added to the ontology world as a new concept. If the word does not receive enough positive votes, or gains many negative votes, within a certain time (say one week), it is removed from the pool.
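The pool rule can be sketched in Python as follows. The thresholds accept_at and reject_at are hypothetical, since the report only says “enough” and “many” votes; the seven-day window follows the “say one week” example, and the Suggestion type is introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    word: str
    upvotes: int = 0
    downvotes: int = 0
    age_days: int = 0


def review(s, accept_at=5, reject_at=5, max_age_days=7):
    """Decide what happens to a pooled word; all thresholds are assumptions."""
    if s.upvotes >= accept_at:
        return "add"        # enough positive votes: becomes a new concept
    if s.downvotes >= reject_at or s.age_days >= max_age_days:
        return "remove"     # many negative votes, or the time window ran out
    return "keep"           # still waiting for votes


pool = [
    Suggestion("offside", upvotes=6, age_days=1),
    Suggestion("laptop", downvotes=5, age_days=2),
    Suggestion("derby", upvotes=2, age_days=3),
]
decisions = {s.word: review(s) for s in pool}
print(decisions)
```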

The game is coded with C# scripts in the Unity game engine. For each component of the game, such as the different scenes, the seeds/nodes, the relations, the environment, the main camera (which is moved by the keyboard and mouse), and the data-related parts, there is a C# script implemented by our team. All the scripts were written in MonoDevelop, an open-source integrated development environment available for several operating systems. An example script, which handles user mouse clicks, is shown in Figure 6.

Figure 6 - C# Script written in MonoDevelop for ONTOOWL in Unity

Another feature helps when a user is not familiar with a certain node in the game. Suppose that a user has no information about a specific player such as “Neymar”, who plays for Barcelona. The user can simply double-click on the node to get relevant information, either from Google with a double left click or from Wikipedia with a double right click. The logos of these search options are shown in Figure 7.

Figure 7 - Logo of Search Engines available in ONTOOWL Game to search for Specific Seeds/Nodes Automatically


As mentioned before, the final output of each game that a user plays for a concept is an XML file, which is read and analyzed by Java code written by our team. The XML file is generated from the game automatically by the C# scripts written by our team in the Unity game engine. Figure 8 shows an example of an XML file saved after one game played by a user.

Figure 8 - XML File generated from ONTOOWL Automatically for a game played by a user

After this step, all the XML files are combined into one hash table (as shown in Figure 9), in which the triple of start node (e.g. Messi), relation (e.g. IS A), and end node (e.g. Player) is the key, and the frequency of that triple is saved as the value.

Figure 9 - Data flow overview - Combining all XML files into a hash table using Java code implemented by the authors
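The aggregation step can be sketched in Python (the team's implementation is in Java and is not reproduced here). The sketch assumes each game has already been parsed into a list of (start node, relation, end node) triples; XML parsing is left out because the file layout is not given in the text. The function name combine_games is illustrative.

```python
from collections import Counter


def combine_games(games):
    """Count, for every triple, how many games contain it."""
    table = Counter()
    for triples in games:
        # A set ensures a triple is counted at most once per game.
        table.update(set(triples))
    return table


games = [
    [("Messi", "IS A", "Player"), ("Messi", "IS PART OF", "Barcelona")],
    [("Messi", "IS A", "Player")],
    [("Messi", "IS A", "Player"), ("Messi", "IS PART OF", "Real Madrid")],
]
table = combine_games(games)
for triple, frequency in table.most_common():
    print(triple, frequency)
```

Using the whole triple as the key means that two users agree only when they pick the same start node, relation, and end node, which is what the frequency is meant to capture.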


The next step (as shown in Figure 10) is to convert the hash table into an OWL file, using Java code implemented by our team, and thereby construct the ontology.

Figure 10 - Data Flow overview: Converting the Hash-Table into the OWL file

Based on the data in the hash table, the Java code automatically infers the classes, the instances, and the data-type and object-type relationships among them. We consider this one of the novelties of our project: all the classes, instances, and relations are inferred from the hash table automatically. The resulting ontology can be viewed and queried in different tools, such as the ontology editor Protégé and the graph database Neo4j.
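The report does not state the inference rules, so the following Python sketch shows only one plausible reading and should not be taken as the team's Java logic. It assumes that the end node of an “IS A” triple is a class, that its start node is an instance, and that every other relation name becomes an object property. The function name infer_schema and the min_count filter are hypothetical.

```python
def infer_schema(table, min_count=1):
    """Split triples into classes, instances, and object properties.

    Assumed rule: (x, "IS A", C) makes C a class and x an instance;
    any other relation name is treated as an object property.
    """
    classes, instances, object_properties = set(), set(), set()
    for (start, relation, end), count in table.items():
        if count < min_count:
            continue  # ignore triples suggested by too few players
        if relation == "IS A":
            classes.add(end)
            instances.add(start)
        else:
            object_properties.add(relation)
    return classes, instances, object_properties


# Example triple frequencies, made up for illustration.
table = {
    ("Messi", "IS A", "Player"): 14,
    ("Barcelona", "IS A", "Team"): 12,
    ("Messi", "IS PART OF", "Barcelona"): 13,
}
classes, instances, object_properties = infer_schema(table)
print(sorted(classes), sorted(instances), sorted(object_properties))
```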

Protégé

Protégé is open-source software that provides an ontology editor with a graphical user interface as well as a knowledge acquisition system. Figure 11 shows our soccer ontology loaded in Protégé. Protégé can also analyze imported ontologies with deductive classifiers that validate the consistency of the models.

Figure 11 – Protégé Ontology Graphic Editor Interface


Neo4j

Neo4j is a powerful graph database into which we imported our constructed sample ontology, in order to view its graph and to test several queries that retrieve filtered data under different constraints. The database makes it easy and fast to query the graph and to store all related information. Each relationship has its reliability saved as an attribute. We will also add the newly defined values: the Contribution Coefficient for nodes and the Local Reliability for edges. Figure 12 shows the ontology constructed from the games of 15 players who volunteered to play our game.

Figure 12 - Constructed ontology for the soccer world.

The query language of Neo4j is called Cypher, a powerful, SQL-like query language. For instance, the query “MATCH (n1)-[r1:isA]->(n2) WHERE r1.reliability>50 RETURN n1,r1,n2” returns every node n1 that has an isA relationship r1 to another node n2, where the reliability of r1 is greater than 50% (Figure 13).

Figure 13 - Example of a query using Cypher language on Neo4j


Rewarding Scheme

Rewarding users based on the ontology they construct is a challenging part of the game, because we do not have the “correct” ontology and therefore cannot compare the work of a specific user with “correct” answers. We propose a dynamic scoring scheme for this game, in which the performance of each user is judged against the answers of all other users. This means that the score of a user for a specific world, such as the soccer world, can change as other users contribute. A user may therefore notice a drop in his or her score and try to correct mistakes in order to raise it. In this way, each user gets indirect feedback on his or her performance from all other users. We can regard this feature as a kind of swarm intelligence in which the particles are real humans. We hope this feature helps the game converge to a single ontology.

As mentioned before, it is very difficult to distinguish “correct” relationships from “incorrect” ones in order to score the performance of a player. In the early prototype of the game, which was presented in class, we relied on the reliability of each relationship to reward the player. The reliability of a relationship was defined as the ratio of the number of players who suggested that relationship to the total number of players. So if 90 players out of 100 suggest “Messi, plays in, Barcelona”, the reliability of this relationship is 90%. We decided to reward any player who suggests a relationship with reliability above 50% and to penalize players who add a relationship with reliability below 50%. After gathering data from 15 players as the first experiment with the game, we noticed that this rewarding policy does not work correctly. For instance, our data contained the relationship “Buffon, is part of, Juventus” with a reliability of 44%. Under the aforementioned policy we would penalize anyone who suggests this relationship; however, Buffon does play for Juventus and is part of that team! The problem is that not all players contribute to all concepts. Many of our players (56%) did not know Buffon, or for some other reason did not contribute to finding the relationship of Buffon with other concepts. We solved this problem by using a new variable, instead of the total number of players, as the denominator of the ratio.

For each node, we take the average of the reliabilities of the relationships coming out of that node and call it the contribution coefficient (CC) of the node. We then divide the reliability of each relationship by the CC of its start node and call the result the local reliability (LR). Players who add a relationship with LR > 0.5 are rewarded, and players who add a relationship with LR < 0.5 are penalized, proportionally. For instance, 93% of players suggested “Buffon, is a, soccer player” and 44% suggested “Buffon, is part of, Juventus”. The CC for Buffon is therefore (93% + 44%)/2 = 68.5%, and the LR of the relationship “Buffon, is part of, Juventus” is 44/68.5 ≈ 0.64, which is greater than 0.5. As another example, 81% of players suggested “Tevez, is a, soccer player”, 31% suggested “Tevez, is part of, Juventus”, and 6% suggested “Tevez, is part of, Real Madrid”. The CC for Tevez is (81% + 31% + 6%)/3 ≈ 39%. The LR for “Tevez, is part of, Juventus” is then 0.79, but the LR for “Tevez, is part of, Real Madrid” is 0.15. We can now reward players who suggest the former relationship and penalize players who suggest the latter one; Tevez does play for Juventus.

Future Steps

Our current focus is on adding features to the game to make it a more powerful app for constructing different ontologies by means of crowdsourcing. So far, the game has been played by 20 different students, and their data has been analyzed to help us identify the pros and cons of the current implementation. Our main goal is to publish a paper on the proposed framework and the developed game with the help and contribution of our professor and instructor. We believe this methodology can help construct many different ontologies in various areas in a cost- and time-efficient way.

References

Markotschi, T., & Völker, J. (2010). Guesswhat?!–human intelligence for mining linked data.

Thaler, S., Simperl, E. P. B., & Siorpaes, K. (2011). SpotTheLink: A Game for Ontology Alignment. Wissensmanagement, 182, 246-253.

Gašević, D., Djurić, D., & Devedžić, V. (2009). Model Driven Engineering and Ontology Development (2nd ed.). Springer. p. 194. ISBN 978-3-642-00282-3.

Hoff, T. (June 13, 2009). "Neo4j - a Graph Database that Kicks Buttox". High Scalability. Possibility Outpost. Retrieved February 17, 2010.