A Game for Crowdsourcing Data to Build, Correct, and Label Ontologies
Group 5:
Simin A. Karvigh
Sasan Tavakkol
Omid Davtalab
Instructor:
Professor Dennis McLeod
Spring 2015
CS 586 – Final Project Report – Group 5
Introduction
The increasing need for ontologies in specific fields of interest has long motivated researchers and scientists to develop reliable, fast, and cost-effective construction approaches. Despite the many automatic algorithms developed for this purpose, human knowledge remains an invaluable source for ontology learning and for validating computed results. However, employing experts for this purpose is not only expensive but also time-consuming and error-prone, and the reliability of the resulting product is unknown. In this project, we propose an Android game that uses micro-task crowdsourcing to construct ontologies. Similar approaches have been taken by Markotschi and Völker (2010) and Thaler et al. (2011); however, our approach has unique features, such as labeling the gathered relationships with the age, gender, location, and other profile information of the users who contributed them. The next sections elaborate on the problem and our proposed methodology.
Problem Statement
Human knowledge can make constructed ontologies more precise and valuable. For instance, an automatic ontology builder may suggest a strong relationship between a general word such as “have” and almost any specific word. Although this obvious case is avoided by removing stop-words, similar errors still occur in automatically constructed ontologies. Moreover, the relationship structure of an ontology may require modification over time. Another issue with current methods of constructing ontologies is that they ignore the profiles of users, such as their age and location. For instance, the meaning of “football” differs between the USA and Europe, and a word from an industry like music may have a different meaning for teenagers than for adults. Our approach aims to reduce these kinds of errors and limitations. We intend to use our Android app to correct existing ontologies, construct ontologies in specific domains, and correlate meanings with user features such as age and location.
Figure 1 - Login Scene of ONTOOWL Game - Designed in Unity Game Engine
Methodology
Our methodology is based on an Android game in which users must log in with their Facebook accounts (Figure 1), so that we can access their general profile information, such as age, gender, location, and occupation.
Each user's profile is then used in the later analysis of the data that the user generates. The information of all users is stored in a database; Table 1 illustrates a prototype of this database.
Key     Name    Age   Gender   Location   …
User1   Simin   27    F        CA
User2   Omid    27    M        CA
User3   Sasan   27    M        CA
Table 1 – User Information Database Sample
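Since a distinguishing feature of the approach is labeling relationships with contributor profiles, a small sketch may help make that analysis concrete. The Python below is a minimal illustration under stated assumptions, not the team's implementation (which this report does not show): `UserProfile` is a hypothetical record mirroring the columns of Table 1, and `share_by_attribute` is a hypothetical helper that, for each value of one profile attribute (for example location or gender), computes the fraction of users in that group who suggested each relationship triple.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UserProfile:
    # Hypothetical record mirroring the columns of Table 1.
    key: str
    name: str
    age: int
    gender: str
    location: str


def share_by_attribute(profiles, suggestions, attribute):
    """Fraction of users in each group who suggested each triple.

    profiles:    dict mapping user key -> UserProfile
    suggestions: iterable of (user_key, (start, relation, end)) pairs
    attribute:   name of a UserProfile field, e.g. "location" or "gender"
    """
    # How many users belong to each group (e.g. each location).
    group_sizes = Counter(getattr(p, attribute) for p in profiles.values())
    # Distinct users per (group, triple), so repeated suggestions count once.
    suggesters = defaultdict(set)
    for user_key, triple in suggestions:
        group = getattr(profiles[user_key], attribute)
        suggesters[(group, triple)].add(user_key)
    return {
        (group, triple): len(users) / group_sizes[group]
        for (group, triple), users in suggesters.items()
    }
```

Triples that nobody in a group suggested simply do not appear in the result; a real analysis would also need a minimum group size before comparing groups such as US and European players.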
[Diagram: Data from Crowdsourcing (Game); Analysis (XML – OWL)]
Figure 3 - Unity Game Engine Interface - ONTOOWL Main Scene
The game offers the user several game worlds (ontology domains), such as Electrical Engineering, Basketball, and Psychology, to choose from. The user's Facebook profile information can be used to calculate the reliability ratio of that specific user for the chosen domain. For instance, a basketball coach is a more reliable source of information for the Basketball ontology than an engineer. Finally, each user can save the game under his or her name, which saves an XML file containing all the information that the user has created. The final goal is to analyze all of these XML files and combine them into an OWL file that represents the ontology.
Figure 2 – Data Analysis Process Overview
Each ontology is built from a set of relevant words called “seeds”. Seeds, i.e. terms related to the domain of a specific ontology, can be obtained from search engines (for example, through a Google API) or from encyclopedias such as Wikipedia. Our game includes a feature that grows a tree from these seeds with contributions from users.
Game Scheme
The team decided to develop the game in Unity, a powerful game engine in which many well-known games, such as Angry Birds, have been designed. Unity Technologies developed this engine as a platform for 2D and 3D video games targeting different platforms such as PCs, mobile devices, and websites. Figure 3 shows the main scene of the ONTOOWL game in the Unity interface.
The game world consists of nodes representing concepts (e.g., FC Barcelona) and edges representing relationships (e.g., plays-in). At the first stage, nodes, each representing a concept, are spread out on the game map (Figure 4). The user must connect related concepts with an appropriate relationship. To build this graph, the user can take a few actions on the map. He can remove a concept by striking through its node, which means that the concept does not belong to this ontology; for instance, the user should remove the node “Laptop” from the soccer world. He can drag and drop a concept, say A, onto another one, say B. This means that concept A is part of concept B and has no relationship with any other concept at the same level as B; for instance, the user should drag L. Messi and drop it on Barcelona. He can also move a concept upward in the world by dragging and dropping it onto the up-arrow sign in the upper right corner of the game. This means that the concept does not belong to the current subpage of the world. For instance, if C. Ronaldo appears on the Barcelona page, the user must move Ronaldo upward, because he does not play for Barcelona; the user can later drag and drop Ronaldo on Real Madrid. The idea of subpages is very similar to organizing files into folders and subfolders, and it helps keep the number of nodes on each map small. Figure 3 is a screenshot of the game developed for this project.
Figure 3 - ONTOOWL Game Screenshot - Spheres represent seeds/nodes, red ropes represent “IS-PART-OF” relations/edges, and yellow ropes represent “IS-A” relations/edges
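The three map actions and the folder-like subpages described above can be modeled as operations on a tree. The Python sketch below is illustrative only: the `Page` class and the `remove`, `drop_onto`, and `move_up` functions are hypothetical names (the game itself implements these actions in C# scripts that are not reproduced here), and relationship edges between concepts on the same page are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Page:
    """A concept on the map; its children form the subpage opened from it."""
    name: str
    parent: Optional[Page] = field(default=None, repr=False)
    children: dict[str, Page] = field(default_factory=dict)

    def add(self, name: str) -> Page:
        child = Page(name, parent=self)
        self.children[name] = child
        return child


def remove(page: Page, name: str) -> None:
    # Strike-through: the concept does not belong to this ontology.
    del page.children[name]


def drop_onto(page: Page, a: str, b: str) -> None:
    # Drag A onto B on the same page: A is part of B, so it moves into B's subpage.
    node = page.children.pop(a)
    target = page.children[b]
    node.parent = target
    target.children[a] = node


def move_up(page: Page, name: str) -> None:
    # Up arrow: the concept does not belong to this subpage; lift it to the parent page.
    if page.parent is None:
        raise ValueError("the top-level page has no parent")
    node = page.children.pop(name)
    node.parent = page.parent
    page.parent.children[name] = node
```

Replaying the examples from the text: removing “Laptop”, dropping L. Messi onto Barcelona, then moving C. Ronaldo up from the Barcelona subpage and dropping him onto Real Madrid leaves the top-level page with only the two clubs, each holding its own players.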
Each node has a list of properties, which the user can fill in by tapping on the concept. The first property is the concept's type. For example, the user can tap on the concept Atletico Madrid and set its type to “team”. He can also add synonyms to a concept or remove them. For instance, a user can tap on C. Ronaldo and add CR7 to its synonyms but remove “The Phenomenon”, because the latter nickname belongs to the Brazilian Ronaldo. The synonyms property of each concept thus handles the “same-as” relationship. When a user taps on a concept, he also has the option to look up that term in either Google or Wikipedia, which opens in his browser. This helps users easily learn more about a concept and add that knowledge to our ontology. The initial game map prototype is shown in Figure 4.
Figure 4 - Initial game map for soccer world.
Users must connect concepts on the same page with appropriate edges (relationships). The user can choose the relationship to add from the sliding menu on the right. The available relationships differ between the worlds of the game; for the soccer ontology, they include “plays for”, “is rival of”, and “is in the league”. These relationships are selected from a list provided in the game, as shown in Figure 5.
Figure 5 - ONTOOWL Game Feature: Various Relationships available to choose from
After selecting a relationship type, the user can connect two concepts on the same page by tapping on them. In the early days of an ontology world there will also be an “other” option for the relationship name; if the user selects “other”, he must give the relationship an appropriate label. The labels for the “other” relationship from all users in a specific ontology world are analyzed and, if appropriate, added to the available options. Suppose that the relationship “is the coach of” is not available at the beginning of the game, but many users suggest this relationship between a few concepts in the soccer world. This relationship will then be added to the options based on the analysis of the “other” labels. When the ontology world becomes mature enough, the “other” option will be removed and only a few pre-defined options will remain available. This feature lets the framework discover the relationships that exist in a specific ontology world automatically, from the users' own labels. Users can remove an edge by double-tapping it.
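The report does not specify how the “other” labels are analyzed, so the following Python sketch assumes the simplest plausible rule: normalize case and whitespace, count how often each label was entered, and promote labels whose count reaches a threshold. The function name `promote_other_labels` and the `min_count` threshold are hypothetical; a production system might also merge near-synonyms or require manual review.

```python
from collections import Counter


def promote_other_labels(options, other_labels, min_count):
    """Return the option list extended with "other" labels suggested often enough.

    options:      relationship names currently offered in the menu
    other_labels: free-text labels entered by users who chose "other"
    min_count:    hypothetical number of suggestions a label needs to be promoted
    """
    # Light normalization so "Is the coach of " and "is the coach of" count together.
    counts = Counter(" ".join(label.lower().split()) for label in other_labels)
    promoted = [label for label, n in counts.most_common()
                if n >= min_count and label not in options]
    # Build a new list; the caller's option list is left unchanged.
    return options + promoted
```

With the soccer options above and three users typing variants of “is the coach of”, a threshold of 3 adds that relationship to the menu while a one-off label such as “owns” stays out.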
The user can also suggest adding a certain word to the map. Suggested words are not added to the ontology world immediately; instead, they are inserted into a pool, where users can upvote or downvote them. If a word receives enough positive votes, it is added to the ontology world as a new concept. If the word does not receive enough positive votes, or receives many negative votes, within a certain time (say, one week), it is removed from the pool.
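The voting pool can be sketched as a periodic review that sorts each candidate word into accepted, pending, or dropped. In this minimal Python sketch, `Candidate` and `review_pool` are hypothetical names, and the thresholds `min_up`, `max_down`, and the one-week `window` are assumptions (the text says only “enough” votes and “say one week”). As a design choice, acceptance is checked first, so a word that already has enough upvotes is accepted even if it has also collected some downvotes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Candidate:
    word: str
    submitted: datetime
    up: int = 0
    down: int = 0


def review_pool(pool, now, min_up=10, max_down=10, window=timedelta(weeks=1)):
    """Return (accepted_words, still_pending); all other candidates are dropped.

    min_up, max_down and window are hypothetical thresholds.
    """
    accepted, pending = [], []
    for c in pool:
        if c.up >= min_up:
            accepted.append(c.word)  # enough positive votes: add as a new concept
        elif c.down >= max_down or now - c.submitted > window:
            continue                 # too many negative votes, or time ran out
        else:
            pending.append(c)        # keep collecting votes
    return accepted, pending
```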
The game logic is implemented in C# scripts within the Unity game engine. For each component of the game, such as the different scenes, the seeds/nodes, the relations, the environment, the main camera (moved by keyboard and mouse input), and the data-related parts, there is a dedicated C# script implemented by our team. All scripts are written in MonoDevelop, an open-source, cross-platform integrated development environment. Figure 6 shows, as an example, the script written in MonoDevelop to handle user mouse clicks.
Figure 6 - C# Script written in MonoDevelop for ONTOOWL in Unity
Another feature helps users who are not familiar with a certain node of the game. Suppose that a user knows nothing about a specific player, such as “Neymar”, who plays for Barcelona. The user can simply double-click the node to get information about it: a double left-click searches Google and a double right-click searches Wikipedia. The logos of the two search engines are shown in Figure 7.
Figure 7 - Logo of Search Engines available in ONTOOWL Game to search for Specific Seeds/Nodes Automatically
As mentioned before, the final output of each game that a user plays is an XML file, which is generated automatically by C# scripts written by our team in the Unity game engine and is then read and analyzed by Java code that we implemented. Figure 8 shows an example of an XML file saved after one game played by a user.
Figure 8 - XML File generated from ONTOOWL Automatically for a game played by a user
After this step, all XML files are combined into one hash table (Figure 9) in which each triple of start node (e.g., Messi), relation (e.g., IS A), and end node (e.g., Player) is a key, and the frequency of that triple is stored as its value.
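The merge step amounts to counting triples across games. The authors' Java code and the exact XML layout of Figure 8 are not reproduced in this report, so the Python sketch below assumes a hypothetical layout in which each `<relation>` element carries `start`, `type`, and `end` attributes; `merge_games` is likewise a hypothetical name. It also assumes, as a design choice, that a triple is counted at most once per game, so a single player cannot inflate its frequency.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def merge_games(xml_documents):
    """Combine per-game XML files into one table: (start, relation, end) -> frequency.

    Assumes a hypothetical layout such as
      <game user="User1"><relation start="Messi" type="IS A" end="Player"/></game>
    """
    table = Counter()
    for doc in xml_documents:
        root = ET.fromstring(doc)
        # A set, so each triple counts at most once per game.
        triples = {(r.get("start"), r.get("type"), r.get("end"))
                   for r in root.iter("relation")}
        table.update(triples)
    return table
```

A `Counter` plays the role of the hash table here: its keys are the triples and its values are their frequencies.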
Figure 9 - Data flow overview - Combining all XML Files into a Hash-Table using Java code implemented by the Authors
The next step (Figure 10) is to convert the hash table into an OWL file, using Java code implemented by our team, and thereby construct the ontology.
Figure 10 - Data Flow overview: Converting the Hash-Table into the OWL file
Based on the data in the hash table, the Java code automatically infers the classes, the instances, and the datatype and object properties among them. Inferring all classes, instances, and relations automatically from the hash table is one of the novelties of our project. The resulting ontology can then be viewed and queried in different tools, such as the Protégé editor and the Neo4j graph database.
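The report does not describe the inference rules of the Java converter, so the Python sketch below uses one simple heuristic as a stand-in and writes OWL in Turtle syntax: every target of an “IS A” edge becomes a class; a source of “IS A” that is itself a class becomes a subclass, otherwise a named individual; every other relation name becomes an object property. The function `to_turtle`, the `ex:` namespace `http://example.org/ontoowl#`, and the `min_count` frequency filter are assumptions. Datatype properties are not handled, and names containing characters other than letters, digits, spaces, and periods would need proper IRI escaping.

```python
def to_turtle(table, min_count=1, base="http://example.org/ontoowl#"):
    """Emit an OWL ontology (Turtle syntax) from {(start, relation, end): frequency}.

    Heuristic: targets of "IS A" become classes; a source of "IS A" that is itself
    a class becomes a subclass, otherwise a named individual. Every other relation
    name becomes an object property linking its two concepts.
    """
    def iri(name):
        # "L. Messi" -> "ex:L_Messi", "IS PART OF" -> "ex:IS_PART_OF"
        return "ex:" + name.strip().replace(" ", "_").replace(".", "")

    kept = sorted(t for t, n in table.items() if n >= min_count)
    classes = {end for _, rel, end in kept if rel == "IS A"}
    props = {rel for _, rel, _ in kept if rel != "IS A"}
    lines = [
        f"@prefix ex: <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    lines += [f"{iri(c)} a owl:Class ." for c in sorted(classes)]
    lines += [f"{iri(p)} a owl:ObjectProperty ." for p in sorted(props)]
    for start, rel, end in kept:
        if rel == "IS A" and start in classes:
            lines.append(f"{iri(start)} rdfs:subClassOf {iri(end)} .")
        elif rel == "IS A":
            lines.append(f"{iri(start)} a owl:NamedIndividual, {iri(end)} .")
        else:
            lines.append(f"{iri(start)} {iri(rel)} {iri(end)} .")
    return "\n".join(lines)
```

The resulting text can be saved with a `.ttl` extension and opened in Protégé; raising `min_count` drops rarely suggested triples before any classes or individuals are inferred.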
Protégé
Protégé is open-source software that provides services such as ontology editing through a graphical user interface and knowledge acquisition. Figure 11 shows our soccer ontology as implemented in Protégé. Protégé can also analyze imported ontologies with deductive classifiers (reasoners), which check that the model is consistent.
Figure 11 – Protégé Ontology Graphic Editor Interface
Neo4j
We also imported our sample ontology into Neo4j, a powerful graph database, to visualize its graph and to test several queries that filter the data by different constraints. Neo4j makes it easy and fast to query the graph and to store all related information. Each relationship stores its reliability as an attribute. We will also add the newly defined values, the contribution coefficient for nodes and the local reliability for edges, which are introduced in the Rewarding Scheme section. Figure 12 shows the ontology constructed from the data of 15 players who volunteered to play our game.
Figure 12 - Constructed ontology for the soccer world.
The query language of Neo4j is Cypher, a powerful, SQL-like query language. For instance, the query “MATCH (n1)-[r1:isA]->(n2) WHERE r1.reliability > 50 RETURN n1, r1, n2” returns every pair of nodes n1 and n2 connected by an isA relationship r1 whose reliability is greater than 50% (Figure 13).
Figure 13 - Example of a query using Cypher language on Neo4j
Rewarding Scheme
Rewarding users for the ontology they construct is a challenging part of the game, because we do not have the “correct” ontology and therefore cannot compare the work of a specific user with “correct” answers. We propose a dynamic scoring scheme for this game, in which the performance of each user is judged against the answers of all other users. This means that a user's score for a specific world, such as the soccer world, can change as other users play. In this way the user may notice a drop in his score and will therefore try to correct his mistakes in order to increase it. Each user thus gets indirect feedback on his performance from all other users. We can view this feature as a kind of swarm intelligence in which the particles are real humans. We hope this feature helps the game converge to a single ontology.
As mentioned before, it is very difficult to distinguish “correct” relationships from “incorrect” ones in order to score the performance of a player. In the early prototype of the game, which was presented in class, we relied on the reliability of each relationship to reward the player. The reliability of a relationship was defined as the ratio of the number of players who suggested that relationship to the total number of players. So if 90 players out of 100 suggested “Messi, plays in, Barcelona”, the reliability of this relationship is 90%. We decided to reward any player who suggests a relationship with reliability above 50% and to penalize players who add a relationship with reliability below 50%. After gathering data from 15 players in the first experiment with the game, we noticed that this rewarding policy does not work correctly. For instance, our data contained the relationship “Buffon, is part of, Juventus” with a reliability of 44%. Under this policy we would penalize anyone who suggested this relationship, yet Buffon does play for Juventus and is part of that team! The problem is that not all players contribute to all concepts. Many of our players (56%) did not know Buffon or, for some other reason, did not contribute any relationship between Buffon and other concepts. We solved this problem by replacing the total number of players in the denominator of the ratio with a new, node-specific quantity.
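The early-prototype reliability and its reward rule can be written in a few lines, which also shows where they break down. In this minimal Python sketch, `reliabilities` and `naive_verdict` are hypothetical names, each player's contributions are assumed to be available as a list of (start, relation, end) triples, and treating a reliability of exactly 50% as neutral is an assumption, since the text does not cover that case.

```python
from collections import Counter


def reliabilities(player_triples):
    """Early-prototype reliability: share of ALL players who suggested each triple.

    player_triples: one iterable of (start, relation, end) triples per player.
    """
    total = len(player_triples)
    # set() so a player who enters the same triple twice is counted once.
    counts = Counter(t for triples in player_triples for t in set(triples))
    return {t: n / total for t, n in counts.items()}


def naive_verdict(r):
    # Early policy: reward above 50 %, penalize below 50 % (exactly 50 % is neutral here).
    if r > 0.5:
        return "reward"
    if r < 0.5:
        return "penalty"
    return "neutral"
```

Because the denominator counts every player, including those who never touched a concept, a correct but little-known relationship such as the Buffon one ends up below 50% and is penalized.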
For each node, we take the average reliability of the relationships coming out of that node and call it the contribution coefficient (CC) of the node. We then divide the reliability of each relationship by the CC of its start node and call the result the local reliability (LR). Players who add a relationship with LR > 0.5 are rewarded and players who add a relationship with LR < 0.5 are penalized, proportionally. For instance, 93% of players suggested “Buffon, is a, soccer player” and 44% suggested “Buffon, is part of, Juventus”. The CC for Buffon is therefore (93% + 44%)/2 = 68.5%, and the LR of the relationship “Buffon, is part of, Juventus” is now 44/68.5 ≈ 0.64, which is greater than 0.5. As another example, 81% of players suggested “Tevez, is a, soccer player”, 31% suggested “Tevez, is part of, Juventus”, and 6% suggested “Tevez, is part of, Real Madrid”. The CC for Tevez becomes (81% + 31% + 6%)/3 ≈ 39%. The LR for “Tevez, is part of, Juventus” is now about 0.79, but the LR for “Tevez, is part of, Real Madrid” is only about 0.15. We can now reward players suggesting the former relationship and penalize players suggesting the latter one; indeed, Tevez does play for Juventus.
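The contribution coefficient and local reliability follow directly from the definitions above, and a short sketch lets the Buffon and Tevez numbers be checked. In this Python sketch, `local_reliability` is a hypothetical name, reliabilities are assumed to be stored as fractions of all players, and `score_change` is only one possible reading of “proportionally”: the text does not define the reward formula, so the linear rule and its `scale` parameter are assumptions.

```python
from collections import defaultdict


def local_reliability(rel):
    """Compute contribution coefficients (CC) and local reliabilities (LR).

    rel: {(start, relation, end): reliability as a fraction of all players}
    Returns (lr, cc) where cc[node] is the mean reliability of the node's outgoing
    relationships and lr[triple] = reliability / cc[start node of the triple].
    """
    outgoing = defaultdict(list)
    for (start, _, _), r in rel.items():
        outgoing[start].append(r)
    cc = {node: sum(rs) / len(rs) for node, rs in outgoing.items()}
    lr = {t: r / cc[t[0]] for t, r in rel.items()}
    return lr, cc


def score_change(lr, scale=10):
    # Hypothetical proportional rule: positive above LR = 0.5, negative below it.
    return scale * (lr - 0.5)
```

Feeding in the five reliabilities quoted above reproduces CC ≈ 0.685 for Buffon and ≈ 0.39 for Tevez, and LR values of about 0.64, 0.79, and 0.15.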
Future Steps
Our current focus is on adding features that make the game a more capable tool for constructing different ontologies through crowdsourcing. So far, the game has been played by 20 students, and their data has been analyzed to identify the strengths and weaknesses of the current implementation. Our main goal is to publish a paper on the proposed framework and the developed game with the help and contribution of our instructor. We believe this methodology can help construct ontologies in many different domains in a cost- and time-efficient way.
References
Markotschi, T., & Völker, J. (2010). GuessWhat?! – Human Intelligence for Mining Linked Data.
Thaler, S., Simperl, E. P. B., & Siorpaes, K. (2011). SpotTheLink: A Game for Ontology Alignment. Wissensmanagement, 182, 246–253.
Gašević, D., Djurić, D., & Devedžić, V. (2009). Model Driven Engineering and Ontology Development (2nd ed., p. 194). Springer. ISBN 978-3-642-00282-3.
Hoff, T. (2009, June 13). Neo4j - a Graph Database that Kicks Buttox. High Scalability. Possibility Outpost. Retrieved February 17, 2010.