Eliciting ontology components from semantic specific-domain maps: towards the next generation web

JRG Pulido, Telematics Faculty, University of Colima, Colima, Mexico, [email protected]
SBF Flores, Telematics Faculty, University of Colima, Colima, Mexico
RCM Ramírez, Electrical Engineering Department, UAM-Iztapalapa, DF, Mexico
RA Díaz, Telematics Faculty, University of Colima, Colima, Mexico, [email protected]
Abstract—The present-day web can be broken into smaller pieces that slowly but surely will be transformed into semantic web pieces. This paper describes an approach for eliciting ontology components by using specific-domain maps. The knowledge contained in a particular domain, any kind of digital archive, is portrayed by assembling and displaying its ontology components. The novelty of our approach is that it offers clustering and visualization features not present in other techniques, helping in the semi-automatic construction of ontologies by identifying components from digital archives. We describe a case study that applies our approach to an academic domain. The specific-domain maps generated are presented, and a conceptualized ontology created with the ontology components is introduced. Further processing may be carried out on the extracted knowledge so that it can be embedded in semantic web pages for software agents to use. The ultimate kind of semantic web applications we will be seeing are intelligent software agents searching the web to infer and extract new knowledge from digital archives.

Keywords—Semantic web; ontology learning; self-organizing maps
I. INTRODUCTION
The present-day web and its ever-growing number of web pages [1], [2] must be broken into smaller pieces that slowly but surely will be transformed into semantic web pieces. Reaching specific web pages is a massive challenge, given that current search engines index only a small percentage of the documents on the web. Furthermore, these reachable documents are unstructured, meaning that software agents understand nothing about their actual content. In other words, these documents can be read but not understood [3]. It would be useful to develop representations of the information contained in digital archives and to create intelligent systems that expose pieces of ontology components [4], [5], [6], [7]. In this paper we describe an approach for helping in the eliciting process of ontology components for such web sites.

The remainder of this paper is organized as follows. In section II some related work is introduced. Our approach is outlined in section III. Results are presented in section IV, and conclusions and further work in section V.
II. RELATED WORK
One of the most important challenges that the semantic web poses to the research community is the mapping of large amounts of on-line unstructured knowledge, suitable for humans, to formal representations of knowledge [10], [11]. In the next subsections we take a brief look at some work done on ontologies as well as on the so-called semantic maps.
A. Knowledge representation
Representing knowledge about a domain as an ontology is a challenging process, not only because it requires a lot of computing power, but also because it is difficult to achieve in a consistent and rigorous way. It is easy to lose consistency and to introduce ambiguity and confusion [12]. Nevertheless, ontologies are a useful form of knowledge representation which may be used to support the design and development of intelligent software applications and expert systems. Recently, ontologies have been used for enhancing learning objects [14] and for helping with instructional design tasks [15]. However, web ontologies can take rather different forms. In [16] the use of the so-called Simple HTML Ontology Extension (SHOE) in a real-world internet application was described. This approach allowed authors to add semantic content to web pages, relating the context to common ontologies that provide contextual information about the domain. Most tag-annotated web pages tend to categorize concepts, so there is no need for complex inference rules to perform automatic classification. One of the most exciting uses of an ontology, in the context of the semantic web, is to support the development of agent-based systems for web searching [17], [18]. New approaches, including advanced ontology languages, have been proposed, such as OIL and DAML, which later evolved into OWL; the latter is now a standard [19], [20], [21], [22], [23].
B. Towards specific-domain maps
Self-organizing maps (SOM) are the artificial neural network technique we have used to produce specific-domain
2009 Latin American Web Congress. 978-0-7695-3856-3/09 © 2009 IEEE. DOI 10.1109/LA-WEB.2009.27
Figure 1. (Left) Basic taxonomy for an academic domain [8], with entities such as University, School, Person, Researcher, Research group, Publication, Article, and Book linked by is-a, part-of, and has relationships. (Right) Embedded knowledge into a web page using an early ontology approach [9].
maps. SOM are of benefit to quite a few areas. For instance, they have been used to carry out genome sequence mapping tasks [24], to classify volcanic events [25], and for document organization [26]. Perhaps the best-known SOM project is [27], [28], which describes the results of applying WEBSOM2, a document organization, searching, and browsing system, to a set of about 7 million electronic patent abstracts. In this case, a document map is presented as a series of HTML pages facilitating exploration. In [29] a distributed architecture for the extraction of meta-data from WWW documents was proposed; it is particularly suited for repositories of historical publications. This information extraction system is based on semi-structured data analysis. The system output is a meta-data object containing a concise representation of the corresponding publication and its components. In that research, gatherers were designed as a combination of a parser, based on a context-free grammar, and a web robot, which navigates the links contained in the basic document type to infer the document structure of the entire site. These meta-data objects can be interchanged with other web agents, then classified and organized.
III. METHODS
The idea of combining ontologies and knowledge maps has motivated our work. For the semantic web to become a reality, we need to transform the current web into a web where software agents are able to negotiate and carry out trivial tasks for us. Doing this manually would create a bottleneck for the semantic web. We need software tools that help us accomplish this enterprise. Our software is written in Java, which offers robust, multiplatform, and easy networking functionality. Being an object-oriented programming language, it also facilitates reuse. Java and its various APIs are powerful enough for constructing ontology software systems.
Our suite of ontology learning tools consists basically of two applications: Spade and Grubber [8]. The former pre-processes HTML pages and creates a document space. The latter is fed with the document space and produces knowledge maps that allow us to visualize the ontology components contained in a digital archive. These may later be organized as a set of Entities, Relations, and Functions. Semantic problem solvers use this triad for inferring new data from knowledge bases [30], [31], [32], [33].
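The docuspace built by Spade can be pictured as a term-frequency matrix over the archive's pages. The sketch below is only illustrative: the tag stripping and tokenization rules are our assumptions, not the tool's actual behaviour.

```python
from collections import Counter
import re

def build_docuspace(pages):
    """Build a simple term-frequency document space from raw HTML strings.
    Rows are documents (URLs of the domain), columns are vocabulary terms."""
    tokenized = []
    for html in pages:
        text = re.sub(r"<[^>]+>", " ", html).lower()          # strip HTML tags
        tokenized.append(Counter(re.findall(r"[a-z]{3,}", text)))
    vocab = sorted(set().union(*tokenized))
    # one feature vector per document
    return vocab, [[doc[t] for t in vocab] for doc in tokenized]

vocab, vectors = build_docuspace(["<p>research projects</p>",
                                  "<p>lectures and modules</p>"])
```

A real pre-processor would also remove stop words and stem terms (the stemmed labels visible in the attribute maps suggest the authors do something along these lines), but the matrix shape is the point: one feature vector per page, one dimension per term.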
A. The Algorithm
SOM can be viewed as a model of unsupervised learning and an adaptive knowledge representation scheme. Adaptive means that at each iteration a single sample is used to update the weight vectors of a neighbourhood of neurons [34], [35]. Adaptation of the model vectors takes place according to the following equation:
mi(t + 1) = mi(t) + hci(t)[x(t) − mi(t)] (1)
where t ∈ ℕ is the discrete time coordinate, mi ∈ ℝⁿ is a node, and hci(t) is a neighbourhood function. The latter has a central role, as it acts as a smoothing kernel defined over the lattice points and defines the stiffness of the surface to be fitted to the data points. This function may be constant for all the cells in the neighbourhood and zero elsewhere. A common neighbourhood kernel that describes a natural mapping and that is used for this purpose can be written in terms of the Gaussian function:
hci(t) = α(t) exp(−‖rc − ri‖² / (2σ²(t))) (2)
where rc, ri ∈ ℝ² are the locations of the winner and a neighbouring node on the grid, α(t) is the learning rate (0 ≤
Figure 2. Entity maps. (Left) Full entity map. (Right) Reduced entity map. Visible cluster labels include: department, people, research, modules, lectures, slides, committee, projects, staff, papers, dissertations, coursework, tutorials, theses, meetings, teaching, research projects, proposals, surveys, publications, labs.
α(t) ≤ 1), and σ(t) is the width of the kernel. Both α(t) and σ(t) decrease monotonically. The major phases of our approach are as follows:
1) Produce a document space. A document space (docuspace) of feature vectors is created by Spade¹. This docuspace is produced with the data coming from the specific-domain digital archive [8].
2) Construct the SOM. A second software tool, Grubber², is fed and trained with the docuspaces, and knowledge maps are then created.
Ontology components can be visualized, clustered together, in the knowledge maps created. Preliminary results were surprisingly close to our intuitive expectations. After this, other ontology tools such as editors can be used to organize this knowledge. Then it can be embedded into the digital archive it was extracted from (fig. 1) by means of any of the existing ontology languages [20].
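The adaptation rule in equations (1) and (2) can be sketched in a few lines. This is a minimal illustration of the standard SOM update, not the authors' Grubber code; the linear decay schedules and the constants are our assumptions.

```python
import numpy as np

def som_step(nodes, grid, x, t, t_max, alpha0=0.5, sigma0=2.0):
    """One SOM adaptation step, eqs. (1)-(2): move every node mi toward
    the sample x, weighted by a Gaussian neighbourhood around the winner."""
    alpha = alpha0 * (1.0 - t / t_max)            # learning rate α(t), decreases monotonically
    sigma = sigma0 * (1.0 - t / t_max) + 1e-3     # kernel width σ(t), decreases monotonically
    c = int(np.argmin(np.linalg.norm(nodes - x, axis=1)))  # winner (best match to x)
    d2 = np.sum((grid - grid[c]) ** 2, axis=1)    # squared grid distances ‖rc − ri‖²
    h = alpha * np.exp(-d2 / (2.0 * sigma ** 2))  # neighbourhood kernel hci(t), eq. (2)
    return nodes + h[:, None] * (x - nodes)       # mi(t+1) = mi(t) + hci(t)[x − mi(t)], eq. (1)
```

Training iterates this step over the feature vectors of the docuspace; as α(t) and σ(t) shrink, the map stiffens and nearby grid nodes end up representing similar documents or terms, which is what produces the visible clusters.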
IV. THE ACADEMIC DOMAIN CASE
In this section the results of applying our approach to an academic³ domain are presented. It also includes a brief discussion of the entity maps and the attribute maps created by Grubber. Later, an example of how to use the identified ontology components is outlined.
A. Entity map
These maps are created with the transposed view of the docuspace. In this view, the columns (URLs of the domain) of the dataset determine the dimensions of the maps. In figure 2 the entity maps, full and reduced, created by Grubber are presented.
¹http://docente.ucol.mx/~jrgp/en/ol-approach.html
²http://docente.ucol.mx/~jrgp/en/ol-approach.html
³http://www.nottingham.ac.uk/cs
The full entity map: A few clusters can be identified at a glance by means of the main feature and subfeatures section of Grubber and background knowledge. For instance: department, lectures, modules, research, tutorial, papers, and some other clusters related to the domain. A deeper analysis reveals that other clusters exist within the maps, but they are hard to find because the maps do not exhibit a significant coloring level. Dark areas in the maps are the result of training the artificial neural network with feature input vectors containing a large number of zeros. These do not contribute to an informative coloring of the maps.
B. The attribute map
These maps are created with the front view of the docuspace. In this view, the columns (relevant terms from the domain) of the dataset determine the dimensions of the maps. In figure 3 the attribute maps, full and reduced, created by Grubber are presented.
The full attribute map: A few clusters can again be identified at a glance by means of the main feature and subfeatures section of Grubber and background knowledge. For instance: people, department, research, teaching, and others belonging to this domain. Note that the coloring of these maps is more informative in comparison with the entity maps. Keep in mind that for the attribute maps we are using the columns of the front view of the docuspace, whose dimensionality is smaller than that of the transposed view used earlier by the entity maps. In other words, there are fewer URLs in this case than terms in the dataset. A significant coloring has to do with the number of zeros in the datasets used to train the network.

It must be said that identifying these clusters has not been easy. In order to obtain better coloring from the knowledge maps, a dimensionality reduction was also carried out. The results are described in turn.
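The difference between the two views can be shown with a toy docuspace; the term names, URLs, and counts here are purely hypothetical.

```python
import numpy as np

# Toy docuspace, front view: one row per URL, one column per term.
# All names and counts are illustrative only.
terms = ["research", "lectures", "modules", "people"]
urls = ["/research.html", "/teaching.html"]
front = np.array([[5, 0, 1, 2],    # /research.html
                  [0, 4, 3, 1]])   # /teaching.html

# Attribute maps train on the front view: one input vector per URL,
# with as many dimensions as there are terms.
assert front.shape == (len(urls), len(terms))

# Entity maps train on the transposed view: one input vector per term,
# with as many dimensions as there are URLs.
transposed = front.T
assert transposed.shape == (len(terms), len(urls))
```

Transposing swaps which axis supplies the feature dimensions, and since a real domain has far more terms than URLs, the two views differ sharply in dimensionality and sparsity, which is what drives the coloring differences discussed above.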
Figure 3. Attribute maps. (a) Full attribute map, whose visible labels are stemmed term pairs. (b) Reduced attribute map, with cluster labels such as: computer science, functional programming languages, visual image pattern, head consultor assistant, industry europe, academic institution, artificial intelligence, group supervisor member, student develop, timetable scheduling, plan school, professor phd, unix, nottingham university, binary search tree, activities goals, java, math, document text education, faculty department, guide roadmap, conference journal, database, method class object, lectures projects tutorials teaching.
C. Ontology components from reduced docuspace maps
The following clusters can be identified from the maps by means of the main feature and subfeatures from Grubber, domain experts, and background knowledge (figs. 2, 3). It is clearly seen that an improvement in knowledge map coloring has been achieved by reducing the dimensionality of the docuspace. The clusters of the specific-domain semantic maps are described in turn.
Department: There are a number of clusters on the map about this topic. For example, terms such as science, computer make up one cluster. University, Nottingham form another cluster on the map. Education, faculty, department constitute another cluster. Academic, institution constitute yet another. Other related clusters include terms such as guide, road, map, and turn, plan, school.

Roles: Some clusters about the roles that people play in the school are bracketed together. Terms such as head, consultor, assistant compose a cluster. Group, supervisor, member constitute another cluster. Professor, phd form another cluster. The terms student, develop form another cluster.

Teaching: A number of clusters on this topic have been identified. For example, clusters about Java, programming, data structures, mathematics, databases, and artificial intelligence have been spotted all over the map. As mentioned, teaching relates to a number of other clusters such as: modules, lectures, coursework, labs, projects.

Research: Terms related to some of the interests of the research groups of the school have been identified: visual, image, pattern, functional, programming, languages, scheduling, timetable, heuristics, document, text, xml, interface.

Publications: Two nearby clusters about publications have been spotted. One of them includes terms such as conference, international. The other cluster includes terms such as journal, heuristics.

Industry: A cluster with the terms industry, europe, management has also been identified.

The experiments we have presented in this section show that ours is an effective approach to analyse domains and to identify ontology components. With the aid of domain experts, we are then able to conceptualize domain-specific ontologies. We have reported the use of our approach on other domains [36]. The next step is to use other ontology tools to organize and embed this knowledge into web pages. Figure 4 shows a basic taxonomy conceptualized with the ontology components that have been identified from the actual domain by means of the specific-domain semantic maps, background knowledge, and, most importantly, domain experts.
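Once identified, the components can be organized as the (Entities, Relations, Functions) triad mentioned in section III. Below is a minimal sketch using relationships drawn from the conceptualized taxonomy of figure 4; the Python representation itself is ours, not the output format of the tools.

```python
# Entities and relations drawn from the conceptualized taxonomy (fig. 4);
# this triple-based encoding is illustrative only.
entities = {"AcademicInstitution", "School", "ResearchGroup", "Person",
            "Professor", "Student", "Publication"}
relations = [
    ("School", "part-of", "AcademicInstitution"),
    ("ResearchGroup", "part-of", "School"),
    ("Professor", "is-a", "Person"),
    ("Student", "is-a", "Person"),
    ("ResearchGroup", "has", "Publication"),
]

def related(entity, rel):
    """Entities reachable from `entity` via relation `rel` (a trivial
    stand-in for the inference that semantic problem solvers perform)."""
    return {o for s, r, o in relations if s == entity and r == rel}

assert related("School", "part-of") == {"AcademicInstitution"}
```

An ontology editor would then serialize such triples into one of the ontology languages [20] for embedding into the web pages of the archive.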
V. CONCLUSIONS
The present-day web can be broken into smaller pieces that slowly but surely will be transformed into semantic web pieces. Should that be done manually, the semantic web will not become a reality in the next couple of decades due to this bottleneck. Ontology learning tools are essential for the realization of the semantic web, for the job to be done is quite complex [37], [38]. This paper is an example of how ontology learning tools, along with some other ontology tools, can be used to construct specific-domain ontologies. Experts do not have to construct specific-domain ontologies from scratch anymore. Rather, they can now use appropriate ontology learning tools, such as the knowledge maps we have introduced, to help in the eliciting process of specific-domain ontology components. This kind of software tool helps speed up the ontology creation process, and other software tools such as ontology editors help in validating the created ontology, always under the supervision of experts and the ontology engineer [39]. It must be said that domain experts are always required in order to obtain a desirable level of accuracy in the ontology.
Figure 4. Conceptualized ontology. Entities such as academic institution, school, computer science, Nottingham university, people, professor, student, and research group are linked by is-a, part-of, has, and other relationships; classes carry attributes and instances, attributes may be shared between instances, instances may belong to several groups, and other academic institutions and schools exist.
Further research avenues we are working on involve the use of hybrid systems, in such a way that by combining clustering techniques with the already trained feature vectors we may refine the classification of the knowledge components from the domain [40], [41]. The ultimate kind of software applications that we will be seeing are intelligent software agents searching the semantic web to infer and extract new knowledge from the digital archives.
REFERENCES
[1] P. Schneider and D. Fensel, “Layering the Semantic Web:Problems and directions,” in 1st Int.Semantic Web Conf., ser.LNCS, I. Horrocks and J. Hendler, Eds., vol. 2342. Springer-Verlag, Berlin, 2002, pp. 16–29.
[2] Y. Yang et al., “A study of approaches to hypertext catego-rization,” J.Intelligent Information Systems, vol. 18, no. 2/3,pp. 219–241, 2002.
[3] T. Berners-Lee et al., “The Semantic Web,” Scientific Amer-ican, vol. 284, no. 5, pp. 34–43, May 2001.
[4] JRG Pulido et al., “Identifying ontology components fromdigital archives for the semantic web,” in IASTED Advancesin Computer Science and Technology (ACST), 2006, pp. 1–6,CD edition.
[5] B. Berent et al., “A roadmap for web mining: from webto semantic web,” in First European Web Mining Forum(EWMF), ser. LNCS, B. Berent et al., Eds., vol. 3209.Springer, 2004, pp. 1–22.
[6] JRG Pulido et al., “Semi-automatic derivation of specific-domain ontologies for the semantic web,” in 5th Mexican Intl Conf on Artificial Intelligence, Gelbukh and Reyes-García, Eds. IEEE Computer Soc. Press, Los Alamitos, 2006, pp. 253–261.
[7] F. Ciravegna et al., “Learning to harvest information for thesemantic web,” in The Semantic Web: Research and Applica-tions: First European Semantic Web Symposium, ESWS 2004,ser. LNCS, C. Bussler et al., Eds., vol. 3053. Springer, 2004,pp. 312–326.
[8] D. Elliman and JRG Pulido, “Visualizing ontology com-ponents through self-organizing maps,” in 6th InternationalConference on Information Visualization (IV02), London, UK,D. Williams, Ed. IEEE Computer Soc.Press, Los Alamitos,2002, pp. 434–438.
[9] R. Benjamins et al., “(KA)2: Building ontologies for theinternet: A midterm report,” Int.J.Human-Computer Studies,vol. 51, no. 3, pp. 687–712, 1999.
[10] L. Crow and N. Shadbolt, “Extracting focused knowledgefrom the Semantic Web,” Int.J.Human-Computer Studies,vol. 54, pp. 155–184, 2001.
[11] D. Mladenic and M. Grobelnick, “Mapping documents ontoweb page ontology,” in Web Mining: From Web to Seman-tic Web: First European Web Mining Forum, ser. LNCS,B. Berendt et al., Eds., vol. 3209. Springer, 2004, pp. 77–96.
[12] R. Brachman, “What is-a and isn’t: An analysis of taxonomiclinks in semantic networks,” IEEE Computer, vol. 16, no. 10,pp. 10–36, 1983.
[13] M. Uschold and M. Gruninger, “Ontologies: Principles,methods, and applications,” Knowledge Engineering Review,vol. 11, no. 2, pp. 93–155, 1996.
[14] A. Zouaq and R. Nkambou, “Enhancing learning objects withan ontology-based memory,” IEEE Trans.on Knowledge AndData Engineering, vol. 21, no. 6, pp. 881–893, 2009.
[15] W. Wei et al., “Probabilistic topic models for learningterminological ontologies,” IEEE Trans.on KnowledgeAnd Data Engineering, 2009. [Online]. Available:http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.122
[16] J. Heflin and J. Hendler, “Dynamic ontologies on the web,” inAmerican Association For Artificial Intelligence Conf. AAAIPress, California, 2000, pp. 251–254.
[17] S. Geffner et al., “Browsing large digital library collectionsusing classification hierarchies,” in 8th Int. Conf. on Infor-mation Knowledge Management, CIKM’99, S. Gauch, Ed.ACM, New York, 1999, pp. 195–201.
[18] J. McCormack and B. Wohlschlaeger, “Harnessing agenttechnologies for data mining and knowledge discovery,” inData Mining and Knowledge Discovery: Theory, Tools andTechnology II, vol. 4057, 2000, pp. 393–400.
[19] L. Lacy, OWL: Representing Information Using the WebOntology Language. Trafford Publishing, USA, 2005.
[20] JRG Pulido et al., “Ontology languages for the semantic web: A never completely updated review,” Knowledge-Based Systems, vol. 19, no. 7, pp. 489–497, 2006.
[21] I. Horrocks et al., “From SHIQ and RDF to OWL: The mak-ing of a web ontology language,” Journal of web semantics,vol. 1, no. 1, pp. 7–26, 2003.
[22] A. Gomez and O. Corcho, “Ontology languages for theSemantic Web,” IEEE Intelligent Systems, 2002.
[23] D. Fensel et al., “OIL in a nutshell,” in Proc.EuropeanKnowledge Acquisition Conf., ser. LNAI, R. Ding et al., Eds.Springer-Verlag, Berlin, 2000.
[24] H. Dozono and T. Takahashi, “Mapping of the genomesequence using two-stage self organizing maps,” in 6th IntlWorkshop on Self-Organizing Maps (WSOM’07), H. Ritteret al., Eds. Neuroinformatics group, Bielefeld University,Germany, 2007, pp. 1–7, CD edition.
[25] JRG Pulido et al., “A novel approach to the analysis ofvolcanic-domain data using self-organizing maps: A prelimi-nary study on the volcano of colima,” Research in computerscience, IPN, Mexico, vol. 40, pp. 49–59, 2008.
[26] V. Guerrero et al., “Document organization using Kohonen’salgorithm,” Information Processing and Management, vol. 38,pp. 79–89, 2002.
[27] T. Kohonen et al., “Self organization of a massive textdocument collection,” in Kohonen Maps, E. Oja and S. Kaski,Eds. Elsevier Sci, Amsterdam, 1999, pp. 171–182.
[28] S. Kaski et al., “WEBSOM - Self-organizing maps of doc-ument collections,” Neurocomputing, vol. 6, pp. 101–117,1998.
[29] I. Sanz et al., “Gathering metadata from web-based repos-itories of historical publications,” in 9th Int. Workshop onDatabase and Expert Systems Apps, A. Tjoa and R. Wagner,Eds. IEEE Computer Soc.Press, Los Alamitos, 1998, pp.473–478.
[30] A. Gomez et al., “Knowledge maps: An essential tech-nique for conceptualisation,” Data & Knowledge Engineering,vol. 33, pp. 169–190, 2000.
[31] J. Gordon, “Creating knowledge maps by exploiting depen-dent relationships,” Knowledge-Based Systems, pp. 71–79,2000.
[32] A. Waterson and A. Preece, “Verifying ontological commit-ment in knowledge-based systems,” Knowledge-Based Sys-tems, vol. 12, pp. 45–54, 1999.
[33] E. Motta et al., “Ontology-driven document enrichment:principles, tools and applications,” Int.J.Human-ComputerStudies, vol. 52, pp. 1071–1109, 2000.
[34] T. Kohonen, Self-Organizing Maps, 3rd ed., ser. InformationSciences Series. Springer-Verlag, Berlin, 2001.
[35] H. Ritter and T. Kohonen, “Self-organizing semantic maps,”Biological Cybernetics, vol. 61, pp. 241–254, 1989.
[36] JRG Pulido et al., “On the finding process of volcano-domainontology components using self-organizing maps,” in 7th IntlWorkshop on Self-Organizing Maps (WSOM’09), ser. LNCS,J. Principe and R. Miikkulainen, Eds., vol. 5629. Springer-Verlag, Berlin, 2009, pp. 255–263.
[37] JRG Pulido et al., “Artificial learning approaches for the next generation web: part II,” Ingeniería Investigación y Tecnología, UNAM (CONACyT), Mexico, vol. 9, no. 2, pp. 161–170, 2008.
[38] JRG Pulido et al., “Artificial learning approaches for the next generation web: part I,” Ingeniería Investigación y Tecnología, UNAM (CONACyT), Mexico, vol. 9, no. 1, pp. 67–76, 2008.
[39] JRG Pulido et al., “In the quest of specific-domain ontol-ogy components for the semantic web,” in 6th Intl Work-shop on Self-Organizing Maps (WSOM’07), H. Ritter et al.,Eds. Neuroinformatics group, Bielefeld University, Germany,2007, pp. 1–7, CD edition.
[40] S. Legrand and JRG Pulido, “A hybrid approach to wordsense disambiguation: Neural clustering with class labeling,”in Workshop on knowledge discovery and ontologies, 15thEuropean Conference on Machine Learning (ECML), Pisa,Italy, P. Buitelaar et al., Eds., September 2004, pp. 127–132.
[41] A. Maedche and V. Zacharias, “Clustering ontology-basedmetadata in the semantic web,” in Principles of Data Miningand Knowledge Discovery: 6th European Conference, PKDD2002, ser. LNCS, T. Elomaa et al., Eds., vol. 2431. Springer,2002, pp. 348–360.