
The Second Joint International Semantic Technology Conference, JIST 2012

Nara, Japan, December 2012

Poster and Demonstration Proceedings

Ryutaro Ichise, Seokchan Yun (Eds.)

Table of Contents

Poster Papers

1. Semi-automatic Ontology Integration Framework . . . . . . . . . . . . . . . . . . 1
Lihua Zhao and Ryutaro Ichise

2. Development of Ontology for the Radiological Imaging Procedure Information System - Adaptation of Disease and Anatomical Structure - . . . . . . . . . . . . . . . . . . 3

Tatsuaki Kobayashi, Tokuo Umeda, Tsutomu Gomi and Akiko Okawa

3. Reasoning Approaches for Nominal Schemas . . . . . . . . . . . . . . . . . . 5
Cong Wang, Adila Krisnadhi, David Carral Martínez and Pascal Hitzler

4. A Logical Model for Taxonomic Concepts for Expanding Knowledge using Linked Open Data . . . . . . . . . . . . . . . . . . 7

Rathachai Chawuthai, Hideaki Takeda, Vilas Wuwongse and Utsugi Jinbo

5. Earthquake Ontology and Researches on Earthquake Prediction . . . . . . . . . . . . . . . . . . 9
David Ramamonjisoa

6. Building an RDFized Life Science Dictionary . . . . . . . . . . . . . . . . . . 11
Yasunori Yamamoto and Shoko Kawamoto

7. Multilayer of Ontology-Based Floor Plan Representation for Ontology-Based Indoor Emergency Simulation . . . . . . . . . . . . . . . . . . 13

Chaianun Damrongrat, Hideaki Kanai and Mitsuru Ikeda

8. An Ontology Model and Service for Managing Scientific and Common Names of Plants . . . . . . . . . . . . . . . . . . 15

Jouni Tuominen, Nina Laurenne and Eero Hyvönen

Demonstration Papers

9. PromoterCAD: Data Driven Design of Plant Regulatory DNA . . . . . . . . . . . . . . . . . . 17
Robert Cox, Koro Nishikata, Minami Matsui and Tetsuro Toyoda

10. LinkData.org Synergistically Associating RDF Data Repository and Application Repository Stimulates Positive Feedback of Mutual Developing Data Applications . . . . . . . . . . . . . . . . . . 19

Sayoko Shimoyama, David Gifford, Yuko Yoshida and Tetsuro Toyoda

11. Qualitative Description of Time based on Ontology in the Domain of Apoptosis Signaling . . . . . . . . . . . . . . . . . . 21

Chihiro Yamauchi, Kazuaki Kojima and Tatsunori Matsui


12. Generating LOD from Web: A Case Study on Building Integrated Museum Collection Data . . . . . . . . . . . . . . . . . . 23

Fuyuko Matsumura, Fumihiro Kato, Tetsuro Kamura, Ikki Ohmukai and Hideaki Takeda

13. User-Adaptive Technology Intelligence System . . . . . . . . . . . . . . . . . . 25
Seungwoo Lee, Do-Heon Jeong, Jinhyung Kim, Myunggwon Hwang, Minhee Cho, Sa-Kwang Song, Soon-Chan Hong and Hanmin Jung


Semi-automatic Ontology Integration Framework

Lihua Zhao and Ryutaro Ichise

National Institute of Informatics, Tokyo, Japan, {lihua, ichise}@nii.ac.jp

Abstract. This paper introduces a semi-automatic ontology integration framework that discovers related ontology schemas by analyzing SameAs graphs and retrieves important and frequent properties from core classes using a machine learning method. The framework can construct a high-quality integrated ontology from linked data sets.

Keywords: linked data, semantic web, ontology integration

1 Introduction

With the growth of the Linked Open Data (LOD) cloud, many Semantic Web applications have been developed that access the linked data sets [1]. However, in order to use the data sets, we have to understand their heterogeneous ontologies in advance. The ontology heterogeneity problem has become a popular issue, and it can be solved by constructing a global ontology that integrates the ontologies of various data sets.

In this paper, we propose a framework that can semi-automatically integrate heterogeneous ontologies by merging the ontologies created by two sub-frameworks. The first framework retrieves related properties and classes among different ontologies by applying ontology matching methods to SameAs graph patterns. The second framework finds important and frequent properties from the core classes of each data set. By combining these two frameworks, we can construct a concrete global ontology for various linked data sets.

2 Semi-automatic Ontology Integration Framework

Constructing a global ontology by integrating the heterogeneous ontologies of linked data can effectively integrate various data resources. The ontology integration framework that solves the ontology heterogeneity problem is shown in Figure 1; it consists of two frameworks that generate ontologies from linked data sets and an ontology merger.

The first framework semi-automatically finds related properties and classes by analyzing SameAs graph patterns in the linked data sets. It consists of a SameAs graph pattern extractor, a <Predicate, Object> (PO) pair collector, an ontology matching system, and an ontology aggregator. This framework can create a high-quality integrated ontology with minor manual revision [3].
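The paper does not show the PO pair collection step in code. The following minimal Python sketch (our own illustration, not the authors' implementation; triples are simplified to tuples and all names are invented) conveys the idea: instances linked by owl:sameAs describe the same entity, so predicates that state an identical object value across such a pair are candidates for ontology matching.

```python
from collections import defaultdict

def collect_po_pairs(triples, sameas_pairs):
    """For each pair of sameAs-linked subjects, group together the
    predicates that state the same object value (candidate matches)."""
    po = defaultdict(set)                      # subject -> {(predicate, object)}
    for s, p, o in triples:
        po[s].add((p, o))
    matches = defaultdict(set)                 # shared object -> matching predicates
    for a, b in sameas_pairs:
        for pa, oa in po[a]:
            for pb, ob in po[b]:
                if oa == ob and pa != pb:
                    matches[oa].update({pa, pb})
    return matches
```

For instance, if two sameAs-linked resources both state the value "Tokyo" via different name predicates, those predicates would be collected as a candidate property match.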

JIST 2012 Poster and Demonstration 1

Fig. 1: Framework for Ontology Integration

However, the first framework only integrates related properties and classes between different ontologies. Therefore, we need another framework to find important properties and classes that frequently appear in the linked data sets. The second framework consists of two functions: one finds important properties from core classes, and the other finds frequently used properties and classes. By applying machine learning methods, we can find important properties that are used to describe instances of a specific class. The weight of each property in an instance can be calculated as the product of the property frequency (PF) and the inverse instance frequency (IIF), in a similar way to TF-IDF [2]. The PF is the frequency of a property in an instance, and the IIF is the logarithm of the ratio between the number of instances in a data set and the number of instances that contain the property. The frequent properties and classes can be found by analyzing the distribution of their usage in the instances.
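As a hedged illustration of the PF-IIF weighting defined above (a sketch from the stated definitions, not the authors' actual code; function and variable names are ours), the weights could be computed as follows:

```python
import math
from collections import Counter

def pf_iif_weights(instances):
    """instances: one list of property URIs per instance.
    Returns, per instance, a dict mapping property -> PF * IIF,
    where PF is the property's count in the instance and IIF is
    log(#instances / #instances containing the property)."""
    n = len(instances)
    df = Counter()                       # how many instances contain each property
    for props in instances:
        df.update(set(props))
    weights = []
    for props in instances:
        pf = Counter(props)              # property frequency within this instance
        weights.append({p: pf[p] * math.log(n / df[p]) for p in pf})
    return weights
```

Note that, as with TF-IDF, a property occurring in every instance gets IIF = log(1) = 0, so ubiquitous properties are down-weighted.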

With the ontology merger, we can merge the two ontologies created by the above frameworks and construct a global integrated ontology that helps us easily access various data sets.

3 Conclusion

In this paper, we proposed a semi-automatic ontology integration framework that can integrate heterogeneous ontologies. The integrated ontology consists of important and frequent properties and classes that can help Semantic Web application developers easily find related instances and query various data sets. With the ontology, we can also detect misuses of ontologies in the data sets and recommend important properties for describing instances.

References

1. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. International Journal on Semantic Web and Information Systems 5(3), 1-22 (2009)

2. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press (2008)

3. Zhao, L., Ichise, R.: Graph-based ontology analysis in the linked open data. In: Proceedings of the Eighth International Conference on Semantic Systems, pp. 56-63 (2012)


Development of Ontology for the Radiological Imaging Procedure Information System

- Adaptation of Disease and Anatomical Structure -

Tatsuaki Kobayashi1,*, Tokuo Umeda1, Tsutomu Gomi1, and Akiko Okawa2

1 Kitasato University, Graduate School of Medical Imaging Analysis, Kanagawa, Japan
2 Nagoya University, Graduate School of Nursing, Aichi, Japan

Abstract. Radiological imaging procedure information has been managed by the radiological inspection master code (e.g., the JJ1017-32 code for classification of radiological imaging procedure information in Japan). This code can classify abstract anatomical structures. However, it cannot classify diseases and detailed anatomical structures. If radiological imaging procedure information can be associated with such data, medical technologists can use digital radiological imaging procedure information. In this study, by using an ontology, we examined the CT (Computed Tomography) imaging procedure concepts associated with the ICD-10 code (International Classification of Diseases-10 code, Ver. 2003) for classification of diseases, and the FMAID (Foundational Model of Anatomy Identifier) for classification of anatomical structures. In addition, we incorporated this concept into the ontology-driven application and tried to access the ontology on CT imaging procedures.

Keywords: Ontology, CT, ICD, FMA, JJ1017

1 Introduction

Recently, radiological medical information has been managed by machine-readable codes, which has promoted sharing and reusing of knowledge resources. However, it is still difficult to share and reuse radiological imaging procedure information. In this study, we used an ontology to share and reuse such information. An ontology can be used to clarify relationships between concepts and digitize them.

2 Materials and Methods

Among the CT (Computed Tomography) ontologies that we have previously constructed, an ontology for CT imaging procedure concepts was built based on the contents of the CT imaging guidelines in Japan. We constructed the CT imaging procedure ontology structure with the ICD-10 code (International Classification of Diseases-10 code, Ver. 2003) and the FMAID (Foundational Model of Anatomy Identifier) as attributes of the corresponding knowledge structure of this concept. In addition, we created the ICD-10 code ontology in the Hozo ontology XML format, using the Hozo ontology editor (http://www.ei.sanken.osaka-u.ac.jp/hozo/). Next, we compared the information granularity of the previous CT imaging procedure concepts, classified by the JJ1017-32 code (Ver. 3.2), with that of the new CT imaging procedure concepts.

Further, we verified the CT imaging procedure concepts in the CT ontology (e.g., CT imaging procedure concepts targeting primary hepatocellular carcinoma) using the ontology-driven application (Hozo-OAT expansion program).

3 Results

The number of the anatomical structural concepts of the JJ1017-32 code was 423. On the other hand, the total number of the concepts of the FMA (Foundational Model of Anatomy) ontology was 84,009. The disease concept could not be classified using the JJ1017-32 code, while the number of the ICD-10 code concepts that may be adapted to radiological inspection in Japan was 7,223. Therefore, the granularity of radiological imaging procedure information of the new CT imaging procedure concepts greatly increased from that of the previous CT imaging procedure concepts. In addition, we identified CT imaging procedure information from these codes using the ontology-driven application.

4 Discussion

We were able to expand the semantic space of this ontology by incorporating other ontologies. Thus, radiological imaging procedure information can be classified by diseases and detailed anatomical structures more efficiently with the ontology than with the JJ1017-32 code. Therefore, we expect improved access to medical information using the radiological imaging procedure ontology developed in this research.

5 Conclusions

In order to manage radiological imaging procedure information, we proposed the CT imaging procedure concept ontology associated with diseases and anatomical structures. This ontology has strengthened the semantics compared with the previous CT imaging procedure concepts. Moreover, we incorporated this ontology into the ontology-driven application and demonstrated access to it.

References

1. Mejino JL, Rubin DL, Brinkley JF: FMA-RadLex: An application ontology of radiological anatomy derived from the Foundational Model of Anatomy reference ontology. AMIA Annu Symp Proc 2008, 465-469 (2008)
2. Rubin DL, Flanders A, Kim W, Siddiqui KM, Kahn CE Jr: Ontology-assisted analysis of web queries to determine the knowledge radiologists seek. J Digit Imaging 24(1), 160-164 (2011)



Reasoning Approaches for Nominal Schemas*

Cong Wang, Adila Krisnadhi, David Carral Martínez and Pascal Hitzler

Kno.e.sis Center, Wright State University, Dayton OH 45435, USA

Abstract. Nominal schemas are a new DL constructor which can be used like "variable nominal classes" within axioms. This feature allows DL languages to express arbitrary DL-safe rules in their native syntax. In this paper we summarize several reasoning approaches recently devised to reason over nominal schemas. Although we have made some progress, there are still interesting challenges to be solved.

1 Introduction

Nominal schemas are a new constructor that enhances the expressivity of the DL paradigm. Extended with nominal schemas, DL fragments can encompass the expressivity of DL-safe SWRL, arbitrary Datalog rules, and even some non-monotonic rules [1]. Although the inclusion of this new constructor does not worsen the complexity of the DL languages, the development of practical reasoning algorithms for nominal schemas is a challenging task.

Many traditional algorithms require normalization of DL axioms, but nominal schemas can be used to represent arbitrarily complex rules and hence prevent us from achieving normal forms. A naive full-grounding algorithm can solve this issue by replacing the nominal schemas in an axiom with all possible combinations of named individuals contained in a given ontology. Although this approach is complete, it usually results in a huge increase in the number of axioms, making reasoning impractical. We have therefore started to study some "smart" approaches that could limit the grounding of nominal schemas, reducing the overhead of full grounding.
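To see why full grounding blows up, consider the following Python sketch (our own illustration; axioms are represented as plain strings with `{z}`-style variables, which is a simplification of the DL syntax). Grounding an axiom with v nominal-schema variables over k named individuals produces k^v grounded axioms:

```python
from itertools import product

def full_grounding(axiom_template, variables, individuals):
    """Naive full grounding: substitute every combination of named
    individuals for the nominal-schema variables of an axiom."""
    grounded = []
    for combo in product(individuals, repeat=len(variables)):
        axiom = axiom_template
        for var, ind in zip(variables, combo):
            # replace the variable nominal {var} by the nominal {ind}
            axiom = axiom.replace("{" + var + "}", "{" + ind + "}")
        grounded.append(axiom)
    return grounded
```

An axiom with two variables over an ontology with a thousand named individuals would thus already yield a million grounded axioms, which is why the smarter, delayed-grounding approaches below matter.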

2 New Algorithms

Tableau Based Algorithms. We have defined a modification [2] which extends standard tableau algorithms with grounding rules in such a way that grounding can be delayed until required. For example, considering C ⊑ ∃R.∃S.{z}, we can delay the grounding of z until after applying the ∃-rule to ∃R.∃S.{z}. While this new algorithm provides a more flexible way of grounding, it needs good heuristics that could speed up the grounding in practice (see details in [2]).

* This work was supported by the National Science Foundation under award 1017225 III: Small: TROn - Tractable Reasoning with Ontologies.


Resolution Based Algorithms. As opposed to the previous approach, the resolution calculus, in which grounding is handled on the fly via unification, can potentially reduce the amount of grounding. The proof of termination usually relies on saturation over limited types of clauses, but in many cases nominal schemas can lead to many complicated clauses. The resolution-based algorithm successfully addresses this problem by using a lifting lemma showing that resolution on nominal-schema axioms takes fewer resolution steps than performing resolution on fully grounded knowledge bases (see details in [4]). However, it is believed that the resolution procedure is hard to optimize so as to reduce the number of irrelevant clauses produced. Moreover, we do not yet know how to handle role chains.

Hypertableau Based Algorithms. The tableau and resolution calculi may both be insufficiently efficient to deal with nominal schemas. The hypertableau calculus, combining the mechanisms of the tableau and resolution calculi, can reduce nondeterminism and the size of the constructed models. It takes clauses into which nominal schemas can be easily translated, and all such clauses are Horn-like, so that nondeterminism can be reduced. However, the blocking condition and the ∃-rule still have to be carefully modified in order to ensure termination.

More. One may also consider the approach in [3] of translating DL axioms into a set of Datalog rules. The remaining challenge still lies in the normalization issue. Consider an axiom with a complex concept on the right-hand side, such as ∃R.{z} ⊑ ∃S.(∃T.{z} ⊓ C). It is not straightforward to see how to translate it.

3 Conclusions

We presented an interesting and challenging reasoning problem in this paper. Our previous attempts are not fully satisfying, but we foresee some more promising approaches, and one can proceed in addressing the problem by borrowing ideas from them.

References

1. Knorr, M., Martínez, D.C., Hitzler, P., Krisnadhi, A.A., Maier, F., Wang, C.: Recent advances in integrating OWL and rules (technical communication). In: Krötzsch, M., Straccia, U. (eds.) RR. Lecture Notes in Computer Science, vol. 7497, pp. 225-228. Springer (2012)

2. Krisnadhi, A., Hitzler, P.: A tableau algorithm for description logics with nominal schema. In: Krötzsch, M., Straccia, U. (eds.) RR. Lecture Notes in Computer Science, vol. 7497, pp. 234-237. Springer (2012)

3. Krötzsch, M.: Efficient rule-based inferencing for OWL EL. In: Walsh, T. (ed.) Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI'11), pp. 2668-2673. AAAI Press/IJCAI (2011)

4. Wang, C., Hitzler, P.: A resolution procedure for description logics with nominal schemas. In: Proceedings of the 2nd Joint International Semantic Technology Conference, JIST 2012, Nara, Japan, Dec 2012. Lecture Notes in Computer Science, Springer, Heidelberg (2012), to appear


A Logical Model for Taxonomic Concepts for Expanding Knowledge using Linked Open Data

Rathachai Chawuthai1, Hideaki Takeda2, Vilas Wuwongse3, and Utsugi Jinbo4

1 Asian Institute of Technology, Prathumtani, Thailand, [email protected]
2 National Institute of Informatics, Tokyo, Japan, [email protected]
3 Thammasat University, Prathumtani, Thailand, [email protected]
4 National Museum of Nature and Science, Tokyo, Japan, [email protected]

Abstract. The variety of classification systems and new discoveries by taxonomists lead to the diversity of biological information, especially taxon concepts. The association among taxon concepts across research institutes is very difficult to establish, because there is no single interpretation of the name of a taxon concept. Owing to this difficulty, further expansion of biological knowledge is very complicated for taxonomists when they deal with many sources of data or ambiguous concepts. In order to link relevant taxon concepts across research repositories, it is also necessary to consider the precise context of biological data. As a result, we propose a logical model for taxon concepts in the Resource Description Framework (RDF). Moreover, we implement a prototype to demonstrate the feasibility of our approach. We have found that our model can publish taxon information as linked data and hence gain additional benefits from the Linked Open Data (LOD) cloud.

Keywords. Logical model, Linked data, Ontology, Taxon Concept

1 Background

More than 1.4 million species throughout the world have been described and classified with appropriate names depending upon their characteristics, such as morphological characters, living behaviors, DNA sequences, etc. [1-2]. Many taxonomists have studied living things, researched, and published their knowledge for over a hundred years. However, before the information age, their research was not completely shared among researchers around the world. Researchers might have different perspectives when classifying and naming their specimens, so the same species may be classified and named differently [2]. For example, Papilio xuthus, the Chinese yellow swallowtail butterfly, has been given at least four names by four taxonomists. As a result, it is inconvenient for scholars and researchers to study all information about one living thing from a single taxonomic name.


2 The Proposed Logical Model

Our paper presents a logical model and ontology for linking taxon concepts which

comprises a series of changes, the diversity of taxonomic classifications, and the vari-

ety of naming. For the purpose of linking data, we have developed our model by em-

ploying ontology of contextual knowledge evolution together with some widely ac-

cepted ontology such as LODAC [3] and SKOS [4]. This research proposed some

useful operations that specify the changes of taxon concepts; for instance, change

taxonomic hierarchy, rename, merge, replace, split, etc. We also introduced some

kinds of links between taxon concepts, for example, common name, correct spelling,

homonym, junior synonym, senior synonym, etc. The model is expressed in an ontol-

ogy named Linked Taxonomic Knowledge (LTK). Moreover, we enhance the Con-

textual Knowledge for Archives (CKA) ontology in order to deal with both dynamic

and static information represented in RDF and hence the history of the taxon concept

can be traced back [5]. For example, two genera of owls named Nyctea and Bubo

have been merged into the latter genus Bubo. Following the change of genera, the

scientific name of a snowy owl Nyctea scandiaca has been replaced by Bubo scandi-

acus in order to satisfy the convention of scientific name [6-7]. These facts will be

presented in RDF that satisfies the logical model from the CKA approach as follows:

  ex:even1999 cka:interval [ tl:beginAtDateTime "1999" ] ;
      cka:assure ex:mg1, ex:rp1 .

  ex:mg1 rdf:type ltk:MergingTaxonConcept ;
      cka:conceptBefore genus:Bubo, genus:Nyctea ;
      cka:conceptAfter genus:Bubo_1999 .

  ex:rp1 rdf:type ltk:ReplacingTaxonConcept ;
      cka:conceptBefore species:Nyctea_scandiaca ;
      cka:conceptAfter species:Bubo_scandiacus .

In practice, we implement the prototype with inference rules, so it can trace back the changes of these species' names caused by the merging of the two genera. In addition, the application provides links between the relevant data and the data in LODAC. As a result, we found that our LTK model is feasible and suitable for collecting the changes of taxon concepts and establishing links to relevant concepts across research institutes in order to expand knowledge about taxon concepts.

References

1. Darwin, C., Peckham, M.: On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. Penn Press, Philadelphia (1959)
2. Winston, J. E.: Describing Species: Practical Taxonomic Procedure for Biologists. Columbia University Press, New York (1999)
3. Linked Open Data for Academia, http://lod.ac/
4. SKOS Reference, http://www.w3.org/2009/08/skos-reference/skos.html
5. Chawuthai, R., Wuwongse, V., Takeda, H.: A Formal Approach to the Modeling of Digital Archives. In: ICADL'12, Taipei (2012)
6. Wink, M., Heidrich, P.: A Guide to Owls of the World. Yale University Press (1999)
7. International Code of Zoological Nomenclature, http://www.iczn.org


EARTHQUAKE ONTOLOGY AND RESEARCHES ON EARTHQUAKE PREDICTION

David Ramamonjisoa
Faculty of Software and Information Science, IPU, 152-52 Sugo, Takizawa, Iwate, Japan

ABSTRACT

This paper describes the method and development of an earthquake prediction ontology. In my previous work, I developed ontologies for earthquakes; I extend that work here to construct the earthquake prediction ontology. The aim is to build an ontology as complete and correct as possible from freely available databases and textbooks, on top of the existing middle-layer SWEET ontologies. The current status of the project and a preliminary result of the model are presented.

1. INTRODUCTION

Ontologies can be defined as machine-interpretable definitions of domain concepts and the interrelations between those concepts, representing domain knowledge. I am following the ontology methodology called the V-model and the ontology building life-cycle (Stevens 2001 [1]) to build the ontologies. The domain studied in this paper concerns earthquakes (EQs) and their prediction knowledge. The purpose is to acquire the best knowledge by analyzing available expert research publications and web data. The earthquake ontology I developed previously was constructed semi-automatically using text mining techniques and natural language processing [2].

2. APPROACH

In this paper, I present the construction of the earthquake prediction ontology based on the SWEET ontologies (http://sweet.jpl.nasa.gov/ontology/) (Raskin and Pan, 2005 [3]) and web data mining.

2.1. Methods of ontology development

a) Find ontologies concerning the domains of faults, earthquakes, the Earth, astronomy, and tectonic plates
b) Encode the prediction part within those ontologies
c) Conceptualize terms within earthquake prediction research publications
d) Validate the ontology
e) Share it with the community


I also follow the method developed by the ontologist Prof. Barry Smith [5], under which every node in the ontology should represent both universals (terms used in a plurality of sciences to designate entities) and the corresponding instances in reality, as well as Ontology Development 101 [6].

3. EARTHQUAKE PREDICTION ONTOLOGY

EQ predictions can be classified into three categories: long-term prediction, medium-term prediction, and imminent/short-term prediction. Long- and medium-term predictions are based on geological data and motions of tectonic plates. Imminent predictions focus on cloud patterns, animal behaviors, P-waves, and weather anomalies [4]. The following is an example of the ontology:

Disjoint classes: Long-term prediction / Medium-term prediction / Imminent prediction
EQ Prediction has_a: latitude, longitude, time, magnitude range, number of EQs in the given range, depth range, focal mechanism, percentage of success rate, heuristics
New slots for subclasses: time frame, fault type, geological data, recurrence time, foreshocks, aftershocks, possible_intensity, ...
Long-term prediction has_a { Time_frame = [2 years ~ 100+ years], recurrence_time = 150 years, fault_type = active, foreshocks = ..., geological_data = interplates, possible_intensity = M > 8.0, ... }
Medium-term prediction has_a { Time_frame = [month ~ 2 years], fault_type = active, foreshocks = ..., geological_data = highly strained rock strata, ... }
Imminent prediction has_a { Time_frame = [seconds ~ month], fault_type = active, foreshocks = ..., P_waves, cloud formation, weather anomalies, animal behavior anomalies, ... }

Other terms used in EQ prediction are 'elastic rebound theory', 'slow strain monitoring', co-seismic changes, slip-predictable models, non-Poissonian models, the 'seismic gap hypothesis', GPS monitoring, the lithosphere, the ionosphere, tides by Moon phases, Sun CMEs, and volcanic activity.

4. DISCUSSION

Concepts in SWEET 2.3 partially cover the EQ ontology and the slots in the EQ prediction ontology. I used phenGeoSeismicity.owl, phenGeolTectonic.owl, phenVolcanic.owl, phenStar.owl and phenGeolFault.owl. Earthquake hazard maps created by www.usgs.org or www.aist.go.jp indicate the locations of active faults and their percentage chance of generating a megaquake or great earthquake (M6+). The most reliable parameter used by experts for predicting earthquakes, and the one that has been validated, is foreshocks. However, not every foreshock induced a big earthquake; experts are still investigating the phenomena.

REFERENCES

1. R. Stevens et al.: Ontology-based Knowledge Representation for Bioinformatics. Brief. Bioinformatics, 2000
2. D. Ramamonjisoa: Development of earthquake ontology. In: Proc. of ANLP, March 2012
3. R. Raskin and M. Pan: SWEET ontologies. JPL and NASA internal report, 2005
4. F.T. Freund: Rocks That Crackle and Sparkle and Glow: Strange Pre-Earthquake Phenomena. Journal of Scientific Exploration, Vol. 17, No. 1, pp. 37-71, 2003
5. B. Smith: Ontology (Science). Nature Precedings. hdl.handle.net/10101/npre.2008.2027.2
6. N. Noy et al.: Ontology Development 101. protege.stanford.edu


Building an RDFized Life Science Dictionary

Yasunori Yamamoto and Shoko Kawamoto

Database Center for Life Science, Bunkyo, Tokyo, Japan {yayamamo,shoko}@dbcls.rois.ac.jp

Abstract. There is a growing need for efficient and integrated access to databases provided by diverse institutions. Using the Resource Description Framework (RDF) to publish a dataset makes it more reusable. Furthermore, providing a dictionary that translates words into another language in RDF is useful when we want to access datasets across a language barrier. Here, we built an RDF version of the Life Science Dictionary (LSD). LSD consists of various lexical resources, including English-Japanese / Japanese-English dictionaries and a thesaurus. Since we believe that LSD is a useful language resource for using multilingual data seamlessly in the life science domain, we expect that its RDF version will make LSD more reusable and therefore contribute to the life science research community.

Keywords. Multi-lingual language resource, Dictionary, RDF

1 Background

To link heterogeneous databases and provide users with access to them in an integrated manner, publishing datasets using the Resource Description Framework (RDF) has increasing appeal to database developers and users. It enables us to access raw data using World Wide Web approaches such as Uniform Resource Identifiers (URIs) and the Hypertext Transfer Protocol (HTTP). In addition, the number of non-English RDF datasets is increasing [1]. Therefore, there is a growing need for cross-language RDF resources to link monolingual RDF datasets of different languages [1]. An example is DBpedia [2], which has made the contents of Wikipedia available in RDF. Wikipedia is the largest open, collaboratively developed encyclopedia project, but it is not necessarily reliable as a translation dictionary in a specific domain. For example, Wikipedia has 149 pages in the category "World Health Organization essential medicines" in English, but only 56 in Japanese [3].

The Life Science Dictionary (LSD) [4] consists of several lexical resources, including English-Japanese / Japanese-English dictionaries and a thesaurus using the Medical Subject Headings (MeSH) vocabulary, the NLM controlled vocabulary thesaurus used for indexing articles for PubMed. LSD has been edited and maintained by the LSD project since 1993, and the project members are experts in the domain. To be used as a complement to DBpedia in the life science domain, we built an RDFized LSD.


2 Methods and Results

We used the latest version (Mar. 2011) of LSD, which contains 110k English and 120k Japanese terms in several tab-delimited plain text files. We developed an ontology of LSD to express its schema in RDF using the Protégé ontology editor [5]. There are ambiguous column names across the tables, and the relationships among them are not clear. The second author knows LSD well, and her knowledge was therefore used to disambiguate them. As a result, we created eight classes and 16 properties. We also used some Simple Knowledge Organization System (SKOS) terms in addition to basic ones from the Web Ontology Language (OWL) vocabulary.

Using this ontology, we built the RDF version of LSD, which has about 5.6M triples. We loaded them into the OWLIM triple store. To improve reliability, we iterated the development cycle: building an ontology, RDFizing LSD, loading the triples into the triple store, and evaluating the result to see if there are any undesirable or unexpected semantic relationships among terms. For example, we verified that two semantically different terms are not connected by a SKOS predicate.
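The evaluation check mentioned above (no SKOS predicate connecting two semantically different terms) can be sketched in a few lines of Python. This is our own illustrative reconstruction, not the project's actual tooling: triples are simplified to tuples, the term IDs are invented, and the curator-declared list of distinct term pairs is an assumed input.

```python
def suspicious_skos_links(triples, distinct_pairs):
    """Return curator-declared distinct term pairs that are nonetheless
    connected by a SKOS predicate -- candidates for manual review."""
    skos_preds = {"skos:exactMatch", "skos:closeMatch",
                  "skos:broader", "skos:narrower", "skos:related"}
    # set of (subject, object) pairs linked by any SKOS predicate
    linked = {(s, o) for s, p, o in triples if p in skos_preds}
    return [(a, b) for a, b in distinct_pairs
            if (a, b) in linked or (b, a) in linked]
```

An empty result over the whole triple set would correspond to the verification the authors describe.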

3 Conclusions

We built an RDFized LSD, which is freely accessible from http://purl.jp/bio/10/lsd/sparql under the Creative Commons Attribution-NoDerivs 2.0 Generic license (CC BY-ND 2.0). We are using this dictionary to provide our cross-language search service, and we hope it will be widely used to exploit multilingual life science resources seamlessly.

Acknowledgements. We thank Dr. Shuji Kaneko for permitting us to release LSD under CC BY-ND 2.0. This work is funded by the Integrated Database Project, Ministry of Education, Culture, Sports, Science and Technology of Japan, and the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency (JST).

References

1. Gracia, J., et al.: Challenges for the multilingual Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web, 11, 63–71 (2012)
2. Lehmann, J., et al.: DBpedia – a crystallization point for the web of data. Journal of Web Semantics, 7 (3), 154–165 (2009)
3. http://en.wikipedia.org/wiki/Category:World_Health_Organization_essential_medicines
4. Kawamoto, T., et al.: Life Science Dictionary: statistical and collocational analyses of life science English. 20th IUBMB International Congress of Biochemistry and Molecular Biology and 11th FAOBMB Congress, Kyoto (2006)
5. Protégé project, http://protege.stanford.edu


Multilayer of Ontology-Based Floor Plan Representation for Ontology-Based Indoor Emergency Simulation

Chaianun Damrongrat, Hideaki Kanai, and Mitsuru Ikeda

School of Knowledge Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan 923-1211

{chaianun.d,hideaki,ikeda}@jaist.ac.jp


Abstract. We propose a multilayered ontology-based floor plan representation that supports the study of indoor emergencies via ontology-based simulation. Our approach uses an ontology to model a floor plan from various perspectives, e.g., structure and accessibility, which are represented by multiple layers of graphs. The model has two main advantages. First, it can handle dynamic situations and the consequences of an emergency using the ontology and inference rules. Second, the multilayered graph-based representation describes how the simulation unfolds under given incident scenarios.

Keywords: Multilayered floor plan representation, ontology-based modeling, emergency situation

1 Methodology

Modeling a floor plan with an ontology is useful for sharing, reuse, and meaningfulness. OntoNav [1] modeled a floor plan from a navigation perspective, but it cannot adequately handle dynamic situations caused by incidents. This research addresses that issue. An ontology is used to capture floor plan concepts in various "Perspective" classes; other concepts, such as the "StructureComponent" and "FunctionalRequirement" classes, are also defined. We then use this ontology and a set of inference rules to describe the relationships among these perspectives and link them, and also to detect dynamic situations and their consequences. Fig. 1 shows part of our ontology design. To evaluate the proposed idea, we set up an example scenario: "When an unexpected situation happens, the power control room cannot supply electricity to any appliance or other place. Consequently, the elevators cannot operate for any purpose, including escape." In this scenario, we can see how the "PowerControl perspective" causes consequences in the "Accessibility perspective." Our method not only deals with this kind of consequence but also represents the current situation of the simulation with layers of graphs. One


Fig. 1. Example of ontology design describing a floor plan.

graph represents one perspective. This lets us notice at a glance what happens in which perspective, making the monitoring of simulation results more convenient than a text-based approach. Our method improves on the existing graph-based representation [2], which represented all perspectives in a single graph; even a few perspectives may carry too much information to represent in one graph. For this reason, our method makes the situation easier to understand.
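The layered representation and a consequence rule of the kind described can be sketched as follows; all node names and the rule itself are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of the multilayered representation: one graph (edge set) per
# perspective, plus a rule linking them. All node/rule names are illustrative.
layers = {
    "PowerControl":  {("PowerRoom", "Elevator"), ("PowerRoom", "Lighting")},
    "Accessibility": {("Floor1", "Elevator"), ("Elevator", "Floor2"),
                      ("Floor1", "Stairs"), ("Stairs", "Floor2")},
}

def power_failure_rule(layers):
    """If PowerRoom supplies nothing, drop elevator edges from Accessibility."""
    if not any(src == "PowerRoom" for src, _ in layers["PowerControl"]):
        layers["Accessibility"] = {e for e in layers["Accessibility"]
                                   if "Elevator" not in e}

# Incident: the power control room goes down.
layers["PowerControl"] = set()
power_failure_rule(layers)
print(sorted(layers["Accessibility"]))  # only the stairs route remains
```

Because each perspective is its own edge set, an observer can see at once that the incident changed the Accessibility layer while, say, a Structure layer would remain untouched.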

2 Discussion and Future Work

This work proposes a multilayered ontology-based floor plan representation for ontology-based indoor emergency simulation. An ontology is used to capture a floor plan's perspectives and to link the relationships among them; the perspectives are represented by multiple layers of graphs. The scenario in the previous section shows two advantages of our approach: dynamic-situation handling and better representation. In future work, we plan to combine this concept with a human ontology and a simple simulation of an emergency situation.

References

1. V. Tsetsos, C. Anagnostopoulos, P. Kikiras, P. Hasiotis, and S. Hadjiefthymiades: A human-centered semantic navigation system for indoor environments. In: Proceedings of the International Conference on Pervasive Services (ICPS '05), pages 146–155. IEEE, 2005.

2. Lisa Walton and Michael F. Worboys: An algebraic approach to image schemas for geographic space. In: COSIT, pages 357–370, 2009.


An Ontology Model and Service for Managing Scientific and Common Names of Plants

Jouni Tuominen, Nina Laurenne, and Eero Hyvönen

Semantic Computing Research Group (SeCo)
Aalto University School of Science, Dept. of Media Technology, and
University of Helsinki, Dept. of Computer Science
http://www.seco.tkk.fi, [email protected]

1 Introduction

Scientific names of plants and animals play a major role when indexing, querying, and integrating information about species. Biologists use scientific names, while the vast majority of people use their common name equivalents. Contrary to common belief, neither scientific nor common names identify organisms unambiguously, as one name may point to multiple species and one species may have multiple names. This is a problem when combining data from heterogeneous sources covering applied biological sciences and cultural contents. The scientific name system differs significantly from common names, but both change over time.

Machine-processable ontologies provide a solution for managing parallel views, multiple meanings, and changing information of biological names. They allow unambiguous reference to organisms and semantic enrichment of biological contents. We present an ontology model for managing common names of organisms and linking them to scientific names, and a use case of the model for maintaining and publishing Finnish vascular plant names as Linked Open Data.

2 Results

The ontology model for the common names is based on the TaxMeOn meta-ontology [1] for biological names. The simplified structure of the model is presented in Fig. 1, where the core classes are Scientific name, Common name, and their statuses. The status of a Scientific name indicates whether the name is an accepted one, a synonym, etc. The Common names (in one or more languages) that refer to the same species are connected through a Scientific name.

The model supports the approval process of the common names: 1) a new name is proposed; 2) the name is accepted and its usage becomes recommended; 3) the name may become an alternative if a more suitable name is introduced. The temporal management of the names is based on time stamps created in the approval process.
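The three-step approval process with time stamps could be modeled along the following lines; the class and field names are our own illustrative choices, not TaxMeOn's actual vocabulary.

```python
# Illustrative sketch of the name-approval process; field names are assumed.
import datetime

class CommonName:
    def __init__(self, label):
        self.label = label
        self.status_history = []          # list of (status, timestamp)
        self._set_status("proposed")      # step 1: a new name is proposed

    def _set_status(self, status):
        # Each status change is time-stamped, enabling temporal management.
        self.status_history.append((status, datetime.datetime.now()))

    def accept(self):                     # step 2: usage becomes recommended
        self._set_status("accepted")

    def make_alternative(self):           # step 3: a better name was introduced
        self._set_status("alternative")

    @property
    def status(self):
        return self.status_history[-1][0]

name = CommonName("metsäkurjenpolvi")     # example Finnish plant name
name.accept()
print(name.status)                        # accepted
print(len(name.status_history))           # 2 time-stamped status changes
```

In the actual model these statuses are RDF resources linked to Common name instances, so the same history can be queried via SPARQL rather than held in an object.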

We applied the ontology model to a database of the Finnish names of plants created by the Finnish biology association Vanamo1. The database contains ca.

1 http://www.vanamo.fi/


Fig. 1. The ontology model for the common names of organisms. The ellipses represent classes and the arrows depict relations between the classes.

26,000 common names of plants and was originally only in internal use by the association. The database was converted into RDF format based on the ontology model.

The resulting ontology of common names of plants is managed in the SAHA metadata editor2 for collaborative content creation. SAHA also provides a SPARQL endpoint for using the plant names as a service. The ontology is published as Linked Open Data in the Finnish Ontology Library Service ONKI3 [2], which provides user interfaces and APIs for accessing and using the plant names in applications.

The ontology is used by several cultural museums and libraries for annotating collections. The model has been adopted as a use case in the research program ENVIROFI4.

Acknowledgments This work is part of the National Semantic Web Ontology project in Finland FinnONTO5 (2003-2012), funded mainly by the National Technology and Innovation Agency (Tekes) and a consortium of 38 public organizations and companies, and the EU funded research program ENVIROFI. We thank Leo Junikka and Arto Kurtto for their collaboration.

References

1. Tuominen, J., Laurenne, N., Hyvönen, E.: Biological names and taxonomies on the semantic web – managing the change in scientific conception. In: Proceedings of the ESWC 2011, Heraklion, Greece. pp. 255–269. Springer-Verlag (2011)

2. Viljanen, K., Tuominen, J., Hyvönen, E.: Ontology libraries for production use: The Finnish ontology library service ONKI. In: Proceedings of the ESWC 2009, Heraklion, Greece. pp. 781–795. Springer-Verlag (2009)

2 http://www.seco.tkk.fi/services/saha/
3 http://onki.fi/en/browser/overview/kassu
4 http://www.envirofi.eu/
5 http://www.seco.tkk.fi/projects/finnonto/


PromoterCAD: Data Driven Design of Plant Regulatory DNA

Robert Sidney Cox III1, Koro Nishikata1, Minami Matsui2, Tetsuro Toyoda1

1RIKEN Bioinformatics and Systems Engineering Division, RIKEN Yokohama Institute 2RIKEN Plant Science Center, Functional Genomics Unit, RIKEN Yokohama Institute

1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 JAPAN Tel: +81-45-503-9111 ext. 4252 [email protected]

We have collected regulatory sequence and gene expression data for Arabidopsis thaliana into a linked data system hosted by LinkData.org, then created a CAD environment for using this data to design synthetically regulated plant promoters. This environment is written in JavaScript and hosted by the companion site App.LinkData.org. There are several advantages of this system. (1) Non-experts can use the CAD environment and linked data to design functional DNA sequences with a fast learning cycle. (2) JavaScript programmers can readily fork and extend functions of the CAD environment with modular functions that perform queries on the data and editing operations on the DNA sequences. Each module searches for different gene properties such as gene expression level in a particular plant tissue or tissues, or phase and amplitude of circadian oscillations. (3) Users can easily upload additional promoter data for use in the CAD environment. Biologists familiar with particular regulatory sequences can easily add them to the design. Researchers can upload additional gene expression and regulatory motif data to generate alternate hypothesis designs. (4) Researchers can perform novel analyses on the linked data, and add derivative data for use by the function modules. Mashups of cis-regulatory sequence databases and gene expression databases (e.g. AtGenExpress, DIURNAL) allow the user to perform advanced queries. Several example promoter designs are being characterized experimentally by Firefly Luciferase expression in Arabidopsis.

1. Regulatory DNA design for plant biomass production

A major obstacle to producing biomass from transgenic plants is growth inhibition from recombinant gene expression. This could be overcome by limiting gene expression to a specific tissue type or time of growth. We demonstrate a CAD system built on linked genomic and expression databases to design synthetic plant promoters for the control of gene expression in Arabidopsis thaliana.


2. LinkData allows mashups between different data sources

We collated previously published genomic and transcriptomic data, including information on 21,000 genes from Arabidopsis thaliana and 1,410,000 microarray measurements covering 20 growth conditions and 79 tissues, organs, and developmental stages. These data included gene expression measurements for both plant tissue development and circadian gene response. To each gene's expression data, we added regulatory sequences in the promoter predicted by two bioinformatic methods.

3. PromoterCAD software incorporates LinkData and visualization into the DNA design process

PromoterCAD provides a user interface for plant promoter design. The user selects a plant tissue and developmental stage or a circadian expression condition. Then, the user selects a function module, which performs a search across all genes for a particular experimental property of the data, such as high expression in a specific tissue or at a specific time of day. More complicated function modules allow the user to search for genes displaying the largest circadian amplitude within a given environmental growth condition (such as light cycle, temperature, or nutrition), or to find genes displaying large relative differences in expression between multiple tissues. Since LinkData is a rapid-development JavaScript environment, our functions can also be easily modified to create new search functions for the PromoterCAD system. The function modules serve as a data interface for the user to find genes with useful expression profiles.

Once a gene of interest is selected, regulatory motifs can be compiled into an appropriate position of a synthetic promoter. These can be placed in the same position as the natural gene (default), or the user can select a custom position. The cycle of function query, gene and regulatory sequence retrieval, and regulatory motif replacement can be continued to add more functions to the synthetic promoter design.
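A function module of the kind described, searching all genes for a property such as tissue expression or circadian amplitude, might look like the sketch below. The expression values and field names are invented for illustration, and the real PromoterCAD modules run as JavaScript over LinkData rather than the Python shown here.

```python
# Sketch of PromoterCAD-style function modules over invented gene records.
genes = [
    {"id": "AT1G01060", "tissue": {"leaf": 820.0, "root": 35.0},
     "circadian_amplitude": 410.0},
    {"id": "AT2G46830", "tissue": {"leaf": 150.0, "root": 90.0},
     "circadian_amplitude": 980.0},
    {"id": "AT5G61380", "tissue": {"leaf": 15.0, "root": 600.0},
     "circadian_amplitude": 120.0},
]

def highest_expression(genes, tissue):
    """Function module: gene with maximal expression in the given tissue."""
    return max(genes, key=lambda g: g["tissue"].get(tissue, 0.0))

def largest_amplitude(genes):
    """Function module: gene with the largest circadian amplitude."""
    return max(genes, key=lambda g: g["circadian_amplitude"])

print(highest_expression(genes, "root")["id"])  # AT5G61380
print(largest_amplitude(genes)["id"])           # AT2G46830
```

The design cycle in the text then continues from the selected gene: its regulatory motifs are retrieved and placed into the synthetic promoter, and the query step repeats for the next desired function.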

Features of LinkData.org enhance the DNA design process by helping the user understand the results of each design operation. Embedded visualization tools such as HighChart (http://highsoft.com) and the Arabidopsis eFP browser (http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi) allow the user to readily understand gene expression in the different plant tissues and environments. Hyperlinks are provided to the original data sources, including explanations of experimental and analytical methods. Our data processing operations are documented on the LinkData work pages, allowing users to follow and extend our analysis. For users unfamiliar with plant genetics, additional information about the genes, including known functions and associations, is linked to from PromoterCAD.


LinkData.org: synergistically associating an RDF data repository and an application repository stimulates positive feedback of mutually developing data applications

Sayoko Shimoyama, David Gifford, Yuko Yoshida, Tetsuro Toyoda

Bioinformatics And Systems Engineering Division, RIKEN Yokohama Institute 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 JAPAN

Tel: +81-45-503-9111 ext. 4252 [email protected]

New linked open data (LOD) techniques make more original scientific and research data available, and modifiable "Forkable Apps" on shared platforms make analysis programs ready to be re-used. However, data resources and analysis programs are hidden from each other and from external use by the separation of publication and access across data repositories. We designed the LinkData.org platform so anyone can recognize and test new ways to recombine LOD RDF data and Apps created by others with their own original data and analysis tools. Because LinkData.org hosts both data and apps together, GenoCon.org synthetic biology design contest participants can freely utilize the most relevant prior knowledge and properties, as well as recombine previous programming with their own code customized to match their design concepts. The GenoCon.org open data/open design activity demonstrates that this technique allows novel collaboration of data creators and application developers to discover and generate knowledge synergistically, using LinkData.org as an integrated LOD/App web repository and a creation, publication, and sharing platform.

1. Mutual visibility enables recognition of new combinations

The Resource Description Framework (RDF) graph structure data format is the standard for sharing data on the web. Scientists and others need a place to publish RDF data for active reuse and development, so it is important to provide tools to transform data into RDF and to search RDF data, so that they can find the best models created by others using the data's ontological semantic relationships and modify them to create RDF models optimally suited to their own data and research. We constructed the LinkData.org (http://linkdata.org) platform to teach anyone to publish, collaborate, and design analysis programming tools using RDF data. LinkData.org uses tables as an easily understood format to input and create RDF data, and simple forkable programming techniques to create applications. Hosting both data and apps together promotes useful public RDF data exchange between fields and the creation of new interdisciplinary fields. This spreads technology for scientific data to other fields, and educates about RDF techniques in any field. Using LinkData.org, researchers can make better apps that benefit from the best data and applications, and by feedback and synergy create the data appropriate for new applications and their own research.

2. LinkData mutual visibility facilitates GenoCon synthetic biology

LinkData.org functions: (1) To enter original data, beginning users first enter the minimum needed metadata; LinkData.org then generates a customized table-format template Excel file, with RDF properties as column names, for download. The template helps users create RDF using the underlying triple relationships (subject, property, object) defined in an easy-to-understand table; users input their data into the template to create their own data table to upload. (2) LinkData.org suggests standard property options such as Dublin Core, RDFS, FOAF and OBO, and automatically creates a URI from any user-entered text string. (3) LinkData.org automatically converts subjects and properties chosen as literals to URIs even if the user has no knowledge of URIs. (4) Users can easily compare the table data format with a visualization of the RDF graph structure, and re-use pre-existing schemas when optimal to define their own data constructions. GenoCon.org functions: These capabilities make the GenoCon.org (http://genocon.org) synthetic design activity possible for participants from high school students to researchers. (5) LinkData.org provides a public space online to enter the GenoCon.org contest, where participants convert and publish data sets as RDF-format works; anyone can download previous data, templates, and RDF. (6) Data connection visualization helps active use of others' data via linked data: by representing RDF works as nodes, published works can be analyzed and connected to strongly related domains to construct an automatic LOD cloud spanning LinkData.org and external domains. (7) The LinkData.org API function makes it easy for developers to access LinkData.org data externally; external applications can receive data as TSV, RDF, RDF/JSON, and RSS. (8) A public App place, http://app.linkdata.org, provides people who have RDF-using applications but no server with easy-to-use App creation tools for input RDF, and a forum of published, forkable example programs for analyzing and processing GenoCon.org data that participants can easily copy and modify. LinkData.org integrates reports explaining how programs and designs work, for sharing and collaborating with other GenoCon.org community members. LinkData.org thus provides a new framework for collaboration in which the process of optimizing an invention is carried out openly by numerous participants, rather than by members of closed groups.
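The table-to-RDF idea behind the template (column names act as properties, each row describes one subject) can be sketched as follows; the base URI and property names are illustrative assumptions, not LinkData.org's actual conventions.

```python
# Sketch of the table-to-RDF conversion: each row becomes triples about one
# subject URI, with column names as predicates. Names are illustrative.
BASE = "http://linkdata.org/resource/"  # assumed base URI

def row_to_triples(row_id, row, columns):
    """Turn one table row into (subject, property, object) triples."""
    subject = BASE + row_id
    return [(subject, prop, value)
            for prop, value in zip(columns, row) if value]  # skip empty cells

columns = ["dc:title", "dc:creator", "rdfs:label"]
triples = row_to_triples("work1", ["Sunflowers", "V. van Gogh", "painting"],
                         columns)
for t in triples:
    print(t)
```

This is the sense in which a table is "an easily understood format to input and create RDF data": the triple structure is implicit in the row/column layout, so users never write RDF syntax by hand.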

Reference

[Toyoda 2011] T. Toyoda, et al.: Methods for Open Innovation on a Genome – Design Platform Associating Scientific, Commercial, and Educational Communities in Synthetic Biology, Methods in Enzymology, Vol. 498, pp. 189–203 (2011)


Qualitative Description of Time based on Ontology in the Domain of Apoptosis Signaling

Chihiro Yamauchi, Kazuaki Kojima, and Tatsunori Matsui

Graduate School of Human Science, Waseda University, Mikajima 2-579-15, Tokorozawa, SAITAMA, 359-1192, JAPAN

Keywords: apoptosis, ontology, qualitative simulation, causality, time concept

1 Introduction

Adopting an ontological method enables translation of qualitative knowledge in the biological literature into computationally readable data. Although some ontologies have been proposed and implemented in biology domains [1], no method to construct simulations driven by the biological knowledge in these ontologies has been established yet. De Beule [2] has proposed an ontology for temporal concepts, which aims to investigate the syntax and semantics of linguistic constructions about time. However, this ontology does not include concepts, such as time scales, needed to describe chains of events involved in different layers. To overcome this issue, this study proposes a method to implement an ontological simulation of a biological phenomenon called apoptosis. In building our ontology for apoptosis, we adopted qualitative physics to represent concepts of time, which enables understanding of causal relationships and temporal intervals between concepts in specific contexts. We then performed small simulations to reproduce an instance of apoptotic signal transduction using our method.

2 Qualitative Description of Time for Ontology-based Simulation

To begin with, we built an ontology for one type of apoptosis signaling (Fig. 1). We incorporated knowledge collected from the biology literature into our ontology using the ontology-building tool Hozo.

In ontologies, concepts and the relationships among those concepts are described without any quantitative values. Thus, to construct an ontology-based simulation, it is necessary to add a qualitative description of time to the ontology, introducing time concepts such as lengths of time and temporal transitions that represent relationships of concepts in the ontology. Therefore, we define categories of time flows, which indicate flows of actions in events in different layers. Our apoptosis signaling ontology identifies four time scales for indexing causality among concepts: (1) Simultaneous time scales indicate relationships between concepts of minimal events, in which the events occur simultaneously but causalities among them are assumed; (2) Local Intra-Component time scales indicate


Fig. 1. The ontology in the domain of apoptosis

relationships in minimal events, in which each pair of events has an explicit causality; (3) Intra-Component time scales indicate relationships in local events; and (4) Inter-Component time scales indicate relationships in global events. Each event involved in a system plays one of two causal roles: a cause or an effect. However, because it is impossible to identify causality from the causal specification alone, we adopt correspondence relations among class constraints to identify causality.

To verify our method, we performed two sets of simulations that reproduce the Fas signaling pathway, one type of apoptosis, based on the ontology we developed. One set used the complete version of our ontology, whereas the other used a defective one produced by intentionally removing some pieces of knowledge. The results demonstrated the validity of our method by successfully reproducing the chemical chain reactions. They also indicated that simulations based on our framework may predict unknown factors and their relationships in apoptosis.
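The paired runs (complete versus defective ontology) can be illustrated with a minimal causal-chain sketch; the rule table below only loosely follows the Fas pathway and is an illustrative stand-in, not the authors' ontology or simulator.

```python
# Sketch of the two simulation runs: a chain of cause->effect rules stands in
# for the ontology; deleting one rule models the "defective" version.
complete_rules = {
    "FasL_binds_Fas": "FADD_recruited",
    "FADD_recruited": "caspase8_activated",
    "caspase8_activated": "caspase3_activated",
    "caspase3_activated": "apoptosis",
}

def simulate(rules, start):
    """Follow cause-effect links until no rule fires."""
    chain, event = [start], start
    while event in rules:
        event = rules[event]
        chain.append(event)
    return chain

defective_rules = dict(complete_rules)
del defective_rules["caspase8_activated"]  # knowledge intentionally removed

print(simulate(complete_rules, "FasL_binds_Fas")[-1])   # apoptosis
print(simulate(defective_rules, "FasL_binds_Fas")[-1])  # chain stops early
```

Where the defective run halts, the gap itself points at the missing knowledge; this is the sense in which such simulations may suggest unknown factors.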

3 Ongoing Works

Our simulations succeeded in the small domain of apoptosis; however, the generality of our method is limited, and it cannot pose hypotheses about unknown factors arising in a simulation. Therefore, we plan to expand our method to compose more competent simulations. To this end, we are extending the ontology from apoptosis to programmed cell death in general, and altering it to a device ontology by incorporating functional concepts.

References

1. Takai, T., Mizoguchi, R.: Ontological Integration of Data Models for Cell Signaling Pathways by Defining a Factor of Causality Called 'Signal'. Genome Informatics, vol. 15, pp. 255–265 (2004)

2. De Beule, Joachim: Simulating the syntax and semantics of linguistic constructions about time. Evolutionary Epistemology, Language and Culture, Theory and Decision Library A, vol. 39, pp. 407–428 (2006)


Generating LOD from Web: A Case Study on Building Integrated Museum Collection Data

Fuyuko Matsumura1, Fumihiro Kato2, Tetsuro Kamura3,4, Ikki Ohmukai1,3, and Hideaki Takeda1,3

1 National Institute of Informatics, {fuyuko, i2k, takeda}@nii.ac.jp
2 Research Organization of Information and Systems, [email protected]
3 The Graduate University for Advanced Studies
4 Tokyo University of the Arts, [email protected]

Abstract. In this paper, a workflow is developed to enable efficient extraction of data from the web and its integration, through the cooperation of web developers and data professionals specialized in a certain field. This paper introduces how we applied the workflow to build Linked Data for "LODAC Museum", a dataset of museum collection data in Japan.

1 Introduction

The Linked Open Data for ACademia (LODAC)1 project has started to integrate and publish academic information of Japan as Linked Open Data (LOD) datasets to enhance interdisciplinary sharing and reuse of various datasets. The Linked Data Integration Framework (LDIF) [2] aims to automatically integrate and map data to common vocabularies using RDF data included in websites; however, most websites do not have RDF data, so generating LOD from websites written in HTML is important for increasing useful LOD. Therefore, a workflow to generate LOD from the web is developed in this paper, characterized by the separation of web programming and metadata mapping. This paper introduces the workflow to generate LOD consisting of key-value pairs, taking "LODAC Museum" [1], a dataset of museum collection data in Japan, as an example.

2 The Workflow for Generation of Linked Data on Museum Collection

Museum collection data are transformed into LODAC Museum by the following steps. The original collection data are obtained by web scraping of museum sites and translated into the Resource Description Framework (RDF).

1. Extracting data from web pages: Collect key-value pair data from web pages of artworks in multiple sources using Apache Nutch and Solr.

2. Mapping vocabularies: Map keys in the extracted data to a common schema by museum professionals.

3. Integrating unique items: Identify the same items (artwork, creator, museum) across museum collections and associate them with single identifiers.

1 http://lod.ac


4. Publishing: Publish data as Linked Data with permalinks that work as identifiers for items, accessible through the SPARQL endpoint.

5. Versioning: Store snapshots of the crawled original HTML files, extracted key-value pairs, and transformed RDF files in the git repository, making it possible to trace the process of data transformation and to retry data generation from the original files.

The main feature of our system is that the key-value pair extraction from museum websites and the metadata mapping are completely separated and can be processed independently, because two different types of professionals are assumed to be needed. Prior to data extraction from HTML, web developers need to find the parts of the HTML that include key-value pairs and write XPath expressions for those parts into the Nutch configuration files. On the other hand, when the extracted key-value pairs are mapped to RDF, it is desirable that professionals in the target domain handle the metadata mapping, because the vocabularies and sentences of the websites can be quite hard to understand for people who are not museum professionals.
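The mapping step, as separated from extraction, can be sketched as follows; the Japanese keys and target properties are invented examples, not the actual LODAC mapping rules.

```python
# Sketch of the mapping step: extracted key-value pairs are turned into
# triples via a mapping table; keys and properties are illustrative.
mapping = {            # maintained by museum professionals
    "作品名": ["dc:title"],
    "作者":   ["dc:creator", "foaf:name"],   # one key, multiple properties
    "制作年": ["dcterms:created"],
}

def to_triples(subject, key_values):
    """Apply the mapping table to one scraped record."""
    triples = []
    for key, value in key_values.items():
        for prop in mapping.get(key, []):    # unmapped keys are skipped
            triples.append((subject, prop, value))
    return triples

kv = {"作品名": "読書", "作者": "黒田清輝", "展示室": "2F"}
print(to_triples("lodac:work/1", kv))
```

Because the mapping table is plain data, web developers can change the scraping side and museum professionals can revise the table without touching each other's work, which is the division of labor the workflow is built around.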

When the construction of LODAC Museum started, this workflow had not been implemented yet, and scraped data were manually transformed into RDF by ad hoc programs. After the workflow was implemented, it additionally extracted 578,500 key-value pairs from the websites of 38 institutions. Since some pairs were mapped to multiple properties, those data were converted into 890,588 RDF triples according to the mapping rules and appended to the RDF store.

3 Discussion

Several methods have been proposed to generate RDF from websites, some of which try to automatically extract the desired information using machine learning techniques. However, the expression of collection data differs between the websites of museums in Japan. For instance, when collection data are expressed as a table, some sites display the creator's name together with the title of a work in the same cell, while others display the birth year with the title. Moreover, each museum uses different terms to describe each attribute. Therefore, it seems suitable for museum professionals to manually implement the mapping rules; the generated rules can also be used as training data for automatic extraction of museum collection data from websites in the future.

Furthermore, we plan to develop a user interface for metadata mapping and semi-automatic data integration based on text matching; moreover, further verification of our data conversion system is needed using other datasets.

References

1. Kamura, T., Takeda, H., Ohmukai, I., Kato, F., Takahashi, T., and Ueda, H.: Study Support and Integration of Cultural Information Resources with Linked Data, Proc. of the 2nd International Conference on Culture and Computing, pp. 177–178 (2011)
2. Schultz, A., Matteini, A., Isele, R., Bizer, C. and Becker, C.: LDIF - Linked Data Integration Framework, Proc. of the 2nd International Workshop on Consuming Linked Data (COLD2011) (2011)


User-Adaptive Technology Intelligence System

Seungwoo Lee1, Do-Heon Jeong1, Jinhyung Kim1, Myunggwon Hwang1, Minhee Cho1, Sa-Kwang Song1, Soon-Chan Hong1, Hanmin Jung1

1 Department of S/W Research, Korea Institute of Science and Technology Information,

245 Daehak-ro, Yuseong-gu, Daejeon, 305-806, Korea {swlee, heon, jinhyung, mgh, mini, esmallj, schong, jhm}@kisti.re.kr

Abstract. In this paper, we present a new technology intelligence system enhanced with mobility, user adaptation, and guidance to support the establishment of technology strategies by small and medium-sized companies. Its major approaches are catching users' intentions, giving users intuitive insight, and supporting mobile environments in the technology intelligence field.

Keywords: Business Intelligence, Technology Intelligence, Mobile BI, User Adaptation, User Guidance, Information Analysis

1 Introduction

As business and industry grow, the importance of technology strategy also grows. In particular, to establish a technology strategy, each organization should perform technology planning through active technology intelligence. Technology intelligence refers to activities that support an organization's decision-making process by collecting and forwarding information on new technologies [1]. To support technology planning, especially decision-making by executives, we have developed InSciTe Adaptive, a technology intelligence service enhanced with mobility and user adaptation, by discovering knowledge resources and information analysis services with text mining and Semantic Web technologies.

2 Basic Approaches

There are three major concepts behind InSciTe Adaptive: user-adaptation/guidance, insight, and mobility.

▪ User-adaptation/guidance: Conventional information systems have a fixed service flow and support only static services. In contrast, the proposed system provides a dynamic service flow with different starting points by understanding the user's intention, and guides users to the final goal of the system, i.e., a report [2].

▪ Insight: Current analytics focuses only on scientometrics, whereas the proposed system provides insight by drawing conclusions directly from the analysis.


▪ Mobility: Mobile business intelligence (BI) can bring a competitive advantage, and the broad adoption of mobile BI appears inevitable [3].

3 Technology Intelligence Services

Technology planning is generally performed in five steps [4]. The first four of these steps require information to support decision-making by executives. We examined what information is required in each step and designed eight services to provide it. The following are some of the major services.

▪ Technology trend [5]: This service explores the growth level and speed of a technology, similar to Gartner's Hype Cycle. To analyze the time-series information from source documents more precisely, the time-calculation function has been improved to address the discontinuity between the last month of one year and the first month of the next, by overlapping the second half of each year with the first half of the following year.
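The paper does not give the window calculation explicitly; one plausible reading of the overlapping half-year scheme is sketched below. The function name and the one-year window length are assumptions, not taken from the system itself.

```python
from datetime import date

def half_year_windows(start_year, end_year):
    """Generate one-year windows stepping by six months, so the second
    half of each year overlaps the first half of the next. This smooths
    the artificial discontinuity between December and January that
    arises when time-series counts are aggregated by calendar year."""
    windows = []
    for year in range(start_year, end_year):
        # Window aligned to the calendar year.
        windows.append((date(year, 1, 1), date(year, 12, 31)))
        # Window shifted by half a year, bridging the year boundary.
        windows.append((date(year, 7, 1), date(year + 1, 6, 30)))
    return windows

# A document's timestamp is then counted in every window containing it,
# so trends are computed over overlapping periods rather than disjoint years.
```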

▪ Convergence technology: This service recommends pairs of technologies that could be converged. A convergence technology can be defined as a pair of technologies that have more than two elementary technologies in common and create a synergy effect through convergence. For example, combining the automobile industry with augmented reality technology can create new value and increase a company's market share. This service analyzes a very large amount of data and helps companies find novel items for their future growth.
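The shared-elementary-technology definition above can be sketched as a simple pair search. The sample mapping is purely illustrative, and the threshold is left configurable since "more than two" may read as "at least two" in the paper's English; neither reflects the system's actual data or parameters.

```python
from itertools import combinations

# Hypothetical mapping from technologies to their elementary technologies;
# in the real system these would be mined from the literature.
elementary = {
    "augmented reality": {"computer vision", "3d rendering", "sensor fusion"},
    "autonomous driving": {"computer vision", "sensor fusion", "path planning"},
    "smart farming": {"sensor networks", "data analytics"},
}

def convergence_candidates(elementary, min_shared=2):
    """Recommend technology pairs sharing at least `min_shared`
    elementary technologies, the paper's convergence criterion."""
    pairs = []
    for (a, ea), (b, eb) in combinations(elementary.items(), 2):
        shared = ea & eb
        if len(shared) >= min_shared:
            pairs.append((a, b, shared))
    return pairs
```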

▪ Agent level: This service compares current technology levels among countries or companies, and further analyzes the levels from multi-dimensional views such as academia and business.

▪ Agent partner: This service finds and recommends current or potential competitive or collaborative agents that conduct research and development in similar domains. The information is obtained by analyzing semantic relationships among agents extracted from the technology literature, rather than by measuring co-occurrence, which is expected to yield more accurate analysis results.
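The contrast with co-occurrence can be illustrated with typed relation triples. The triples, relation names, and ranking function below are all hypothetical stand-ins for whatever the system extracts by text mining; they are not from the paper.

```python
from collections import Counter

# Hypothetical extracted triples (subject, relation, object); in the real
# system these would come from mining the technology literature.
triples = [
    ("KISTI", "collaboratesWith", "ETRI"),
    ("KISTI", "collaboratesWith", "ETRI"),
    ("KISTI", "competesWith", "CompanyX"),
    ("ETRI", "collaboratesWith", "CompanyX"),
]

def rank_partners(triples, agent, relation):
    """Rank other agents by the number of typed semantic relations they
    share with `agent`, instead of using raw co-occurrence counts, which
    would conflate collaboration and competition."""
    counts = Counter(o for s, r, o in triples if s == agent and r == relation)
    return counts.most_common()
```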

References
1. Mortara, L., Kerr, C., Probert, D., Phaal, R.: Technology Intelligence: Identifying Threats and Opportunities from New Technologies. University of Cambridge Institute for Manufacturing, ISBN 978-1902546513 (2007)

2. Jeong, D.H., Kim, J., Hwang, M., Lee, S., Jung, H.: User-centered Mobile Technology Intelligence System. Proceedings of Advanced Computer Science and Technology (AST) 2012, Beijing, China (2012)

3. Mobile BI 2012: Accelerating Business on the Move, http://blog.bellasolutions.com/2012/07/03/mobile-bi-2012-accelerating-business-on-the-move/ (last accessed August 25, 2012)

4. Lee, S., Cho, M., Song, S.K., Hong, S.C., Jung, H.: Strategy for Developing Technology Planning Support System. Proceedings of International Conferences on AST, EEC, MMHS, and AIA 2012, pp. 190–193, China (2012)

5. Kim, J., Hwang, M., Jeong, D.H., Jung, H.: Technology Trends Analysis and Forecasting Application based on Decision Tree and Statistical Feature Analysis. Expert Systems with Applications, 39, 12618–12625 (2012)
