1
Practical semantic web mining platform
2
What is SWM?
SWM includes:
Semantic Web and RDF
Regular Expressions, Web Agents
HMMs and Information Extraction
Rule Mining, F-Logic, Description Logic
Information Integration
Planning for Data Gathering
Ontologies: Learning, Editing
Text Classification
Applications: E-Commerce, Web Services, Semantic Web Browser, etc.
3
Some Background
4
[Figure: platform architecture diagram. Recoverable labels: sources (WWW, Deep Web, Web Services, XML, text, multimedia); Semantic Crawler; Semantic Annotation (learn annotation rules); Learn Mapping/Link; Ontology base; Semantic Content; Semantic Indexing/Integration (efficient indexing engine); Semantic Retrieval (make it efficient in the domain); Reasoning (learn reasoning rules); ontology-based Summary; Application. Supporting techniques: Active Learning; Machine Learning (BC, NN, GA, AR); Multi-view Learning; multi-view detection; active-learning-driven OM/OL.]
5
Algorithm/theory of ML
Techniques of Machine Learning / Data Mining
Bayesian classification / NN / GA
Statistical techniques
Active Learning, Multi-View Learning
Risk Minimization / Maximum Entropy Model
6
Annotation
Multiple sources
Annotation tools
Using ML to automate the process
Learn annotation rules
Active-learning driven (reduces the training samples)
Multi-view (improves performance)
Multi-view detection (improves performance further)
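A minimal sketch of how active learning can reduce the training samples needed for annotation: the learner asks for labels only on the snippets it is least sure about. The confidence function, the cue words, and the example snippets are hypothetical stand-ins, not part of the platform.

```python
from typing import Callable, List, Sequence


def most_uncertain(pool: Sequence[str],
                   prob_positive: Callable[[str], float],
                   k: int = 1) -> List[str]:
    """Return the k pool items whose predicted probability is closest to 0.5."""
    return sorted(pool, key=lambda x: abs(prob_positive(x) - 0.5))[:k]


def toy_prob(text: str) -> float:
    """Hypothetical annotator confidence that `text` mentions a job title."""
    cues = ("engineer", "manager", "developer")
    hits = sum(cue in text.lower() for cue in cues)
    return min(1.0, 0.2 + 0.3 * hits)


# Unlabeled snippets; only the most ambiguous one is sent to the user.
pool = [
    "Senior software engineer and team manager",  # two cues: confident positive
    "Contact us by phone",                        # no cues: confident negative
    "Developer wanted",                           # one cue: borderline
]
query = most_uncertain(pool, toy_prob, k=1)
```

In a real loop the user's answer would be added to the training set and the annotator retrained before the next query.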
7
Mapping & Link
Mapping
Find mapping points
Find complex mapping points (subof, superof, 5*(a+b), even conjunct of, etc.)
Translate instances based on the mapping
Link
Find link points
Find complex links
Integrate ontologies
Mapping/Link detection.
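To illustrate what translating instances through simple and complex mapping points could look like, here is a small sketch. Each target property is computed from one or more source properties by a transformation function. All property names are invented for illustration; only the 5*(a+b) form comes from the slide.

```python
from typing import Any, Callable, Dict, Tuple

# Each target property maps to (source properties, transformation).
# Property names are illustrative assumptions, not a real ontology.
Mapping = Dict[str, Tuple[Tuple[str, ...], Callable[..., Any]]]

mapping: Mapping = {
    "fullName": (("firstName", "lastName"),
                 lambda first, last: f"{first} {last}"),   # concatenation
    "totalCost": (("a", "b"), lambda a, b: 5 * (a + b)),   # arithmetic mapping
    "age": (("age",), lambda x: x),                        # simple 1:1 mapping
}


def translate(instance: Dict[str, Any], mapping: Mapping) -> Dict[str, Any]:
    """Translate a source instance into the target ontology's vocabulary."""
    target: Dict[str, Any] = {}
    for target_prop, (source_props, transform) in mapping.items():
        # Skip mapping points whose source values are missing.
        if all(p in instance for p in source_props):
            target[target_prop] = transform(*(instance[p] for p in source_props))
    return target


source = {"firstName": "Ada", "lastName": "Lovelace", "a": 2, "b": 3, "age": 28}
translated = translate(source, mapping)
```

Finding such mapping points automatically is the hard part; the sketch only shows how a discovered mapping would be applied.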
8
9
Mapping & Link
Multi-view: name, instance, relationship, etc.
Active learning: ask the user to label the most confusing mapping/link.
Multi-view detection: improves performance.
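The sketch below combines two views of a candidate mapping (concept names and shared instances) and picks the pair to show the user. Treating "most confused" as "the pair on which the two views disagree most" is an assumption made here for illustration; the toy ontologies are invented.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Dict, Set, Tuple


def name_view(a: str, b: str) -> float:
    """View 1: string similarity of the two concept names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_view(xs: Set[str], ys: Set[str]) -> float:
    """View 2: Jaccard overlap of the two concepts' instances."""
    union = xs | ys
    return len(xs & ys) / len(union) if union else 0.0


def most_confused(onto_a: Dict[str, Set[str]],
                  onto_b: Dict[str, Set[str]]) -> Tuple[str, str]:
    """Return the candidate pair on which the two views disagree most."""
    def disagreement(pair: Tuple[str, str]) -> float:
        a, b = pair
        return abs(name_view(a, b) - instance_view(onto_a[a], onto_b[b]))
    return max(product(onto_a, onto_b), key=disagreement)


# Toy ontologies: concept name -> set of instances (hypothetical data).
onto_a = {"Employee": {"alice", "bob", "carol"}, "Post": {"developer", "tester"}}
onto_b = {
    "Staff": {"alice", "bob", "carol"},
    "Position": {"developer", "tester", "manager"},
    "Employer": {"acme"},
}
query = most_confused(onto_a, onto_b)
```

Here "Employee" and "Staff" share every instance but no letters, so the views contradict each other and the pair is sent to the user.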
10
Indexing
What is the difference between SI (semantic indexing) and text indexing/XML indexing?
How should the data structure of SI be defined? (Note that the structure should reflect the characteristics of the Semantic Web and ontologies.)
How can it be made efficient? (How does it compare with others' work? Is there existing work on it?)
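One possible starting point for the data-structure question, offered as an assumption and not as the platform's design: index RDF-style triples by subject, predicate, and object so that any pattern with wildcards is answered by intersecting small sets. The class name and sample triples are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleIndex:
    """In-memory index over (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._all: Set[Triple] = set()
        self._by_s: DefaultDict[str, Set[Triple]] = defaultdict(set)
        self._by_p: DefaultDict[str, Set[Triple]] = defaultdict(set)
        self._by_o: DefaultDict[str, Set[Triple]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        self._all.add(triple)
        self._by_s[s].add(triple)
        self._by_p[p].add(triple)
        self._by_o[o].add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        candidates = [
            index.get(key, set())
            for index, key in ((self._by_s, s), (self._by_p, p), (self._by_o, o))
            if key is not None
        ]
        if not candidates:
            return set(self._all)
        return set.intersection(*candidates)


index = TripleIndex()
index.add("alice", "rdf:type", "Person")
index.add("alice", "hasSkill", "Java")
index.add("bob", "hasSkill", "Java")
index.add("job1", "requires", "Java")
java_people = index.match(p="hasSkill", o="Java")
```

Unlike a plain text index, the keys are ontology terms and relations, so queries can ask who has a skill instead of which documents contain a word.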
11
Semantic Retrieval
Domain-specific vs. general
Use SI and the ontology to improve performance.
Use reasoning techniques to improve it further.
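As a sketch of how an ontology plus a simple reasoning step might improve retrieval, the query concept below is expanded to all of its subclasses before matching annotated documents. The class hierarchy, document names, and function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical subclass hierarchy: child concept -> parent concept.
SUBCLASS_OF = {
    "JavaDeveloper": "Developer",
    "Developer": "Engineer",
    "Tester": "Engineer",
}


def descendants(concept: str, subclass_of: Dict[str, str]) -> Set[str]:
    """Return the concept plus every concept that is transitively a subclass of it."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def semantic_search(query_concept: str,
                    annotations: Dict[str, Set[str]],
                    subclass_of: Dict[str, str]) -> List[str]:
    """Return documents annotated with the query concept or any subclass of it."""
    wanted = descendants(query_concept, subclass_of)
    return sorted(doc for doc, concepts in annotations.items() if concepts & wanted)


# Documents with their semantic annotations (hypothetical data).
annotations = {
    "resume1": {"JavaDeveloper"},
    "resume2": {"Tester"},
    "resume3": {"Accountant"},
}
developers = semantic_search("Developer", annotations, SUBCLASS_OF)
engineers = semantic_search("Engineer", annotations, SUBCLASS_OF)
```

A keyword search for "Engineer" would miss both resumes; the subclass expansion is what finds them.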
12
Reasoning
Reasoning rules learning
Example: resumes, jobs
How to find the most appropriate job for an individual?
How to find the most appropriate person for a specified job?
Define the rules: if Person.Age(x)<30 then Job(y).Salary>8000
Rule discovery
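A minimal sketch of how a rule of the form shown on the slide could be represented and applied when filtering jobs for a person. Reading the rule as an implication that only constrains jobs when its condition holds is an interpretation; the classes and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Person:
    name: str
    age: int


@dataclass
class Job:
    title: str
    salary: int


@dataclass
class Rule:
    condition: Callable[[Person], bool]    # the "if" part, about the person
    requirement: Callable[[Job], bool]     # the "then" part, about the job

    def allows(self, person: Person, job: Job) -> bool:
        # Implication: the job is constrained only when the condition holds.
        return (not self.condition(person)) or self.requirement(job)


# The slide's example: if Person.Age(x) < 30 then Job(y).Salary > 8000.
RULES = [Rule(lambda p: p.age < 30, lambda j: j.salary > 8000)]


def suitable_jobs(person: Person, jobs: List[Job], rules: List[Rule]) -> List[Job]:
    """Keep only the jobs that every rule allows for this person."""
    return [job for job in jobs if all(rule.allows(person, job) for rule in rules)]


jobs = [Job("Analyst", 6000), Job("Engineer", 9000)]
young = suitable_jobs(Person("Li", 25), jobs, RULES)
older = suitable_jobs(Person("Wang", 40), jobs, RULES)
```

Rule discovery would then mean learning the condition and requirement functions from matched resume and job pairs instead of writing them by hand.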
13
Applications
Jobs & Resumes
E-Commerce, e.g. travel, tickets, etc.
Personal Assistant: track one's work and interests to find new information automatically.
Semantic Web Browser
14
Free discussion for the platform
15
Aspects
Data
Content: what we will do, what we can do, and what we cannot. Semantic web, semantic web services.
Theory: may be the basis for SCI
Practical application: important!
Proposal & Schedule
16
Data
Data preparation
Domains: jobs & resumes, software (from SourceForge), travel web services.
Ontology: metadata & instances
Work items:
Metadata definition: integrate an ontology editor (Protégé, OntoEdit, or Orient).
Instance database: use annotation or IE techniques to extract information from specific web sites.
How to store it? Use Jena to save the data in a database and query it by RQL.
Indexing?
17
Content
Ontology building, knowledge base building: use WordNet to assist.
Composition for web services. If not web services, what can we do instead, e.g. jobs & resumes?
Annotation & deep annotation: web service annotation, text annotation, even image annotation.
Mapping: concept mapping, instance mapping, translation, merging, meaning negotiation (mapping representation).
Data integration: combine annotation and mapping.
18
Content
Semantic search engine. Its definition? Simple search = data search; then how do we make use of the ontology? Reasoning? How do we make it practical, that is, how do we do it in our domain? Should it be general or domain-specific?
Ontology summary (needs a better name): output the knowledge in an ontology using NLP.
Indexing?
Tools integration
19
Theory
ML, data mining.
Inductive learning: NN, Bayes, SVM, GA. Code them, or one of them, ourselves. It will cost us time, but that does not mean the time is wasted.
Transductive learning. Selective learning.
More general theory: risk minimization. Note that RM is not an algorithm; it is a framework for ML. Any learning algorithm can be used as its implementation.
Active learning + multi-view: reduce the number of training samples; improve precision.
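To make the "framework, not algorithm" point concrete, the sketch below shows generic empirical risk minimization: any set of candidate predictors can be plugged in, and the framework only says to pick the one with the lowest average loss. This is the textbook form and may differ from the specific risk minimization framework the slide has in mind; the data and hypotheses are invented.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[int, bool]


def zero_one_loss(predicted: bool, actual: bool) -> float:
    """Loss of 1 for a wrong prediction, 0 for a correct one."""
    return 0.0 if predicted == actual else 1.0


def empirical_risk(predict: Callable[[int], bool],
                   data: Sequence[Example],
                   loss: Callable[[bool, bool], float]) -> float:
    """Average loss of a predictor over the labeled data."""
    return sum(loss(predict(x), y) for x, y in data) / len(data)


# Any learning algorithm could supply these candidates; here they are hand-written.
hypotheses: Dict[str, Callable[[int], bool]] = {
    "always_true": lambda x: True,
    "threshold_3": lambda x: x >= 3,
    "threshold_5": lambda x: x >= 5,
}

data = [(1, False), (2, False), (4, True), (6, True)]
risks = {name: empirical_risk(h, data, zero_one_loss) for name, h in hypotheses.items()}
best = min(risks, key=risks.get)
```

Swapping the loss function or the hypothesis set changes the learner without changing the framework.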
20
Practical application
Jobs & resumes. Targets: find the best-qualified resumes/persons for a specified job, or find the best jobs for a person.
Software from SourceForge, etc. Aim at software composition, web service composition, and software search.
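For the jobs and resumes target, a very simple baseline could rank resumes by the fraction of a job's required skills they cover. This scoring function is an assumption chosen for brevity, not a method the deck proposes; names and skills are invented.

```python
from typing import Dict, List, Set, Tuple


def coverage(required: Set[str], offered: Set[str]) -> float:
    """Fraction of the required skills that are offered."""
    return len(required & offered) / len(required) if required else 1.0


def rank_resumes(job_skills: Set[str],
                 resumes: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank candidates by skill coverage, best first; ties broken by name."""
    scored = [(name, coverage(job_skills, skills)) for name, skills in resumes.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical extracted resume data: person -> skills.
resumes = {
    "alice": {"java", "sql", "rdf"},
    "bob": {"java"},
    "carol": {"python"},
}
ranking = rank_resumes({"java", "rdf"}, resumes)
```

The reverse direction, best jobs for a person, can reuse `coverage` by scoring each job's requirements against one person's skills.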
21
Practical application
more?
22
Proposal & schedule
Why a proposal? Why a schedule? Can we work together on the possible platform?