Language Technology in Industrial Applications (Språkteknologi i industrielle anvendelser)

Or: How we have commercialized linguistic technologies

Jon Atle Gulla
Norwegian University of Science and Technology, Trondheim, Norway
Email: [email protected]

1. Linguistics in search
2. Semantics for interoperability
3. Ontologies in process mining
4. Linguistics in news reporting

Jon Atle Gulla, Språkteknologi og innovasjon (Language Technology and Innovation)


Who am I?

Professor, Information Systems group, IDI/NTNU

Education:
  Siv.ing./dr.ing. (information systems, NTH)
  Cand.philol. (linguistics, AVH)
  MSc (management, London Business School)

Work experience:
  Fast Search & Transfer, Munich (linguistics in search)
  Norsk Hydro, Brussels (enterprise systems)
  GMD, Darmstadt (information retrieval)

Fields of research:
  Search technologies
  Semantic Web
  Social Web
  Sentiment analysis and recommendations

Jon Atle Gulla ICEIS 2008 2


1. The FAST Alltheweb.com site

2000: Alltheweb.com was one of the largest search engines on the Internet
FAST acquired Elexir Sprachtechnologie in Munich
Intended to add linguistics to the search engine

[Figure: a query and the retrieved documents]


Linguistic Techniques in FAST

Linguistics in search:

[Figure: three groups of linguistic techniques, each with its goal and example techniques]

Categorizing techniques (goal: reduced search space)
  Diagram labels: Documents, Categories of documents; Search options (<none>, All selected); Category-based selection; Relevant documents
  Techniques: language identification, spam detection, topic categorization

Transformational techniques (goal: increased semantics)
  Diagram labels: Documents, Transformed documents; Query, Transformed query; Keyword-based search, Content-based search
  Techniques: lemmatization, phrasing, anti-phrasing

Presentational techniques (goal: improved transparency)
  Diagram labels: List of documents, Presentation of document list; Title-based access, Content-based access
  Techniques: clustering
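The transformational techniques listed above can be illustrated with a toy query rewriter. This is only a minimal sketch: the slides do not describe how FAST implemented these steps, and the anti-phrase list, the lemma table, and the function names below are hypothetical. Anti-phrasing is taken here to mean dropping query parts that carry no content, and lemmatization to mean mapping word forms to a base form. Phrasing (detecting multi-word units) is left out.

```python
import re

# Hypothetical resources; a real system would need large, language-specific ones.
ANTI_PHRASES = ["where can i find information about", "who is", "what is"]
LEMMAS = {"engines": "engine", "searching": "search", "documents": "document"}


def remove_anti_phrases(query: str) -> str:
    """Drop filler phrases that add no content to the query."""
    text = query.lower().strip()
    # Try longer phrases first so a short phrase cannot split a longer one.
    for phrase in sorted(ANTI_PHRASES, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    return " ".join(text.split())


def lemmatize(query: str) -> str:
    """Map each token to its base form when the lemma table knows it."""
    return " ".join(LEMMAS.get(token, token) for token in query.split())


def transform_query(query: str) -> str:
    """Query -> transformed query: anti-phrasing first, then lemmatization."""
    return lemmatize(remove_anti_phrases(query))


if __name__ == "__main__":
    print(transform_query("Where can I find information about search engines"))
```

The same lemma table would have to be applied to the indexed documents as well, so that transformed queries and transformed documents meet on the same base forms.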


The FAST Experience

Linguistics a small part of a large system
Linguistics as behind-the-scenes technology
Linguistics not a major breakthrough

Linguistics is not easy:
  Data-intensive
  Only statistical approaches feasible at the time


What happened to FAST?
  2003: Internet part sold to Overture (Yahoo)
  2009: Enterprise part sold to Microsoft


2. Semantics in Interoperability

Semantic Web:
  Adding semantics to data/services so that humans and computers can communicate better
  Ontology: explicit representation of a shared conceptualization (domain terminology model)
  Semantic markup languages for ontology building (OWL, RDF)

2003: Petromax IIP project for construction of an ontology for the oil & gas sector (based on ISO 15926)

2011: EU LinkedDesign project for use of ontologies in manufacturing processes


Silly Semantic Conflicts Prevent Data Harmonization

Mean time between failures

1 “A period of time which is the mean period of time interval between failures”

2 “The time duration between two consecutive failures of a repaired item” (International Electrotechnical Vocabulary online database)

3 “The expectation of the time between failures” (International Electrotechnical Vocabulary online database)

4 “The expectation of the operating time between failures” (MIL-HDBK-29612-4)

5 “Total time duration of operating time between two consecutive failures of a repaired item” (International Electrotechnical Vocabulary online database)

6 “Predicts the average number of hours that an item, assembly, or piece part will operate before it fails” (Jones, J. V. Integrated Logistics Support Handbook, McGraw Hill Inc, 1987)

7 “For a particular interval, the total functional life of a population of an item divided by the total number of failures within the population during the measurement interval. The definition holds for time, rounds, miles, events, or other measure of life units”. (MIL-PRF-49506, 1996, Performance Specification Logistics Management Information)

8 “The average length of time a system or component works without failure” (MIL-HDBK-29612-4)

Even simple terms are misunderstood
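A small worked example shows why these definitions matter in practice. The sketch below computes the figure in two ways: as the time between consecutive failures (in the spirit of definitions 2 and 3) and as the operating time between failures, which excludes repair downtime (in the spirit of definitions 4 and 5). The failure log, the class, and the function names are hypothetical and only serve to show that two systems exchanging "MTBF" values can mean different numbers.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    failed_at: float    # hours since start of observation
    repaired_at: float  # hours since start of observation


def mean_time_between_failures(failures: list[Failure]) -> float:
    """Mean elapsed time from one failure to the next (repair time included)."""
    gaps = [nxt.failed_at - cur.failed_at for cur, nxt in zip(failures, failures[1:])]
    return sum(gaps) / len(gaps)


def mean_operating_time_between_failures(failures: list[Failure]) -> float:
    """Mean operating time from the end of one repair to the next failure."""
    gaps = [nxt.failed_at - cur.repaired_at for cur, nxt in zip(failures, failures[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical failure log for one repaired item.
LOG = [Failure(100, 120), Failure(300, 310), Failure(500, 540)]

if __name__ == "__main__":
    print(mean_time_between_failures(LOG))            # 200.0
    print(mean_operating_time_between_failures(LOG))  # 185.0
```

On the same log the two readings differ by 15 hours, so harmonizing data across systems requires agreeing on the definition first.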


<owl:Class rdf:about="#CHRISTMAS_TREE">
  …
  <dc:description rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> An artefact that is an assembly of pipes and piping parts, with valves and associated control equipment that is connected to the top of a wellhead and is intended for control of fluid from a well.</dc:description>
  <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> CHRISTMAS TREE</dc:title>
  …
  <rdfs:subClassOf rdf:resource="#ARTEFACT"/>
</owl:Class>

OWL petroleum ontology
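To show how an application might consume such a class definition, here is a minimal sketch that reads the title and superclass with Python's standard XML parser. The slide elides the surrounding document, so the rdf:RDF wrapper and the namespace declarations below are assumptions (they use the usual W3C and Dublin Core namespace URIs), and the function name is hypothetical. A production system would more likely use a dedicated RDF/OWL library.

```python
import xml.etree.ElementTree as ET

# Assumed namespace bindings; the slide does not show them.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened version of the class from the slide, wrapped in an assumed rdf:RDF root.
OWL_DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <owl:Class rdf:about="#CHRISTMAS_TREE">
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> CHRISTMAS TREE</dc:title>
    <rdfs:subClassOf rdf:resource="#ARTEFACT"/>
  </owl:Class>
</rdf:RDF>"""


def read_classes(xml_text: str) -> dict:
    """Return {class id: {"title": ..., "parents": [...]}} for each owl:Class."""
    root = ET.fromstring(xml_text)
    about = f"{{{NS['rdf']}}}about"
    resource = f"{{{NS['rdf']}}}resource"
    classes = {}
    for cls in root.findall("owl:Class", NS):
        title = cls.findtext("dc:title", default="", namespaces=NS).strip()
        parents = [p.get(resource) for p in cls.findall("rdfs:subClassOf", NS)]
        classes[cls.get(about)] = {"title": title, "parents": parents}
    return classes


if __name__ == "__main__":
    print(read_classes(OWL_DOC))
```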


Semantic Web Lessons Learned

Data integration and harmonization improved in the sector

But:
  Demanding and complex technologies
  Semantic Web technologies still immature and expensive
  So far few commercial solutions using semantic technologies
  (Some work on ontology-driven search applications)


3. Ontologies in Process Mining

Process mining:
  Techniques and tools for discovering process flow, control, data, organizational and social structures from enterprise systems’ event logs
  Dynamic reporting for exposing real business flows and explaining interesting transaction patterns

Semantic process mining:
  Using ontologies to improve the interpretation of event logs and the construction of business flows
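The idea can be sketched with a very small example: count which activity directly follows which in an event log, once on the raw activity labels and once after mapping the labels to shared concepts. This is only an illustration under stated assumptions. The event log, the label-to-concept table standing in for an ontology, and the function name are hypothetical, and directly-follows counting is one simple discovery technique chosen for brevity, not necessarily the one used in the work described here.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from system-specific activity labels to shared concepts.
ONTOLOGY = {
    "PO_CREATE": "CreatePurchaseOrder",
    "NewPurchaseOrder": "CreatePurchaseOrder",
    "GR_POST": "ReceiveGoods",
    "GoodsReceipt": "ReceiveGoods",
    "INV_POST": "PostInvoice",
}

# Hypothetical event log: (case id, timestamp, activity label).
EVENT_LOG = [
    ("c1", 1, "PO_CREATE"), ("c1", 2, "GR_POST"), ("c1", 3, "INV_POST"),
    ("c2", 1, "NewPurchaseOrder"), ("c2", 2, "GoodsReceipt"), ("c2", 3, "INV_POST"),
]


def directly_follows(log, ontology=None) -> Counter:
    """Count (activity, next activity) pairs per case, optionally on concepts."""
    traces = defaultdict(list)
    for case, _timestamp, activity in sorted(log, key=lambda e: (e[0], e[1])):
        label = ontology.get(activity, activity) if ontology else activity
        traces[case].append(label)
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


if __name__ == "__main__":
    print(directly_follows(EVENT_LOG))            # four edges, each seen once
    print(directly_follows(EVENT_LOG, ONTOLOGY))  # two edges, each seen twice
```

Without the mapping the two cases look like different processes; with it they collapse into one flow, which is the kind of improved interpretation the ontology is meant to provide.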


Semantic Process Mining

[Figure: detected process flow, linked to an ontology that gives a formal definition of process terminology]


Commercialization of Technology

2004: Businesscape founded

Ongoing work on Enterprise Visualization Suite:
  Combines two challenging technologies (data mining and Semantic Web)
  Substantial improvement over traditional process mining (and traditional reporting tools)

However:
  Difficult to explain the complexity and capability of the solution to customers
  Few customers competent enough to distinguish process mining from traditional reporting


4. Linguistics in News Reporting

Semantic approaches to news reporting:
  Extract content from news articles
  Validate content of articles
  Opinion mining from news articles and social sites
  Model user preferences for news recommendation
  Combine/aggregate knowledge from heterogeneous sources

Commercial potential uncertain
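One of the items above, modelling user preferences for news recommendation, can be sketched in a few lines. This is a deliberately simple, hypothetical illustration and not the approach taken in the projects: the user profile is a bag of words built from articles already read, and candidate articles are ranked by cosine similarity to that profile. All texts and function names are invented for the example.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(read_articles: list[str], candidates: list[str]) -> list[str]:
    """Rank candidates by similarity to a profile built from read articles."""
    profile = Counter()
    for text in read_articles:
        profile.update(bag_of_words(text))
    return sorted(candidates, key=lambda t: cosine(profile, bag_of_words(t)), reverse=True)


# Hypothetical reading history and candidate articles.
READ = ["oil price rises after platform shutdown", "new oil field found in north sea"]
CANDIDATES = ["football cup final tonight", "oil field production starts in north sea"]

if __name__ == "__main__":
    for article in recommend(READ, CANDIDATES):
        print(article)
```

Linguistic processing such as lemmatization or entity extraction would replace the whitespace tokenizer in a more serious version.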


Conclusions

Linguistics often a supporting technology
Good linguistic resources tedious and expensive to develop
Not always easy to justify inclusion of linguistics

Linguistics in our projects:
  Enable new services and products
  Enhance existing services and products
