Semantic integration of traditional and web-based information sources

Gergely Lukácsy, BUTE; Péter Szeredi, BUTE; Péter Krauth, IQSYS; Attila Bodnár, IQSYS


Page 1: Semantic integration of traditional and web-based information sources    

Semantic integration of traditional and web-based information sources

Gergely Lukácsy, BUTE
Péter Szeredi, BUTE
Péter Krauth, IQSYS
Attila Bodnár, IQSYS

Page 2: Semantic integration of traditional and web-based information sources    

What is a mashup?

• A mashup is a website or application that combines content from more than one source into an integrated experience.

• The etymology of this term possibly derives from its similar use in pop music.

/Wikipedia/

Page 3: Semantic integration of traditional and web-based information sources    

Quotes on mashups

• “Web mashups, and other Web 2.0 development (e.g. Ajax) are all facets of the same phenomenon: information and presentation are being separated in ways that allow for novel forms of reuse.”

• “The mash-up is the offspring of an environment where application developers facilitate the creation of integrated, yet highly derivative application hybrids by third parties, something they do by providing rich public APIs to their user base.”

Page 4: Semantic integration of traditional and web-based information sources    

What’s so special about mashups?

• Content used in mashups is typically sourced from a third party via a public interface or API.

• Other methods of sourcing content for mashups include web feeds (e.g. RSS or Atom), web services and screen scraping.

• Some in the community believe that only cases where public interfaces are not used count as mashups.

• Many people are experimenting with mashups using the Google, eBay, Amazon, Flickr, and Yahoo APIs.

• Google has a mashup editor in beta.

Mashup = Application Integration à la Web 2.0?

Page 5: Semantic integration of traditional and web-based information sources    

What are we going to speak of?

SINTAGMA: Semantic INtegration Technology Applied in Grid-like, Model-driven Architectures

R&D project:
• Sponsored by the National Research and Development Program, 2005-2007

Consortium:
• Coordinator: IQSYS
• Developer organisations: IQSYS, BUTE, SZTAKI
• User organisations: OSZK, MTI, ARECO/eBolt

Page 6: Semantic integration of traditional and web-based information sources    

Information Integration with Sintagma

[Figure labels: SINTAGMA; Database A, Database B (RDBMS, XML, RDF); Application A (web service); Application B (traditional); External Application (e.g. mashup application). Layers: data access and transformation; presentation and further processing.]

• Clearly separates the data access and transformation layers of integration from the presentation layer
• Uses a comprehensive metadata repository (Model Warehouse)
  – Semantics of data represented in the repository: maps local and remote metadata to each other
  – Data access and transformation driven by the repository

Page 7: Semantic integration of traditional and web-based information sources    

Search and analysis of Web data

[Figure labels: search and analysis application (e.g. mashup); data service; metadata mapping (three links); SINTAGMA-node (legend).]

Page 8: Semantic integration of traditional and web-based information sources    

Sintagma – an approach to information integration

• Key Principles:
  – No duplication of data: Model Warehouse vs. Data Warehouse
  – Communication: one-way, on-line (no modification of data, instant access)
  – Integration of web services as information sources supported (no modification required)

• Key Components:
  – Manages various forms of metadata (Model Manager)
  – Accesses various structured and semi-structured information sources (Wrappers):
    • RDBMS
    • RDF
    • XML
    • Web Services
  – Preprocesses various “unstructured” information sources (Annotators):
    • Texts
    • Raster maps (labels and signs)
    • Excel tables
  – Optimises query execution: query planning using deduction (Mediator)
  – Data Quality Control
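The division of labour between wrappers and the mediator, together with the "no duplication of data" principle, can be pictured roughly as follows. This is an illustrative Python sketch only, not SINTAGMA's actual interface: the names `Wrapper`, `ListWrapper`, `Mediator` and `query` are hypothetical, in-memory lists stand in for real RDBMS, XML or web service sources, and the query planning by deduction mentioned above is not modelled.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


class Wrapper:
    """Hypothetical read-only access point to one information source."""

    def rows(self) -> Iterator[Row]:
        raise NotImplementedError


class ListWrapper(Wrapper):
    """Stand-in source: wraps an in-memory list instead of a real database."""

    def __init__(self, data: List[Row]) -> None:
        self._data = data

    def rows(self) -> Iterator[Row]:
        # Rows are yielded on demand; nothing is copied into a central store.
        yield from self._data


class Mediator:
    """Answers a query by reading the wrapped sources when asked."""

    def __init__(self, wrappers: Iterable[Wrapper]) -> None:
        self._wrappers = list(wrappers)

    def query(self, predicate: Callable[[Row], bool]) -> Iterator[Row]:
        for wrapper in self._wrappers:
            for row in wrapper.rows():
                if predicate(row):
                    yield row


# Made-up example data.
db_a = ListWrapper([{"name": "printer", "price": 120}])
db_b = ListWrapper([{"name": "toner", "price": 40},
                    {"name": "monitor", "price": 200}])
mediator = Mediator([db_a, db_b])
cheap = list(mediator.query(lambda row: row["price"] < 150))
```

Because the sources are only read through `rows()`, the sketch mirrors the one-way, on-line communication principle: the mediator never writes back and keeps no copy of the data.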

Page 9: Semantic integration of traditional and web-based information sources    

Architecture of SINTAGMA

[Figure: architecture diagram. Labels recovered from the slide:
  – Core: Sintagma GUI, Model Manager / Model Warehouse, Mediator (local), Model Manager (remote)
  – Wrappers: JDBC Wrapper, WS Wrapper, RDF Wrapper, WD Wrapper, XML Wrapper
  – Sources: RDBMS, HTML, RDF, Web Service, XML, maps, texts
  – Data Quality Control subsystem: Data Quality Controller, DQ Engine (meta), DQ Engine (native), DQ log
  – Text Annotation subsystem: Text Annotator
  – Map Annotation subsystem: Map Annotator, Map Server]

Page 10: Semantic integration of traditional and web-based information sources    

Model Warehouse of SINTAGMA

[Figure: layered diagram. Labels recovered from the slide:
  – Conceptual Level: Integrated Conceptual Model (common, clarified concepts); conceptual views of workers in a business area; special concepts of business areas; domain specific terminology
  – Application Level: Integrated Application Model (transformed, unified)
  – Interface Level: local models
  – Source Level: Data Source 1, Data Source 2, Data Source 3, ..., Data Source n
  – Inputs: domain specific knowledge/ontologies; external model (e.g. BPM)
  – Legend: model, mapping, input]

Page 11: Semantic integration of traditional and web-based information sources    

Modelling in SINTAGMA

• The Model Warehouse
  – content of the Model Warehouse
  – interface models and abstractions
  – ontology concepts

• Use cases
  – Product comparison
  – Workflow of equipment purchase
  – Web service integration demos

Page 12: Semantic integration of traditional and web-based information sources    

Model Warehouse

• Content of the Model Warehouse
  – Object-oriented models
    • Structural properties of sources in UML Object Model
    • Non-structural information given as OCL Constraints
    • Mapping between models as abstractions
  – Description Logic models
  – Queries: source and conceptual level

• Classification of models
  – interface
  – unified (application)
  – conceptual

• Modeling: SILan – Semantic Integration Language
  – Describes content of Model Warehouse in textual format
  – Has well-defined semantics

Page 13: Semantic integration of traditional and web-based information sources    

Interface Models

Page 14: Semantic integration of traditional and web-based information sources    

Higher level models

• Abstractions (data transformations)
  – populate higher level entities
    • Filter low level data (suppliers)
    • Transform data to appropriate higher level form (clients)
  – can have multiple suppliers and clients
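As a rough illustration of an abstraction that filters supplier data and transforms it into a higher level form, consider the Python sketch below. It is not SILan and not SINTAGMA code: the class `Abstraction`, its fields `keep` and `transform`, and the sample rows are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass
class Abstraction:
    """Hypothetical abstraction: filters supplier rows, then reshapes them."""

    keep: Callable[[Row], bool]       # filter applied to low level data
    transform: Callable[[Row], Row]   # maps a row to the higher level form

    def populate(self, suppliers: Iterable[Iterable[Row]]) -> List[Row]:
        # One abstraction may draw on several suppliers.
        return [self.transform(row)
                for supplier in suppliers
                for row in supplier
                if self.keep(row)]


# Two made-up interface-level sources.
vendor_a = [{"item": "printer", "eur": 100}, {"item": "cable", "eur": 0}]
vendor_b = [{"item": "monitor", "eur": 150}]

priced_products = Abstraction(
    keep=lambda row: row["eur"] > 0,
    transform=lambda row: {"product": row["item"], "price_eur": row["eur"]},
)
unified = priced_products.populate([vendor_a, vendor_b])
```

Here the two vendor lists play the role of suppliers, and `unified` is the population of the higher level entity that a client model would read.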

Page 15: Semantic integration of traditional and web-based information sources    

Higher level models (cont’d)

• Invariants
  – have to be satisfied by all the instances of a model element
  – can contain navigation

• Queries
  – can be formulated on any model
    • interface level models: directly accessing data sources
    • higher level models: using mediation
  – are interchangeable with abstractions
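An invariant with navigation can be imitated in Python as a predicate that every instance must satisfy and that follows a reference to another object. In the Model Warehouse such conditions are given as OCL constraints; the sketch below only mimics the idea, and the classes `Vendor` and `Product`, the helper `violations` and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Vendor:
    name: str
    active: bool


@dataclass
class Product:
    name: str
    price: float
    vendor: Vendor  # reference used for navigation below


def violations(instances: Iterable[Product],
               invariant: Callable[[Product], bool]) -> List[Product]:
    """Return the instances that break the invariant (empty list if none)."""
    return [obj for obj in instances if not invariant(obj)]


acme = Vendor("Acme", active=True)
closed = Vendor("Closed Ltd", active=False)
products = [Product("printer", 120.0, acme), Product("toner", 40.0, closed)]

# Invariant with navigation: the price is positive and the vendor is active.
broken = violations(products, lambda p: p.price > 0 and p.vendor.active)
```

The expression `p.vendor.active` is the navigation step: the condition on a product is stated in terms of the vendor it refers to.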

Page 16: Semantic integration of traditional and web-based information sources    

Conceptual Models

Page 17: Semantic integration of traditional and web-based information sources    

Conceptual models (cont’d)

• These models encapsulate concepts given in Description Logic formalism

Page 18: Semantic integration of traditional and web-based information sources    

Use case 1: Product comparison

• Goal: find products that are similar to the products in a host system

• Information sources
  – catalogues from various vendors in Excel
  – database of the host system

• Problems to solve
  – heterogeneity of the catalogues: preprocessing
  – algorithm for product comparison
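The slides do not say which comparison algorithm is used. Purely as a stand-in, the Python sketch below pairs host products with catalogue entries using Jaccard similarity over the words of the product name; the function names, the threshold of 0.5 and the sample product names are all assumptions made for the example.

```python
from typing import List, Set, Tuple


def tokens(name: str) -> Set[str]:
    """Lower-case a product name and split it into a set of words."""
    return set(name.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_products(host: List[str], catalogue: List[str],
                     threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Pair each host product with catalogue items at or above the threshold."""
    pairs = []
    for h in host:
        for c in catalogue:
            score = jaccard(tokens(h), tokens(c))
            if score >= threshold:
                pairs.append((h, c, score))
    return pairs


# Made-up product names standing in for the host database and a catalogue.
host_db = ["Laser Printer X100"]
vendor_catalogue = ["laser printer x100 black", "Inkjet Printer"]
matches = similar_products(host_db, vendor_catalogue)
```

Lower-casing and tokenising the names is a very small example of the preprocessing needed before heterogeneous catalogues can be compared at all.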

Page 19: Semantic integration of traditional and web-based information sources    

Solution in SINTAGMA

[Figure labels: sources: Excel (three catalogues), XML, MySQL; steps: preprocessing, product comparison; Model Warehouse models: Catalogue, Host Database, Unified Products, Similar Products.]

Page 20: Semantic integration of traditional and web-based information sources    

Use case 2: Equipment purchase

Page 21: Semantic integration of traditional and web-based information sources    

Equipment purchase in an organisation

• Scenario
  – Each department maintains a wish-list of equipment
  – There are vendors who provide products to departments
    • Vendors sell different types of products (vendor A sells printers and toners, vendor B monitors and printers, etc.)
    • The financial department dynamically designates a preferred vendor for each product

• Questions: is there any expensive order? what is the total? etc.

• Information sources:
  – Department’s wish-list:
    • relational database with columns description, category, e.g.: “we have run out of paper”, “15/18”
  – Financial department:
    • Web service, with an operation determining where to buy a given product, e.g.: (15,8) -> (A4 paper, 4, 23)
  – Vendors:
    • Heterogeneous web services which return prices, units and delivery date, e.g.: 23 -> (12, 1, 2007-07-01)
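The chain from wish-list entry to preferred vendor to vendor offer can be sketched in Python as below. Everything here is a stand-in: plain functions replace the web services, and the reading of the slide's example tuples is a guess, namely that (A4 paper, 4, 23) means (product name, preferred vendor id, product id) and that (12, 1, 2007-07-01) means (price, units, delivery date). The category is written as (15, 8) throughout, which is also an assumption.

```python
from datetime import date
from typing import List, Tuple

# Stand-in for the departments' wish-list table: (description, category).
wish_list: List[Tuple[str, Tuple[int, int]]] = [
    ("we have run out of paper", (15, 8)),
]


def where_to_buy(category: Tuple[int, int]) -> Tuple[str, int, int]:
    """Stand-in for the financial department's web service.

    Assumed result layout: (product name, preferred vendor id, product id).
    """
    table = {(15, 8): ("A4 paper", 4, 23)}
    return table[category]


def vendor_offer(vendor_id: int, product_id: int) -> Tuple[float, int, date]:
    """Stand-in for a vendor web service.

    Assumed result layout: (price, units, delivery date).
    """
    offers = {(4, 23): (12.0, 1, date(2007, 7, 1))}
    return offers[(vendor_id, product_id)]


def planned_orders(limit: float) -> Tuple[float, List[str]]:
    """Return the total cost and the products whose cost exceeds the limit."""
    total = 0.0
    expensive = []
    for _description, category in wish_list:
        product, vendor_id, product_id = where_to_buy(category)
        price, units, _delivery = vendor_offer(vendor_id, product_id)
        cost = price * units
        total += cost
        if cost > limit:
            expensive.append(product)
    return total, expensive


total, expensive = planned_orders(limit=10.0)
```

The two questions from the scenario ("is there any expensive order?" and "what is the total?") map onto the two values returned by `planned_orders`.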

Page 22: Semantic integration of traditional and web-based information sources    

Event Driven Process Chain

Page 23: Semantic integration of traditional and web-based information sources    

Solution in Sintagma

Page 24: Semantic integration of traditional and web-based information sources    

Use case 3: Web Service Integration

• Integrating Amazon and Barnes&Noble

• Integrating RSS-sources (e.g. origo, nol, index, metro)

• Integrating World Championship Results (2002 and 2006)

Page 25: Semantic integration of traditional and web-based information sources    

Integrating Amazon and Barnes&Noble

[Figure: layered diagram. Labels recovered from the slide:
  – Source Level: Amazon.com web service, Barnesandnoble.com web service, currency exchange service
  – Interface Level models: Amazon, Barnes&Noble, currency
  – Application Level model: AmazonBN
  – Conceptual Level queries: price comparison; availability under limit in HUF
  – Legend: model, query, input, mapping]
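The "availability under limit in HUF" query can be approximated by the Python sketch below, which merges offers from both shops, converts prices to HUF and keeps those under a limit. The exchange rate, the offers and all function names are made up for illustration; no real Amazon, Barnes&Noble or currency service interface is used or implied.

```python
from typing import Dict, List, Tuple

# Made-up exchange rates to HUF; a real system would call a currency service.
RATES_TO_HUF: Dict[str, float] = {"USD": 200.0, "HUF": 1.0}

# Made-up offers as (shop, title, price, currency); not real catalogue data.
amazon_offers = [("Amazon", "Some Book", 20.0, "USD")]
bn_offers = [("Barnes&Noble", "Some Book", 18.0, "USD"),
             ("Barnes&Noble", "Other Book", 50.0, "USD")]


def to_huf(price: float, currency: str) -> float:
    """Convert a price to HUF using the stand-in rate table."""
    return price * RATES_TO_HUF[currency]


def available_under(limit_huf: float) -> List[Tuple[str, str, float]]:
    """Unify both shops' offers and keep those cheaper than the HUF limit."""
    unified = [(shop, title, to_huf(price, currency))
               for shop, title, price, currency in amazon_offers + bn_offers]
    under_limit = [offer for offer in unified if offer[2] < limit_huf]
    return sorted(under_limit, key=lambda offer: offer[2])


cheapest_first = available_under(limit_huf=5000.0)
```

The list `unified` corresponds loosely to the AmazonBN model in the figure, and sorting by converted price gives the price comparison.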

Page 26: Semantic integration of traditional and web-based information sources    

Integrating results of World Championships

[Figure: layered diagram. Labels recovered from the slide:
  – Source Level: Result (2002 WC) web service, Result (2006 WC) web service
  – Interface Level models: 2002 WC, 2006 WC; one source reports "Score: n-m" with Match Id 0-63, the other "Score1: n, Score2: m" with Match Id 1-64
  – Higher level models and queries: Unified WC matches, Optimised WC matches, Team matches, Team matches by year, Positions, First Four Teams, Team positions by year, Teams in both WCs, Matches in both WCs, Matches of teams, No of matches played by teams
  – Operations: transformation, combination, grouping, derivation, mapping
  – Legend: model, query, input]
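The two result layouts named on the slide ("Score: n-m" with match ids 0-63 versus separate Score1 and Score2 with match ids 1-64) can be brought to one form as in the following Python sketch. The field names, the `Match` type and the sample rows are hypothetical, the scores are not real results, and which championship uses which layout is an assumption.

```python
from typing import Any, Dict, List, NamedTuple


class Match(NamedTuple):
    year: int
    match_no: int   # unified numbering 1-64
    goals1: int
    goals2: int


def from_combined(year: int, row: Dict[str, Any]) -> Match:
    """Source with a combined 'n-m' score and match ids 0-63."""
    n, m = row["score"].split("-")
    # Shift the zero-based id so both sources share the 1-64 numbering.
    return Match(year, row["match_id"] + 1, int(n), int(m))


def from_split(year: int, row: Dict[str, Any]) -> Match:
    """Source with separate score fields and match ids 1-64."""
    return Match(year, row["match_id"], row["score1"], row["score2"])


# Made-up rows; the assignment of layouts to years is assumed.
rows_a = [{"score": "2-1", "match_id": 0}]
rows_b = [{"score1": 0, "score2": 3, "match_id": 1}]

unified: List[Match] = (
    [from_combined(2002, row) for row in rows_a]
    + [from_split(2006, row) for row in rows_b]
)
```

Once both sources are in the `Match` form, higher level results such as matches per team or teams present in both championships can be derived from the single `unified` list.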

Page 27: Semantic integration of traditional and web-based information sources    

Integrating RSS-feeds

[Figure: layered diagram. Labels recovered from the slide:
  – Source Level: Origo.hu RSS-source, Nol.hu RSS source, Index.hu RSS-source, metro.hu RSS source, VIP database, Text Annotator
  – Interface Level models: origo, nol, index, metro, VIP
  – Higher level models: Unified RSS-feeds; government, opposition (members of)
  – Queries: search for occurrences of a specific word (e.g. “budapest”); search for high level concepts (e.g. political conflicts)
  – Legend: model, query, input, mapping, combination]
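Searching the unified feeds for a specific word can be sketched in Python as follows. The feed items are made-up dictionaries standing in for already parsed RSS entries (no network access or RSS parsing is shown), and the function names `unified_items` and `search` are hypothetical.

```python
from typing import Dict, List

Item = Dict[str, str]

# Made-up feed items standing in for parsed RSS entries; not real headlines.
feeds: Dict[str, List[Item]] = {
    "origo": [{"title": "Budapest traffic update", "link": "http://example.org/1"}],
    "index": [{"title": "Weather report", "link": "http://example.org/2"}],
    "nol": [{"title": "Concert in budapest tonight", "link": "http://example.org/3"}],
}


def unified_items(sources: Dict[str, List[Item]]) -> List[Item]:
    """Flatten all feeds into one list, tagging each item with its source."""
    return [dict(item, source=name)
            for name, items in sources.items()
            for item in items]


def search(sources: Dict[str, List[Item]], word: str) -> List[Item]:
    """Case-insensitive search for a word in the unified item titles."""
    needle = word.lower()
    return [item for item in unified_items(sources)
            if needle in item["title"].lower()]


hits = search(feeds, "budapest")
```

The search for high level concepts shown in the figure needs more than string matching (the figure involves a text annotator and a VIP database), so it is outside the scope of this sketch.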

Page 28: Semantic integration of traditional and web-based information sources    

Summary

• The system
  – is a semantic information integration tool
  – handles various structured sources
    • relational, various semi-structured sources and web services
  – preprocesses various unstructured sources
    • texts, maps, tables
  – uses logic / constraint logic programming
  – can be used in mashup creation
    • disciplined and flexible approach to data access in mashups
    • separates data integration from mashup presentation logic
    • resolves semantic and technical differences in sources

Page 29: Semantic integration of traditional and web-based information sources    

Real estate search - Trulia

• A real estate search engine that helps you find homes for sale and provides real estate information at the local level to help you make better decisions in the process. Trulia pulls in real estate data from partnerships with thousands of brokers and agents and displays it on a Google Maps interface.

• Trulia shows you how sales prices have been trending where it matters—in your county, city, ZIP code and neighborhood. They also offer heat maps and real estate guides.

• http://www.trulia.com/#start

Page 30: Semantic integration of traditional and web-based information sources    

Hotel Guide - Trivop

• The self-proclaimed first videoguide for hotels doesn’t disappoint. Locate hotels on this Google Maps + Hotel mashup and view user-created videos of the hotels. This gives a much better view of a prospective hotel before visiting.

• It currently looks like they only have hotels in England and France, but with their recruiting efforts one can only assume Trivop will be coming to a region near you.

• http://www.trivop.com

Page 31: Semantic integration of traditional and web-based information sources    

Visual Music search – Music Map

• Visual music search application mashed with Amazon data. Choose an artist and album, see related artists in an abstract tree graph. Wicked.

• http://www.dimvision.com/musicmap/

Page 32: Semantic integration of traditional and web-based information sources    

Search for Popular Music – Hype Machine

• The Hype Machine follows music blog discussion. Every day, hundreds of people around the world write about music they love.

• The Hype Machine tracks a variety of MP3 blogs. If a post contains MP3 links, it adds those links to its database and displays them on the front page.

• Some of the frequently accessed tracks are cached by the Hype Machine server, much like Google Search caches web pages, to reduce load on the bloggers' servers and protect their bandwidth. Those tracks are NOT available for download, but you can preview them via the "listen" links that are next to each track or using your media player.

• The blog that posted a particular track is identified under every track by name and with a "read post" link that leads to the blog post itself. If you enjoyed a track someone posted, stop by and let them know!

• You can purchase CDs and individual tracks by using the "amazon" and "itunes" links that appear next to most tracks. Each purchase you make via the Amazon and iTunes links supports both the artists and the Hype Machine. Please buy and enjoy.

• http://hypem.com/