Upload
clifford-stokes
View
216
Download
0
Tags:
Embed Size (px)
Citation preview
Mapping Techniques: Demonstration of automatic data
transformation
Maria Theodoridou
ICS-FORTH
April 2004
2 April 21, 2004 Maria Theodoridou
The Problem
Convert data of various formats into a common XML format compatible with the CIDOC Conceptual Reference Model
• Input data:– Databases– Text files– Spreadsheet files
• Output files: – XML files, compatible with the CIDOC Conceptual
Reference Model
3 April 21, 2004 Maria Theodoridou
An integration/migration tool designed to convert structured data from one format to another. Also designed to clean and restructure the data to fit the new format. Market Leader in its FieldMap Designer: transfers data among a wide variety of data file types. To transfer data, you design and run a conversion. Each conversion contains all the information needed to convert data from an existing data file or table to a new Target data file or table. Conversions are done in three Steps:
1. Define options for the source file 2. Define options for the target file 3. Define appropriate mapping from the source to the target file
using all the functions, expressions, actions and events the tool offers
Data Junction 7.5
4 April 21, 2004 Maria Theodoridou
Step 1 (Source)
• Define Source Type – ASCII, Access, Excel, SQL, Oracle etc.
• Define Source File• Each text source file will be separated into
fields and records – Field is a category of things, which have a
common characteristic– Things of the same field are listed in
separate records below the field– We use field and record delimiters
5 April 21, 2004 Maria Theodoridou
Step 1 (Source)
• We can sort our data descending or ascending depending on one or more fields
• By using the “Source Filters” we can keep only the records we want
• We can have the field names by using a header in the source file
• The parsing is done serially and every record is listed below the other
• From the Data Browser we see all the fields, records and the data
6 April 21, 2004 Maria Theodoridou
Step 2 (Target)
• Define Target Type– ASCII, Access, Excel, Lotus, SQL, Oracle etc.
• Define Target File• We can use an existing record layout
from:– A previous conversion– An XML file, an XML DTD, an XML Schema
7 April 21, 2004 Maria Theodoridou
General about a conversion• Every conversion uses the Event Handlers,
the Actions, the Expressions and the Functions
• Every Event, from a predefined set, contains one or more actions
• Every action has either a predefined set of parameters or uses an expression
• Every expression can use one or more of the predefined or the user defined functions
• We can also declare global variables, which can be used by every expression within a conversion.
8 April 21, 2004 Maria Theodoridou
Events
• Predefined set of Events • They occur at a specific moment with a
predefined order in the conversion cycle• Events are separated into Source Generic
Events, Target Events, Conversion Events, and Record Type Events
• Examples of Events:– BeforeConversion, AfterConversion,
AfterEveryRecord, OnDataChanged, BeforeMap, AfterMap, BeforePutRecord, OnDiscard
9 April 21, 2004 Maria Theodoridou
Actions
Actions have their own format and parameters:• Execute: executes an expression we specify• Clear: Clears the contents of a target record
memory buffer • Map: maps a specific target record type• Put: writes the results of a target record type
to a specific position in the target file• ClearMapPut: combines the functionality of
Clear, Map, and Put
• Abort: causes the conversion to abort
10 April 21, 2004 Maria Theodoridou
An Example
• We Map and then Put the target record types “CRM_Entity”, “is_identified_by” and “has_current_owner”
• The global variable “id” helps us map the appropriate fields to the “is_identified_by” target record type
11 April 21, 2004 Maria Theodoridou
Expressions
• They offer conditional mapping and data modification
• Each expression is written for an “Execute” Action or is a Target Field Expression
• We use the same syntax in every expression
• Every expression is written in the expression builder window
12 April 21, 2004 Maria Theodoridou
An expression example
A target field expression:
If (IsNull(Fields("nr"))) Then
"NULL"
Else
"OID:LUPA."&Fields("nr")
End If
13 April 21, 2004 Maria Theodoridou
Functions
Set of predefined functions separated in categories:• Conversion: typecast a string or a variable• Date/Time: convert strings into several formats of
date and time• Math: return the sign, absolute, logarithm etc.• Parse: parse a string into sub strings by using
delimiters
14 April 21, 2004 Maria Theodoridou
Functions
• Text: replace, compare, count, trim strings• File: read, copy, append and rename a
file or get the name of a directory or make a new directory
• Flow Control: For … Next, Go To, If … Then … Else, Return, Select Case … End Select, While … Wend.
• We can also have user-defined functions, save them as a Visual Basic Module File (.bas) and call them within the conversions we want
15 April 21, 2004 Maria Theodoridou
Step 3 (Map)
• Here we define the Events, Actions, Expressions and Functions we use
• We define the order and type of the fields of every target record type
• A target record type may contain a lot of other record types
• When a field is selected to be a record, then it automatically links to a record type, which has the same name with the field
16 April 21, 2004 Maria Theodoridou
Lupa transformation
Lupa – Stone data base of 7000 archaeological records, City of Vienna, Austria
Microsoft Access to XML conversion
17 April 21, 2004 Maria Theodoridou
Lupa transformation Source
18 April 21, 2004 Maria Theodoridou
Lupa transformation Target
XML file
19 April 21, 2004 Maria Theodoridou
Lupa transformation Map
Field name CIDOC CRM correspondence
Nr CRM_Entity->P47F_is_identified_by->E42_Object_Identifier
Bilder CRM_Entity->P129B_is_subject_of->E38_Image
Titel CRM_Entity-> P102F_has_title-> E35 Title
Objekt CRM_Entity ->P2F_has_type-> E55_Type
Fundort CRM_Entity -> P12B_was_present_at-> E5_Event
Bezirk CRM_Entity->P12B_was_present_at->P7F_took_place_at->E53_Place
Verwahrort CRM_Entity->P55F_has_current_location->E53_Place
Literatur CRM_Entity->P70B_is_documented_in->E31_Document
Inschrift CRM_Entity->P65F_shows_visual_item->P142F_has_clear_text
Inschrift_Abfrage CRM_Entity->P65F_shows_visual_item->P141F_has_transcription
20 April 21, 2004 Maria Theodoridou
CIL transformation
CIL – Corpus Inscriptionum Latinarum
25,150 roman inscriptions
text to XML conversion
21 April 21, 2004 Maria Theodoridou
CIL transformation Source
CIL 13, 00029* = AE 1994, 1194.Abelloni / deo / (A)elia(?) Mar/[6] / v(otum) s(olvit) CIL 13, 00118* (AE 1989, 0519).D(eo?) M(ercurio?) / pro sal(ute) Imp(eratoris)M(arci) Aur(eli) Antonini / Fel(icis) Aug(usti)Brit(annici) p(ontificis) m(aximi) tr(ibunicia)p(otestate) XV imp(eratoris) / II co(n)s(ulis)II des(ignati) III p(atris) p(atriae) C(aio)Iul(io) / Drutedo et Balorice Taur(i) f(ilio) /ex v(oto)
22 April 21, 2004 Maria Theodoridou
CIL transformation Source
ContentExtractor is a report reader
designed to mine important data out of unstructured report formats such as invoices, bank statements, emails, html docs, etc., and turn that information into a database
We used Content Extractor to convert unstructured text to Microsoft Excel file format
23 April 21, 2004 Maria Theodoridou
CIL transformation Target
XML file
24 April 21, 2004 Maria Theodoridou
CIL transformation Map
Field name CIDOC CRM correspondence
Field 1 CRM_Entity->E34 Inscription->P65F_shows_characters
Field 1 CRM_Entity->E34 Inscription->P141F_has_transcription
Field 2 CRM_Entity->P70B_is_documented_in->E31_Document
Field 2 CRM_Entity->P70B_is_documented_in->E31_Document
Thank You!