Ontologizing EDI
doug foxvog
23 July 2004
Ontologizing EDI
• What is EDI?
• EDI Data Types
• Ontologizing of EDI
• Ontologizing Invoice Message Type
• Summary
EDI: Electronic Data Interchange
• EDI is a system for standardized business message passing
• Used by hundreds of thousands of corporations
• Look into ontologizing EDI
• Can the Invoice, PurchaseOrder, Functional Acknowledgment, and PaymentOrder message types be mapped to Semantic Web (SW) languages?
EDI
• Obtained the ASC X12 EDI Workbook (23 MB of HTML)
– 318 Message Types (Transaction Sets)
– 1019 Message Segment Types
– 34 “Composite” Segment Types
– 1466 Data Element Types, which may be:
• Code Sets
• Numbers
• Text
• Parts of Data Segments
– Code sets: up to 1500 (or more) codes per set
– 578 External Code Sets (must be found elsewhere)
• Hard copy is the size of a large telephone book, even without the External Code Sets
Message Type Description
• A Transaction Set defines:
– Segment Types in the message
– Order of Segment Types
– Optionality of Segment Types
– Conditionality
• A if B; A if not B; A only if B; A only if not B
– Repetitions of Segment Types
• Maximum number, or unlimited
– Repetition of groups of Segment Types
• The same Segment Type in a different part of a message has a different meaning with respect to the message
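The transaction-set rules above (order, optionality, repetition limits, and the four conditionality forms) can be checked mechanically. The following is a minimal Python sketch, not the X12 validation procedure: the rule list is a simplified, hypothetical one loosely modeled on an invoice, segments are reduced to their tags, and each tag is assumed to occupy only one position in the transaction set (real transaction sets do not guarantee this, as the last bullet notes).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class SegmentRule:
    """One position in a (hypothetical, simplified) transaction set."""
    tag: str
    mandatory: bool = False
    max_use: Optional[int] = 1  # None means unlimited repetitions


@dataclass
class Condition:
    """Presence rule between segments A and B, in the four forms listed above."""
    a: str
    b: str
    kind: str  # "if", "if_not", "only_if", "only_if_not"

    def holds(self, present: Set[str]) -> bool:
        has_a, has_b = self.a in present, self.b in present
        if self.kind == "if":           # A required if B is present
            return has_a or not has_b
        if self.kind == "if_not":       # A required if B is absent
            return has_a or has_b
        if self.kind == "only_if":      # A allowed only if B is present
            return not has_a or has_b
        if self.kind == "only_if_not":  # A allowed only if B is absent
            return not (has_a and has_b)
        raise ValueError(f"unknown condition kind {self.kind!r}")


def validate(tags: List[str], rules: List[SegmentRule],
             conditions: List[Condition]) -> List[str]:
    """Return a list of rule violations for a sequence of segment tags."""
    errors: List[str] = []
    counts: Dict[str, int] = {}
    pos = 0  # current rule position; segments may only move forward
    for tag in tags:
        j = pos
        while j < len(rules) and rules[j].tag != tag:
            j += 1
        if j == len(rules):
            errors.append(f"unexpected or out-of-order segment {tag}")
            continue
        pos = j
        counts[tag] = counts.get(tag, 0) + 1
        limit = rules[j].max_use
        if limit is not None and counts[tag] == limit + 1:  # report once
            errors.append(f"{tag} repeated more than {limit} times")
    for rule in rules:
        if rule.mandatory and rule.tag not in counts:
            errors.append(f"missing mandatory segment {rule.tag}")
    present = set(counts)
    for cond in conditions:
        if not cond.holds(present):
            errors.append(f"condition failed: {cond.a} {cond.kind} {cond.b}")
    return errors


# Hypothetical rules; the limits echo the party and line-item counts quoted for the Invoice.
INVOICE_RULES = [
    SegmentRule("ST", mandatory=True),
    SegmentRule("BIG", mandatory=True),
    SegmentRule("CUR"),
    SegmentRule("N1", max_use=200),
    SegmentRule("IT1", mandatory=True, max_use=200_000),
    SegmentRule("TDS", mandatory=True),
    SegmentRule("SE", mandatory=True),
]
# Hypothetical rule: CUR is allowed only if some N1 party is identified.
CONDITIONS = [Condition("CUR", "N1", "only_if")]
```

For example, `validate(["ST", "BIG", "IT1", "SE"], INVOICE_RULES, CONDITIONS)` reports only the missing TDS segment. Keeping conditions separate from positional rules mirrors the way the Transaction Set lists them as distinct properties.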
Invoice Message Examined
• 90 Segment Type Fields
• 9 Mandatory Fields
• Groups of Fields repeat zero to 200,000 times (or unlimited)
• Nested loops of Segment Type groups
• Field types repeat with different semantics
– Date/Time field in 8 places
– Reference Information in 6 places
• 49 Segment Types
• 191 Data Element Types (used by the Segments)
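Because the same field type (for example Date/Time) recurs in different parts of the message with different semantics, a parser has to track which block each segment sits in. A minimal Python sketch follows. It assumes, for illustration only, that segments are already split into lists of strings, that each Line-Item Block starts with an IT1 segment, and that the summary starts with a TDS segment; the qualifier values `QA` and `QB` are placeholders, not real code values.

```python
from typing import List, Tuple

Segment = List[str]  # e.g. ["DTM", "QA", "20040701"]: tag followed by elements


def split_into_blocks(segments: List[Segment], item_start: str = "IT1",
                      summary_start: str = "TDS"):
    """Split a flat segment stream into header, line-item blocks, and summary."""
    header: List[Segment] = []
    items: List[List[Segment]] = []
    summary: List[Segment] = []
    current = header
    for seg in segments:
        tag = seg[0]
        if tag == summary_start:
            current = summary
        elif tag == item_start and current is not summary:
            items.append([])  # each assumed loop-start segment opens a new block
            current = items[-1]
        current.append(seg)
    return header, items, summary


def date_contexts(segments: List[Segment]) -> List[Tuple[str, str, str]]:
    """Label every DTM segment with the block it appears in."""
    header, items, summary = split_into_blocks(segments)
    labelled = [("header", s[1], s[2]) for s in header if s[0] == "DTM"]
    for n, block in enumerate(items, start=1):
        labelled += [(f"line item {n}", s[1], s[2]) for s in block if s[0] == "DTM"]
    labelled += [("summary", s[1], s[2]) for s in summary if s[0] == "DTM"]
    return labelled
```

With this, two DTM segments carrying the same qualifier come back labeled "header" and "line item 1", which is the context an ontology mapping needs in order to assign them different meanings.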
Invoice Message Contents
• Header Information
• 0 to 200 Parties Identified
• Ungrouped Information
• 1 to 1000 Industry Codes (multiple Code Sets)
• Reference, Vessel, & Accounting Blocks
• 1 to 200,000 Line-Item Blocks
– Many sub-blocks within each Line-Item Block
• Summary Information
– Accounting, Tax, & Shipping Blocks repeat
Example Segment Type: Currency
• Has 21 Data Elements
– 2 Currency Codes
– 1 Exchange Rate
– 1 Currency Market Code
– 2 Entity Identifiers
– 5 Date/Time Sets (!), each consisting of:
• Date/Time Qualifier Code
• Date Field
• Time Field
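The 21-element count decomposes as 6 single elements plus 5 Date/Time sets of 3 elements each (6 + 5 × 3 = 21), and a positional parser makes that layout explicit. This Python sketch assumes the six single elements come first, in the order entity identifier, currency code, exchange rate, entity identifier, currency code, market code, followed by the five triples. It also assumes `*` as element separator and `~` as segment terminator, which in real interchanges are declared in the envelope rather than fixed. Field names and the entity codes in the example are hypothetical.

```python
from typing import Dict, List

# Hypothetical names for the six single elements, in assumed positional order.
CUR_SINGLES = ["entity_id_1", "currency_1", "exchange_rate",
               "entity_id_2", "currency_2", "market_code"]
N_ELEMENTS = len(CUR_SINGLES) + 5 * 3  # 6 singles + 5 Date/Time triples = 21


def parse_cur(segment: str, elem_sep: str = "*", seg_term: str = "~") -> Dict:
    """Parse one Currency segment into named singles and Date/Time sets."""
    parts = segment.rstrip(seg_term).split(elem_sep)
    if parts[0] != "CUR":
        raise ValueError("not a Currency segment")
    elements = parts[1:]
    if len(elements) > N_ELEMENTS:
        raise ValueError("too many elements for a Currency segment")
    elements += [""] * (N_ELEMENTS - len(elements))  # pad omitted trailing elements
    result: Dict = {name: elements[i] for i, name in enumerate(CUR_SINGLES) if elements[i]}
    date_times: List[Dict[str, str]] = []
    for k in range(5):  # five (qualifier, date, time) sets
        start = len(CUR_SINGLES) + 3 * k
        qualifier, date, time = elements[start:start + 3]
        if qualifier or date or time:
            date_times.append({"qualifier": qualifier, "date": date, "time": time})
    result["date_times"] = date_times
    return result
```

For instance, `parse_cur("CUR*BY*USD*0.9*SE*EUR**QA*20040723")` yields both currency codes, the rate, and one Date/Time set; the empty market-code position is simply omitted.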
Example Data Element Types
Entity Identifier
• Code identifying the type of:
– Organizational Entity
– Physical Location
– Property
– Person
– Entity type relative to the transaction (e.g. Employer)
• 1500 Available Codes
Date/Time Qualifier
• Code identifying:
– Type of thing the date applies to
– How the date applies to the thing (ends, promised for, …)
• 1416 Available Codes
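Since each Date/Time Qualifier code bundles two pieces of information (the type of thing and how the date applies to it), one option for ontologizing it is to map each code to a pair of concepts rather than to one opaque concept. A minimal Python sketch of that idea follows; the code keys `Q01` to `Q04` and their meanings are hypothetical placeholders, not entries from the actual 1416-code list.

```python
from typing import Dict, List, NamedTuple


class DateRole(NamedTuple):
    thing: str     # type of thing the date applies to
    relation: str  # how the date applies to that thing


# Hypothetical sample entries; real qualifier codes and meanings must come from X12.
QUALIFIERS: Dict[str, DateRole] = {
    "Q01": DateRole("Shipment", "occurred"),
    "Q02": DateRole("Delivery", "requested"),
    "Q03": DateRole("Delivery", "promised"),
    "Q04": DateRole("Contract", "ends"),
}


def codes_for_thing(thing: str) -> List[str]:
    """All qualifier codes that timestamp the given kind of thing."""
    return sorted(code for code, role in QUALIFIERS.items() if role.thing == thing)


def relation_vocabulary() -> List[str]:
    """The distinct date relations used across the table."""
    return sorted({role.relation for role in QUALIFIERS.values()})
```

Factoring the codes this way means the ontology needs one taxonomy of timestamped things and one of date relations, rather than one concept per code.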
Ontologizing EDI for Semantic Web Services (SWS)
• Purpose would be ontology mapping, to:
– Relay info from an EDI invoice
– Produce an EDI-format purchase order
– Produce an EDI-format payment message
– Send & detect EDI acknowledgment messages
• How reasonable would such a mapping be, considering such a huge dataset?
– What is the minimum we can get by with for these four message types?
• What have others done?
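Producing an EDI-format purchase order means the mapping must also run in reverse, from ontology-level facts back to positional segments. The sketch below shows only the serialization step, in Python. It assumes `*` and `~` as separators and a BEG-style header segment whose element order (purpose code, order type, order number, release number, date) and code values `00` and `SA` are assumptions to be checked against the X12 purchase order documentation; the input dictionary is a hypothetical stand-in for ontology-side data.

```python
from datetime import date
from typing import Dict, Sequence


def write_segment(tag: str, elements: Sequence[str],
                  elem_sep: str = "*", seg_term: str = "~") -> str:
    """Serialize one segment, dropping trailing empty elements."""
    elems = list(elements)
    while elems and elems[-1] == "":
        elems.pop()
    return elem_sep.join([tag, *elems]) + seg_term


# Assumed mapping from an ontology-level purpose to an EDI code value.
PURPOSE_CODES: Dict[str, str] = {"original": "00"}


def purchase_order_header(order: Dict) -> str:
    """Build a BEG-style header from a hypothetical ontology-side record."""
    return write_segment("BEG", [
        PURPOSE_CODES[order["purpose"]],
        order.get("type_code", "SA"),      # assumed default order type
        order["number"],
        "",                                # release number: left empty here
        order["date"].strftime("%Y%m%d"),  # CCYYMMDD
    ])
```

For example, an original order `PO123` dated 23 July 2004 serializes to `BEG*00*SA*PO123**20040723~`. Empty elements in the middle are kept as adjacent separators (`**`), because EDI elements are identified by position.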
Ontologizing at Different Levels
• Transaction Set (Message Type)
– Meaning of segments relative to the TS is unstated
• Data Segment Groups
– Appear unlabeled in the Transaction Set file
• Data Segment Types
– Format of each provided in the file
– Data elements in each
– Data element dependencies stated
• Data Element Codes
– Can be concepts, relations, or a mixture
– Affect the meaning of Segments
– Some applicable only to certain Segment or TS types
Data Segment Ontologizing
• EDI file describes only message layout
• Ontology must focus on meaning
• Relationships among Data Elements must be expressed
– The Currency segment is defined with 2 Currency codes
– If both are present, one is the source currency, which is being converted into the second currency
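The relationship between the two currency codes is exactly the kind of meaning the layout file leaves implicit. A minimal Python sketch of one way to make it explicit follows. It assumes the first currency code is the source, the second is the target, and the exchange rate gives units of target currency per unit of source currency; these, and the positions used (elements 2, 3, and 5 of the segment), are assumptions for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Sequence


@dataclass(frozen=True)
class CurrencyConversion:
    source: str              # first currency code (assumed source)
    target: Optional[str]    # second currency code, if present
    rate: Optional[Decimal]  # assumed: target units per source unit

    def is_conversion(self) -> bool:
        return self.target is not None and self.rate is not None

    def apply(self, amount: Decimal) -> Decimal:
        if not self.is_conversion():
            raise ValueError("segment states a single currency, not a conversion")
        return amount * self.rate


def conversion_from_cur(elements: Sequence[str]) -> CurrencyConversion:
    """elements: the Currency segment's elements, without the CUR tag."""
    padded = list(elements) + [""] * max(0, 5 - len(elements))
    return CurrencyConversion(
        source=padded[1],
        target=padded[4] or None,
        rate=Decimal(padded[2]) if padded[2] else None,
    )
```

A segment carrying only one currency code then yields an object that states a currency but no conversion, so the two readings stay distinct in the ontology.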
Data Element Ontologizing
• Some are homogeneous code sets, so the whole set is easy to encode
– Currency Code: over 160 currencies
– Currency Exchange Code: 6 exchanges
• Some are heterogeneous
– Time Code
• UTC+2, EastEuropeTime, EastEurSummerTime
– Entity Identifier Code
• Organization, Person, Location, Participant type
• Has multiple internal taxonomies
• Some are text
– CityName: cities need to be ontologized
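The difference between homogeneous and heterogeneous code sets shows up directly when generating ontology content. In the Python sketch below, a homogeneous set (a three-entry sample of currency codes) is turned into RDF Turtle by a single loop, while the heterogeneous time codes need a hand-written entry each, because they mix a fixed offset with named zones that have a summer variant. The namespace `http://example.org/edi#`, the class names, and the use of the slide's descriptive names as dictionary keys are hypothetical.

```python
from datetime import timedelta
from typing import Dict

# Homogeneous: every code denotes the same kind of thing, so one loop suffices.
CURRENCIES: Dict[str, str] = {"USD": "US dollar", "EUR": "Euro", "JPY": "Japanese yen"}


def currency_turtle(codes: Dict[str, str], ns: str = "http://example.org/edi#") -> str:
    """Emit one edi:Currency individual per code, as Turtle text."""
    lines = [
        f"@prefix edi: <{ns}> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    ]
    for code, label in sorted(codes.items()):
        lines.append(f'edi:{code} a edi:Currency ; rdfs:label "{label}" .')
    return "\n".join(lines)


# Heterogeneous: each entry is modeled individually (keys are descriptive names).
TIME_CODES: Dict[str, Dict] = {
    "UTC+2": {"kind": "FixedOffset", "offset_hours": 2},
    "EastEuropeTime": {"kind": "ZoneStandardTime", "offset_hours": 2},
    "EastEurSummerTime": {"kind": "ZoneDaylightTime", "offset_hours": 3},
}


def utc_offset(code: str) -> timedelta:
    """Offset from UTC for a hand-modeled time code."""
    return timedelta(hours=TIME_CODES[code]["offset_hours"])
```

UTC+2 and EastEuropeTime give the same offset but remain different concepts: only the latter belongs to a zone that switches to summer time.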
How Much Needs to be Ontologized?
• How do we determine the subset needed for, or appropriate to, Web Services?
• The meanings of many Data Element Codes need ontologizing; how do we select them?
• A large variety of topics must be covered
– An ontology is needed for each
– The ontologies need to be tied together
Should EDI be Ontologized?
• How much effort to cover enough to establish communication with EDI systems?
• Has anyone else ontologized EDI, or a significant portion thereof?
• Are companies moving away from EDI to other systems?
• Is this effort to aid buggy-whip manufacturers?
Ontologizing Invoice
• Topics include:
– Time
– Currencies
– Temporal relations
– Contracts
– Geographical regions
– Reports
– Physical products
– Banking
– Measured quantities
– Taxes
– Meta-information
– Delivery
• An ontology is needed for each of these
Ontologizing Invoice
• The Data Segment types included in the Invoice Transaction Set were ontologized
– To different levels by different students
• Some of the Data Element types had their codes ontologized
– This would be needed for ontology mapping
• Different ontology languages were used
– WSML, FLORA, RDF-S, CycL
• Data Elements within the Currency and Date/Time Data Segment types were expressed in several languages
Date/Time Segment
• Date and/or Time
• Date/Time Format specifier
• Date/Time Qualifier
– Type of thing timestamped
– How the timestamp relates to the thing
• Date/Time of event
• Start/End date/time
• Expected/Promised/Scheduled/Requested … time
• Effective date, expiration date, due date, date of birth
• Corrected/Former
– Combination of the above with type
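A Date/Time segment therefore has to be read as qualifier plus value, where the shape of the value depends on the format specifier. A minimal Python sketch follows. It assumes a six-element layout (qualifier, date, time, time code, format specifier, formatted value), dates as CCYYMMDD, times as HHMM with any seconds ignored, and the format specifier values `D8` (one date) and `RD8` (a date range written as two dates joined by a hyphen). These layouts and code values are assumptions to check against the X12 documentation, and the qualifier `QA` is a placeholder.

```python
from datetime import date, datetime
from typing import Sequence, Tuple, Union

DateValue = Union[date, datetime, Tuple[date, date]]


def parse_date_time(date_str: str, time_str: str = "") -> Union[date, datetime]:
    """CCYYMMDD date, optionally combined with an HHMM time (seconds ignored)."""
    day = datetime.strptime(date_str, "%Y%m%d").date()
    if not time_str:
        return day
    clock = datetime.strptime(time_str[:4], "%H%M").time()
    return datetime.combine(day, clock)


def parse_formatted(fmt: str, value: str) -> DateValue:
    """Interpret a value according to an (assumed) format specifier."""
    if fmt == "D8":   # assumed: a single CCYYMMDD date
        return parse_date_time(value)
    if fmt == "RD8":  # assumed: CCYYMMDD-CCYYMMDD range
        start, end = value.split("-")
        return parse_date_time(start), parse_date_time(end)
    raise ValueError(f"unsupported format specifier {fmt!r}")


def parse_dtm(elements: Sequence[str]) -> Tuple[str, DateValue]:
    """Return (qualifier, value) for one Date/Time segment, tag excluded."""
    e = list(elements) + [""] * max(0, 6 - len(elements))
    qualifier = e[0]
    if e[4]:  # a format specifier is present: use the formatted value
        return qualifier, parse_formatted(e[4], e[5])
    return qualifier, parse_date_time(e[1], e[2])
```

The qualifier is returned untouched so that a separate mapping (thing timestamped, and how) can interpret it; the parser only normalizes the value.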
Comparing Invoice to PurchaseOrder and PayOrder

Message Type      Segment Types   Data Element Types
Invoice           49              191
Purchase Order    63              217
Pay Order         45              132

Much overlap:
• 86 Data Elements in all 3
– 71 in two of three
• 295 total Data Elements used
• 19 Segment Types in all 3
– 15 in two of three
• 105 total Segment Types used
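The overlap figures are set intersections over the Data Element (or Segment Type) identifiers each message type uses. A small Python sketch of the computation follows, reading "in two of three" as "in exactly two" and using hypothetical identifiers instead of the real ones. It also includes a consistency check that holds for any three sets: the sum of the three set sizes equals the number of identifiers used once, plus twice those used in exactly two, plus three times those used in all three.

```python
from typing import Dict, Sequence, Set


def overlap_summary(uses: Dict[str, Set[str]]) -> Dict[str, int]:
    """uses maps a message type to the set of element IDs it uses."""
    all_ids = set().union(*uses.values())
    in_all = set.intersection(*uses.values())
    in_exactly_two = {i for i in all_ids if sum(i in s for s in uses.values()) == 2}
    return {"total": len(all_ids), "in_all": len(in_all),
            "in_exactly_two": len(in_exactly_two)}


def counts_consistent(sizes: Sequence[int], total: int,
                      in_exactly_two: int, in_all: int) -> bool:
    """Check sum(sizes) == once + 2*twice + 3*thrice for three message types."""
    used_once = total - in_exactly_two - in_all
    return sum(sizes) == used_once + 2 * in_exactly_two + 3 * in_all
```

The same functions apply unchanged to Segment Types; only the identifier sets differ.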
What has been Ontologized?
• All Invoice Data Segments (to some extent)
• Dates & Times
• Temporal Relations
• Currency Types
• Currency Markets
• Geopolitical Entities
– Country list to relate to Currencies
• 1200+ Agent types graphically placed in a taxonomy, not yet encoded
Summary
• EDI is a massive set of descriptions of message formats
• An individual message type permits inclusion of thousands of different codes which are syntactically meaningful
• Heterogeneous codes require individual attention
• Portions of EDI have already been converted to RDF
• EDI may be in the process of being phased out
• It is questionable whether we should encode it
Questions?