
Using Code List Methodology for Value and Validation (from OASIS Code List Representation and UBL TCs) in OASIS CIQ Specifications V3.0 – A Case Study

Ram Kumar, Chairman

OASIS CIQ Technical Committee

http://www.oasis-open.org/committees/ciq

June 2007

Agenda

- Code List: What, Why, Standard
- OASIS Code List Representation TC
- OASIS UBL Methodology for Code List and Value Validation (UMCLVV) from OASIS UBL TC
- OASIS CIQ TC Implementation of OASIS Code List Specifications – A Case Study

What is a Code List?

Also known as enumerations, controlled vocabularies, or classification schemes and classification values

A set of values to choose from which represent an agreed upon semantic concept

Days of a week = {“Mon”, “Tue”, “Wed”, “Thu”, “Fri”, “Sat”, “Sun”}

Code List = List Name + Values

List Name = Days of a week

Values = {“Mon”, “Tue”, “Wed”, “Thu”, “Fri”, “Sat”, “Sun”}
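The "list name plus values" idea can be sketched in a few lines of Python. This is only an illustration: the `CodeList` class and its `allows` method are names made up here, not part of any OASIS specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeList:
    """Hypothetical model: a list name plus the agreed set of values."""
    name: str
    values: frozenset

    def allows(self, value: str) -> bool:
        # A value is acceptable only if it is one of the agreed codes.
        return value in self.values


days_of_week = CodeList(
    name="Days of a week",
    values=frozenset({"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}),
)

print(days_of_week.allows("Wed"))
print(days_of_week.allows("Wednesday"))
```

The second lookup fails because "Wednesday" expresses the same concept in a different lexical form, which is the alignment problem a code list is meant to remove.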

Why are Code Lists important?

It is not just elements and attribute names in XML that need to be semantically unambiguous & aligned for interoperability

The lexical form of element and attribute text content also needs to be aligned, i.e. simple data items need to be represented the same way

This is more important for applications

For data-oriented XML particularly (e.g. CIM), Code Lists are as important as elements and attributes: they form part of the complete vocabulary of the document

Standard for Code List

If code lists were really so simple and obvious, there would be a single, well known and acceptable way of handling them in XML

There is no agreed solution, though

The problem is that while code lists are a well understood concept, people do not actually agree on exactly what code lists are, and how they should be used

The code list is in the eyes of the beholder

The XML schema may require only 3-letter codes to represent the code list

The database may require a set of numeric codes, plus display labels (possibly in different languages)

The application may need to know which 3-letter code corresponds to which numeric code, so that it can process the XML and update the database

All of this code list information needs to be stored together in a single representation of the code list, so that all usages of the code list can be generated from the same source information
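A minimal sketch of the "single source, several usages" idea follows. The row layout, column names and helper functions are assumptions made for illustration; the two sample rows use the ISO 3166 alpha-3 and numeric codes for Australia and New Zealand.

```python
# One shared representation: one row per entry, several columns per row.
COUNTRY_ROWS = [
    {"alpha3": "AUS", "numeric": "036",
     "label_en": "Australia", "label_fr": "Australie"},
    {"alpha3": "NZL", "numeric": "554",
     "label_en": "New Zealand", "label_fr": "Nouvelle-Zélande"},
]


def schema_enumeration(rows):
    """View for the XML schema: just the 3-letter codes."""
    return sorted(row["alpha3"] for row in rows)


def database_rows(rows, language):
    """View for the database: numeric code plus a display label."""
    return [(row["numeric"], row[f"label_{language}"]) for row in rows]


def alpha3_to_numeric(rows):
    """View for the application: map XML codes to database codes."""
    return {row["alpha3"]: row["numeric"] for row in rows}


print(schema_enumeration(COUNTRY_ROWS))
print(database_rows(COUNTRY_ROWS, "fr"))
print(alpha3_to_numeric(COUNTRY_ROWS)["AUS"])
```

Because every view is derived from the same rows, a change to the list is made once and reaches the schema, the database and the application together.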

The only constant is change

Code lists change

For a code list model to be useful, it has to account for the fact that the code lists will change over time

There is little use in having a code list model that works only for a code list that is frozen in time

The code list model has to support changes between versions of a code list

The only constant is change

Not all changes to a code list are version changes, however

Some changes may be local changes to a distributed code list

The ISO 3-letter currency code list contains GBP for British Pounds. However, prices on the London Stock Exchange are normally quoted in pence

This has led to the practice of adding an extra code to the standard ISO list (e.g. GBp, GBX) in order to support pence as well as pounds

This kind of customisation is far from uncommon

The utility of any code list model is greatly reduced if it does not cater for local modifications of code lists
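A local modification can be modelled as a derived list that leaves the distributed list untouched. The sketch below assumes a three-code sample of the ISO currency list and a hypothetical `customise` helper; it is not how genericode itself expresses derivation.

```python
# Small sample standing in for the distributed ISO currency code list.
ISO_CURRENCY_SAMPLE = frozenset({"GBP", "USD", "EUR"})


def customise(base, add=(), remove=()):
    """Return a new, locally modified list; the base list is not changed."""
    return (frozenset(base) | frozenset(add)) - frozenset(remove)


# Local list that also accepts pence, as described for the London Stock Exchange.
lse_currencies = customise(ISO_CURRENCY_SAMPLE, add={"GBX"})

print(sorted(lse_currencies))
print("GBX" in ISO_CURRENCY_SAMPLE)
```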

OASIS Code List Representation Technical Committee

The OASIS Code List Representation format, “genericode”, is a single model and XML format (with a W3C XML Schema) that can encode a broad range of code list information

The XML format is not designed for run-time or real-time use; the standardized interchange format is meant to be massaged into an optimized representation

27 of the 40 requirements gathered are implemented in v1.0 of the specifications

Genericode Model

Has a tabular structure for code list information

Each row in the table represents a single distinct entry in the code list, i.e. each row represents a single uniquely identifiable item in the code list.

Each column in the table represents a metadata value that can be defined for each distinct entry in the code list. Each column is either required or optional. A required column does not allow any row to have an undefined (nil or null) value. An optional column allows undefined values.

A genericode key is a set of one or more required columns that together uniquely identify each distinct entry in the code list. Optional columns cannot be used for keys. Each code list must have at least one key.

Genericode keys are equivalent to what people usually mean when they talk about the “codes” in a code list. However, genericode allows multiple keys for each code list, and there is no single preferred key.
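The row, column and key rules above can be checked mechanically. The following is a sketch under the assumption that a code list is held as a list of dictionaries; `Column` and `check_table` are illustrative names, not genericode vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    required: bool


def check_table(columns, keys, rows):
    """Return a list of problems; an empty list means the rules hold."""
    problems = []
    by_name = {column.name: column for column in columns}
    if not keys:
        problems.append("a code list must have at least one key")
    for key in keys:
        for column_name in key:
            column = by_name.get(column_name)
            if column is None or not column.required:
                problems.append(f"key column {column_name!r} must be required")
    for index, row in enumerate(rows):
        for column in columns:
            if column.required and row.get(column.name) is None:
                problems.append(
                    f"row {index}: required column {column.name!r} is undefined")
    for key in keys:
        seen = set()
        for index, row in enumerate(rows):
            identity = tuple(row.get(column_name) for column_name in key)
            if identity in seen:
                problems.append(f"row {index}: duplicate key {identity!r}")
            seen.add(identity)
    return problems


columns = [Column("alpha3", True), Column("numeric", True), Column("label", False)]
rows = [
    {"alpha3": "AUS", "numeric": "036", "label": "Australia"},
    {"alpha3": "NZL", "numeric": "554", "label": None},  # optional, so allowed
]
# Two keys for the same list, with no preferred one.
print(check_table(columns, [("alpha3",), ("numeric",)], rows))
```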

Concept

Keep code lists (aka enumerations) out of the core XML schema by using “schemes”

The idea is that the code list from which an element value is taken is indicated via a “scheme” attribute containing a URI which represents the scheme (code list)

Same as the way that URIs are used to represent XML namespaces

This is done so that a newer version of core XML schema need not be released just because an externally controlled enumeration that it uses has changed (e.g. country code)
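A rough sketch of the scheme idea: the instance names its code list by URI, and the values are looked up outside the schema. The `CODE_LISTS` registry and the NZL entry are assumptions for illustration; the `listSchemeURI` attribute and the Country-1 URN are taken from the examples in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry, maintained separately from the core schema:
# scheme URI -> values currently accepted for that code list.
CODE_LISTS = {
    "urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1": {"AUS", "NZL"},
}


def check_coded_element(xml_text):
    """Check one coded element against the list its scheme URI names."""
    element = ET.fromstring(xml_text)
    scheme = element.get("listSchemeURI")
    if scheme not in CODE_LISTS:
        return f"unknown code list scheme: {scheme}"
    value = (element.text or "").strip()
    if value not in CODE_LISTS[scheme]:
        return f"{value!r} is not in {scheme}"
    return "ok"


sample = ('<CountryCode listSchemeURI='
          '"urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>')
print(check_coded_element(sample))
```

Updating the set behind a URI, or registering a new URI, changes what is accepted without any edit to the schema.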

OASIS UBL Methodology for Code List and Value Validation (UMCLVV 0.8)

XML Instance Document Validation

Namespace: xmlns="urn:oasis:names:tc:ciq:xNL:3"

[Figure: graphical schema view]

Text view of XML instance (XMLinstanceXXX.xml), with xsi:schemaLocation="urn:oasis:names:tc:ciq:xNL:3":

<StsMetadataRecord>
  <ESLVersionNumberID>5.0</ESLVersionNumberID>
  <Person>
    <cbc:BirthDate>1967-08-13</cbc:BirthDate>
    <LearnerRegistration>
      <cbc:NationalStudentNumberID>123456</cbc:NationalStudentNumberID>
    </LearnerRegistration>
    <PersonNameGeneric>
      <cbc:FirstName>John</cbc:FirstName>
      <cbc:LastName>Smith</cbc:LastName>
    </PersonNameGeneric>
    . . .

XML instance documents can be validated against the applicable XML Schema

Background (Glossary)

XML Data Content

In an XML instance document, any values
- between the XML angle brackets ‘>’ and ‘<’, and
- between the quotes of an attribute

are message data content

Examples:

<BirthDate>1960-06-09</BirthDate>

<Country>
  <CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>
  <Name>Australia</Name>
</Country>

Background (Glossary), continued

Types of XML data content:
- Code values
- Other values (non-code values)

Examples:

Code value:
<Country>
  <CountryCode>AUS</CountryCode>
</Country>

Other value:
<BirthDate>1960-06-09</BirthDate>

W3C XML Schema Limitations

W3C XML Schema is mostly about data structures

But it does some data content validation. It

has good support for:
- data type conformity
- min/max values
- length, patterns …

has limited support for:
- enumerations

has no support for:
- complex business rules
- versioned changes of validation (without affecting the Schema’s version)

Business Rules Examples

Date Arithmetic:

BirthDate < CurrentDate – 6 Years

Attribute Value Restriction:
- The code list value “First Name” cannot occur more than once
- The code list value “Last Name” cannot occur more than once

Element Use Restriction:
- The Country element is optional, but cannot occur more than once

Zero-length string:

<Name></Name>
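Two of these rules, the date arithmetic and the zero-length string, can be sketched as plain functions. The function names are made up for this illustration, and whether an empty `<Name>` should be rejected is a decision for the partners; the sketch only detects it.

```python
from datetime import date


def at_least_years_old(birth_date, today, years):
    """Rule: BirthDate < CurrentDate - N years."""
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        cutoff = today.replace(year=today.year - years, day=28)
    return birth_date < cutoff


def is_zero_length(text):
    """True for <Name></Name> or <Name/>, which W3C XML Schema accepts as xs:string."""
    return text is None or text == ""


print(at_least_years_old(date(1960, 6, 9), date(2007, 6, 1), 6))
print(is_zero_length(""))
```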

Business Rules Examples, continued

Code Lists: the code list (+version) used by CountryCode must be an accepted code list

<CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>

Code Value: CountryCode ‘XYZ’ must be valid in that Country code list version

<CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>

Co-occurrence: if Status=‘Closed’ then ClosureReason must also be present

<StatusCode>Closed</StatusCode>
<ClosureReason>Obsolete</ClosureReason>
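The co-occurrence rule can be sketched as follows. The `<Record>` wrapper element and the function name are assumptions added so that the fragment parses as a document; only `StatusCode` and `ClosureReason` come from the slide.

```python
import xml.etree.ElementTree as ET


def closure_reason_missing(record_xml):
    """True when StatusCode is 'Closed' but no ClosureReason is given."""
    record = ET.fromstring(record_xml)
    status = record.findtext("StatusCode", default="").strip()
    reason = record.findtext("ClosureReason")  # None when the element is absent
    return status == "Closed" and (reason is None or not reason.strip())


good = ("<Record><StatusCode>Closed</StatusCode>"
        "<ClosureReason>Obsolete</ClosureReason></Record>")
bad = "<Record><StatusCode>Closed</StatusCode></Record>"
print(closure_reason_missing(good))
print(closure_reason_missing(bad))
```

A rule of this shape depends on the values of two different elements, which is why it cannot be stated as a W3C XML Schema 1.0 constraint.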

Data Content Validation Conclusion

XML Schema does not cover all data content validation requirements

Embedding content validation in XML Schema has undesired consequences in conjunction with re-use and Schema versioning

Business rules vary more frequently than schema constraints, and the business rules between different partners would vary where the schema constraints remain the same.

By layering value constraints on top of structural/lexical constraints, the schemas can remain unchanged while being adapted to different partners through different value constraints
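The layering can be pictured as two passes: one structural check shared by everyone, then a value check chosen per partner. Everything named below (the dictionary message, the partner names, the allowed sets) is a simplified stand-in, not the UMCLVV mechanism itself.

```python
def structurally_valid(message):
    """Stand-in for schema validation: CountryCode must be a 3-letter string."""
    code = message.get("CountryCode")
    return isinstance(code, str) and len(code) == 3


# Value constraints differ per partner; the structural rule does not.
PARTNER_VALUE_CONSTRAINTS = {
    "partner_a": {"AUS", "NZL"},
    "partner_b": {"AUS"},
}


def validate(message, partner):
    if not structurally_valid(message):
        return "structure error"
    if message["CountryCode"] not in PARTNER_VALUE_CONSTRAINTS[partner]:
        return "value error"
    return "ok"


print(validate({"CountryCode": "NZL"}, "partner_a"))
print(validate({"CountryCode": "NZL"}, "partner_b"))
```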

Is data content validation required?

How can data content be validated in XML instances?

Without Data Content Validation in XML

Content Validation at A:
- Program code
- Database constraints

Content Validation at B:
- Program code
- Database constraints

Interoperability issues:
- Validation at A equivalent to Validation at B?
- Data quality of message is difficult to control
- Communication of data quality issues between A & B
- Relies on trust in the sender
- Hard to ascertain equal interpretation of codes

[Figure labels: XML file; W3C XML Document Schema; Schema Validation; Design; Implementation; Data Exchange Partner Agreement]

With Data Content Validation in XML

Sender’s and receiver’s data content validation must be:
- electronic
- portable
- of shared logic and error output
- platform-independent
- versioned

[Figure labels: XML file; 1. Schema Validation (W3C XML Document Schema); 2. Content Validation (XML Content Validation); Design; Implementation; Data Exchange Partner Agreement]

With Data Content Validation in XML

Sender’s and receiver’s data content validation must be:
- electronic
- portable
- of equivalent logic and error output
- platform-independent
- versioned

[Figure labels: XML file; 1. Schema Validation (W3C XML Document Schema); 2. Content Validation (UMCLVV); Design; Implementation; Data Exchange Partner Agreement]

UMCLVV Features

Code Value Validation
Example: CountryCode must be a valid CountryCode

Code List Metadata Validation
Example: CountryCode must belong to an agreed, named Country Code list (+version): urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1

Complex Rules Validation
Examples:
- BirthDate < CurrentDate
- StatusCode ‘Closed’ requires a ClosureReason.

UMCLVV Features, continued

Completely separate from W3C XML Schema

Platform-independent: ISO/IEC 19757-3 Schematron (implemented using W3C XSLT stylesheets)

Completely independent of Naming and Design Rules (NDRs)

Versioning in isolation of XML Schema

UMCLVV Process Overview

[Figure labels: Data Exchange Partner Agreement; Data Content Validation Requirements; OASIS UBL Methodology for Code List and Value Validation (UMCLVV); Validation Coding; transform; generate; W3C XML Validation Stylesheet]

UMCLVV involved Roles

[Figure labels: Data Content Validation Requirements; OASIS UBL Methodology for Code List and Value Validation; Validation Coding; transform; generate; W3C XML Validation Stylesheet; Documentation]

[Roles shown: Business Analysts & Testers; Users; (Developers); (Data Architects); UMCLVV Service Staff; Run-time Operator; Specialist; Developers & Testers; Users]

UMCLVV Run-Time Components

[Figure labels: XML file; W3C XML Document Schema(s); W3C XML Validation Stylesheet]

Value Validation

The validation process involves the use of the Schematron language and XSLTs

Schematron is a rule-based XML schema language, developed by Rick Jelliffe and internationally standardized as ISO/IEC 19757-3, using XPath expressions to describe validation rules.

Schematron is used to confirm the success or failure of a set of assertions made about XML document instances.

Schematron can be used as an adjunct to DTDs, RELAX NG or XML Schemas. It allows co-occurrence constraints, non-regular constraints, and inter-document constraints
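The assertion model (a context, a test, and a message reported on failure) can be imitated in Python to show the shape of a rule set. This is not Schematron and does not use XPath for the tests; the rule table and `failed_assertions` are names invented here, and treating an empty `Name` as an error is an assumption.

```python
import xml.etree.ElementTree as ET

# Each rule: (context element name, test on that element, message on failure).
RULES = [
    ("Person", lambda person: len(person.findall("Country")) <= 1,
     "Country cannot occur more than once"),
    ("Name", lambda name: bool((name.text or "").strip()),
     "Name must not be a zero-length string"),
]


def failed_assertions(xml_text, rules=RULES):
    """Return the messages of all assertions that fail for the document."""
    root = ET.fromstring(xml_text)
    messages = []
    for context, test, message in rules:
        for node in root.iter(context):  # every element matching the context
            if not test(node):
                messages.append(message)
    return messages


print(failed_assertions(
    "<Person><Name></Name><Country/><Country/></Person>"))
```

In the real toolchain the rules are written in Schematron with XPath tests and compiled to an XSLT stylesheet, so the same assertions run unchanged at sender and receiver.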

Methodology - Overview

Methodology Data Flow Diagram

UMCLVV Status

OASIS draft standard 0.8

Ownership of UMCLVV will move from UBL TC to CLR TC (will incorporate a name change)

No known platform-independent alternative

Plug-and-play run-time component

Methodology can evolve without impacting run-time requirements

[Figure labels: W3C XML Document Schema; W3C XML Validation Stylesheet]

UMCLVV Process

[Figure labels: Default Code List (gc) with values A, B, C, D, E, F; Customised Code List (gc) with values A, B, C, G, H; References; CVA; sch; XSL; XSD (structure validation); UMCLVV (code list validation); XML; Validated XML; Application]

Application of UMCLVV Process in an Enterprise

[Figure labels: Enterprise Code Lists; Enterprise XML Schemas; UMCLVV; Application A with customised enterprise code lists and business rules; Application B with customised enterprise code lists and business rules]

Benefits of UMCLVV

- Verify that instance document is valid as per DEPA
- Validate data content platform-independently
- Sender and receiver get the same validation result
- Strong candidate to become a global industry standard (UN/CEFACT is taking an interest)
- W3C Stylesheet and Schema are industry standards
- Simple run-time requirement (XSLT or Python or any other ISO Schematron implementation)

Benefits of UMCLVV, continued

Supports versioned validation in isolation of schema version

Documentation is in sync with implementation

Validation can be switched on/off as required (by the message server or the application)

Simplifies application coding

Simple run-time requirement allows for evolution of UMCLVV

Details of the methodology are transparent to operations

Risks of UMCLVV

An OASIS draft standard

UMCLVV not widely used yet

Methodology may change or evolve

Requires Schematron and XPath expertise

Affects the XML instance document processing (extra steps)

Affects the testing of XML Schema/XSLT release packages

OASIS CIQ TC Case Study – Using UMCLVV Methodology

OASIS CIQ Technical Committee

Industry Specifications for defining Party Centric Data from a global (international) perspective

Party: Person or Organisation
- Name (241+ countries in over 36 formats)
- Address/Location (241+ countries in over 130 formats)
- Party centric attributes
- Party relationships

Delivering royalty free, open, international, industry and application neutral XML specifications for representing, interoperating, and managing party (person/organization) centric information

Why Genericode and UMCLVV Approach for CIQ TC?

- Keeps code lists and values outside of the core CIQ XML Schemas
- Provides users with the ability to define the semantics for the data represented in the CIQ structure
- Provides users with the ability to customize the CIQ XML Schemas without modifying the CIQ XML Schemas
- Provides users with the ability to write business rules to constrain the structure of the CIQ XML Schemas without modifying the XML schemas

OASIS CIQ Specifications

- Party Name Schema: xNL.xsd. Supporting enumeration list (10): xNL-types.xsd
- Party Address Schema: xAL.xsd. Supporting enumeration list (30): xAL-types.xsd
- Party Information Schema: xPIL.xsd. Supporting enumeration list (56): xPIL-types.xsd

CIQ Specifications without Genericode Approach

Use Party Name as Case Study

Code Lists defined in an XML Schema (xNL-types.xsd) that is “included” in xNL.xsd

Enumeration List referenced from xNL-types.xsd

xNL Enumeration List

Users given the choice to modify the code lists to meet their specific requirements

Basic default values are provided, but it is up to the users to use them as is or customise them

xNL Enumeration List - Drawbacks

- Each application has to have its own enumeration list
- Point to point negotiations between applications
- No standard enumeration list file that remains untouched
- Change in enumeration list will result in change to application code generation
- The Name schema might be used in multiple locations in an organisation (e.g. billing, marketing, sales, customer identification) and hence, customising the enumeration list is not straightforward
- It might be an overhead for an application to use a large code list when it requires only 3 values

Objective of this case study

Move away from embedding code lists as XML schemas and from “including” or “importing” them in base XML schemas

Investigate the use of genericode approach and UMCLVV in CIQ Specifications

Implement genericode approach in CIQ Specifications as an optional feature

Customise the genericode based default code lists with specific requirements without modifying the default code lists

Apply business rule constraints on the core CIQ XML schemas without modifying the XML schemas

Case Study - Scenarios

- Add a new code list value to the default name code list (“NativePlaceName”)
- Restrict the default name code list to allow no more than one first and last name (“FirstName”, “LastName”)
- Restrict the default code list to allow only “FirstName”, “LastName”, and “NativePlaceName” as code values
- Apply business rule constraints on XML Schema

With the enumeration approach, the default xNL code list cannot be customised to cater for the above requirements without changing it

Preparing xNL Schema with Genericode Approach to Handle Code Lists

Step 1 - Create default .gc files

Identify and decide on list-level and instance-level metadata to be included

Create .gc file for each enumeration list in xNL-types.xsd

Ensure that the .gc file is valid structurally against genericode-code-list.xsd file

.GC file - Example

Code Value

List Level Metadata

Instance Level Metadata

In the absence of metadata properties for values in the instance being validated, only the values found in the associated external list representation can be used. There being no qualification of the values in the instance, all values in the external file are in play as valid values for validation

If the instance being validated does have metadata properties specified for a given value, then that value is asserted to be a value from a particular version or identified list of values.

Instance level metadata allows an instance to disambiguate a coded value that might be the same value from two different lists.
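How instance-level metadata narrows the set of valid values can be sketched as a filter over the associated lists. The list name "NameElementList", its two versions and their contents are hypothetical; only the behaviour (no metadata means every associated value is in play) follows the description above.

```python
# Hypothetical external lists, keyed by (short name, version).
EXTERNAL_LISTS = {
    ("NameElementList", "1.0"): {"FirstName", "LastName"},
    ("NameElementList", "2.0"): {"FirstName", "LastName", "NativePlaceName"},
}


def allowed_values(ref=None, ver=None):
    """Values in play for one coded item, given its instance-level metadata.

    With no metadata, the values of all associated lists are valid.
    Each metadata property that is present narrows the candidate lists.
    """
    values = set()
    for (name, version), entries in EXTERNAL_LISTS.items():
        if ref is not None and ref != name:
            continue
        if ver is not None and ver != version:
            continue
        values |= entries
    return values


print(sorted(allowed_values()))
print(sorted(allowed_values(ref="NameElementList", ver="1.0")))
```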

Step 2: Modify xNL.xsd

Remove references to enumeration list defined as xml schemas

Include distinct instance level metadata for all elements/attributes that use code list values

Instance Level Metadata used:
- Ref == genericode ShortName
- Ver == genericode Version
- URI == genericode CanonicalUri
- VerURI == genericode CanonicalVersionUri

Instance Level Metadata

Instance level Metadata for “ElementType” attribute

xs: string

Step 3: Prepare Context/Value Association (CVA) File

Every element and attribute information item below the document element of an XML document is in a document context described by its hierarchical ancestry of elements. A fully qualified document context specifies the information item’s precise location in the document.

Define all the default document contexts with pointers to the default genericode files produced from xNL-types.xsd
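The context/value association can be pictured as a table from document contexts to code list files. The sketch below is not the CVA file format: the context path, the file name "PersonNameElementList.gc" and the in-memory stand-in for a parsed .gc file are all assumptions, and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Stand-in for parsed genericode files: file name -> allowed values.
CODE_LIST_FILES = {"PersonNameElementList.gc": {"FirstName", "LastName"}}

# Hypothetical association: document context -> genericode file.
CONTEXTS = {
    "PartyName/PersonName/NameElement/@ElementType": "PersonNameElementList.gc",
}


def walk(element, path=""):
    """Yield (context, value) for every attribute, using the element ancestry."""
    here = f"{path}/{element.tag}" if path else element.tag
    for attribute, value in element.attrib.items():
        yield f"{here}/@{attribute}", value
    for child in element:
        yield from walk(child, here)


def invalid_values(xml_text):
    """List (context, value) pairs whose value is not in the associated list."""
    errors = []
    for context, value in walk(ET.fromstring(xml_text)):
        file_name = CONTEXTS.get(context)
        if file_name and value not in CODE_LIST_FILES[file_name]:
            errors.append((context, value))
    return errors


doc = ('<PartyName><PersonName>'
       '<NameElement ElementType="FirstName">John</NameElement>'
       '<NameElement ElementType="Nickname">JJ</NameElement>'
       '</PersonName></PartyName>')
print(invalid_values(doc))
```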

CVA File

Step 4 - Prepare files for Value Validation

Run the supplied batch/shell files as part of the UMCLVV process to create the necessary files for code list value validation

Applying Constraints to Default Code Lists

Default Schema and Code List Values

- Add a new code value “NativePlaceName”

- Restrict the code values to have only “FirstName” and “LastName”

Step 1 – Add a new code list value

Add a new code list value “NativePlaceName”

Create a gc file with this code value

Step 2 – Restrict the default code list

Restrict the code values to only “FirstName” and “LastName”

Create a .gc file with this restriction
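Steps 1 and 2 combine into an effective list: the restriction is applied to the default list and the new value is added, while the default list itself stays as published. The four default values and the `effective_list` helper are illustrative assumptions, not the contents of the actual default .gc file.

```python
# Illustrative sample of a default name element list (left untouched).
DEFAULT_LIST = frozenset({"Title", "FirstName", "MiddleName", "LastName"})


def effective_list(default, keep=None, add=()):
    """Apply a restriction (keep) and an extension (add) to a default list."""
    kept = default if keep is None else default & frozenset(keep)
    return kept | frozenset(add)


customised = effective_list(
    DEFAULT_LIST,
    keep={"FirstName", "LastName"},   # Step 2: restriction
    add={"NativePlaceName"},          # Step 1: new code value
)
print(sorted(customised))
print(sorted(DEFAULT_LIST))
```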

Step 3 – Create Restriction CVA File

Applying Business Rules to Constrain Default XML Schemas

Step 4 – Define Business Rules to include constraints to default schema

Restrict the schema to accept only one First Name and one Last Name

Business Rules to define constraint

No changes to xNL Schema
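The "only one First Name and one Last Name" rule amounts to counting code values within one name. A sketch, assuming un-namespaced `NameElement` elements carrying an `ElementType` attribute; in the case study the rule is written as a Schematron assertion instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repeated_name_types(person_name_xml, at_most_once=("FirstName", "LastName")):
    """Return the restricted code values that occur more than once."""
    person = ET.fromstring(person_name_xml)
    counts = Counter(
        element.get("ElementType") for element in person.iter("NameElement"))
    return sorted(code for code in at_most_once if counts[code] > 1)


name = ('<PersonName>'
        '<NameElement ElementType="FirstName">John</NameElement>'
        '<NameElement ElementType="FirstName">Paul</NameElement>'
        '<NameElement ElementType="LastName">Smith</NameElement>'
        '</PersonName>')
print(repeated_name_types(name))
```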

Step 5 - Prepare files for Value Validation

Run the supplied demonstration batch/shell files as part of the UMCLVV process to create the necessary files for value validation

CIQ Global Address Specification (xAL)

Can be customized to specific country address structure using UMCLVV, but at the same time keeping the customized structure in compliance with xAL default structure

Example 1: Customizing xAL for Singapore

Let us assume that Singapore Address does not require the following xAL elements:

- Administrative Area
- Rural Delivery, or Post Office
- Location Coordinates
- Free Text Address
- Country
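A country profile of this kind can be checked by looking for the elements it excludes. The element names below are assumed spellings of the items listed above and the `<Address>` wrapper is illustrative; the actual xAL element names should be taken from xAL.xsd.

```python
import xml.etree.ElementTree as ET

# Assumed element names for the items the Singapore profile does not use.
EXCLUDED_FOR_SINGAPORE = {
    "AdministrativeArea", "RuralDelivery", "PostOffice",
    "LocationByCoordinates", "FreeTextAddress", "Country",
}


def local_name(tag):
    """Drop a '{namespace}' prefix if ElementTree has added one."""
    return tag.rsplit("}", 1)[-1]


def excluded_elements_present(address_xml):
    """Return excluded element names found directly under the address."""
    address = ET.fromstring(address_xml)
    found = {local_name(child.tag) for child in address}
    return sorted(found & EXCLUDED_FOR_SINGAPORE)


sample = ("<Address><Locality>Singapore</Locality>"
          "<Country>Singapore</Country></Address>")
print(excluded_elements_present(sample))
```

An address that passes this check is still a valid xAL address, since the profile only forbids optional parts of the default structure.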

Example 1: Customising xAL for Singapore

Example 1: Business Rule for Singapore Address

No changes to xAL Schema

Example 2: Customizing xAL to only use Free Text Address Lines

Business Rule for Example 2

No changes to xAL Schema

CIQ Specifications with Genericode Approach

Skills Required to use OASIS Code List Approach

- XML Schema Language
- Schematron Language
- XSLT (sometimes)
- XPath
- XML Processors/XML Parsers
- Batch Files / Shell Files

Experience using UMCLVV and Genericode Approach

- Powerful
- The only standard for managing code lists now in industry
- Manual effort (requires patience)
- Painful without tool support
- But once everything has been set up, works beautifully
- Does not deal with mapping between schemas

References

OASIS Codelist Representation (Genericode) Version 1.0, May 2007, http://docs.oasis-open.org/codelist/cd-genericode-1.0/doc/oasis-code-list-representation-genericode.pdf

OASIS UBL Methodology for Codelist and Value Validation, Working Draft 0.8, November 2006, http://www.oasis-open.org/committees/document.php?document_id=21324

OASIS Code List Adaptation Case Study (OASIS CIQ), May 2007, http://www.oasis-open.org/committees/document.php?document_id=23711

OASIS Party Information Standards, http://www.oasis-open.org/committees/ciq

Special Thanks

Ken Holman, Chair, OASIS Code List Representation TC

Juerg Tschumperlin, Data Management Solutions, New Zealand

Thank You
