
CELAN WP 2 – DELIVERABLE D2.1 ANNOTATED CATALOGUE OF BUSINESS-RELEVANT

SERVICES, TOOLS, RESOURCES, POLICIES AND STRATEGIES AND THEIR CURRENT UPTAKE IN THE BUSINESS COMMUNITY

ANNEX 2

INVESTIGATION OF BUSINESS-RELEVANT STANDARDS AND GUIDELINES

IN THE FIELDS OF THE LANGUAGE INDUSTRY

Project Title: CELAN
Project Type: Network Project
Programme: LLP – KA2
Project No: 196466-LLP-1-2010-1-BE-KA2-KA2PLA

Version: 1.2
Date: 2013-01-30
Author: Blanca Stella Giraldo Pérez (sub-contract for standards investigation and analysis)
Contributors: Infoterm (supervision), other CELAN partners (comments)

The CELAN network project has been funded with support from the European Commission, LLP programme, KA2. This communication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

CELAN D2.1 ANNEX 2_fv1.2


Executive Summary

The investigation of industry- and business-relevant standards and guidelines in the fields of the language industry (LI) was subdivided into four parts:

General standardization framework relevant to CELAN,

Basic standards related to the ICT infrastructure with particular impact on the LI,

Specific standards pertaining to language technologies, resources, services and LI related competences and skills,

Latest developments with respect to the complementarity of LI standards and assistive technologies standards.

At the end of the investigation a summary and recommendations are given.

Standards can play an important role in developing LI policies/strategies, educational schemes, language technology tools (LTT), language and other content resources (LCR) and language services, as well as in using the offers of language service providers (LSP). They can contribute to the design of better products and save financial resources from the outset, thus avoiding the need to retrofit products at significantly higher cost in later stages of their life cycle. They can also help enterprises to select staff with appropriate skills and competences, or to have their staff trained in accordance with pertinent standards.

At today’s stage of development of the LI, the respective standards are indispensable for producing high-quality products and rendering high-quality services, thus:

Making the use of LTT, LCR and LSP truly cost-effective,

Avoiding or at least minimizing undesirable mistakes and the resulting conflicts.

The overview gained through this investigation can help enterprises

To clearly define their requirements and specifications in conjunction with
  o the outsourcing of language services (LS),
  o the development or adaptation of language technology tools/systems (LTT) as well as of language and other content resources (LCR);

To refer to the pertinent standards in tenders, bids or contracts of all sorts;

To state their competences/skills (or those of their staff) in compliance with standards;

To make their quality management transparent and understandable for their customers;

To become more competitive in an increasingly multilingual world.

Using standards properly and referring to them can help to avoid misunderstandings, undesirable mistakes and conflicts, risks and even liabilities.

From the above it follows that LI product developers and LSP need to be familiar with LI-related standards and the respective certification schemes. For users and customers of LI products and services, sufficient knowledge of these standards and certification schemes is definitely useful.


Abbreviations

AAC augmentative and alternative communication
AAL ambient assisted living
ASTM ASTM International (formerly known as the American Society for Testing and Materials)
CD committee draft (first formal stage in the ISO standardization process)
CEN European Committee for Standardization
CL Common Logic
CLARIN Common Language Resources and Technology Infrastructure
CLDR Unicode Common Locale Data Repository
CR CEN Report (equivalent to TR in ISO)
CSS Cascading Style Sheets
CWA CEN Workshop Agreement (equivalent to PAS in ISO)
DC Dublin Core
DCR ISO Data Category Registry (containing the data categories, ISOcats, of ISO/TC 37)
DITA Darwin Information Typing Architecture
DOL Distributed Ontology Language
EA European co-operation for Accreditation
EDI electronic data interchange
EDIFACT electronic data interchange for administration, commerce and transport
Electropedia online electrical and electronic terminology database of IEC (also known as "IEV Online")
ELRA European Language Resources Association
ETSI European Telecommunications Standards Institute
ETSI ISG LIS European Telecommunications Standards Institute’s Industry Specification Group “Localization Industry Standards”
FLaReNet Fostering Language Resources Network
FOSS free and open source software
GMX Global Information Management Metrics eXchange
GPL general purpose language
HTML Hypertext Mark-up Language
HTTP Hypertext Transfer Protocol
I&D Information and Documentation
ICT information and communication technologies
IEC International Electrotechnical Commission
IEEE Institute of Electrical and Electronics Engineers
IETF Internet Engineering Task Force
IEV International Electrotechnical Vocabulary (of IEC)
IPA International Phonetic Alphabet
ISO International Organization for Standardization
ITU International Telecommunication Union
IUPAC International Union of Pure and Applied Chemistry
IUPAB International Union for Pure and Applied Biophysics
IUPAP International Union of Pure and Applied Physics
JTC Joint Technical Committee
L10N localization
LCR language and other content resources
LDML Locale Data Mark-up Language
LI language industry
LICS® Language Industry Certification System
LISA Localization Industry Standards Association
LR language resource
LRT language resource technology
LS language services
LSP language service providers
LT language technology


LTT language technology tools
MDR metadata registry
MEEK Functional Multilingual Extensions to European Keyboard Layouts
META Network of the Multilingual Europe Technology Alliance: META-NET
MoU/MG Management Group of the ITU-ISO-IEC-UN/ECE Memorandum of Understanding concerning eBusiness standardization
NMB national member bodies
OASIS Organization for the Advancement of Structured Information Standards
OAXAL Open Architecture for XML Authoring and Localization Reference Model
OCR optical character recognition
OD OpenDocument
ODF Open Document Format for Office Applications
OECD Organisation for Economic Co-operation and Development
OLAC Open Language Archives Community (within the framework of the Open Archives Initiative [OAI])
OntoIOp Ontology Integration and Interoperability
OWL Web Ontology Language
PAS publicly available specification (in ISO – equivalent to CWA in CEN)
PwD person with disabilities
RDF Resource Description Framework
SAM Speech Assessment Methods
SAMPA (or SAM-PA) SAM Phonetic Alphabet
SAP Standardisation Action Plan
SC sub-committee
SDO standards developing organizations
SemAF Semantic annotation framework
SKOS Simple Knowledge Organization System
SME small and medium-sized enterprise
SOAP Simple Object Access Protocol
SPL special purpose language
SRX Segmentation Rules eXchange
SSH social sciences and humanities
SynAF Syntactic annotation framework
TBT Technical Barriers to Trade (Agreement on)
TC technical committee
TEI Text Encoding Initiative
TMX Translation Memory eXchange
TR Technical Report
TS technical specification (in ISO)
UBL Universal Business Language
ULI Unicode Localization Interoperability technical committee
UML Unified Modeling Language
UN United Nations
UNCSGN United Nations Conferences on the Standardization of Geographical Names
UNCRPD United Nations Convention on the Rights of Persons with Disabilities
UPU Universal Postal Union
W3C World Wide Web Consortium
WCAG Web Content Accessibility Guidelines
WG working group
WTO World Trade Organization
XLIFF XML Localization Interchange File Format
XML eXtensible Mark-up Language


Table of contents:
Executive Summary
Abbreviations
0 Background
1 General standardization framework relevant to CELAN
1.1 Standardizing bodies and other standards developing organizations (SDO)
1.2 Standards documents
1.3 Regulations concerning standardization and certification
1.3.1 Regulations governing standardization in general
1.3.2 Regulations governing certification in general
1.3.3 Regulations governing eCertification in general
1.4 Primary and secondary standards to the LI
1.5 Methodology applied in this investigation of existing standards
2 Basic standards related to the ICT infrastructure with particular impact on the LI
2.1 Standards concerning character (glyph) coding, etc.
2.1.1 International standards
2.1.2 European specific character requirements
2.1.3 European Culturally Specific ICT Requirements
2.2 Standards related to the coding of names of countries, languages and scripts
2.3 Standards related to the application of character coding
2.3.1 Keyboard standards
2.3.2 Ordering rules
2.3.3 Optical character recognition (OCR)
2.3.4 Speech-to-written and written-to-speech conversion
2.4 Standards related to data modeling
2.4.1 Generic standards concerning data modeling
2.4.2 Standards about data elements and metadata as well as metadata registries
2.4.3 Standards about semantic structuring
2.5 Protocols, formats and schemas
2.6 Standards related to the quality of data and information
2.7 Information and Documentation (I&D) standards
2.8 Standards related to mobility and accessibility
2.9 Certification based on standards
3 Specific standards pertaining to language technologies, resources, services and LI related competences and skills
3.1 Language technologies (LT) and language technology tools (LTT)
3.2 Language and other content resources (LCR)
3.2.1 Unstructured LCR
3.2.2 Structured LCR
3.2.2.1 Standards concerning LCR methodology
3.2.2.2 Standards containing standardized content
3.2.2.2.1 Items of standardized structured content at a meta-level
3.2.2.2.2 LCR of standardized structured content per se
3.2.2.2.3 LCR of non-standardized structured content
3.3 Quality of language services and language service providers (LSP)
3.4 LI related competences and skills
3.4.1 ICT-focused eCertification
3.4.2 LI-related skills/competences and eCertification
3.5 Standards and guidelines concerning language policies/strategies
4 Latest developments
5 Summary and recommendations
References
Appendix 1: Tables
Appendix 2: Recommendation on software and content development principles 2010
Appendix 3: List of Standards developing organizations (SDO) in the fields of the ICT
Appendix 4: Localization (L10N) related standardization


CELAN D2.1 – ANNEX 2

Investigation of industry- and business-relevant standards and guidelines
in the fields of the language industry

0 Background

The investigation of industry- and business-relevant standards and guidelines in the fields of the language industry (LI) was subdivided into four aspects:

General standardization framework relevant to CELAN,

Basic standards related to the ICT infrastructure with particular impact on the LI,

Specific standards pertaining to language technologies, resources, services and LI related competences and skills,

Latest developments with respect to the complementarity of LI standards and assistive technologies standards.

The related aspects of

Industry- and business-relevant policies and strategies concerning language, standardization, certification and accessibility, and

Analysis of policies concerning language, standardization, certification and accessibility

were largely removed from this Annex and integrated into the main document D2.1.

The overview gained through this investigation can help enterprises

To clearly define their requirements and specifications in conjunction with
  o the outsourcing of language services (LS),
  o the development or adaptation of language technology tools/systems (LTT) as well as of language and other content resources (LCR);

To refer to the pertinent standards in tenders, bids or contracts of all sorts;

To state their competences/skills (or those of their staff) in compliance with standards;

To make their quality management transparent and understandable for their customers;

To become more competitive in an increasingly multilingual world.

In addition, using standards properly in the above-mentioned way can help to avoid misunderstandings, undesirable mistakes and conflicts, risks and even liabilities. LI product developers and LSP need to be familiar with LI-related standards and the respective certification schemes. For users and customers of LI products and services, sufficient knowledge of these standards and certification schemes is definitely useful.


1 General standardization framework relevant to CELAN

This chapter describes the standards organizations and other standards developing organizations (SDO) identified as relevant to CELAN, together with a general overview of standardization and the methodology applied. The organizations are summarized at the end of this document in Appendix 1, Table 3: Identified stakeholders.

1.1 Standardizing bodies and other standards developing organizations (SDO)

There would be no language industry (LI) without language technologies (LT), and the development of technologies necessitates standards. Standards are primarily developed within official standardizing bodies, which in the context of the LI are mainly the following international and European standards organizations:

International Organization for Standardization (ISO),

International Electrotechnical Commission (IEC),

Joint Technical Committee ISO/IEC-JTC 1 Information technology,

International Telecommunication Union (ITU),

European Committee for Standardization (CEN),

European Telecommunications Standards Institute (ETSI).

These standardizing bodies and their national member bodies (NMB) develop so-called open, formal standards in a coordinated way through formal standardization. Here, “open” means open to all societal stakeholders and publicly accessible, but not necessarily free of charge. These standards are under the copyright of the respective standardizing body, but they are not proprietary like those of other SDO. Many of the general standards for all or most areas of ICT are developed within the Joint Technical Committee ISO/IEC-JTC 1 “Information technology”. Increasingly, however, other TC responsible for eBusiness, eHealth, eLearning and the like are also developing fairly general standards that have a bearing on ICT as a whole.

Besides the above-mentioned standardizing bodies, many other standards developing organizations (SDO) develop standards. In the field of information and communication technologies (ICT), most of them are industry consortia developing industry standards (see Appendix 3). Many of these claim to develop “open” standards in the sense of free-of-charge standards. In most cases, however, these standards have not been developed in the same open way as those of the standardizing bodies, and they are proprietary. Only a few of these SDO match the official standardizing bodies in terms of authority and the binding nature of their standards. Thus, the first task of this investigation of existing standards, guidelines and legislation was to identify the official standardizing bodies and other SDO developing international standards relevant to the CELAN project. In addition to ISO, IEC, ITU, ETSI and CEN, the following SDO were found to qualify as developers of standards pertinent to the LI:

World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF),

Institute of Electrical and Electronics Engineers (IEEE),

Organization for the Advancement of Structured Information Standards (OASIS),

ASTM International (formerly known as the American Society for Testing and Materials).

The organizations mentioned so far usually conduct their standardization work in committees, often called technical committees (TC), most of which were set up in response to expressed needs for standards. In most cases, these TC are largely free to define their own scope and work programme. If a TC has to deal with many standards, the work is often distributed among sub-committees (SC), working groups (WG) or the like. Standardizing activities are usually better coordinated within a standardizing body or SDO than between different SDO, although committee experts commonly network across organizations and thereby informally coordinate the development of standards. Given this situation, however, certain overlaps, and at the same time certain gaps, are inevitable.


In addition, there are probably hundreds of industry consortia that consider themselves SDO in various fields of ICT standardization (see Appendix 3: List of Standards developing organizations (SDO) in the fields of the ICT). SemanticStandards.org (http://www.semanticstandards.org/) provides links to lists of semantic standards and other resources available on the Internet:

Wikipedia 1: http://en.wikipedia.org/wiki/List_of_XML_markup_languages,

Wikipedia 2: http://en.wikipedia.org/wiki/List_of_XML_schemas,

Wikipedia 3: http://en.wikipedia.org/wiki/Category:Industry-specific_XML-based_standards,

Standard Setting Organizations and Standards List,

Survey of Fora & Consortia,

SEMIC.EU,

Xov Repository.

The lists and other resources accessible through these links indicate that dozens of standardization efforts aiming at content interoperability through the development of semantic standards are under way in vertical industrial fields (including eBusiness and eCommerce, eHealth and eGovernment).

Taking into account all kinds of pertinent industry standards (including vertical standards), standardization concerning content interoperability (i.e. semantic standards) is a rather complex and fragmented field. This complexity, and the high costs resulting from it, appears to be becoming unmanageable even for large enterprises. LI-related standards could be seen as one kind of vertical standard, but since they relate to many kinds of ICT and other standards, they are treated here as horizontal standards. The problem is that mainstream ICT standardization does not sufficiently recognize them as indispensable wherever ICT standards touch upon or affect language aspects. In this situation, SMEs may be trapped into investing in tools/systems oriented towards vertical industry standards which, if they lack the necessary capability for integration and interoperability, could prevent expansion into new markets or future system upgrades, not to mention the costly conversion or re-input of content when systems have to be upgraded or replaced.

As the LI is a fairly young industrial sector, many ongoing standardization activities could more appropriately be termed “pre-normative research and development”. Several organizations (including project consortia and networks) were identified that develop pre-normative documents accepted as quasi-standards by certain stakeholder groups and, increasingly, by the LI at large, such as:

Text Encoding Initiative (TEI),

Common Language Resources and Technology Infrastructure (CLARIN),

European Language Resources Association (ELRA),

Fostering Language Resources Network (FLaReNet),

Network of the Multilingual Europe Technology Alliance (META): META-NET.

Some new project consortia have been formed recently. As a consequence of the situation outlined above, a considerable number of standards, guidelines, recommendations and best practices have emerged to support the development of the LI and to promote the competitiveness and quality of products and services on the market. This shows a clear need both for more standards and for the harmonization of existing ones. At national level, the activities of standardizing organizations are often based on national legislation. The European Union has also taken a keen interest in standardization, at the latest since the discussion about achieving a single market started in the 1980s. Some countries, including some in Europe, have enacted legislation concerning the status and use of the language(s) under their jurisdiction. Furthermore, EU institutions have developed a number of guidelines and recommendations concerning multilinguality and multilingualism in Europe. Public and political interest in standardization thus meets public and political interest in language issues in various aspects of the LI.


1.2 Standards documents

The second task was to analyze the kinds of documents falling under the term “standard” within the scope of the LI. These range from the standards of standardizing bodies (including basic standards, publicly available specifications, technical reports, codes of practice, etc.) via more or less normative guidelines and recommendations to the best practices of all kinds of SDO, some of which are considered quasi-standards. Whereas technical standards historically referred only to “hardware” and later also to “software” (in the traditional sense), today there are several distinct types of standards:

Methodology standards (probably already comprising more than 50% of all standards),

Terminology standards (or a part on terms and definitions in subject standards),

Product/process/service standards,

Interface standards,

Testing standards,

Standardized coding systems,

Data standards.

Many existing standards are a mixture of the above.

Given the complexity and fragmentation of standardization efforts with respect to content interoperability or semantic standards, this investigation concentrates on international standards and on standards that can be considered horizontal and largely generic. This focus is meant to help make investments in the respective ICT infrastructure safer, especially for SMEs.

1.3 Regulations concerning standardization and certification

In some cases there are legal provisions concerning standardization; in most cases, standardization is governed by standards documents. In these normative “meta-standards”, “standardization” is defined as “an activity for establishing, with regard to actual or potential problems, provisions for common and repeated use, aimed at the achievement of the optimum degree of order in a given context” [see ISO/IEC Guide 2]. In particular, the activity consists of the processes of formulating, issuing and implementing standards, based on the consensus of the most important stakeholders. For this reason, standards published by official standardizing bodies are called open standards, in contrast to industry standards, which are usually proprietary. Major efforts are undertaken to harmonize existing open standards at national, regional and international levels so that they do not compete with or even contradict each other; this is also regulated by the WTO (World Trade Organization) Agreement. The Agreement on Technical Barriers to Trade (TBT), sometimes referred to as the Standards Code, is one of the legal texts of the WTO Agreement; it obliges WTO Members to ensure that technical regulations, voluntary standards and conformity assessment procedures do not create unnecessary obstacles to trade. Annex 3 of the TBT Agreement is the Code of Good Practice for the Preparation, Adoption and Application of Standards, known as the WTO Code of Good Practice. In accepting the TBT Agreement, WTO Members agree to ensure that their central government standardizing bodies accept and comply with this Code of Good Practice, and also to take reasonable measures to ensure that local government, non-governmental and regional standardizing bodies do the same.

1.3.1 Regulations governing standardization in general

The main standards documents governing standardization in general at international level are:

ISO/IEC Directives Part 1 (2012) Procedures for the technical work,

ISO/IEC Directives Part 2 (2011) Rules for the structure and drafting of International Standards,

ISO/IEC Directives, Supplement. Consolidated procedures specific to ISO (2012), [including also Annex SQ (normative) Procedures for the standardization of graphical symbols and Annex SK (normative) Procedure for the development and maintenance of standards in database format]


ISO/IEC Directives, IEC Supplement, Procedures specific to IEC (2012), [including also Annex I (normative) Implementation of the ISO/IEC Directives for the work on the International Electrotechnical Vocabulary]

ISO/IEC Guide 2:2004 Standardization and related activities – General vocabulary,

ISO/IEC Directives, Supplement. Procedures specific to JTC 1 (2010),

ISO/IEC Guide 21-1:2005 Regional or national adoption of International Standards and other International Deliverables – Part 1: Adoption of International Standards,

ISO/IEC Guide 21-2:2005 Regional or national adoption of International Standards and other International Deliverables – Part 2: Adoption of International Deliverables other than International Standards.

Since the mid-1980s, standards have increasingly been regarded as a major prerequisite for ensuring quality through systematic quality management. In this connection, quality has been defined as the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs. The main standards documents governing quality management in general at international level are:

ISO 9000 (series) Quality management systems – Fundamentals and vocabulary,

ISO 9001:2008 Quality management systems – Requirements,

ISO 9004:2009 Managing for the sustained success of an organization – A quality management approach,

ISO 14001:2004 Environmental management systems – Requirements with guidance for use.

With the international standard ISO 8000 (series) Data quality, data and information quality have entered the domain of quality management at large.

The general framework for European standardization policy is provided by the following basic documents:

Directive 98/34/EC of the European Parliament and of the Council laying down a procedure for the provision of information in the field of technical standards and regulations,

Decision No 1673/2006/EC of the European Parliament and of the Council of 24 October 2006 on the financing of European standardization,

General guidelines for the co-operation with the European Standards Organizations (2003), which can be accessed at: http://ec.europa.eu/enterprise/policies/european-standards/documents/general-framework/index_en.htm

The “Directive 98/34/EC of the European Parliament and of the Council laying down a procedure for the provision of information in the field of technical standards and regulations” is the most important legal act concerning European standardization. In order to avoid barriers to trade caused by differing national standards in the member states of the EU, it stipulates a mutual obligation, and a procedure, to inform both the other member states and the Commission in advance whenever the adoption of a new national standard is planned. In addition, the Directive specifies in its Annexes I and II which standards organizations at European and national level are the "recognised" standards bodies within the EU.

The “Decision No 1673/2006/EC of the European Parliament and of the Council of 24 October 2006 on the financing of European standardization” establishes the legal basis for the financial support provided by the Commission to the European standardisation system. This financial support consists mainly of grants, both for the functioning of the central secretariats of the three European standards organisations and for specific actions to be carried out by these organisations. The overall sum paid in support of European standardisation has been stable in recent years and amounts to € 20 million annually.

The general guidelines for the co-operation between CEN, CENELEC and ETSI and the European Commission and the European Free Trade Association, adopted and signed on 28 March 2003, are a purely political document. In it, all the partners confirm their common understanding about the role of European standardisation, about its principles, such as openness, transparency and impartiality, and about their willingness to cooperate, on the basis of these principles, in support of European policies.

1.3.2 Regulations governing certification in general

Certification is defined as a procedure by which a first, second or third party gives written assurance that a product, process or service conforms to specified requirements. Certification involves a number of documented processes, at the end of which there is a documented assessment result. In the context of ISO 9001:2008 or ISO 14001:2004, “certification” refers to the issuing of written assurance (the certificate) by an independent external body that it has audited a management system and verified that it conforms to the requirements specified in the standard. “Registration” means that the auditing body then records the certification in its client register. Thus the organization’s management system has been both certified and registered. “Certification” is the term most widely used worldwide, although “registration” is often preferred in North America; the two are used interchangeably. “Accreditation” means something different from certification (or registration): it is a third-party attestation (i.e. a formal recognition) by an accreditation body that a conformity assessment body (i.e. a certification body) is competent to carry out ISO 9001:2008 or ISO 14001:2004 certification in specified business sectors. The European co-operation for Accreditation (EA) is the network of the national accreditation bodies in Europe. In simple terms, accreditation is like certification of the certification body. Certificates issued by accredited certification bodies are perceived on the market as having increased credibility. The main standards documents governing certification in general at international level are:

ISO/IEC 17000:2004 Conformity assessment – Vocabulary and general principles,

ISO 19011:2011 Guidelines for auditing management systems.

In addition, ISO 9001:2008 and ISO 14001:2004 also contain provisions concerning certification. If the certification scheme is based on a standard, standards compliance is assessed against validation or verification criteria, defined as a policy, procedure or requirement used as a reference against which evidence is compared. The quality of data and of data-related services and tools has only recently appeared on the radar of quality assessment and certification approaches. Needless to say, the potential for high quality of data and related services and tools is greater if they are standards-based.

The more “normative” standards are, the easier it is to establish certification schemes based on standards. Thus “certification” has become a determining issue in connection with quality-related standards. Certification based on international standards is gaining importance.

1.3.3 Regulations governing eCertification in general

In recent years “eCertification” (or ICT certification) has been gaining importance. eCertification can be considered the set of processes by which an individual gains a credential in a particular ICT skill/competence or, more generally, in a range of skills/competences. The main standards documents governing eCertification in general at international and European level are:

ISO/IEC 17024:2003 Conformity Assessment – General requirements for bodies operating certification of persons,

ISO/IEC TR 19759:2005 Software Engineering – Guide to the Software Engineering Body of Knowledge (SWEBOK),

ISO/IEC 24773:2008 Certification of software engineering professionals – Comparison framework,

CWA 16052:2009 ICT Certification in Europe,

CWA 15515:2006 European ICT Skills Meta-Framework – State-of-the-art review, clarification of the realities, and recommendations for next steps.


CEN Workshop Agreements (CWA) are CEN publications similar to the ISO/PAS (Publicly Available Specification), i.e. a sort of pre-standard. CEN Members have agreed that certain CWAs may be made available on the CEN website for download free of charge.

In connection with the LI, standards compliance may refer first of all to methodology standards, for instance those concerning:

Language technology tools/systems (LTT),

Language and other content resources (LCR),

Language services and their provision by language service providers (LSP),

eCertification of the competences/skills of LI experts,

Training schemes referring to the above and the training material used.

The requirement that content (especially data structures and data) should be standards-compliant, so that it can be certified up to the degree of content interoperability, is a relatively new concept. Since 2010 there has even been an international standard on an important aspect of language policy, namely ISO 29383:2010 Terminology policies – Development and implementation (see item 3.5).

1.4 Primary and secondary standards to the LI

The third task was to categorize the identified standards and standardization activities broadly into primary and secondary standards from the point of view of the LI. The following categories were considered as primary standards:

General standards for the ICT infrastructure (including LTT),

Basic and specific standards for the LI.

The LI-related standards were then further subcategorized. Secondary standards (such as general hardware and software standards) are not considered in this investigation. Sometimes it is not easy to differentiate the categories clearly, owing to overlapping issues/aspects. General and basic standardization activities having a bearing on the LI are increasing in various “vertical” fields of standardization, e.g. Health informatics (ISO/TC 215), Optics and photonics (ISO/TC 172), Assistive products for persons with disability (ISO/TC 173), Environmental management (ISO/TC 207) and several others. Those involved in these activities are often insufficiently aware of, or do not sufficiently respect, the rules and regulations at the level of the general standardization framework relevant to the LI.

1.5 Methodology applied in this investigation of existing standards

This document aims at identifying those “standards” of primary relevance to the areas covered by the CELAN project on the basis of existing documents and information available on the Internet. The information was collected keeping in mind the needs of small and medium-sized enterprises (SMEs) that want to globalize either now or in the near future. The identified standards are classified, as far as possible, according to the objectives of the CELAN project and in particular the CELAN Typology of LI products and services developed for this purpose (see: CELAN D2.1 Annex 3). The methodology for the identification of pertinent standards was based on selected documents and studies about standardization as well as on the websites of standardizing bodies and SDOs. The information about standards was compiled from these sources and other background information. The following documents provided the background for identifying relevant standards in the area and for quoting descriptions and features of individual standards, after the tasks described above had been determined:

(1) Monica Monachini e.a. The Standards’ Landscape Towards an Interoperability Framework. The FLaReNet proposal. Building on the CLARIN Standardisation Action Plan (July 2011): This document presents an overview of the current scene towards an Interoperability Framework and acts as a reference point for the current standards that the community fosters and encourages others to adopt/improve. It was drafted in close synchronization with other relevant initiatives such as CLARIN, ELRA, ISO, TEI and META-SHARE. Building on the CLARIN Standardisation Action Plan, the document adapts and extends it to the needs of the broader LT community, beyond the SSH (social sciences and humanities) research areas and including industry.


The main goal of this document is to give practical orientation to various LT players, both commercial and academic; its main message is that a harmonized domain of language resources and technology can be achieved stepwise, but that an effort to adopt standards is necessary to overcome fragmentation. http://www.flarenet.eu/sites/default/files/FLaReNet_Standards_Landscape.pdf

(2) Kara Warburton. Standards and Guidelines for the Language Industry (2009): This document describes some standards that could be important to the language industry, specifically in the areas of content development (authoring), content management, translation and terminology. It focuses on the key relevant standards in these areas and does not claim to be an exhaustive study of all language-related standards. With a few exceptions, standards dating from before 2000 were not included because their currency was considered questionable, particularly for technical standards. In addition, some of the resources described are not formal standards but internationally recognized guidelines and best practices, or resources that are required to implement standards. (See also http://www.crtl.ca/dl119&%3Bdisplay: Kara Warburton (2007). Standards and Guidelines for the Language Industry. Language Technologies Research Centre (March 2006/Revised Feb. 2007).)

(3) Gerhard Budin. Identification of problems in the use of LR standards and of standardization needs (2009): The purpose of this document is to identify problems that are encountered when Language Resource (LR) standards are used, and to identify and describe needs for such LR standards. To reach this goal, it is necessary first to identify existing LR standards, focusing on their scopes and application areas, in order to subsequently assess their actual use by different user groups and the problems these groups have encountered. (FLaReNet Deliverable D 4.1 16 Oct)

(4) Nuria Bel e.a. Standardization Action Plan for CLARIN, 2009: This document describes a proposal for a Standardisation Action Plan (SAP) for the CLARIN initiative, in close synchronization with other relevant initiatives such as FLaReNet, ELRA, ISO and TEI. While FLaReNet has a broader scope, since it also addresses standards that are typically used in industry, CLARIN intends to focus its statements more on the research domain. Owing to this overlap, it is agreed that the FLaReNet and CLARIN documents on standards need to be closely synchronized. The note covers generic standards (XML, Unicode) as well as domain-specific standards, where the language resource technology (LRT) community naturally has much more influence. http://www.clarin.eu/node/2841

It was found that the standards referred to in these background documents are classified from different perspectives, according to the objectives and purposes of the different groups interested in their development, implementation, research or updating. In any case, the above-mentioned materials proved to be a good starting point for the investigation carried out in preparation of the present document. Deliberately, however, not every standards consortium or organization we know of has been included. In order to qualify, consortia included in the list must meet certain criteria, including the following:

The organization must be international in outlook and scope, not simply an instrument of single-nation policy;

The organization must have an active and international membership;

The organization must not be set up specifically as a single-vendor, government, or proprietary technology advocacy group;

The organization's work must be of major importance to the areas of LI-related standardization or its processes.

In this identification process, formal and de facto standards, as well as guidelines and more or less mandatory recommendations, were considered. In addition, best practices that have become a sort of model to be followed were taken into consideration if they can help users pave the way to success.


2 Basic standards related to the ICT infrastructure with particular impact on the LI

This category comprises general ICT “standards” that inevitably underlie all LI sectors. Of course, they are also basic standards, but at a more general ICT level. They are required at different stages of ICT development in general, but also have an impact on LTT. As many standards are multipart documents covering a range of aspects, they could be mentioned in clause 2 as general (basic) standards as well as in clause 3 (possibly with certain parts only).

2.1 Standards concerning character (glyph) coding, etc.

Standards falling under 2.1, 2.2 and 2.3 below are a prerequisite for software design that supports translation, localization and other language services efficiently.

2.1.1 International standards

The standardized coding of characters (glyphs) is fundamental for any exchange of written information, including spoken language transcribed into text or the transcription of text written in different writing systems. It is basic for all derived activities, from computer-assisted writing via translation and localization to publishing. In spite of the high level of development of the respective standards, there are still characters (glyphs) to be coded, not to mention newly emerging ones (e.g. emoticons). Backward compatibility of software for older text and data is another major issue, with many problems still to be solved. The most basic of the basic standards related to character coding is ISO/IEC 10646, together with its industry counterpart, the Unicode Standard, which is kept synchronized with it. The main standards documents at international level are:

ISO/IEC 10646:2011 Information technology – Universal Coded Character Set (UCS),

UTF-8 UCS (Universal Coded Character Set) Transformation Format,

Unicode Standard Annex #29: Text boundaries (5.0.0).

2.1.2 European specific character requirements

Unicode does not (and cannot) cover the totality of aspects of character coding in the world. For needs in Europe, CEN/TC 304 developed the following standards:

CEN/TR 14381:2003 Information technology – Character repertoire and coding transformations – European fallback rules,

CEN/TS 1923:2003 European character repertoires and their coding – 8-bit single-byte coding,

CR 13907:2000 Information Technology – Character Repertoire and Coding Transformations – General model for graphic character transformations,

CR 13928:2000 Information Technology – Guide to the use of character set standards in Europe.
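The practical difference between an 8-bit single-byte coding, as addressed by CEN/TS 1923, and the variable-length UTF-8 form of ISO/IEC 10646, together with the need for fallback rules, can be illustrated with a short Python sketch. It assumes ISO 8859-1 (Latin-1) as the 8-bit target coding and uses a simplified, hypothetical fallback (strip diacritics via Unicode canonical decomposition, otherwise substitute "?"); this is not the rule set defined in CEN/TR 14381.

```python
import unicodedata


def utf8_length(text: str) -> int:
    """Number of bytes the text occupies in UTF-8 (1 to 4 bytes per character)."""
    return len(text.encode("utf-8"))


def simple_fallback(text: str, target: str = "latin-1") -> str:
    """Map text into an 8-bit single-byte coding (Latin-1 assumed here).

    Hypothetical, simplified fallback, NOT the CEN/TR 14381 rules:
    keep characters the target coding can represent; otherwise drop the
    combining diacritics of the canonical decomposition (NFD); if the
    result still cannot be represented, substitute '?'.
    """
    result = []
    for ch in text:
        try:
            ch.encode(target)
            result.append(ch)
            continue
        except UnicodeEncodeError:
            pass
        base = "".join(
            c for c in unicodedata.normalize("NFD", ch) if not unicodedata.combining(c)
        )
        try:
            base.encode(target)
            result.append(base or "?")
        except UnicodeEncodeError:
            result.append("?")
    return "".join(result)


for word in ["Dvořák", "Łódź"]:
    print(word, utf8_length(word), "bytes in UTF-8 ->", simple_fallback(word))
```

The letter "Ł" has no canonical decomposition in Unicode, so a purely algorithmic fallback like this one cannot map it to "L"; this is one reason why explicit, curated fallback tables are needed.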

2.1.3 European Culturally Specific ICT Requirements

In addition to the above-mentioned European Standards, there are also two CWAs related to localization issues (including character set aspects), available in the CWA download area of CEN:

CWA 13873:2000 Information Technology – Multilingual European Subsets in ISO/IEC 10646-1,

CWA 14094 European Culturally Specific ICT Requirements.

2.2 Standards related to the coding of names of countries, languages and scripts

Perhaps as fundamental as the standards related to character coding are those concerning the coding of names of countries, languages and scripts. The codes often have to be used in combination, such as “en-US” standing for US American English. The most important of these standards are:

ISO 3166 (multipart) Codes for the representation of names of countries and their subdivisions,

UN M.49 – Standard country or area codes for statistical use,

ISO 639 (multipart) Codes for the representation of names of languages,

IETF RFC 4646 – Tags for identifying languages (obsoleted in 2009 by IETF RFC 5646),


IETF RFC 4645 – Initial language subtag registry,

IETF RFC 4647 – Matching of language tags,

ISO 15924:2004 Information and documentation – Codes for the representation of names of scripts.
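How codes from ISO 639, ISO 15924 and ISO 3166 combine into a language tag, and how such tags are matched against the languages an application offers, can be sketched as follows. The function is a simplified, unofficial rendering of the "lookup" scheme of IETF RFC 4647 (it ignores the "*" wildcard and the tag-validity checks of RFC 4646); the function name and the sample tags are assumptions chosen for illustration.

```python
def lookup(priority_list, available, default=None):
    """Return the best available tag for a user's ordered language preferences.

    Simplified sketch of RFC 4647 "lookup": each requested tag is truncated
    from the right, subtag by subtag, until an available tag matches.
    Matching is case-insensitive, as language tags are.
    """
    by_lower = {tag.lower(): tag for tag in available}
    for requested in priority_list:
        subtags = requested.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # Never end a candidate on a single-character subtag (e.g. "x").
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# "sr-Latn-RS": Serbian (ISO 639-1 "sr") in Latin script (ISO 15924 "Latn")
# as used in Serbia (ISO 3166-1 "RS").
available = ["en", "en-US", "sr-Latn", "de"]
print(lookup(["sr-Latn-RS", "en"], available))    # sr-Latn
print(lookup(["de-CH-1996"], available))           # de
print(lookup(["fr-CA"], available, default="en"))  # en
```

Truncation from the right reflects the design of the tags themselves: each subtag narrows the one before it, so dropping the most specific subtag yields the nearest more general language.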

Whereas the codes according to ISO 3166 and ISO 15924 are comparatively stable, the 3-letter and 4-letter codes of ISO 639 are constantly being extended. At present the codes comprise the following numbers of names of languages and their variants:

189 2-letter code elements according to ISO 639-1:2002 Codes for the representation of names of languages – Part 1: Alpha-2 code,

485 3-letter code elements according to ISO 639-2:1998 Codes for the representation of names of languages – Part 2: Alpha-3 code,

about 6,500 3-letter code elements according to ISO 639-3:2007 Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages and ISO 639-5:2008 Codes for the representation of names of languages – Part 5: Alpha-3 code for language families and groups,

the total, including ISO 639-6:2009 Codes for the representation of names of languages – Part 6: Alpha-4 code for comprehensive coverage of language variants, may already amount to around 20,000.
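The relation between code length and the number of possible code elements explains why alpha-2 and alpha-3 codes cannot cover all language variants. The short Python calculation below assumes that code elements are drawn from the 26 basic Latin letters and compares the possible combinations with the approximate counts quoted above; it measures theoretical capacity only, not the registration rules of the ISO 639 maintenance agencies.

```python
from string import ascii_lowercase

ALPHABET_SIZE = len(ascii_lowercase)  # 26 basic Latin letters (assumption)


def code_space(length: int) -> int:
    """Number of distinct letter codes of the given length."""
    return ALPHABET_SIZE ** length


# Approximate numbers of code elements as quoted in the list above.
quoted = {
    "ISO 639-1 (alpha-2)": (2, 189),
    "ISO 639-2 (alpha-3)": (3, 485),
    "ISO 639-3 (alpha-3)": (3, 6500),
}

for part, (length, used) in quoted.items():
    capacity = code_space(length)
    print(f"{part}: {used} of {capacity} possible codes ({used / capacity:.1%})")

print(f"Alpha-4 (ISO 639-6) capacity: {code_space(4)} codes")
```

The alpha-3 code space shared by ISO 639-2, 639-3 and 639-5 offers 17,576 combinations, fewer than the estimated total of around 20,000 elements; the alpha-4 space of ISO 639-6, with 456,976 combinations, is consistent with the expectation of several hundred thousand code elements in the future.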

Once the ISO 639 codes also include all dialects and other language variants, the total number of code elements may well reach several hundred thousand. The language codes are constantly updated and maintained by dedicated registration authorities; therefore, the year of publication of the standards documents is misleading.

2.3 Standards related to the application of character coding

2.3.1 Keyboard standards

Although we take our keyboard layout for granted every day, there are many keyboard layout standards in the world, many or most of them industry standards for language communities in less developed countries. Still, there are many gaps, especially for language communities with languages of limited diffusion. At international and European level there are:

ISO/IEC 14755:1997 Information technology – Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input device,

ISO/IEC 9995-1:2009 Information technology – Keyboard layouts for text and office systems – Part 1: General principles governing keyboard layouts,

ISO/IEC 9995-2:2009 Information technology – Keyboard layouts for text and office systems – Part 2: Alphanumeric section,

ISO/IEC 9995-3:2010 Information technology – Keyboard layouts for text and office systems – Part 3: Complementary layouts of the alphanumeric zone of the alphanumeric section,

ISO/IEC 9995-4:2009 Information technology – Keyboard layouts for text and office systems – Part 4: Numeric section,

ISO/IEC 9995-7:2009 Information technology – Keyboard layouts for text and office systems – Part 7: Symbols used to represent functions,

ISO/IEC 9995-8:2009 Information technology – Keyboard layouts for text and office systems – Part 8: Allocation of letters to the keys of a numeric keypad,

ISO/IEC DIS 9995-9:2012 Information technology – Keyboard layouts for text and office systems – Part 9: Multilingual-usage, multiscript keyboard group layouts,

ISO/IEC FDIS 9995-10 Information technology – Keyboard layouts for text and office systems – Part 10: Conventional symbols and methods to represent graphic characters not uniquely recognizable by their glyph on keyboards and in documentation,

CR 14270:2001 European keyboards – Guidelines and overview (CEN Report based on ISO/IEC 9995).

There are many more at national level for input in Chinese, Japanese, Arabic, Indian languages etc.
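ISO/IEC 14755 describes, among other input methods, the entry of a character by typing its hexadecimal code value in ISO/IEC 10646. The Python sketch below shows only the final step of such a method, turning the typed hexadecimal digits into the character; the accepted formats ("U+00E9" or bare "00e9") are assumptions, and the keyboard sequences for starting and ending the input are not modeled.

```python
import string
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # upper limit of the ISO/IEC 10646 / Unicode code space


def hex_input_to_char(typed: str) -> str:
    """Convert typed hexadecimal digits (optionally prefixed 'U+') to a character."""
    digits = typed.strip()
    if digits.upper().startswith("U+"):
        digits = digits[2:]
    if not digits or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"not a hexadecimal code value: {typed!r}")
    code_point = int(digits, 16)
    # Reject values beyond the code space and surrogate code points (not characters).
    if code_point > MAX_CODE_POINT or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"no character at U+{code_point:04X}")
    return chr(code_point)


for typed in ["U+00E9", "20ac", "0141"]:
    ch = hex_input_to_char(typed)
    print(typed, "->", ch, unicodedata.name(ch))
```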


Since, owing to workplace mobility, the same computer may be used at different times by speakers of different mother tongues, the MEEK initiative tries to find the best solution for a pan-European keyboard that can be used by everybody everywhere in Europe without the need:

To plug in a different keyboard (and possibly disable certain functionalities of the software),

To use the virtual keyboard on the display (which slows down performance and may also disable certain functionalities of the software).

MEEK (Functional Multilingual Extensions to European Keyboard Layouts) is a CEN Workshop aimed to assist in the preparation of multilingual extensions to European keyboard layouts. (http://www.csc.fi/english/pages/meek).

2.3.2 Ordering rules

Concerning ordering rules for characters (glyphs), the following International Standards exist:

ISO/IEC 14651:2011 Information technology – International string ordering and comparison – Method for comparing character strings and description of the common template tailorable ordering (ISO/IEC-JTC 1/SC 2),

ISO 12199:2000 Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet (ISO/TC 37/SC 2).

The latter also contains provisions for parallel alphabetical ordering for different purposes (e.g. telephone book vs. dictionary in some languages). For many languages, especially those with a script other than Latin, there are often no standards, only conventions, and often even several competing ones.

2.3.3 Optical character recognition (OCR)

The standards on OCR-A and OCR-B are also important for industry, among other things for replacing the manual input of large text/data volumes by OCR input. The two International Standards to be mentioned here are:

ISO 1073-1:1976 Alphanumeric character sets for optical recognition – Part 1: Character set OCR-A – Shapes and dimensions of the printed image,

ISO 1073-2:1976 Alphanumeric character sets for optical recognition – Part 2: Character set OCR-B – Shapes and dimensions of the printed image.

Complementary to the above, the following European standard provides further specifications:

EN 14603:2004 Information technology – Alphanumeric glyph image set for optical character recognition OCR-B – Shapes and dimensions of the printed image.

OCR has become an indispensable tool in many office devices. It has gained great importance in connection with the large-volume scanning of documents and publications on the one hand, and the recognition of handwriting on the other. It is also indispensable for the large-scale archiving of documents in a form that allows text processing.

2.3.4 Speech-to-written and written-to-speech conversion

In certain research areas phonetic transcriptions are required for further speech processing. Often the International Phonetic Alphabet (IPA) is still used here in spite of its limitations. In the EU-funded project "Speech Assessment Methods" (SAM), SAMPA (or SAM-PA, SAM Phonetic Alphabet) was developed in order to facilitate email data exchange and computational processing of transcriptions in phonetics and speech technology. SAMPA specifies IPA characters in terms of ASCII characters, but since SAMPA is based on phoneme inventories, each SAMPA table is valid only for the language it was created for. In order to make this IPA encoding technique universally applicable, X-SAMPA was created, which provides one single table without language-specific differences. SAMPA and X-SAMPA are still widely used in computational phonetics and in speech technology. X-SAMPA is also still useful as the basis for an input method for true IPA. CXS (Conlang X-SAMPA) is an unofficial extension of X-SAMPA used by members of the Conlang Mailing List with the intention of improving the system for use in language construction.


SAMPA – Speech Assessment Methods Phonetic Alphabet; See: www.phon.ucl.ac.uk/home/sampa

X-SAMPA – Extended Speech Assessment Methods Phonetic Alphabet

CXS – Conlang Extended Speech Assessment Methods Phonetic Alphabet (unofficial version of X-SAMPA)

Although X-SAMPA to Unicode IPA conversions have been developed, this field may need harmonization in the wake of the requirements for content interoperability (covering also content accessibility). Most recently, instant interpreter applets for smartphones interpret (i.e. translate with speech output) simple utterances between a multitude of languages fairly reliably. It could well be that we are on the eve of a breakthrough in the use of spoken natural language at the user interface of all sorts of ICT devices.
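Conversion from X-SAMPA to Unicode IPA is essentially a table lookup with longest-match tokenization, because some X-SAMPA symbols consist of more than one ASCII character (e.g. "r\" for the alveolar approximant). The Python sketch below uses a small, illustrative subset of the X-SAMPA table; symbols outside the subset are passed through unchanged, which is correct only for those ASCII letters that have the same value in IPA.

```python
# Small illustrative subset of the X-SAMPA -> IPA table (not the complete chart).
XSAMPA_TO_IPA = {
    "r\\": "\u0279",  # ɹ alveolar approximant (two ASCII characters)
    "S": "\u0283",    # ʃ
    "Z": "\u0292",    # ʒ
    "T": "\u03b8",    # θ
    "D": "\u00f0",    # ð
    "N": "\u014b",    # ŋ
    "@": "\u0259",    # ə schwa
    "{": "\u00e6",    # æ
    "E": "\u025b",    # ɛ
    "I": "\u026a",    # ɪ
    "U": "\u028a",    # ʊ
    "V": "\u028c",    # ʌ
    "A": "\u0251",    # ɑ
    "O": "\u0254",    # ɔ
    ":": "\u02d0",    # ː length mark
    '"': "\u02c8",    # ˈ primary stress
    "%": "\u02cc",    # ˌ secondary stress
}
LONGEST = max(len(symbol) for symbol in XSAMPA_TO_IPA)


def xsampa_to_ipa(text: str) -> str:
    """Convert X-SAMPA to IPA by greedy longest-match lookup."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(LONGEST, len(text) - i), 0, -1):
            symbol = text[i:i + size]
            if symbol in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[symbol])
                i += size
                break
        else:
            # Not in the subset: keep as-is (correct for e.g. p, t, k, f, s).
            out.append(text[i])
            i += 1
    return "".join(out)


print(xsampa_to_ipa("TIN"))      # English "thing"
print(xsampa_to_ipa('"fA:D@'))   # English "father"
print(xsampa_to_ipa("r\\Ed"))    # English "red"
```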

The use of different character sets and fonts, as well as of encodings not conforming to standards, can lead to many undesired consequences in publications, technical documentation, speech-to-written conversion and various kinds of communication, both within the intranet of an enterprise and with the outside world. Therefore, strict rules concerning this comparatively elementary aspect of language processing should be part of the overall language policy/strategy of an enterprise.

2.4 Standards related to data modeling

Data modeling in software engineering is the process of creating a data model for an information system by applying formal data modeling techniques. This process is used to define and analyze the data requirements needed to support the business processes within the scope of the corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as with potential users of the information system. Without appropriate data models, data modeling methods and techniques, information processing cannot be efficient, not to mention content integration and interoperability.

There are three different types of data models produced for the information system while progressing from requirements to the actual database. The data requirements are initially recorded as a conceptual data model, which is essentially a set of technology-independent specifications about the data and is used to discuss initial requirements with the business stakeholders. The conceptual model is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementation of one conceptual data model may require multiple logical data models. The last step in data modeling is transforming the logical data model into a physical data model that organizes the data into tables and accounts for access, performance and storage details. Data modeling defines not just data elements, but also their structures and the relationships between them. Data modeling techniques and methodologies are used to model data in a standard, consistent, predictable manner in order to manage it as a resource. The use of data modeling standards is strongly recommended for all projects requiring a standard means of defining and analyzing data within an organization, e.g. using data modeling:

To manage data as a resource;

For the integration of information systems;

For designing databases/data warehouses (aka data repositories).

Data modeling may be performed during various types of projects and in multiple phases of projects. Data models are progressive; there is no such thing as the final data model for a business or application. Instead, a data model should be considered a living document that will change in response to a changing business. The data models should ideally be stored in a repository so that they can be retrieved, expanded and edited over time. Whitten (2004) distinguished two types of data modeling:


Strategic data modeling: part of the creation of an information systems strategy, which defines an overall vision and architecture for information systems. Information engineering is a methodology that embraces this approach;

Data modeling during systems analysis: In systems analysis logical data models are created as part of the development of new databases.
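The progression from conceptual via logical to physical data model described above can be illustrated with a small terminology example in Python. The entities (concept, term), the attribute names and the choice of SQLite as target database are assumptions made for illustration only: the conceptual statement "a concept is designated by one or more terms, each in one language" becomes a logical model (two record types and a relationship) and then a physical model (tables, keys and an index).

```python
import sqlite3
from dataclasses import dataclass

# Conceptual model (technology-independent, hypothetical example):
#   a Concept is designated by one or more Terms; each Term is in one language.


# Logical model: record types with attributes and a relationship.
@dataclass
class Concept:
    concept_id: int
    definition: str


@dataclass
class Term:
    term_id: int
    concept_id: int  # relationship: many Terms -> one Concept
    language: str    # language code, e.g. ISO 639-1 "en"
    term_text: str


# Physical model: tables, keys and an index for one concrete DBMS (SQLite).
DDL = """
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    definition TEXT NOT NULL
);
CREATE TABLE term (
    term_id    INTEGER PRIMARY KEY,
    concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
    language   TEXT NOT NULL,
    term_text  TEXT NOT NULL
);
CREATE INDEX term_by_language ON term(language, term_text);
"""


def build_database(concepts, terms):
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the Concept-Term relationship
    conn.executescript(DDL)
    conn.executemany("INSERT INTO concept VALUES (?, ?)",
                     [(c.concept_id, c.definition) for c in concepts])
    conn.executemany("INSERT INTO term VALUES (?, ?, ?, ?)",
                     [(t.term_id, t.concept_id, t.language, t.term_text) for t in terms])
    return conn


conn = build_database(
    [Concept(1, "device with keys for entering characters")],
    [Term(1, 1, "en", "keyboard"), Term(2, 1, "de", "Tastatur"), Term(3, 1, "fr", "clavier")],
)
rows = conn.execute(
    "SELECT language, term_text FROM term WHERE concept_id = ? ORDER BY language", (1,)
).fetchall()
print(rows)
```

Only the physical layer depends on SQLite; the same logical model could be mapped onto another database or an XML exchange format without changing the conceptual statement.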

Data modeling is also used as a technique for detailing business requirements for specific databases. It is sometimes called database modeling because a data model is eventually implemented in a database.

2.4.1 Generic standards concerning data modeling

There are many standards concerning data modeling, the most generic being:

ISO/IEC 11179 (multipart) Metadata Registries (MDR) (ISO/IEC-JTC 1/SC 32/WG 2 MetaData),

ISO/IEC TR 20943 (multipart) Information technology – Procedures for achieving metadata registry content consistency [incl. Part 1: Data elements] (ISO/IEC-JTC 1/SC 32/WG 2),

Unified Modeling Language (UML) (this OMG standard is a standardized general-purpose modeling language in the field of object-oriented software engineering).
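The core idea behind a metadata registry in the sense of ISO/IEC 11179, namely that a registered data element combines a data element concept (an object class plus a property) with a value domain and carries an identifier and a version, can be sketched in Python. The class names, field names and sample identifier below are hypothetical and greatly simplify the metamodel defined in the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    name: str
    permissible_values: frozenset = frozenset()  # empty: non-enumerated domain

    def accepts(self, value: str) -> bool:
        return not self.permissible_values or value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    identifier: str
    version: str
    object_class: str   # the thing described, e.g. "Document"
    property_name: str  # the characteristic described, e.g. "Language"
    value_domain: ValueDomain


class MetadataRegistry:
    """Hypothetical minimal registry keyed by (identifier, version)."""

    def __init__(self):
        self._elements = {}

    def register(self, element: DataElement) -> None:
        key = (element.identifier, element.version)
        if key in self._elements:
            raise ValueError(f"already registered: {key}")
        self._elements[key] = element

    def is_valid(self, identifier: str, version: str, value: str) -> bool:
        """Check a value against the value domain of a registered data element."""
        return self._elements[(identifier, version)].value_domain.accepts(value)


languages = ValueDomain("Sample language codes", frozenset({"de", "en", "fr"}))
registry = MetadataRegistry()
element = DataElement("DE-0001", "1", "Document", "Language", languages)
registry.register(element)
print(registry.is_valid("DE-0001", "1", "en"))
print(registry.is_valid("DE-0001", "1", "xx"))
```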

In addition, there are data modeling standards in various eApplication areas (i.e. vertical standards), such as:

ISO 9735 (multipart) Electronic data interchange for administration, commerce and transport (EDIFACT) (ISO/TC 154 Processes, data elements and documents in commerce, industry and administration),

ISO 10303 (multipart) Industrial automation systems and integration – Product data representation and exchange (STEP – incl. also EXPRESS) (ISO/TC 184/SC 4 Industrial data),

ISO/TS 15000 (series) Electronic business eXtensible Markup Language (ebXML) (ISO/TC 154 Processes, data elements and documents in commerce, industry and administration),

ISO 13584 (multipart) Industrial automation systems and integration – Parts library (PLIB) (ISO/TC 184/SC 4 Industrial data),

ISO/IEC 19788:2012 (multipart) Information technology – Learning, education and training – Metadata for learning resources (ISO/IEC-JTC 1/SC 36 Information technology for learning, education and training),

IEC 61360 (series) Standard data element types with associated classification scheme for electric components (IEC/SC 3D/WG2 Classification of components and definition of technical data element types),

Universal Business Language (UBL) (this OASIS standard is a library of standard electronic XML business documents such as purchase orders and invoices).

In the educational area, IEEE 1484.12.1:2002 (multipart) Standard for Learning Object Metadata (LOM) describes the attributes that learning objects may have, which can also be applied to computer-assisted language learning (CALL). The closely related SCORM (Sharable Content Object Reference Model), a collection of standards and specifications for web-based eLearning, is about creating units (i.e. sharable content objects – SCO) of online training material that can be shared and reused across systems and in different contexts. SCORM also illustrates that, increasingly, several standards have to be applied together for a given purpose.

Data standards related to data modeling can be further subdivided into:

Standards about data elements and metadata as well as metadata registries,

Standards about semantic structuring.

2.4.2 Standards about data elements and metadata as well as metadata registries

Besides the above-mentioned ISO/IEC 11179 (multipart) Information technology – Metadata registries (MDR) (of ISO/IEC-JTC 1/SC 32/WG 2), the following documents can be mentioned here:


ISO 15836:2009 Information and documentation – The Dublin Core metadata element set

OECD (2007) Data and metadata reporting and presentation handbook

OLAC Metadata (OLAC uses an XML format to interchange language-resource metadata within the framework of the Open Archives Initiative [OAI]).
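To show what a record built on the fifteen Dublin Core elements (ISO 15836) looks like in practice, here is a minimal sketch that uses only the Python standard library. The `metadata` wrapper element and all sample values are hypothetical. The sketch does not reproduce the OLAC-specific container or attributes, so it is not a valid OLAC record as it stands.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen core elements of the Dublin Core metadata element set
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields):
    """Build a <metadata> element holding Dublin Core elements.

    `fields` maps element names to a string or a list of strings,
    because every Dublin Core element is optional and repeatable."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = dublin_core_record({
    "title": "Sample bilingual glossary",  # hypothetical resource
    "creator": "Example Project",
    "language": ["en", "de"],
    "type": "Dataset",
})
print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown element names is a design choice of the sketch. It keeps records within the core set, while the repeatable `language` element shows how a multilingual resource is described.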

From a different perspective, metadata and data categories can be seen as a particular kind of structured content (see clause 3.2.2). The OLAC (Open Language Archives Community) Metadata Usage Guidelines explain how to describe language resources using the OLAC metadata standard. The standard itself documents the formal syntax of valid metadata records, but does not explain or exemplify the use of the individual metadata elements. In detail, the guidelines also describe the usage:

Of the attributes that may be used on metadata elements;

Of the fifteen core elements of the Dublin Core metadata set in the OLAC context;

Of other elements that may be used.

Concluding with remarks about the granularity of resources, the guidelines address the question of what size of thing (from a single file up to an entire corpus) is treated as an item in an OLAC repository.

2.4.3 Standards about semantic structuring

In connection with the efforts to find solutions for processing semantics, the following standards can be mentioned:

OWL – Web Ontology Language (a specification endorsed by the W3C),

SKOS – Simple Knowledge Organization System (a family of formal languages designed for the representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary, developed within the W3C framework),

ISO/IEC 24707:2007 Information technology – Common Logic (CL): a framework for a family of logic-based languages.

In ISO/TC 37 Terminology and other language and content resources, the following new document is on its way towards becoming an International Standard:

ISO/CD 17347 Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed Ontology Language (DOL).

DOL specifies a meta ontology language intended to make existing ontology methods and systems interoperable. It is hoped that ISO 17347 will make strongly heterogeneous ontologies (especially the lightweight ontologies of structured content) interoperable, including the full range of reusability as well as re-purposability of content (such as adapting content for eLearning or for persons with disabilities – PwD).

2.5 Protocols, formats and schemas

There is an abundance of markup languages, protocols, formats and schemas (often with different versions or releases) that are of fundamental importance in the LI, such as:

XML – Extensible Markup Language,

XML Schema,

XML Encryption,

XML Signature,

HTML – Hypertext Markup Language,

XHTML – Extensible Hypertext Markup Language,

RDF – Resource Description Framework,

SOAP – Simple Object Access Protocol,

HTTP – Hypertext Transfer Protocol.

As mentioned in clause 1.1, SemanticStandards.org provides lists of semantic standards and other resources available on the Internet (see: http://www.semanticstandards.org/). A list of XML markup languages can be found at http://en.wikipedia.org/wiki/List_of_XML_markup_languages, and XML schemas are listed at http://en.wikipedia.org/wiki/List_of_XML_schemas.


Industry-specific XML-based standards are listed at http://en.wikipedia.org/wiki/Category:Industry-specific_XML-based_standards.

In various fields of the LI, especially in language services, this proliferation of formats, schemas and markup languages is constantly causing problems in carrying out the services. Mastering this complexity therefore requires a high level of ICT literacy on the part of LSP.

2.6 Standards related to the quality of data and information

On the one hand, the quality of content (data and information) depends on the application of practically all pertinent standards on quality management of language services in general. On the other hand, there are standards referring directly to data quality that have emerged from eApplication areas such as eBusiness:

ISO 8000 (series) Data quality (ISO/TC 184/SC 4 Industrial data),

ISO 22745 (multipart) Industrial automation systems and integration – Open technical dictionaries and their application to master data (eOTD) (ISO/TC 184/SC 4 Industrial data).

The International Association for Information and Data Quality (IAIDQ) was founded in 2004 as a professional association for those interested in improving business effectiveness through quality data and information. It is composed of individual and corporate members and has national member associations in certain countries. Among other activities, IAIDQ developed the Information Quality Certified Professional (IQCP) certification for individual experts.

Data and information quality in connection with content integration and interoperability has become an important topic in content and knowledge management.

2.7 Information and Documentation (I&D) standards

Some standards developed in ISO/TC 46 Information and documentation for I&D and library applications are also relevant to the management of language and other content resources (LCR), such as those for the referencing of sources, for records management and for the conversion of written scripts:

ISO 690:2010 Information and documentation – Guidelines for bibliographic references and citations to information resources (ISO 12615:2004 Bibliographic references and source identifiers for terminology work is based on ISO 690 and ISO 2709),

ISO 2709:2008 Information and documentation – Format for information exchange,

ISO 5127:2001 Information and documentation – Vocabulary,

ISO 8459:2009 Information and documentation – Bibliographic data element directory for use in data exchange and enquiry,

ISO 7220:1996 Information and documentation – Presentation of catalogues of standards (together with ISO 7220:1996/Cor 1:2001),

ISO 832:1994 Information and documentation – Bibliographic description and references – Rules for the abbreviation of bibliographic terms,

ISO 999:1996 Information and documentation – Guidelines for the content, organization and presentation of indexes,

ISO 2384:1977 Documentation – Presentation of translations,

ISO 15836:2009 Information and documentation – The Dublin Core metadata element set (together with ISO 15836:2009/Cor 1:2009),

ISO 23081-1:2006 Information and documentation – Records management processes – Metadata for records – Part 1: Principles,

ISO 23081-2:2009 Information and documentation – Managing metadata for records – Part 2: Conceptual and implementation issues,

ISO/TR 23081-3:2011 Information and documentation – Managing metadata for records – Part 3: Self-assessment method,

ISO 16175-1:2010 Information and documentation – Principles and functional requirements for records in electronic office environments – Part 1: Overview and statement of principles,


ISO 16175-2:2011 Information and documentation – Principles and functional requirements for records in electronic office environments – Part 2: Guidelines and functional requirements for digital records management systems,

ISO 16175-3:2010 Information and documentation – Principles and functional requirements for records in electronic office environments – Part 3: Guidelines and functional requirements for records in business systems.

In addition, the following ISO/TC 46 standard may also be of relevance:

ISO 25964-1:2011 Information and documentation – Thesauri and interoperability with other vocabularies – Part 1: Thesauri for information retrieval.
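ISO 25964-1 builds thesauri on a small set of relationships: broader/narrower terms (BT/NT), related terms (RT), and the USE/UF link from non-preferred to preferred terms. The sketch below shows how these relationships can be held as a data structure. It assumes an in-memory Python model. The class name, the method names and the sample terms are hypothetical: the standard specifies concepts and relationships, not a programming interface.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal in-memory thesaurus: hierarchical (BT/NT) relations are stored
    reciprocally, associative (RT) relations symmetrically, and non-preferred
    terms point to their preferred term (USE/UF)."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> broader terms (BT)
        self.narrower = defaultdict(set)  # term -> narrower terms (NT)
        self.related = defaultdict(set)   # term -> related terms (RT)
        self.use = {}                     # non-preferred term -> preferred term

    def add_hierarchy(self, broader, narrower):
        # Stating "narrower is NT of broader" also records "broader is BT of narrower".
        self.narrower[broader].add(narrower)
        self.broader[narrower].add(broader)

    def add_related(self, a, b):
        # RT is symmetric, so record it in both directions.
        self.related[a].add(b)
        self.related[b].add(a)

    def add_non_preferred(self, non_preferred, preferred):
        self.use[non_preferred] = preferred

    def preferred(self, term):
        return self.use.get(term, term)

    def all_narrower(self, term):
        """All terms below `term` in the hierarchy (e.g. for query expansion)."""
        result, stack = set(), [self.preferred(term)]
        while stack:
            for child in self.narrower[stack.pop()]:
                if child not in result:
                    result.add(child)
                    stack.append(child)
        return result


t = Thesaurus()
t.add_hierarchy("language resources", "corpora")
t.add_hierarchy("corpora", "speech corpora")
t.add_related("corpora", "concordances")
t.add_non_preferred("text collections", "corpora")
print(sorted(t.all_narrower("text collections")))  # ['speech corpora']
```

Because each relationship is recorded in both directions when it is added, a query for a non-preferred term is first redirected to its preferred term and can then be expanded downwards through the hierarchy.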

The ISO/TC 46 standards about the conversion of written scripts are dealt with in part 3.

2.8 Standards related to mobility and accessibility

From a broad perspective of “interoperability” and “localization” (L10N), mobility and accessibility entail many identical or similar requirements for software and content development. For instance, many requirements for mobility are prerequisites in ambient assisted living (AAL) for persons with disabilities (PwD) or ageing persons (some of whom have disabilities to a smaller or larger degree, sometimes even multiple disabilities):

Mobile Web Best Practices 1.0. Basic Guidelines. W3C Recommendation 29 July 2008 (specifies Best Practices for delivering Web content to mobile devices),

Extended Guidelines for Mobile Web Best Practices 1.0 (supplements the Mobile Web Best Practices 1.0 by providing additional evaluations of conformance to Best Practice statements and by providing additional interpretations of these statements).

Many requirements for accessibility are shared or could/should be shared with those for eLearning:

ISO/IEC TS 29140-2:2011 Information technology for learning, education and training – Nomadicity and mobile technologies – Part 2: Learner information model for mobile learning.

Therefore, some of them are dealt with in clause 3.1, while the accessibility-related standards, reflecting their increasing societal and political importance, are dealt with in clause 3.4.

2.9 Certification based on standards

The general quality management and certification standards apply. Only recently have standards emerged that focus specifically on content aspects and language service quality on the one hand, and on expert or personnel competences and skills on the other. The strategic dimension of this topic has been integrated in detail into D2.1. Certification of competences and skills in the LI is dealt with in clause 3.4.


3 Specific standards pertaining to language technologies, resources, services and LI related competences and skills

Besides the standards that are foundational to ICT infrastructures at large (see part 2), those specific to certain fields or aspects of the language industry are gathered in this part 3. They are probably the most pertinent for the CELAN project’s target user groups. It should be noted that some of these documents go beyond the strict definition of a standard, as they are the outcome of past or ongoing “pre-normative research and development” activities (such as R&D projects) aimed at fulfilling the needs emerging in LT and the LI at large. These standards may be classified into four categories as follows:

Language technologies and language technology tools (LTT),

Language and other content resources (LCR),

Language services and language service providers (LSP),

Language industry related competences and skills.

Companies – and particularly SMEs that are not familiar with the state of the art of LI developments as well as existing standards – can refer to the standards mentioned here in contracts concerning the

Development of LTT,

Development of LCR,

Carrying out of language services by LSP.

This may avoid conflicts arising from misunderstandings or from a lack of familiarity with the many details of such a contract and the work to be done; it will also improve the quality of the results of the contract from the outset.

The ISO/TC 46 Information and documentation standards about the conversion of written scripts may apply to any of the above-mentioned categories:

ISO 9:1995 Information and documentation – Transliteration of Cyrillic characters into Latin characters – Slavic and non-Slavic languages,

ISO 233:1984 Documentation – Transliteration of Arabic characters into Latin characters,

ISO 233-2:1993 Information and documentation – Transliteration of Arabic characters into Latin characters – Part 2: Arabic language – Simplified transliteration,

ISO 233-3:1999 Information and documentation – Transliteration of Arabic characters into Latin characters – Part 3: Persian language – Simplified transliteration,

ISO 259:1984 Documentation – Transliteration of Hebrew characters into Latin characters,

ISO 259-2:1994 Documentation – Transliteration of Hebrew characters into Latin characters – Part 2: Simplified transliteration,

ISO 843:1997 Information and documentation – Conversion of Greek characters into Latin characters,

ISO 3602:1989 Documentation – Romanization of Japanese (kana script),

ISO 7098:1991 Information and documentation – Romanization of Chinese,

ISO 9984:1996 Information and documentation – Transliteration of Georgian characters into Latin characters,

ISO 9985:1996 Information and documentation – Transliteration of Armenian characters into Latin characters,

ISO 11940:1998 Information and documentation – Transliteration of Thai,

ISO 11940-2:2007 Information and documentation – Transliteration of Thai characters into Latin characters – Part 2: Simplified transcription of Thai language,

ISO/TR 11941:1996 Information and documentation – Transliteration of Korean script into Latin characters,

ISO 15919:2001 Information and documentation – Transliteration of Devanagari and related Indic scripts into Latin characters.

In translation, localization and other language services, the above-mentioned ISO standards are used either as such or in a modified form, as legally prescribed or conventionally used at national level (e.g. for some Asian languages).
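ISO 9:1995 is a transliteration in the strict sense: each Cyrillic letter maps to exactly one Latin letter, with diacritics where needed, so the conversion is reversible. A minimal sketch of applying such a table is shown below. It assumes a plain character-for-character lookup in Python. Only the lowercase letters of the Russian alphabet are included, and uppercase forms are derived from them. ISO 9 itself covers further letters used in other Slavic and non-Slavic languages, and national variants may differ.

```python
# Partial ISO 9:1995 table: lowercase letters of the Russian alphabet only
ISO9_RU = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "ш": "š", "щ": "ŝ", "ъ": "ʺ",
    "ы": "y", "ь": "ʹ", "э": "è", "ю": "û", "я": "â",
}

# Because the mapping is one letter to one letter, uppercase Cyrillic letters
# map to the uppercase form of the same Latin letter.
_TABLE = str.maketrans({
    **ISO9_RU,
    **{cyr.upper(): lat.upper() for cyr, lat in ISO9_RU.items()},
})


def transliterate_iso9(text):
    """Transliterate Russian Cyrillic text; other characters pass through unchanged."""
    return text.translate(_TABLE)


print(transliterate_iso9("Москва"))     # Moskva
print(transliterate_iso9("Щука и ёж"))  # Ŝuka i ëž
```

A table-driven design keeps the script conversion rules separate from the code, so that a modified national variant could be supported by swapping the table.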


3.1 Language technologies (LT) and language technology tools (LTT)

Standards in the LTT category mainly concern LI activities and services that are supported by, or geared towards the development of, tools and systems:

Guidelines for building multilingual Web sites (EURESCOM Report. P923 Multilingual Web Sites: Best practice, guidelines and architectures. D1 Guidelines for building multilingual Web sites. Sept. 2000);

Darwin Information Typing Architecture (DITA) (OASIS standard which in combination with the DITA Open Toolkit publishing system features single source publishing, inheritance, topic-based authoring and content reuse);

DocBook (free and open-source software [FOSS], see http://docbook.sourceforge.net; DocBook is a semantic markup language that specifies the meaning of the elements in a document rather than how they are to be presented to the end user, thus separating the content of a document from its visual representation);

OpenDocument 1.0 (OASIS standard Open Document Format for Office Applications [ODF], also known as OpenDocument [OD]: an XML-based file format for spreadsheets, charts, presentations and word processing documents);

OAXAL (Open Architecture for XML Authoring and Localization Reference Model) is a method to exploit technical documentation assets by extending the usefulness of core XML-related standards in a comprehensive standards-based architecture and to allow system builders to create an integrated environment for document creation and localization, with reference in particular to the OSCAR standards of LISA;

Authoring Techniques for XHTML & HTML Internationalization: Specifying the language of content 1.0 (W3C document providing practical techniques related to character sets, encodings and other character-specific matters that HTML content authors can use to ensure that their HTML is easily adaptable for an international audience);

Global Information Management Metrics eXchange (GMX) (a collection of current and proposed standards, primarily targeted at the needs of the translation industry, proposed by LISA);

XML Localization Interchange File Format 1.2 (XLIFF, an OASIS standard: an XML-based format created to standardize the way localizable data are passed between tools during a localization process);

UTS #35 LDML – Locale Data Markup Language (an XML format [vocabulary] for the exchange of structured locale data, used in the Unicode Common Locale Data Repository [CLDR]);

Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification. W3C Recommendation 07 June 2011 (CSS 2.1 builds on CSS2 which builds on CSS1; also important for accessibility);

Mobile Web Best Practices 1.0. Basic Guidelines. W3C Recommendation 29 July 2008 (specifies Best Practices for delivering Web content to mobile devices).
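To illustrate how XLIFF 1.2 carries localizable data between tools, here is a minimal sketch that writes a file with one source/target pair per `<trans-unit>`, using the Python standard library. The sample strings and the file name `ui-strings.txt` are hypothetical. Optional XLIFF features such as states, notes and inline markup are left out.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize as the default namespace


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def build_xliff(segments, source_lang, target_lang, original):
    root = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, q("file"), {
        "original": original,          # name of the source document
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, q("body"))
    for unit_id, (source, target) in enumerate(segments, start=1):
        tu = ET.SubElement(body, q("trans-unit"), {"id": str(unit_id)})
        ET.SubElement(tu, q("source")).text = source
        if target is not None:  # untranslated units carry no <target>
            ET.SubElement(tu, q("target")).text = target
    return root


doc = build_xliff([("Save file", "Datei speichern"), ("Cancel", None)],
                  "en", "de", "ui-strings.txt")
print(ET.tostring(doc, encoding="unicode"))
```

Leaving out `<target>` for untranslated units lets the receiving tool see at a glance which segments still need translation.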

In addition to the general Web Content Accessibility Guidelines (WCAG) 2.0, which have become the International Standard ISO/IEC 40500:2012 Information technology – W3C Web Content Accessibility Guidelines (WCAG) 2.0 (identical to the original WCAG 2.0 standard from the W3C Web Accessibility Initiative – WAI), W3C Working Groups are working on specifications including:

Authoring Techniques for XHTML & HTML Internationalization: Characters and Encodings 1.0,

Authoring HTML: Handling Right-to-left Scripts (provides HTML/XHTML authors with best practice for developing internationalized HTML supported by CSS, focusing specifically on advice about character sets, encodings, and other character-specific matters),

Extended Guidelines for Mobile Web Best Practices 1.0 (supplements the Mobile Web Best Practices 1.0 by providing additional evaluations of conformance to Best Practice statements and by providing additional interpretations of these statements).

The OASIS Technical Committee “Translation Web Services” proposes to define a standard that provides an encapsulation of all the information required to support the following value proposition through the framework of the Web Services initiative: "Any publisher of content to be translated should be able to automatically connect to and use the services of any translation vendor, over the Internet, without any previous direct communication between the two".

Within the framework of LISA (Localization Industry Standards Association), a special interest group (SIG) called OSCAR (Open Standards for Container/Content Allowing Re-use) developed a number of industry standards pertinent to the LI. When LISA ceased operations in March 2011, it designated the European Telecommunications Standards Institute’s Industry Specification Group “Localization Industry Standards” (ETSI ISG LIS) as the successor organization for its standards portfolio, on condition that ETSI ISG LIS would continue to make them freely available (they are currently available at http://www.gala-global.org/lisa-oscar-standards). Several years earlier, under an agreement between LISA and ISO and within the framework of a liaison between LISA on the one hand and ISO/TC 37 and its SC 3 and SC 4 on the other, ISO/TC 37 had agreed to incorporate these industry standards into its program of work. In June 2012, ISO/TC 37 and ETSI ISG LIS agreed that the translation/localization community would benefit from the adoption of industry standards as International Standards, and undertook measures to renew the former LISA arrangements and to include the following standards in the ISO/TC 37 program of work:

• TMX (Translation Memory eXchange): XML-based format for the standardized exchange of translation memory data

• GMX-V (Global information management Metrics eXchange – Volume): a standard way to count words and characters within a document and a standard XML format to share this data between applications

• XML Text Memory (xml:tm): vendor-neutral open XML standard for embedding text memory directly within an XML document using XML namespace syntax
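As an illustration of how TMX exchanges translation memory data, the sketch below builds a minimal TMX 1.4 document. Each translation unit (`<tu>`) holds one variant (`<tuv>`) per language, and the language is marked with `xml:lang`. It uses the Python standard library. The header values `example-tool`, `0.1` and `none` and the sample sentences are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the reserved attribute xml:lang
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, srclang, tgtlang):
    root = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(root, "header", {
        "creationtool": "example-tool",   # hypothetical tool name
        "creationtoolversion": "0.1",     # hypothetical version
        "segtype": "sentence",
        "o-tmf": "none",                  # hypothetical original TM format
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return root


tmx = build_tmx([("Open the file.", "Öffnen Sie die Datei."),
                 ("Close the window.", "Schließen Sie das Fenster.")],
                "en", "de")
print(ET.tostring(tmx, encoding="unicode"))
```

Keeping every language variant inside the same `<tu>` allows a receiving tool to reuse the memory in either direction, which is the point of a vendor-neutral exchange format.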

The following working standards had already been included earlier in its program of work:

• TBX (Term Base eXchange): XML-based format for the standardized exchange of data from terminology databases (already an ISO/TC 37 standard: ISO 30042:2008)

• SRX (Segmentation Rules eXchange): a common description for the segmentation of text for translation and other language-related processes (already a new work item proposal in TC 37: ISO/NP 24621; this standardization effort takes into account Unicode Standard Annex #29 (UAX #29) Unicode Text Segmentation, which is being updated by the Unicode Localization Interoperability (ULI) technical committee)
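SRX describes segmentation as an ordered list of rules. Each rule pairs a regular expression for the text before a candidate break with one for the text after it, and is marked as either a break rule or an exception (no-break) rule. The sketch below, loosely modeled on that idea, applies such rules in Python: at each position the first matching rule decides. It is not an SRX parser. The `Rule` class, the two sample rules and the whitespace trimming are hypothetical simplifications, and checking every position this way is quadratic, which is acceptable only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    before: str     # regex that must match the text ending at the candidate position
    after: str      # regex that must match the text starting at the candidate position
    is_break: bool  # True = break rule, False = exception (no-break) rule


def segment(text, rules):
    """Split text into segments; at each position the first matching rule decides."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            if (re.search(f"(?:{rule.before})$", text[:pos])
                    and re.match(rule.after, text[pos:])):
                if rule.is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # a matching rule ends the search at this position
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]


# Hypothetical English rules: exceptions must precede the general break rule.
RULES = [
    Rule(r"\b(?:Dr|e\.g|i\.e)\.", r"\s", False),  # abbreviation: no break
    Rule(r"[.?!]", r"\s+[A-Z]", True),            # end punctuation + capital: break
]

print(segment("Dr. Smith arrived. He left, i.e. quickly. Done!", RULES))
# ['Dr. Smith arrived.', 'He left, i.e. quickly.', 'Done!']
```

Rule order is what makes exceptions work. Because the abbreviation rule is listed first, it overrides the general break rule after "Dr." and "i.e.", so rule sets can be exchanged as data rather than hard-coded.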

The above-mentioned standards will be used as starting material to be revised and further maintained by ISO/TC 37 in accordance with the ISO/IEC Directives, Part 1, with ETSI being in A-liaison with TC 37 and its SC 3 and SC 4. ETSI ISG LIS has requested that, when these standards are published as International Standards, it be able to make freely available a version that is technically equivalent to the version published by ISO.

The above-mentioned LISA standards, also called OSCAR standards, were developed because the general ICT data modeling and format standards do not sufficiently take LI requirements and needs into account. These standards are a big help for players within the LI, but beyond it they compete with a large variety of other standards. As mentioned in clause 2.5, this proliferation of formats, schemas and markup languages is constantly causing problems in carrying out language services.

L10N-related standards can be classified from different perspectives, according to the objectives and purposes stated by the different groups interested in their development, implementation, research or updating. In Appendix 4: Localization (L10N) related standardization, the following charts give a general overview of localization-related standards organizations and localization-related standards:

Chart 1: Localization-related standards organizations

Chart 2: Localization-related standards


3.2 Language and other content resources (LCR)

According to ELRA (European Language Resources Association), language resources are:

Text corpora,

Speech corpora,

(Lexicographical data and) Terminologies.

In a recent article [Galinski and Reineke 2011], an attempt was made to quantify the volumes of potential lexicographical and terminological entries in an ever-increasing number of domains or subjects. The lexis of general purpose language (GPL) in the highly developed languages may comprise up to 500,000 lexemes (including a considerable share of terminology). However, the total number of scientific-technical concepts across all domains or subjects may well be 100 to 150 million. The volumes of existing proper names and other kinds of appellations, some of which may be subject to translation into other languages or conversion into other scripts, are practically uncountable; there may be several hundred million.

These volumes, more often than not with different language versions, pose a huge challenge to software and content developers as well as to LSP. Industry customers, especially SMEs, are most probably totally unaware of the quantitative and qualitative phenomena related to LCR.

Given the fact that content resources

Are of many different kinds,

Are not confined to language resources,

May comprise or even consist of non-linguistic content (logos, formulas, icons, audiovisual content, etc.),

it is more appropriate to first subdivide them broadly into LCR of unstructured content and LCR of structured content.

3.2.1 Unstructured LCR

The most well-known kinds of unstructured content are:

Running text (in all kinds of literature, scientific-technical texts, in the print media, etc.),

Speech corpora,

Music of all sorts,

Film, video, audiovisual and multimedia content, etc.

Some of these, such as text corpora, are tagged or marked up for further processing. ELRA offers many corpora on a semi-commercial basis. Items of structured content are the building elements of unstructured content. Therefore, the focus of this investigation is on standards and methodology concerning structured content; from an LI standardization perspective, there is not much to say about unstructured content.

Concerning texts, the Text Encoding Initiative (TEI) Consortium, which emerged out of a series of large-scale international projects, is a non-profit membership organization composed of academic institutions, research projects, and individual scholars from around the world. It collectively develops and maintains a standard for the representation of texts in digital form, namely the TEI Guidelines for Electronic Text Encoding and Interchange. Since 1994, the TEI Guidelines have been widely used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation. They focus (though not exclusively) on the encoding of documents in the humanities and social sciences, and in particular on the representation of primary source materials for research and analysis. The TEI Guidelines define and document a markup language for representing the structural, renditional, and conceptual features of texts by specifying encoding methods for machine-readable texts. They are expressed as a modular, extensible XML schema, accompanied by detailed documentation, and are published under an open-source license (see: http://www.tei-c.org/Guidelines/).

TEI had a big impact on the standardization of various aspects of structured content, among others through the establishment of ISO/TC 37/SC 4 Language resource management. The TEI Guidelines certainly helped to make LI approaches such as translation memory (TM), parallel texts, bitexts, automatic translation (AT) and various forms of semi-automatic translation more efficient, and to develop many services in the LI. A standard or guideline on cross-language word count, which large LSP are looking for, is still an unresolved issue.
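Why cross-language word count is hard to standardize can be seen from the simplest counting rule: counting whitespace-separated tokens works for English but treats a whole Chinese sentence, which has no spaces between words, as a single "word". The sketch below contrasts that naive count with one hypothetical alternative in Python. The alternative's rule (one unit per Han or kana character, one word per other run of letters or digits) is only an assumption for illustration; it is neither GMX-V nor any other agreed method.

```python
import re

# Hypothetical CJK-aware rule: each Han ideograph or kana character counts as
# one unit; any other run of letters/digits counts as one word.
CJK_AWARE = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff]|[^\W\u4e00-\u9fff\u3040-\u30ff_]+")


def whitespace_word_count(text):
    """Naive count: number of whitespace-separated tokens."""
    return len(text.split())


def cjk_aware_count(text):
    """Alternative count that does not depend on spaces between words."""
    return len(CJK_AWARE.findall(text))


for sample in ("This is a test.", "这是一个测试。"):
    print(whitespace_word_count(sample), cjk_aware_count(sample))
# 4 4
# 1 6
```

The two rules agree for English but differ by a factor of six for the Chinese sentence. Since volume-based pricing depends on such counts, this explains why large LSP look for an agreed standard.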

Another field requiring standardization efforts is simplified natural language (in technical documentation) or, more broadly, controlled natural language (CNL). In public administration, similar efforts go under the name of plain language; the Plain Language Movement in the US and other countries seeks to eliminate unnecessarily complex language from academia, government, law, and business. In this connection, many governments and other large organizations (incl. enterprises) have developed specific style guides. As there must be common principles and methods for the use of simplified natural language, controlled natural language, plain language and the like, both independent of language and language-specific, ISO/TC 37/SC 4 embarked a couple of years ago on a new work item proposal, ISO/AWI Language resource management – Simplified natural languages. On the basis of such a standard, authoring tools could be developed or adapted.

3.2.2 Structured LCR

Items of structured content (at the level of lexical semantics) may comprise linguistic and non-linguistic representations of concepts, which can be designative (such as designations in terminology, comprising terms, symbols and appellations), descriptive (such as various kinds of definitions or non-verbal representations), or hybrid. Standards or guidelines in this category may refer to the methods of language resource management or may contain standardized (or otherwise regulated) content items (such as standardized vocabularies). So far, non-verbal designations and representations of concepts as well as appellations (i.e. proper names representing individual concepts) have been underrepresented in terminology theory and methodology. Yet they are very important in standardization itself, scientific-technical writing, technical documentation, translation and documentation, not to mention eLearning. Moreover, linguistic content may be combined with or embedded in non-linguistic content, and vice versa.

In the light of the above, it is understandable that organizations dealing with content and particularly language service providers (LSP) increasingly need to handle (or manage) non-linguistic types of content, too. Therefore, they also need language technology tools (LTT) which can cope with non-linguistic content. Their customers – especially SMEs – should be made aware of potential problems arising from these facts.

The situation outlined here calls for extended, and as far as possible standardized, methods for content integration and content interoperability (covering the requirements not only for the re-use but also for the re-purposing of content), and consequently also for LTT able to cope with the new requirements of integration and interoperability. International Standards for content interoperability are a prerequisite for:

Avoiding a huge duplication of efforts,

Developing methods (incl. certification) and devices to assure content quality,

Introducing content interoperability into educational and training schemes,

Enabling many eApplications to re-use and re-purpose existing structured content extensively.

In this connection, standards concerning data quality, data administration, content management and workflows are increasingly becoming imperative. There are several technical committees at international level dealing with various – more or less generic – aspects of content interoperability, for instance:

ISO/TC 37 Terminology and other language and content resources,

ISO/IEC-JTC 1/SC 32 Data management and interchange (especially its WG 2 MetaData),

ISO/IEC-JTC 1/SC 36 Information technology for learning, education and training,

ISO/TC 184 Automation systems and integration (especially its SC 4 Industrial data),

ISO/TC 46 Information and documentation.


The standards developed by these TCs do not (yet) take into account the specific requirements of eLearning and eAccessibility for persons with disabilities (PwD). From a quality management perspective, ISO 8000 (series) Data quality is a starting point for developing standardized methods and principles with respect to data and information quality as well as content integration and interoperability. For the purpose of this document, the LCR-related standards are differentiated into

Standards concerning LCR methodology,

LCR containing standardized content.

3.2.2.1 Standards concerning LCR methodology

Basic methodology standards:

ISO 704:2009 Terminology work – Principles and methods (again under revision),

ISO 860:2007 Terminology work – Harmonization of concepts and terms,

ISO 1951:2007 Presentation/representation of entries in dictionaries – Requirements, recommendations and information,

ISO 10241-1:2011 Terminological entries in standards – Part 1: General requirements and examples of presentation,

ISO/FDIS 10241-2:2012 Terminological entries in standards – Part 2: Adoption of standardized terminological entries,

ISO 12199:2000 Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet,

ISO 12615:2004 Bibliographic references and source identifiers for terminology work (based on ISO 690 and ISO 2709),

ISO 15188:2001 Project management guidelines for terminology standardization,

ISO 23185:2009 Assessment and benchmarking of terminological resources – General concepts, principles and requirements,

ISO 25964-1:2011 Information and documentation – Thesauri and interoperability with other vocabularies – Part 1: Thesauri for information retrieval,

ISO 21829 Terminology for language resources (under development),

ISO Data Category Registry (DCR, containing the ISO/TC 37 data categories, ISOcat).

ISO/TC 37/SC 4 standards for syntactic and semantic annotation:

ISO 24617 (multipart) Language resource management – Semantic annotation framework (SemAF),

ISO 24610 (multipart) Language resource management – Feature structures,

ISO/DIS 24611 Language resource management – Morpho-syntactic annotation framework,

ISO 24613:2008 Language resource management – Lexical Markup Framework (LMF),

ISO/FDIS 24612 Linguistic annotation framework (LAF),

ISO 24615:2010 Language resource management – Syntactic annotation framework (SynAF),

TEI Guidelines (the Guidelines of the Text Encoding Initiative, which collectively define an XML format, are the defining output of the TEI community of practice; the format differs from other well-known open formats for text, such as HTML and OpenDocument, in that it is primarily semantic rather than presentational).

3.2.2.2 Standards containing standardized content

There is a whole range of LCR containing different kinds of standardized content. Moreover, the volume of standardized structured content resources is constantly growing, both in the number of databases and in the number of content items they contain. With a few exceptions, collections of LCR comprising standardized content are not easily accessible, and, where they are accessible, not always in a user-friendly form. On the other hand, such collections can, and sometimes even must, be used as the default in many activities, for instance in certain eBusiness activities. Therefore, such LCR are often referred to in contracts between LSP and industry clients.

3.2.2.2.1 Items of standardized structured content at a meta-level

There are items of standardized structured content at a meta-level which

are useful or even necessary to understand standardizing activities, including the standardization of structured content;

are necessary to manage all kinds of structured content, such as metadata (also called data categories in ISO/TC 37, data dictionaries in ISO/TC 184/SC 4 etc.);

provide attributes for other items of structured content, such as coding systems (e.g. country codes, language codes, script codes etc.).

(1) Collections of standardized structured content useful or even necessary to understand standardizing activities including the standardization of structured content, such as:

ISO/IEC Guide 2:2005 (which, formally speaking, is also a collection of standardized terminology; accessible through the ISO Online Browsing Platform http://www.iso.org/obp/ui/),

Clause 3 “Terms and definitions” of ISO/IEC Guide 21-1:2005,

Clause 3 “Terms and definitions” of ISO/IEC Guide 21-2:2005,

Clause 3 “Terms and definitions” of ISO 10241-1:2011 Terminological entries in standards – Part 1: General requirements and examples of presentation,

Clause 2 “Terms and definitions” of ISO 10241-2:2012 Terminological entries in standards – Part 2: Adoption of standardized terminological entries,

ISO/IEC 2382 (multipart) Information processing systems – Vocabulary,

ISO 1087-1:2000 Terminology work – Vocabulary – Part 1: Theory and application,

ISO 21829 Terminology for language resources (under development),

ISO 5127:2001 Information and documentation – Vocabulary,

ISO 17724:2003 Graphical symbols – Vocabulary,

Terms of quantities and units defined or explained in ISO 80000 (multipart) Quantities and units (some parts are IEC publications).

This kind of standardized or quasi-standardized structured content can also be found in many LCR-related standardization activities, such as in

IUPAC, IUPAB, IUPAP and other nomenclature rules for naming terms or species

Classification rules for classifications in eBusiness and other eApplications

OECD Glossary of Statistical Terms (2002) of the Organisation for Economic Co-operation and Development (OECD)

Glossary of terms for the standardization of geographical names 2002 of the United Nations Conferences on the Standardization of Geographical Names (UNCSGN) and its UN Group of Experts on Geographical Names (UNGEGN) (ST/ESA/Stat/SER.M/85)

TERMPOST, the UPU’s (Universal Postal Union) official terminology database containing a selection of postal terms and expressions drawn from the UPU Acts and publications as well as everyday vocabulary used within the postal sector

(2) Collections of standardized structured content called metadata (defined by ISO/IEC 11179-1 as data that defines and describes other data), data categories (or data element type defined by ISO 1087-2 as a result of the specification of a given data field), data elements (defined by ISO/IEC 11179-1 as unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes), data element concepts (defined by ISO/IEC 11179-1 as concept that can be represented in the form of a data element, described independently of any particular representation), data dictionaries (or IRD, information resource dictionary, defined by ISO/IEC 2382-19:1999 as database that contains metadata) and the like necessary to manage all kinds of structured content, such as:

ISO/IEC 11179 (multipart) Information technology – Metadata registries (MDR) (ISO/IEC-JTC 1/SC 32/WG 2 MetaData),

ISO Data Category Registry (ISO/DCR) [containing the data categories (ISOcat) of ISO/TC 37],

ISO 9735 (multipart) Electronic data interchange for administration, commerce and transport (EDIFACT) (ISO/TC 154 Processes, data elements and documents in commerce, industry and administration),

ISO 13584 (multipart) Industrial automation systems and integration – Parts library (PLIB) (ISO/TC 184/SC 4 Industrial data),

ISO 15836:2009 Information and documentation – The Dublin Core metadata element set,

ISO/IEC 19788:2012 (multipart) Information technology – Learning, education and training – Metadata for learning resources (ISO/IEC-JTC 1/SC 36 Learning technologies),

ISO/IEC/WD TR 20007: Information technology – Cultural and linguistic interoperability – Definitions and relationship between symbols, icons, animated icons, pictograms, characters and glyphs.
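
To make the ISO/IEC 11179-1 definition of a data element quoted in (2) more concrete (a unit of data whose definition, identification, representation and permissible values are specified by a set of attributes), the following minimal Python sketch models one such element. The class name, field names and the identifier "example:termStatus" are hypothetical and greatly simplified; they do not reproduce the normative ISO/IEC 11179 metamodel. The permissible values "preferred", "admitted" and "deprecated" are used only as an illustrative value set.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataElement:
    """Hypothetical, simplified model of a data element in the sense of
    ISO/IEC 11179-1. Field names are illustrative, not the normative metamodel."""

    identifier: str          # identification
    name: str
    definition: str          # definition
    datatype: type           # representation, simplified to a Python type
    permissible_values: Optional[FrozenSet[str]] = None  # None means unrestricted

    def is_valid(self, value: object) -> bool:
        """Check a value against the representation and the permissible values."""
        if not isinstance(value, self.datatype):
            return False
        if self.permissible_values is None:
            return True
        return value in self.permissible_values


# Illustrative data element; the identifier is made up for this sketch.
term_status = DataElement(
    identifier="example:termStatus",
    name="term status",
    definition="Acceptability rating assigned to a term in a terminological entry.",
    datatype=str,
    permissible_values=frozenset({"preferred", "admitted", "deprecated"}),
)

print(term_status.is_valid("admitted"))   # True
print(term_status.is_valid("obsolete"))   # False
print(term_status.is_valid(42))           # False
```

Keeping the permissible values as an attribute of the element itself, rather than hard-coding them in validation logic, reflects the idea that metadata "defines and describes other data": the same checking routine can serve any element registered in this way.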

The repositories of this category, also called asset description metadata, have become so numerous, not to mention important, that an Asset Description Metadata Schema (ADMS) has been drafted within the framework of the European Commission’s ISA programme. ADMS is meant to be a common way to describe semantic interoperability assets, including metadata schemas, controlled vocabularies and code lists. ADMS is intended to become a key element of the upcoming federation of semantic asset repositories in Europe, among other things to support citizens’ right to free access to public information. Better documentation of semantic assets is considered to help improve the interoperability of eGovernment initiatives across Europe, and possibly beyond.

(3) Collections of standardized structured content that attribute values to other items of structured content, such as:

Certain coding systems (e.g. country codes acc. to ISO 3166, language codes acc. to ISO 639, script codes acc. to ISO 15924:2004 etc.);

Many other coding systems some of which may indirectly govern LI activities and services: See: http://www.iso.org/iso/standards_development/maintenance_agencies.htm;

ISO 80000 (multipart) Quantities and units (some parts are IEC publications) – they make numerical values, formulas etc. meaningful.
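
As an illustration of how coding systems attribute values to other items of structured content, the Python sketch below attaches a language, a script and a country code to a hypothetical terminology record and checks each code. It assumes the usual letter-case conventions (ISO 639 codes in lower case with two or three letters, ISO 15924 codes as four letters with an initial capital such as "Latn", ISO 3166-1 alpha-2 codes as two upper-case letters). The KNOWN sets are tiny hypothetical excerpts; real membership checks require the full code lists published by the respective maintenance agencies.

```python
import re

# Assumed syntactic shapes of the codes (letter-case conventions):
# ISO 639: 2 letters (639-1) or 3 letters (639-2/-3), lower case
# ISO 15924: 4 letters, first letter upper case (e.g. "Latn")
# ISO 3166-1 alpha-2: 2 letters, upper case
PATTERNS = {
    "language": re.compile(r"[a-z]{2,3}"),
    "script": re.compile(r"[A-Z][a-z]{3}"),
    "country": re.compile(r"[A-Z]{2}"),
}

# Tiny hypothetical excerpts of the code lists, for illustration only.
KNOWN = {
    "language": {"de", "en", "fr", "sr"},
    "script": {"Latn", "Cyrl"},
    "country": {"AT", "DE", "FR", "RS"},
}


def check_code(kind: str, code: str) -> str:
    """Return 'known', 'well-formed' (right shape, not in the excerpt) or 'malformed'."""
    if not PATTERNS[kind].fullmatch(code):
        return "malformed"
    return "known" if code in KNOWN[kind] else "well-formed"


# A hypothetical record whose attributes take their values from coding systems.
entry = {"term": "Normung", "language": "de", "script": "Latn", "country": "AT"}
for kind in ("language", "script", "country"):
    print(kind, entry[kind], check_code(kind, entry[kind]))
```

The split between "well-formed" and "known" mirrors the point made above: the shape of a code can be checked locally, but whether a value is actually permitted depends on the code list maintained by the responsible agency.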

3.2.2.2.2 LCR of standardized structured content per se

Large-scale LCR of standardized structured content per se can be found, among others, in:

Electropedia: The World's Online Electrotechnical Vocabulary (or “IEV Online” version of the International Electrotechnical Vocabulary) http://www.electropedia.org/

ISO Online Browsing Platform http://www.iso.org/obp/ui/ (which replaced the previous ISO Concept DataBase, ISO/CDB) containing also non-linguistic structured content such as graphical symbols;

DIN-Term (DIN, terminology database of standardized terminology);

DINsml (DIN, database of standardized quantities and units).

3.2.2.2.3 LCR of non-standardized structured content

Non-standardized mono- and multilingual repositories of all kinds of structured content abound on the Internet. The methodology standards mentioned so far, including the metadata standards, would apply to them too, but repository owners do not respect them, or respect them only to a certain extent. As a result, the average level of data quality is lower than would be desirable. Possible reasons are the lack of successful business models for such repositories and certain copyright issues concerning the “ownership” of data elaborated through cooperative processes on web platforms created for that purpose. Most repositories also suffer from a lack of sustainability. An international standard or standardized guidelines addressing the numerous cooperative, economic and legal aspects to be considered when developing a web-based cooperative platform for developing structured content could be useful.

3.3 Quality of language services and language service providers (LSP)

When Directive 98/37/EC of the European Parliament and of the Council of 22 June 1998 on the approximation of the laws of the Member States relating to machinery was published in the context of the 1990s discussions on quality and safety management in industrial production and services, few people recognized that it would have a huge impact on technical documentation and the related localization and translation services. The Directive referred to technical documentation and user manuals as part of the product, which makes an enterprise potentially liable for faults in the documentation (in the original language and in all localized/translated versions). The revised Machinery Directive 2006/42/EC does not introduce any radical changes compared with the old Machinery Directive 98/37/EC, but aims at consolidating the achievements of the Machinery Directive in terms of free circulation and safety of machinery while improving its application. It was published on 9 June 2006, and the Member States had until 29 June 2008 to adopt and publish the national laws and regulations transposing the provisions of the new Directive into national law. The provisions of the new Directive became applicable on 29 December 2009. Under 1.1.1 “Definitions” in Annex I, the Directive states:

“… (h) ‘intended use’ means the use of machinery in accordance with the information provided in the instructions for use; (i) ‘reasonably foreseeable misuse’ means the use of machinery in a way not intended in the instructions for use, but which may result from readily predictable human behaviour.”

It continues under clause 1.7 “Information”: “1.7.1. Information and warnings on the machinery Information and warnings on the machinery should preferably be provided in the form of readily understandable symbols or pictograms. Any written or verbal information and warnings must be expressed in an official Community language or languages, which may be determined in accordance with the Treaty by the Member State in which the machinery is placed on the market and/or put into service and may be accompanied, on request, by versions in any other official Community language or languages understood by the operators. 1.7.1.1. Information and information devices The information needed to control machinery must be provided in a form that is unambiguous and easily understood. It must not be excessive to the extent of overloading the operator. Visual display units or any other interactive means of communication between the operator and the machine must be easily understood and easy to use. 1.7.1.2. Warning devices Where the health and safety of persons may be endangered by a fault in the operation of unsupervised machinery, the machinery must be equipped in such a way as to give an appropriate acoustic or light signal as a warning. Where machinery is equipped with warning devices these must be unambiguous and easily perceived. The operator must have facilities to check the operation of such warning devices at all times. The requirements of the specific Community Directives concerning colours and safety signals must be complied with.”

These legal provisions – similar to legislation in the US and other countries – triggered the first standards on technical documentation, localization and translation under a quality management perspective. Today standards concerning the quality of LI products and services are not only accepted, but increasingly also demanded by customers and LSP. A few standards already exist, some are under development, and some more are needed:

EN 15038:2006 Translation services – Service quality;

SAE J2450:2005 Translation Quality Metric (SAE Standard, applicable to translations of automotive service information into any target language. The metric may be applied regardless of the source language or the method of translation and can be expanded to accommodate style and other requirements of particular new media);

LISA Quality Assurance Model (LISA, Localization Industry Standards Association; the LISA QA Model and the SAE J2450 standard are designed to help manage the quality assurance process for all components of a localization project);

ASTM F2089-01:2007 Standard Guide for Language Interpretation Services;

ISO 2603:1998 Booths for simultaneous interpretation – General characteristics and equipment.
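
Metrics such as SAE J2450 work by counting translation errors per category, weighting each error by category and severity, and normalizing the total by text length. The Python sketch below shows that general scheme. The category names and (serious, minor) weights are assumptions based on values commonly cited for J2450, and normalization by the source word count is likewise assumed; both must be checked against the standard itself before any real use. The function name `weighted_error_score` is hypothetical.

```python
# Assumed (serious, minor) weights per error category, as commonly cited for
# SAE J2450; verify against the standard before any real use.
WEIGHTS = {
    "wrong term": (5, 2),
    "syntactic error": (4, 2),
    "omission": (4, 2),
    "word structure or agreement error": (4, 2),
    "misspelling": (3, 1),
    "punctuation error": (2, 1),
    "miscellaneous error": (3, 1),
}


def weighted_error_score(errors, source_word_count):
    """Sum of weighted errors divided by the source word count (lower is better).

    `errors` is an iterable of (category, is_serious) pairs.
    """
    if source_word_count <= 0:
        raise ValueError("source_word_count must be positive")
    total = 0
    for category, is_serious in errors:
        serious, minor = WEIGHTS[category]
        total += serious if is_serious else minor
    return total / source_word_count


# Hypothetical review findings for a 250-word source text.
found = [("wrong term", True), ("misspelling", False), ("omission", False)]
print(weighted_error_score(found, 250))  # (5 + 1 + 2) / 250 = 0.032
```

Normalizing by length makes scores comparable across texts of different sizes, which is what allows such a metric to be used in contracts between LSP and clients as an agreed quality threshold.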

With a focus on software localization, the following standards can be mentioned here:

ISO/IEC 11581-10:2010 Information technology – User interface icons – Framework and general guidance,

ISO/IEC TR 19764:2005 Guidelines, methodology, and reference criteria for cultural and linguistic adaptability in information technology products,

ISO/IEC TR 24785:2009 Taxonomy of cultural and linguistic adaptability user requirements,

ISO/IEC/WD TR 20007: Information technology – Cultural and linguistic interoperability – Definitions and relationship between symbols, icons, animated icons, pictograms, characters and glyphs,

ISO/IEC/WD 30112: Information technology – Specification methods for cultural conventions.

The following are the emerging standards of ISO/TC 37/SC 5 that further specify the quality requirements of translation (and localization) and interpretation services:

ISO/CD 13611 Interpreting – Guidelines for community interpreting,

ISO/WD 14080 Assessment of translations (under discussion),

ISO/WD 17100 Translation services – Requirements for translation services,

ISO/DTS 11669 Translation services – Guidance for translation projects.

The latter two are intended to replace EN 15038:2006. On the basis of EN 15038:2006, certification bodies offer formal certification, for example the Language Industry Certification System (LICS®).

3.4 LI-related competences and skills

“eCertification” (or ICT certification) refers to certification activities, mostly certification based on provider-specific training, that started as early as the 1980s. Today the respective training and certification schemes show deficits with respect to aspects related to the LI and to assistive technologies. On the other hand, LI-related training and certification increasingly include eCertification aspects.

3.4.1 ICT-focused eCertification

eCertification can be considered as the set of processes by which an individual gains a credential in a particular ICT skill or, more generally, in a range of skills. CWA 16052:2009 “ICT Certification in Europe” refers to the following definitions of eCertification:

(1) “Certification often means the awarding of a certificate, or other testimonial, that formally recognizes and records success in the assessment of Knowledge, Skills and/or Competencies, as the final step in the completion of a Qualification. However, it is also used, in particular in relation to ICT Practitioner occupations, to mean the Qualification as a whole. It is important to be aware of these two (“narrow” and “broad”) meanings of Certification.” [Dixon and Beier in CWA 15515]

(2) “Certification is the process of formally validating knowledge, know-how and/or skills and competencies acquired by an individual, following a standard assessment procedure. Certificates or diplomas are issued by accredited awarding bodies.” [Tissot: 2004, cit. in CWA 16052]

(3) “In general, ICT professional certifications are seen as a credential – the result of an objective assessment procedure run by an approved third party, in which an individual meets the performance specifications delineated in job profiles which are recognised by industry stakeholders.” [CEPIS: 2007; Cedefop: 2006]

The main standards documents governing eCertification in general at international and European level are first of all:

ISO/IEC 17024:2003 Conformity Assessment – General requirements for bodies operating certification of persons,

ISO/IEC 24773:2008 Certification of software engineering professionals – Comparison framework,

CWA 16052:2009 ICT Certification in Europe,

CWA 15515:2006 European ICT Skills Meta-Framework – State-of-the-art review, clarification of the realities, and recommendations for next steps.

Unfortunately, these standards seem to have relatively little influence on the proliferation of qualification and certification schemes on the market that in most cases are not standards-based.

While it can be recognized that certification provides value in both the labour and product segments of the ICT market, the HARMONISE report [CEPIS 2007] describes over 600 often overlapping qualifications from over 60 providers as a "certification jungle", causing confusion to prospective users. The rapid growth in these industry qualifications has been driven by the market over recent years; indeed, this market barely existed 15 years ago. They usually relate to a more specific set of skills, including for specific products, and are generally more practical in their approach than traditional academic qualifications. As these market certifications contrast and co-exist with the historic university-based education system, leading to the phrase "parallel universe", there remains resistance, even hostility, to these certifications, e.g. in academic quarters in some countries. They are seen as developing skills rather than education, and product ability rather than underlying theory, and as little more than marketing aids to the commercial interests of the vendors. On the other hand, their global application contrasts with the national or even self-accreditation of most university degrees. [CWA 16052:2009]

The above applies both to the certification of software experts with respect to the needs of the LI and to needs stemming from eAccessibility/eInclusion requirements. Participants at the ICCHP 2010 Conference confirmed that existing training and formal studies, even if certified under given certification/attestation systems, are not sufficient with respect to the skills and qualifications necessary for becoming familiar with the issues involved in global content interoperability, and particularly in eAccessibility and eInclusion. Therefore, the “Recommendation on software and content development principles 2010” (see Appendix 2) was formulated in a special workshop at ICCHP 2010 and thereafter endorsed by several technical committees in standardization as well as, in 2012, by the MoU/MG.
3.4.2 LI-related skills/competences and eCertification

In recent years the certification of the skills/competences of LI experts or personnel has been gaining importance, including the “eCertification” (or ICT certification) aspects necessary in the LI in general and for LSP in particular. For some years, a number of key players have joined forces to improve the situation on the certification market for LI experts:

(1) The Language Industry Certification System (LICS®), a joint venture between AS+Certification, a subsidiary of the Austrian Standards Institute, and the International Network for Terminology (TermNet), organizes independent, third-party certification services for the language industry. LICS® also collaborates with the European Certification and Qualification Association (ECQA).

(2) The European Certification and Qualification Association (ECQA) is a not-for-profit association joining institutions and several thousand professionals from all over Europe and beyond.

The ECQA provides a worldwide unified certification scheme for numerous professions. The same exam pool, exam rules and the same electronic exam system are used for certification exams in any participating country;

The ECQA joins experts from the market and supports the definition and development of the knowledge (skill cards) required for professions. Experts, joined in "Job Role Committees", are frequently initiating new professions and updating the existing professions as demand on the market requires;

The ECQA defines and verifies quality criteria for training organizations and trainers to assure the same level of training all over the world. The certification procedure offers modularity of certification, and therefore modularity of training is assured all over the world. Only verified and approved organizations and individuals may become ECQA certified service providers;

The ECQA centrally promotes all certified professionals. Databases of certified professionals are publicly available to help organizations on the market in seeking organizations and individuals for cooperation.

With respect to the LI, the following certified ECQA job profiles can be referred to:

ECQA Certified Terminology Manager – Basic; see: http://www.ecqa.org/index.php?id=52

ECQA Certified Information and Communication Engineer; see: http://www.ecqa.org/index.php?id=41

ECQA Certified Diversity Manager (in development)

ECQA Certified E-Learning Manager; see: http://www.ecqa.org/index.php?id=49

ECQA Certified Integrated Design Engineer; see: http://www.ecqa.org/index.php?id=47

From the CELAN perspective, some of the above should be extended to further aspects of the LI.

(3) In addition, the Globalization and Localization Association (GALA) and equivalent associations at national or regional level (such as tekom, the German Professional Association for Technical Communication and Information Development, for the German-speaking communities) are developing increasingly sophisticated training and certification schemes for various skills and competences in the fields of technical documentation/technical communication, which, where more than one language is required, inevitably overlaps with localization and translation.

Given that many of the aspects dealt with in the preceding chapters and in the following chapter are not sufficiently taught in the education of computer scientists and software engineers at higher-education institutions (HEI), extra-HEI training schemes as well as properly accredited certification of the skills/competences thus acquired are clearly needed.

3.5 Standards and guidelines concerning language policies/strategies

Based on UNESCO’s Guidelines for terminology policies – Formulating and implementing terminology policy in language communities (2005), prepared by Infoterm on commission from UNESCO, an international standard was developed in the framework of ISO/TC 37, namely ISO 29383:2010 Terminology policies – Development and implementation.

ISO 29383:2010 Terminology policies – Development and implementation covers an important aspect of language policy, but additional authoritative documents of this kind will be needed to cover language policies and strategies comprehensively. They may be geared towards language communities as a whole or towards individual organizations.

4 Latest developments

There are a few far-reaching new tendencies in standardization and certification, such as:

The general trend towards more standards for content and services;

The increasingly high priority given to standards developed for, or having an impact on, persons with disabilities (PwD).

Many technical committees have started to standardize not only the terminology within their scope, but also additional content and/or the data models and other methodological requirements for this content, as well as services to be rendered on the basis of this content. In addition, new technical committees are emerging that focus on service areas such as maintenance, transport, travel, cleaning and healthcare. Here the measurement of service quality is one of the major issues, and it is closely connected to communication and documentation. The terminology of travel agencies and tour operators (CEN/TC 329 Tourism services) is important for all sorts of contracts, as are guidelines for the preparation of contracts in the maintenance field. Services, too, are increasingly rendered cross-border, and the EU has decided to create a single market for services based on the free movement of services and the freedom of establishment for service providers, while maintaining a high quality of services (EU Directive 2006/123/EC). Standardization has an essential role to play in facilitating and strengthening the single market for services. Therefore, language aspects cannot be overlooked in these standardization activities, which can trigger a large volume of language services (to be performed using language technology in order to keep them manageable and affordable).

The input received in a large-scale public enquiry carried out by CEN in September 2003 into the need for service standards across the EU showed that both further development of current work in service standardization and the development of new areas are needed and feasible. As a result, more than 85 standards documents relating to services have already been developed in different CEN and CENELEC committees.

Assistive technologies (for assisting people with special needs, viz. persons with disabilities – PwD) are rising on the political agenda as a growing societal, ethical and economic necessity. The LI and assistive technologies, and in particular alternative and augmentative communication (AAC), have much in common, or at least could profit greatly from each other. To some extent they share the same ailments:

Not being acknowledged within mainstream computer science and ICT;

Lacking job opportunities although the potential is there – especially in combination with the respective industries;

Being totally undervalued in the education of computer scientists and programmers.

However, so far the LI and AAC communities have very little contact and hardly know of each other, which is also reflected in the situation in standardization. The demand for intuitively understandable user interfaces and product features is increasing. Global markets composed of population segments from different countries, regions, cultures and races make it necessary to consider the varying capabilities and different habits of users in the design of products. The application of principles derived from “Accessible Design”, “Universal Design”, “Design for All” and “Design for Society” is becoming increasingly mainstream. The usability of products, services and environments as perceived and experienced by end users, and not technical know-how alone, is a key driver in product development.

FIG. 1: Elderly (aged 60 and over) as a percentage of the population in 2010 and 2040. Source: United Nations (2011), World Population Prospects, 2010 Revision, quoted from: Richard Jackson, How demography is reshaping the economic and social landscape in the 21st century, in: The Geneva Association (2012), The Geneva Reports, No. 6, 2012, p. 19

The “United Nations Convention on the Rights of Persons with Disabilities” (UNCRPD), adopted in December 2006, is the basic international framework addressing the rights of persons with disabilities. By June 2012 it had been signed by 153 countries and ratified by 115. The UNCRPD addresses the rights of persons with disabilities in general and contains a section on accessibility (Article 9). The following report provides many useful facts and figures:

World Health Organization and World Bank (Eds.). (2011). World report on disability 2011. (ePUB).

Reportedly, in Japan approximately 27 m people are aged over 65, which equates to 21.5% of the total population; this share is expected to increase to 33.7% in 2035. In Japan, 3.52 m people have physical disabilities, which corresponds to 3.0% of the population. Other reports give data for the EU population, in which 33 m people aged over 50 report disabilities; this figure is projected to increase to 46 m in 2050. The US reported, in 2007, a total population of 257.2 m people, of whom 54.0 m (20%) are over 50 years of age. Those over 60 years of age number 55.0 m (21%). (See: IEC/TR 62678 Audio, video and multimedia systems and equipment activities and considerations related to accessibility and usability)

In fact, the percentage of persons with disabilities (PwD) in our aging societies might well double over the next few decades, especially in some countries with large populations, with a growing share of multiple impairments. Assistive technologies and AAC will play an indispensable role in this connection, as recognized by national governments and the EU Commission. This concerns first of all communication among people, between people and their devices, and among the devices themselves.

In view of these figures, there are European and national programmes, and even legislation (based on the UNCRPD), such as:

M376 (2005). Standardization Mandate to CEN, CENELEC and ETSI in support of European Accessibility requirements for public procurement of products and services in the ICT domain (http://www.mandate376.eu/);

M420 (2007). Standardization mandate to CEN, CENELEC and ETSI in support of European Accessibility requirements for public procurement in the built environment;

M473 (2010). Standardization mandate to CEN, CENELEC and ETSI to include “Design for All” in relevant standardization initiatives;

MeAC (Ed.) (2007). MeAC – Measuring Progress of eAccessibility in Europe. Assessment of the Status of eAccessibility in Europe. Bonn;

Centers for Disease Control and Prevention (CDC). (February 14, 2003). Morbidity and Mortality Weekly Report (MMWR). 52(06). (eJournal) Retrieved April 2011 from http://www.docstoc.com/docs/11959514/MMWR-February-14-2003-(PDF);

ICTSB (Ed.). (2000). Design for All. ICTSB Project Team Final Report. Retrieved April 2011 from http://www.ictsb.org/Activities/Design_for_All/Documents/ICTSB%20Main%20Report%20.pdf

As government expenditure in Western countries amounts to between 25 and 40% of GDP, and many government procurement rules in some cases require products to be consistent with accessibility requirements, government action creates a significant market for accessible products. Among the above-mentioned EU mandates, which will find their way into legislation and standardization sooner rather than later, M376 European Accessibility Requirements for Public Procurement of Products and Services in the ICT Domain defines accessibility as one of the requirements for public procurement of IT products, as does the US Section 508 on IT accessibility (see: http://www.section508.gov/). In view of this situation, the Recommendation on Software and Content Development 2010 (Appendix 2) was adopted at the 12th International Conference on Computers Helping People with Special Needs (ICCHP 2010) and endorsed by a number of technical committees as well as by the MoU/MG. The core text of the Recommendation reads:

“This recommendation addresses decision makers in public as well as private frameworks, software developers, the content industry and developers of pertinent standards. Its purpose is to make aware that multilinguality, multimodality, eInclusion and eAccessibility need to be considered from the outset in software and content development, in order to avoid the need for additional or remedial engineering or redesign at the time of adaptation, which tend to be very costly and often prove to be impossible. …

Software should be developed and data models for content prepared in compliance with the above-mentioned requirements to facilitate the adaptation to different languages and cultures (localization) or new applications (re-purposing), the personalization for different individual preferences or needs, including those of persons with disabilities. These requirements should also be referenced in all pertinent standards.”

Increasingly, it is necessary to combine AAC with language and other content resources or to embed AAC elements in text. “Total Conversation” (http://hub.eaccessplus.eu/wiki/Total_Conversation) is a comparatively new concept that shows the clear need to combine linguistic and non-linguistic items of structured content. User interface design geared towards persons with disabilities (PwD) has also proved useful for other purposes, especially in eLearning. The following document lists many standards at international, European and national level:

ISO/IEC TR 29138-2:2009 Information technology – Accessibility considerations for people with disabilities – Part 2: Standards inventory

In addition, accessibility aspects are scattered over a large number of standards of all kinds; to mention just a few:

ISO/IEC Guide 71:2001 Guidelines for standards developers to address the needs of older persons and persons with disabilities;


ISO 9241-171:2008 Ergonomics of human-system interaction – Part 171: Guidance on software accessibility (provides ergonomics guidance and specifications for the design of accessible software for use at work, in the home, in education and in public places. It covers issues associated with designing accessible software for people with the widest range of physical, sensory and cognitive abilities, including those who are temporarily disabled, and the elderly. It addresses software considerations for accessibility that complement general design for usability as addressed by ISO 9241-110, ISO 9241-11 to ISO 9241-17, ISO 14915 and ISO 13407);

ISO/IEC TR 29138-1:2009 Information technology – Accessibility considerations for people with disabilities – Part 1: User needs summary (identifies a collection of user needs of people with disabilities for standards developers to take into consideration when developing or revising their standards. These user needs are also useful for developers of information technology products and services and for accessibility advocates to consider.);

IEC/TR 62678:2010 Audio, video and multimedia systems and equipment activities and considerations related to accessibility and usability;

CWA 15554:2006 Specifications for a Web Accessibility Conformity Assessment Scheme and a Web Accessibility Quality Mark. Brussels: CEN;

ETSI TR 102612:2009 Human Factors (HF) – European accessibility requirements for public procurement of products and services in the ICT domain. Sophia Antipolis: ETSI.

What is still lacking, however, are a few generic or fundamental standards for content interoperability and inter-human communication that combine the requirements of the LI and of assistive technologies, in particular AAC.

At the same time, references to assistive technologies and LI standards are missing, especially in crucial software development standards concerned with software quality and software project management. ISO/IEC 90003:2004 Software engineering – Guidelines for the application of ISO 9001:2000 to computer software (in combination with ISO/IEC/TR 90005:2008 Systems engineering – Guidelines for the application of ISO 9001 to system life cycle processes) identifies the issues which should be addressed and is independent of the technology, life cycle models, development processes, sequence of activities and organizational structure used by an organization. To assist in the application of ISO 9001:2000, it provides additional guidance and frequent references to the ISO/IEC JTC 1/SC 7 software engineering standards, in particular:

ISO/IEC 12207:2008 Systems and software engineering – Software life cycle processes,

ISO/IEC TR 9126 (multipart) Software engineering – Product quality,

ISO/IEC 14598 (multipart) Information technology – Software product evaluation,

ISO/IEC 15939:2007 Systems and software engineering – Measurement process,

ISO/IEC TR 15504 (multipart) Information technology – Process assessment, and possibly also:

ISO/IEC 25000 (series) Software Engineering – Software product Quality Requirements and Evaluation (SQuaRE) (which explains the transition from the old ISO/IEC 9126 and 14598 to SQuaRE, and also shows how to use ISO/IEC 9126 and 14598 in their previous form),

ISO/IEC 15288:2008 Systems and software engineering – System life cycle processes,

ISO/IEC TR 19759:2005 Software Engineering – Guide to the Software Engineering Body of Knowledge (SWEBOK),

ISO/IEC 26514:2008 Systems and software engineering – Requirements for designers and developers of user documentation,

ISO/IEC 14143 (multipart) Information technology – Software measurement – Functional size measurement.

In standards on content interoperability and content quality, such as ISO 8000 (multipart) Data quality, the requirements of assistive technologies are not mentioned at all, while those of the LI are barely mentioned.


Once the standards mentioned above have been developed or extended, many other standards related to content integration and interoperability would complement each other better. Compliance with the Recommendation on Software and Content Development 2010 would considerably facilitate maintenance, minimize additional programming and positively influence the software life cycle in general. In any case, the Web Content Accessibility Guidelines (WCAG) and other accessibility standards should be considered from the outset in software and content development. In order to avoid the need for additional or remedial engineering or redesign at the time of adaptation, the points addressed by this recommendation should be represented especially in standards on the management and project organization of content development and on systems and software engineering, from the perspectives of quality, sustainability and life cycle.

The European Disability Strategy 2010-2020 (see clause 2.1.1) names the strengthening of accessibility as one of its objectives. Possible means of supporting accessibility are legislative instruments as well as standardization and public procurement programs that include accessibility requirements. At the IEC/ISO/ITU workshop “Accessibility and the contribution of International Standards”, organized by the World Standards Cooperation (WSC) in Geneva from 3 to 5 November 2010, No. 6 of the high-priority Recommendations adopted on 5 November 2010, namely

“6. If the content of documents proposed in New Work Item Proposals is related to accessibility, then this should be identified (e.g. through the introduction of a system of check boxes). Agendas of meetings of standards committees should include a standing item in which accessibility should be addressed.”

points in the same direction.

From the above it is evident that there is a global trend towards, and a growing demand for, accessible products, services and environments. Standards can play an important role in supporting the implementation of the policy trends outlined above (regarding procurement, infrastructure design, buildings, transport chains, design of everyday products, etc.). If standards are developed with due consideration of accessibility principles, they can be useful tools in support of these objectives. They can contribute to the design of better products and save financial resources by integrating accessibility features into new products from the start, thus avoiding the need to retrofit products at later stages of their life cycle at significantly higher cost. ISO members can play an active role vis-à-vis their national governments in highlighting the relevance of standards for implementing the objectives of the UNCRPD. This may be of particular importance if a national government has ratified the Convention and is therefore obliged to demonstrate that it is taking action towards its national implementation.


5 Summary and recommendations

This document shows that many standards of relevance to the language industry exist. The situation resembles to some extent that in the field of eBusiness-related standards, where there is an abundance of industry (de facto) standards compared with relatively few international (de jure) standards from official standards organizations. This explains the need, on the one hand, for international standards to fill certain gaps and, on the other hand, for the harmonization of competing standards, which often create barriers to integration and interoperability instead of overcoming them.

Figure 2: Competing de facto and de jure standards in the field of eBusiness (author: Raymond Betz around 2005)

Standards, especially international standards, are important in the LI to ensure the integration and interoperability of systems and content and thus the efficiency of processing and communicating the necessary information in the form of structured and unstructured content. This efficiency allows enterprises to enjoy the full range of benefits of the products and services of the LI and makes them affordable to SMEs. As LI-related standards and assistive technology standards (in particular those related to communication) often address similar issues and are both useful in eLearning, the respective ongoing standardization activities should be coordinated more closely. These standards should also receive more attention in the education and training of computer scientists and software engineers.

Unfortunately, only a few LI experts, whether from academia or industry, know the full range not only of LI products and services in general, but also of the existing standards in particular. There is a clear lack of education and training for vendor-neutral, objective consultants knowledgeable about LI standards, a gap which should be taken up by policy makers and industry associations.


Recommendations

Looking at the 2010-2013 ICT Standardisation Work Programme,

“ICT standardisation is part of the general standardisation activities and contributes to the policy objective of improving European competitiveness while balancing industry expectations with societal needs”,

it is clear that language industry (LI) related standardization is undervalued, given the stupendous rise of the LI over the last 5 to 10 years. Standardization efforts related to the LI may even become a major driver for overcoming the fragmentation and the interoperability deficits of ICT standards at large.

Recommendation 1 (to policy and decision makers confronted with LI-related issues): In the age of globalization, policy and decision makers should pay due attention to LI-related standards, which facilitate localization, i.e. the adaptation of products and services (including the respective documentation and communication processes) to the languages and cultures of target markets. In this connection, the comparatively new aspect of content quality is becoming a major issue in the overall quality management of organizations. Efficient localization is indispensable for successful globalization, which explains the astounding growth over recent years of the language industry (LI) with its products (i.e. language technology tools/systems [LTT], language and other content resources [LCR]) and services. In parallel with the development of LI products and services, industry increasingly requires ICT to be integratable, or at least interoperable, a requirement that is more and more being extended to content as well. It is therefore necessary to identify existing standards which need harmonization as well as gaps in standardization that need to be filled.

Recommendation 2 (to all stakeholders in the field of LI-related standardization): LI products and services and their integration and interoperability with ICT in general and with various content management systems (CMS) in particular, duly taking into account the rising need to integrate assistive technologies, make enhanced coordination of LI-related standardization and implementation efforts imperative at several levels, such as:

Character coding, fonts development etc.,

Metadata and metadata registries, repositories of codings etc.,

Formats, schemas, protocols, mark-up languages etc.,

Data modelling approaches (incl. meta-models and an ontology meta-language),

Document and content management (incl. sustainable archiving requirements),

Overall policies and management practices.

Recommendation 3 (to standards developers in the field of LI-related standardization): In line with ICT standardization at large, the field of LI-related standardization activities has become quite complex and fragmented, which calls for increased coordination efforts. This coordination could be achieved through the standardization experts involved, improved communication and information channels among related standardization activities, more detailed cross-referencing of related standards (incl. those of industry SDO) and other means. As a side effect, this coordination, which would also include harmonizing the language used in standards, would improve the general quality level of standards.

Recommendation 4 (to developers and users of LI products and services): Whether LI products and services are used or developed in-house or their use is outsourced to language service providers (LSP), due attention should be paid to pertinent standards. These not only help to save costs (and gain efficiency at the same time), but also help to avoid misunderstandings (which cause conflicts, risks and possible liabilities) and allow users to benefit from the improved quality of the resulting LI products and services.

Recommendation 5 (to users of LI products and services): An efficient use of LI products and services needs thorough preparation and a systematic approach (e.g. on the basis of an explicit language policy/strategy), including a good overview of existing standards. Using standards-compliant LI products and services whenever possible helps to improve the efficiency of all localization activities and of content management at large. Engaging a vendor-neutral and objective consultant familiar with LI-related standards is a good way to acquire the respective know-how quickly and to help users make the most appropriate decisions about language problems of which they are often unaware.

Recommendation 6 (to developers of LI products and services): Given the increased requirements for ICT systems to be integratable and interoperable, which now extend to the quality and interoperability of content, the developers of LI products and services should endeavor to comply with international standards and to develop them further in order to overcome the fragmentation (and the resulting barriers to interoperability) in this field. SMEs in particular should be advised not to invest in tools/systems which, lacking the necessary capability for integration and interoperability, could prevent future upgrades of their systems, not to mention the costly conversion or re-input of their content if systems have to be upgraded or replaced.

Recommendation 7 (to developers of language and other content resources): Standards-compliant or even standardized high-quality structured content is becoming increasingly crucial in the use of LI products and services. High-quality structured content is invariably connected to the application of pertinent standards-based or standardized metadata. In particular, resources of structured content should be developed with content integration and interoperability in mind. Re-purposability for assistive technologies is becoming a new requirement, which is not surprising given the complementarity, or even similarity, between the data and information used in assistive technologies and the content of the language industry.
Recommendation 8 (to language service providers): Language service providers (LSP) need personnel with a broad range of technical competences and skills to carry out their services, not to mention the necessary insight into the market, business experience and an understanding of legal and other non-technical aspects. In rendering their services they have to deal with non-linguistic types of content, for which they need the right language technology tools (LTT). They also have to cope with a number of formats, schemas and markup languages that constantly cause problems in their daily work. LSP therefore need a high level of ICT literacy, including familiarity with pertinent standards, in order to master this complexity, which they often have to share with their customers. Since LSP are growing in number and size, they are also major employers of graduates in LI-related fields, which higher education institutions (HEI) in particular should take into account.

Recommendation 9 (to educational and training institutions/organizations): Given that many of the aspects covered by LI-related standards and by the complementary standards related to assistive technologies are not sufficiently taught in the education of computer scientists and software engineers at higher education institutions (HEI), more intensive cooperation, also with respect to standardization activities, is desirable. In addition, there is a clear market need for training schemes outside HEI as well as for accredited, preferably standards-based, certification of the skills and competences thus acquired.


References

Nuria Bel e.a. Standardization Action Plan for CLARIN (2009). See: http://www.clarin.eu/node/2841

Lachlan Blackhall. Educational Content Authoring Tools. A report written for the College of Engineering and Computer Science, The Australian National University (2011) (PDF – Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License)

Gerhard Budin. Identification of problems in the use of LR standards and of standardization needs (2009) (FLaReNet Deliverable D 4.1 16 Oct)

Cedefop (ed.). ICT Skills Certification in Europe. Cedefop Dossier Series 13. Luxembourg: Office for Official Publications of the European Communities (2006)

CEPIS (ed.). Survey of Certification Schemes for ICT Professionals across Europe towards Harmonisation (HARMONISE). Project of CEPIS Council of European Professional Informatics Societies, final report. See: http://www.cepis-harmonise.org (September 2007)

COM(2010) 245 final/2. Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions – A Digital Agenda for Europe. European Commission: Brussels, 26.8.2010.

COM(2010) 636 final. Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions – European Disability Strategy 2010-2020. European Commission: Brussels, 15.11.2010.

Directive 2006/42/EC of the European Parliament and of the Council of 17 May 2006 on machinery, and amending Directive 95/16/EC (recast) (Text with EEA relevance). L 157/24 Official Journal of the European Union 9.6.2006

Directive 2006/123/EC of the European Parliament and of the Council of 12 December 2006 on services in the internal market. L 376/36 Official Journal of the European Union 27.12.2006

EURESCOM. Guidelines for building multilingual Web sites. EURESCOM Report. P923 Multilingual Web Sites: Best practice, guidelines and architectures. D1 Guidelines for building multilingual Web sites (Sept. 2000)

Erwin Folmer, Jack Verhoosel. State of the Art on Semantic IS Standardization, Interoperability & Quality. The Netherlands: UT, CTIT, TNO and NOiV (2011). See: http://doc.utwente.nl/76291/

Erwin Folmer. Quality of Semantic Standards. The Netherlands: UT, CTIT, TNO and NOiV (2012). SemanticStandards.org. See: https://sites.google.com/site/erwinfolmeronsemanticstandards/

Christian Galinski, Detlef Reineke. Vor uns die Terminologieflut [Coming, the terminology deluge]. In: eDITion (2011) 2 pp. 8-12

Christian Galinski, Karel Van Isacker. Standards-based Content Resources: A Prerequisite for Content Integration and Content Interoperability. In K. Miesenberger et al (Eds.), Computers Helping People with Special Needs. 12th International Conference, ICCHP 2010, Vienna, Austria, July 2010. Proceedings, Part I (pp. 573–579). Berlin/Heidelberg/New York: Springer (2010)

Daniele Gerundino and Michael Hilb. The ISO Methodology. Assessing the economic benefits of standards. ISO Focus+ (June 2010) pp. 10-16 (based on a Roland Berger Study 2009)

Expert Panel for the Review of the European Standardization System. Standardization for a competitive and innovative Europe: a vision for 2020. REPORT EXP 384 final (February 2010)

Richard Jackson, How demography is reshaping the economic and social landscape in the 21st century, in: The Geneva Association (2012), The Geneva Reports, No. 6(2012)

Monica Monachini e.a. The Standards’ Landscape Towards an Interoperability Framework. The FLaReNet proposal. Building on the CLARIN Standardisation Action Plan (July 2011) http://www.flarenet.eu/sites/default/files/FLaReNet_Standards_Landscape.pdf

Kara Warburton. Standards and Guidelines for the Language Industry (2009)

Jeffrey Whitten, Lonnie D. Bentley, Kevin C. Dittman. Systems Analysis and Design Methods. 6th edition (2004)

UNESCO. Guidelines for terminology policies – Formulating and implementing terminology policy in language communities. Paris: UNESCO (2005)


Appendix 1: Tables

Table 1: Identified standards, guidelines and legislation of relevance to the CELAN project (in particular for the aspects covered by CELAN WP2, briefly evaluated on the basis of a set of criteria)

After giving an outline of the standardization framework at large and explaining the methodology applied, the standards identified as pertinent to the LI were classified into:

Basic standards related to the ICT infrastructure with particular impact on the LI (see chapter 2, where they are evaluated and put into perspective):

o Standards concerning character (glyph) coding, etc.,
o Standards related to the coding of names of countries, languages and scripts,
o Standards related to the application of coding,
o Standards related to data modeling,
o Protocols, formats and schemas,
o Standards related to the quality of data and information,
o Information and documentation (I&D) standards,
o Standards related to mobility and accessibility,
o Certification based on standards;

Specific standards pertaining to language technologies, resources, services and LI-related competences and skills (see chapter 3, where they are evaluated and explained in line with the CELAN Typology of Language Industry (LI) Products and Services):

o Language technologies and language technology tools (LTT),
o Language and other content resources (LCR),
o Quality of language services and language services providers (LSP),
o LI-related competences and skills,
o Training schemes referring to the above and the training material used;

Latest developments in the convergence of LI-related standards and standardization in the field of assistive technologies (see chapter 4).

The standards, guidelines and legislation were identified according to the following criteria:

Primary relevance to the areas covered by the CELAN project,

Needs of small and medium-sized enterprises (SME) that want to globalize either now or in the near future,

Following the structure of the CELAN Typology of LI Products and Services,

Identification according to kind of normative document, type and sub-type of standard, title of standard, organization publishing the standard, status of standard, content, relevance and gaps (as much as could be learned from the sources consulted).


Table 2: Identified gaps in standardization (and legislation), briefly explained

(1) General gaps or insufficiencies

ICT standardization in general

As standardization is highly fragmented, gaps in international standards need to be identified and filled, and the multitude of standards, often developed by SDO in vertical fields of standardization, needs to be harmonized.

Terminology of the language industry

In addition, the terminology used in standards across different application areas is often inconsistent, which makes the application of standards difficult.

Newly emerging fields of standardization and the respective SDO

Newly emerging fields of standardization, such as those arising from large-scale research and development (R&D), are often coordinated poorly or not at all with the standardization efforts of the technical committees of the official standards organizations.

Pre-standardization R&D activities

Academic research often anticipates emerging needs for standardization, but is frequently carried out without consultation and cooperation with industry.

Quality-related aspects of LI products and services

The importance of quality-oriented development of standards and accredited standards-based certification should be further emphasized in order to respond to the requirements of industry concerning integration and interoperability as well as sustainability in ICT application.

Training of LI experts including pertinent standards

In the education and training of computer scientists and software engineers, LI-related aspects (and the converging, complementary aspects of assistive technologies) are under-represented in general, and LI-related standards in particular are undervalued.

Indicators for the usefulness and benefits of LI-related standards

Methods to evaluate, validate and measure the usefulness and benefits of standards are generally not sufficiently developed. Together with information on best practices and success stories, such methods are necessary to convince industry to take LI-related standards more seriously.

Promotion of LI-related standards and awareness raising

The official standardization organizations are usually not the most skilled in promoting standardization and raising the awareness for the benefits of standardization. New fields of standardization, such as LI-related standards are particularly disadvantaged by this situation.

(2) Specific gaps or insufficiencies

Speech-to-written conversion

Although conversions from X-SAMPA to Unicode IPA have been developed, this field may need harmonization in view of the requirements for content interoperability (which also cover content accessibility).

Standardization of new protocols, formats and schemas

Protocols, formats and schemas are often developed and standardized without regard for upward (forward) and downward (backward) compatibility, which is needed, among other things, for archiving in technical documentation, and without regard for the general industry requirements concerning integration, interoperability and sustainability.

Controlled natural language

With a view to better comprehensibility and improved translatability/localizability, standards on the principles and methods of controlled natural language (CNL), simplified natural language (SNL), plain language and the like are needed; ISO/AWI Language resource management – Simplified natural languages would be a first step in this direction.

Word count

A standard providing the rules for cross-language word count is badly needed by customers needing documentation in many languages as well as by the respective LSP; harmonized word segmentation standards are a prerequisite for such a cross-language word count standard.

Standardized character sets and standards-compliant fonts

The use of different character sets and fonts not conforming to standards can lead to many undesired consequences in publications, technical documentation, speech-to-written conversion and various kinds of communication within an enterprise as well as with the outside world. A clear and detailed declaration of the degree of compliance with Unicode should be required from each font developer.

Standardized metadata

Standardized structured content, such as metadata, data categories (or data element types), data elements, data element concepts, data dictionaries (or IRD, information resource dictionaries) and the like, is necessary to manage all kinds of structured content in a transparent, integratable and interoperable way, not to mention facilitating data exchange. Standardized approaches and the development of open metadata registries should be promoted.

Standards-based or standardized codings

There are many standardized code systems, such as country codes (ISO 3166), language codes (ISO 639), script codes (ISO 15924:2004), quantities and units (ISO 80000, multipart) and many other coding systems which may directly or indirectly govern LI activities and services (see: http://www.iso.org/iso/standards_development/maintenance_agencies.htm). In addition, there are thousands of large, small and mini repositories of codings, especially in eApplication fields, which are often not even standards-based. Under today’s requirements of integration and interoperability, these codings increasingly have to be interchanged, often in combined form.

Standardization related to language and other content resources

The Internet abounds with non-standardized online content resources, increasingly including those of monolingual and multilingual structured content. The existing methodology standards, including the metadata standards, would apply to them too, but are respected only partially or not at all by the repository owners. Therefore, the average level of data quality in such repositories is lower than would be desirable.

(1) Given the ever-increasing quantities of items of structured content, new, preferably standards-based, approaches are needed from a quality perspective to cope with these quantities while at the same time improving the quality of resources, not to mention the user-friendliness of access, the different modalities (like spoken and written content) and the non-linguistic content that have to be dealt with. This requires efforts to harmonize data models and data modeling methods with respect to the requirements of content integration and interoperability.

(2) An international standard or standardized guidelines addressing the numerous cooperative, economic and legal aspects to be considered when developing a platform for the web-based cooperative development of structured content could be useful. It would support the emergence of successful business models for such platforms, also providing information on good practices regarding copyright issues concerning the “ownership” of data elaborated on the platforms, and would enhance their sustainability.

Quality-oriented standards concerning language services

After the EU’s Directive 98/37/EC on the approximation of the laws of the Member States relating to machinery triggered the first European standards on the quality of technical documentation, localization and translation, LSP-related standardization efforts have, during the last two to three years, entered new dimensions: they now address the definition of quality for language services and their results, and they cover a broader range of services and products. These efforts need support, whether financial or non-financial, in order to achieve the most appropriate results.

Standards and authoritative guidelines on language policies/strategies

After UNESCO’s Guidelines for terminology policies – Formulating and implementing terminology policy in language communities were published in 2005, the need for more such documents targeting policy and decision makers at national, language community or enterprise level was increasingly recognized. In addition, simple guidelines addressing different kinds of decision makers in different situations are needed.

Standards and authoritative guidelines related to the converging of LI products and services and assistive technologies

Only recently have academic conferences and workshops involving representatives of the LI and of assistive technologies recognized the benefits of converging many aspects of language technologies and assistive technologies, as well as of converging the content-related methods by coordinating the respective standards. A considerable need for standards in this field is emerging.

Table 3: Identified stakeholders (standards developing organizations etc.)

Within the framework of the analysis of existing standards, the following major stakeholders were identified (see Report chapter 1):

(1) International and European standards organizations:

International Organization for Standardization (ISO),

International Electrotechnical Commission (IEC),

Joint Technical Committee ISO/IEC-JTC 1 Information technology,

International Telecommunication Union (ITU),

European Committee for Standardization (CEN),

European Telecommunications Standards Institute (ETSI).

(2) Additional standards developing organizations (SDOs):

World Wide Web Consortium (W3C),

Internet Engineering Task Force (IETF),

Institute of Electrical and Electronics Engineers (IEEE),

Organization for the Advancement of Structured Information Standards (OASIS),

ASTM International (formerly known as the American Society for Testing and Materials).

(3) Project consortia or networks that develop pre-normative documents accepted as quasi-standards by certain stakeholder groups:

Text Encoding Initiative (TEI),

Common Language Resources and Technology Infrastructure (CLARIN),

European Language Resources Association (ELRA),

Fostering Language Resources Network (FLaReNet),

Network of the Multilingual Europe Technology Alliance (META): META-NET.

In addition to the above, there are probably hundreds of industry consortia that regard themselves as SDOs in various fields of ICT standardization, including aspects of the LI (see Appendix 3: List of standards developing organizations (SDO) in the fields of the ICT). SemanticStandards.org provides lists of semantic standards (see: http://www.semanticstandards.org/) as well as pointers to other resources available on the Internet, such as

Standard Setting Organizations and Standards List,

Survey of Fora & Consortia, which partly overlap with the CEN list reproduced in Appendix 3.

Appendix 2: Recommendation on software and content development principles 2010

Purpose

This recommendation addresses decision makers in public as well as private frameworks, software developers, the content industry and developers of pertinent standards. Its purpose is to raise awareness that multilinguality, multimodality, eInclusion and eAccessibility need to be considered from the outset in software and content development in order to avoid additional or remedial engineering or redesign at the time of adaptation, which tends to be very costly and often proves to be impossible.

Background

In software development, globalization¹, localization² and internationalization³ have a particular meaning and application. In software localization they have been recognized as interdependent and of high importance from a strategic level down to the level of data modelling and content interoperability.

In 2005, the Management Group of the ITU-ISO-IEC-UN/ECE Memorandum of Understanding on eBusiness standardization adopted a statement (MoU/MG N0221) which defines, as basic requirements for the development of fundamental methodology standards concerning semantic interoperability, fitness for

- multilinguality (covering also cultural diversity),

- multimodality and multimedia,

- eInclusion and eAccessibility,

- multi-channel presentations,

which have to be considered at the earliest stage of

- the software design process, and

- data modelling (including the definition of metadata),

and thereafter throughout all the iterative development cycles.

The above requirements are a prerequisite for global content integration and aggregation as well as content interoperability. Content interoperability is the capability of content to be combined with or embedded in other (types of) content items and to be extensively re-used as well as re-purposed for other kinds of eApplications. In order to achieve this capability, software must support these requirements from the outset. The same applies to the methods and tools of content management, including web content management.

Recommendation

Software should be developed, and data models for content prepared, in compliance with the above-mentioned requirements in order to facilitate adaptation to different languages and cultures (localization) or to new applications (re-purposing), as well as personalization for different individual preferences or needs, including those of persons with disabilities. These requirements should also be referenced in all pertinent standards.

1 Globalization refers to all of the business decisions and activities required to make an organization truly international in scope and outlook. G11N is the transformation of business, processes and products to support customers around the world, in whatever language, country or culture they require.

2 Localization is the process of modifying products or services to account for differences in distinct markets. L10N is therefore an integral part of G11N; without it, other globalization efforts are likely to be ineffective. The interdependence of G11N and L10N has also been termed glocalization.

3 Internationalization is the process of enabling a product at a technical level for localization. An internationalized product does not require remedial engineering or redesign at the time of localization. Instead, it has been designed and built from the outset to be easily adapted for a specific application after the engineering phase.

Appendix 3: List of standards developing organizations (SDO) in the fields of the ICT according to CEN – Edition 17, December 2011

SemanticStandards.org provides lists of semantic standards (See: http://www.semanticstandards.org/) and other resources available on the Internet, such as

Standard Setting Organizations and Standards List

Survey of Fora & Consortia which partly overlap with this list from CEN.

1394 TA The 1394 High Performance Serial Bus Trade Association

A

AACS Advanced Access Content System

ACCELLERA Accellera Systems Initiative, which aims at creating and advancing system-level design, modeling and verification standards for use by the worldwide electronics industry

ACM Association for Computing Machinery

AES Audio Engineering Society

AFEI Association For Enterprise Integration

AIIM Association for Information and Image Management

AIM Association for Automatic Identification and Mobility

AMWA Advanced Media Workflow Association

ARMA International – Standards & Best Practices in managing records & information

ARTS Association for Retail Technology Standards

ASTM International – American Society for Testing and Materials

ATIS Alliance for Telecommunications Industry Solutions

AUTOSAR Automotive Open System Architecture Partnership

B

BioAPI Biometric Application Programming Interface

Bluetooth Bluetooth Consortium

Broadband Forum Forum for next generation IP network specifications

BSF Broadband Services Forum

C

CableLabs Cable Television Laboratories

CalConnect.org Calendaring and Scheduling Consortium

CANENA Council for Harmonization of Electrotechnical Standardization of the Nations of the Americas

CDG CDMA Development Group

CDISC Clinical Data Interchange Standards Consortium

CEA The Consumer Electronics Association

CELF Consumer Electronics Linux Forum

CHeS Coalition for Healthcare eStandards Inc.

CIPA Camera and Imaging Products Association

CISQ Consortium for IT Software Quality

CLSI Clinical and Laboratory Standards Institute

CompTIA Computing Technology Industry Association

CTIA Cellular Telecommunications & Internet Association

CVC Component Vendor Consortium

D

DCMI Dublin Core Metadata Initiative

DDEX Digital Data Exchange

DDWG Digital Display Working Group

DECT Forum Digital Enhanced Cordless Telecommunications

DIGITAL EUROPE – represents the digital technology industry in Europe

DLNA Digital Living Network Alliance

DMPF The Digital Media Project

DMR Digital Mobile Radio

DMTF Distributed Management Task Force, Inc.

DRM Digital Radio Mondiale

DVB Digital Video Broadcasting Project

DVD Forum International association of hardware manufacturers, software firms, content providers and other users of Digital Versatile Discs

E

ebIX European forum for energy Business Information eXchange

Echonet Echonet Consortium

Eclipse.org A project aiming to provide a universal toolset for development

ECMA An international Europe-based Industry Association for Standardizing Information and Communication Systems

ECSS European Cooperation for Space Standardization

EDA Consortium Electronic Design Automation Consortium

EDIFICE The European B2B forum for the Electronics Industry

EEMBC Embedded Microprocessor Benchmark Consortium

EIA Electronic Industries Alliance

EIC Emergency Interoperability Consortium

EIDQ Association for the Directory Information and Related Search Industry

EMF European Multimedia Forum

Energistics Energy Standards Resource Centre

EPASOrg Driving Interoperability in Card Payments

EPCglobal Leads the development of industry-driven standards for the Electronic Product Code™ (EPC) to support the use of Radio Frequency Identification (RFID) in today's fast-moving, information-rich trading networks

ERTICO Intelligent Transport Systems and Services Europe

ETIS The Global IT Association for Telecommunications

EuroGeographics International non-profit association under Belgian law; the membership association of the European cadastre, land registry and national mapping authorities

EUROGI EURopean umbrella Organization for Geographic Information

EUROSMART European Smart Card Industry Association

F

FCIA Fibre Channel Industry Association

FEMTO Forum Industry forum for femtocells (in telecommunications, a femtocell is a small cellular base station, typically designed for use in a home or small business)

FIPA Foundation for Intelligent Physical Agents

FlexRay Consortium Developer of FlexRay, an automotive network communications protocol

FSTC Financial Services Technology Consortium

G

Global Platform Advancing standards for smart card growth

Globus Alliance International association dedicated to developing fundamental technologies needed to build grid computing infrastructures

GmSA Global Mobile Suppliers Association

GS1 (Formerly EAN)

GSA Gaming Standards Association

GSDi Global Spatial Data Infrastructure

GVF Global Very Small Aperture Terminal (VSAT) Forum

H

HIBCC The Health Industry Business Communications Council

HIMSS Healthcare Information and Management Systems Society

HL7 Health Level Seven

HomePlug HomePlug Powerline Alliance

HomePNA Home Phoneline Networking Alliance

HR-XML Human Resource XML Consortium

I

I3A International Imaging Industry Association

IBIA International Biometric Industry Association

IBTA InfiniBand Trade Association

ICA International Communications Association

ICH Interoperability Clearinghouse

IDEAlliance International Digital Enterprise Alliance

IDEMA International Disk Drive Equipment and Materials Association

IDPF International Digital Publishing Forum

IEEE Institute of Electrical and Electronics Engineers

IEST Institute of Environmental Sciences and Technology

IETF Internet Engineering Task Force

IFSF International Forecourt Standards Forum

IHE Integrating the Healthcare Enterprise

IIA Internet Industry Association

IMS Forum A global non-profit industry association dedicated to the advancement of IP Multimedia Subsystem applications and services interoperability

IMTC The International Multimedia Teleconferencing Consortium

INC Industry Numbering Committee

INCITS International Committee for Information Technology Standards

iNEMI International Electronics Manufacturing Initiative

Intergeo A.s.b.l Interoperable Interactive Geometry

Internet2 Internet 2 Initiative

INTUG International Telecommunication User Group

IPC IPC – Association Connecting Electronics Industries

IPTC International Press Telecommunications Council

IPv6Forum Internet Protocol version 6 Forum

IrDA The Infrared Data Association

ISA The Instrumentation, Systems, and Automation Society

ISC Internet Systems Consortium

ISF Information Security Forum

ISMA Internet Streaming Media Alliance

ITS America Intelligent Transportation Society of America

itSMF IT Service Management Forum

IVI Foundation Interchangeable Virtual Instruments Foundation

IWA International Webmasters Association

J

JCF Java Card Forum

JEDEC Global leader in developing open standards for the microelectronics industry

K

Khronos Group Not-for-profit, member-funded consortium focused on the creation of royalty-free open standards for parallel computing, graphics and dynamic media on a wide variety of platforms and devices

KNX KONNEX Association

L

Liberty Alliance Project Its materials are now maintained by the Kantara Initiative ("Shaping the Future of Digital Identity")

Linux Foundation Nonprofit consortium dedicated to fostering the growth of Linux

LonMark International Global membership organization created to promote and advance the business of efficient and effective integration of open, multi-vendor control systems utilizing ISO/IEC 14908-1 and related standards

LXI LXI Consortium

M

MDA Mobile Data Association

MEF Metro Ethernet Forum

MIPC Mobile Imaging and Printing Consortium

MIPI Mobile Industry Processor Interface

MMA MIDI Manufacturers Association

Mobey Forum Mobile Financial Services

MPEG Industry Forum Moving Picture Experts Group

MSF Multiservice Switching Forum

N

NANOG North American Network Operators Group

NCOIC Network Centric Operations Industry Consortium

NCPDP National Council for Prescription Drug Programs, Inc.

NFC Forum Near Field Communication Forum

NIL Com The NIL (Nanoimprint Lithography) Consortium

NISO National Information Standards Organization

NPES Association for Suppliers of Printing, Publishing and Converting Technologies

O

OAG Open Applications Group

OAI Open Archives Initiative

OASIS Organization for the Advancement of Structured Information Standards

OCP-IP Open Core Protocol International Partnership

ODVA Open DeviceNet Vendor Association, Inc.

OGC Open Geospatial Consortium (formerly Open GIS Consortium)

OGF Open Grid Forum

OIF Optical Internetworking Forum

OIPF Open IPTV Forum

OMA Open Mobile Alliance

OMG Object Management Group

OMTP Open Mobile Terminal Platform Group

ONFI Open NAND Flash Interface

OPA Online Privacy Alliance

Open Ajax Alliance industry group devoted to the set of technologies and Web programming techniques known as Ajax

Open Forum Europe Not-for-profit organization helping to accelerate, broaden and strengthen the use of open source software (OSS) in business and government

OSCRE Open Standards Consortium for Real Estate

OSE Open Security Exchange

OSGI Open Services Gateway Initiative

OSI Open Source Initiative

OTA Open Travel Alliance

OW2 OW2 Consortium

P

PC104 Consortium – for information on small form factor embedded computers and I/O boards utilizing PC/104 technology

PCCA Portable Computer and Communications Association

PCI SIG Peripheral Component Interconnect Special Interest Group

PDES International industry/government consortium committed to accelerating the development and implementation of standards that enable enterprise integration and PLM interoperability for its member companies

PHS MoU Group Personal Handyphone System Memorandum of Understanding Group

PICMG PCI Industrial Computer Manufacturers Group

PIDX Petroleum Industry Data Exchange Committee

Power.org Organization to develop, enable and promote Power CPUs as the preferred open standard hardware platform for the electronics industry

Project Mesa Mobile Broadband for Public Safety

PWG Printer Working Group

R

RapidIO Trade Association non-profit corporation controlled by its members, directs the future development and drives the adoption of the RapidIO architecture

RosettaNet Its industry standards provide business frameworks that allow individual companies to enhance the interoperability of business processes across the global supply chain

S

SA Forum Service Availability Forum

SATA-IO Serial ATA International Organization

SCSITA Small Computer System Interface Trade Association

SEMATECH Semiconductor Manufacturing Technology

SIA Security Industry Association

SIFA Schools Interoperability Framework Association

SIM Alliance Subscriber Identification Module Alliance

SIP Forum Non-profit industry organization that promotes the advancement and adoption of the Session Initiation Protocol (SIP)

SISO Simulation Interoperability Standards Organization

Smart Card Alliance – Non-profit association that works to stimulate the understanding, adoption and widespread application of smart card technology

SMDG User group for Shipping Lines and Container Terminals

SNIA Storage Networking Industry Association

SPC Storage Performance Council

SSCI Systems and Software Consortium, Inc.

Symbian Mobile operating system (OS) and computing platform designed for smartphones and currently maintained by Accenture

T

TAHI The Application Home Initiative

TCG Trusted Computing Group

TD SCDMA Forum enhances the cooperation with other telecom associations to facilitate convergence and evolution of various technologies

TEI-C Text Encoding Initiative Consortium

TETRA MoU Association Terrestrial Trunked Radio

The Zhaga Consortium Develops specifications that enable interchangeability of LED light sources made by different manufacturers

TIA Telecommunications Industry Association

TISA Traveller Information Services Association

TMF TeleManagement Forum

TOG The Open Group

TPC Transaction Processing Performance Council

TWIST Transaction Workflow Innovation Standards Team

U

UMTS Forum Universal Mobile Telecommunications System Forum

Unicode Consortium Enables people around the world to use computers in any language

UniForum The International Association of Open Systems Professionals

UPnP Universal Plug and Play Forum

USB-IF Universal Serial Bus Implementers’ Forum

USPI Uitgebreid Samenwerkingsverband Procesindustrie Nederland

V

VESA Video Electronics Standards Association

VICS Voluntary Interindustry Commerce Standards Association

VITA VMEBus International Trade Association

Voice XML Forum The Voice Extensible Markup Language Forum

VOIPSA Voice over IP Security Association

VPNC Virtual Private Network Consortium

W

W3C World Wide Web Consortium

WASC Web Application Security Consortium

WEB3D WEB3D Consortium

WEDI Workgroup for Electronic Data Interchange

WfMC Workflow Management Coalition

WHATWG Web Hypertext Application Technology Working Group

Wi-Fi Alliance Trade association that promotes Wireless LAN technology and certifies products if they conform to certain standards of interoperability

WInnF Wireless Innovation Forum

WiMAX Forum Worldwide Interoperability for Microwave Access Forum

WiMedia Alliance Global nonprofit organization that defines, certifies and supports enabling wireless technology for multimedia applications

WINA Wireless Industrial Networking Alliance

WorldDAB Forum World Digital Audio Broadcast Forum

WPC Wireless Power Consortium

WS-I Web Services Interoperability Organization

X

XII - XBRL International eXtensible Business Reporting Language

X.org Refers to several things related to the X Window System, notably the X.Org Foundation

Z

ZigBee The ZigBee Alliance

Appendix 4: Localization (L10N) related standardization

K. Warburton and A. Lommel show how standards activities related to localization are organized, both by organization and by function. Their charts, reproduced below, served among other sources as a reference in preparing the present document.

Chart 1: Localization-related standards organizations. Taken from: http://www.gala-global.org/files/webfm/GALA-Standards-A-Broad-View-WhitePaper.pdf. Authors: Kara Warburton and Arle Lommel

Chart 2: Localization-related standards. Taken from: http://www.gala-global.org/files/webfm/GALA-Standards-A-Broad-View-WhitePaper.pdf. Authors: Kara Warburton and Arle Lommel