The GACS Project Caterina Caracciolo FAO Amsterdam, 22 Sept. 2014




Presentation delivered at the Agricultural Data Interoperability Interest Group -- Research Data Alliance (RDA) 4th Plenary Meeting -- Amsterdam, September 2014


Page 1: The GACS Project by Caterina Caracciolo

The GACS Project

Caterina Caracciolo, FAO

Amsterdam, 22 Sept. 2014

Page 2: The GACS Project by Caterina Caracciolo

What is GACS?

• A collaboration between
– FAO (AGROVOC)
– NAL of USA (NAL Thesaurus)
– CABI (CAB Thesaurus)

• To make a common repository of terminological and conceptual information in agriculture

Page 3: The GACS Project by Caterina Caracciolo

Why?

• To coordinate efforts in the same area
• And profit from the differences among the three thesauri

Page 4: The GACS Project by Caterina Caracciolo

Who?

• FAO – AGROVOC thesaurus
– 32,000 concepts (RDF/SKOS native), 20 languages
– Covering agriculture, fisheries, forestry, environment, food, ...
• CABI – CAB Thesaurus
– 245,000 terms, 11 languages (4 currently updated)
– The majority are scientific names
• NAL – NAL Thesaurus
– ~100,000 terms in English and Spanish; mostly chemicals

Page 5: The GACS Project by Caterina Caracciolo

AGROVOC (website to change soon)

Page 6: The GACS Project by Caterina Caracciolo

NAL Thesaurus

Page 7: The GACS Project by Caterina Caracciolo

CAB Thesaurus

Page 8: The GACS Project by Caterina Caracciolo

How?

Phase 1: Feasibility study (concluded)
Phase 2: Creation of a GACS core (now - early 2015)
– A core of ~10,000 concepts aligned in the three thesauri
– Separate URIs
Phase 3: The “real” GACS
– Expansion of the core
– Expansion of the partnership

Page 9: The GACS Project by Caterina Caracciolo

Some questions/issues

Page 10: The GACS Project by Caterina Caracciolo

Some questions/issues

• Is there an overlap between the three thesauri?
• What is the potential for alignment?
• How to select the “core”?
• Will provenance information be kept?
• What to do with different hierarchies?
• Infrastructure?

Page 11: The GACS Project by Caterina Caracciolo

Is there an overlap between the three?

Page 12: The GACS Project by Caterina Caracciolo

An estimate of potential for alignment

Page 13: The GACS Project by Caterina Caracciolo

How to select the core?

1. Select “seeds” of 10,000 concepts, based on their use in an application of choice

Page 14: The GACS Project by Caterina Caracciolo

How to select the core?

2. Run mapping algorithms to obtain a “single” core, which is then manually assessed
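
The slides do not say which mapping algorithms are run. As a rough illustration only, the sketch below assumes the simplest possible approach: exact matching on normalized English preferred labels, keeping only labels found in all three thesauri as candidates for manual assessment. The function names, the thesaurus keys, and the example.org URIs are all invented for this example.

```python
from collections import defaultdict


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different labels compare equal."""
    return " ".join(label.lower().split())


def candidate_alignments(thesauri: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Group concept URIs from each thesaurus by normalized label.

    `thesauri` maps a thesaurus name to {concept URI: preferred label}.
    Returns {normalized label: {thesaurus name: concept URI}} restricted to
    labels present in every thesaurus. These are only candidates: a human
    still has to confirm that the concepts really mean the same thing.
    """
    by_label: dict[str, dict[str, str]] = defaultdict(dict)
    for name, concepts in thesauri.items():
        for uri, label in concepts.items():
            # Keep the first URI seen per thesaurus for a given label.
            by_label[normalize(label)].setdefault(name, uri)
    return {
        label: uris
        for label, uris in by_label.items()
        if len(uris) == len(thesauri)
    }


# Toy data: URIs and labels are hypothetical, not taken from the real thesauri.
toy = {
    "AGROVOC": {
        "http://example.org/agrovoc/c1": "Maize",
        "http://example.org/agrovoc/c2": "Rice",
    },
    "NALT": {
        "http://example.org/nalt/n1": "maize",
        "http://example.org/nalt/n2": "Wheat",
    },
    "CABT": {
        "http://example.org/cabt/b1": "MAIZE ",
        "http://example.org/cabt/b2": "rice",
    },
}
core_candidates = candidate_alignments(toy)
```

Each thesaurus keeps its own URI in the result, which matches the "separate URIs" choice for the core; a real pipeline would likely add fuzzy or multilingual matching on top of this.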

Page 15: The GACS Project by Caterina Caracciolo

What is the frequency of concepts used for a corpus?

– A sample from AGRIS
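
One way to answer this question, and to pick usage-based “seeds”, is to count how many records in a corpus are indexed with each concept. The sketch below is a minimal illustration under that assumption; the record structure and concept identifiers are made up and do not reflect the actual AGRIS data format.

```python
from collections import Counter


def concept_frequencies(records: list[list[str]]) -> Counter:
    """Count in how many records each concept identifier is used for indexing."""
    counts: Counter = Counter()
    for concepts in records:
        counts.update(set(concepts))  # count each concept at most once per record
    return counts


# Toy corpus: each inner list holds the concept IDs indexing one record.
records = [
    ["maize", "irrigation"],
    ["maize", "soil"],
    ["rice", "irrigation", "maize"],
]
freq = concept_frequencies(records)

# The most frequently used concepts become seed candidates.
seeds = [concept for concept, _ in freq.most_common(2)]
```

In practice the cutoff would be the 10,000 most used concepts rather than 2, and each partner would run the count against its own application.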

Page 16: The GACS Project by Caterina Caracciolo

Will provenance information be kept?

• Yes, it is fundamental for all
• We agreed on a set of metadata to keep at the level of concept and terms
– Creator, Date of creation, Date of last update, ..
– Agreed format is SKOS-XL, to be able to make statements on
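
SKOS-XL models each label as a resource with its own URI, so metadata can be attached to it directly. The sketch below shows what that could look like when serialized as Turtle. It is an assumption-laden illustration: the mapping of Creator, Date of creation and Date of last update to the Dublin Core terms `dct:creator`, `dct:created` and `dct:modified` is a guess rather than the agreed GACS metadata set, and the URIs and values are invented.

```python
from dataclasses import dataclass

PREFIXES = "\n".join([
    "@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .",
    "@prefix dct: <http://purl.org/dc/terms/> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
])


@dataclass
class XLabel:
    uri: str       # the label is a resource with its own URI
    literal: str   # the term itself
    lang: str      # language tag, e.g. "en"
    creator: str   # assumed to map to dct:creator
    created: str   # ISO date, assumed to map to dct:created
    modified: str  # ISO date, assumed to map to dct:modified


def label_to_turtle(concept_uri: str, label: XLabel) -> str:
    """Serialize one preferred label and its metadata as Turtle.

    Note: values are inserted as-is; a real serializer must escape quotes.
    """
    return "\n".join([
        f"<{concept_uri}> skosxl:prefLabel <{label.uri}> .",
        f"<{label.uri}> a skosxl:Label ;",
        f'    skosxl:literalForm "{label.literal}"@{label.lang} ;',
        f'    dct:creator "{label.creator}" ;',
        f'    dct:created "{label.created}"^^xsd:date ;',
        f'    dct:modified "{label.modified}"^^xsd:date .',
    ])


# Hypothetical concept and label, for illustration only.
example = XLabel(
    uri="http://example.org/gacs/xl_en_1",
    literal="maize",
    lang="en",
    creator="AGROVOC",
    created="2014-09-22",
    modified="2014-09-22",
)
turtle = PREFIXES + "\n\n" + label_to_turtle("http://example.org/gacs/C1", example)
```

With plain SKOS the label would be a bare literal, leaving nowhere to record who created or last changed a given term.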

Page 17: The GACS Project by Caterina Caracciolo

Hierarchies may be different, although similar...

Page 18: The GACS Project by Caterina Caracciolo

What about the GACS infrastructure?

• Suite of tools:
– VocBench for editing (FAO, U of Tor Vergata)
– Skosmos (Finnish National Library)

• Exposure and publication to be arranged, supported by FAO

Page 19: The GACS Project by Caterina Caracciolo

Pointers

• On aims.fao.org we regularly publish updates – subscribe to the bulletins

• GACS reports published so far:
– http://aims.fao.org/community/agrovoc/blogs/phase-one-gacs-approved-read-reports

• A website will follow