Resource Discovery (metadata and searching) Working Group Report

Issues discussed

• What kinds of resources should EMELD provide search services for?

• What should the design be for an EMELD search interface?

• How can EMELD get good metadata into its search database?

• What level of metadata should be exposed?

What resources?

• Anything that might be of value to the linguist working on endangered languages
  – Language data
  – Tools
  – Advice (including reviews)
  – People
  – "Gateway" websites

What resources?

• But, there's no reason to rely on this working group for "what".

• A questionnaire distributed via Linguist

What resources?

• Two kinds of best practice resources

• Resources with best practice metadata
  – These resources can be discovered
  – Non-digital resources encouraged
  – Digital resources discouraged, but allowed

What resources?

• Best practice digital resources

• All digital resources encouraged to be of this type

• Benefits
  – Enhanced search features (due to document interoperability)
  – Special "BP globe of approval" √

What resources?

• Side Note
  – Best Practice "approval" system should be tied into a larger system through which digital resources could be listed as "publications"
  – A topic for another working group? (Perhaps OLAC?)

What resources?

• Issues which need to be addressed

• Metadata for resources interesting to linguists but which are not linguistic data

• Needed: Best practice metadata standards for
  – Tools
  – Advice
  – People
  – ...

• Test: EMELD could see how it would classify everything in BPU.

How to search?

• Assumption: Metadata and data are distributed

• Query Language
  – Metadata: OLAC standard
  – Data from interoperable documents: A new standard

How to search?

• Resource Query Language Ideal
  – A generalized query protocol used across the linguistics community
  – A series of "methods", still to be defined, that can be called on these resources to retrieve structured linguistic data matching query parameters

How to search?

• Problems implementing ideal
  – No clear sense as to what "methods" are needed.
  – One solution: Examine results from questionnaire

How to search?

• Problems implementing ideal
  – Very few repositories allow their data to be accessed in a generalized way
  – First step: Encourage documentation of repository data access systems and develop a metadata standard for this

How to search?

• Long term implementation issues
  – An OLAC Query Language Protocol
    • A well-defined linguistic query language
    • A system for "packaging" queries
  – Linguistic data search registry
    • Linguistic sites register that they are data access sites
    • They also register implemented search methods
  – EMELD will archive best-practice documents for data access for data creators not capable of implementing the query protocol
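The registry idea above can be sketched in a few lines. This is a minimal illustration only, not a proposed design: the class names, the site names, the example.com-style endpoints, and the method names `find_lexeme` and `find_gloss` are all hypothetical, since the working group had not yet defined which "methods" are needed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAccessSite:
    """One registered site and the search methods it says it implements."""
    name: str
    endpoint: str  # hypothetical query URL for the site
    methods: set[str] = field(default_factory=set)


class SearchRegistry:
    """In-memory stand-in for a linguistic data search registry."""

    def __init__(self) -> None:
        self._sites: dict[str, DataAccessSite] = {}

    def register(self, name: str, endpoint: str, methods: list[str]) -> None:
        # A site registers itself together with the methods it supports.
        self._sites[name] = DataAccessSite(name, endpoint, set(methods))

    def sites_supporting(self, method: str) -> list[str]:
        # Sorted names of every site that registered the given method.
        return sorted(s.name for s in self._sites.values() if method in s.methods)


# Hypothetical sites and method names, for illustration only.
registry = SearchRegistry()
registry.register("archive-a", "https://archive-a.example/query",
                  ["find_lexeme", "find_gloss"])
registry.register("archive-b", "https://archive-b.example/query",
                  ["find_gloss"])
print(registry.sites_supporting("find_gloss"))
```

A search service could consult such a registry to decide which sites to send a given "packaged" query to, and skip sites that have not implemented the method in question.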

How to search?

• Pilot project
  – Take some small subset of resources
    • Data entered via FIELD
    • Nijmegen? SIL? AIATSIS? AILLA?
  – Take FIELD search out of FIELD
  – Search over that small set of resources
  – Ideally, keep both resources in separate databases to begin developing a query interchange protocol

How to search?

• Another project: Grammatical thesaurus
  – Develop a grammatical thesaurus that gives common synonyms for a given grammatical term (e.g., oral stop, plosive)
  – This could then be used to expand a user's search to include synonyms for a given term.
  – In all likelihood, there are other applications of this.
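Thesaurus-based query expansion of this kind might look roughly like the sketch below. It assumes the thesaurus is just a list of synonym sets; only the "oral stop" / "plosive" pair comes from the report, and the function names are made up for illustration.

```python
# Synonym sets for grammatical terms. Only this one pair is from the report;
# a real thesaurus would hold many more sets.
SYNONYM_SETS = [
    {"oral stop", "plosive"},
]


def build_index(synonym_sets):
    """Map each lower-cased term to the full set of its synonyms."""
    index = {}
    for group in synonym_sets:
        normalized = {term.lower() for term in group}
        for term in normalized:
            index[term] = normalized
    return index


def expand_query(term, index):
    """Return the term plus its known synonyms, sorted; unknown terms pass through."""
    key = term.strip().lower()
    return sorted(index.get(key, {key}))


INDEX = build_index(SYNONYM_SETS)
print(expand_query("Plosive", INDEX))   # ['oral stop', 'plosive']
print(expand_query("ergative", INDEX))  # ['ergative']
```

The expanded list can then be submitted as an OR-query, so that a user who types "plosive" also finds resources described with "oral stop".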

How to search?

• Search interface
  – EMELD should implement a VISER-like service for access to its database
  – There are two distinct kinds of searches
    • Resource location
    • Resource data search

How to search?

• Search interface
  – The details of the search interface implemented by EMELD are hard to conceive of until more resources can be accessed through it
  – A questionnaire can help with this area too.
    • EMELD could ask people to try the search and evaluate it
    • Starting with the people in this room

Getting the data

• Sticks
  – EMELD Ambassadors
  – Assisted by Linguist Spider

Getting the data

• Carrots
  – Support harvesting metadata in document headers for submitted URLs.
  – Resources with best practice metadata can be assigned a standard EMELD URI, which can be used as a reference
  – These resources could be posted and advertised on Linguist (but consult Baden first)
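Harvesting metadata from document headers could work along the lines of the sketch below. It assumes, purely for illustration, that submitted pages carry Dublin Core elements in HTML `<meta>` tags named `DC.title`, `DC.language`, and so on; the report does not say which header convention EMELD would support. The sample page is invented, and fetching the submitted URL is left out so the example runs offline.

```python
from html.parser import HTMLParser


class HeaderMetadataParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.records = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Assumption: Dublin Core elements are marked with a "DC." prefix.
        if name.lower().startswith("dc.") and content is not None:
            self.records.setdefault(name[3:].lower(), []).append(content)


def harvest(html_text):
    """Return a dict mapping element name to the list of values found."""
    parser = HeaderMetadataParser()
    parser.feed(html_text)
    parser.close()
    return parser.records


# Invented sample page standing in for a submitted URL's content.
SAMPLE = """<html><head>
<meta name="DC.title" content="A hypothetical word list">
<meta name="DC.language" content="en">
<meta name="viewport" content="width=device-width">
</head><body></body></html>"""

print(harvest(SAMPLE))
```

Non-metadata tags such as `viewport` are ignored, and repeated elements accumulate in a list, which matters because a single resource can list several languages or subjects.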

Getting the data

• Juiciest Carrots (Best Practice resources only)
  – "Preferred" EMELD URIs
  – Marked as such in a search
  – Could undergo "advanced" search techniques
  – Be peer-reviewed and vetted by LDRA (Linguistic Digital Resource Association)*

*This organization does not exist, as far as I know.

Granularity

• Right now there are no recommendations for the granularity of exposed metadata records
  – Large archives, for example, have hierarchical structure, one level of which must be isolated (the IMDI session, for example)
  – Cutting-edge archives don't work well with the resource=object model. Their resources are "created" based on the user's needs

Granularity

• The lack of recommendations on this issue inhibits metadata creation

• Granularity makes a big difference as to what content is searchable

• Two different audiences in need of advice
  – "Real" archives (a.k.a. trusted repositories)
  – Individuals

Granularity

• Recommendation: EMELD should encourage IMDI and OLAC to devise best-practice recommendations for granularity

The questionnaire

• Two broad kinds of questions:
  – What kinds of things would you like?
  – What kinds of things would you hate? (Dafydd's Corollary)

The questionnaire

• Part one: Search capabilities
  – How do you want to conduct your search (Google-style, directory-style, pull-down menus...)?
  – What kinds of searches are you doing already on other sites?
  – Search within results? (We wanted this.)
  – Thesaurus-based search

The questionnaire

• Part Two: Search content
  – Free entry (like Google)
  – Feature-based entry
  – Statistical questions
  – Phonetic characters
  – Geographical search
  – Time search
  – ...

The questionnaire

• Part Three: Results
  – Google-like results
  – Journal abstract search-like results
  – Restricted results (only return web sites, .pdf documents, ...)
  – ...

The questionnaire

• Format
  – Online submission
  – Combination multiple choice (for the uncreative) and free form (for the creative)
  – Encourage people to envision the search of the year 2503
