Exploring ‘Workspaces’
Tom Visser, SARA Computing and Networking Services, Amsterdam
Garching Workshop 21st September 2010
• Background
• Overview of cases
• Technical possibilities
• Opportunities and risks
• Expected results
• Proposed approach
The CLARIN-NL connection
• Seeking to create an infrastructure for language resources
• Providing access to tools and technologies
• CLARIN-NL and BiG Grid are exploring possibilities
• The WHOLE pipeline
  – Creating
  – Curation
  – Collecting
  – DO SCIENCE
  – Depositing
Already
• SARA has developed a client implementation of a Persistent Identifier Service (HANDLE) and has become an EPIC consortium member
• An instance of the service is currently hosted at SARA
• BiG Grid / SURFnet pilot with a Short-Lived Credential Service
• Activities with Computational Linguistics (e.g. Named Entity Recognition) and the forthcoming Computational Humanities institute (KNAW)
• A series of workshops to find common ground between BiG Grid and the CLARIN infrastructure
Questions of today
• When is a user workspace a service?
• Why do we need user workspaces?
• What are their characteristics in a distributed environment?
• How do we support processing chains in distributed environments driven by community environments?
• Are there generic frameworks for the execution of distributed processing chains and the deployment of web services?
Core problems
• Where to store
• How to store
• How to access
• How to foster collaboration amongst people
• How to support data discovery, exploration and exploitation
• How to realize such a service
• Which SLA, service description and responsibilities apply
What it should be
• A temporary storage place (days, weeks, years)
  – Global home / global scratch
  – A ‘logical mount point’
• Accessible by web services
• Meaningfully accessible by a human
• Autonomy for communities
  – Instantiation
  – Content
  – Control
• Identifiable
• Stores digital objects and metadata
• Journaling (registers interactions)
• Create
• Read
• Write
• Update
• Grant access (authorization)
• List contents
• Search contents
  – Adopting and offering known best practices and services in the ecosystem
• …
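The required operations listed above can be sketched as a minimal in-memory workspace. This is a hypothetical illustration, not the SARA, BiG Grid or CLARIN implementation. Several things are assumptions: the class and method names (`Workspace`, `create`, `grant`, `list_contents`, `search`), the three rights `read`/`write`/`grant`, and exact-match metadata search. A real service would sit behind web-service endpoints and rely on an identity provider rather than plain user-name strings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass
class DigitalObject:
    """A stored item: payload bytes plus free-form metadata."""
    data: bytes
    metadata: dict


@dataclass
class Workspace:
    """Hypothetical in-memory user workspace (illustration only)."""
    owner: str
    # Identifiable: every workspace gets a unique id.
    workspace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    objects: dict = field(default_factory=dict)   # name -> DigitalObject
    acl: dict = field(default_factory=dict)       # user -> set of rights
    journal: list = field(default_factory=list)   # (timestamp, user, action, name)

    def __post_init__(self):
        # The owner starts with every right, including the right to grant access.
        self.acl[self.owner] = {"read", "write", "grant"}

    def _check(self, user, right):
        if right not in self.acl.get(user, set()):
            raise PermissionError(f"{user} lacks '{right}' on {self.workspace_id}")

    def _log(self, user, action, name=None):
        # Journaling: register every successful interaction with a UTC timestamp.
        self.journal.append((datetime.now(timezone.utc).isoformat(), user, action, name))

    def create(self, user, name, data, metadata=None):
        self._check(user, "write")
        if name in self.objects:
            raise FileExistsError(name)
        self.objects[name] = DigitalObject(data, dict(metadata or {}))
        self._log(user, "create", name)

    def read(self, user, name):
        self._check(user, "read")
        obj = self.objects[name]
        self._log(user, "read", name)
        return obj

    def write(self, user, name, data):
        # Replace the payload of an existing object.
        self._check(user, "write")
        self.objects[name].data = data
        self._log(user, "write", name)

    def update(self, user, name, **metadata):
        # Merge new key/value pairs into an existing object's metadata.
        self._check(user, "write")
        self.objects[name].metadata.update(metadata)
        self._log(user, "update", name)

    def grant(self, user, grantee, *rights):
        # Authorization: only holders of the 'grant' right may extend access.
        self._check(user, "grant")
        self.acl.setdefault(grantee, set()).update(rights)
        self._log(user, "grant " + ",".join(sorted(rights)) + " to " + grantee)

    def list_contents(self, user):
        self._check(user, "read")
        self._log(user, "list")
        return sorted(self.objects)

    def search(self, user, **criteria):
        # Exact-match search over metadata fields.
        self._check(user, "read")
        self._log(user, "search")
        return sorted(
            name for name, obj in self.objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


if __name__ == "__main__":
    ws = Workspace(owner="alice")
    ws.create("alice", "minutes.txt", b"Notulen", {"lang": "nl", "type": "minutes"})
    ws.grant("alice", "bob", "read")
    print(ws.search("bob", lang="nl"))  # ['minutes.txt']
```

In this sketch, access control and journaling sit inside the workspace. That keeps every operation auditable, but it is one possible design among several and is not something the slides prescribe.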
Considered technical possibilities
• iRODS
• Cloud platform (SNIA/CDMI)
• Hadoop implementation
• Amazon S3 / OpenCloud / Azure /
Risks and opportunities
• Creating something that is either only generic or only specific
• Looking uphill: what will you know once you’ve climbed the hill?
• Knowledge of the community
• Epistemological problems
• Bootstrapping
• Trust
• Process focus: we are starting a small-scale pilot within 1 month, with short iterations, keeping everyone involved.
Approach: BiG Grid and Dutch partners
• Many interesting addressable cases
  – Keyword extraction from the Dutch audio and film institute
  – MPI video repository annotations
  – City of Den Haag government proceedings: alignment of minutes and video (feature extraction)
  – OCR and machine learning on Dutch handwriting
• Expected results
  – A common understanding of a workspace service
  – A bootstrap implementation vertically crossing all layers
• When is a user workspace a service?
  – When it is used and has become an indispensable tool
• Why do we need user workspaces?
  – To work flexibly with data
  – To initiate collaborations
  – To have a trustworthy storage resource available
• What are their characteristics in a distributed environment?
  – Clear core functionality, many service providers, integration with identity providers
• How do we support processing chains in distributed environments driven by community environments?
  – By having open, known, and easily accessible services
• Are there generic frameworks for the execution of distributed processing chains and the deployment of web services?
  – Yes!
THANK YOU