Text and Non-textual Objects: Seamless access for scientists
Uwe Rosemann (German National Library of Science and Technology (TIB), Germany)
The European High Level Expert Group on Scientific Data has formulated the challenges for a scientific infrastructure to be reached by 2030: “Our vision is a scientific e-infrastructure that supports seamless access, use, re-use, and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure – a valuable asset, on which science, technology, the economy and society can advance”. Here, “data” is not restricted to primary data but also includes all non-textual material (graphs, spectra, videos, 3D objects etc.). The German National Library of Science and Technology (TIB) has developed a concept for a national competence center for non-textual materials, which is now funded by the German federal government and the German federal states. The center's task is to develop, together with the scientific community, solutions and services that make such data available, citable, sharable and usable, including visual search tools and enhanced content-based retrieval. With solutions such as DataCite and modular development for the extraction, indexing and visual searching of new scientific metadata, TIB will accept the challenge and make all data accessible to its users in a fast, convenient and easy-to-use way. The paper shows which special tools TIB is developing in the context of scientific AV media, 3D objects and research data.
Uwe Rosemann
ICIC 2013 Vienna
Textual and non-textual objects:
Seamless access for scientists
2
• Specialized Library for Architecture, Chemistry, Computer Science,
Mathematics, Physics, Engineering Technology
• Financed by Federal Government and all Federal States
• Member of the Leibniz Association
• Global supplier for scientific and technical
information
German National Library of Science and Technology (TIB)
3
Global Network
TechLib
4
Customers
[Pie chart: customers by region (Germany, Europe, USA, World); segments of 71%, 14%, 10% and 5%]
5
Main Services
• Provision of scientific content
• full texts, document delivery, interlibrary loan
• Scientific retrieval
• portal GetInfo
• Long-term preservation
• DOI-Service for research data
• Research and development
6
Jim Gray, eScience Group, Microsoft Research
Changes in the scientific process
7
A gap
• A widening gap in the scientific record between published
research in a text document and the data that underlies it
• As a result, datasets are
• difficult to discover
• difficult to access
• Scientific information gets lost
8
Requirements - Politics
Knowledge is power.
Europe must manage the digital assets its researchers generate.
9
Final report of the High Level Expert Group on Scientific Data.
“Riding the wave” – How Europe can gain
from the rising tide of scientific data
10
Strategy – Move beyond text
Simulation
Scientific Films
3D Objects
Text
Research Data
Software
11
Move beyond text – Consequences for TIB
• Research communities produce many types of scientific and technical
information
• Each has its own unique characteristics and life cycle
• TIB must become capable of accepting and managing new media formats
12
Competence Center for Non-textual Materials I
• Develop a clear strategy for the use and integration of non-textual
materials at the TIB
• Systematically collect non-textual materials from research and teaching
• Define, integrate and establish technical infrastructure
• Define and establish workflows for indexing, cataloguing, digital
preservation, DOI names, licensing
13
Competence Center for Non-textual Materials II
• Develop innovative media-specific portals enabled by, e.g., automated
video analysis with scene, speech, text and image recognition
• Link non-textual materials to other research information such as full
texts and research data via the specialist portal GetInfo
• Engage in communities, provide support and advice to media providers
TIB will establish its own research capacity
14
• Infrastructure for research data
• Visual search tools for AV-media
• 3D Objects
• chemOCR
How have we been preparing?
15
• In 2005, the TIB became a non-commercial DOI registration agency
for research data
• In 2010, the TIB became a co-founder of the international DataCite
consortium, which aims to make scientific research data on the
Internet easier to access
Mission
• Citability of research data
• High visibility of the data
• Easy re-use and verification of the data sets
• Increasing quality of published papers
Collaboration – Research Data
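Citability means a data set can be referenced like an article. A minimal sketch in Python, assuming the citation pattern recommended by DataCite (Creator (PublicationYear): Title. Publisher. Identifier). All record values, including the DOI `10.1234/example-dataset`, are hypothetical placeholders and not real registrations:

```python
from dataclasses import dataclass

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


@dataclass
class DatasetRecord:
    """Minimal metadata needed for a data citation (illustrative field names)."""
    creator: str
    year: int
    title: str
    publisher: str
    doi: str


def doi_url(doi: str) -> str:
    """Turn a bare DOI name into a resolvable URL."""
    return DOI_RESOLVER + doi.strip()


def cite(record: DatasetRecord) -> str:
    """Format: Creator (PublicationYear): Title. Publisher. Identifier"""
    return (f"{record.creator} ({record.year}): {record.title}. "
            f"{record.publisher}. {doi_url(record.doi)}")


# Hypothetical record, for illustration only.
example = DatasetRecord(
    creator="Doe, J.",
    year=2013,
    title="Example temperature series",
    publisher="Example Data Centre",
    doi="10.1234/example-dataset",
)
print(cite(example))
```

The resolvable DOI in the citation is what links a published paper back to the underlying data set.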
16
DataCite Members
17
Example: EHEC bacterium
18
Example: EHEC bacterium
19
DOI Services
• Contracts with 60 data centres
• Research Institutes
• Universities
• Libraries
• Publishers
• 776,454 DOI registrations
• 22,533 up to September 2013
20
Research data – Further developments
• KomFor
• Centre of Expertise for Research Data from the “Earth and
Environment” project
• RADAR
• Research Data Repository
• Visual Analysis
• VisInfo Methods
21
Time [h]  T [°C]
1         12
2         13
3         12
4         12
5         13
6         35
7         17
8         11
9         10
10        12
11        13
12        13
13        12
14        12
15        12
16        11
17        11
18        10
19        10
20        11
21        11
22        10
23        12
24        12
Numerical data
22
Visual access to research data
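A spike that is easy to miss in a column of numbers stands out at once in a picture. A minimal sketch, using the 24 hourly values from the "Numerical data" slide; the text bar chart and the outlier threshold of 10 °C are assumptions chosen for illustration, not the VisInfo methods:

```python
from statistics import median

# Hourly temperatures (°C) for hours 1..24, copied from the "Numerical data" slide.
temps = [12, 13, 12, 12, 13, 35, 17, 11, 10, 12, 13, 13,
         12, 12, 12, 11, 11, 10, 10, 11, 11, 10, 12, 12]


def bar_chart(values):
    """Render one text bar per hour; bar length equals the temperature."""
    return "\n".join(f"{hour:2d} h | {'#' * v} {v}"
                     for hour, v in enumerate(values, start=1))


def outliers(values, threshold=10):
    """Return (hour, value) pairs deviating from the median by more than
    `threshold` degrees (the threshold is an assumed value)."""
    mid = median(values)
    return [(hour, v) for hour, v in enumerate(values, start=1)
            if abs(v - mid) > threshold]


print(bar_chart(temps))
print(outliers(temps))
```

In the chart, the bar for hour 6 is roughly three times as long as its neighbours, which is the kind of pattern a visual search interface can expose.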
23
• Infrastructure for research data
• Visual search tools for AV-media
• 3D Objects
• chemOCR
How have we been preparing?
24
TIB’s portal for audiovisual media
Project: Development of a portal for audiovisual media
Aim: Improve access to AV media
Time: July 2011 – December 2013
Partner: Hasso Plattner Institute for Software Systems Engineering GmbH
25
How do I find what I’m looking for in videos?
Today: Manual annotation of the whole video
TIB’s portal for audiovisual media
Metadata
• Title
• Author
• Description
• Publisher
• Publication year
• Rights holder
• …
26
source: Scorupka, Sascha, Experiment der Woche, 2011
Future: Manual Annotation plus content-based information
1. Speech
2. Visual features
e.g. Indoor, Experiment, Technology
3. Textual information
e.g. Leibniz University Hannover
4. Structural information
Scenes, Shots, Segments
TIB’s portal for audiovisual media
27
TIB’s portal for audiovisual media
Media analysis process
Upload
28
TIB’s portal for audiovisual media
Scene recognition
Hard cut
Kopf, S. Computergestützte Inhaltsanalyse von digitalen Videoarchiven, Mannheim. 2006
Automatic cut detection
→ luminance / contrast
→ colour distribution / colour histogram
→ edges
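A hard cut shows up as an abrupt change in the value distribution between two consecutive frames. A minimal sketch of histogram-based cut detection, assuming frames are flat lists of grey values (0 to 255); the bin count, the threshold of 0.5 and the synthetic frames are assumptions, and the luminance/contrast and edge cues from the slide are not covered:

```python
def histogram(frame, bins=8):
    """Normalised grey-value histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(h1, h2):
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_hard_cuts(frames, threshold=0.5):
    """Return indices i where a cut lies between frame i-1 and frame i."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


# Synthetic example: two dark frames followed by two bright frames.
dark = [20, 25, 30, 22] * 4
dark_shifted = [22, 27, 28, 24] * 4
bright = [220, 230, 240, 225] * 4
frames = [dark, dark_shifted, bright, bright]

print(detect_hard_cuts(frames))  # cut expected at index 2
```

Small changes within a shot (dark to dark_shifted) stay below the threshold, while the jump from dark to bright exceeds it.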
29
TIB’s portal for audiovisual media
Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering
this work is copy right ed nine teen thirty six
Automatic speech recognition
The quality of the results depends on
• clarity of the speaker
• dialects
• background noises
• voice overlaps
30
TIB’s portal for audiovisual media
Intelligent Character Recognition (ICR)
• Character/Logo Detection
• Character Filtering
• Character Recognition
31
Method of analysis
Image recognition
Interview, experiment,
animation, lecture
Extracted data is
converted into text
TIB’s portal for audiovisual media
Automated analysis: Image recognition
32
Visual Concepts
Graphical: Animation
Graphical: Drawing
Graphical: Diagram
Real: Outdoor
Real: Indoor
Real: Lecture / Conference
Real: Interview
Real: Buildings ...
TIB’s portal for audiovisual media
Machine learning
using visual features Keyframes Annotation
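The idea of learning visual concepts from annotated keyframes can be sketched as follows. The slide does not say which learning method is used, so a nearest-centroid classifier is swapped in here purely as the simplest illustration; the two features (brightness, colour saturation) and all training values are hypothetical:

```python
from math import dist


def centroids(training):
    """Average the feature vectors of each annotated concept.

    training: dict mapping concept label -> list of feature tuples.
    """
    result = {}
    for label, vectors in training.items():
        n = len(vectors)
        result[label] = tuple(sum(column) / n for column in zip(*vectors))
    return result


def classify(features, cents):
    """Assign the concept whose centroid is closest (Euclidean distance)."""
    return min(cents, key=lambda label: dist(features, cents[label]))


# Hypothetical annotated keyframes: (brightness, colour saturation) in [0, 1].
training = {
    "Graphical: Diagram": [(0.9, 0.1), (0.8, 0.2)],
    "Real: Outdoor": [(0.6, 0.7), (0.5, 0.8)],
    "Real: Indoor": [(0.3, 0.4), (0.2, 0.5)],
}
cents = centroids(training)

print(classify((0.82, 0.12), cents))
print(classify((0.28, 0.50), cents))
```

The predicted label becomes a searchable annotation for the keyframe, alongside the manually entered metadata.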
33
TIB’s portal for audiovisual media
34
• Infrastructure for research data
• Visual search tools for AV-media
• 3D Objects
• chemOCR
How have we been preparing?
35
3D Objects – an excursion into architecture
36
content-based indexing
visual search
Visual search tools
37
• segmentation with form primitives
• extraction of room connectivity graphs
Content-based indexing
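A room connectivity graph has one node per room and one edge per door or passage. A minimal sketch, assuming the segmentation step has already produced a list of rooms and the openings between them; the room names, types and the input format are hypothetical:

```python
def room_connectivity_graph(rooms, connections):
    """Build an undirected adjacency structure.

    rooms: dict mapping room name -> room type (the node attribute).
    connections: iterable of (room_a, room_b) pairs, one per door or passage.
    """
    graph = {room: set() for room in rooms}
    for a, b in connections:
        graph[a].add(b)
        graph[b].add(a)
    return graph


# Hypothetical segmentation result for a small flat.
rooms = {"hall": "corridor", "kitchen": "room", "living": "room", "bath": "room"}
connections = [("hall", "kitchen"), ("hall", "living"),
               ("hall", "bath"), ("kitchen", "living")]

graph = room_connectivity_graph(rooms, connections)
for room, neighbours in graph.items():
    print(room, "->", sorted(neighbours))
```

Indexing the graph instead of the raw geometry makes two buildings comparable even when their dimensions differ.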
38
3D sketch → attributed graph → result visualization
Visual search
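Searching with a sketch then amounts to finding the sketched graph inside an indexed building graph. A minimal sketch using brute-force attributed subgraph matching, which is only practical for very small graphs; this is an illustration, not necessarily the matching method behind the portal, and all names and data are hypothetical:

```python
from itertools import permutations


def edge_set(pairs):
    """Undirected edges as a set of frozensets."""
    return {frozenset(pair) for pair in pairs}


def find_match(query_nodes, query_edges, model_nodes, model_edges):
    """Return a mapping query node -> model node, or None if no match exists.

    A match must preserve node attributes (room types) and map every
    query edge onto an existing model edge.
    """
    names = list(query_nodes)
    for candidate in permutations(model_nodes, len(names)):
        mapping = dict(zip(names, candidate))
        if any(query_nodes[q] != model_nodes[m] for q, m in mapping.items()):
            continue
        if all(frozenset((mapping[a], mapping[b])) in model_edges
               for a, b in query_edges):
            return mapping
    return None


# Query sketch: a corridor with two rooms that are also connected to each other.
query_nodes = {"c": "corridor", "r1": "room", "r2": "room"}
query_edges = [("c", "r1"), ("c", "r2"), ("r1", "r2")]

# Two hypothetical indexed buildings.
flat_a_nodes = {"hall": "corridor", "kitchen": "room", "living": "room", "bath": "room"}
flat_a_edges = edge_set([("hall", "kitchen"), ("hall", "living"),
                         ("hall", "bath"), ("kitchen", "living")])
flat_b_nodes = {"hall": "corridor", "a": "room", "b": "room"}
flat_b_edges = edge_set([("hall", "a"), ("hall", "b")])

print(find_match(query_nodes, query_edges, flat_a_nodes, flat_a_edges))
print(find_match(query_nodes, query_edges, flat_b_nodes, flat_b_edges))
```

Only the first building contains the sketched arrangement, so only it would appear in the result visualization.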
39
Further developments
40
• Infrastructure for research data
• Visual search tools for AV-media
• 3D Objects
• chemOCR
How have we been preparing?
41
Search for chemical structures – how?
Chemists are used to drawing
Information retrieval in Chemistry
42
Table with reaction scheme
2a-i: Derivatives from the reaction
Chemical structure
Reaction scheme
Chemical Names
Linked entities from the table
Textual and non-textual chemical information
43
image data → chemical structure data
Tools: CLiDE, chemOCR
Non-textual data processing – chemOCR
44
Information retrieval in chemistry: text AND formulas
45
Further subjects
• Open Science Lab
• Ontology
46
Dissemination of scientific and technical information has been a
foundational mission.
The methods have completely changed, but the mission
remains the same.
Conclusion
47
Ultimate Goal:
Interlinking and Search Across All
Types of Digital Assets.
Conclusion
48
GetInfo – Portal for Science and Technology
• 58 million metadata records in internal index
• 390 million metadata records in external sources
• 900,000 PDF full texts
• Data, AV-Media, 3D Objects
49
Development of media-specific portals
Provision
Probado 3D
Portal for audiovisual media
50
Questions?