June 2008. Approved for Public Release, Distribution Unlimited
Digital Object Storage and Retrieval (DOSR) Vision
Josh Alspector
Disclaimer
This presentation discusses areas of technology investigation and interest. It does not relate to any existing DARPA program, nor should it be inferred to anticipate a future DARPA program.
The Mundaneum
In 1910 Belgians Paul Otlet and future Nobel Peace Prize laureate Henri La Fontaine opened the Palais Mondial, later renamed the Mundaneum.
The Mundaneum’s mission was to collect metadata on every book, journal, and periodical ever published and record it in a card file system that embodied what we would call a faceted classification scheme. By 1934 it contained over 15 million entries.
Unique identifiers included embedded links to related documents.
Staff responded to search requests received by post and telegraph and returned hand-copied cards by post.
In 1934 Otlet conceived a global network of “electric telescopes” that would allow people to search and browse through interlinked documents, images, audio and motion picture recordings. He wrote that, “from his armchair, everyone will hear, see, participate, will even be able to applaud, give ovations, sing in the chorus, add his cries of participation to those of all the others.”
Fatal Flaw: Scalability
Mundaneum Infrastructure (diagram labels): Documents, Images, Recordings; “Hyper-linked” Card Catalog; Human Search Engine; Telegraph and postal “network”; “Social Network” Feedback
DOSR Vision
Create a resilient, distributed, scalable, and secure network of information that does not require a completely trusted or stable network of processing nodes [employ network overlays and advanced cryptographic techniques]
Advance the state of the art in automated metadata generation and interoperability [apply machine learning techniques]
Automatically get information where it is needed, or may be needed, using less bandwidth and processing [integrate user models, compact information retrieval encodings, and distributed content delivery]
Reliably track where information goes, and where it came from [encapsulate provenance and audit information in network-maintained virtual objects]
Enable secure, resilient information storage, characterization, retrieval, and collaboration across barriers of time, geography, community of interest, technology, and administrative domain
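One way to picture "provenance and audit information" travelling with an object is a hash-chained event log, sketched below. This is only an illustration under assumptions, not a DOSR design: the function names `append_event` and `chain_is_intact`, the record fields, and the use of SHA-256 over JSON are all hypothetical choices made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder digest for the first entry


def _digest(body: dict) -> str:
    # Hash a canonical JSON encoding so the same body always yields the same digest.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list, actor: str, action: str) -> None:
    """Append an audit record that commits to the previous record's digest."""
    prev = log[-1]["digest"] if log else GENESIS
    body = {"actor": actor, "action": action, "prev": prev}
    log.append({**body, "digest": _digest(body)})


def chain_is_intact(log: list) -> bool:
    """Recompute every digest and check each record links to its predecessor."""
    prev = GENESIS
    for entry in log:
        body = {"actor": entry["actor"], "action": entry["action"], "prev": entry["prev"]}
        if entry["prev"] != prev or entry["digest"] != _digest(body):
            return False
        prev = entry["digest"]
    return True


log = []
append_event(log, "alice", "created object")
append_event(log, "bob", "copied to another node")
print(chain_is_intact(log))   # True
log[0]["action"] = "deleted object"
print(chain_is_intact(log))   # False: the edited record no longer matches its digest
```

A chain like this only detects edits if the most recent digest is held somewhere trusted; without signatures or an external anchor, anyone who can rewrite the log can also recompute every digest.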
Figure labels: Text files; Images; Spreadsheets; Videos; Web pages; Automated Metadata Generation; User and Data Models
What we can find defines what we can do
Photos courtesy of U.S. Army, U.S. Navy
Hard Problems
Automated metadata extraction and generation
– DoD has many stovepipe systems with limited metadata
– Automatic extraction of metadata, especially from non-textual information, is an unsolved problem requiring some form of artificial intelligence
– Email, papers, presentations, forms, and databases do not possess a community-maintained mesh of reciprocal references, so Google-like search, relevance, and ranking algorithms do not work
Scalable security for sharable objects
– Decentralized (for scalability) key distribution systems present security challenges
– Protection from known cryptographic and corruption attacks is hard; protection from unknown attacks is harder
– Usable secure sharing (as convenient as email) is needed or the system won’t be used
– Scalable, revocable group access to synchronized, encrypted, versioned documents is essential
Scalable replicated storage and parallel data distribution
– Globally unique identifiers (GUIDs) for retrieval and update are essential, and must be unbreakable, verifiable, and afford scalable resolution of a retrievable, trackable object
– How to track fragmented and replicated objects for persistence and provenance
– Object replication for secure, scalable, high-bandwidth distribution (secure BitTorrent-style)
– Enhance resiliency and service in network-poor areas
– Respond adaptively to service degradation for high-demand data and large-scale disruptions
Personalization, intelligent agents and user models
– Intelligent agents needed to locate content near likely users, based on user models
– User models based on authorization, active input and passive tracking
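The "verifiable" property asked of GUIDs above can be illustrated with a content-derived identifier, where anyone holding the bytes can recheck the name. The sketch below is an assumption for illustration only: the slides do not say how GUIDs would be constructed, and `make_guid`, `verify`, and the `sha256:` prefix are hypothetical.

```python
import hashlib


def make_guid(content: bytes) -> str:
    """Derive an identifier from the object's bytes (hypothetical scheme)."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verify(guid: str, content: bytes) -> bool:
    """Check that retrieved bytes match the identifier they were requested under."""
    return make_guid(content) == guid


original = b"After-action report, version 1"
guid = make_guid(original)
print(verify(guid, original))                    # True: intact copy
print(verify(guid, original + b" (tampered)"))   # False: altered copy
```

A content hash names exactly one immutable version, so a scheme like this would still need a separate, updatable name (or version chain) to support the "retrieval and update" requirement.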
Key Capabilities
Architecture and protocols
– Protocols for exchanging objects, metadata, and security controls
– Mobile agents and federated requests for information
Persistence of digital objects
– Distribute replicas and coded fragments
– Global, persistent, verifiable, unique identifiers (GUIDs)
– Version-controlled, collaborative updates
Trust, security and provenance
– Authorized, authenticated access
– Decentralized encryption for scalability
– Verifiable provenance and tracking of all objects
– Resilience to attacks
Scalability
– “Scale-free” architecture
– Decentralized, peer-to-peer techniques
– Manage latency, consistency and security as scale grows
Metadata and search
– Extract metadata from video, maps, images
– Relevance feedback
– Efficient federated search
Accessibility and User Models
– User models include authorization, preferences, location, need-to-know
– Content finds you without search
– Information locally available is personally relevant
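"Coded fragments" in the persistence item can be pictured with the simplest possible erasure code: split an object into k fragments and add one XOR parity fragment, so any single lost fragment can be rebuilt. This is a toy sketch under that assumption; the slides do not name a coding scheme, and a real system would more likely use a code that tolerates multiple losses. The names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments (zero-padded) plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original bytes when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return b"".join(repaired[:-1])[:length]


message = b"Replicate me across nodes"
stored = encode(message, 4)      # 4 data fragments + 1 parity fragment
stored[2] = None                 # simulate losing one storage node
print(decode(stored, len(message)) == message)   # True
```

The original length is passed to `decode` so the zero padding added during encoding can be stripped.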
Figure labels: Object 1, Version 1; Replicas and fragments; Retrieve latest version from closest fragments or replica; Object 1, Version 2 update; Decentralized, scalable key distribution; Scalable resources, storage and participant networks; Needed objects migrate to local server for user
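The retrieval rule in the diagram, "latest version from closest fragments or replica", can be read as a two-step selection: keep only the copies at the newest version, then take the nearest of those. The sketch below assumes that reading; the `Replica` record, its fields, and the use of measured latency as the notion of "closest" are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    node: str          # hypothetical node name
    version: int       # object version held by this node
    latency_ms: float  # assumed measure of "closeness"


def pick_replica(replicas: Iterable[Replica]) -> Replica:
    """Return the lowest-latency replica among those holding the newest version."""
    candidates = list(replicas)
    latest = max(r.version for r in candidates)
    return min((r for r in candidates if r.version == latest),
               key=lambda r: r.latency_ms)


replicas = [
    Replica("local-cache", version=1, latency_ms=2.0),
    Replica("regional", version=2, latency_ms=40.0),
    Replica("remote", version=2, latency_ms=180.0),
]
print(pick_replica(replicas).node)   # regional
```

Here the stale local copy is skipped even though it is closest, which is the trade-off the diagram implies: freshness first, then proximity.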
Interesting Research Ongoing in…
Automated metadata extraction
Decentralized, self-configuring, location and routing
Federated search
Information retrieval
Personalization and user models
Proxy re-encryption
Scalable security and PKI
Search over encrypted indexes
Securing resilient peer-to-peer networks
DOSR Workshop will address these areas
Preliminary Schedule
July 15 Talks
8:30 am Opening remarks - DARPA
Architecture
8:45 am Dr. Robert Kahn - keynote address
9:15 am Dr. Peter Lucas - MAYA
9:35 am Dr. Daniel Crichton - NASA
9:55 am Break
Metadata
10:15 am Dr. Ajay Divakaran - Sarnoff Corp.
10:35 am Dr. Randal Burns - JHU
10:55 am Dr. Shmuel Peleg - HU-J
11:15 am Mr. Jason Byassee - Northrop Grumman
Security
11:35 am Dr. James Allan - U. Mass-Amherst
11:55 am Dr. Rafail Ostrovsky - UCLA
12:15 pm Lunch
1:40 pm Dr. Urs Muller - Net-Scale Tech.
2:00 pm Dr. Matt Staker - IBM Research
2:20 pm Dr. Angelos Stavrou - Global InfoTek Inc.
2:40 pm Break
User Models
3:00 pm Dr. Peter Brusilovsky - U. Pittsburgh
3:20 pm Dr. Michael Walfish - UT-Austin
3:40 pm Dr. Rafael Alonso - SET Corp.
4:00 pm Mr. Peter Haglich - Lockheed Martin
July 15 Posters
4:20 pm Break
4:40 pm Poster Session 1
5:20 pm Poster Session 2
6:00 pm Adjourn
July 16 Breakouts
9:00 am Dr. Josh Alspector - DOSR vision and breakout group instructions
9:30 am Breakout group discussions
Noon Lunch
1:30 pm Brief out Group 1
2:00 pm Brief out Group 2
2:30 pm Break
2:50 pm Brief out Group 3
3:20 pm Brief out Group 4
3:45 pm Plenary Session
4:15 pm Adjourn
Levels of Success
DoD adopts system internally
Portions of system are made available for open-source uses by Apache
Legal, medical, and financial records management firms adopt GUIDs, protocols, and system components
ISPs and media companies adopt GUIDs, protocols, and system components for subscription services
Amazon, Google and iTunes use GUIDs and protocols
Prior Art
Coda (CMU)
Cooperative File System (MIT)
FARSITE (Microsoft)
Grid (Argonne National Laboratory)
Lustre (now owned by Sun Microsystems)
OceanStore (UC Berkeley)
PASIS (CMU)
Universal Database (Maya Design)