A Personal Database for Everything Inspired by Memex
www.MyLifeBits.com
Gordon Bell, Jim Gemmell, Roger Lueder
Original slides: http://research.microsoft.com/en-us/um/people/gbell/Bell_MyLifeBits_Talk_SIGMOD_050614_web.ppt
Slide 3
Outline How has the project evolved? How do we use MyLifeBits?
How is it built? How large is the database? What is the vision?
What is left and how can you help?
Slide 4
I am data
Slide 5
Ambience and Presence: Being there while being here Dining at
home on the Orient Express
Slide 6
History: The remote worker re-discovers the PERSONAL
computer
Slide 7
Oct 1998
"Can we scan your books and put them online?" (Raj Reddy)
"Sure! Don't worry about copyright stuff. Microsoft has lots of lawyers."
Slide 8
1999: Scanning starts in earnest; we start to scan
Slide 9
My docs and archive
(folder hierarchy diagram; labels: Self, Biographical, X-Employer, Employer, Project, Library/file cab, Active)
Slide 10
"Jim, I don't need no stinkin' database!"
"Gordon, you should be using a database."
Slide 11
Now that it's in Cyberspace
How do you remember the 20,000+ file names?
Or in which of 1500 folders they live?
What about a tool for finding stuff?
Slide 12
Jan 2001
CACM: "A Personal Digital Store"
16 GB; +2/yr
A good place to stop
Began search for search engines, especially for email.
Jim suggests that we build a system that would be easier to use and have many more capabilities.
Slide 13
Re-discovery of Memex
"As We May Think", Vannevar Bush, 1945
"A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility."
Full-text search, text & audio annotations, and hyperlinks
Slide 14
2001 Capture goes beyond paper
Slide 15
Even more capture Telephone calls, more video, all web pages
visited, usage logging, radio, TV
Slide 16
2003 - SenseCam
Slide 17
Steve Mann timeline
Slide 18
I Sensed: Clarkson and Pentland, MIT, 2001
Visually impaired: UW, 2004
Slide 19
MyLifeBits Software
Slide 20
Everything goes in a database
MyLifeBits needs all the features of a database (consistency, indexing, pivoting, queries, speed/scalability, backup, replication)
If we didn't use one, we'll eventually create one!
Files as blobs; sync with file system for legacy apps
We are part of Jim Gray's Bay Area Research Lab
SQL
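The "files as blobs; sync with file system" idea can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite stands in for whatever SQL engine the project used, and the `items` table, its columns, and the function names are hypothetical, not the MyLifeBits schema.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one hypothetical `items` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " created TEXT NOT NULL,"  # ISO 8601 timestamp
        " content BLOB NOT NULL)"  # the file itself, stored as a blob
    )
    # Indexing is one of the database features the slide asks for.
    conn.execute("CREATE INDEX items_by_time ON items(created)")
    return conn


def import_file(conn: sqlite3.Connection, path: Path, created: str) -> int:
    """Copy a file's bytes into the store and return the new item id."""
    cur = conn.execute(
        "INSERT INTO items (name, created, content) VALUES (?, ?, ?)",
        (path.name, created, path.read_bytes()),
    )
    conn.commit()
    return cur.lastrowid


def export_for_legacy_app(conn: sqlite3.Connection, item_id: int, folder: Path) -> Path:
    """Write a stored blob back to the file system so a legacy app can open it."""
    name, content = conn.execute(
        "SELECT name, content FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    out = folder / name
    out.write_bytes(content)
    return out


def demo_roundtrip() -> bytes:
    """Import a file, export it again, and return the exported bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "memo.txt"
        src.write_bytes(b"scanned memo")
        conn = open_store()
        item_id = import_file(conn, src, "2001-01-15T09:00:00")
        export_dir = Path(tmp) / "export"
        export_dir.mkdir()
        data = export_for_legacy_app(conn, item_id, export_dir).read_bytes()
        conn.close()
        return data
```

A real sync would also have to notice edits made by the legacy app and write them back; that step is left out of this sketch.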
Slide 21
MyLifeBits Software
(component diagram; labels:) MyLifeBits store; database; Voice annotation tool; Telephone capture tool; TV capture tool; TV EPG download tool; Radio capture & EPG; PocketPC transfer tool; PocketRadio player; Import files; MyLifeBits Shell; Browser tool; Internet IM capture; GPS import & Map display; SenseCam; Screen saver; Text annotation tool; MAPI interface; Legacy email client; Outlook interface; files; Legacy applications; VIBE logging; RoomCapture
Slide 22
MyLifeBits Schema (simplified)
(schema diagram; labels:) Images; Music; Phone calls; Items; Links; Link types; Entity types; Resource entities; Event types; Event log; Events; Tasks; People; Notes; Email; Saved searches; SenseCam data; GPS data; Window, key, mouse log; Web pages
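The simplified schema centers on items, links, and link types. The following is a minimal sketch of how typed links between items make pivoting possible, for example from a contact to a call to a web page. The table layout, column names, link-type names, and sample rows are all assumptions made for illustration, with SQLite as a stand-in engine; they are not the actual MyLifeBits schema.

```python
import sqlite3

# Hypothetical three-table core: every object is an item, and
# relationships are rows in `links` tagged with a link type.
SCHEMA = """
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    title TEXT NOT NULL,
    created TEXT NOT NULL
);
CREATE TABLE link_types (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE links (
    src INTEGER NOT NULL REFERENCES items(id),
    dst INTEGER NOT NULL REFERENCES items(id),
    type_id INTEGER NOT NULL REFERENCES link_types(id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with a contact, a call, and a web page."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [
            (1, "person", "Contact A", "2005-01-01T00:00:00"),
            (2, "phone_call", "Call with Contact A", "2005-03-01T10:00:00"),
            (3, "web_page", "Page viewed during call", "2005-03-01T10:05:00"),
        ],
    )
    conn.executemany(
        "INSERT INTO link_types VALUES (?, ?)",
        [(1, "caller"), (2, "viewed_during")],
    )
    conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [(1, 2, 1), (2, 3, 2)])
    conn.commit()
    return conn


def pivot(conn: sqlite3.Connection, item_id: int) -> list:
    """Return (link type, kind, title) for every item linked from item_id."""
    return conn.execute(
        "SELECT lt.name, i.kind, i.title FROM links l"
        " JOIN link_types lt ON lt.id = l.type_id"
        " JOIN items i ON i.id = l.dst"
        " WHERE l.src = ? ORDER BY i.created",
        (item_id,),
    ).fetchall()
```

Calling `pivot(conn, 1)` walks from the contact to the call, and `pivot(conn, 2)` walks from the call to the web page. New relationship kinds only need a new row in `link_types`, not a new table.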
Slide 23
Demo Clips & Screens
Slide 24
747 Screen
Slide 25
Vue de jour
Slide 26
Reports
Slide 27
Add item to collection(s)
Slide 28
Refine email shell
Slide 29
Refine email shell2
Slide 30
Pivoting: contact > call > t > web page
Slide 31
Refine by classification--dentist
Slide 32
GPS Photo location
Slide 33
SenseCam
Slide 34
Timeline
Slide 35
Google??
Slide 36
The Shape & Size of Gordon's LifeBits
Slide 37
MyLifeBits, 3/26/2005: 206K items, 101 GB (chart: by number of items)
Slide 38
MyLifeBits, 3/26/05: 101 GB, 206K items (chart: by size in GB)
Bell growth: 1 GB/month = 1 TB/lifetime
(chart: size in MB by type)
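The growth rule of thumb and the archive totals can be checked with a little arithmetic. This sketch assumes decimal units (1 TB = 1000 GB, 1 GB = 1000 MB), which the slides do not specify; the function names are illustrative.

```python
GB_PER_TB = 1000  # assumption: decimal units
MB_PER_GB = 1000  # assumption: decimal units


def years_to_reach(total_gb: float, gb_per_month: float = 1.0) -> float:
    """Years needed to accumulate total_gb at a constant monthly rate."""
    return total_gb / (gb_per_month * 12)


def average_item_mb(total_gb: float, item_count: int) -> float:
    """Mean item size in MB for an archive of total_gb spread over item_count items."""
    return total_gb * MB_PER_GB / item_count


lifetime_years = years_to_reach(GB_PER_TB)      # 1 GB/month up to 1 TB
mean_item_mb = average_item_mb(101, 206_000)    # totals from the 3/26/2005 snapshot
```

At 1 GB/month, a terabyte takes a little over 83 years, which is consistent with "1 TB/lifetime". The 101 GB spread over 206K items works out to roughly half a megabyte per item.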
Observations about use(rs)
1. On apps: Search is the killer app, pretty much as Bush described. Screen savers (memory refreshers) also provide ambience. Where did my day go?
2. Users are unwilling to spend time managing their computers or data. User-input meta-data, e.g. Dublin Core: a naive librarian's dream. Meta-data, classification, etc. must be automatic. Classification using facets is a great scheme, but it requires work.
3. Time is the most important meta-data. Photos: place (GPS), subject.
4. Folders are a good and bad idea. Most users don't know what they are or how they work. If used, over time they become useless: too many, misfiled, etc.
5. Users should put every information fragment into the system, e.g. to-dos, callbacks, business cards, numbers, attention events. It pays.
6. The same information in multiple places always becomes obsolete.
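Observations 2 and 3 suggest harvesting meta-data, above all time, without asking the user for anything. A minimal sketch, assuming the file system's modification time is an acceptable stand-in for "when"; the `auto_metadata` and `timeline` names and the dictionary keys are hypothetical.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def auto_metadata(path: Path) -> dict:
    """Collect meta-data the system can derive without any user input."""
    st = path.stat()
    return {
        "name": path.name,
        # File type guessed from the extension; "unknown" if there is none.
        "type": path.suffix.lower().lstrip(".") or "unknown",
        "size_bytes": st.st_size,
        # Modification time as a timezone-aware UTC datetime.
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
    }


def timeline(paths) -> list:
    """Order items by time, the primary pivot."""
    return sorted((auto_metadata(p) for p in paths), key=lambda m: m["modified"])


def demo() -> list:
    """Build two files with known timestamps and return them in time order."""
    with tempfile.TemporaryDirectory() as tmp:
        newer = Path(tmp) / "trip.JPG"
        older = Path(tmp) / "memo.txt"
        newer.write_bytes(b"jpeg bytes")
        older.write_bytes(b"memo")
        # Set access and modification times explicitly (seconds since epoch).
        os.utime(newer, (1_111_800_000, 1_111_800_000))
        os.utime(older, (1_000_000_000, 1_000_000_000))
        return [(m["name"], m["type"]) for m in timeline([newer, older])]
```

Modification time is fragile (copying or restoring files can reset it), so a real system would prefer embedded timestamps such as those recorded by a camera when they exist.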
Slide 43
Evolution: Silo apps on isolated DB islands vs. cut & paste across apps
Contacts: email, instant messages, phone, correspondence
Family and organizational relationships
Location of people, organizations, etc.
Photo database: who, where, when, what
Money: payees, phone, etc.
Health: providers and caregivers
User-written apps in Excel or Access
Slide 44
Common ground with WinFS: Items, Links & Meta-data
(diagram labels: Annotates; Caller in Phone Call; Photo of Event)
Slide 45
PhotoFinder - Shneiderman and Kang
Slide 46
Slide 47
Challenges
Slide 48
The "Dear Appy" problem
Dear Appy, How committed are you? Please come back to me. Forever yours truly, Lost and forgotten data
Who's responsible?
Media: 8-track cassette, 8-inch floppy
Evolving platform, file, and database
Evolving, incompatible standards & formats for legacy data that disregard ancestors
Evolving and/or disappearing apps
Slide 49
Automatic classification problem
XML on bills and imported content transactions
We need to download classifications rather than build them
Definitions & synonyms should help find what I want
Today it is too expensive to manually classify scanned paper. E.g., the right time meta-data is critical!
We hope the system can classify papers and other documents, e.g. bills. Ideally, build Dublin Core
In 10 years we need all documents to appear electronically & classified with a little help from me
Slide 50
More challenges
Dear Appy: Monitoring and automatic migration of files that are unlikely to be understood on future platforms, as well as platform migration.
Get What I Need: Endless but evolutionary improvements in search: misspellings, stemming, synonyms
Endless frontier of schema and extensions to them for new applications, e.g. making org charts, family relationships.
Capture, Archival and Retrieval of Personal Experiences (CARPE): a whole new game!
Versioning is essential
Scaling: We don't know what happens at a Terabyte
What can, should be, or will be in the cloud? Books, videos
Will we be allowed to use such systems? Copyright laws vary, e.g. ripping CDs, copy of anything, photos, conversations
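The three search improvements named under "Get What I Need" (misspellings, stemming, synonyms) can be sketched as a query-expansion step. This is a toy illustration, not the MyLifeBits search: the suffix-stripping stemmer is deliberately naive, the `SYNONYMS` table is hypothetical, and the 0.75 similarity cutoff is an arbitrary assumption.

```python
import difflib

# Hypothetical synonym table, keyed by stemmed term.
SYNONYMS = {"photo": {"picture", "image"}}


def stem(word: str) -> str:
    """Very naive stemmer: strip one common suffix, keeping at least 3 letters."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(term: str, vocabulary: set) -> set:
    """Return the set of terms to search for, given one user-typed term."""
    term = term.lower()
    # Misspellings: snap an unknown term to the closest known word, if any.
    if term not in vocabulary:
        close = difflib.get_close_matches(term, sorted(vocabulary), n=1, cutoff=0.75)
        if close:
            term = close[0]
    # Stemming: also search for the root form.
    root = stem(term)
    terms = {term, root}
    # Synonyms: add known alternatives for the root.
    terms |= SYNONYMS.get(root, set())
    return terms
```

With a vocabulary of `{"photos", "dentist", "bill"}`, the misspelled query `"dentsit"` expands to `{"dentist"}` and `"photos"` expands to `{"photos", "photo", "picture", "image"}`. A production system would use a real stemmer and a downloaded thesaurus, in the spirit of downloading classifications rather than building them.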
Slide 51
Challenges
Data-types
Quantity expanding, i.e. info explosion
New capabilities, e.g. real time, create new data-types
Meta-data to increase value & provide pivots
Going beyond a PC to a distributed environment
Network environment, including media center
Into the cloud
Periphery: smart buildings, objects
Backup, migration, and caching for beyond a Terabyte
Expanding network: PC > LANs > web > P2P
Schema sharing among disparate systems
CARPE (real time data capture)
Rooms, phone calls, SenseCam, health transducers, etc.
Security, privacy, forgetfulness, deniability, etc.