Integrating Disparate Data
May 27, 2010
Steve Newman – CTO/Gist.com
The Why: What We Believe In
• All the people important to you already reside in your email, calendar, contact lists, and social sites
• The web is a rich source of information about the people you care about
• One tool should exist that can pull all this together in a single, rich, integrated experience
Pain Points (External)
• Disparate data/API sources and protocols
  – e.g. GNIP
• Change notification (when/what)
  – e.g. Linked Open Data Dataset Dynamics, PubSubHubbub
• Standard entity data structures
  – e.g. Portable Contacts, vCard, hCard
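Standard entity formats help because each source can be mapped once into a single internal shape. Below is a minimal sketch of that mapping, assuming a Portable Contacts-style JSON entry (`displayName`, `name.givenName`/`name.familyName`, and an `emails` list whose items carry `value` and an optional boolean `primary`). The `ContactRecord` type and `from_portable_contacts` function are hypothetical names for illustration, not part of any published Gist interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContactRecord:
    """Hypothetical internal representation of one source's view of a person."""
    source: str
    display_name: Optional[str]
    emails: List[str] = field(default_factory=list)


def from_portable_contacts(entry: dict, source: str) -> ContactRecord:
    """Map a Portable Contacts-style entry (assumed field names) to a ContactRecord."""
    name = entry.get("displayName")
    if name is None:
        # Fall back to the structured name if no display name was supplied.
        parts = entry.get("name", {})
        given = parts.get("givenName", "")
        family = parts.get("familyName", "")
        name = f"{given} {family}".strip() or None

    # Put the primary address first (sorted() is stable), lowercase everything,
    # and drop duplicates while preserving order.
    ordered = sorted(entry.get("emails", []), key=lambda e: not e.get("primary", False))
    emails: List[str] = []
    for e in ordered:
        addr = e.get("value", "").strip().lower()
        if addr and addr not in emails:
            emails.append(addr)
    return ContactRecord(source=source, display_name=name, emails=emails)
```

Normalizing addresses at the edge like this is what later makes it cheap to notice that two sources describe the same person.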
The Problem (Internal)
• Need a single, disambiguated set of entities, where each entity carries accurate, disambiguated attributes
• Entity attributes can be sourced from one or more endpoints
  – Email
  – Twitter/Facebook
  – Calendar
  – Google Contacts, Outlook Contacts, Plaxo
  – Google Social Graph API
  – Rapleaf API
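One common way to get a single, disambiguated set of entities from many endpoints is to link source records that share a strong identifier (an email address, a social handle) and take connected groups as entities. The sketch below uses a union-find structure for that; it is an illustrative technique chosen here, not necessarily how Gist did it, and the record ids and identifiers in the test are made up.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_entities(records):
    """records: list of (record_id, set_of_identifiers).

    Two records end up in the same entity if they share any identifier,
    directly or through a chain of other records. Returns a list of sets of record ids.
    """
    uf = UnionFind()
    owner = {}  # identifier -> first record id that claimed it
    for rid, idents in records:
        uf.find(rid)  # register records that have no shared identifiers
        for ident in idents:
            if ident in owner:
                uf.union(owner[ident], rid)
            else:
                owner[ident] = rid
    groups = defaultdict(set)
    for rid, _ in records:
        groups[uf.find(rid)].add(rid)
    return list(groups.values())
```

Linking only on exact identifiers is deliberately conservative; fuzzier signals such as similar names would need the scoring discussed later in the deck.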
The Problem (Internal)
• Once we have this data, we need to process it and make sense of it
  – Support recurring updates
  – Support merge and unmerge
  – Recursive derivation is a huge win if done correctly
• Historical tracking is necessary both to drive operations and for debugging (and it's a cool user feature)
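Merge, unmerge, and historical tracking become much simpler if an entity never stores merged attribute values directly, but only a set of member source records plus an append-only event log; attributes are recomputed from the current members. The sketch below illustrates that design under those assumptions. The `Entity` class and its methods are hypothetical, not Gist's actual object model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entity:
    """Hypothetical entity: attributes are derived from member records, never stored directly."""
    entity_id: str
    members: set = field(default_factory=set)    # source record ids
    history: list = field(default_factory=list)  # append-only (timestamp, action, record_id)

    def _log(self, action, record_id):
        self.history.append((datetime.now(timezone.utc), action, record_id))

    def merge(self, record_id):
        if record_id not in self.members:
            self.members.add(record_id)
            self._log("merge", record_id)

    def unmerge(self, record_id):
        if record_id in self.members:
            self.members.discard(record_id)
            self._log("unmerge", record_id)

    def attributes(self, records):
        """Recompute attributes from current members, so unmerge cleanly drops a source's data.

        records maps record_id -> {attribute_name: value}. Each attribute maps to a
        list of (value, record_id) pairs, keeping provenance for later scoring.
        """
        attrs = {}
        for rid in sorted(self.members):
            for key, value in records[rid].items():
                attrs.setdefault(key, []).append((value, rid))
        return attrs
```

Because the history is never rewritten, it doubles as an audit trail for debugging and as the raw material for a user-facing timeline.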
How we did it
• Enhancers
  – Request and create attribute data
  – Can be called synchronously or asynchronously
  – Cached, logged, rate limited
• Metadata about attributes
  – Source, source type, when created, derived?, derived source, score
• Rules for ‘enhancement’
  – Rules for recursion
  – Scoring methodology (accuracy and relative prioritization)
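The enhancer properties listed above (cached, logged, rate limited, callable synchronously or asynchronously) can be combined in one small wrapper around a fetch function. The sketch below assumes a caller-supplied `fetch(key) -> dict` standing in for a real source API, and uses a simple "at most one call per interval" limit; the `Enhancer` class and its parameters are hypothetical.

```python
import asyncio
import logging
import threading
import time
from typing import Callable, Dict, Hashable

log = logging.getLogger("enhancer")


class Enhancer:
    """Hypothetical wrapper: caches, logs, and rate-limits calls to one attribute source."""

    def __init__(self, name: str, fetch: Callable[[Hashable], dict], min_interval_s: float = 1.0):
        self.name = name
        self.fetch = fetch                    # stand-in for a contacts/social API call (assumed)
        self.min_interval_s = min_interval_s  # rate limit: at most one fetch per interval
        self._cache: Dict[Hashable, dict] = {}
        self._last_call = float("-inf")
        self._lock = threading.Lock()

    def run(self, key: Hashable) -> dict:
        """Synchronous call: serve from cache, otherwise wait out the rate limit and fetch."""
        with self._lock:  # serializes fetches so the interval is respected across threads
            if key in self._cache:
                log.debug("%s cache hit for %r", self.name, key)
                return self._cache[key]
            wait = self._last_call + self.min_interval_s - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            self._last_call = time.monotonic()
            log.info("%s fetching %r", self.name, key)
            result = self.fetch(key)
            self._cache[key] = result
            return result

    async def run_async(self, key: Hashable) -> dict:
        """Asynchronous call: same logic, run in a worker thread so the event loop is not blocked."""
        return await asyncio.to_thread(self.run, key)
```

A token bucket would allow short bursts; the fixed minimum interval is used here only because it is the simplest limit to reason about.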
Example – Email Enhancer
Fields shown: Date/Time, Score, State, Value
“Brad Feld” vs “Brad”
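The “Brad Feld” vs “Brad” example is a scoring decision between two candidate values for the same attribute. The sketch below carries the metadata fields from the "How we did it" slide on each candidate and applies a toy scoring rule. The deck does not publish the real methodology, so the per-source weights, the completeness bonus, the derived-value penalty, and the recency tie-break are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Attribute:
    """One candidate value plus the metadata fields listed on the slide."""
    name: str                      # attribute name, e.g. "full_name"
    value: str
    source: str                    # e.g. "email:msg-123" (hypothetical id format)
    source_type: str               # e.g. "email", "twitter", "contacts"
    created: datetime
    derived: bool = False
    derived_source: Optional[str] = None
    score: float = 0.0


# Hypothetical per-source-type trust weights; the real weights were not published.
SOURCE_WEIGHT = {"contacts": 1.0, "twitter": 0.8, "email": 0.6}


def score_name(attr: Attribute) -> float:
    """Toy scoring rule: trusted sources and fuller names ("Brad Feld" over "Brad") win."""
    base = SOURCE_WEIGHT.get(attr.source_type, 0.5)
    completeness = min(len(attr.value.split()), 3) / 3  # 1 token -> 0.33, 2 -> 0.67
    penalty = 0.1 if attr.derived else 0.0               # prefer directly observed values
    return round(base + completeness - penalty, 3)


def best_value(candidates: List[Attribute]) -> Attribute:
    """Score every candidate, then pick the highest score, breaking ties by recency."""
    for c in candidates:
        c.score = score_name(c)
    return max(candidates, key=lambda c: (c.score, c.created))
```

Storing the score on the attribute, rather than only using it transiently, keeps the decision inspectable later alongside the other metadata.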
Key Takeaways
• Worry about integration, both external and internal to your application
• There is lots of good work on the external issues… take advantage of it!
• Create a strong object model for internal data representation (workers, metadata, engines) so you can perform concise, discrete operations
Additional Info
• GIST API coming out this summer
• Direct interface to Fragments
• Standard and third-party Enhancer support
@stevepnewman, @gist