Integrating Disparate Data May 27, 2010 Steve Newman – CTO/Gist.com

Glue Conference


Page 1: Glue Conference

Integrating Disparate Data May 27, 2010

Steve Newman – CTO/Gist.com

Page 2: Glue Conference

the WHY? What we believe in…

• All your important people already reside in email, calendar, contact lists, social sites

• The web is a rich source of information about the people you care about

• One tool should exist that can pull all this together in a single, rich, integrated experience

Page 3: Glue Conference


Pain Points (External)

• Disparate Data/API sources and protocols
  – e.g. GNIP

• Change notification (when/what)
  – e.g. Linked Open Data Dataset Dynamics, PubSubHubbub

• Standard entity data structures
  – e.g. Portable Contacts, vCard, hCard

Page 4: Glue Conference

The Problem (Internal)

• Need a single, disambiguated set of entities where an entity itself contains accurate/disambiguated attributes

• Entity attributes can be sourced from one or more endpoints
  – Email
  – Twitter/Facebook
  – Calendar
  – Google Contacts, Outlook Contacts, Plaxo
  – Google Social Graph API
  – Rapleaf API
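As a rough illustration of gathering attributes from several endpoints into one entity, the sketch below groups raw records by a normalized email address. Matching on email alone is an assumption made for the example, and `normalize_email`, `group_by_email`, and the record fields are hypothetical names, not Gist's actual model.

```python
def normalize_email(email):
    """Lowercase and trim so the same address from different sources compares equal."""
    return email.strip().lower()


def group_by_email(records):
    """Group raw records into entities keyed by normalized email.

    Each record is a dict with 'source', 'email', and any other attribute fields
    (hypothetical shape). Every attribute value keeps the source it came from,
    so conflicting values can be resolved later.
    """
    entities = {}
    for record in records:
        key = normalize_email(record["email"])
        entity = entities.setdefault(key, {"sources": [], "attributes": {}})
        entity["sources"].append(record["source"])
        for name, value in record.items():
            if name in ("source", "email"):
                continue
            entity["attributes"].setdefault(name, []).append((value, record["source"]))
    return entities


records = [
    {"source": "gmail", "email": "Brad@Example.com", "name": "Brad Feld"},
    {"source": "twitter", "email": "brad@example.com ", "name": "Brad"},
]
entities = group_by_email(records)
```

Both records end up under one entity, with each candidate name still tagged by its source.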

Page 5: Glue Conference

The Problem (Internal)

• Now that we have this data, we need to process and make sense of it
  – Need to support recurring updates
  – Merge and unmerge support
  – Recursive derivation is a huge win if done correctly

• Historical tracking is necessary both to drive operations and for debugging (and it’s a cool user feature)
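One way merge, unmerge, and historical tracking could fit together is sketched below: each merge stores a snapshot that is sufficient to undo it, and every action is appended to a history log. The class and field names (`EntityStore`, `MergeRecord`, and so on) are hypothetical, and the "survivor wins on conflict" rule is an assumption for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entity:
    entity_id: str
    attributes: dict = field(default_factory=dict)  # attribute name -> value


@dataclass
class MergeRecord:
    """Keeps enough state to undo one merge."""
    survivor_id: str
    absorbed: Entity        # the absorbed entity, left unmodified
    survivor_before: dict   # survivor attributes as they were before the merge
    merged_at: datetime


class EntityStore:
    def __init__(self):
        self.entities = {}
        self.history = []   # append-only log of (action, record) pairs

    def add(self, entity):
        self.entities[entity.entity_id] = entity

    def merge(self, survivor_id, absorbed_id):
        survivor = self.entities[survivor_id]
        absorbed = self.entities.pop(absorbed_id)
        record = MergeRecord(survivor_id, absorbed, dict(survivor.attributes),
                             datetime.now(timezone.utc))
        # Assumed rule: the survivor's existing values win on conflict.
        for name, value in absorbed.attributes.items():
            survivor.attributes.setdefault(name, value)
        self.history.append(("merge", record))
        return record

    def unmerge(self, record):
        # Restores the pre-merge snapshot; updates made after the merge are
        # discarded in this simplified sketch.
        self.entities[record.survivor_id].attributes = dict(record.survivor_before)
        self.entities[record.absorbed.entity_id] = record.absorbed
        self.history.append(("unmerge", record))
```

Keeping the log append-only means the same records that drive the undo can also be read back for debugging.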

Page 6: Glue Conference


How we did it

• Enhancers
  – Execute the request and creation of attribute data
  – Can be called synch or asynch
  – Cached, Logged, Rate Limited

• Meta data about attributes
  – Source, Source Type, When created, Derived?, Derived Source, Score

• Rules for ‘enhancement’
  – Rules for recursion
  – Scoring methodology (accuracy and relative prioritization)
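The pieces on this slide can be sketched as an attribute record carrying the listed metadata, plus enhancers that derive new attributes from existing ones under a recursion limit. Everything here is an assumption for illustration: the `Attribute` fields mirror the slide's metadata list, while the two enhancers, the score multipliers, and the `max_depth` rule are hypothetical. Caching, logging, rate limiting, and asynchronous calls are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str
    source: str                 # where the value came from
    source_type: str            # e.g. "email", "derived" (hypothetical values)
    score: float                # confidence used for prioritization
    derived: bool = False
    derived_source: Optional[str] = None
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def email_domain_enhancer(attr):
    """Derive a 'domain' attribute from an 'email' attribute."""
    if attr.name != "email" or "@" not in attr.value:
        return []
    domain = attr.value.rsplit("@", 1)[1].lower()
    return [Attribute("domain", domain, "email_domain_enhancer", "derived",
                      attr.score * 0.9, derived=True, derived_source=attr.source)]


def domain_website_enhancer(attr):
    """Derive a low-confidence website guess from a 'domain' attribute."""
    if attr.name != "domain":
        return []
    return [Attribute("website_guess", "http://" + attr.value,
                      "domain_website_enhancer", "derived",
                      attr.score * 0.5, derived=True, derived_source=attr.source)]


def enhance(seed, enhancers, max_depth=3):
    """Apply enhancers recursively, one level per pass, up to max_depth levels."""
    results, seen, frontier = [seed], {(seed.name, seed.value)}, [seed]
    for _ in range(max_depth):
        next_frontier = []
        for attr in frontier:
            for enhancer in enhancers:
                for new in enhancer(attr):
                    if (new.name, new.value) not in seen:
                        seen.add((new.name, new.value))
                        results.append(new)
                        next_frontier.append(new)
        if not next_frontier:
            break
        frontier = next_frontier
    return results


seed = Attribute("email", "brad@Example.com", "gmail", "email", 1.0)
derived = enhance(seed, [email_domain_enhancer, domain_website_enhancer])
```

Lowering the score at each derivation step is one simple way to keep recursively derived values from outranking directly sourced ones.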

Page 7: Glue Conference

Example – Email Enhancer

Columns: Date/Time | Score | State | Value

“Brad Feld” vs “Brad”
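A minimal sketch of how competing values such as “Brad Feld” and “Brad” might be resolved, assuming the highest-scoring active candidate wins and the most recent observation breaks ties. The `Candidate` fields follow the slide's columns, but the state names, the scores, and the selection rule are assumptions, not the scoring methodology from the talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Candidate:
    value: str
    score: float
    seen_at: datetime
    state: str = "active"   # hypothetical states: "active" or "rejected"


def best_value(candidates):
    """Return the value of the highest-scoring active candidate, or None."""
    active = [c for c in candidates if c.state == "active"]
    if not active:
        return None
    # Compare by score first, then by recency.
    return max(active, key=lambda c: (c.score, c.seen_at)).value


names = [
    Candidate("Brad", 0.4, datetime(2010, 5, 1)),
    Candidate("Brad Feld", 0.9, datetime(2010, 4, 1)),
]
chosen = best_value(names)
```

Here the fuller name is chosen because its score is higher, even though it was observed earlier.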

Page 8: Glue Conference

Key Takeaways

• Worry about integration both external and internal to your application

• Lots of good work on the external issues…take advantage of it!

• Create a strong object model for internal data representation (workers, meta data, engines) so you can perform concise/discrete operations

Page 9: Glue Conference

Additional Info

• GIST API coming out this Summer

• Direct interface to Fragments

• Standard and Third party Enhancer support

@stevepnewman, @gist

Page 10: Glue Conference

“We know now that the source of wealth is something specifically human: knowledge. Applied to tasks that we already know how to do, it becomes 'productivity'. Applied to tasks that are new and different we call it 'innovation'. Only knowledge allows us to achieve these two goals.”

Peter Drucker, Management Challenges for the 21st Century (1999)