Dato Confidential
Fraud Detection Webinar
Alon Palombo, Data Scientist
Product Matching Webinar
Agenda
• Who is Dato?
• Data science workflow
• What is product matching?
• Demo using real public data
• Questions
Dato: We Intelligent Applications
45+ and growing fast!
Customers
Data Science workflow
Ingest, Transform, Model, Deploy
Unstructured Data
What is product matching?
• In 2016, global e-commerce sales are expected to reach $1.92 trillion.
• Online retailers and price comparison sites curate product catalogues by aggregating from multiple sources.
• Product matching is the task of keeping these catalogues free of duplicates, full of attributes per product, and consistent across different sites.
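As a rough illustration of the task described above, the sketch below compares two product titles by token overlap and, when they look like the same product, merges their attributes into one catalogue entry. The offers, the `jaccard` and `merge_offers` helpers, and the 0.6 threshold are all hypothetical and chosen for illustration; this is not the method used in the demo.

```python
import re

def tokens(title):
    """Lowercase a product title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def jaccard(a, b):
    """Jaccard similarity between the token sets of two titles (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def merge_offers(a, b):
    """Combine two offers for the same product.

    Non-missing values from `a` win on conflict; attributes that are
    missing (None) in `a` are filled in from `b`.
    """
    merged = dict(b)
    merged.update({k: v for k, v in a.items() if v is not None})
    return merged

# Hypothetical offers for the same camera, taken from two different sources.
offer_a = {"title": "Canon PowerShot SX530 HS Digital Camera",
           "price": 249.0, "color": None}
offer_b = {"title": "canon powershot sx530 hs camera (black)",
           "price": 255.0, "color": "black"}

score = jaccard(offer_a["title"], offer_b["title"])
product = None
if score >= 0.6:  # illustrative threshold, an assumption
    product = merge_offers(offer_a, offer_b)
```

Token overlap is only one of many possible similarity measures; real catalogues usually also need the reviews, images, and descriptions attached to each offer.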
Difficulty
Aggregate multiple sources:
• Structured attributes
• Reviews
• Images
• Description
Thor, Andreas. "Toward an adaptive String Similarity Measure for Matching Product Offers." GI Jahrestagung (1). 2010.
Definition
• Ironically, there are similar names for very similar problems:
  • Entity resolution
  • Record linking
  • De-duplication
  • Reference reconciliation
  • Data matching
  • and more…
Definition
• In GraphLab Create we distinguish between Record Linkage and De-duplication.
• Record Linkage refers to matching structured query records to a fixed set of reference records with the same schema.
• De-duplication refers to assigning an entity label to each row. Records with the same label are likely to correspond to the same real-world entity.
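The difference between the two tasks can be sketched in plain Python. The code below does not use the GraphLab Create API; the function names, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, and the sample records are all assumptions made for illustration. For brevity, each record is reduced to a single string field.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_records(queries, references):
    """Record linkage: map each query to the index of its most similar
    record in a fixed reference set."""
    return [
        max(range(len(references)),
            key=lambda i: similarity(query, references[i]))
        for query in queries
    ]

def deduplicate(records, threshold=0.8):
    """De-duplication: assign an entity label to every row.

    Rows whose similarity reaches the threshold are merged into the same
    group with a union-find structure; the label is the group's root index.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    return [find(i) for i in range(len(records))]

# Hypothetical data.
references = ["Apple iPhone 6 16GB", "Samsung Galaxy S6 32GB", "Sony Xperia Z5"]
queries = ["apple iphone 6 16 gb", "samsung galaxy s6"]
links = link_records(queries, references)

rows = ["Apple iPhone 6 16GB", "apple iphone 6 16gb",
        "Sony Xperia Z5", "Apple iPhone 6 16 GB"]
labels = deduplicate(rows)
```

Record linkage returns one reference match per query, while de-duplication has no reference set and instead groups the rows of a single table. The pairwise loop is quadratic in the number of rows, so a sketch like this only suits small tables.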
Product matching demo – using real public data
Summary
• Product matching is at the heart of e-commerce.
• Many relevant similar problems with similar solutions.
• Easy exploration, modeling, and evaluation using GraphLab Create.
Our machine learning course
https://www.coursera.org/learn/ml-foundations