Transcript
Page 1: Reifier Spark Summit 2014 Slides

© Nube Technologies

Fuzzy Matching With Spark

Page 2: Reifier Spark Summit 2014 Slides

About Us

ALICE: This is impossible!

THE MAD HATTER: Only if you believe it is.

Page 3: Reifier Spark Summit 2014 Slides

The problem

According to Gartner, businesses are losing up to 25% of potential revenue due to the lack of a holistic, multichannel view of their data.

Page 4: Reifier Spark Summit 2014 Slides

The problem

Page 5: Reifier Spark Summit 2014 Slides

Challenges

● Quadratic nature of the problem
● No standard notion of similarity
● Omissions, typos and other issues
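
The challenges above can be made concrete with a small sketch. It is not taken from the slides: the sample records are made up, and `difflib.SequenceMatcher` from the Python standard library stands in as just one of many possible similarity notions. The sketch counts how many pairwise comparisons a naive approach needs and scores pairs that differ by typos and omissions.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records (not from the slides); some differ only by
# a typo or an omitted word part.
records = [
    "Jon Smith, 12 Baker Street",
    "John Smith, 12 Baker St",
    "Alice Liddell, 4 Oxford Road",
    "Alice Lidell, 4 Oxford Road",
]


def pair_count(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def similarity(a: str, b: str) -> float:
    """One possible similarity notion: difflib's ratio, a value in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Naive matching compares every record with every other record.
scored = sorted(
    ((similarity(a, b), a, b) for a, b in combinations(records, 2)),
    reverse=True,
)
for score, a, b in scored:
    print(f"{score:.2f}  {a!r} ~ {b!r}")

print(pair_count(len(records)))  # 6 comparisons for 4 records
print(pair_count(1_000_000))     # 499999500000 comparisons for a million
```

The pair count grows with the square of the number of records, which is why brute-force comparison stops being practical on a single machine well before a million records. Swapping `similarity` for a different measure changes which pairs rank highest, which illustrates the second challenge.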

Page 6: Reifier Spark Summit 2014 Slides

Use Case - Cross-Selling and Upselling

Page 7: Reifier Spark Summit 2014 Slides

Lead Generation

Page 8: Reifier Spark Summit 2014 Slides

BFSI

Personal Credit Ratings
Fraud Detection

Page 9: Reifier Spark Summit 2014 Slides

Other Use Cases

Yellow Pages
Catalog and Inventory Management

Page 10: Reifier Spark Summit 2014 Slides

Wishlist

Works with any kind of data
Scalable
No manual configuration of rules or algorithms

Page 11: Reifier Spark Summit 2014 Slides

Spark Advantages

● Distributed
● Scalable
● In memory
● Machine Learning
● Sampling
● No need to orchestrate multiple jobs

Page 12: Reifier Spark Summit 2014 Slides

Reifier - Label

Are these duplicates? (Y/N)
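
To illustrate how Y/N answers like these could drive matching without hand-written rules, here is a minimal sketch. It is an assumption for illustration only and does not describe how Reifier works: the labelled pairs are invented, the similarity measure is `difflib.SequenceMatcher` from the Python standard library, and the "learning" is simply picking the similarity cut-off that agrees best with the labels.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure: difflib's ratio, a value in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical answers to "Are these duplicates? (Y/N)":
# (record_a, record_b, user_said_yes)
labelled = [
    ("John Smith", "Jon Smith", True),
    ("Alice Liddell", "Alice Lidell", True),
    ("Acme Corp", "Acme Corporation", True),
    ("John Smith", "Alice Liddell", False),
    ("Acme Corp", "Apex Group", False),
]


def best_threshold(pairs) -> float:
    """Pick the cut-off that classifies the most labelled pairs correctly."""
    scored = [(similarity(a, b), label) for a, b, label in pairs]
    candidates = sorted({score for score, _ in scored})

    def accuracy(t: float) -> float:
        # A pair is predicted "duplicate" when its score reaches the cut-off.
        return sum((score >= t) == label for score, label in scored) / len(scored)

    return max(candidates, key=accuracy)


threshold = best_threshold(labelled)


def is_duplicate(a: str, b: str) -> bool:
    """Apply the cut-off learned from the labels to an unseen pair."""
    return similarity(a, b) >= threshold


print(f"learned threshold: {threshold:.2f}")
print(is_duplicate("Jane Doe", "Jane Do"))
```

A single cut-off on one score is far simpler than a real classifier, but it shows the idea: the user supplies examples, and the decision rule is derived from them rather than configured by hand.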

Page 13: Reifier Spark Summit 2014 Slides

Reifier Output

Page 14: Reifier Spark Summit 2014 Slides

Thank You!