
"Combining Vision, Machine Learning and Natural Language Processing to Answer Everyday Questions," a Presentation from QM Scientific


Combining Vision, Machine Learning and Natural Language Processing to Answer Everyday Questions

Web info: http://www.qmscientific.com http://www.priceswarm.com


The Problem

Consumers lack the tools and data to answer everyday questions simply, accurately, and in real time:

• What is the best store to shop at right now for my list?
• Are there cheaper alternatives to products I buy regularly?
• How much do I spend on milk and coffee monthly?

Answers to these questions are buried in a hodgepodge of structured, unstructured, digital and physical data sources


Solution: GPU-Powered Quazi Platform

[Diagram: data sources (Web, In Store, POS, Receipts, Crowd Source) feed the Quazi Platform, which combines Natural Language Processing, Computer Vision, and Machine Learning on GPUs.]

Quazi combines proprietary Natural Language Processing, Computer Vision, and Machine Learning technology to extract, connect, and organize millions of products, prices, and consumer preferences from any data source.
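To make "extract, connect and organize" concrete, here is a minimal Python sketch of a common record that the extractor for every source type might emit, plus a grouping step that connects observations of the same product and answers a "cheapest option" question. The `SourceKind` and `PriceObservation` names, the fields, and exact-name grouping are illustrative assumptions, not the Quazi data model; real matching would need the fuzzy text techniques discussed later in the deck.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceKind(Enum):
    """Source types shown on the slide (illustrative enum, not Quazi's)."""
    WEB = "web"
    IN_STORE = "in_store"
    POS = "pos"
    RECEIPT = "receipt"
    CROWD = "crowd"


@dataclass(frozen=True)
class PriceObservation:
    """Hypothetical normalized record emitted by every source-specific extractor."""
    product: str                 # normalized product name
    source: SourceKind
    chain: Optional[str]         # retailer, if known
    location: Optional[str]      # city or store, if known
    price_cents: Optional[int]   # None when the source gives no price


def group_by_product(observations):
    """Connect observations that share a normalized product name."""
    groups = {}
    for obs in observations:
        groups.setdefault(obs.product, []).append(obs)
    return groups


def cheapest(observations):
    """Lowest-priced observation, skipping unknown prices; None if nothing is priced."""
    priced = [o for o in observations if o.price_cents is not None]
    return min(priced, key=lambda o: o.price_cents, default=None)


observations = [
    PriceObservation("maranatha creamy almond butter", SourceKind.WEB, "Target", "San Jose, CA", 899),
    PriceObservation("maranatha creamy almond butter", SourceKind.WEB, "Walmart", "San Jose, CA", 699),
    PriceObservation("trader joe's creamy almond butter", SourceKind.WEB, "Trader Joe's", None, None),
]
groups = group_by_product(observations)
print(cheapest(groups["maranatha creamy almond butter"]).chain)  # Walmart
```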


Challenges in Understanding Everyday Visual Data

Answering everyday shopping questions simply, accurately, and in real time is not easy!

• Data heterogeneity
• Everyday data is unstructured
• Fast big-data techniques are needed to analyze and connect massive distributed data sets


GPU-Accelerated Vision Technology

The parallel processing capabilities of modern GPU clusters enable new vision technologies that give a deeper understanding of everyday data:

• Fast and robust information extraction.
• Making connections between distributed visual and even non-visual data sources.
• Conceptual clustering of objects allows a higher-order understanding of a scene.
• Construction of visual models from sparse training sets.


Computer Vision for Retail: ReSight®

ReSight exploits QUAZI machine learning models in conjunction with fast visual processing to make sense of retail-based images (e.g., receipt data, product images…).

[Diagram components: image capture device; retail visual data; Global Object Identification; Local Analysis; Text Information; Image Information; Translation Models; Entity Models; Conceptual Clustering; QUAZI; Visual Analytics.]


Global Image Analysis

Feature extraction and model prediction at different scales are computationally intensive! GPUs allow massive parallelization and simultaneous prediction.

Optimized data structure primitives support highly efficient on-device processing schemes.

[Figure: Single GPU timings for integral histogram computation. Time (ms, axis 0 to 20) vs. number of pixels (9600, 38400, 87001, 154401, 348004); series: OpenCV GPU module, QMScientific Fast 3D Scan.]
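The timing chart concerns integral histograms. An integral histogram stores, for every pixel position and every intensity bin, how many pixels above and to the left fall into that bin; afterwards, the histogram of any axis-aligned rectangle costs four lookups per bin regardless of the rectangle's size, which is what makes dense multi-scale feature extraction affordable. The standard-library sketch below is written for clarity rather than speed; uniform 8-bit binning and the function names are assumptions, and it is not the QMScientific or OpenCV implementation. Each bin's table is a 2-D prefix sum, the kind of operation that parallel scan primitives compute efficiently on a GPU.

```python
def integral_histogram(image, n_bins):
    """Cumulative per-bin counts: ih[i][j][b] is the number of pixels with
    row < i and column < j whose value falls in bin b.

    image: list of rows of 8-bit intensities (0..255); uniform bins are assumed.
    """
    h, w = len(image), len(image[0])
    ih = [[[0] * n_bins for _ in range(w + 1)] for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            b = image[i][j] * n_bins // 256          # bin index of this pixel
            above, left, diag = ih[i][j + 1], ih[i + 1][j], ih[i][j]
            cell = ih[i + 1][j + 1]
            for k in range(n_bins):
                # Standard integral-image recurrence, applied per bin.
                cell[k] = above[k] + left[k] - diag[k]
            cell[b] += 1
    return ih


def region_histogram(ih, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1 via four lookups."""
    a, b, c, d = ih[bottom][right], ih[top][right], ih[bottom][left], ih[top][left]
    return [a[k] - b[k] - c[k] + d[k] for k in range(len(a))]


image = [[0, 64, 128], [192, 255, 32]]
ih = integral_histogram(image, 4)
print(region_histogram(ih, 0, 1, 2, 3))  # [1, 1, 1, 1]
```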

Multiple weak-learner models are tuned to identify objects of interest in an image.
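The slide does not say which weak learners or which ensemble method is used. As an illustration only, here is discrete AdaBoost over decision stumps, a standard way to combine many weak learners into a strong classifier: each round picks the stump with the lowest weighted error ε, gives it a vote weight α = ½ ln((1 − ε)/ε), and up-weights the samples it got wrong. In an image setting each `x` might be a feature vector for a candidate window (for example, bins of a region histogram); that pairing, the function names, and the toy data are assumptions.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Weak learner: +1 if polarity * (x[feature] - threshold) >= 0, else -1."""
    return 1 if polarity * (x[feature] - threshold) >= 0 else -1


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost over decision stumps; labels y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, f, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)       # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        # Up-weight misclassified samples, down-weight correctly classified ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, f, thr, pol))
    return ensemble


def predict(ensemble, x):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, thr, pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1


X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
model = adaboost(X, y, n_rounds=3)
print([predict(model, x) for x in X])  # [-1, -1, 1, 1]
```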


Global Object Identification

Unique features:
• Ability to learn from very few training examples.
• High degree of robustness to lighting, occlusion, and orientation variations.
• Can exploit contextual information that takes into account neighboring regions.

[Figure panels: object-based segmentation (Receipt vs. Background); robust real-time tracking.]

Fast graph-based methods provide optimal pixel clustering based on spatial contextual constraints and weak-learner responses.
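The slide does not give the objective being optimized by the graph-based pixel clustering. A standard formulation that matches the description, assumed here, labels each pixel p as receipt or background by minimizing a data term built from weak-learner responses plus a pairwise term that penalizes neighboring pixels with different labels:

```latex
E(\mathbf{L}) \;=\; \sum_{p \in \mathcal{P}} D_p(L_p)
\;+\; \lambda \sum_{(p,q) \in \mathcal{N}} w_{pq}\,\big[L_p \neq L_q\big],
\qquad L_p \in \{\text{receipt}, \text{background}\}
```

Here D_p(ℓ) could be the negative log-probability of label ℓ implied by the weak-learner scores at pixel p, 𝒩 is the set of 4-connected neighbor pairs, w_pq ≥ 0 is a contrast-sensitive weight (smaller across strong image edges), and λ trades off the two terms. With two labels and nonnegative weights this energy is submodular, so a single s-t minimum cut finds its exact global minimum; that is one concrete sense in which a graph-based method can be called optimal.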


Adaptive Segmentation

Accelerated adaptive blind segmentation methods help identify regions of interest for further feature extraction and analysis. The object recognition engine extracts receipt information from an image:

• Intelligent segmentation determines regions of interest.
• Adaptive filtering is used to cluster regions of similarity.
• Fast connected component analysis (CCA) labels connected homogeneous clusters in a region.
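Connected component analysis is a standard step; a minimal sketch of 4-connected labeling by breadth-first search follows. It assumes the preceding segmentation step has already produced a binary mask (1 = pixel belongs to a homogeneous cluster of interest); the function name and list-of-lists representation are illustrative. GPU implementations typically replace the sequential queue with parallel union-find or label-propagation schemes.

```python
from collections import deque


def label_components(mask):
    """4-connected component labeling of a binary mask (list of rows of 0/1).

    Returns (labels, count); background pixels keep label 0.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                count += 1                      # start a new component
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # flood-fill its neighbors
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 0, 0, 0]]
labels, n = label_components(mask)
print(n)  # 3
```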


Local Analysis

Local feature extraction that is invariant to scale and orientation enables a better understanding of objects within a region of interest.

Advantages:
• Concise representation of objects for fast and robust classification.
• Effective classification with sparse training examples.
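As a hedged illustration of classification from sparse training examples, the sketch below classifies a region by matching its local descriptors against descriptors from a handful of labeled gallery images. It uses nearest-neighbor matching with Lowe's ratio test (a match counts only if the best gallery descriptor is clearly closer than the second best) and a majority vote over the accepted matches. The descriptors here are tiny made-up 2-D vectors; in practice they would come from a scale- and rotation-invariant extractor, and neither the extractor nor this voting scheme is confirmed by the slide.

```python
import math
from collections import Counter


def classify_by_matches(query_desc, gallery_desc, gallery_labels, ratio=0.75):
    """Vote for a class using nearest-neighbor descriptor matches.

    query_desc: descriptors from the query region.
    gallery_desc: descriptors from a few labeled training images (at least two).
    gallery_labels: class name for each gallery descriptor.
    Returns the most-voted class, or None if no match passes the ratio test.
    """
    votes = Counter()
    for q in query_desc:
        ranked = sorted(range(len(gallery_desc)), key=lambda i: math.dist(q, gallery_desc[i]))
        d1 = math.dist(q, gallery_desc[ranked[0]])
        d2 = math.dist(q, gallery_desc[ranked[1]])
        if d1 < ratio * d2:                 # reject ambiguous matches
            votes[gallery_labels[ranked[0]]] += 1
    return votes.most_common(1)[0][0] if votes else None


gallery = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
names = ["kix", "kix", "rice", "rice"]
print(classify_by_matches([[0.1, 0.1], [0.1, 0.9], [10.0, 10.2]], gallery, names))  # kix
```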



Quazi – Combining NLP and Vision

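Signals: abbreviation patterns, Soundex patterns, edit distance, contextual features

Raw receipt text: ssl lng gr wht rce

Candidate expansions: ssl → Sunny Select; lng → Long; gr → Grams / Grain; wht → White; rce → Rice

Resolved: Sunny Select Long Grain White Rice (Brand: Sunny Select; Type: Long Grain, White; Main Concept: Rice)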
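One way to turn these signals into code, sketched under assumptions: a receipt token is a candidate abbreviation of a vocabulary word if it starts with the same letter and its letters appear in order in that word; ties are broken first by a contextual bigram count (how often the candidate follows the previously resolved word), then by Levenshtein edit distance. The vocabulary, the bigram counts, and the greedy left-to-right strategy are hypothetical stand-ins for whatever catalog statistics Quazi actually uses.

```python
def is_abbreviation(token: str, word: str) -> bool:
    """True if token shares word's first letter and its letters occur in order in word."""
    letters = word.lower().replace(" ", "")
    token = token.lower()
    if not token or not letters or token[0] != letters[0]:
        return False
    remaining = iter(letters)
    # `ch in remaining` advances the iterator, so this tests for an ordered subsequence.
    return all(ch in remaining for ch in token)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def expand(tokens, vocabulary, bigram_counts):
    """Greedily expand receipt abbreviations left to right."""
    resolved = []
    for tok in tokens:
        candidates = [w for w in vocabulary if is_abbreviation(tok, w)]
        if not candidates:
            resolved.append(tok)  # leave unknown tokens untouched
            continue
        prev = resolved[-1] if resolved else None
        # Rank by context first (higher bigram count wins), then edit distance,
        # then alphabetically so the result is deterministic.
        resolved.append(min(candidates, key=lambda w: (
            -bigram_counts.get((prev, w), 0),
            edit_distance(tok.lower(), w.lower()),
            w,
        )))
    return " ".join(resolved)


# Hypothetical catalog vocabulary and bigram counts, for illustration only.
VOCAB = ["Sunny Select", "Salt", "Long", "Grain", "Grams", "White", "Wheat", "Rice"]
BIGRAMS = {("Long", "Grain"): 120, ("Grain", "White"): 80}

print(expand("ssl lng gr wht rce".split(), VOCAB, BIGRAMS))
# Sunny Select Long Grain White Rice
```

Context is what separates "Grain" from "Grams" after "Long", and "White" from "Wheat" after "Grain"; edit distance alone ties in both cases.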

Intelligent similarity search: Sunny Select Long Grain White Rice, $3.99, available @ 5 lbs. ($0.80 / lb.)
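The per-pound figure is simple arithmetic: $3.99 ÷ 5 lb = $0.798/lb, displayed rounded to the cent as $0.80/lb. A small sketch using Python's `decimal` module, which avoids binary floating-point surprises with currency; the half-up rounding rule and the function name are assumptions about how the figure was produced.

```python
from decimal import Decimal, ROUND_HALF_UP


def unit_price(total: Decimal, quantity: Decimal) -> Decimal:
    """Price per unit, rounded half-up to the cent (rounding rule assumed)."""
    return (total / quantity).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(unit_price(Decimal("3.99"), Decimal("5")))  # 0.80
```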

The OCR engine combines NLP with computer vision and data mining to analyze, enhance, and convert the raw, unstructured text found in physical data into knowledge.
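Soundex is one of the signals listed on the slide. For reference, here is a standard American Soundex sketch: keep the first letter, map the remaining consonants to digit classes, collapse repeated codes (H and W do not separate them, vowels do), and pad to four characters. Applied to the slide's receipt tokens, it maps several abbreviations onto the same code as their expansions while still leaving ambiguities such as White/Wheat for context to resolve. How Quazi actually weights Soundex matches is not stated.

```python
# American Soundex digit classes; vowels (and y) map to "0" and act as separators.
_CODES = {ch: str(digit)
          for digit, group in enumerate(["aeiouy", "bfpv", "cgjkqsxz", "dt", "l", "mn", "r"])
          for ch in group}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. "Robert" -> "R163"."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")   # the first letter's own code suppresses an identical follower
    for ch in letters[1:]:
        if ch in "hw":
            continue               # h and w are transparent: they do not separate equal codes
        code = _CODES[ch]
        if code != "0" and code != prev:
            digits.append(code)
        prev = code                # a vowel resets prev, so equal codes split by a vowel both count
    return (first.upper() + "".join(digits) + "000")[:4]


print(soundex("rce"), soundex("Rice"))                     # R200 R200
print(soundex("lng"), soundex("Long"))                     # L520 L520
print(soundex("wht"), soundex("White"), soundex("Wheat"))  # W300 W300 W300
```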


Combining Machine Learning + Vision

Almond butter listings gathered from different sources:

• MaraNatha Creamy Almond Butter | Source: Website | Price: $8.99 | Location: San Jose, CA | Chain: Target
• MaraNatha Butter Creamy Almond | Source: Website | Price: $6.99 | Location: San Jose, CA | Chain: Walmart
• Trader Joes Creamy Almond Butter | Source: Blog | Price: ? | Location: ? | Chain: Trader Joe’s
• Butter Almond Creamy Unsalted | Source: Receipt | Price: $6.99 | Location: San Jose, CA | Chain: Trader Joe’s
• MaraNatha Natural Almond Butter | Source: Website | Price: $9.79 | Location: San Jose, CA | Chain: Costco

[Figure: concept hierarchy for ALMOND BUTTER, with BRANDS and DESCRIPTORS branches and Creamy/Crunchy variants.]
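The almond butter listings describe the same product with different word orders (e.g., "MaraNatha Creamy Almond Butter" at Target versus "MaraNatha Butter Creamy Almond" at Walmart). A minimal sketch of order-insensitive matching: compare titles as sets of lower-cased words using Jaccard similarity (shared words divided by all distinct words) and greedily cluster titles whose similarity to a cluster's first title reaches a threshold. The 0.75 threshold and the function names are assumptions. Title text alone does not link the blog and receipt entries for Trader Joe's; that would need further signals, such as the chain, which this sketch does not use.

```python
import re


def tokens(title: str) -> frozenset:
    """Lower-cased word set; word order is deliberately ignored."""
    return frozenset(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_titles(titles, threshold=0.75):
    """Greedy single pass: join the first cluster whose seed title is similar enough."""
    clusters = []  # list of (seed_tokens, member_titles)
    for title in titles:
        tok = tokens(title)
        for seed, members in clusters:
            if jaccard(tok, seed) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((tok, [title]))
    return [members for _, members in clusters]


titles = [
    "MaraNatha Creamy Almond Butter",
    "MaraNatha Butter Creamy Almond",
    "Trader Joes Creamy Almond Butter",
    "Butter Almond Creamy Unsalted",
    "MaraNatha Natural Almond Butter",
]
clusters = cluster_titles(titles)
print(clusters[0])  # ['MaraNatha Creamy Almond Butter', 'MaraNatha Butter Creamy Almond']
```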


Image Analysis with Fine Granularity

[Figure: retail image with regions labeled "Cereal > General Mills > Kix", "Cereal", "Cereal", and "Background".]

Combining object recognition technology, data, and conceptual clustering algorithms allows deeper image analysis, specifically for retail image data.


PriceSwarm: Consumer Application


Technical Team to Make it Happen

Dr. Hatim F. Alqadah, CTO and Lead Vision Scientist
• PhD Electrical Engineering and M.S. Applied Mathematics
• 8+ years of research and development experience, including postdoctoral research at the Naval Research Laboratory, Physical Acoustics.
• Expertise in 3D sonar/electromagnetic image reconstruction, object recognition/tracking, and image processing.
• 13+ peer-reviewed publications.

Dr. Faris Alqadah, CEO and Lead Data Scientist
• PhD Computer Science
• Senior Data Scientist @PayPal; postdoctoral fellow @Johns Hopkins.
• Expertise in machine learning + data mining.
• 12+ peer-reviewed publications, 2 best-paper nominations, award-winning PhD.