Spatial Business Detection and Recognition from Images
Alexander Darino
Outline
• Project Overview
PROJECT OVERVIEW
• Previous Work
• Project Objective
• Anticipated End Result
• Project Pipeline
Previous Work: Where Am I?
Image → Where Am I? → Latitude, Longitude
Project Objective
• Given:
  – Image
  – Geolocation
• Yield:
  – Spatial Identification of Businesses in Image
  – Addresses of Businesses in Image
  – Information about Businesses in Image
    • Ex. Reviews, Categories, Phone Number, etc.
Project Pipeline
• Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses
• Image → Text Extraction → Detected Text
• Nearby Businesses + Detected Text → Business Name Matching → Business Identification → Business Spatial Detection
Anticipated End Result
BUSINESS SEARCHING
Obtaining a List of Candidate Businesses in Image via Business Searching
Business Searching
• Business Search Services
  – Google
  – Yelp
  – CityGrid (Supplier for Yellow Pages, Super Pages)
• REST-based API
• Results in JSON or XML format
• Aggregate Results into Facade
{'businesses': [
  {'address1': '466 Haight St',
   'address2': '',
   'address3': '',
   'avg_rating': 4.0,
   'categories': [
     {'category_filter': 'danceclubs',
      'name': 'Dance Clubs',
      'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=danceclubs'},
     {'category_filter': 'lounges',
      'name': 'Lounges',
      'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=lounges'},
     {'category_filter': 'tradamerican',
      'name': 'American (Traditional)',
      'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=tradamerican'}],
   'city': 'San Francisco',
   'distance': 1.8780401945114136,
   'id': 'yyqwqfgn1ZmbQYNbl7s5sQ',
   'is_closed': False,
   'latitude': 37.772201000000003,
   'longitude': -122.42992599999999,
   'mobile_url': 'http://mobile.yelp.com/biz/yyqwqfgn1ZmbQYNbl7s5sQ',
   'name': 'Nickies',
   'nearby_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA',
   'neighborhoods': [
     {'name': 'Hayes Valley',
      'url': 'http://yelp.com/search?find_loc=Hayes+Valley%
Business Searching: Results
Query location (latitude, longitude): 40.441127247181797, -80.002821624487595
• Denham & Company Salon
• Ullrich's Shoe Repairing
• Nicholas Coffee Co
• Bella Sera On the Square
• A & J Ribs
• Starbucks Coffee
• Jenny Lee Bakery
• Galardi's 30 Minute Cleaners
• Jimmy John's Gourmet Sandwiches
• Charley's Grilled Subs
• Fresh Corner
• Lagondola Pizzeria & Restaurant
• Camera Repair Service Inc
• Pittsburgh Cigar Bar
• Original Oyster House
• MixStirs
• 1902 Tavern
• Costanzo's
• Pittsburgh Silver Llc
• Graeme St
• Galardi's 30 Minute Cleaners
• Denham & Co Salon
• Bruegger's Bagel Bakery
• Nicholas Coffee Co
• Market Square
• Fat Tommy's Pizzeria
• Mixstirs Cafe
• Giggles
• Rycon Construction Inc
• Garbera, Dennis C, Dds - Emmert Dental Assoc
• Bella Sera on the Square
• Mancini's Bread Co
• Las Velas
• Ciao Baby
• Washington Reprographics Inc
• Highmark Life Insurance Co
• Fischer, Donald R, Md - Highmark Life Insurance Co
• Jimmy John's
• Lynx Energy Partners Inc
• Emmert Dental Assoc
Business Searching: Evaluation
• Strengths
  – Aggregated results almost always found Business of interest
• Weaknesses
  – Each API limits query result set size - this is why we aggregate
  – Only businesses listed
  – Not all businesses listed
• Limitations
  – Dependent on well-populated, accurate Business Directories
  – Only tested on 15 Pittsburgh images; result quality for rural areas is unknown
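The "Aggregate Results into Facade" step can be sketched as a single class that queries every directory service and merges the listings. This is a minimal sketch, not the project's code: `BusinessSearchFacade`, `Business`, and the two `fake_*` providers are hypothetical names with canned data, standing in for real REST calls to Google, Yelp, or CityGrid.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Business:
    name: str
    source: str  # which directory returned this listing


# A provider is any callable (lat, lon) -> iterable of business names.
# Real providers would wrap the REST APIs; these are hypothetical stand-ins.
Provider = Callable[[float, float], Iterable[str]]


class BusinessSearchFacade:
    """Single entry point that queries every provider and merges the results."""

    def __init__(self, providers: Dict[str, Provider]):
        self.providers = providers

    def nearby(self, lat: float, lon: float) -> List[Business]:
        merged = []
        for source, search in self.providers.items():
            for name in search(lat, lon):
                merged.append(Business(name=name, source=source))
        return merged


def fake_yelp(lat, lon):
    # Canned data for illustration only
    return ["Nicholas Coffee Co", "Starbucks Coffee"]


def fake_citygrid(lat, lon):
    # Canned data for illustration only
    return ["Nicholas Coffee Co", "Mancini's Bread Co"]


facade = BusinessSearchFacade({"yelp": fake_yelp, "citygrid": fake_citygrid})
candidates = facade.nearby(40.441127247181797, -80.002821624487595)
names = [b.name for b in candidates]
```

Duplicates across services are kept on purpose here, since the results list above also contains the same business reported more than once.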
EXTRACTING IDENTIFYING TEXT
Obtaining names of Businesses in Image by Extracting Identifying Text
Extracting Identifying Text: OCR
• Used Two OCR APIs:
  – GNU OCR (Ocrad)
  – GOCR
• OCR APIs highly sensitive to:
  – Font (only works well with roman font)
  – Perspective
  – Scale
  – Binarization Threshold
  – Dark on Light vs. Light on Dark (inversion)
Extracting Identifying Text: OCR
• OCR API evaluations
  – Ocrad: yielded no meaningful data across more than 200 scale/threshold/inversion combinations
  – GOCR: produced good results across 10 scales, with and without inversion, using a threshold automatically determined by Otsu's method
• 98% of Results are garbage!
• Examples of GOCR output (next slides)
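The preprocessing sweep described for GOCR (Otsu threshold, 10 scales, with and without inversion) can be sketched as follows. This is an illustration under stated assumptions: the image is a flat list of 8-bit grayscale values, the ten scale factors are made up, and the actual resizing and the call to the OCR engine are left out.

```python
from itertools import product


def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]              # number of pixels with value <= t
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold, invert=False):
    """1 for pixels above the threshold, 0 otherwise; invert swaps the two."""
    return [int((p > threshold) != invert) for p in pixels]


# Synthetic bimodal "image" (flattened): half dark, half bright.
pixels = [20] * 50 + [200] * 50
threshold = otsu_threshold(pixels)
normal = binarize(pixels, threshold)
inverted = binarize(pixels, threshold, invert=True)

# Hypothetical scale factors; the slides only say that 10 scales were used.
scales = [0.5 + 0.25 * i for i in range(10)]
variants = list(product(scales, (False, True)))  # (scale, inverted) pairs
```

Each `(scale, inverted)` pair would correspond to one resized, binarized copy of the image handed to the OCR engine.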
Extracting Identifying Text: OCR
n.c.......o.a...u..............oU..D.oa..e......_RuEGGE..KERy..J...w...........L........M.II.....c..
...i
.......l.
.J
.t...llt...lSHA.P.It..tllt.........._.l...Jy._.c_...._tt.._....t.._.r.........t.t_t.._.._.l..J.r.r.I.
Extracting Identifying Text: OCR
u..........._nq......eoR.E.l.e...í....e...n....n....n.e.R.E...e....o._....E.R.E.IKE........I.ltlO.........rE..o......E.....I.K.E.o.....
J.n....c...E.R.E.I.E.......M..E.R.E...E...aJ...Gu.ge..geE.F.._.....E..gE.D...fUlI..lll.lll.IIi.l..Xl..
Extracting Identifying Text: OCR
..e_..w.._......D.........uJ.....J.................n......n..........n_..r.l_d..J.ec.m._..n.......J.n.._...tn..ct..._.................D.u.v...e.n....u..
Y.._w.n.n....Jn.......G..o..r..._........J...ml.t..l.tt.l.._w....................._....l....t........j..ilI.i..
Extracting Identifying Text: OCR
__.ncu_.l..._..._J...ne......._n._..v.....ra......d_..._.............i..n..UllREsT.unAN...r.c.....r...Tt.rJll......m...c.....n.......
..
.Jn.I..c...r.rESTAU.ANT.r.O....c.cc.
Note: Even though "Tambellini" is a roman font, it is too stretched to be picked up by GOCR
OCR Evaluation
• Strengths
  – Applicable to expected input of orthogonal images
• Weaknesses
  – Only works well(-ish) for strictly roman font
• Limitations
  – Will perform poorly for artistic fonts and business signs
• Conclusion
  – By itself, OCR is not the best approach towards Business identification
    • Reasons: poor recognition, franchises, perspective, etc
BUSINESS NAME MATCHING
Matching Identifying Text to Candidate Business Names via Business Name Matching
Business Name Matching
• Given: Unreliable fragments of ‘detected text’
• Yield: Matching Business Names
• Process:
  – Filter input: trim whitespace, discard useless fragments (< 2 letters)
  – Fuzzy String Matching
  – Voting Scheme: confidence of business appearing in image
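The filtering and fuzzy-matching steps can be sketched with Python's standard `difflib`. This is a sketch under assumptions, not the project's matcher: splitting on the `.` and `_` filler characters seen in the GOCR output, the word-by-word comparison, and the `cutoff=0.6` value are all illustrative choices.

```python
from difflib import SequenceMatcher


def clean_fragments(raw_text, min_letters=2):
    """Split OCR output on '.'/'_' filler and drop fragments with < min_letters letters."""
    fragments = raw_text.replace("_", ".").split(".")
    cleaned = (f.strip() for f in fragments)
    return [f for f in cleaned if sum(c.isalpha() for c in f) >= min_letters]


def similarity(fragment, word):
    """Similarity in [0, 1] between an OCR fragment and one word of a business name."""
    return SequenceMatcher(None, fragment.lower(), word.lower()).ratio()


def best_matches(fragments, business_names, cutoff=0.6):
    """For each fragment, keep the business whose closest word scores >= cutoff."""
    matches = []
    for frag in fragments:
        scored = [
            (similarity(frag, word), name)
            for name in business_names
            for word in name.split()
        ]
        score, name = max(scored)
        if score >= cutoff:
            matches.append((frag, name, score))
    return matches


raw = "e......_RuEGGE..KERy..J...w"  # excerpt of the GOCR output shown earlier
names = ["Bruegger's Bagel Bakery", "Nicholas Coffee Co", "Starbucks Coffee"]
fragments = clean_fragments(raw)
matches = best_matches(fragments, names)
```

Single-letter fragments such as `e`, `J`, and `w` are discarded by the filter, so only the longer fragments reach the matcher.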
Business Name Matching
• Developed Confidence Attribution Algorithm
  – Confidence of OCR Token being Name Token
    • Example: Confidence of “ESTUANT” representing “RESTAURANT”
    • Point-based system
  – Confidence of Name appearing in Image
    • Sum of points of matching OCR Text
    • Use logarithmically-normalized points to determine business inclusion threshold
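The voting scheme can be sketched as below. The slides give neither the point rule nor the normalization formula, so everything specific here is an assumption: `token_points` (similarity times matched length), the `log1p` normalization against the best-scoring business, and the `threshold=0.5` cut are hypothetical stand-ins.

```python
import math
from collections import defaultdict


def token_points(fragment, word, similarity):
    """Hypothetical point rule: longer, closer matches earn more points."""
    return similarity * min(len(fragment), len(word))


def business_confidences(matches):
    """matches: iterable of (fragment, word, business, similarity) tuples."""
    totals = defaultdict(float)
    for fragment, word, business, sim in matches:
        totals[business] += token_points(fragment, word, sim)
    if not totals:
        return {}
    top = max(totals.values())
    # Log-normalize so that the best-scoring business maps to 1.0.
    return {b: math.log1p(p) / math.log1p(top) for b, p in totals.items()}


def included(confidences, threshold=0.5):
    """Businesses whose normalized confidence reaches the inclusion threshold."""
    return sorted(b for b, c in confidences.items() if c >= threshold)


matches = [
    ("RuEGGE", "Bruegger's", "Bruegger's Bagel Bakery", 0.75),
    ("KERy", "Bakery", "Bruegger's Bagel Bakery", 0.8),
    ("ee", "Coffee", "Starbucks Coffee", 0.5),
]
confidences = business_confidences(matches)
selected = included(confidences)
```

Summing points per business means that several weak fragment matches can jointly push a name over the threshold, which is the behavior the point-based system is meant to provide.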
Business Name Matching
Note: This originally did not appear because it did not exceed the confidence threshold. It now appears because it contributes to the Business Name Identification
SPATIAL BUSINESS IDENTIFICATION
Isolating Identified Businesses in Image via Spatial Business Identification
Business Spatial Identification
Aiken George S Co
Category: Food, Grocery
Address: 218 Forbes Ave, Pittsburgh, PA 15222
Phone: (412) 391-6358
Rating: 4.5/5 (2 Reviews)
Business Spatial Identification
Bruegger's Bagels
Category: Bagels
Address: Market Sq, Pittsburgh, PA 15222
Phone: (412) 281-2515
Rating: Not Rated
V0.1: EVALUATION
Current Approach
• Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses
• Image → OCR → Detected Text
• Nearby Businesses + Detected Text → Business Name Matching → Business Identification → Business Spatial Detection
Weaknesses to Current Approach
• Lots of Garbage
• Fragmented Word Detection
• Fails with non-orthogonal perspective
• Did I already mention lots of garbage?
• Fails with non-roman text
• Not scale-invariant
ALTERNATIVES TO OCR
Alternative #1: Image Matching
• Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses
• Image + Nearby Businesses → Match to Storefront Image → Business Identification → Business Spatial Detection
Alternative #1: Evaluation
• Weaknesses:
  – Low Availability of Storefront Images (< 50% Avg)
    • George Aiken area businesses with photos: 18/35
    • Brueggers area businesses with photos: 22/40
    • Tambellini area businesses with photos: 8/22
  – Available Images too small (100 x 100)
  – Computationally Expensive
• Conclusion: Not a viable solution
Alternative #2: Template Matching
• Tambellini (the business name rendered eight times, each in a different font)
• Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses → Render Templates of Business Names in Different Fonts → Template Images
• Image + Template Images → Image Matching (eg. SIFT, HAAR) → Business Spatial Detection → Business Identification
Alternative #2: Template Matching
OCR
• Not Scale Invariant
• Unbounded Search
• Fragmented Recognition
• Roman-only font
Alternative #2
• Scale Invariant
• Bounded Search
• Whole-word recognition
• All fonts
Subsequent Attempts
Alternative #3: Scene Text Recognition
• State of the Art:
  – STR ≠ OCR
  – Far superior to our ‘naïve’ approaches to STR (ie. OCR, Image matching, SIFT)
    • OCR only works for highly controlled environments
    • STR works for uncontrolled environments
  – Scale invariant
  – Color/intensity invariant
  – Lexicon-Assisted
Alternative: Scene Text Recognition
• No STR implementations readily available
• Contacted several groups specializing in STR; none were able to provide an implementation for research purposes
• Had to resort to implementing STR from scratch
SCENE TEXT RECOGNITION
The long and perilous journey of implementing Scene Text Recognition
STR Implementation
• STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes”
  – Multiresolution-based potential characters detection
  – Character/layout geometry and color properties analysis
  – Local affine rectification
  – Refined Detection
MULTIRESOLUTION-BASED POTENTIAL CHARACTERS DETECTION
Candidate Text Detection via Multiresolution-based potential characters detection
Multiresolution-based potential characters detection
• Laplacian-of-Gaussian Edge Detection
• Dice image/edges into Patches
  – Combine patches with similar properties into regions
  – Obtain bounding box of region as candidate text
  – Properties include:
    • Mean
    • Variance
    • Intensity
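The edge-detection and dicing steps can be sketched in plain Python on a small grayscale grid. This is an illustration, not the implementation from the slides: the 5x5 kernel size, σ = 1, the 4x4 patch size, and the border handling are assumptions, and the kernel omits the constant normalization factor of the Laplacian-of-Gaussian.

```python
import math


def log_kernel(size=5, sigma=1.0):
    """Laplacian-of-Gaussian kernel (up to a constant), shifted to sum to zero."""
    half = size // 2
    k = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            row.append((r2 - 2 * sigma**2) / sigma**4 * math.exp(-r2 / (2 * sigma**2)))
        k.append(row)
    mean = sum(map(sum, k)) / (size * size)
    return [[v - mean for v in row] for row in k]


def convolve(image, kernel):
    """Same-size 2D filtering with border pixels replicated.

    The kernel is symmetric, so correlation and convolution coincide here.
    """
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                for kx, kv in enumerate(krow):
                    yy = min(max(y + ky - half, 0), h - 1)
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += kv * image[yy][xx]
            out[y][x] = acc
    return out


def patch_stats(values, patch=4):
    """Dice a 2D array into patch x patch tiles: {(row, col): (mean, variance)}."""
    stats = {}
    for py in range(0, len(values) - patch + 1, patch):
        for px in range(0, len(values[0]) - patch + 1, patch):
            cells = [values[y][x]
                     for y in range(py, py + patch)
                     for x in range(px, px + patch)]
            mean = sum(cells) / len(cells)
            var = sum((c - mean) ** 2 for c in cells) / len(cells)
            stats[(py // patch, px // patch)] = (mean, var)
    return stats


# 12x12 test image with a vertical step edge between columns 5 and 6.
image = [[0] * 6 + [255] * 6 for _ in range(12)]
response = convolve(image, log_kernel(5, 1.0))
stats = patch_stats(response, patch=4)
```

Patches over flat areas have near-zero variance in the edge response, while patches straddling the edge do not, which is the kind of property the region-merging step would compare.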
Multiresolution-based potential characters detection
Patches qualify if: [criteria missing from the extracted text]
Problems with Current Approach
• Too much “bleeding”
• Unstable edge data, because the position of an edge patch relative to the edge itself is unpredictable
New Approach
• Each edge pixel gets an N x N edge patch (eg. 3x3)
• Edge patches overlap
  – Tighter bounding boxes
  – More region consistency
  – More robust to resolution changes
  – Able to use tighter thresholds
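The per-pixel patch idea can be sketched as follows: every edge pixel stamps an N x N patch, overlapping patches fuse into regions, and each region yields a bounding box. This is a simplified sketch; it merges patches purely by overlap (4-connectivity), whereas the slides also compare patch properties, and the function names are hypothetical.

```python
def stamp_patches(edges, n=3):
    """Mark the n x n neighborhood of every edge pixel (patches may overlap)."""
    h, w = len(edges), len(edges[0])
    half = n // 2
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                for yy in range(max(0, y - half), min(h, y + half + 1)):
                    for xx in range(max(0, x - half), min(w, x + half + 1)):
                        mask[yy][xx] = True
    return mask


def bounding_boxes(mask):
    """4-connected components of the mask as (top, left, bottom, right) boxes."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack = [(y, x)]
                seen[y][x] = True
                top, left, bottom, right = y, x, y, x
                while stack:
                    cy, cx = stack.pop()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


# Two adjacent edge pixels and one isolated edge pixel on a 7x12 grid.
edges = [[0] * 12 for _ in range(7)]
for y, x in [(1, 1), (1, 2), (4, 8)]:
    edges[y][x] = 1
boxes = bounding_boxes(stamp_patches(edges, n=3))
```

Because each patch is centered on an edge pixel, a box extends at most `n // 2` pixels beyond the edge itself, which is one way to read the "tighter bounding boxes" claim.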
New Challenges!
Text Detection Problem #1
How do I know that two regions are close enough together that they might be part of the same character?
• Center of bounding box?
• Moment of regions?
• Nearest Neighbor?
• Connectedness?
All have severe weaknesses
Text Detection Problem #2
How do I know that two characters are close enough to be considered a part of the same word?
Easier version of the last problem, but still hard!
CHARACTER/LAYOUT GEOMETRY AND COLOR PROPERTIES ANALYSIS
Color Properties Analysis
• Implemented a Gaussian Mixture Model (GMM) to obtain μ and σ of foreground/background for each channel: R/G/B/H/I
• Calculated a confidence that each channel (R/G/B/H/I) can be used to recognize characters
• Assumed Invariant: High contrast between foreground/background of characters in sign
• Choose the channel (R/G/B/H/I) that is best suited for use with character recognition
Channel    μ1                μ2   σ1      σ2      Confidence
Green      172.337447154472  255  4.8463  0.2000  0.017056947503074
Blue       122.673512195122  255  6.1478  0.2000  0.021524159560500
Hue        106.601736628811  0    5.9561  0.2000  0.017897920959170
Intensity  145.658856368567  255  2.1271  0.2000  0.051403296762968
Mistake: This should only be done on individual characters, not words
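The per-channel analysis can be sketched as a two-component, one-dimensional Gaussian mixture fitted with EM, followed by a separation score. Two points are assumptions rather than facts from the slides. First, the confidence formula is not stated; each of the four listed values is reproduced to within rounding by |μ1 − μ2| / σ1 × 10⁻³, so the sketch uses that as an inferred stand-in (σ2 is 0.2000 for every channel, so its role cannot be determined from the data). Second, the `min_sigma=0.2` floor only mirrors that repeated σ2 value and may not be how the original handled it.

```python
import math


def fit_two_gaussians(values, iterations=50, min_sigma=0.2):
    """Fit a 2-component 1-D Gaussian mixture with EM: ((mu1, s1), (mu2, s2))."""
    mu = [float(min(values)), float(max(values))]
    spread = (mu[1] - mu[0]) / 4 or 1.0
    sigma = [max(spread, min_sigma)] * 2
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value
        resp = []
        for v in values:
            p = [
                weight[k] / sigma[k] * math.exp(-0.5 * ((v - mu[k]) / sigma[k]) ** 2)
                for k in range(2)
            ]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and standard deviation
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weight[k] = nk / len(values)
            mu[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, values)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)
    return (mu[0], sigma[0]), (mu[1], sigma[1])


def channel_confidence(mu1, sigma1, mu2):
    """Inferred separation score: |mu1 - mu2| / sigma1, scaled by 1/1000."""
    return abs(mu1 - mu2) / sigma1 / 1000.0


# Synthetic channel: character pixels near 145, uniform background at 255.
values = [143, 144, 145, 146, 147] * 20 + [255] * 100
(fg_mu, fg_sigma), (bg_mu, bg_sigma) = fit_two_gaussians(values)

# (mu1, sigma1, mu2) per channel, copied from the table above.
slide = {
    "Green": (172.337447154472, 4.8463, 255),
    "Blue": (122.673512195122, 6.1478, 255),
    "Hue": (106.601736628811, 5.9561, 0),
    "Intensity": (145.658856368567, 2.1271, 255),
}
scores = {channel: channel_confidence(*args) for channel, args in slide.items()}
best = max(scores, key=scores.get)
```

With the listed values, the intensity channel scores highest, in line with choosing the channel that offers the strongest foreground/background contrast.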
Color Analysis: Evaluation
• The channel with the highest confidence was observed to be the one best suited for OCR…
• …Did I just say OCR?
YES! (I did.)
OPTICAL CHARACTER RECOGNITION
A second shot at Optical Character Recognition
OPTICAL CHARACTER RECOGNITION II
(and this time, it’s personal)
Refined Detection
• Generate alphabet templates in different fonts
• Resize templates; Divide into grid
• Apply several 2D Gabor filters to each grid patch
  – Different orientations, frequencies, variances
  – For each pixel, yields real/imaginary component of transformation
• Feed data into Linear Discriminant Analysis
  – Reduces features and forms classifier at same time
2D Gabor Filter
• Product of a Gaussian envelope and a sine wave in the spatial domain (equivalently, a convolution of the two in the frequency domain)
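A complex 2D Gabor kernel and a small filter bank can be sketched as below. This is a minimal sketch under assumptions: the envelope is isotropic (a single σ), and the kernel size, σ values, orientations, and frequencies are illustrative rather than the ones used in the project.

```python
import cmath
import math


def gabor_kernel(size, sigma, theta, frequency):
    """Complex 2D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the wave direction theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma**2))
            carrier = cmath.exp(2j * math.pi * frequency * xr)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_response(patch, kernel):
    """Inner product of a patch with the kernel: (real part, imaginary part)."""
    acc = sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))
    return acc.real, acc.imag


# Hypothetical bank: 2 variances x 4 orientations x 2 frequencies = 16 filters.
bank = [
    gabor_kernel(7, sigma, theta, freq)
    for sigma in (1.5, 3.0)
    for theta in (0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)
    for freq in (0.1, 0.25)
]

# Vertical stripes with a 4-pixel period respond strongly to the filter whose
# wave runs horizontally (theta = 0) and weakly to the one at theta = pi / 2.
stripes = [[0, -1, 0, 1, 0, -1, 0] for _ in range(7)]
along = gabor_response(stripes, gabor_kernel(7, 1.5, 0.0, 0.25))
across = gabor_response(stripes, gabor_kernel(7, 1.5, math.pi / 2, 0.25))
```

The real and imaginary responses from every filter in the bank, collected over all grid patches of a template, would form the feature vector passed on to the classifier.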
Live Demonstration
• Training
• Classification
Thank You!