Query Segmentation and Structured Annotation via NLP
Rifat Reza Joye, Panagiotis Papadimitriou
Problem
• Caloricious.com:
  – Semantic search engine for food items
• Free-text queries over structured data
  – Query: gluten free high protein bars
  – Data: Each food item is a database record with attributes name, brand, category, nutrients, allergens, ..
• Query segmentation and structured annotation
  gluten free | high protein | bars
  ALLERGEN    | NUTRIENT     | CATEGORY
1st Approach: MEMM with Synthetic Training Data
• Looks like an instance of NER
• Problem: No labeled queries to train the MEMM
• Solution: Generate synthetic labeled queries
  – Query study on 100 queries
    • 96% of queries contain 1–3 segments
    • In 98% of queries, one of the segments refers to Name, Category, or Brand
  – Algorithm
    • Pick a food item at random
    • Pick 1–3 attributes and generate a query
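The generation algorithm above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two food records, the `ANCHORS` tuple, and the function name `synthetic_query` are hypothetical, and the sketch simplifies the query study by always including one of Name, Category, or Brand.

```python
import random

# Hypothetical food records; attribute names follow the slide, values are made up.
FOOD_ITEMS = [
    {"NAME": "peanut butter crunch", "BRAND": "acme", "CATEGORY": "bars",
     "NUTRIENT": "high protein", "ALLERGEN": "gluten free"},
    {"NAME": "vanilla almond", "BRAND": "farmhouse", "CATEGORY": "granola",
     "NUTRIENT": "low fat", "ALLERGEN": "nut free"},
]
# Attributes that the query study found in almost every query.
ANCHORS = ("NAME", "CATEGORY", "BRAND")


def synthetic_query(items, rng):
    """Return one synthetic labeled query as a list of (token, label) pairs."""
    item = rng.choice(items)                 # pick a food item at random
    k = rng.randint(1, 3)                    # 1-3 segments, as in the query study
    anchor = rng.choice(ANCHORS)             # simplification: always one anchor
    others = [attr for attr in item if attr != anchor]
    attrs = [anchor] + rng.sample(others, k - 1)
    rng.shuffle(attrs)                       # segments may appear in any order
    # Every token inherits the label of the attribute it was drawn from.
    return [(tok, attr) for attr in attrs for tok in item[attr].split()]


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(synthetic_query(FOOD_ITEMS, rng))
```

Each generated query is already token-labeled, so it can be fed to a sequence tagger such as an MEMM without manual annotation.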
2nd Approach: Segmentation & MaxEnt Classification
Query Segmentation
• Train a language model on the structured data text
• Use the model to find segment probabilities
• Find the maximum-likelihood (ML) segmentation through dynamic programming (DP)
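The three segmentation steps can be illustrated with a small sketch. It assumes a bigram model with add-alpha smoothing trained on attribute values, where each value is wrapped in start and end markers so that a segment's probability includes the cost of starting and ending there. The toy `VALUES` list, the smoothing constant, and all function names are assumptions for illustration, not details taken from the slides.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"

# Hypothetical attribute values standing in for the structured data text.
VALUES = ["gluten free", "high protein", "bars", "nut free", "low fat", "granola"]


def train_bigram_lm(values):
    """Count bigrams over attribute values padded with start/end markers."""
    bigrams, contexts, vocab = Counter(), Counter(), {END}
    for value in values:
        tokens = [START] + value.split() + [END]
        vocab.update(tokens[1:])
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def segment_logprob(tokens, lm, alpha=0.01):
    """Log-probability of a candidate segment under the smoothed bigram model."""
    bigrams, contexts, vocab_size = lm
    padded = [START] + list(tokens) + [END]
    return sum(
        math.log((bigrams[(p, c)] + alpha) / (contexts[p] + alpha * vocab_size))
        for p, c in zip(padded, padded[1:])
    )


def best_segmentation(query, lm, max_len=4):
    """Dynamic program: best[i] is the best log-probability of tokens[:i]."""
    tokens = query.split()
    n = len(tokens)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + segment_logprob(tokens[start:end], lm)
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow back-pointers from the end of the query to recover the segments.
    segments, end = [], n
    while end > 0:
        segments.append(" ".join(tokens[back[end]:end]))
        end = back[end]
    return segments[::-1]


if __name__ == "__main__":
    lm = train_bigram_lm(VALUES)
    print(best_segmentation("gluten free high protein bars", lm))
    # prints ['gluten free', 'high protein', 'bars']
```

In this toy model every segment boundary pays for a start and an end transition, and every unseen word pair pays a smoothing penalty, so the DP prefers cuts that line up with phrases seen in the data.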
Segment Annotation
• Annotate each segment with an attribute using a MaxEnt classifier
• Training: For each attribute, training examples come from the corresponding entries of the database products
Example query: gluten free high protein bars
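As a rough sketch of the annotation step: a MaxEnt classifier is a multinomial logistic regression, so a tiny version can be trained with stochastic gradient ascent on bag-of-words features. The training pairs below are hypothetical stand-ins for database entries, and the feature set, learning rate, epoch count, and lack of regularization are assumptions made to keep the example short.

```python
import math
from collections import defaultdict

# Hypothetical training pairs: attribute value text and the attribute it came from.
TRAIN = [
    ("gluten free", "ALLERGEN"), ("nut free", "ALLERGEN"),
    ("high protein", "NUTRIENT"), ("low fat", "NUTRIENT"),
    ("bars", "CATEGORY"), ("granola", "CATEGORY"),
]


def features(segment):
    """Bag-of-words indicator features plus a bias feature."""
    return [f"w={tok}" for tok in segment.split()] + ["bias"]


def predict_proba(segment, weights):
    """Softmax over per-label linear scores."""
    feats = features(segment)
    scores = {lab: sum(w.get(f, 0.0) for f in feats) for lab, w in weights.items()}
    top = max(scores.values())               # subtract the max for numerical stability
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    z = sum(exps.values())
    return {lab: e / z for lab, e in exps.items()}


def train_maxent(examples, epochs=200, lr=0.5):
    """Fit label weights by stochastic gradient ascent on the log-likelihood."""
    labels = sorted({lab for _, lab in examples})
    weights = {lab: defaultdict(float) for lab in labels}
    for _ in range(epochs):
        for text, gold in examples:
            probs = predict_proba(text, weights)
            for lab in labels:
                # Gradient for each active feature: observed minus expected count.
                grad = (1.0 if lab == gold else 0.0) - probs[lab]
                for f in features(text):
                    weights[lab][f] += lr * grad
    return weights


def annotate(segments, weights):
    """Attach the most probable attribute label to each segment."""
    result = []
    for seg in segments:
        probs = predict_proba(seg, weights)
        result.append((seg, max(probs, key=probs.get)))
    return result


if __name__ == "__main__":
    model = train_maxent(TRAIN)
    print(annotate(["gluten free", "high protein", "bars"], model))
```

Because the training examples come straight from attribute values, no labeled queries are needed; the classifier only ever sees (value, attribute) pairs.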
Results
[Bar chart: Accuracy of Segment Classification. Bars: 1st Approach, 2nd Approach (2-grams), 2nd Approach (3-grams). Vertical axis: accuracy, ticks from 0 to 0.9.]
Conclusions – Future Work
• The combination of a language model, dynamic programming, and MaxEnt classification provides very good accuracy without labeled data
• It would be interesting to compare with NER trained on a large labeled set
• We also plan to compare with the state-of-the-art algorithm in the context of a research submission
More Results…
• Evangelos
• March 12, 2011 @ 9.14am
• 19.5 inches
• 6lbs 11oz