Query Segmentation and Structured Annotation via NLP

Rifat Reza Joye, Panagiotis Papadimitriou

Problem
• Caloricious.com:
  – Semantic search engine for food items
• Free-text queries over structured data
  – Query: gluten free high protein bars
  – Data: each food item is a database record with attributes name, brand, category, nutrients, allergens, ...
• Query segmentation and structured annotation:
  – gluten free → ALLERGEN, high protein → NUTRIENT, bars → CATEGORY
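
To make the target output concrete, here is a minimal Python sketch of what an annotated query could look like and how it might be matched against food records. The slides only name the attributes, so the `FoodItem` layout, the attribute labels used as dictionary keys, the substring matching in the hypothetical `matches` and `search` helpers, and the sample items are all assumptions, not Caloricious.com's actual schema or search logic.

```python
from dataclasses import dataclass


# Hypothetical record layout: the slides only list the attribute names.
@dataclass
class FoodItem:
    name: str
    attributes: dict[str, list[str]]  # e.g. {"ALLERGEN": ["gluten free"]}


# An annotated query is a list of (segment, ATTRIBUTE) pairs.
AnnotatedQuery = list[tuple[str, str]]


def attribute_values(item: FoodItem, attribute: str) -> list[str]:
    """Return the item's text values for one attribute label."""
    if attribute == "NAME":
        return [item.name]
    return item.attributes.get(attribute, [])


def matches(item: FoodItem, query: AnnotatedQuery) -> bool:
    """True if every segment occurs in some value of its annotated attribute."""
    return all(
        any(segment.lower() in value.lower() for value in attribute_values(item, attribute))
        for segment, attribute in query
    )


def search(items: list[FoodItem], query: AnnotatedQuery) -> list[FoodItem]:
    """Naive structured search: keep the items that satisfy every annotation."""
    return [item for item in items if matches(item, query)]


# Hypothetical sample data.
ITEMS = [
    FoodItem("Choco Crunch Bar", {"BRAND": ["Acme"], "CATEGORY": ["protein bars"],
                                  "NUTRIENT": ["high protein", "low sugar"],
                                  "ALLERGEN": ["gluten free"]}),
    FoodItem("Oat Bar", {"BRAND": ["Acme"], "CATEGORY": ["granola bars"],
                         "NUTRIENT": ["high fiber"], "ALLERGEN": ["gluten free"]}),
    FoodItem("Whey Bar", {"BRAND": ["Acme"], "CATEGORY": ["protein bars"],
                          "NUTRIENT": ["high protein"], "ALLERGEN": ["contains milk"]}),
]

QUERY: AnnotatedQuery = [("gluten free", "ALLERGEN"), ("high protein", "NUTRIENT"),
                         ("bars", "CATEGORY")]

if __name__ == "__main__":
    print([item.name for item in search(ITEMS, QUERY)])  # ['Choco Crunch Bar']
```

Without the annotations, "bars" or "protein" could match names, categories, or nutrients alike; the attribute labels are what let each segment be checked against the right field.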

1st Approach: MEMM with Synthetic Training Data

• The task looks like an instance of named-entity recognition (NER)
• Problem: there are no labeled queries to train a maximum-entropy Markov model (MEMM)
• Solution: generate synthetic labeled queries
  – Study of 100 queries:
    • 96% of queries contain 1–3 segments
    • In 98% of queries, one of the segments refers to Name, Category, or Brand
  – Algorithm:
    • Pick a food item at random
    • Pick 1–3 attributes and generate a query
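
The generation algorithm above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' generator: the record layout, the hypothetical helpers `synthetic_query` and `generate`, the rule that every query includes one Name/Category/Brand attribute (motivated by the 98% observation), and the random segment order are all assumed.

```python
import random

# Per the query study, 98% of queries mention a Name, Category or Brand.
CORE_ATTRIBUTES = {"NAME", "CATEGORY", "BRAND"}


def synthetic_query(item: dict[str, list[str]], rng: random.Random) -> list[tuple[str, str]]:
    """Build one labeled query from a single food item.

    `item` maps attribute labels to text values (hypothetical layout) and must
    have at least one value for NAME, CATEGORY or BRAND.
    Returns (token, attribute) pairs usable as MEMM training data.
    """
    available = [a for a, values in item.items() if values]
    core = [a for a in available if a in CORE_ATTRIBUTES]
    n = rng.randint(1, min(3, len(available)))        # 1-3 segments
    chosen = [rng.choice(core)]                        # assumption: always one core attribute
    chosen += rng.sample([a for a in available if a != chosen[0]], n - 1)
    rng.shuffle(chosen)                                # assumption: random segment order
    labeled = []
    for attribute in chosen:
        value = rng.choice(item[attribute])
        labeled.extend((token, attribute) for token in value.lower().split())
    return labeled


def generate(items: list[dict[str, list[str]]], count: int,
             seed: int = 0) -> list[list[tuple[str, str]]]:
    """Pick a food item at random `count` times and generate a query from each."""
    rng = random.Random(seed)
    return [synthetic_query(rng.choice(items), rng) for _ in range(count)]


# Hypothetical sample records.
ITEMS = [
    {"NAME": ["Choco Crunch Bar"], "BRAND": ["Acme"], "CATEGORY": ["protein bars"],
     "NUTRIENT": ["high protein", "low sugar"], "ALLERGEN": ["gluten free"]},
    {"NAME": ["Oat Bar"], "BRAND": ["Acme"], "CATEGORY": ["granola bars"],
     "NUTRIENT": ["high fiber"], "ALLERGEN": []},
]

if __name__ == "__main__":
    for query in generate(ITEMS, 3):
        print(query)
```

Because each attribute is used at most once per query, adjacent segments always carry different labels, so segment boundaries can be read off label changes when these token/label pairs are used to train a sequence tagger.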

2nd Approach: Segmentation & MaxEnt Classification

Query Segmentation
• Train a language model on the text of the structured data
• Use the model to compute segment probabilities
• Find the maximum-likelihood (ML) segmentation through dynamic programming (DP)
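
The slides do not say how segment probabilities are derived from the language model, so this Python sketch makes one plausible choice: each attribute value in the database is treated as a sentence wrapped in `<s>` and `</s>`, the model uses add-k (Lidstone) smoothing, and a segmentation's probability is the product of its segments' probabilities. `NgramLM`, `segment`, `max_len`, `k`, and the toy corpus are hypothetical; `order=2` and `order=3` presumably correspond to the 2-gram and 3-gram variants in the results.

```python
import math
from collections import Counter


class NgramLM:
    """n-gram language model with add-k (Lidstone) smoothing.

    Each attribute value is one "sentence" wrapped in <s> ... </s>, so a
    segment scores well when it looks like a complete attribute value.
    """

    def __init__(self, texts: list[str], order: int = 2, k: float = 0.1):
        self.order, self.k = order, k
        self.counts: Counter = Counter()   # (history, word) -> count
        self.context: Counter = Counter()  # history -> count
        vocab = {"</s>"}
        for text in texts:
            tokens = ["<s>"] * (order - 1) + text.lower().split() + ["</s>"]
            vocab.update(tokens[order - 1:])
            for i in range(order - 1, len(tokens)):
                history = tuple(tokens[i - order + 1:i])
                self.counts[(history, tokens[i])] += 1
                self.context[history] += 1
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def segment_logprob(self, words: list[str]) -> float:
        """log P(<s> w1 ... wm </s>) for one candidate segment."""
        tokens = ["<s>"] * (self.order - 1) + [w.lower() for w in words] + ["</s>"]
        total = 0.0
        for i in range(self.order - 1, len(tokens)):
            history = tuple(tokens[i - self.order + 1:i])
            numerator = self.counts[(history, tokens[i])] + self.k
            denominator = self.context[history] + self.k * self.vocab_size
            total += math.log(numerator / denominator)
        return total


def segment(query: str, lm: NgramLM, max_len: int = 4) -> list[str]:
    """Maximum-likelihood segmentation by dynamic programming.

    best[i] holds the best log-probability of the first i words; back[i]
    records where the last segment of that best prefix starts.
    """
    words = query.lower().split()
    n = len(words)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j] + lm.segment_logprob(words[j:i])
            if score > best[i]:
                best[i], back[i] = score, j
    segments, i = [], n
    while i > 0:  # follow back-pointers from the end of the query
        segments.append(" ".join(words[back[i]:i]))
        i = back[i]
    return segments[::-1]


# Hypothetical attribute values standing in for the product database text.
CORPUS = ["gluten free", "dairy free", "high protein", "high fiber",
          "low sugar", "bars", "granola bars", "cereal"]

if __name__ == "__main__":
    lm = NgramLM(CORPUS, order=2)
    print(segment("gluten free high protein bars", lm))
    # ['gluten free', 'high protein', 'bars'] for this corpus
```

With a bigram model the score factors over the gaps between words, so (ignoring the `max_len` cap) a boundary between w_i and w_{i+1} is kept exactly when P(</s> | w_i) · P(w_{i+1} | <s>) exceeds P(w_{i+1} | w_i). In the toy corpus, "free" and "protein" end attribute values while the bigrams "free high" and "protein bars" never occur, so the query splits there; with `order=3` the boundary decisions are no longer independent.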

Segment Annotation
• Annotate each segment with an attribute using a MaxEnt classifier
• Training: for each attribute, the training examples come from the corresponding entries of the database products

Example query: gluten free high protein bars
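
A MaxEnt classifier is multinomial logistic regression. The Python sketch below trains one from database entries, as the Training bullet describes, but the feature set (a bias plus one feature per word), the SGD training with L2 regularization, the hyperparameters, the helper `examples_from_records`, and the sample `RECORDS` are assumptions rather than the authors' configuration.

```python
import math
from collections import defaultdict


def features(segment: str) -> list[str]:
    """Assumed feature set: a bias feature plus one feature per word."""
    return ["bias"] + ["w=" + w for w in segment.lower().split()]


class MaxEnt:
    """Multinomial logistic regression (MaxEnt) trained by SGD with L2."""

    def __init__(self, labels: list[str]):
        self.labels = labels
        self.weights: defaultdict = defaultdict(float)  # (feature, label) -> weight

    def probabilities(self, feats: list[str]) -> list[float]:
        scores = [sum(self.weights[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def train(self, examples, epochs: int = 100, lr: float = 0.5, l2: float = 0.01):
        for _ in range(epochs):
            for text, gold in examples:
                feats = features(text)
                probs = self.probabilities(feats)
                for y, p in zip(self.labels, probs):
                    # Log-likelihood gradient: observed minus expected count.
                    grad = (1.0 if y == gold else 0.0) - p
                    for f in feats:
                        w = self.weights[(f, y)]
                        self.weights[(f, y)] = w + lr * (grad - l2 * w)

    def classify(self, segment: str) -> str:
        probs = self.probabilities(features(segment))
        return self.labels[probs.index(max(probs))]


def examples_from_records(records):
    """One training example per attribute value found in the product records."""
    return [(value, attribute) for record in records
            for attribute, values in record.items() for value in values]


# Hypothetical product records (attribute label -> values).
RECORDS = [
    {"ALLERGEN": ["gluten free"], "NUTRIENT": ["high protein", "low sugar"], "CATEGORY": ["bars"]},
    {"ALLERGEN": ["dairy free"], "NUTRIENT": ["high fiber"], "CATEGORY": ["granola bars"]},
    {"ALLERGEN": ["nut free"], "NUTRIENT": ["low fat"], "CATEGORY": ["cereal"]},
    {"ALLERGEN": ["contains peanuts"], "CATEGORY": ["chips"]},
]

if __name__ == "__main__":
    examples = examples_from_records(RECORDS)
    model = MaxEnt(sorted({label for _, label in examples}))
    model.train(examples)
    for seg in ["gluten free", "high protein", "bars"]:
        print(seg, "->", model.classify(seg))
    # gluten free -> ALLERGEN, high protein -> NUTRIENT, bars -> CATEGORY
```

Training directly on attribute values is what removes the need for labeled queries: each database entry is already a (text, attribute) pair.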

Results

[Figure: accuracy of segment classification (y-axis, 0 to 0.9) for the 1st approach, the 2nd approach with 2-grams, and the 2nd approach with 3-grams]

Conclusions – Future Work

• The combination of a language model, dynamic programming, and MaxEnt classification gives very good accuracy without any labeled data

• It would be interesting to compare against NER trained on a large labeled set

• We also plan to compare with the state-of-the-art algorithm as part of a research submission.

More Results…

• Evangelos

• March 12, 2011 @ 9.14am

• 19.5 inches

• 6lbs 11oz