04/18/23 20:57 1
CSE 574 Extracting, Managing & Personalizing Web Information
• Staffing
  – Dan Weld
  – Raphael Hoffmann
• Content
  – Intersection of AI, ML, DB & HCI
• Student Responsibilities
  – Reading, Reports, Discussion
  – Project (for those taking 3 credits)
Why Information Extraction?
• Next-Generation Search
  – Citeseer, Google Scholar, MSRA Libra
  – Google product search
  – Flipdog
  – Zvents
  – Zoominfo
• Question Answering
Making Structured Content
• Information Extraction
  – E.g. Google Scholar
  – Cons: Noisy
• Communal Content Creation
  – E.g. Wikipedia
  – Cons: Bootstrapping & Incentives
Why Managing?
• Select
• Store, Index, Aggregate
• Search, Query, Explore
• Share, Collaborate, “Publish”
Example: Personalized Portals
cf. DBlife, Rexa, Dontcheva UIST-07
Preliminary Schedule
• Information Extraction
  – Traditional Machine Learning Approaches
  – Self-Supervised Methods
  – Other Issues: Coreference & Ontology
• Collaborative Content Creation & UI Issues
  – Applying Constraints from Interaction to Learning
  – Decision Theoretic Interaction
  – Faceted Interfaces
• Community Information Management
  – Extraction over Evolving Text
  – Data Provenance
  – Mashups & Personalized Web
• Next-Generation Search
  – Inference, Textual Entailment, Machine Reading
  – Entity Search
For next time
• Read
  – Agichtein, Gravano. Snowball: Extracting Relations from Large Plain-Text Collections.
• Add yourself to mailing list
• Look at papers on website wiki
  – Add new ones
  – Add summary (different from report)
  – Note if you wish to present one
• Think about project (form a group?)