Upload
truonglien
View
222
Download
1
Embed Size (px)
Citation preview
Craig Knoblock University of Southern California 1
Information AgentsInformation Agents
(Based on an invited talk (Based on an invited talk at IJCAIat IJCAI’’03)03)
Craig A. KnoblockCraig A. Knoblock
University of Southern CaliforniaUniversity of Southern California
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 22
IntroductionIntroductionThe Web is a tremendous resource, but The Web is a tremendous resource, but designed for browsingdesigned for browsing•• Sites provide limited capabilities for Sites provide limited capabilities for
personalizationpersonalization•• Few sites are designed to be integrated with Few sites are designed to be integrated with
othersothers
Goal: Develop technology to rapidly Goal: Develop technology to rapidly construct personal software agentsconstruct personal software agents•• Build agents that can perform retrieval, Build agents that can perform retrieval,
integration, and monitoring tasks on any integration, and monitoring tasks on any online sourceonline source
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 33
OutlineOutline
The Electric Elves: Information The Electric Elves: Information agents for monitoring travelagents for monitoring travelWrapping online sourcesWrapping online sourcesLinking records across sourcesLinking records across sourcesEfficiently executing agent plansEfficiently executing agent plansConclusionsConclusions
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 44
OutlineOutline
The Electric Elves: Information The Electric Elves: Information agents for monitoring travelagents for monitoring travelWrapping online sourcesWrapping online sourcesLinking records across sourcesLinking records across sourcesEfficiently executing agent plansEfficiently executing agent plansConclusionsConclusions
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 55
Electric Elves ProjectElectric Elves Project[[ChalupskyChalupsky et al, 2001]et al, 2001]
Elves project goal: Apply agent technology to Elves project goal: Apply agent technology to support human organizationssupport human organizations• Develop software agents that automate routine tasks • Enable software agents and humans to work together• Support coordination of tasks
Applications: Office Elves and Travel ElvesApplications: Office Elves and Travel Elves
W W WA g e n t P r o x i e sF o r P e o p l e
I n f o r m a t i o nA g e n t s
O n t o l o g y - b a s e dM a t c h m a k e r s
GRID
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 66
Agents for Monitoring TravelAgents for Monitoring Travel[Ambite et al, 2002][Ambite et al, 2002]
•• Office Elves created as an application of the Office Elves created as an application of the Electric Elves Electric Elves
•• Given travel itinerary, generates set of agents for Given travel itinerary, generates set of agents for anticipating travelanticipating travel--related failures and related failures and opportunities:opportunities:•• Price changesPrice changes•• Schedule changesSchedule changes•• Flight delays & cancellationsFlight delays & cancellations•• Earlier and close connectionsEarlier and close connections•• Finding the closest restaurant given GPS coordinatesFinding the closest restaurant given GPS coordinates
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 77
Travel AssistantTravel Assistant
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 88
Monitoring Travel Plans Monitoring Travel Plans
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 99
Agents Deployed to Agents Deployed to Monitor Travel ItineraryMonitor Travel Itinerary
TravelItinerary W W W
A g e n t P r o x i e sF o r P e o p l e
I n f o r m a t i o nA g e n t s
O n t o l o g y - b a s e dM a t c h m a k e r s
GRID
Flight Prices &Schedules
WeatherFlight Status
Restaurants
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1010
Monitoring AgentsMonitoring AgentsFlightFlight--Status Agent: Status Agent: •• Flight delayed message:Flight delayed message:
Your United Airlines flight 190 has been delayed. Your United Airlines flight 190 has been delayed.
It was originally scheduled to depart at 11:45 AM It was originally scheduled to depart at 11:45 AM and is now scheduled to depart at 12:30 PM. and is now scheduled to depart at 12:30 PM.
The new arrival time is 7:59 PM.The new arrival time is 7:59 PM.
•• Flight cancelled message:Flight cancelled message:Your Delta Air Lines flight 200 has been cancelled.Your Delta Air Lines flight 200 has been cancelled.
•• Fax to hotel message:Fax to hotel message:Attention: Registration Desk Attention: Registration Desk
I am sending this message on behalf of David I am sending this message on behalf of David PynadathPynadath, who has a reservation at your hotel. David , who has a reservation at your hotel. David PynadathPynadath is on United Airlines 190, which is now is on United Airlines 190, which is now scheduled to arrive at IAD at 7:59 PM. Since the scheduled to arrive at IAD at 7:59 PM. Since the flight will be arriving late, I would like to request flight will be arriving late, I would like to request that you indicate this in the reservation so that the that you indicate this in the reservation so that the room is not given away. room is not given away.
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1111
Monitoring AgentsMonitoring AgentsAirfare Agent: Airfare dropped messageAirfare Agent: Airfare dropped messageThe airfare for your American Airlines itineraryThe airfare for your American Airlines itinerary
(IAD (IAD -- LAX) dropped to $281.LAX) dropped to $281.
EarlierEarlier--Flight Agent: Earlier flights messageFlight Agent: Earlier flights messageThe status of your currently scheduled flight is:The status of your currently scheduled flight is:
# 190 LAX (11:45 AM) # 190 LAX (11:45 AM) -- IAD (7:29 PM) 45 minutes Late IAD (7:29 PM) 45 minutes Late
If you would like to return earlier, the followingIf you would like to return earlier, the following
United Airlines flights will arrive earlier than your United Airlines flights will arrive earlier than your
scheduled flights:scheduled flights:
# 946 LAX (8:31 AM) # 946 LAX (8:31 AM) -- IAD (3:35 PM) 11 minutes LateIAD (3:35 PM) 11 minutes Late
----------------
# 388 LAX (9:25 AM) # 388 LAX (9:25 AM) -- DEN (12:25 PM) 10 minutes Late DEN (12:25 PM) 10 minutes Late
# 1534 DEN (1:20 PM) # 1534 DEN (1:20 PM) -- IAD (6:06 PM) On TimeIAD (6:06 PM) On Time
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1212
250
750
1250
1750
2250
12/8/2002 12/13/2002 12/18/2002 12/23/2002 12/28/2002 1/2/2003 1/7/2003Date
Pric
e
American Airlines flights192 & 223, LAX-BOS, departing on Jan. 2 & 9
Learning to Make Predictions:Learning to Make Predictions:To Buy or Not To BuyTo Buy or Not To Buy
Agents can go beyond gathering and monitoring Agents can go beyond gathering and monitoring online sourcesonline sources•• They can help make decisions by exploiting the wealth They can help make decisions by exploiting the wealth
of online informationof online information
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1313
Learning to Make Predictions:Learning to Make Predictions:To Buy or Not To BuyTo Buy or Not To Buy
Agents can go beyond gathering and monitoring Agents can go beyond gathering and monitoring online sourcesonline sources•• They can help make decisions by exploiting the wealth They can help make decisions by exploiting the wealth
of online informationof online information
Developed a learning system, Hamlet, to predict Developed a learning system, Hamlet, to predict whether it is better to wait or buy whether it is better to wait or buy [[EtzioniEtzioni, Knoblock, , Knoblock, TuchindaTuchinda, Yates, KDD, Yates, KDD’’03]03]
Collected data on airline prices over several Collected data on airline prices over several monthsmonthsLearned a model of the pricingLearned a model of the pricingIn our simulation on collected data, Hamlet In our simulation on collected data, Hamlet saved $198,074 out of a possible $320,572 saved $198,074 out of a possible $320,572 (61.8% of optimal)(61.8% of optimal)
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1414
Related Agent SystemsRelated Agent Systems
Some notable deployed systemsSome notable deployed systems•• Internet Internet SoftbotSoftbot [[EtzioniEtzioni & Weld, & Weld, ’’94]94]•• BargainFinderBargainFinder [[KrulwichKrulwich, , ’’96]96]•• ShopBotShopBot [[PerkowitzPerkowitz et al. et al. ’’96]96]•• Warren [Decker et al., Warren [Decker et al., ’’97]97]•• Electric Elves [Electric Elves [ChalupskyChalupsky et al., et al., ’’01]01]•• and many othersand many others……
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1515
OutlineOutline
The Electric Elves: Information The Electric Elves: Information agents for monitoring travelagents for monitoring travelWrapping online sourcesWrapping online sourcesLinking records across sourcesLinking records across sourcesEfficiently executing agent plansEfficiently executing agent plansConclusionsConclusions
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1616
Wrappers for Live Access Wrappers for Live Access to Online Sourcesto Online Sources
HTML sources turned into agentHTML sources turned into agent--friendly friendly sourcessources
<YAHOO_WEATHER>- <ROW>
<TEMP>25</TEMP> <OUTLOOK>Sunny</OUTLOOK> <HI>32</HI> <LO>19</LO> <APPARTEMP>25</ APPARTEMP > <HUMIDITY>35%</HUMIDITY> <WIND>E/10 km/h</WIND> <VISIBILITY>20 km</VISIBILITY> <DEWPOINT>9</DEWPOINT> <BAROMETER>959 mb</BAROMETER> </ROW>
</YAHOO_WEATHER>
Wrapper
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1717
Extraction RulesExtraction RulesWrapper defined by a set of extraction Wrapper defined by a set of extraction rulesrulesExtraction rule: sequence of landmarksExtraction rule: sequence of landmarks•• Define both beginning and end of required Define both beginning and end of required
information on the pageinformation on the page
Name: Joel’s <p> Phone: <i> (310) 777-1111 </i><p> Review: …
SkipTo(Phone) SkipTo(<i>) SkipTo(</i>)
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1818
Learning the Extraction RulesLearning the Extraction Rules[[MusleaMuslea, Minton, & Knoblock, 01], Minton, & Knoblock, 01]
Hierarchical wrapper inductionHierarchical wrapper induction•• Decomposes a hard problem into several easier Decomposes a hard problem into several easier
onesones•• Extracts items independently of each otherExtracts items independently of each other
InductiveLearningSystem
ExtractionRules
EC Tree
Labeled Pages
GUI
InductiveLearningSystem
ExtractionRules
EC TreeEC Tree
Labeled Pages
GUI
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 1919
Search Space: SkipTo( ( )
… SkipTo(Phone) SkipTo(:) SkipTo( ( ) ...
SkipTo( <b> ( ) ... SkipTo(Phone) SkipTo( ( ) ... SkipTo(:) SkipTo(()
Training Examples:Name: Del Taco <p> Phone (toll free) : <b> ( 800 ) 123-4567 </b><p>Cuisine ...
Name: Burger King <p> Phone : ( 310 ) 987-9876 <p> Cuisine: …
Example of Rule InductionExample of Rule Induction
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2020
Active Learning Active Learning
Problem: May require large number of Problem: May require large number of examples to achieve high accuracyexamples to achieve high accuracyExploit active learningExploit active learning•• System selects most informative examples to System selects most informative examples to
labellabel•• Want to achieve 100% accuracy with as few Want to achieve 100% accuracy with as few
examples as possibleexamples as possible
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2121
Name: Joel’s <p> Phone: (310) 777-1111 <p>Review: The chef…
SkipTo( Phone: )
Name: Kim’s <p> Phone: (213) 757-1111 <p>Review: Korean …
Name: Chez Jean <p> Phone: (310) 666-1111 <p> Review: …Name: Burger King <p> Phone:(818) 789-1211 <p> Review: ...Name: Café del Rey <p> Phone: (310) 111-1111 <p> Review: ...Name: KFC <p> Phone:<b> (800) 111-7171 </b> <p> Review:...
Unlabeled Examples
Training Examples
Which Example to Label NextWhich Example to Label Next
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2222
MultiMulti--view Learningview Learning[[MusleaMuslea, Minton, Knoblock , Minton, Knoblock ’’00]00]
Two ways to find start of the phone number:
Name: KFC <p> Phone: (310) 111-1111 <p> Review: Fried chicken …
SkipTo( Phone: ) BackTo( ( Nmb ) )
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2323
++-
-+-
RULE 1 RULE 2
Unlabeled data
+-
Labeled data
MultiMulti--view Learning: Coview Learning: Co--TestingTesting
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2424
Name: Joel’s <p> Phone: (310) 777-1111 <p>Review: ...SkipTo( Phone: )
Name: Kim’s <p> Phone: (213) 757-1111 <p>Review: ...
BackTo( (Nmb) )
CoCo--Testing for Wrapper InductionTesting for Wrapper Induction
Name: Chez Jean <p> Phone: (310) 666-1111 <p> Review: ...
Name: KFC <p> Phone:<b> (800) 111-7171 </b> <p> Review:...
Name: Burger King <p> Phone: (818) 789-1211 <p> Review: ...
Name: Café del Rey <p> Phone: (310) 111-1111 <p> Review: ...
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2525
SkipTo(Phone:) BackTo( (Nmb) )
… Phone: (800) 171-1771 <p> Fax: (111) 111-1111 <p> Review: …
… Phone:<i> (800) 555-5555 </i><p> Review: A century ago (1891) …
… Phone: (800) 171-1771 <p> Fax: (111) 111-1111 <p> Review: …
Not All Queries are Not All Queries are Equally InformativeEqually Informative
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2626
Learn Learn ““content descriptioncontent description”” for item to be extractedfor item to be extracted
•• Too general for extractionToo general for extraction( ( NmbNmb ) ) NmbNmb –– NmbNmb cancan’’t tell a t tell a phone numberphone number from a from a fax fax numbernumber
•• Useful at Useful at discriminatingdiscriminating among among query candidatesquery candidates
•• Learned content descriptionsLearned content descriptionsStarts with: Starts with: (( NmbNmb ))Ends with: Ends with: NmbNmb –– NmbNmbContains: Contains: NmbNmb PunctPunctLength: Length: [6,6][6,6]
Weak ViewsWeak Views[[MusleaMuslea, Minton, Knoblock , Minton, Knoblock ’’03]03]
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2727
NaNaïïve & Aggressive Cove & Aggressive Co--TestingTesting
NaNaïïve Cove Co--Testing:Testing:•• Query: randomly chosen contention pointQuery: randomly chosen contention point•• Output: rule with fewest mistakes on queriesOutput: rule with fewest mistakes on queries
Aggressive CoAggressive Co--Testing:Testing:•• Query: contention point that most violates Query: contention point that most violates
weak viewweak view•• Output: committee vote (2 rules + weak view)Output: committee vote (2 rules + weak view)
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2828
Results for Random SamplingResults for Random Sampling33 most difficult of the 140 tasks from [33 most difficult of the 140 tasks from [KushmerickKushmerick ’’97]97]
0
5
10
15
20
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20+
Examples to 100% accuracy
Random samplingExtraction
Tasks
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 2929
Results for Active LearningResults for Active Learning
0
5
10
15
20
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20+
Examples to 100% accuracy
Naïve Co-Testing Random samplingExtraction
Tasks
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3030
Results for Active Learning Results for Active Learning with Weak Viewswith Weak Views
0
5
10
15
20
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20+
Examples to 100% accuracy
Aggressive Co-Testing Naïve Co-Testing Random sampling
ExtractionTasks
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3131
OutlineOutline
The Electric Elves: Information The Electric Elves: Information agents for monitoring travelagents for monitoring travelWrapping online sourcesWrapping online sourcesLinking records across sourcesLinking records across sourcesEfficiently executing agent plansEfficiently executing agent plansConclusionsConclusions
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3232
Record Linkage Record Linkage (Object Consolidation)(Object Consolidation)
Name Street Phone
Art’s Deli 12224 Ventura Boulevard 818-756-4124
Teresa's 80 Montague St. 718-520-2910
Steakhouse The 128 Fremont St. 702-382-1600
Les Celebrites 155 W. 58th St. 212-484-5113
Name Street Phone
Art’s Delicatessen 12224 Ventura Blvd. 818/755-4100
Teresa's 103 1st Ave. between 6th and 7th Sts. 212/228-0604
Binion's Coffee Shop 128 Fremont St. 702/382-1600
Les Celebrites 160 Central Park S 212/484-5113
Zagat’s Restaurants Dept. of Health
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3333
Active Learning to Active Learning to Determine Matched RecordsDetermine Matched Records
[[TejadaTejada, Knoblock, Minton , Knoblock, Minton ’’01,01,’’02]02]
Learn importance of attributes for matching recordsLearn importance of attributes for matching records
Zagat’s
Dept of Health
Art’s Deli 12224 Ventura Boulevard 818-756-4124
Art’s Delicatessen 12224 Ventura Blvd. 818/755-4100
Name Street Phone
Mapping rules:
Name > .9 & Street > .87 => mapped
Name > .95 & Phone > .96 => mapped
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3434
Mapping Rule LearnerMapping Rule Learner
Set of MappedObjects
Choose initial examples
Generate committee of learners
Learn Rules
ClassifyExamples
Votes Votes Votes
Choose Example
USERLearn Rules
ClassifyExamples
Learn Rules
ClassifyExamples
Label
Label
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3535
Committee DisagreementCommittee DisagreementChooses an example based on the Chooses an example based on the disagreement of the query committeedisagreement of the query committee
CPK, California Pizza Kitchen is the most CPK, California Pizza Kitchen is the most informative exampleinformative example
Art’s Deli, Art’s DelicatessenCPK, California Pizza KitchenCa’Brea, La Brea Bakery
Yes Yes YesYes No Yes No No No
Examples M1 M2 M3Committee
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3636
Exploiting Secondary Exploiting Secondary Sources for Record LinkageSources for Record Linkage[[MichalowskiMichalowski, Thakkar, Knoblock , Thakkar, Knoblock ’’03]03]
Primary data source may be insufficient to Primary data source may be insufficient to determine mappingsdetermine mappingsSecondary sources can help reduce the Secondary sources can help reduce the uncertaintyuncertaintyExamples of secondary sourcesExamples of secondary sources•• GeocoderGeocoder
Maps street addresses into lat/long coordinatesMaps street addresses into lat/long coordinates•• Business directories Business directories
Provide company officers and locationsProvide company officers and locations•• Area code updatesArea code updates
Provide changes in area codes over timeProvide changes in area codes over time
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3737
Missing MatchesMissing Matches
Record Linkage
Matched Records
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3838
Exploiting a Exploiting a GeocoderGeocoder
Record LinkageMatched Records
Secondary Source
26 Beach Cafe26 Washington St.Venice, CA
26 Beach Cafe26 Washington Boulevard Marina Del Rey, Calif
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 3939
Preliminary Results:Preliminary Results:Secondary SourcesSecondary Sources
Secondary source reduces the depth of the decision tree that needs to be learned
# Labeled Examples
Total Correct Matches
Precision RecallAverage DT Depth Precision Recall
Average DT Depth
25 109 51% 33% 5 66% 51% 135 109 73% 57% 8 81% 68% 350 109 83% 81% 10 85% 85% 3
Without Secondary Source With Secondary Source
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4040
OutlineOutline
The Electric Elves: Information The Electric Elves: Information agents for monitoring travelagents for monitoring travelWrapping online sourcesWrapping online sourcesLinking records across sourcesLinking records across sourcesEfficiently executing agent plansEfficiently executing agent plansConclusionsConclusions
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4141
Efficiently Executing Efficiently Executing Agent PlansAgent Plans
ProblemProblem•• Information gathering may involve accessing Information gathering may involve accessing
and integrating data from many sourcesand integrating data from many sources•• Total time to execute these plans may be large Total time to execute these plans may be large
Why?Why?•• Slow remote sourcesSlow remote sources•• Unpredictable network latenciesUnpredictable network latencies•• Binding patterns Binding patterns
Source cannot be queried until a previous query has Source cannot be queried until a previous query has been answeredbeen answered
•• Result: execution is often I/OResult: execution is often I/O--boundbound
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4242
Theseus Agent Execution SystemTheseus Agent Execution System[[BarishBarish & Knoblock, JAIR& Knoblock, JAIR’’05]05]
Plan languagePlan language and and execution systemexecution system for Webfor Web--based information integrationbased information integration•• Expressive enough for monitoring a variety of sourcesExpressive enough for monitoring a variety of sources•• Efficient enough for realEfficient enough for real--time monitoringtime monitoring
TheseusExecutor
PLAN myplan {INPUT: xOUTPUT: y
BODY {Op (x : y)
}}
010101010101100001110110101111010101010101
PlanInput Data
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4343
Streaming DataflowStreaming DataflowPlans consist of a network of operatorsPlans consist of a network of operators•• ExamplesExamples: : WrapperWrapper, , SelectSelect, etc., etc.•• Operators produce and consume dataOperators produce and consume data•• Operators Operators ““firefire”” upon any input dataupon any input data
Data passed as Data passed as tuplestuples of a relationof a relation
Wrapper
Select
Join
WrapperAddress
100 Main St., Santa Monica, 90292520 4th St. Santa Monica, 90292
2 Ocean Blvd, Venice, 90292
City State Max PriceSanta Monica CA 200000
Input relation Output relationPlan
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4444
MUL
MUL
ADD
a b c d
Parallelism in Streaming DataflowParallelism in Streaming DataflowDataflowDataflow•• Operations scheduled by data Operations scheduled by data
availabilityavailabilityIndependent operations execute in parallelIndependent operations execute in parallelMaximizes horizontal parallelismMaximizes horizontal parallelism
•• ExampleExample: computing : computing (a*b) + (c*d)
StreamingStreaming•• Operations emit data as soon as Operations emit data as soon as
possiblepossibleIndependent data processed in parallelIndependent data processed in parallelMaximizes vertical parallelismMaximizes vertical parallelism
Producer
Consumer
MUL MUL
ADD
a b c d
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4545
CarInfoCarInfo AgentAgent
Agent for recommending used cars:Agent for recommending used cars:•• Combine information fromCombine information from
Prices of used carsPrices of used carsSafety ratingsSafety ratingsReviews Reviews
•• Example:Example:2002 Midsize coupe/hatchback2002 Midsize coupe/hatchback$4K$4K--$12K, $12K, No No OldsmobilesOldsmobiles
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4646
The The CarInfoCarInfo agentagent1. Locate cars that
meet criteria- Edmunds.com
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4747
The The CarInfoCarInfo agentagent1. Locate cars that
meet criteria- Edmunds.com
2. Filter outOldsmobiles
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4848
The The CarInfoCarInfo agentagent1. Locate cars that
meet criteria- Edmunds.com
2. Filter outOldsmobiles
3. Gather safetyreviews for each
- NHSTA.gov
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 4949
The The CarInfoCarInfo agentagent1. Locate cars that
meet criteria- Edmunds.com
2. Filter outOldsmobiles
3. Gather safetyreviews for each
- NHSTA.gov
4. Gather detailedreviews of each
- ConsumerGuide.com
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5050
ConsumerGuideConsumerGuide NavigationNavigationRequires navigating through multiple pagesRequires navigating through multiple pages
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5151
DataflowDataflow--style style CarInfoCarInfo agent planagent plan
WRAPPERConsumerGuide
Search
(Midsize coupe/hatchback,$4000 to $12000,2002)
((http://cg.com/summ/20812.htm),other summary review URLs)
((http://cg.com/full/20812.htm),other full review URLs)
search criteria
WRAPPERConsumerGuide
Summary
WRAPPERConsumerGuide
Full Review
(car reviews)WRAPPER
EdmundsSearch
((Oldsmobile Alero), (Dodge Stratus), (Pontiac Grand Am), (Mercury Cougar))
JOIN
SELECTmaker !=
"Oldsmobile"
WRAPPERNHTSASearch
(safety reports)
((Dodge Stratus), (Pontiac Grand Am), (Mercury Cougar))
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5252
Speculative ExecutionSpeculative Execution[[BarishBarish & Knoblock & Knoblock ’’02, 02, ’’03]03]
Basic ideaBasic idea•• Exploit idle resources to execute future Exploit idle resources to execute future
instructions in advance of when they are instructions in advance of when they are normally issuednormally issued
ChallengesChallenges•• How to augment plans for speculationHow to augment plans for speculation•• How to ensure correctness and fairnessHow to ensure correctness and fairness•• How to decide what to speculate onHow to decide what to speculate on
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5353
How to speculate?How to speculate?General problemGeneral problem•• Means for issuing and confirming predictionsMeans for issuing and confirming predictions
Two new operatorsTwo new operators•• SpeculateSpeculate: Makes predictions based on "hints": Makes predictions based on "hints"
•• ConfirmConfirm: Prevents errant results from exiting plan: Prevents errant results from exiting plan
Speculateanswers
hints
confirmations
predictions/additions
Confirm confirmations
probable resultsactual results
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5454
JSW
WWW
W
BEFORE
Example: Example: CarInfoCarInfo•• Predict cars based on search criteriaPredict cars based on search criteria•• Makes practical sense: Makes practical sense:
Same criteria yields same carsSame criteria yields same cars
How to speculate?How to speculate?
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5555
AFTER
How to speculate?How to speculate?Example: Example: CarInfoCarInfo•• Predict cars based on search criteriaPredict cars based on search criteria•• Makes practical sense: Makes practical sense:
Same criteria yields same carsSame criteria yields same cars
JSW
WSpeculate
hintspredictions/additions
confirmationsanswers
W
Confirm
W
W
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5656
Detailed exampleDetailed example
JSW
WSpeculate
W
Confirm
W
W
2002Midsize coupe$4000-$12000
Time = 0.0 sec
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5757
Issuing predictionsIssuing predictions
JSW
WSpeculate
W
Confirm
W
W
Oldsmobile Alero T1Dodge Stratus T2Pontiac Grand Am T3Mercury Cougar T4
Time = 0.1 sec
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5858
Speculative parallelismSpeculative parallelism
JSW
WSpeculate
W
Confirm
W
W
Dodge Stratus T2Pontiac Grand Am T3Mercury Cougar T4
Time = 0.2 sec
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 5959
Answers to hintsAnswers to hints
JSW
WSpeculate
W
Confirm
W
W
Oldsmobile AleroDodge StratusPontiac Grand AmMercury Cougar Time = 1.0 sec
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6060
Continued processingContinued processing
JSW
WSpeculate
W
Confirm
W
W
T1T2T3T4
Time = 1.1 sec
Additions (corrections), if any
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6161
Generation of final resultsGeneration of final results
JSW
WSpeculate
W
Confirm
W
W
Dodge Stratus (safety) (review) T2Pontiac Grand Am (safety) (review) T3Mercury Cougar (safety) (review) T4
Time = 3.2 sec
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6262
Confirmation of resultsConfirmation of results
JSW
WSpeculate
W
Confirm
W
W
Dodge Stratus (safety) (review)Pontiac Grand Am (safety) (review)Mercury Cougar (safety) (review)
Time = 3.3 sec
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6363
Safety and fairnessSafety and fairnessSafetySafety•• ConfirmConfirm blocks predictions (and results of) from blocks predictions (and results of) from
exiting plan before verificationexiting plan before verification
FairnessFairness•• CPUCPU
Speculative operations use "speculative threads"Speculative operations use "speculative threads"•• Lower priority threadsLower priority threads
•• Memory and bandwidthMemory and bandwidthSpeculative operations allocate "speculative resources"Speculative operations allocate "speculative resources"
•• Drawn from "speculative pool" of memory / objectsDrawn from "speculative pool" of memory / objects
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6464
Cascading SpeculationCascading SpeculationUse predicted cars to speculate about the Use predicted cars to speculate about the ConsumerGuideConsumerGuide summary and full URLssummary and full URLs
Optimistic performanceOptimistic performance•• Execution time:Execution time: maxmax {{1.2, 1.4, 1.5, 1.61.2, 1.4, 1.5, 1.6}} == 1.6 sec1.6 sec•• Speedup over streaming dataflow:Speedup over streaming dataflow: (4.2/1.6)(4.2/1.6) == 2.632.63
W
J
SW
W
SPECCONF
SPEC
W
WSPEC
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6565
Automatic plan transformationAutomatic plan transformationAgent plans are automatically modified for Agent plans are automatically modified for speculative executionspeculative execution•• Successive runs of the plan benefit Successive runs of the plan benefit
Even with different input dataEven with different input data
Leverage Amdahl's Law: Leverage Amdahl's Law: •• Consider optimizing only the most expensive Consider optimizing only the most expensive
path (path (MEPMEP))
Algorithm continually refines MEP Algorithm continually refines MEP •• Until overhead of further optimization Until overhead of further optimization
outweighs benefitsoutweighs benefits
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6666
Learning for Speculative ExecutionLearning for Speculative ExecutionCachingCaching•• Associate a hint with a predicted valueAssociate a hint with a predicted value
2002 Midsize coupe 4K2002 Midsize coupe 4K--12K12KOlds Olds AleroAlero, Dodge Stratus, Pontiac Grand Am, Mercury Cougar, Dodge Stratus, Pontiac Grand Am, Mercury Cougar
ClassificationClassification•• Use features of a hint to predict valueUse features of a hint to predict value
EXAMPLEEXAMPLE:: Predicting car list from EdmundsPredicting car list from Edmunds
type = SUV : (Nissan Pathfinder, Ford Explorer)type = Midsize : :...min <= 10000 : (Olds Alero, Dodge Stratus)
min > 10000 : (Honda Accord, Toyota Camry)
Cache
Decision list
Year Type Min Max Car list2002 Midsize 8000 15000 (Oldmobile Alero, Dodge Stratus)2002 Midsize 7500 14500 (Oldmobile Alero, Dodge Stratus)2002 SUV 14000 20000 (Nissan Pathfinder, Ford Explorer)2001 Midsize 11000 18000 (Honda Accord, Toyota Camry)2002 SUV 18000 22000 (Nissan Pathfinder, Ford Explorer)
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6767
1"http://cg.com/summary/" :
ε : COPY
3"." :
2ε : COPY
Learning for Speculative ExecutionLearning for Speculative ExecutionTransductionTransduction•• Transducers are FSM that translate hints into predictionsTransducers are FSM that translate hints into predictions
http://cg.com/summary/20812.htm
http://cg.com/full/20812.htm
To create full review URL:
1. Insert "http://cg.com/full/"
2. Extract & insert the dynamic part of the summary URL (e.g., 20812)
3. Insert ".htm"
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6868
Speculation Results: Last TupleSpeculation Results: Last Tuple
0
1000
2000
3000
4000
5000
6000
7000
8000
CarInfo RepInfo TheaterLoc FlightStatus StockInfo
Tim
e to
last
tupl
e (m
s)
No speculation50% correct100% correct
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 6969
Related WorkRelated WorkWrapper learningWrapper learning•• Supervised [Supervised [KushmerickKushmerick ’’97, Hsu & Dung 97, Hsu & Dung ’’98]98]•• Unsupervised [Unsupervised [LermanLerman et al. et al. ’’01, 01, CrescenziCrescenzi ’’01]01]
Record linkageRecord linkage•• Learning [Cohen Learning [Cohen ’’00, 00, SarawagiSarawagi & & BhamidipatyBhamidipaty ’’02]02]•• Statistics [Winkler Statistics [Winkler ’’98]98]•• Name matching [Name matching [BilenkoBilenko et al. et al. ’’03, Cohen et al. 03, Cohen et al. ’’03]03]
Efficient plan executionEfficient plan execution•• Network query engines [Ives et al. 1999, Network query engines [Ives et al. 1999, NaughtonNaughton et et
al. 2000, al. 2000, HellersteinHellerstein et al. 2001]et al. 2001]•• Agent execution systems [Agent execution systems [FirbyFirby ’’94, Myers et al. 1996]94, Myers et al. 1996]
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 7070
Outline of talkOutline of talk
The Electric Elves: Information The Electric Elves: Information agents for monitoring travelagents for monitoring travelWrapping online sourcesWrapping online sourcesLinking records across sourcesLinking records across sourcesEfficiently executing agent plansEfficiently executing agent plansConclusionsConclusions
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 7171
ConclusionsConclusionsWeb provides the ideal environment for Web provides the ideal environment for developing and testing software agentsdeveloping and testing software agents•• Noted by Noted by EtzioniEtzioni, AAAI, AAAI’’96 in his talk on 96 in his talk on SoftbotsSoftbots
Yet few have seized this opportunityYet few have seized this opportunity……why?why?Like robotics, wide variety of hard technical Like robotics, wide variety of hard technical problemsproblemsWith Web Services, the Semantic Web, etc. the With Web Services, the Semantic Web, etc. the infrastructure is improvinginfrastructure is improvingGreat opportunity for AIGreat opportunity for AI•• Ability to demonstrate and test technologies in a realAbility to demonstrate and test technologies in a real--
world settingworld setting•• Opportunity to apply technologies to make a difference Opportunity to apply technologies to make a difference
in peoplein people’’s livess lives
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 7272
Conclusions (cont.)Conclusions (cont.)Many interesting technical challenges for building Many interesting technical challenges for building software agents:software agents:•• Wrapping online sourcesWrapping online sources•• Linking records across sitesLinking records across sites•• Efficiently executing agent plansEfficiently executing agent plans•• Extraction from text documentsExtraction from text documents•• Aligning Aligning ontologiesontologies across sourcesacross sources•• Planning to integrate data sourcesPlanning to integrate data sources•• Learning to improve performance and capabilitiesLearning to improve performance and capabilities•• Integrating these capabilities in a robust architecture that Integrating these capabilities in a robust architecture that
can:can:Respond to failuresRespond to failuresExplain its behaviorExplain its behaviorCommunicate appropriatelyCommunicate appropriately
Craig KnoblockCraig Knoblock University of Southern CaliforniaUniversity of Southern California 7373
Want to Learn More?Want to Learn More?
I teach a course on Information I teach a course on Information Integration on the WebIntegration on the WebSyllabus, readings, and slides Syllabus, readings, and slides available from:available from:•• http://www.isi.edu/infohttp://www.isi.edu/info--
agents/courses/csci548/agents/courses/csci548/