Search Engine Query Suggestion Application

Web Mining Project Work
How to suggest the query you'd like to input after

The WebLog
AnonID  Query                          QueryTime         ItemRank  ClickURL
142     rentdirect.com                 01/03/2006 07:17
142     www.prescriptionfortime.com    12/03/2006 12:31
142     staple.com                     17/03/2006 21:19
142     staple.com                     17/03/2006 21:19
142     www.newyorklawyersite.com      18/03/2006 08:02
142     www.newyorklawyersite.com      18/03/2006 08:03
142     westchester.gov                20/03/2006 03:55  1         http://www.westchestergov.com
142     space.comhttp                  24/03/2006 20:51
The WebLog is the AOL search log made available to the public in 2006.

The goal
Build a query suggestion application that exploits the information observed in the AOL WebLog.
Constraints:
1) the application relies on observed queries;
2) the application needs to be fast!

The approach
Exploit the relation between the queries typed by AOL users and the URLs they clicked: if two queries share many clicked URLs, they are strongly related to each other.

Several approaches can be followed to link observed queries to clicked URLs. We were inspired by the paper "Query-URL Bipartite Based Approach to Personalized Query Recommendation" by Li, Yang, Liu and Kitsuregawa, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008).

Idea 1/2
Let q(i) be the i-th query and u(k) be the k-th URL clicked after a query is typed. A bipartite graph can be built in which each query q(i) in the query set is linked to every URL u(k) clicked after it.

Idea 2/2
Once the bipartite graph has been built, any two queries in the query set can be related through the URLs clicked after them. An affinity graph over the query set can then be defined; its edges must be weighted before it can be used in a suggestion task.
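The two ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: it assumes the log has already been parsed into (query, click_url) pairs, with click_url set to None for rows without a click, and the function names `build_bipartite` and `related_queries` and the toy example.com rows are hypothetical. Edges of the affinity graph are left unweighted here; the sketch only finds which queries are connected.

```python
from collections import defaultdict


def build_bipartite(rows):
    """Build both sides of the (hypothetical) query-URL bipartite graph.

    rows: iterable of (query, click_url) pairs; click_url is None when
    the user clicked nothing after the query.
    Returns (urls_of, queries_of): query -> set of clicked URLs and
    URL -> set of queries after which it was clicked.
    """
    urls_of = defaultdict(set)
    queries_of = defaultdict(set)
    for query, url in rows:
        if not url:  # no click means no edge in the bipartite graph
            continue
        urls_of[query].add(url)
        queries_of[url].add(query)
    return urls_of, queries_of


def related_queries(query, urls_of, queries_of):
    """Neighbours of `query` in the affinity graph: every other query
    that shares at least one clicked URL with it."""
    related = set()
    for url in urls_of.get(query, ()):  # .get avoids creating new keys
        related |= queries_of[url]
    related.discard(query)
    return related


# Toy rows (invented for illustration only)
rows = [
    ("q1", "http://example.com/a"),
    ("q2", "http://example.com/a"),
    ("q2", "http://example.com/b"),
    ("q3", "http://example.com/c"),
    ("q4", None),
]
urls_of, queries_of = build_bipartite(rows)
print(sorted(related_queries("q1", urls_of, queries_of)))  # ['q2']
print(sorted(related_queries("q3", urls_of, queries_of)))  # []
```

Keeping both directions of the bipartite graph means the neighbours of a query are found by walking query to URL to query, without comparing the query against the whole query set.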
Weighting the Edges
Let q(i) be the i-th query and u(k) be the k-th URL clicked after a query is typed. The weight w(i,j) of the edge between q(i) and q(j) measures how much the URLs clicked after the two queries overlap:
w(i,j) = 1 if exactly the same URLs are clicked after q(i) and after q(j);
w(i,j) = 0 if none of the URLs clicked after q(i) matches a URL clicked after q(j).

Managing over-clicked URLs
In the AOL 2006 WebLog dataset, a number of URLs are over-clicked by users, independently of the query typed before clicking them.
[Figure: click count per URL (x-axis: URLs; y-axis: click count, 0 to 180,000)]

Managing over-clicked URLs
Those URLs add noise to the query recommendation algorithm.
For this reason, we kept only the URLs with fewer than 1,000 clicks.
[Figure: click count per retained URL (x-axis: URLs; y-axis: click count, 0 to 1,000)]

Affinity Graph Representation
Once the edge weights are computed, we build a main dictionary whose keys are the queries q(i) and whose values are ordered dictionaries. The ordered dictionary for q(i) has as keys the queries sharing at least one URL with q(i) and as values the weights w(i,j). The main dictionary feeds the query suggestion API and returns a result within milliseconds.

Demo
(for those who can't enjoy the live one)

Thanks!
Andrea Gigli
https://about.me/andrea.gigli
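The edge weighting, the filter on over-clicked URLs and the dictionary representation can be combined into one short Python sketch. Several details are assumptions rather than the author's method: w(i,j) is computed as the Jaccard coefficient |U(q(i)) ∩ U(q(j))| / |U(q(i)) ∪ U(q(j))|, where U(q) is the set of URLs clicked after q (it equals 1 for identical URL sets and 0 for disjoint ones, matching the two cases on the weighting slide); a URL's click count is the number of log rows in which it was clicked; each ordered dictionary is sorted by decreasing weight. The 1,000-click threshold comes from the slides; `build_suggestion_index`, `suggest` and the toy rows are hypothetical.

```python
from collections import Counter, OrderedDict, defaultdict

MAX_CLICKS = 1000  # URLs with this many clicks or more count as over-clicked


def build_suggestion_index(rows, max_clicks=MAX_CLICKS):
    """rows: iterable of (query, click_url) pairs; click_url may be None.

    Returns the main dictionary: query q(i) -> OrderedDict mapping every
    query that shares at least one URL with q(i) to w(i, j), highest first.
    """
    rows = [(q, u) for q, u in rows if u]  # keep only rows with a click
    clicks = Counter(u for _, u in rows)  # click count per URL

    urls_of = defaultdict(set)  # query -> URLs clicked after it
    queries_of = defaultdict(set)  # URL -> queries that led to it
    for q, u in rows:
        if clicks[u] < max_clicks:  # drop over-clicked URLs
            urls_of[q].add(u)
            queries_of[u].add(q)

    index = {}
    for qi, ui in urls_of.items():
        neighbours = set()
        for u in ui:
            neighbours |= queries_of[u]
        neighbours.discard(qi)
        weights = {}
        for qj in neighbours:
            uj = urls_of[qj]
            weights[qj] = len(ui & uj) / len(ui | uj)  # assumed Jaccard weight
        # Highest weight first; ties broken alphabetically for determinism
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        index[qi] = OrderedDict(ranked)
    return index


def suggest(index, query, k=5):
    """Up to k suggestions for an observed query; empty list if unseen."""
    return list(index.get(query, {}))[:k]


# Toy rows (invented for illustration only)
rows = [
    ("q1", "http://example.com/a"),
    ("q2", "http://example.com/a"),
    ("q2", "http://example.com/b"),
    ("q3", "http://example.com/b"),
    ("q4", None),
    ("q5", "http://example.com/a"),
]
index = build_suggestion_index(rows)
print(suggest(index, "q1"))  # ['q5', 'q2']: w = 1.0 and w = 0.5

# With a tiny threshold, URL /a (3 clicks) is treated as over-clicked
small = build_suggestion_index(rows, max_clicks=3)
print(suggest(small, "q1"))  # []
print(suggest(small, "q2"))  # ['q3']
```

All weights are precomputed when the index is built, so answering a request is one average-constant-time dictionary lookup plus a slice, which fits the speed constraint stated under "The goal".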