Visual Representation of Knowledge Articles as Dynamic Interactive Connected Graph Nodes
Presented By:
Aamir Mushtaq
Jesal Mistry
Kapil Tekwani
Neville Shah
Overview
• Introduction
• ACM Keywords
• Some Wikipedia Statistics
• Algorithm Used
• Mathematical Model
• Feasibility Analysis
• Proposed UI
• References
• Paper Publications
• Motivation of the Project Undertaken
• Questions
Introduction
• Online knowledge articles have become increasingly popular
• E.g., Wikipedia is used by students, educators, professionals, etc.
• Problems faced:
  o Article topics to be studied are not easy to understand
  o Articles take too much time to read
  o Articles have too much content
• Possible solution: create a graphical visualization of knowledge articles
  o Enables users to obtain an easily understandable overview of an article
ACM Keywords
H. Information Systems
  H.2 Database Management
    H.2.8 Database Applications
      o Data Mining
  H.3 Information Storage and Retrieval
    H.3.3 Information Search and Retrieval
      o Information filtering
      o Query formulation
    H.3.5 Online Information Services
      o Web based services
  H.5 Information Interfaces and Presentation
    H.5.4 Hypertext/Hypermedia
      o Architecture
      o Navigation
I. Computing Methodologies
  I.2 Artificial Intelligence
    I.2.7 Natural Language Processing
      o Text Analysis
    I.2.8 Problem Solving, Control Methods & Search
      o Dynamic Programming
      o Graph & Tree Search Strategies
Some Wikipedia Statistics
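• Average number of page views per month (Dec 2011, English Wikipedia) = 7.958 billion
• This translates to approx. 265 M per day, 11.1 M per hour, 184 k per minute, or 3.1 k per second
• Assuming 5 minutes per article, this translates to ~920 k minutes of reading for every minute of wall-clock time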
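The conversions above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the 30-day month is an assumption inferred from the 265 M per day figure, and all variable names are illustrative.

```python
# Reproduce the page-view arithmetic from the statistics slide.
# Assumption: a 30-day month, which matches the quoted 265 M/day figure.
MONTHLY_VIEWS = 7.958e9     # English Wikipedia page views, Dec 2011
DAYS_PER_MONTH = 30         # assumed, not stated on the slide
MINUTES_PER_ARTICLE = 5     # reading time assumed on the slide

per_day = MONTHLY_VIEWS / DAYS_PER_MONTH
per_hour = per_day / 24
per_minute = per_hour / 60
per_second = per_minute / 60

# Reader-minutes spent for every minute of wall-clock time.
reading_minutes_per_minute = per_minute * MINUTES_PER_ARTICLE

print(f"{per_day / 1e6:.0f} M/day")
print(f"{per_hour / 1e6:.1f} M/hour")
print(f"{per_minute / 1e3:.0f} k/minute")
print(f"{per_second / 1e3:.1f} k/second")
print(f"{reading_minutes_per_minute / 1e3:.0f} k reader-minutes per minute")
```

The last figure comes out slightly above 920 k because the slide multiplies the already rounded 184 k by 5.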
Algorithm Used
1. Input to the system = URL of a knowledge article (consider Wikipedia).
2. Select the document corresponding to the input URL from the Wikipedia dump, or scrape it. (document = natural language words + keywords + links)
3. Eliminate natural language words.
4. Count section-wise occurrences of keywords, store them in tables, and calculate weights. Ex: weight of a particular keyword in doc = 0.7*cs1 + 0.5*cs2 + 0.3*cs3, where cs1, cs2, cs3 are the section-wise counts.
5. Create a table of the links in that document; if a particular keyword has a link, the link adds to the weight of that keyword.
6. Set a threshold on weight to decide which keywords or links are displayed.
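Steps 4 to 6 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the coefficients 0.7, 0.5, 0.3 come from the example in step 4, while `LINK_BONUS`, `THRESHOLD`, and the sample counts are hypothetical values chosen only to make the example run.

```python
from typing import Dict, Set, Tuple

# Coefficients from the slide's example: 0.7*cs1 + 0.5*cs2 + 0.3*cs3.
SECTION_WEIGHTS = (0.7, 0.5, 0.3)
LINK_BONUS = 1.0   # hypothetical: the slides do not say how much a link adds
THRESHOLD = 2.0    # hypothetical cut-off: the slides leave this to tuning


def keyword_weight(counts: Tuple[int, int, int], is_link: bool) -> float:
    """Weighted sum of section-wise counts, plus a bonus if the keyword is a link."""
    base = sum(w * c for w, c in zip(SECTION_WEIGHTS, counts))
    return base + (LINK_BONUS if is_link else 0.0)


def select_keywords(
    section_counts: Dict[str, Tuple[int, int, int]],
    links: Set[str],
    threshold: float = THRESHOLD,
) -> Dict[str, float]:
    """Return only the keywords whose weight reaches the threshold."""
    weights = {
        kw: keyword_weight(counts, kw in links)
        for kw, counts in section_counts.items()
    }
    return {kw: w for kw, w in weights.items() if w >= threshold}


# Made-up counts for three keywords; "graph" and "edge" are also links.
counts = {"graph": (4, 2, 1), "node": (1, 1, 0), "edge": (0, 1, 2)}
links = {"graph", "edge"}
selected = select_keywords(counts, links)
print(sorted(selected))
```

With these sample values, "node" falls below the threshold and is dropped, while "edge" passes only because of the link bonus.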
Algorithm Used (cont’d)
7. Depending on the current depth, use a pre-decided window size to select the top keywords/links for the next level. Example: 20 for the 0th level, 10 for the 1st level, 5 for the 2nd level (tuning required).
8. For efficient searching of accurate data, we work across the depth: if a keyword (present in the previous-level document) occurs many times at the next level (say 100), it adds weight to the corresponding keyword in the previous table.
9. Output is a graphical representation of keywords. If a node (keyword) is a link, it is connected to another node (keyword) of the next level; otherwise expansion stops at that level.
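Steps 7 and 9 can be sketched as a breadth-first expansion that keeps only the top-weighted links at each depth. This is a minimal sketch, not the authors' code: the window sizes are the ones quoted in step 7, but `build_graph`, the in-memory `weighted_links` map, and the toy data are assumptions, and the weight feedback of step 8 is left out.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Window sizes per depth, as in the slide's example (subject to tuning).
WINDOW_BY_LEVEL = {0: 20, 1: 10, 2: 5}


def build_graph(
    root: str,
    weighted_links: Dict[str, Dict[str, float]],
    windows: Optional[Dict[int, int]] = None,
) -> List[Tuple[str, str]]:
    """Expand breadth-first, keeping the top-weighted links at each level.

    weighted_links maps an article to {linked keyword: weight}. A keyword
    without an entry is not a link, so expansion stops at that node.
    """
    windows = WINDOW_BY_LEVEL if windows is None else windows
    edges: List[Tuple[str, str]] = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        window = windows.get(level, 0)  # no window defined: stop expanding
        candidates = weighted_links.get(node, {})
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        for keyword, _weight in ranked[:window]:
            edges.append((node, keyword))
            if keyword not in seen:
                seen.add(keyword)
                queue.append((keyword, level + 1))
    return edges


# Toy data with deliberately small windows so the pruning is visible.
toy = {
    "Graph theory": {"Vertex": 5.1, "Edge": 4.0, "Tree": 2.5},
    "Vertex": {"Degree": 3.0},
}
edges = build_graph("Graph theory", toy, windows={0: 2, 1: 1})
print(edges)
```

Here "Tree" is cut by the level-0 window of 2, and "Edge" becomes a leaf because it has no outgoing links in the toy data.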
Mathematical Model
Let $S$ be the system.
$S = \{U_{inp}, U, D, Q, W_t, K_w, T_{K_w,S}, T_{K_w,W_g}, T_{U,K_w}, T_{U,K_w,W_g}\}$
• $U_{inp}$ = URL identifier (input to the system)
• $D$ = database of the WWW, containing webpages as documents: $D = \{d_1, d_2, d_3, \ldots, d_n\}$, where $d_i$ is a WWW document (webpage)
• $Q$ = set of all possible queries: $Q = \{q_1, q_2, q_3, \ldots, q_n\}$, where $q_i$ is any given query to be fired on the database
• $W_t$ = set of words of a particular document $d_i$: $W_t = \{w_1, w_2, \ldots, w_n\}$, where $w_j \in d_i$ for $1 \le j \le n$
• $K_w$ = set of keywords, $K_w \subseteq W_t$, obtained after $F_{el}$: $K_w = \{k_1, k_2, \ldots, k_m\}$, where $k_j \in W_t$ for $1 \le j \le m$
• $U$ = set of URLs extracted from document $d_i$: $U = \{u_1, u_2, \ldots, u_n\}$, where $u_j \in d_i$
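Mathematical Model (cont’d)
• $T_{K_w,S}$ = table of keywords and sectional counts, obtained after $F_{cnt}$:
$T_{K_w,S} = \{\langle k_1, sA_1, sB_1, sC_1\rangle, \langle k_2, sA_2, sB_2, sC_2\rangle, \ldots, \langle k_m, sA_m, sB_m, sC_m\rangle\}$
• $T_{K_w,W_g}$ = table of keywords and associated weights, obtained after $F_w$:
$T_{K_w,W_g} = \{\langle k_1, wg_1\rangle, \langle k_2, wg_2\rangle, \ldots, \langle k_m, wg_m\rangle\}$
• $T_{U,K_w,W_g}$ = table of the URLs in $U$ mapped to the keywords and weights of $T_{K_w,W_g}$, obtained after $F_{map}$:
$T_{U,K_w,W_g} = \{\langle u_j, k_i, wg_i\rangle\}$
• $T_{U,K_w}$ is the mapping of keywords to their respective URLs in $U$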
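As a worked instance of the weight table, assume that the sectional counts $sA_i, sB_i, sC_i$ correspond to the counts cs1, cs2, cs3 in the algorithm's example, and ignore the contribution of links. Under those assumptions, which the slides do not state explicitly, the weight of keyword $k_i$ would be:

```latex
wg_i = 0.7\, sA_i + 0.5\, sB_i + 0.3\, sC_i, \qquad 1 \le i \le m
```

For example, a keyword with sectional counts $\langle 4, 2, 1\rangle$ would get $wg = 0.7 \cdot 4 + 0.5 \cdot 2 + 0.3 \cdot 1 = 4.1$.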
Mathematical Model (cont’d)
Functions:
• $F_{el}(W_t\{\langle w_1, w_2, \ldots, w_n\rangle\}) = K_w$
$F_{el}$ eliminates all natural language elements from $W_t$; the resulting set of words is the keyword set $K_w$.
• $F_{cnt}(K_w\{\langle k_1, k_2, \ldots, k_m\rangle\}) = T_{K_w,S}$
$F_{cnt}$ returns an array of tuples of keywords and their respective sectional counts, $\{\langle k_i, sA_i, sB_i, sC_i\rangle\}$, which is used to calculate the keyword weights. $T_{K_w,S}$ is the input of $F_w$.
• $F_w(T_{K_w,S}\{\langle k_i, sA_i, sB_i, sC_i\rangle\}) = T_{K_w,W_g}$
$F_w$ takes the $T_{K_w,S}$ produced by $F_{cnt}$ as input, calculates the weight associated with each keyword, and returns an array of tuples of keywords and weights, $\{\langle k_i, wg_i\rangle\}$.
• $F_{map}(U\{\langle u_1, u_2, \ldots, u_n\rangle\}, T_{K_w,W_g}\{\langle k_1, wg_1\rangle, \langle k_2, wg_2\rangle, \ldots, \langle k_m, wg_m\rangle\}) = T_{U,K_w,W_g}$
$F_{map}$ takes $U$ and $T_{K_w,W_g}$ as input, maps the keywords to their respective URLs in $d_i$, and returns an array of URLs with their mapped keywords and weights.
• $F_{win}(lvl) \in \{5, 10, 20\}$
$F_{win}$ is a window function that returns the window size, which depends on the current depth/level.
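The functions above compose into a small pipeline. The sketch below is one hedged reading of them, not the authors' implementation: the names `f_el`, `f_cnt`, `f_w`, `f_map`, the stop-word list, the three-section layout, and the example.org URL are all hypothetical, and the weight coefficients are borrowed from the algorithm's example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical stop-word list standing in for "natural language elements".
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def f_el(words: List[str]) -> Set[str]:
    """F_el: drop natural language (stop) words; the rest are keywords."""
    return {w.lower() for w in words} - STOP_WORDS


def f_cnt(keywords: Set[str], sections: List[List[str]]) -> Dict[str, Tuple[int, ...]]:
    """F_cnt: count each keyword's occurrences in every section."""
    counters = [Counter(w.lower() for w in section) for section in sections]
    return {k: tuple(c[k] for c in counters) for k in keywords}


def f_w(table: Dict[str, Tuple[int, ...]]) -> Dict[str, float]:
    """F_w: weight per keyword, using the example coefficients 0.7, 0.5, 0.3."""
    return {k: 0.7 * a + 0.5 * b + 0.3 * c for k, (a, b, c) in table.items()}


def f_map(urls: Dict[str, str], weights: Dict[str, float]) -> List[Tuple[str, str, float]]:
    """F_map: pair each URL with the keyword it belongs to and that keyword's weight."""
    return [(url, kw, weights[kw]) for kw, url in urls.items() if kw in weights]


# A toy document split into three sections (made-up content).
sections = [
    ["The", "graph", "is", "a", "set", "of", "nodes"],
    ["graph", "edges"],
    ["nodes"],
]
words = [w for section in sections for w in section]
keywords = f_el(words)
counts = f_cnt(keywords, sections)
weights = f_w(counts)
mapped = f_map({"graph": "https://example.org/wiki/Graph"}, weights)
print(mapped)
```

Keeping each function separate mirrors the model, where the output table of one function is the input of the next.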
Feasibility Analysis
NP-Hard:
• Number of keywords and links is not known while scanning the wiki
• Processing power at the server is not determined in advance
• Ranking algorithm is exponential in nature
• Solution is not determined in polynomial time
NP-Complete:
• Assign ranks to keywords and links, using the ranking algorithm
• Use a threshold value to limit links
• Approximate processing power is calculated to scan documents
• Thus converted NP-Hard to NP-Complete
Proposed UI – shows the output of a search
References
Backward References:
1. Schonhofen, P., "Identifying Document Topics Using the Wikipedia Category Network," IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006), pp. 456-462, 18-22 Dec. 2006.
2. Cheong-Iao Pang; Biuk-Aghai, R.P., "Map-like Wikipedia overview visualization," 2011 International Conference on Collaboration Technologies and Systems (CTS), pp. 53-60, 23-27 May 2011.
3. Boukhelifa, N.; Chevalier, F.; Fekete, J., "Real-time aggregation of Wikipedia data for visual analytics," 2010 IEEE Symposium on Visual Analytics Science and Technology (VAST), pp. 147-154, 25-26 Oct. 2010.
4. Lamberti, F.; Sanna, A.; Demartini, C., "A Relation-Based Page Rank Algorithm for Semantic Web Search Engines," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 1, pp. 123-136, Jan. 2009.
5. Prato, A.; Ronchetti, M., "Using Wikipedia as a Reference for Extracting Semantic Information from a Text," Third International Conference on Advances in Semantic Processing (SEMAPRO '09), pp. 56-61, 11-16 Oct. 2009.
References (cont’d)
Forward References:
1. Cheong-Iao Pang; Biuk-Aghai, R.P., "Map-like Wikipedia overview visualization," 2011 International Conference on Collaboration Technologies and Systems (CTS), pp. 53-60, 23-27 May 2011.
2. Pirrone, R.; Pipitone, A.; Russo, G., "Semantic sense extraction from Wikipedia pages," 2010 3rd Conference on Human System Interactions (HSI), pp. 543-547, 13-15 May 2010.
3. Wikipedia page-view data: http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
4. Wikipedia database download in XML format: http://dumps.wikimedia.org/ derived from http://en.wikipedia.org/wiki/wikipedia:Database_Download
5. Wikitools from MediaWiki: http://en.wikipedia.org/wiki/MediaWiki
6. Wikipedia categorization, from the Wikipedia website: http://en.wikipedia.org/wiki/Wikipedia:Categorization
Paper Publications
Paper Title:
Visual Representation of Knowledge Articles as Dynamic Interactive Connected Graph Nodes.
Name of Conference where paper submitted:
• European Modeling Symposium 2011, EMS2011
• Informatics and Computational Intelligence 2011, ICI2011
• Education and e-learning conference 2011, EeL2011
• International Conference on Information Systems and Technologies 2012, ICIST2012
Name of Conference where paper accepted:
• European Modeling Symposium 2011, EMS2011
• Informatics and Computational Intelligence 2011, ICI2011
• Education and e-learning conference 2011, EeL2011
• International Conference on Information Systems and Technologies 2012, ICIST2012
Name of Journal where paper accepted:
• International Foundation for Modern Education and Scientific Research (INFOMESR)
Motivation of the Project Undertaken
Project Motivation:
• Provide a user-friendly solution to the problem mentioned in the introduction
• The project overall saves man-hours (a picture is worth a thousand words)
• Visualization and interactivity enhance interest and make learning fun
• Knowledge articles are assimilated easily and quickly
• Overview of a topic is obtained with minimum reading
• Time spent reading is minimised
Personal Motivation:
• Learn new technologies
• Learn SDLC
• Project management skills
Thank You
Any Questions?