INFORMATION RETRIEVAL PROJECT Creation of clusters of concepts that represent a domain corpus.

Background

• Vector Space Model (VSM).
• Knowledge-based Vector Space Model.
• Wikipedia as a knowledge domain.
• Bag-of-words (BOW) indexing versus knowledge-based indexing.
• Indexing Wikipedia.
• Wikipedia-based concept clustering.
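To make the contrast between BOW indexing and knowledge-based indexing concrete, here is a minimal sketch of the plain bag-of-words Vector Space Model with cosine similarity. The function names `bow_vector` and `cosine` and the sample sentences are illustrative assumptions, not part of the project code. The sketch shows the limitation that motivates a knowledge-based model: "car" and "automobile" are treated as unrelated because they share no surface form.

```python
import math
from collections import Counter


def bow_vector(text):
    """Map a document to a sparse term-frequency vector (bag of words)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_u * norm_v)


# Two sentences with the same meaning but different content words.
doc_a = bow_vector("the car needs a new engine")
doc_b = bow_vector("the automobile needs a new motor")

# Only the shared surface words contribute to the score.
similarity = cosine(doc_a, doc_b)

# Synonyms alone score zero under BOW.
synonym_similarity = cosine(bow_vector("car"), bow_vector("automobile"))
```

In this sketch the score between `doc_a` and `doc_b` comes entirely from the words they literally share; the synonym pairs add nothing.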

Knowledge-based VSM for Text Clustering

• Problem definition:

• Create clusters of related concepts, where each cluster represents a specific knowledge domain.

• Create knowledge-based vectors for the documents in a given corpus, based on term similarity measures within each document.
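One possible reading of "knowledge-based vectors based on term similarity measures" can be sketched as follows. Everything here is an assumption made for illustration: the `RELATEDNESS` table stands in for the term-term relatedness code supplied with the project, and weighting each concept by its summed relatedness to the document's terms is just one plausible scheme, not necessarily the one the project uses.

```python
from collections import Counter

# Hypothetical term-term relatedness weights in [0, 1]. In the project these
# values would come from the provided Wikipedia-based relatedness code.
RELATEDNESS = {
    ("car", "automobile"): 0.9,
    ("engine", "motor"): 0.8,
}


def relatedness(a, b):
    """Symmetric lookup; a term is fully related to itself."""
    if a == b:
        return 1.0
    return RELATEDNESS.get((a, b), RELATEDNESS.get((b, a), 0.0))


def knowledge_vector(text, vocabulary):
    """Weight each vocabulary concept by its total relatedness to the
    terms that occur in the document (term frequency times relatedness)."""
    counts = Counter(text.lower().split())
    vector = {}
    for concept in vocabulary:
        weight = sum(tf * relatedness(concept, term) for term, tf in counts.items())
        if weight > 0.0:
            vector[concept] = weight
    return vector


vocabulary = ["car", "automobile", "engine", "motor"]
vec = knowledge_vector("car engine", vocabulary)
```

Unlike a bag-of-words vector, `vec` gives non-zero weight to "automobile" and "motor" even though neither word appears in the document, so two documents that use different but related terms can still end up close in the vector space.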

Given:

• Wikipedia index.
• Working code for knowledge-based corpus indexes.
• Working code to define term-term relatedness weights.
• Working similarity code, "to extract a document from Wikipedia that is similar to an existing one".
• Algorithm for document clustering based on the Wikipedia structure.

Required to implement:

• Build a knowledge-based VSM index for the documents in two corpora from different domains, using the given term similarity code.

• Implement the given Wikipedia structure-based clustering algorithm.
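The given clustering algorithm is not reproduced in these slides. Purely to illustrate what grouping concepts by Wikipedia structure could look like, the sketch below links concepts that share a category and returns the connected components as clusters. The `CONCEPT_CATEGORIES` data and the function name are hypothetical, and this union-find grouping is a stand-in technique, not the project's algorithm.

```python
from collections import defaultdict

# Hypothetical concept -> categories mapping. In the project this information
# would be read from the Wikipedia database dump.
CONCEPT_CATEGORIES = {
    "Car": {"Vehicles"},
    "Truck": {"Vehicles", "Freight"},
    "Cargo": {"Freight"},
    "Violin": {"String instruments"},
    "Cello": {"String instruments"},
}


def cluster_by_shared_category(concept_categories):
    """Group concepts into connected components, where two concepts are
    linked if they share at least one category (union-find)."""
    parent = {concept: concept for concept in concept_categories}

    def find(concept):
        while parent[concept] != concept:
            parent[concept] = parent[parent[concept]]  # path halving
            concept = parent[concept]
        return concept

    first_seen = {}  # category -> first concept found in that category
    for concept, categories in concept_categories.items():
        for category in categories:
            if category in first_seen:
                parent[find(concept)] = find(first_seen[category])
            else:
                first_seen[category] = concept

    clusters = defaultdict(set)
    for concept in concept_categories:
        clusters[find(concept)].add(concept)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


clusters = cluster_by_shared_category(CONCEPT_CATEGORIES)
```

Here "Car" and "Cargo" land in the same cluster even though they share no category directly, because "Truck" bridges "Vehicles" and "Freight". Whether such transitive linking is desirable depends on the actual algorithm supplied for the project.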

Tools that will be used:

• Wikipedia database dumps (MySQL database).
• JWPL API to access the Wikipedia database dumps.
• Lucene API to build indexes.
• Assistance and code will be provided to help with using the APIs.
