Charting the Digital Library Evaluation Domain with a Semantically Enhanced
Mining Methodology
Eleni Afiontzi,1 Giannis Kazadeis,1 Leonidas Papachristopoulos,2 Michalis Sfakakis,2 Giannis Tsakonas,2 Christos Papatheodorou2
13th ACM/IEEE Joint Conference on Digital Libraries, July 22-26, Indianapolis, IN, USA
1. Department of Informatics, Athens University of Economics & Business
2. Database & Information Systems Group, Department of Archives & Library
Science, Ionian University
aim & scope of research
• To propose a methodology for discovering patterns in the scientific literature.
• Our case study is performed in the digital library evaluation domain and its conference literature.
• We question:
- how we select relevant studies,
- how we annotate them,
- how we discover these patterns,
in an effective, machine-operated way, in order to have reusable and interpretable data?
why
• Abundance of scientific information
• Limitations of existing tools, such as reusability
• Lack of contextualized analytic tools
• Supervised automated processes
panorama
1. Document classification to identify relevant papers
- We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, 2001-2011.
2. Semantic annotation processes to mark up important concepts
- We use a schema for semantic annotation, the Digital Library Evaluation Ontology, and a semantic annotation tool, GoNTogle.
3. Clustering to form coherent groups (K=11)
4. Interpretation with the assistance of the ontology schema
• During this process we perform benchmarking tests to qualify specific components to effectively automate the exploration of the literature and the discovery of research patterns.
part 1
how we identify relevant studies
training phase
• The aim was to train a classifier to identify relevant papers.
• Categorization
- two researchers categorized, a third one supervised
- descriptors: title, abstract & author keywords
- raters' agreement: 82.96% for JCDL, 78% for ECDL
- inter-rater agreement: moderate levels of Cohen's Kappa
- 12% positive, 88% negative
• Skewness of data addressed via resampling:
- under-sampling (Tomek Links)
- over-sampling (random over-sampling)
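The agreement and resampling steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names cohens_kappa and random_oversample are hypothetical, the labels are assumed to be simple hashable values (for example 1 for relevant, 0 for not relevant), and the Tomek Links under-sampling step is not shown.

```python
import random
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per label.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single, identical label throughout
    return (observed - expected) / (1 - expected)


def random_oversample(items, labels, seed=0):
    """Duplicate randomly chosen items of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for item, label in zip(items, labels):
        by_class.setdefault(label, []).append(item)
    target = max(len(group) for group in by_class.values())
    out_items, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_items.extend(group + extra)
        out_labels.extend([label] * target)
    return out_items, out_labels
```

Over-sampling would normally be applied to the training portion only, so that duplicated positives do not leak into the data used for evaluation.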
corpus definition
• Classification algorithm: Naïve Bayes
• Two sub-sets: a development (75%) and a test (25%)
• Ten-fold validation: the development set was randomly divided into 10 equal parts; 9/10 were used as the training set and 1/10 as the test set.
[Figure: tp rate and fp rate curves for the Development and Test sets; both axes range from 0 to 1.0]
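The split and validation scheme described on this slide can be sketched as follows. The helper names (split_dev_test, ten_folds, tp_fp_rates) are hypothetical, the classifier itself is left out, and the code assumes binary labels with 1 as the positive class and both classes present.

```python
import random


def split_dev_test(items, dev_fraction=0.75, seed=0):
    """Shuffle and split into a development set and a held-out test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * dev_fraction))
    return shuffled[:cut], shuffled[cut:]


def ten_folds(dev, k=10):
    """Yield (train, held_out) pairs; every development item is held out exactly once."""
    folds = [dev[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def tp_fp_rates(y_true, y_pred):
    """True-positive and false-positive rates for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    negatives = sum(t == 0 for t in y_true)
    return tp / positives, fp / negatives
```

When the development set size is not a multiple of ten, the folds differ in size by at most one item. Evaluating tp_fp_rates at several decision thresholds yields the points of a curve like the one plotted on the slide.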
part 2
how we annotate
the schema - DiLEO
• DiLEO aims to conceptualize the DL evaluation domain by exploring its key entities, their attributes and their relationships.
• A two-layered ontology:
- Strategic level: consists of a set of classes related to the scope and aim of an evaluation.
- Procedural level: consists of classes dealing with practical issues.
the instrument - GoNTogle
• We used GoNTogle to generate an RDFS knowledge base.
• GoNTogle uses the weighted k-NN algorithm to support either manual or automated ontology-based annotation.
• http://bit.ly/12nlryh
the process - 1/3
• GoNTogle estimates a score for each class/subclass, calculating its presence in the k nearest neighbors.
• We set a score threshold above which a class is assigned to a new instance (optimal score: 0.18).
• The user is presented with a ranked list of the suggested classes/subclasses and their scores, ranging from 0 to 1.
• 2,672 annotations were manually generated.
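A weighted k-NN suggestion step of this kind might look like the sketch below. The slides do not give GoNTogle's exact weighting, so the scoring rule here is an assumption: each neighbor votes for its classes with its similarity, and votes are normalized by the total similarity so that scores fall between 0 and 1. The function names and the example class labels are illustrative only; the 0.18 threshold is the value reported on the slide.

```python
def score_classes(neighbours):
    """Rank classes by similarity-weighted presence in the k nearest neighbours.

    neighbours: list of (similarity, classes) pairs, one per annotated neighbour.
    Returns (class, score) pairs sorted by descending score.
    """
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return []
    votes = {}
    for sim, classes in neighbours:
        for cls in classes:
            votes[cls] = votes.get(cls, 0.0) + sim
    # Assumed normalization: divide by the total similarity mass of the neighbours.
    return sorted(((cls, v / total) for cls, v in votes.items()),
                  key=lambda pair: (-pair[1], pair[0]))


def assign(ranked, threshold=0.18):
    """Keep the classes whose score exceeds the threshold."""
    return [cls for cls, score in ranked if score > threshold]
```

For example, with neighbours [(0.9, {"Goals"}), (0.6, {"Goals", "Means"}), (0.1, {"Objects"})], "Goals" and "Means" pass the threshold while "Objects" does not.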
the process - 2/3
• RDFS statements were processed to construct a new data set (removal of stopwords, symbols, lowercasing, etc.)
• Experiments both with un-stemmed (4,880 features) and stemmed (3,257 features) words.
• Multi-label classification via the ML framework Meka.
• Four methods
- binary representation
- Label powersets
- RAkEL
- ML-kNN
• Four algorithms
- Naïve Bayes
- Multinomial Naïve Bayes
- k-Nearest-Neighbors
- Support Vector Machines
• Four metrics
- Hamming Loss
- Accuracy
- One-error
- F1 macro
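The four metrics listed above can be computed as in the following sketch. These are common textbook definitions, given here as an assumption about what the framework reports: in particular, Accuracy is taken to be the example-based Jaccard score, and One-error requires per-label ranking scores rather than hard predictions. Label sets are represented as Python sets.

```python
def hamming_loss(true_sets, pred_sets, n_labels):
    """Fraction of (document, label) decisions that are wrong."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)


def jaccard_accuracy(true_sets, pred_sets):
    """Mean of |true AND pred| / |true OR pred| over documents."""
    values = []
    for t, p in zip(true_sets, pred_sets):
        union = t | p
        values.append(len(t & p) / len(union) if union else 1.0)
    return sum(values) / len(values)


def one_error(true_sets, score_dicts):
    """Fraction of documents whose top-ranked label is not a true label."""
    misses = sum(max(scores, key=scores.get) not in t
                 for t, scores in zip(true_sets, score_dicts))
    return misses / len(true_sets)


def macro_f1(true_sets, pred_sets, labels):
    """Unweighted mean of per-label F1 scores."""
    f1s = []
    for label in labels:
        tp = sum(label in t and label in p for t, p in zip(true_sets, pred_sets))
        fp = sum(label not in t and label in p for t, p in zip(true_sets, pred_sets))
        fn = sum(label in t and label not in p for t, p in zip(true_sets, pred_sets))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```

Hamming Loss and One-error are losses (lower is better), while Accuracy and F1 macro are scores (higher is better), which matters when reading them side by side in one chart.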
the process - 3/3
• Performance tests were repeated using GoNTogle.
• GoNTogle's algorithm achieves good results in relation to the tested multi-label classification algorithms.
[Figure: bar chart comparing GoNTogle and Meka on Hamming Loss, Accuracy, One-Error and F1 macro (axis from 0 to 1.0); data labels as extracted: 0.44, 0.27, 0.63, 0.02, 0.39, 0.29, 0.49, 0.02]
part 3
how we discover
clustering - 1/3
• The final data set consists of 224 vectors of 53 features
- it represents the annotations assigned from the DiLEO vocabulary to the document corpus.
• We represent the annotated documents by 2 vector models:
- binary: f_i has the value 1 if the subclass corresponding to f_i is assigned to document m, otherwise 0.
- tf-idf: the feature frequency ff_i of f_i is equal to 1 when the respective subclass is annotated to document m; idf_i is the inverse document frequency of feature i in the documents M.
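The two vector models can be sketched as follows. The slide does not give the exact idf formula, so the common form idf_i = log(M / df_i) is assumed here; the function names are hypothetical, and each document is represented as the set of subclasses annotated to it.

```python
import math


def binary_vectors(doc_annotations, features):
    """1 if the feature's subclass is annotated to the document, else 0."""
    return [[1 if f in ann else 0 for f in features] for ann in doc_annotations]


def tfidf_vectors(doc_annotations, features):
    """Feature frequency is 0 or 1, so each weight is either idf_i or 0."""
    m = len(doc_annotations)
    df = {f: sum(f in ann for ann in doc_annotations) for f in features}
    # Assumed idf form: log(M / df_i); features that never occur get weight 0.
    idf = {f: math.log(m / df[f]) if df[f] else 0.0 for f in features}
    return [[idf[f] if f in ann else 0.0 for f in features] for ann in doc_annotations]
```

Under this assumption a subclass annotated to every document receives weight 0, so the tf-idf model emphasizes the rarer annotations, which is the main difference from the binary model.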
clustering - 2/3
• We cluster the vector representations of the annotations by applying 2 clustering algorithms:
- K-Means: partitions M data points into K clusters. When the objective function (cost or error) was plotted for various values of K, the rate of decrease peaked for K near 11.
- Agglomerative Hierarchical Clustering: a hierarchy of clusters built 'bottom up'.
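The K selection described above can be illustrated with a small, dependency-free K-Means (Lloyd's algorithm) that returns the objective value, so the cost can be tabulated for several values of K. This is a sketch under simple assumptions (squared Euclidean distance, random initialization from the data points, a fixed seed); it is not the implementation used in the study.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns (centers, cost) with cost = sum of squared distances."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


def cost_by_k(points, k_values, seed=0):
    """Objective value for each candidate K, for inspecting where the decrease flattens."""
    return {k: kmeans(points, k, seed=seed)[1] for k in k_values}
```

Plotting the values returned by cost_by_k against K and looking for the point where the drop in cost levels off is the usual way to read such a curve; because K-Means depends on its initialization, several seeds are normally tried per K.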
clustering - 3/3
• We assess each feature of each cluster using the frequency increase metric.
- it calculates the increase of the frequency of a feature f_i in cluster k (cf_{i,k}) compared to its document frequency df_i in the entire data set
• We select the threshold a that maximizes the F1-measure, the harmonic mean of Coverage and Dissimilarity mean.
- Coverage: the proportion of features participating in the clusters to the total number of features
- Dissimilarity mean: the average of the distinctiveness of the clusters, defined in terms of the dissimilarity d_{i,j} between all the possible pairs of the clusters.
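The threshold selection above can be sketched as follows. The slides do not give the formulas, so several choices here are assumptions made only for illustration: the frequency increase is taken as the difference cf_{i,k} - df_i of relative frequencies, a feature participates in a cluster when that increase is at least a, and the dissimilarity d_{i,j} is taken as the Jaccard distance between the feature sets of two clusters. All function names are hypothetical.

```python
from itertools import combinations


def frequency_increase(cluster_docs, all_docs, feature):
    """Relative frequency in the cluster minus relative frequency in the whole data set."""
    cf = sum(feature in d for d in cluster_docs) / len(cluster_docs)
    df = sum(feature in d for d in all_docs) / len(all_docs)
    return cf - df


def select_features(clusters, features, a):
    """Per cluster, the features whose frequency increase reaches the threshold a."""
    all_docs = [d for cluster in clusters for d in cluster]
    return [{f for f in features if frequency_increase(c, all_docs, f) >= a}
            for c in clusters]


def coverage(selected, features):
    """Share of all features that participate in at least one cluster."""
    return len(set().union(*selected)) / len(features)


def dissimilarity_mean(selected):
    """Mean Jaccard distance over all pairs of cluster feature sets (assumed d_{i,j})."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 0.0

    def jaccard_distance(x, y):
        union = x | y
        return 1 - len(x & y) / len(union) if union else 0.0

    return sum(jaccard_distance(x, y) for x, y in pairs) / len(pairs)


def f1_measure(clusters, features, a):
    """Harmonic mean of Coverage and Dissimilarity mean for threshold a."""
    selected = select_features(clusters, features, a)
    cov, dis = coverage(selected, features), dissimilarity_mean(selected)
    return 2 * cov * dis / (cov + dis) if cov + dis else 0.0


def best_threshold(clusters, features, candidates):
    """Candidate threshold with the highest F1-measure (first one wins on ties)."""
    return max(candidates, key=lambda a: f1_measure(clusters, features, a))
```

The trade-off is visible in the two components: a low threshold lets many features into every cluster (high Coverage, low Dissimilarity), while a high threshold keeps clusters distinct but leaves features uncovered.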
metrics - F1-measure
[Figure: F1-measure curves for K-Means tf-idf, K-Means binary and Hierarchical tf-idf; both axes range from 0 to 1]
part 4
how (and what) we interpret
patterns
[Figure: DiLEO pattern for the K-Means tf-idf clustering, drawn across the STRATEGIC LAYER and the PROCEDURAL LAYER]
• Classes with highlighted subclasses:
- Dimensions: technical excellence, effectiveness
- Dimension Type: summative
- Goals: design
- Activity: report
- Instruments: software
- Subjects: human agents
- Means: survey studies, laboratory studies
- Characteristics: count, discipline
• Other classes shown: Levels, Research Questions, Findings, Objects, Criteria, Metrics, Factors, Means Types, Criteria Categories
• Properties shown: hasDimensionsType, isAimingAt, isSupporting/isSupportedBy, hasPerformed/isPerformedIn, isUsedIn/isUsing, hasConstituent/isConstituting, isParticipatingIn
patterns
[Figure: DiLEO pattern for the Hierarchical clustering, drawn across the STRATEGIC LAYER and the PROCEDURAL LAYER]
• Classes with highlighted subclasses:
- Dimensions: effectiveness
- Goal: describe
- Level: interface
- Activity: record, compare
- Means: survey studies, laboratory studies
- Means type: quantitative
- Subjects: human agents
• Other classes shown: Research Questions, Findings, Objects, Instruments, Dimensions Types, Characteristics, Criteria, Metrics, Factors, Criteria Categories
• Properties shown: hasPerformed/isPerformedIn, hasConstituent/isConstituting, isParticipatingIn, hasMeansType, isAimingAt, isAffecting/isAffectedBy
part 5
conclusions
conclusions
• The patterns reflect and - up to a point - confirm the anecdotally evident research practices of DL researchers.
• Patterns have similar properties to a map.
- They can provide the main and the alternative routes one can follow to reach a destination, taking into account several practical parameters that one might not know.
• By exploring previous profiles, one can weigh all the available options.
• This approach can extend other coding methodologies in terms of transparency, standardization and reusability.
Thank you for your attention.
questions?