Information Quality Assessment in the WIQ-EI EU Project

Information Quality in Social Media
Presentation at UNSL, November 17th, 2011
Elisabeth Lex, www.know-center.at


Talk announcement (UNSL): http://www.dirinfo.unsl.edu.ar/noticias/articulo/charla-dra-elisabeth-lex-know-center-austria.html



Agenda

The Know-Center

The WIQ-EI project

Why Information Quality on the Web?

Selected Results

Conclusion


The Know Center – We are...

Austria’s Competence Center for Knowledge Management and Knowledge Technologies

Link between Science and Industry

A multi-disciplinary team of 40+ Scientists and Developers

Over 575 publications since 2001

100 Master's theses, 26 PhD theses, 4 habilitations

Editors of 2 Journals: Journal of Universal Knowledge Management, Journal of Universal Computer Science

Organizer of the International Conference on Knowledge Management and Knowledge Technologies (I-KNOW)


The Know Center

2 Areas of Research:

Knowledge Relationship Discovery:

- Detecting semantic entities and semantic relations in unstructured data
- Cross-language and cross-domain search and retrieval
- Automatic analysis of information structure and quality
- User interfaces for visual analysis of large information repositories

Knowledge Services:

- Web 2.0, Collective Intelligence and Social Network Analysis
- Semantic Technologies, Semantic Web, Semantic Retrieval
- Communication and Collaboration Technologies
- Mobile Technologies


The WIQ-EI Project - Goals

Web Information Quality Evaluation Initiative

3 Objectives:

Development of Web Content Information Quality Measures

Plagiarism Detection and Authorship Attribution

Multilingual Opinion and Sentiment Mining

Derive algorithms, tools and test data sets


The WIQ-EI Project - Implementation

On a global scale:

Researcher exchanges between organisations in European countries (Austria, Germany, Spain, Greece) and non-European countries (Argentina, Mexico, India) with expertise in topic-relevant fields

Carry out research secondments, training and dissemination activities, challenges and workshops


Agenda

The Know-Center

Why Information Quality on the Web?

Selected Results

Conclusion


Introduction

The Web offers a large amount of potentially useful content

Navigating it is challenging

The Web is changing: user-generated content, social media

- Social media are up to date
- Wide audience, highly dynamic
- Open to (almost) anyone
- Powerful, e.g. for media resonance analysis

The information quality of social media is questionable!


What is Information Quality?

A multi-dimensional concept [Klein, 2001]

Different types of Information Quality (IQ) [Knight, 2005]

E.g. [Wang, 1996]:

Intrinsic IQ: Accuracy, Objectivity, Believability, Reputation

Accessibility IQ: Accessibility, Security

Contextual IQ: Relevancy, Value-Added, Timeliness, Completeness, Amount of Information, Presence of Author information [Katerattanakul, 1999]

Representational IQ: Interpretability, Ease of Understanding, Concise Representation, Consistent Representation


Information Quality – Link to Information Retrieval, Data Mining

Faceted Search

The Information Retrieval Process

Text Mining

Enables the retrieval of core information from unstructured text:

- Information Extraction

- Clustering

- ...


IQ Dimensions:

- Objectivity

- Accuracy

- ...



Our work – Focus on Media Domain

Goal: Assess intrinsic Information Quality in social media, traditional media and arbitrary Web content

Several IQ dimensions:

Objectivity

Emotionality

Credibility

Readability

In-depth versus Shallow

Expert versus Non-Expert

Personal versus Official


Agenda

The Know-Center

Why Information Quality in the Media Domain?

Selected Results

Conclusion


Results

Information Quality Dimension: Objectivity

Task:

Objectivity Classification in Blogs

Use features based on style properties:

Dataset: TREC Blogs08, 83 blogs, 12,844 blog posts

Results:

Accuracy of 87% for Objectivity Classification in Blogs
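The slide names style properties as the feature basis but does not list them. A minimal sketch of the overall pipeline follows, assuming four illustrative style features (share of first-person pronouns, exclamation and question marks per sentence, average sentence length) and a simple nearest-centroid classifier standing in for whatever learner the study used. The functions `style_features`, `train_centroids` and `classify` and the toy posts are hypothetical. Accuracy is the fraction of posts whose predicted label matches the gold label.

```python
import re
from statistics import mean

# Hypothetical style features; the study's actual feature set is not given on the slide.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}

def style_features(text: str) -> list[float]:
    """Map a blog post to a small vector of style properties."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n_tokens = max(len(tokens), 1)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = max(len(sentences), 1)
    return [
        sum(t in FIRST_PERSON for t in tokens) / n_tokens,  # personal voice
        text.count("!") / n_sentences,                       # exclamation marks per sentence
        text.count("?") / n_sentences,                       # question marks per sentence
        n_tokens / n_sentences / 20.0,                       # average sentence length, roughly scaled
    ]

def train_centroids(texts, labels):
    """Nearest-centroid training: average the feature vectors of each class."""
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(style_features(text))
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in by_class.items()}

def classify(centroids, text):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    x = style_features(text)
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(predicted, gold):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Toy training data (hypothetical).
train = ["The council approved the budget on Monday.",
         "I love this!!! My best day ever!",
         "The report lists three causes of the delay.",
         "We were so angry! Why would they do that?!"]
labels = ["objective", "subjective", "objective", "subjective"]
model = train_centroids(train, labels)
print(classify(model, "Wow! I can't believe my luck!"))  # subjective
```

A centroid classifier keeps the sketch dependency-free; in practice any supervised learner could consume the same feature vectors.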


Results

Information Quality Dimension: Credibility

Rank blogs by credibility

Compare blogs with a credible source:

- Quantity structure
- Content similarity: nouns, verbs + adjectives
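A minimal sketch of the two comparisons, under stated assumptions. The "quantity structure" is taken to be the number of blog posts and news articles per time unit for one query, compared via the Pearson correlation coefficient. Content similarity is taken to be the cosine similarity of term counts restricted to nouns, or to verbs plus adjectives. Tokens are assumed to be POS-tagged already with Penn Treebank tags; no tagger is included, and all names and data are hypothetical.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation of two equally long count series (e.g. posts per day)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for a constant series

# Assumed Penn Treebank tag prefixes: NN* = nouns, VB* = verbs, JJ* = adjectives.
NOUNS = ("NN",)
VERBS_ADJECTIVES = ("VB", "JJ")

def pos_filtered_counts(tagged_tokens, tag_prefixes):
    """Term counts restricted to tokens whose POS tag starts with one of tag_prefixes."""
    return Counter(tok.lower() for tok, tag in tagged_tokens if tag.startswith(tag_prefixes))

def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical example: one blog post and one news article, already POS-tagged.
blog = [("Paris", "NNP"), ("passes", "VBZ"), ("new", "JJ"), ("law", "NN")]
news = [("Parliament", "NNP"), ("passes", "VBZ"), ("pension", "NN"), ("law", "NN")]
print(cosine(pos_filtered_counts(blog, NOUNS), pos_filtered_counts(news, NOUNS)))
print(pearson([3, 5, 9, 4, 2], [10, 14, 30, 12, 8]))  # hypothetical daily counts
```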

Dataset: APA news articles, crawled blogs

Results:

Average precision of 83% for blog credibility ranking

Correlation between quantity structures of blogs and news

e.g. query “Frankreich” (German for “France”): Pearson correlation coefficient 0.79

[Juffinger, Granitzer, Lex 2009] Blog credibility ranking by exploiting verified content. In Proc. of WICOW at WWW 2009.
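Average precision rewards rankings that place credible blogs near the top. A minimal sketch of non-interpolated average precision for one ranked list follows, assuming binary credible/not-credible judgements; the blog IDs are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the ranks k at which a relevant item appears.

    Dividing by the total number of relevant items means that relevant
    items missing from the ranking pull the score down."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precisions = 0, []
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant)

# Hypothetical ranking of four blogs, two of which are judged credible.
print(round(average_precision(["b3", "b1", "b7", "b2"], {"b3", "b7"}), 3))  # 0.833
```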


Results

Web Genre and Quality Classification

ECML/PKDD Discovery Challenge 2010

Task 1: Web genre and quality facets

Genres: News/Editorial, Educational, Discussion, Commercial, Personal/Leisure, Web Spam

Quality facets: Bias, Trustworthiness, Neutrality

Task 2: English content quality: combination of the facets into a quality score

Task 3: Multilingual content quality: German, French

Dataset: English, German and French Web hosts; features: NLP features, content features, terms, links

Approach: Ensemble classifier (J48, CFC, SVM)
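J48 is Weka's implementation of the C4.5 decision tree, CFC presumably refers to the class-feature-centroid text classifier, and SVM is a support vector machine. The slide does not say how the three outputs were combined, so the sketch below assumes plain majority voting with a deterministic tie-break. The base classifiers are trivial hypothetical stand-ins, not trained models, and the feature names are invented for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

# A base classifier maps a host's feature dict to a genre label.
Classifier = Callable[[dict[str, float]], str]

def majority_vote(classifiers: Sequence[Classifier], features: dict[str, float]) -> str:
    """Label chosen by most base classifiers; ties go to the earliest-listed classifier."""
    if not classifiers:
        raise ValueError("at least one classifier is required")
    votes = [clf(features) for clf in classifiers]
    counts = Counter(votes)
    best = max(counts.values())
    # Walk the votes in classifier order so the tie-break is deterministic.
    return next(label for label in votes if counts[label] == best)

# Hypothetical stand-ins for trained J48, CFC and SVM models.
def j48(f):
    return "Commercial" if f.get("price_terms", 0) > 2 else "Personal/Leisure"

def cfc(f):
    return "Commercial" if f.get("outlinks", 0) > 50 else "Educational/Research"

def svm(f):
    return "Commercial"

print(majority_vote([j48, cfc, svm], {"price_terms": 5, "outlinks": 10}))  # Commercial
```

Weighted voting or stacking would be equally plausible combination schemes; majority voting is simply the easiest to show.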


Combined Quality Score

Use Case: Web Archival
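The slide gives only the title and the use case. One simple way to fold per-facet predictions into a single score is sketched here, assuming a linear combination with hand-picked weights (hypothetical, not the challenge's official scoring) clipped to [0, 1]. A Web archive could then prioritise hosts in descending score order.

```python
# Hypothetical facet weights: positive facets raise the score, negative ones lower it.
WEIGHTS = {
    "trustworthiness": 0.6,
    "neutrality": 0.4,
    "bias": -0.3,
    "web_spam": -1.0,
}

def quality_score(facet_probs: dict[str, float]) -> float:
    """Weighted sum of facet probabilities, clipped to [0, 1]; missing facets count as 0."""
    raw = sum(w * facet_probs.get(facet, 0.0) for facet, w in WEIGHTS.items())
    return min(1.0, max(0.0, raw))

def archive_priority(hosts: dict[str, dict[str, float]]) -> list[str]:
    """Order hosts for archiving, highest combined quality first."""
    return sorted(hosts, key=lambda h: quality_score(hosts[h]), reverse=True)

# Hypothetical per-facet classifier outputs for two hosts.
hosts = {
    "spam.example": {"web_spam": 0.95, "trustworthiness": 0.1},
    "news.example": {"trustworthiness": 0.9, "neutrality": 0.8, "bias": 0.1},
}
print(archive_priority(hosts))  # ['news.example', 'spam.example']
```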


Results

Web Genre and Quality Classification

Challenges:

Unbalanced, low-quality training data (the training data also contained Hungarian, Czech, ... hosts)

News and Educational are hard to separate

Too little training data for German and French hosts

Results:

The method performs best for Educational/Research (NDCG 0.688), Commercial (0.694) and Personal/Leisure (0.583)

English quality task: NDCG 0.844

Multilingual quality task: use topic-independent features from English hosts

German: NDCG 0.792; French: NDCG 0.823

[Lex et al., 2010]. Assessing the quality of Web content. In Proceedings of the ECML/PKDD Discovery Challenge.
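NDCG compares the system's ranking with the ideal ordering of the same items. A minimal sketch follows, using gains equal to the graded relevance and the usual 1/log2(rank + 1) discount. The exact gain function and cut-off used in the challenge are not given on the slide; a common variant uses 2^rel - 1 as the gain instead. The relevance grades in the example are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for ranks i = 1..n."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG of the system ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance of five hosts, in the order the system ranked them.
print(round(ndcg([3, 2, 3, 0, 1]), 3))  # 0.972
```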


Agenda

The Know-Center

Why Information Quality in Social Media?

Selected Results

Conclusion


Conclusions

Summary

Information Quality (IQ) consists of multiple dimensions

The relevant dimensions depend on the use case

BUT: several dimensions are commonly agreed upon

IQ dimensions can be combined into one quality score

Supervised classification is often used to assess IQ

However, training data is needed!

Simple style-based features are suited to assessing IQ dimensions


Thank you for your attention!