Tiziano Piccardi, Michele Catasta, Leila Zia, Robert West

● Poor structure
● Limited content
● No references
● Not clear
● Not integrated

?

Help to structure the content

We use the content of the article to generate recommendations

We use the category network to generate recommendations

Latent Dirichlet Allocation (LDA), 200 topics

Article topics: Topic 1 = 0.1, Topic 2 = 0.05, Topic 3 = 0.7, ...

Final recommendation = Topic 1 (0.1) + Topic 2 (0.05) + Topic 3 (0.7) + ...
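The combination step in the figure can be read as a weighted sum: each topic contributes a ranked list of sections, scaled by the article's weight for that topic. The sketch below is a minimal illustration under our own assumptions: the per-topic section scores (`topic_sections`) and the function name are hypothetical, and training the LDA model itself is not shown.

```python
from collections import defaultdict


def recommend_from_topics(article_topics, topic_sections, k=10):
    """Score each section as sum over topics of weight(topic) * score(section | topic).

    article_topics: topic id -> weight of that topic in the article (LDA output)
    topic_sections: topic id -> {section: score} (hypothetical per-topic rankings)
    Returns the k highest-scoring sections, best first.
    """
    scores = defaultdict(float)
    for topic, weight in article_topics.items():
        for section, score in topic_sections.get(topic, {}).items():
            scores[section] += weight * score
    # Descending score; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [section for section, _ in ranked[:k]]


if __name__ == "__main__":
    # Topic weights from the slide; the per-topic section scores are made up.
    article_topics = {1: 0.1, 2: 0.05, 3: 0.7}
    topic_sections = {
        1: {"History": 0.6, "Geography": 0.4},
        2: {"Career": 0.9, "History": 0.1},
        3: {"Plot": 0.5, "Cast": 0.3, "Production": 0.2},
    }
    print(recommend_from_topics(article_topics, topic_sections, k=3))
```

Because Topic 3 dominates this toy article (0.7), its sections fill the top of the list.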

Collaborative Filtering

Based on matrix factorization with Alternating Least Squares

One row per article and one column per section

Entry (A, S) = 1 if section S appears in article A, 0 otherwise
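A minimal NumPy sketch of alternating least squares on the binary article × section matrix described above. Assumptions: plain (unweighted) ALS that treats every 0 as an observed "section absent" value, a toy matrix, and hypothetical names `als` and `recommend`; implicit-feedback variants of ALS weight the zeros differently, and the slides do not say which variant or hyperparameters were used.

```python
import numpy as np


def als(R, rank=2, reg=0.1, iters=50, seed=0):
    """Factor R (articles x sections) as U @ V.T with alternating least squares.

    Minimises ||R - U V^T||^2 + reg * (||U||^2 + ||V||^2): each half-step fixes
    one factor and solves a ridge regression for the other in closed form.
    """
    rng = np.random.default_rng(seed)
    n_articles, n_sections = R.shape
    U = rng.normal(scale=0.1, size=(n_articles, rank))
    V = rng.normal(scale=0.1, size=(n_sections, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Fix V, solve for U: U = R V (V^T V + reg I)^-1
        U = np.linalg.solve(V.T @ V + ridge, V.T @ R.T).T
        # Fix U, solve for V: V = R^T U (U^T U + reg I)^-1
        V = np.linalg.solve(U.T @ U + ridge, U.T @ R).T
    return U, V


def recommend(R, U, V, article, k=3):
    """Top-k section indices for one article, skipping sections it already has."""
    scores = U[article] @ V.T
    scores[R[article] > 0] = -np.inf
    return [int(i) for i in np.argsort(-scores)[:k]]


if __name__ == "__main__":
    # Toy matrix: rows 0-3 look like films, rows 4-5 like people.
    R = np.array([
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 0, 0, 0],  # film-like article missing section 2
        [0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1],
    ], dtype=float)
    U, V = als(R)
    print(recommend(R, U, V, article=3, k=1))
```

Article 3 resembles rows 0-2, so the reconstructed score of its missing section 2 is high. An article with no sections at all has an all-zero row and gets no useful signal, which is the limitation noted next.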

Limitation:

The article-based approach cannot generate recommendations for new articles!

Intuition:

Articles in the same category share similar sections

We can use the categories to generate templates for the editors

Category:American epic films
{Plot, Cast, Production}

Taxonomic assumption

Categories are organised in a hierarchical structure

Frequent sections in the child categories may be relevant for the parent
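One way to operationalise the taxonomic assumption, offered as our own sketch rather than the authors' exact procedure: give every category its own articles plus those of all its descendants, then count sections over that extended set. Propagating article sets (instead of summing child counts) avoids counting an article twice when it is reachable through several paths. The function names and toy categories are hypothetical; `graphlib` needs Python 3.9+.

```python
from collections import Counter
from graphlib import TopologicalSorter


def propagate_articles(parents, articles_in):
    """Give each category its own articles plus those of all its descendants.

    parents: category -> set of parent categories (the graph must be acyclic)
    articles_in: category -> set of article ids placed directly in that category
    """
    # parent -> children; children act as "predecessors", so static_order()
    # yields every child before its parents.
    children = {c: set() for c in articles_in}
    for child, ps in parents.items():
        children.setdefault(child, set())
        for p in ps:
            children.setdefault(p, set()).add(child)
    extended = {c: set(articles_in.get(c, ())) for c in children}
    for cat in TopologicalSorter(children).static_order():  # CycleError if cyclic
        for p in parents.get(cat, ()):
            extended[p] |= extended[cat]
    return extended


def section_counts(extended, sections_of):
    """Per category, the number of (extended) articles containing each section."""
    return {
        cat: Counter(s for a in arts for s in set(sections_of.get(a, ())))
        for cat, arts in extended.items()
    }


if __name__ == "__main__":
    parents = {"American_epic_films": {"Epic_films"}, "Italian_epic_films": {"Epic_films"}}
    articles_in = {"American_epic_films": {"a1"}, "Italian_epic_films": {"a2"}}
    sections_of = {"a1": ["Plot", "Cast"], "a2": ["Plot", "Production"]}
    print(section_counts(propagate_articles(parents, articles_in), sections_of)["Epic_films"])
```

`TopologicalSorter` raises `CycleError` on a cyclic graph, so this step only works once the cycles in the category network have been removed.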

Wait, it’s not so easy…

Cycle example: Government ➜ Public administration ➜ Public economics ➜ Economic policy ➜ Government

Cycles broken with the feedback arc set heuristic of Peter Eades, Xuemin Lin, W.F. Smyth

Removed 4k edges out of ~4M
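A readable sketch of the Eades–Lin–Smyth greedy heuristic for the feedback arc set problem: build a vertex ordering by peeling off sinks (to the end) and sources (to the start), falling back to the vertex with the largest out-degree minus in-degree; the edges pointing backwards in that ordering are removed. The function name is ours, and this version rescans the remaining vertices at each step (quadratic time), whereas the published heuristic can be implemented in linear time with bucket queues.

```python
def eades_lin_smyth(edges):
    """Return a set of edges whose removal leaves the directed graph acyclic."""
    out_nbrs, in_nbrs = {}, {}
    for u, v in edges:
        out_nbrs.setdefault(u, set()).add(v)
        in_nbrs.setdefault(v, set()).add(u)
        out_nbrs.setdefault(v, set())
        in_nbrs.setdefault(u, set())

    remaining = set(out_nbrs)
    head, tail = [], []  # tail is collected in reverse order

    def remove(n):
        # Drop n and detach it from the neighbour sets of the remaining vertices.
        remaining.discard(n)
        for m in out_nbrs[n]:
            in_nbrs[m].discard(n)
        for m in in_nbrs[n]:
            out_nbrs[m].discard(n)

    while remaining:
        sink = next((n for n in remaining if not out_nbrs[n]), None)
        if sink is not None:
            tail.append(sink)
            remove(sink)
            continue
        source = next((n for n in remaining if not in_nbrs[n]), None)
        if source is not None:
            head.append(source)
            remove(source)
            continue
        best = max(remaining, key=lambda n: len(out_nbrs[n]) - len(in_nbrs[n]))
        head.append(best)
        remove(best)

    order = head + tail[::-1]
    pos = {n: i for i, n in enumerate(order)}
    # ">=" also catches self-loops (u == v).
    return {(u, v) for u, v in edges if pos[u] >= pos[v]}


if __name__ == "__main__":
    cycle = [
        ("Government", "Public administration"),
        ("Public administration", "Public economics"),
        ("Public economics", "Economic policy"),
        ("Economic policy", "Government"),
    ]
    print(eades_lin_smyth(cycle))  # one of the four edges
```

Which edge of a tied cycle gets cut depends on set iteration order; any single edge suffices to break this cycle.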

Categories with heterogeneous articles must be removed

Distribution of the article types in a category C

We assigned 55 top-level types to the articles

to select the categories to keep in the network

0.774

0.568
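The slide does not name the homogeneity measure used to select categories. As an assumption for illustration only, this sketch scores each category by the Shannon entropy of its article-type distribution and keeps those at or below a threshold; `max_entropy=1.0`, the type labels and the function names are hypothetical.

```python
import math
from collections import Counter


def type_entropy(article_types):
    """Shannon entropy (in bits) of the article-type distribution of one category."""
    counts = Counter(article_types)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def keep_homogeneous(categories, max_entropy=1.0):
    """Keep the categories whose article-type entropy is at most max_entropy."""
    return {cat for cat, types in categories.items() if type_entropy(types) <= max_entropy}


if __name__ == "__main__":
    categories = {
        "American_male_film_actors": ["Person"] * 4,
        "Mixed_bag": ["Person", "Film", "Place", "Organisation"],
    }
    print(keep_homogeneous(categories))
```

A category whose articles all share one type has entropy 0; four equally frequent types give 2 bits, so the mixed category is dropped here.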

Category–Section counts

Probability P(S | C) of observing section S in category C

P(S1 | CAT1) = 2/7

Example:

American_male_film_actors
Filmography: 0.59
Career: 0.47
Personal life: 0.38

Filmography appears in 59% of the articles in the category “American_male_film_actors”
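Following the example above, P(S | C) is the fraction of the articles in C that contain S. A minimal sketch, assuming the input is two hypothetical dictionaries (article to categories, article to section titles); a section that occurs twice in one article is counted once.

```python
from collections import Counter, defaultdict


def section_probabilities(categories_of, sections_of):
    """P(S | C): fraction of the articles in category C that contain section S."""
    n_articles = Counter()
    n_with_section = defaultdict(Counter)
    for article, cats in categories_of.items():
        sections = set(sections_of.get(article, ()))  # count each section once per article
        for c in cats:
            n_articles[c] += 1
            n_with_section[c].update(sections)
    return {
        c: {s: k / n_articles[c] for s, k in counts.items()}
        for c, counts in n_with_section.items()
    }


if __name__ == "__main__":
    # Seven articles in CAT1, two of which contain S1, as in P(S1 | CAT1) = 2/7.
    categories_of = {f"a{i}": ["CAT1"] for i in range(7)}
    sections_of = {"a0": ["S1", "S2"], "a1": ["S1"], "a2": ["S2"]}
    print(section_probabilities(categories_of, sections_of)["CAT1"])
```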

Collaborative Filtering

Based on matrix factorization with Alternating Least Squares

One row per category and one column per section

Ratings defined as P(S1 | CAT1)

Categories_of(Leonardo DiCaprio) = {American_male_film_actors, American_film_producers, Living_people}

For each Cj ∈ Categories_of(Leonardo DiCaprio):

American_male_film_actors: Filmography: 0.59, Career: 0.47, Personal life: 0.38, ...
American_film_producers: Filmography: 0.39, Career: 0.34, Personal life: 0.26, ...
Living_people: Career: 0.25, Personal life: 0.17, Biography: 0.13, ...

Merging phase, Learning2Rank: weighted sum, then sort

Leonardo DiCaprio: Career: 9.98, Filmography: 9.51, Personal life: 9.48, Early life: 8.40, Awards: 2.45, ...
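The merging phase above can be sketched as a weighted sum of the per-category P(S | Cj) lists followed by a sort. This is a simplification under stated assumptions: a real Learning2Rank model would learn weights over features that generalise across articles, whereas here we hand-assign one hypothetical weight per category. With these made-up weights the order differs from the slide's scores, which come from the learned model.

```python
from collections import defaultdict


def merge_category_rankings(category_probs, weights, k=5):
    """Score each section as sum_j weight(C_j) * P(S | C_j), then sort descending."""
    scores = defaultdict(float)
    for cat, probs in category_probs.items():
        w = weights.get(cat, 0.0)
        for section, p in probs.items():
            scores[section] += w * p
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    # P(S | C) values from the slide (truncated lists).
    category_probs = {
        "American_male_film_actors": {"Filmography": 0.59, "Career": 0.47, "Personal life": 0.38},
        "American_film_producers": {"Filmography": 0.39, "Career": 0.34, "Personal life": 0.26},
        "Living_people": {"Career": 0.25, "Personal life": 0.17, "Biography": 0.13},
    }
    # Hypothetical weights, not learned.
    weights = {"American_male_film_actors": 10.0, "American_film_producers": 5.0, "Living_people": 3.0}
    for section, score in merge_category_rankings(category_probs, weights):
        print(f"{section}: {score:.2f}")
```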

Evaluation

English Wikipedia, September 2017: 5.5M articles, 300K sections

Collaborative filtering

Precision < 0.2%, Recall < 1.5%

Cold-start problem: on average 3.4 sections

Topic modeling

Precision@10 = 6% (upper bound 28%)

Recall@10 = 26% (upper bound 98%)
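For reference, a per-article sketch of the two metrics, assuming the recommendations are a ranked list without duplicates and the relevant sections are the article's actual section titles. An article with fewer than 10 sections cannot reach precision@10 = 1, since at most min(|R|, 10) of the 10 slots can be hits; averaging that cap over the test articles gives one natural upper bound, though the slides do not state exactly how the reported bounds were computed. Function names are ours.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for s in recommended[:k] if s in relevant) / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant sections that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return sum(1 for s in recommended[:k] if s in relevant) / len(relevant)


def precision_upper_bound(relevant, k=10):
    """Best achievable precision@k for one article: at most min(|R|, k) hits."""
    return min(len(relevant), k) / k


if __name__ == "__main__":
    recommended = ["Career", "Filmography", "Awards", "Plot"]
    relevant = {"Career", "Early life", "Filmography"}
    print(precision_at_k(recommended, relevant), recall_at_k(recommended, relevant))
```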

Collaborative filtering

Precision@10 = 13% (upper bound 28%)

Recall@10 = 49% (upper bound 98%)

Category–Section counts

Precision@10 = 20% (upper bound 28%)

Recall@10 = 72% (upper bound 98%)

Automatic evaluation has limitations!

The test set contains articles with the very problem we want to solve:

few sections | inconsistent | different syntax

Human evaluation

Wikipedia editors | Crowd-workers

Category–Section counts

Wikipedia editors: Precision@10 = 72%

Crowd-workers: Precision@10 = 81%

● Introduced the section recommendation problem
● Explored several methods using
  ○ features derived from the raw input article
  ○ Wikipedia’s category network
● Learned that the category network is key to offering useful recommendations
● Developed a methodology to prune the category network

https://github.com/epfl-dlab/structuring-wikipedia-articles

https://meta.wikimedia.org/wiki/Recommendation_API


@tizianopiccardi
