Fragrances Grimal 15-11-10

    FRAGRANCES meeting

    Clement Grimal (supervised by Eric Gaussier and Gilles Bisson)

    15 November 2010

    Outline

    1 Improvements of χ-Sim

    2 Experiments

    3 Tensorial extension

    The text mining context

    Document #1:

    A construction found in villages and in the suburbs of bigger towns, used to house a family.

    Document #2:

    A building whose main purpose is to provide accommodation to human beings.

    With a classical clustering approach: no shared terms between the two documents

    Similarity(Document #1, Document #2) = 0

    Using co-clustering: clustering of the terms

    Similarity(Document #1, Document #2) > 0
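
The zero similarity of the classical approach can be reproduced with a few lines of Python. This is only a sketch: the stop-word list, the tokenizer and the choice of the cosine measure are assumptions made for the illustration and do not come from the slides.

```python
import math
import re

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "and", "in", "is", "of", "the", "to", "which"}

def bag_of_words(text):
    """Count the non-stop-word terms of a text."""
    counts = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in STOP_WORDS:
            counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(a, b):
    """Cosine similarity between two term-count dictionaries."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc1 = "A construction found in villages and in the suburbs of bigger towns, used to house a family."
doc2 = "A building whose main purpose is to provide accommodation to human beings."

# No content word is shared, so the dot product (and the similarity) is zero.
similarity = cosine(bag_of_words(doc1), bag_of_words(doc2))
print(similarity)
```

A measure that also knows that "construction" and "building" are related words would give these two documents a positive similarity, which is the point of the co-clustering view.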

    Clement Grimal (LIG), FRAGRANCES meeting, 15 November 2010

    Notations

    M: documents/words matrix with r rows and c columns

    $m_{i:} = [m_{i1} \dots m_{ic}]$: row vector describing document i

    $m_{:j} = [m_{1j} \dots m_{rj}]$: column vector describing word j

    SR: square similarity matrix (documents) of size r, with $sr_{ij} \in [0, 1]$

    SC: square similarity matrix (words) of size c, with $sc_{ij} \in [0, 1]$

    $F_s(m_{ij}, m_{kl}) \in [0, 1]$: similarity function between two elements of the data matrix M

    Two documents are similar if they contain similar words. Two words are similar if they appear in similar documents.

    Joint construction of the two similarity matrices SR and SC.

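    Similarity between two documents

    Classical approach: similarity = f(shared words)

    $\mathrm{Sim}(m_{i:}, m_{j:}) = F_s(m_{i1}, m_{j1}) + \dots + F_s(m_{ic}, m_{jc})$

    Using SC (usually, $sc_{ii} = 1$) and the k-norm:

    $\mathrm{Sim}(m_{i:}, m_{j:}) = \sqrt[k]{\sum_{l=1}^{c} (F_s(m_{il}, m_{jl}))^k \times sc_{ll}}$

    Now comparing every pair of words:

    $\mathrm{Sim}^k(m_{i:}, m_{j:}) = \sqrt[k]{\sum_{l=1}^{c} \sum_{n=1}^{c} (F_s(m_{il}, m_{jn}))^k \times sc_{ln}}$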
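
The pairwise formula can be sketched directly in Python. The function name `sim_k`, the default choice of the product for Fs, and the toy matrices are assumptions made for this illustration.

```python
def sim_k(mi, mj, SC, k=1.0, fs=lambda a, b: a * b):
    """k-norm similarity between two document rows, comparing every pair of words."""
    total = 0.0
    for l, a in enumerate(mi):
        for n, b in enumerate(mj):
            total += fs(a, b) ** k * SC[l][n]
    return total ** (1.0 / k)

# Toy data (illustrative): 3 words; words 0 and 1 are assumed to be related.
doc_a = [1, 0, 1]
doc_b = [0, 1, 1]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
related = [[1, 0.5, 0], [0.5, 1, 0], [0, 0, 1]]

shared_only = sim_k(doc_a, doc_b, identity)  # only the shared word 2 contributes
with_sc = sim_k(doc_a, doc_b, related)       # the pair (word 0, word 1) contributes too
print(shared_only, with_sc)
```

With SC set to the identity the double sum collapses to the classical shared-word sum; any non-zero off-diagonal entry of SC lets two different words contribute to the similarity.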

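    If $F_s(m_{ij}, m_{kl}) = m_{ij} \times m_{kl}$:

    $\mathrm{Sim}^k(m_{i:}, m_{j:}) = \sqrt[k]{(m_{i:})^k \times SC \times (m_{j:}^T)^k}$

    where $(m_{i:})^k$ denotes the element-wise k-th power of $m_{i:}$.

    Then we need to normalize this similarity:

    $sr_{ij} = \frac{\sqrt[k]{(m_{i:})^k \times SC \times (m_{j:}^T)^k}}{N(m_{i:}, m_{j:})} \in [0, 1]$

    With special values for k, SC and N, we have:

    Jaccard: $SC = I$, $k = 1$, $N = \|m_{i:}\|_1 + \|m_{j:}\|_1 - m_{i:} m_{j:}^T$

    Dice: $SC = 2I$, $k = 1$, $N = \|m_{i:}\|_1 + \|m_{j:}\|_1$

    Generalized Cosine: $SC > 0$, $k = 1$, $N = \|m_{i:}\|_{SC} \times \|m_{j:}\|_{SC}$

    Classical χ-Sim: $k = 1$, $N = |m_{i:}| \times |m_{j:}|$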
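
As a quick check of the Jaccard and Dice special cases, the sketch below evaluates the normalized similarity for k = 1 on two binary documents. The helper names and the toy vectors are assumptions introduced for the example.

```python
def normalized_sim(mi, mj, SC, N):
    """sr_ij for k = 1 and Fs = product: (mi x SC x mj^T) / N(mi, mj)."""
    c = len(mi)
    numerator = sum(mi[l] * SC[l][n] * mj[n] for l in range(c) for n in range(c))
    return numerator / N(mi, mj)

def scaled_identity(c, scale):
    """scale * I, as a list of lists."""
    return [[scale if l == n else 0 for n in range(c)] for l in range(c)]

def l1(v):
    """L1 norm of a vector."""
    return sum(abs(x) for x in v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Binary documents over 4 words: they share words 1 and 2.
doc_a = [1, 1, 1, 0]
doc_b = [0, 1, 1, 1]

jaccard = normalized_sim(doc_a, doc_b, scaled_identity(4, 1),
                         lambda u, v: l1(u) + l1(v) - dot(u, v))
dice = normalized_sim(doc_a, doc_b, scaled_identity(4, 2),
                      lambda u, v: l1(u) + l1(v))
print(jaccard, dice)
```

The two documents share 2 words out of 4 distinct ones, so the set-based Jaccard index is 2/4 and the Dice coefficient is 2 x 2 / (3 + 3), which is what the two calls compute.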

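    Parameter k

    The generalized χ-Sim

    $\forall i, j \in 1..r, \quad sr_{ij} = \frac{\mathrm{Sim}^k(m_{i:}, m_{j:})}{\sqrt{\mathrm{Sim}^k(m_{i:}, m_{i:}) \times \mathrm{Sim}^k(m_{j:}, m_{j:})}}$

    $\forall i, j \in 1..c, \quad sc_{ij} = \frac{\mathrm{Sim}^k(m_{:i}, m_{:j})}{\sqrt{\mathrm{Sim}^k(m_{:i}, m_{:i}) \times \mathrm{Sim}^k(m_{:j}, m_{:j})}}$

    For $k \neq 1$, SR and SC are not positive semi-definite...

    We are not defining an inner product, so it is not a generalized cosine measure...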
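
The joint construction of SR and SC can be sketched as an alternating loop. Several points below are assumptions and not taken from the slides: both matrices start as the identity, both are rebuilt from the previous iteration's values, Fs is the product, and the number of iterations is arbitrary.

```python
import math

def sim_k(u, v, S, k):
    """Sim^k(u, v) with Fs = product: k-th root of sum_l sum_n (u_l * v_n)^k * S[l][n]."""
    total = sum((a * b) ** k * S[l][n]
                for l, a in enumerate(u) for n, b in enumerate(v))
    return total ** (1.0 / k)

def similarity_matrix(vectors, S, k):
    """Normalized similarities between all pairs of vectors, given the other-side matrix S."""
    self_sim = [sim_k(v, v, S, k) for v in vectors]
    size = len(vectors)
    out = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            denom = math.sqrt(self_sim[i] * self_sim[j])
            out[i][j] = sim_k(vectors[i], vectors[j], S, k) / denom if denom else 0.0
    return out

def chi_sim(M, k=1.0, iterations=3):
    """Alternately rebuild SR from SC and SC from SR (identity start is an assumption)."""
    rows = [list(r) for r in M]
    cols = [list(c) for c in zip(*M)]
    SR = [[float(i == j) for j in range(len(rows))] for i in range(len(rows))]
    SC = [[float(i == j) for j in range(len(cols))] for i in range(len(cols))]
    for _ in range(iterations):
        new_SR = similarity_matrix(rows, SC, k)
        new_SC = similarity_matrix(cols, SR, k)
        SR, SC = new_SR, new_SC
    return SR, SC

# Toy corpus: documents 0 and 2 share no word, but each shares a word with document 1.
M = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
SR, SC = chi_sim(M, k=1.0, iterations=2)
print(SR[0][2])
```

After one iteration documents 0 and 2 still have similarity 0; after the second one the word similarities learned through document 1 make it positive.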

    Another step: pruning

    In such a corpus...

    Many words are not specific enough and create a lot of irrelevant similarities.

    These similarities can be considered as noise.

    How to deal with it?

    Hypothesis: these irrelevant similarities are small.

    At each iteration, we remove the smallest p% of the values in the similarity matrices.
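
One possible reading of the pruning step is sketched below. It assumes that "removing" a similarity means setting it to zero and that the diagonal is left untouched; neither detail is stated on the slide, and the function name `prune` is hypothetical.

```python
def prune(S, p):
    """Set the smallest p% of the off-diagonal values of S to zero.

    Values tied with the threshold are removed as well, so slightly more
    than p% of the entries can be pruned when there are ties.
    """
    size = len(S)
    values = sorted(S[i][j] for i in range(size) for j in range(size) if i != j)
    n_removed = int(len(values) * p / 100.0)
    if n_removed == 0:
        return [row[:] for row in S]
    threshold = values[n_removed - 1]
    return [[0.0 if i != j and S[i][j] <= threshold else S[i][j]
             for j in range(size)] for i in range(size)]

S = [[1.0, 0.1, 0.6],
     [0.1, 1.0, 0.3],
     [0.6, 0.3, 1.0]]
pruned = prune(S, 40)  # the two entries equal to 0.1 are zeroed
print(pruned)
```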

    [Figure: meaning of an iteration]

    Methods

    Five similarity measures:
    - χ-Sim (with or without k and p) [Hussain et al. (2010)]
    - Cosine
    - LSA (Latent Semantic Analysis) [Deerwester et al. (1990)]
    - SNOS (Similarity in Non-Orthogonal Space) [Liu et al. (2004)]
    - CTK (Commute Time Kernel) [Yen et al. (2009)]

    + Ascendant Hierarchical Clustering, with Ward's index

    Three co-clustering methods:
    - ITCC (Information Theoretic Co-Clustering) [Dhillon et al. (2003)]
    - BVD (Block Value Decomposition) [Long et al. (2005)]
    - RSN (k-partite graph partitioning algorithm) [Long et al. (2006)]

    Methodology and Data

    Methodology

    We randomly select subsets of documents that are already labeled.

    We measure the quality of the clusters using the micro-averaged precision.

    The subsets:

    Name  #clusters  #docs  Newsgroups included
    M2    2          500    talk.politics.mideast, talk.politics.misc
    M5    5          500    comp.graphics, rec.motorcycles, rec.sport.baseball, sci.space, talk.politics.mideast
    M10   10         500    alt.atheism, comp.sys.mac.hardware, misc.forsale, rec.autos, rec.sport.hockey, sci.crypt, sci.electronics, sci.med, sci.space, talk.politics.gun
    NG1   2          400    rec.sports.baseball, rec.sports.hockey
    NG2   5          1000   comp.os.ms-windows.misc, comp.windows.x, rec.motorcycles, sci.crypt, sci.space
    NG3   8          1600   comp.os.ms-windows.misc, comp.windows.x, misc.forsale, rec.motorcycles, sci.crypt, sci.space, talk.politics.mideast, talk.religion.misc
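
The micro-averaged precision used to score the clusters can be sketched as follows. The slides do not say how clusters are matched to classes; this example assumes that each cluster is mapped to its majority label, which makes the score coincide with purity. The labels and cluster assignments are made up for the illustration.

```python
from collections import Counter

def micro_averaged_precision(labels, clusters):
    """Fraction of documents whose label is the majority label of their cluster."""
    by_cluster = {}
    for label, cluster in zip(labels, clusters):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(labels)

# Hypothetical ground-truth labels and cluster assignments for 6 documents.
labels = ["politics", "politics", "politics", "space", "space", "space"]
clusters = [0, 0, 1, 1, 1, 1]
score = micro_averaged_precision(labels, clusters)  # 5 of 6 documents match
print(score)
```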

    Results

    Method       Metric  M2           M5           M10          NG1          NG2          NG3
    Cosine       Pr      0.60 ± 0.00  0.63 ± 0.07  0.49 ± 0.06  0.90 ± 0.11  0.60 ± 0.10  0.59 ± 0.04
    LSA          Pr      0.92 ± 0.02  0.87 ± 0.06  0.59 ± 0.07  0.96 ± 0.01  0.82 ± 0.03  0.74 ± 0.03
    ITCC         Pr      0.79 ± 0.06  0.49 ± 0.10  0.29 ± 0.02  0.69 ± 0.09  0.63 ± 0.06  0.59 ± 0.05
    BVD          Pr      best: 0.95   best: 0.93   best: 0.67   -            -            -
    RSN          NMI     -            -            -            0.64 ± 0.16  0.75 ± 0.07  0.70 ± 0.04
    SNOS         Pr      0.55 ± 0.02  0.25 ± 0.02  0.24 ± 0.06  0.51 ± 0.01  0.24 ± 0.02  0.22 ± 0.05
    CTK          Pr      0.94 ± 0.01  0.94 ± 0.01  0.71 ± 0.01  0.96 ± 0.01  0.90 ± 0.01  0.87 ± 0.01
    CTK*         Pr      0.95         0.95         0.80         0.98         0.91         0.87
    χ-Sim        Pr      0.91 ± 0.09  0.96 ± 0.00  0.69 ± 0.05  0.96 ± 0.01  0.92 ± 0.01  0.79 ± 0.06
    χ-Sim        NMI                                            0.76 ± 0.06  0.79 ± 0.02  0.72 ± 0.03
    χ-Sim_p      Pr      0.94 ± 0.01  0.96 ± 0.00  0.73 ± 0.03  0.97 ± 0.01  0.92 ± 0.01  0.84 ± 0.05
    χ-Sim_p      NMI                                            0.78 ± 0.05  0.79 ± 0.02  0.73 ± 0.02
    χ-Sim^1      Pr      0.95 ± 0.00  0.96 ± 0.02  0.78 ± 0.03  0.97 ± 0.02  0.94 ± 0.01  0.86 ± 0.05
    χ-Sim^1      NMI                                            0.85 ± 0.07  0.83 ± 0.03  0.79 ± 0.03
    χ-Sim^1_p    Pr      0.95 ± 0.00  0.97 ± 0.01  0.78 ± 0.03  0.98 ± 0.01  0.94 ± 0.01  0.87 ± 0.05
    χ-Sim^1_p    NMI                                            0.86 ± 0.04  0.83 ± 0.03  0.80 ± 0.02
    χ-Sim^0.8    Pr      0.95 ± 0.00  0.97 ± 0.01  0.79 ± 0.02  0.98 ± 0.01  0.94 ± 0.01  0.90 ± 0.01
    χ-Sim^0.8    NMI                                            0.87 ± 0.05  0.84 ± 0.02  0.81 ± 0.02
    χ-Sim^0.8_p  Pr      0.95 ± 0.00  0.97 ± 0.01  0.80 ± 0.04  0.98 ± 0.00  0.94 ± 0.01  0.90 ± 0.02
    χ-Sim^0.8_p  NMI                                            0.88 ± 0.03  0.85 ± 0.02  0.81 ± 0.03

    Another preprocessing

    Too many words...

    Initially, we have 100,000 words, which is too many in terms of both space and time complexity.

    We decided to select only 2,000 words.

    Previously, this was done using mutual information and the document labels... A supervised preprocessing for an unsupervised task?!

    Unsupervised feature selection

    We decided to choose the 2,000 words with the PAM (Partitioning Around Medoids) algorithm.
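
To make the idea of selecting words with PAM concrete, here is a naive sketch of the swap phase on a precomputed distance matrix. It is not the setup used for the experiments: the initialization (first k items), the distance and the toy data are assumptions. For feature selection the items would be words, and the returned medoids would be the retained words.

```python
def pam(distance, k, max_iter=100):
    """Naive PAM: start from the first k items (an assumption), then greedily swap."""
    n = len(distance)
    medoids = list(range(k))

    def cost(meds):
        # Sum of distances from every item to its closest medoid.
        return sum(min(distance[i][m] for m in meds) for i in range(n))

    best = cost(medoids)
    for _ in range(max_iter):
        best_swap = None
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                # Replace one medoid by a non-medoid and keep the best improvement.
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                trial_cost = cost(trial)
                if trial_cost < best:
                    best, best_swap = trial_cost, trial
        if best_swap is None:
            break  # no swap lowers the cost any further
        medoids = best_swap
    return sorted(medoids)

# Toy items on a line, forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
distance = [[abs(a - b) for b in points] for a in points]
selected = pam(distance, k=2)  # indices of the two medoids
print(selected)
```

Each pass tries every (medoid, non-medoid) swap, so this naive version is only practical for small inputs.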

    [Figure: influence of k]

    [Figure: influence of p]

    Conclusion for χ-Sim

    Generalization of the method

    Exploration of different normed spaces (k)

    Pruning of the similarity matrices (p)

    Very good experimental results

    Preliminaries

    Data

    Folksonomy data with users, resources and tags:

    $M_{i,j,k} = 1$ if user i describes resource j using tag k.

    Additional notations

    M: data tensor describing the relation between users, resources and tags.

    $M_{i::}$: slice of the tensor describing user i

    $M_{:j:}$: slice of the tensor describing resource j

    $M_{::k}$: slice of the tensor describing tag k

    SU: square similarity matrix for users

    SR: square similarity matrix for resources

    ST: square similarity matrix for tags
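
The tensor and its three kinds of slices can be illustrated with nested lists. The function names and the sample (user, resource, tag) triples are hypothetical.

```python
def build_tensor(triples, n_users, n_resources, n_tags):
    """M[i][j][k] = 1 if user i describes resource j using tag k, else 0."""
    M = [[[0] * n_tags for _ in range(n_resources)] for _ in range(n_users)]
    for i, j, k in triples:
        M[i][j][k] = 1
    return M

def user_slice(M, i):
    """M_{i::}: resources x tags matrix of user i."""
    return M[i]

def resource_slice(M, j):
    """M_{:j:}: users x tags matrix of resource j."""
    return [user[j] for user in M]

def tag_slice(M, k):
    """M_{::k}: users x resources matrix of tag k."""
    return [[row[k] for row in user] for user in M]

# Hypothetical annotations: (user, resource, tag) index triples.
triples = [(0, 0, 1), (0, 1, 0), (1, 0, 1)]
M = build_tensor(triples, n_users=2, n_resources=2, n_tags=2)
print(user_slice(M, 0), resource_slice(M, 0), tag_slice(M, 1))
```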

    Tensorial extension of χ-Sim

    χ-Sim computes similarities between vectors.

    Given a data tensor users × resources × tags, we want to compute the similarities between users, each described by a matrix (a slice of the tensor) whose rows are resources and whose columns are tags.

    The simplest approach

    Two users are similar if they use the same tags to describe the same resources.

    $\mathrm{Sim}(u_i, u_{i'}) = \sum_{j=1}^{n_r} \sum_{k=1}^{n_t} F_s(m_{ijk}, m_{i'jk}) \times sr_{jj} \times st_{kk} = \sum_{j=1}^{n_r} \sum_{k=1}^{n_t} F_s(m_{ijk}, m_{i'jk})$

    Complexity: $n^2$ comparisons for every pair of users $\Rightarrow n^4$
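
A minimal sketch of the simplest approach, assuming that Fs is the product and that each user is given as a resources x tags matrix (the function name and toy slices are made up for the example):

```python
def sim_simple(A, B, fs=lambda a, b: a * b):
    """Sum of Fs over identical (resource, tag) positions of two user slices."""
    return sum(fs(a, b) for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

# Rows are resources, columns are tags.
user_0 = [[1, 0], [0, 1]]
user_1 = [[1, 0], [1, 0]]  # agrees with user_0 on (resource 0, tag 0) only
user_2 = [[0, 1], [1, 0]]  # never uses the same tag as user_0 on the same resource

print(sim_simple(user_0, user_1), sim_simple(user_0, user_2))
```

Only exact (resource, tag) matches count here, which is the tensor analogue of counting shared words between two documents.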

    The full approach

    Two users are similar if they use similar tags to describe similar resources.

    $\mathrm{Sim}(u_i, u_{i'}) = \sum_{j=1}^{n_r} \sum_{k=1}^{n_t} \sum_{j'=1}^{n_r} \sum_{k'=1}^{n_t} F_s(m_{ijk}, m_{i'j'k'}) \times sr_{jj'} \times st_{kk'}$

    Complexity: $n^4$ comparisons for every pair of users $\Rightarrow n^6$
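
The four nested sums of the full approach translate directly into four nested loops. In this sketch Fs is assumed to be the product, and the similarity values in SR and ST are invented for the illustration.

```python
def sim_full(A, B, SR, ST, fs=lambda a, b: a * b):
    """Compare every (resource, tag) cell of A with every cell of B, weighted by SR and ST."""
    n_r, n_t = len(A), len(A[0])
    total = 0.0
    for j in range(n_r):
        for k in range(n_t):
            for j2 in range(n_r):
                for k2 in range(n_t):
                    total += fs(A[j][k], B[j2][k2]) * SR[j][j2] * ST[k][k2]
    return total

# user_a tags resource 0 with tag 0; user_b tags resource 1 with tag 1.
user_a = [[1, 0], [0, 0]]
user_b = [[0, 0], [0, 1]]
SR = [[1.0, 0.5], [0.5, 1.0]]  # assumed resource similarities
ST = [[1.0, 0.4], [0.4, 1.0]]  # assumed tag similarities
identity = [[1.0, 0.0], [0.0, 1.0]]

print(sim_full(user_a, user_b, SR, ST))
print(sim_full(user_a, user_b, identity, identity))
```

With identity matrices the measure falls back to the simplest approach and the two users are unrelated; with the assumed SR and ST they get the similarity sr_01 x st_01.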

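    The intermediate approach

    Two users are similar if they use similar tags to describe the same resources.

    $\mathrm{Sim}(u_i, u_{i'}) = \sum_{j=1}^{n_r} \sum_{k=1}^{n_t} \sum_{k'=1}^{n_t} F_s(m_{ijk}, m_{i'jk'}) \times sr_{jj} \times st_{kk'} = \sum_{j=1}^{n_r} \sum_{k=1}^{n_t} \sum_{k'=1}^{n_t} F_s(m_{ijk}, m_{i'jk'}) \times st_{kk'}$

    Complexity: $n^3$ comparisons for every pair of users $\Rightarrow n^5$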

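    The intermediate approach

    Two users are similar if they use the same tags to describe similar resources.

    $\mathrm{Sim}(u_i, u_{i'}) = \sum_{j=1}^{n_r} \sum_{k=1}^{n_t} \sum_{j'=1}^{n_r} F_s(m_{ijk}, m_{i'j'k}) \times sr_{jj'} \times st_{kk} = \sum_{j=1}^{n_r} \sum_{k=1}^{n_t} \sum_{j'=1}^{n_r} F_s(m_{ijk}, m_{i'j'k}) \times sr_{jj'}$

    Complexity: $n^3$ comparisons for every pair of users $\Rightarrow n^5$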
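
The two intermediate variants (similar tags on the same resources, and the same tags on similar resources) can be sketched side by side. Fs is assumed to be the product, and the function names and toy matrices are hypothetical.

```python
def sim_similar_tags(A, B, ST, fs=lambda a, b: a * b):
    """Same resources, similar tags: sum_j sum_k sum_k' Fs(A[j][k], B[j][k']) * ST[k][k']."""
    n_r, n_t = len(A), len(A[0])
    return sum(fs(A[j][k], B[j][k2]) * ST[k][k2]
               for j in range(n_r) for k in range(n_t) for k2 in range(n_t))

def sim_similar_resources(A, B, SR, fs=lambda a, b: a * b):
    """Same tags, similar resources: sum_j sum_k sum_j' Fs(A[j][k], B[j'][k]) * SR[j][j']."""
    n_r, n_t = len(A), len(A[0])
    return sum(fs(A[j][k], B[j2][k]) * SR[j][j2]
               for j in range(n_r) for k in range(n_t) for j2 in range(n_r))

# Rows are resources, columns are tags.
user_a = [[1, 0], [0, 0]]  # resource 0, tag 0
user_b = [[0, 1], [0, 0]]  # resource 0, tag 1
user_c = [[0, 0], [1, 0]]  # resource 1, tag 0
SR = [[1.0, 0.5], [0.5, 1.0]]  # assumed resource similarities
ST = [[1.0, 0.4], [0.4, 1.0]]  # assumed tag similarities

print(sim_similar_tags(user_a, user_b, ST), sim_similar_tags(user_a, user_c, ST))
print(sim_similar_resources(user_a, user_c, SR), sim_similar_resources(user_a, user_b, SR))
```

Each variant relaxes only one of the two "same" constraints, so user_a is related to user_b only through tag similarity and to user_c only through resource similarity.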

    A few ideas...

    By considering that $F_s(\cdot, \cdot)$ is just the product, we should be able to reduce the time complexity (as for χ-Sim actually).

    It would be interesting to use the full approach on users, and only the simplest or intermediate approach on resources and tags.

    Maybe SR (and ST) should be computed beforehand with a specific measure depending on the type of the resources (images, etc.).

    Finding a good normalization scheme will be challenging...
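
The first idea can be made concrete for the full approach. When Fs is the product, the quadruple sum over (j, k, j', k') equals the element-wise product of SR^T x A x ST with B, summed over all cells, where A and B are the two user slices. The sketch below checks this identity numerically on made-up data; it is an illustration of the algebra, not the speaker's implementation.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def sim_full_naive(A, B, SR, ST):
    """Quadruple sum of A[j][k] * B[j'][k'] * SR[j][j'] * ST[k][k']."""
    n_r, n_t = len(A), len(A[0])
    return sum(A[j][k] * B[j2][k2] * SR[j][j2] * ST[k][k2]
               for j in range(n_r) for k in range(n_t)
               for j2 in range(n_r) for k2 in range(n_t))

def sim_full_matrix(A, B, SR, ST):
    """Same value via two matrix products: sum of (SR^T A ST) * B, element-wise."""
    P = matmul(matmul(transpose(SR), A), ST)
    return sum(p * b for row_p, row_b in zip(P, B) for p, b in zip(row_p, row_b))

# Toy slices: 2 resources x 3 tags, with assumed similarity matrices.
A = [[1, 0, 1], [0, 1, 0]]
B = [[0, 1, 0], [1, 0, 1]]
SR = [[1.0, 0.5], [0.5, 1.0]]
ST = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.3], [0.0, 0.3, 1.0]]

naive = sim_full_naive(A, B, SR, ST)
fast = sim_full_matrix(A, B, SR, ST)
print(naive, fast)
```

For slices of size n x n, the two matrix products cost on the order of n^3 operations per pair of users instead of the n^4 of the nested sums.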

    Thank you very much!

    References

    S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391-407, 1990.

    I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In Proceedings of the Ninth ACM SIGKDD, pages 89-98, 2003.

    S. F. Hussain, C. Grimal, and G. Bisson. An improved co-similarity measure for document clustering. In International Conference on Machine Learning and Applications, 2010.

    N. Liu, B. Zhang, J. Yan, Q. Yang, S. Yan, Z. Chen, F. Bai, and W.-Y. Ma. Learning similarity measures in non-orthogonal space. In Proceedings of the 13th ACM CIKM, pages 334-341. ACM Press, 2004.

    B. Long, Z. M. Zhang, and P. S. Yu. Co-clustering by block value decomposition. In Proceedings of the Eleventh ACM SIGKDD, pages 635-640, New York, NY, USA, 2005. ACM.

    B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 585-592, New York, NY, USA, 2006. ACM.

    L. Yen, F. Fouss, C. Decaestecker, P. Francq, and M. Saerens. Graph nodes clustering with the sigmoid commute-time kernel: A comparative study. Data Knowl. Eng., 68(3):338-361, 2009.
