
  • Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data
    Author: M. Kantarcioglu and C. Clifton
    Source: IEEE Trans. on Knowledge and Data Engineering, vol. 16, no. 9, pp. 1026-1037, 2004
    Speaker: Yu-Chiang Li
    Date: November 26, 2004

  • Outline
    Introduction
    Background and Related Work
    Secure Association Rule Mining
    Security against Collusion
    Difficulties with the Two-Party Case
    Communication and Computation Costs
    Conclusions

  • Introduction
    Goal: produce association rules while limiting the information shared about each site
    Protect individual data privacy while still discovering global rules
    No site should be able to learn the contents of a transaction at any other site

  • Related Work
    FDM: Fast Distributed Mining of association rules
    Example: 150 transactions split over three sites, minSup = 10%
    Sites 1, 2, 3: {A, B, C}, {B, C, D}, {E, F, G, H}
    Candidates of locally large itemsets → candidates of globally large itemsets
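The pruning idea behind FDM can be checked numerically: an itemset that is globally large must be locally large at one or more sites, so only locally large itemsets need to become global candidates. The sketch below is a minimal Python illustration of that property, not FDM itself. The helper names `locally_large_sites`, `globally_large`, and `is_candidate` are hypothetical, minSup is held as an exact `Fraction` to avoid float rounding, and the sample counts are the {ABC} supports (4, 6, 8 over three 50-transaction sites) from the secure-sum example later in the talk.

```python
from fractions import Fraction

# minSup = 10%, kept exact so thresholds like 0.1 * 50 compare without rounding
MIN_SUP = Fraction(10, 100)


def locally_large_sites(counts, db_sizes, min_sup=MIN_SUP):
    """Indices of sites where the itemset meets minSup on that site's own data."""
    return [i for i, (c, n) in enumerate(zip(counts, db_sizes)) if c >= min_sup * n]


def globally_large(counts, db_sizes, min_sup=MIN_SUP):
    """True if the itemset meets minSup on the union of all partitions."""
    return sum(counts) >= min_sup * sum(db_sizes)


def is_candidate(counts, db_sizes, min_sup=MIN_SUP):
    """FDM-style pruning: only itemsets locally large somewhere need a global count."""
    return bool(locally_large_sites(counts, db_sizes, min_sup))


# {ABC} with local supports 4, 6, 8 at three sites of 50 transactions each
print(locally_large_sites([4, 6, 8], [50, 50, 50]))  # sites 1 and 2 (0-based)
print(globally_large([4, 6, 8], [50, 50, 50]))
```

Here {ABC} misses the local threshold at the first site (4 < 5) but meets it at the other two, and its global support 18 meets the global threshold 15. The property holds in general because if every site had c_i < s·|DB_i|, summing would give Σc_i < s·Σ|DB_i|.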

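  • Secure Association Rule Mining
    [Figure: secure union of locally large candidate itemsets using commutative encryption E1, E2, E3 at sites 1, 2, 3]
    Items: {A, B, C, D, E, F, G, H}
    Site 1 candidates: E1({AB, AC, BC, ...}) → E2E1({AB, AC, BC, ...}) → E3E2E1({AB, AC, BC, ...})
    Site 2 candidates: E2({BC, BD, CD, ...}) → E3E2({BC, BD, CD, ...}) → E1E3E2({BC, BD, CD, ...})
    Site 3 candidates: E3({EF, EG, EH, FG, FH, GH, ...})
    Fake itemsets outside the item domain, added to hide how many itemsets each site holds: {KL, KM, KN, KO, KP, ...}
    Since the Ei commute, E1E3E2(BC) = E3E2E1(BC), so duplicates merge: E3E2E1({AB, AC, BC, BD, CD, ...})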
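Commutative encryption is what lets identical itemsets from different sites collide once every site has applied its key, regardless of order. A minimal sketch, assuming Pohlig-Hellman-style exponentiation E_k(x) = x^k mod p as the commutative cipher (the slide does not name one), a toy prime p = 2^127 - 1, and itemsets encoded directly as integers. The names `keygen`, `enc`, `encode`, `decode`, and `secure_union` are hypothetical; fake itemsets are omitted, and the whole ring runs in one process, so this shows the algebra, not a secure deployment.

```python
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1. Illustrative only, not a vetted parameter.
P = 2**127 - 1


def keygen():
    """Hypothetical per-site key pair (e, d) with e * d = 1 mod (P - 1)."""
    while True:
        e = secrets.randbelow(P - 3) + 2  # e in [2, P - 2]
        if math.gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)  # modular inverse (Python 3.8+)


def enc(e, x):
    # Exponentiation commutes: enc(a, enc(b, x)) == enc(b, enc(a, x)).
    return pow(x, e, P)


def encode(itemset):
    return int.from_bytes(itemset.encode(), "big")  # short names only (at most 15 bytes)


def decode(x):
    return x.to_bytes((x.bit_length() + 7) // 8, "big").decode()


def secure_union(local_sets):
    """Union of the sites' itemsets, merged while every value is fully encrypted."""
    keys = [keygen() for _ in local_sets]
    n = len(local_sets)
    merged = set()
    for i, itemsets in enumerate(local_sets):
        vals = [encode(s) for s in itemsets]
        # Site i encrypts first, then the values travel to sites i+1, i+2, ... (mod n)
        for j in range(n):
            e, _ = keys[(i + j) % n]
            vals = [enc(e, v) for v in vals]
        merged.update(vals)  # identical itemsets now have identical ciphertexts
    # Each site strips its own layer in turn.
    for _, d in keys:
        merged = {pow(v, d, P) for v in merged}
    return {decode(v) for v in merged}
```

For example, `secure_union([{"AB", "AC", "BC"}, {"BC", "BD", "CD"}, {"EF", "EG", "EH"}])` returns the eight distinct itemsets, with "BC" appearing once because its two fully encrypted copies are equal.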

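  • Secure sum of local supports (example for itemset {ABC})
    |DB1| = |DB2| = |DB3| = 50, minSup = 10%, so each local threshold is 50 × 10% = 5
    Local supports: site 1 {ABC:4}, site 2 {ABC:6}, site 3 {ABC:8}
    Site 1 picks a random R = 17
    S1 = 17 + 4 - 50 × 10% = 16 (sent to site 2)
    S2 = 16 + 6 - 50 × 10% = 17 (sent to site 3)
    S3 = 17 + 8 - 50 × 10% = 20
    If S3 ≥ R, then {ABC} is globally large (here 20 ≥ 17, i.e., 4 + 6 + 8 = 18 ≥ 15)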
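The running sum on this slide can be traced in a few lines. A minimal sketch, assuming the values are kept modulo a public m (the slide uses plain integers) and that m exceeds twice the largest possible total excess, so a negative excess appears in the upper half of [0, m). The names `secure_sum_trace` and `is_globally_large` are hypothetical, and the final comparison is done in the clear here; the slide does not show how the last site and site 1 compare S3 with R.

```python
import secrets


def secure_sum_trace(supports, db_sizes, min_sup_pct, m, r=None):
    """Ring secure sum of excess supports; returns (R, [S1, ..., Sn]).

    Site i adds sup_i - minSup * |DB_i| to the value it receives.
    Assumes every local threshold minSup * |DB_i| is an integer, as on the slide.
    """
    if r is None:
        r = secrets.randbelow(m)  # site 1's random mask R, uniform in [0, m)
    s = r
    trace = []
    for sup, size in zip(supports, db_sizes):
        if (size * min_sup_pct) % 100:
            raise ValueError("non-integer local threshold")
        s = (s + sup - size * min_sup_pct // 100) % m
        trace.append(s)
    return r, trace


def is_globally_large(r, s_last, m):
    """(S_n - R) mod m is the total excess; the upper half of [0, m) encodes negatives."""
    return (s_last - r) % m < m // 2
```

With the slide's numbers, `secure_sum_trace([4, 6, 8], [50, 50, 50], 10, m=1000, r=17)` yields `(17, [16, 17, 20])`, and `is_globally_large(17, 20, 1000)` is true because (20 - 17) mod 1000 = 3. An itemset whose supports sum exactly to the threshold gives an excess of 0 and is still reported as large.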

  • Conclusions
    Distributed association rule mining can be done efficiently under reasonable security assumptions
    Allowing error in the mining results may enable more efficient algorithms that maintain the desired level of security
    Predicting the value of information for a particular organization would allow a trade-off between disclosure cost, computation cost, and benefit from the result

  • Apriori example (minimum support count = 2)
    Database D:
      TID   Items
      100   A C D
      200   B C E
      300   A B C E
      400   B E
    Scan D → C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
    L1: {A}:2, {B}:3, {C}:3, {E}:3
    C2 (generated from L1): {A B}, {A C}, {A E}, {B C}, {B E}, {C E}
    Scan D → C2: {A B}:1, {A C}:2, {A E}:1, {B C}:2, {B E}:3, {C E}:2
    L2: {A C}:2, {B C}:2, {B E}:3, {C E}:2
    C3 (generated from L2): {B C E}
    Scan D → C3: {B C E}:2
    L3: {B C E}:2
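The level-wise steps on the Apriori slide can be reproduced with a short sketch. This is a minimal Python version, assuming a minimum support count of 2 (implied by the slide, which drops {D} and {A B} at count 1 and keeps every count-2 itemset). The function name `apriori` is ours, and the join step is a simplified union of any two large (k-1)-itemsets rather than the prefix-based join of the original algorithm; after the prune step, the candidate sets are the same.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every large itemset."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}  # C1
    large = {}
    k = 1
    while candidates:
        # Count step: one pass over the database per level
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}  # Lk
        large.update(level)
        k += 1
        # Join step (simplified): union any two large (k-1)-itemsets giving a k-itemset
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return large


db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
for itemset, count in sorted(apriori(db, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On the slide's database this finds the same L1, L2, and L3; at level 3 the join also produces {A B C} and {A C E}, which the prune step removes because {A B} and {A E} are not large, leaving C3 = {B C E}.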