07 Analyzing Privacy Enterprise Slide

Embed Size (px)

Citation preview

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    1/21

    Bruno Ribeiro, Gerome Miklau, Don TowsleyUMass Amherst

    Weifeng Chen

    California University of Pennsylvania

    Analyzing Privacy in Enterprise Packet

    Trace Anonymization

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    2/21

    2

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Motivation

    InternetEnterprise(university)

    Monitor

    src address dest address src port dest port

    14.1.1.1 11.0.0.3 6738 80

    18.0.0.1 11.0.0.1 2434 22

    11.0.0.1 20.0.0.3 6913 80

    Packets

    Packet header traces

    Used for networking research

    Many public repositories (UMass, CAIDA, LBNL, )

    Raw trace may violate user privacy

    If enterprise IP addresses can be tied to individuals

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    3/21

    3

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Motivation

    src addr. dest addr. src port dest port

    14.1.1.1 11.0.0.3 6738 80

    11.0.0.1 20.0.0.3 7913 22

    src addr. dest addr. src port dest port

    200.0.1.2 128.0.64.2 6738 80 128.0.64.0 5.0.4.5 7913 22

    anonymizationmapping

    Anonymized

    trace

    Trace repositoriesAnonymize IP addresses

    Two most widely used schemes

    Full prefix preservation (Xuet al. , 2001)Partial prefix preservation (Pang et al. 2006)

    Original

    trace

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    4/21

    4

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Adversary

    Adversarial model:

    De-anonymizeenterprise IP addresses in the trace1. Probes (scan) enterprise network2. Collects similar information from the trace

    De-anonymizes trace IPsmatching (1) with (2)

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    5/21

    5

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Outline

    Our contributions

    New attack on IP anonymization:

    Attack overviewDefined as a tree editing distance problem

    Worst-case analysis:

    From a set of trace labels (information)Assesses worst-case attack

    Related work

    Conclusions

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    6/21

    6

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Proposed attack overview

    Adversary provides:

    Labeled treeconstructed using anonymizedtraceLabeled tree constructed from probingenterpriseA cost (or distance) function(to deal with mismatchedlabels)

    Our algorithm finds:

    All de-anonymizationsthatcomply withprefix preservation restrictionsandhave minimum total cost

    An instance of the tree edit distanceproblem

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    7/21

    7

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Full prefix preserving anonymization

    Full prefix preservation

    If two real addresses share first X bits, then

    the same two anonymizedaddresses share first X bitsIt imposes restrictions on the real IP AnonymizedIP mapping

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    8/21

    8

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Labeled trees

    Trace tree

    Match sets:

    00 maps to{01

    } 10 maps to {10, 11}

    Probed tree

    Web server

    Not a Web server

    Probed IP leaf labels

    No traffic on port 80

    Trace IP leaf labels

    00 01 10 11 00 01 10 11

    0

    0

    1

    1

    0 1

    Match set:

    00 maps to {00, 01, 10, 11} 10 maps to {00, 01, 10, 11}

    Traffic on port 80

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    9/21

    9

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Imperfect information

    Trace treeProbed tree

    Web server

    Not a Web server

    Traffic on port 80

    No traffic on port 80

    Backup Web server Correct mapping

    Other sources of imperfect labels: Dynamic IP addresses, host shutdown, etc.

    Probed IP leaf labels Trace IP leaf labels

    00 01 10 11 00 01 10 11

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    10/21

    10

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Mapping costs

    Assign a cost to map two IPs with different labels

    Is zero if labels are equal

    Mapping cost

    Sum of all individual costs

    Trace treeProbed tree

    Cost = 0

    Cost = 1

    Cost = 1

    Example:

    1

    0 00

    Total cost = 1

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    11/21

    11

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Proposed attack

    Allminimum costmappings (over the whole network)Because it is prefix-preserving

    Every de-anonymization limits future de-anonymizations

    And our algorithm is fast

    10 seconds (on this laptop) for all mappings of a network with 216addresses

    Probed tree Trace tree

    ?

    00 01 10 11 00 01 10 11

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    12/21

    12

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Experiment

    Network: class B (64K addresses)

    Labels

    Active host

    Active ports: FTP, SSH, Telnet, E-mail, Time, DNS, Web, POP3, SOCKS

    Trace IP labels

    Active hostlabel recorded any outgoing trafficActive portsRecorded traffic from ports 80, 22, .

    Probed IP labels

    Probed over all network

    Active hostlabel PINGActive portsTCP SYN ACK reply from ports 80, 22,

    Nave cost function: Zero is labels are equal, one otherwise

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    13/21

    13

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Experiment results

    Trace collected: 2007, June 18th

    (9097 active IPs)Network probed: 2007, June 18th

    0%

    10%

    20%

    30%

    40%

    50%

    60%

    1 2 3 4 5 6 7 8

    size of matching set

    cumulativefractionof

    hosts

    inthetrace

    Correct

    matches

    Uniquely re-identified

    BAD

    Datapublishe

    rsview

    Incorrect

    matches

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    14/21

    14

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Worst-case analysis

    Given a labeled trace tree

    Findbestde-anonymization

    We provide an algorithm that

    Obtains worst attack matching set size

    For each IP address in the trace

    For any label mismatch cost function

    For any labeledprobed tree

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    15/21

    15

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Worst-case experiment

    Full prefix preservationJune 18th experiment

    0%

    20%

    40%

    60%

    80%

    100%

    1 2 3 4 5 6 7 8

    size of matching set

    Nave attack

    Worst-caseattack

    Uniquely re-identified

    BAD

    Datapublishersview

    cumulativefractionof

    hostsin

    thetrace

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    16/21

    16

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Partial prefix preservation

    Does not retain part of the address structure

    Used in Pang et al., 2006

    Solution also formulated as an instance of the tree edit distanceproblem

    Probed tree rootProbed tree root

    8 bits8 bits

    Anonymized tree rootAnonymized tree root

    8 bits8 bits 8 bits8 bits

    Anonymization

    mapping

    8 bits8 bits

    Up to 256 addresses

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    17/21

    17

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    0%

    20%

    40%

    60%

    80%

    100%

    1 2 3 4 5 6 7 8

    size of matching set

    Partial vs. Full prefix preservation

    Intuition: Partial is much safer than full prefix preservation

    Worst case:Full prefixpreservation

    Worst case:Partial prefixpreservation

    BAD

    Datapublishersview

    cumulativefractionof

    hostsin

    thetrace

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    18/21

    18

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Worst-case analysis (II)

    Uniquely re-identified

    Full prefix preservation: 2713 active IP addresses in the trace

    Partial prefix preservation: 113 active IP addresses in the trace

    Partial prefix preservation is saferbut not completely safe

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    19/21

    19

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Related work

    Playing Devil's Advocate: Inferring Sensitive Information from Anonymized Traces,Scott Coull, Charles Wright, Fabian Monrose, Michael Collins andMichael Reiter,NDSS 2007

    An attack on partial prefix preservation

    Taming the Devil: Techniques for Evaluating Anonymized Network Data,Scott Coull, Charles Wright, Fabian Monrose, Angelos Keromytis and Michael Reiter,NDSS 2008

    Comes right after this talk

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    20/21

    20

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Conclusions

    Attack

    Include global mapping restrictions

    An instance of the tree edit distance problem

    Indicates that full prefix preservation has flawsImpact of late probing on the de-anonymization

    Worst-case analysis

    Can help future anonymization schemes

    A tool for data publishers

    Experiments indicate that:

    Partial is much safer than full prefix preservation

    But still not completely safe

  • 8/8/2019 07 Analyzing Privacy Enterprise Slide

    21/21

    21

    Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization

    Thanks

    Jim Kurose, UMassAmherstEdmundo de Souza e Silva, Federal University of Rio de Janeiro

    Kyoungwon Suh,Illinois State UniversityAnonymous NDSS08 reviewers

    Neils Provos, Google Inc.