Efficient Private Set Intersection for a Decentralised Web of Trust
Álvaro García-Recuero
October 31, 2017
“Privacy-preserving protocols for the WWW in the age of mass surveillance and adversarial learning”
Álvaro García-Recuero Efficient Private Set Intersection for a Decentralised Web of Trust 2 / 40
Why is that?
Strong and Malicious
Mass-surveillance AND personal data collection by third parties on the WWW are a real threat to liberal societies and citizens! [a]
[a] https://www.theguardian.com/technology/2017/aug/01/data-browsing-habits-brokers
Countermeasures
A truly decentralised WWW will require the network to provide privacy and trust by design.
How safe is Big Data?
Adversarial learning
Manipulating or inserting corrupted samples in the dataset to obtain a desired outcome (e.g., financial credit score in OSNs).
De-anonymisation
It is possible to use external data sources to re-identify users and their preferences.
Privacy breaches
WoT [a] extension collecting users’ metadata in the browser.
[a] https://en.wikipedia.org/wiki/WOT_Services
What is the Web-of-Trust about?
What is decentralised PSI useful for?
Trust for a non-public Web-of-Trust
We should be able to establish trust without a centralised Certification Authority (CA).
Going Decentralised
The user should be able to establish direct trust with its peers, similarly to what happens with PGP, GnuPG and others, but without exposing who the signers are, etc.
Why is it desirable?
Centralised data silos are prone to privacy breaches, e.g., third-party apps such as the WoT plugin.
Governments and powerful authorities, e.g., NSA, GCHQ.
Abusing the WWW
Definition
Modeling Abuse
Deny
Deceive
Degrade
Disrupt
Government Communications Headquarters
Defining Deceive
Modeling Abuse
Supplanting a known user identity (impersonation) to influence other users' behaviour and activities, including assuming false identities (but not pseudonyms).
SYLVESTER: framework for automated interaction & alias management in Online Social Networks.
UNDERPASS: change outcome of online polls.
SCRAPHEAP CHALLENGE: perfect spoofing of emails from Blackberry targets.
BURLESQUE: capability to send spoofed SMS text messages.
Defining Degrade
Modeling Abuse
Disclosing personal and private data of others without their approval so as to harm their public image or reputation.
BIRDSTRIKE: a Twitter monitoring and profile data collection tool.
SPRING BISHOP: finds private photographs of targets in Facebook.
Defining Deny
Modeling Abuse
Encouraging other users to self-harm, promoting violence (direct or indirect), terrorism or similar activities.
CLEAN SWEEP: masquerades Facebook wall posts for individuals or entire countries, effectively denying access to information (censorship).
ROLLING THUNDER: distributed denial of service using P2P.
Defining Disrupt
Modeling Abuse
Distracting provocations, denial-of-service, flooding with messages, promoting abuse.
BIRDSONG: automated posting of Twitter updates.
CANNONBALL: capability to send repeated text messages to a single target.
PITBULL: enabling large-scale delivery of a tailored message to users of instant messaging services.
Abuse detection
Abuse ground truth
Trollslayer tool [1]
[1] Repo: github.com/algarecu/trollslayer
Mutual Subscriptions
Feature analysis
[Figure: CCDF of mutual followees in log-log scale; x-axis: log(x); y-axis: log[P(X > x)]; series: acceptable, abusive]
|Subscription ∩ Subscription|: the CCDF shows less overlap among subscriptions of the author of abusive messages and subscriptions of the potential victim.
Privacy: it needs a protocol to protect metadata.
Security? Hard to prevent an increase in overlap with the subscriptions of the potential victim if that is public information.
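As a reminder of what the plotted quantity is, the following minimal sketch computes an empirical CCDF, P(X > x), from a list of overlap sizes. The function name `ccdf` and the two sample lists are hypothetical and serve only as an illustration; they are not data from the study.

```python
from collections import Counter

def ccdf(values):
    """Return sorted (x, P(X > x)) pairs for the distinct values in the sample."""
    n = len(values)
    counts = Counter(values)
    remaining = n
    points = []
    for x in sorted(counts):
        remaining -= counts[x]            # observations strictly greater than x
        points.append((x, remaining / n))
    return points

# Hypothetical overlap sizes, one per (author, potential victim) pair.
acceptable = [0, 2, 3, 3, 5, 8, 13]
abusive = [0, 0, 0, 1, 1, 2, 3]

for label, sample in (("acceptable", acceptable), ("abusive", abusive)):
    print(label, ccdf(sample))
```

Plotting both series on log-log axes would give a figure of the same kind as the one on this slide.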
Straw-man version
Privacy Protocol
Problem: Alice wants to compute $n := |L_A \cap L_B|$.
Suppose each user has a private key $c_i$, and the corresponding public key is $C_i := g^{c_i}$, where $g$ is some generator.
The set up is as follows:
$L_A$: set of public keys representing Alice’s subscriptions
$L_B$: set of public keys representing Bob’s subscriptions
Alice picks an ephemeral private scalar $t_A \in \mathbb{Z}/p\mathbb{Z}$ (set of scalars used for a D-H exchange).
Bob picks an ephemeral private scalar $t_B \in \mathbb{Z}/p\mathbb{Z}$ (set of scalars used for a D-H exchange).
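Privacy Protocol: straw-man version
Alice computes:
$X_A := \{ C^{t_A} \mid C \in L_A \}$
$Y_A := \{ C^{t_A} \mid C \in X_B \} = \{ C^{t_A \cdot t_B} \mid C \in L_B \}$
Bob computes:
$X_B := \{ C^{t_B} \mid C \in L_B \}$
$Y_B := \{ C^{t_B} \mid C \in X_A \} = \{ C^{t_B \cdot t_A} \mid C \in L_A \}$
Messages:
Alice → Bob: $X_A$
Bob → Alice: $X_B, Y_B$
Alice can get $|Y_A \cap Y_B|$ within linear cost.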
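The straw-man exchange can be tried out with a minimal sketch such as the one below. It assumes a deliberately tiny toy group (safe prime P = 2039, subgroup order Q = 1019, generator G = 4), which is far too small to be secure; the user names, private keys and helper names (`public_key`, `blind`, `ephemeral_scalar`) are hypothetical and not taken from the talk.

```python
import secrets

# Toy parameters (assumption, NOT secure): P = 2*Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4

def public_key(c):
    """C_i := g^{c_i}."""
    return pow(G, c, P)

def blind(keys, t):
    """Raise every element of a set to the scalar t."""
    return {pow(C, t, P) for C in keys}

def ephemeral_scalar():
    """Pick a non-zero scalar in [1, Q-1] so that blinding is injective."""
    return secrets.randbelow(Q - 1) + 1

# Hypothetical users and their private keys.
users = {name: public_key(c) for name, c in
         [("carol", 11), ("dave", 23), ("erin", 35), ("frank", 47), ("grace", 59)]}
L_A = {users["carol"], users["dave"], users["erin"]}                   # Alice's subscriptions
L_B = {users["dave"], users["erin"], users["frank"], users["grace"]}   # Bob's subscriptions

t_A, t_B = ephemeral_scalar(), ephemeral_scalar()

X_A = blind(L_A, t_A)   # Alice -> Bob
X_B = blind(L_B, t_B)   # Bob -> Alice
Y_B = blind(X_A, t_B)   # Bob -> Alice: Alice's keys blinded with both scalars
Y_A = blind(X_B, t_A)   # computed locally by Alice: Bob's keys blinded with both scalars

n = len(Y_A & Y_B)
print(n)  # 2, i.e. |L_A ∩ L_B|
```

Because exponentiation by a non-zero scalar is a bijection on a prime-order subgroup, two doubly blinded values coincide exactly when the underlying public keys do, which is why a plain set intersection yields the count.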
Straw-man Protocol 1
Attack 1
Attack 1: insertion of sock-puppet accounts to infer the size of the potential victim’s contact list.
Solution: defeat it by shuffling the contact list before sending it to the other party.
Straw-man Protocol 1
Attack 2
Attack 2: insertion of a sock-puppet account with a marker in the potential perpetrator’s list allows the attacker to infer set membership in the potential victim’s list (identifying pairs of elements).
Solution: hash the commitments of the reblinded contact list in the reply to the potential perpetrator.
Assume a fixed system security parameter $\kappa \geq 1$.
For any list or set $Z$, define $Z' := \{ h(x) \mid x \in Z \}$, e.g., $X'_{B,i}$: hashing each element $\in X_{B,i}$.
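Protocol 1: Cut & choose version
Messages:
Alice → Bob: $X_A$
Bob → Alice: $X'_{B,i}, Y'_{B,i}$
Alice → Bob: $J$
Bob → Alice: $X_{B,j}, t_{B,j}$
1. Alice sends: $X_A := \mathrm{sort}[\, C^{t_A} \mid C \in A \,]$ (with $A$ denoting Alice's list $L_A$).
2. Bob responds with commitments: $X'_{B,i}, Y'_{B,i}$ for $i \in \{1, \dots, \kappa\}$.
3. Alice picks a non-empty random subset $J \subseteq \{1, \dots, \kappa\}$ and sends it to Bob.
4. Bob replies with $X_{B,j}$ for $j \in J$ and $t_{B,j}$ for $j \notin J$.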
Cut & choose version of Protocol 1: Verification
For $j \notin J$, Alice checks that $t_{B,j}$ matches the commitment $Y'_{B,j}$.
For $j \in J$, Alice checks the commitment to $X_{B,j}$ and computes:
$Y_{A,j} := \{ C^{t_A} \mid C \in X_{B,j} \}$
Finally, Alice computes: $n = |Y'_{A,j} \cap Y'_{B,j}|$.
Alice checks that the $n$ values agree for all $j \in J$.
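The four protocol steps and the verification can be walked through in one process with the sketch below. It is only an illustration under several assumptions: a toy group that is far too small to be secure (P = 2039, Q = 1019, G = 4), SHA-256 standing in for the hash $h$, indices running from 0 to κ-1 instead of 1 to κ, an arbitrary κ = 4, and hypothetical helper names (`h`, `blind`, `hashed`, `scalar`, `run_protocol`). Both parties are simulated honestly, so the checks never fail here.

```python
import hashlib
import secrets

# Toy group (assumption, NOT secure): safe prime P = 2*Q + 1, G of prime order Q.
P, Q, G = 2039, 1019, 4
KAPPA = 4  # illustrative value of the security parameter kappa

def h(x):
    """Hash one group element (SHA-256 as a stand-in for h)."""
    return hashlib.sha256(str(x).encode()).hexdigest()

def blind(keys, t):
    """Sorted list of every element raised to the scalar t."""
    return sorted(pow(C, t, P) for C in keys)

def hashed(keys):
    """Z' := {h(x) | x in Z}."""
    return {h(x) for x in keys}

def scalar():
    return secrets.randbelow(Q - 1) + 1

def run_protocol(L_A, L_B):
    # Step 1: Alice sends X_A.
    t_A = scalar()
    X_A = blind(L_A, t_A)
    # Step 2: Bob commits to kappa independently blinded copies.
    t_B = [scalar() for _ in range(KAPPA)]
    X_B = [blind(L_B, t) for t in t_B]
    X_commit = [hashed(X) for X in X_B]               # X'_{B,i}
    Y_commit = [hashed(blind(X_A, t)) for t in t_B]   # Y'_{B,i}
    # Step 3: Alice picks a non-empty random subset J.
    J = {i for i in range(KAPPA) if secrets.randbelow(2)} or {secrets.randbelow(KAPPA)}
    # Step 4: Bob opens X_{B,j} for j in J and t_{B,j} for j not in J.
    counts = set()
    for j in range(KAPPA):
        if j in J:
            # Alice checks the commitment to X_{B,j}, then computes Y_{A,j}.
            if hashed(X_B[j]) != X_commit[j]:
                raise ValueError("X commitment mismatch")
            Y_A = blind(X_B[j], t_A)
            counts.add(len(hashed(Y_A) & Y_commit[j]))
        else:
            # Alice recomputes Y_{B,j} from the revealed scalar and checks it.
            if hashed(blind(X_A, t_B[j])) != Y_commit[j]:
                raise ValueError("Y commitment mismatch")
    if len(counts) != 1:
        raise ValueError("intersection sizes disagree across J")
    return counts.pop()

# Hypothetical subscription lists built from hypothetical private keys.
L_A = {pow(G, c, P) for c in (11, 23, 35)}
L_B = {pow(G, c, P) for c in (23, 35, 47, 59)}
n = run_protocol(L_A, L_B)
```

Note the trade-off the split encodes: for indices outside $J$ Bob gives up his scalar so Alice can audit his blinding of $X_A$, while for indices in $J$ he keeps the scalar secret and Alice only ever sees hashed values of $Y_{B,j}$.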
Privacy Analysis of PSI features
[Figure: CCDF of mutual followers in log-log scale; x-axis: log(x); y-axis: log[P(X > x)]; series: acceptable, abusive]
|Subscriber ∩ Subscriber|: the CCDF shows that authors of abusive messages are less likely to have common subscribers.
Security? Hard to prevent fake subscribers.
Privacy? Yes, Protocol 1.
Privacy Analysis of PSI features
[Figure: CCDF of mutual followers-followees in log-log scale; x-axis: log(x); y-axis: log[P(X > x)]; series: acceptable, abusive]
The CCDF of |Subscribers ∩ Subscription_r| shows less overlap among subscriptions of authors of abusive messages and subscriptions of the potential victims.
Security? Assume it is more difficult for an adversary to increase feature overlap.
Privacy? Yes, our Protocol with BLS signatures.
Protocol 2: PSI with Subscriber Signatures
Assume Subscribers are willing to sign they are subscribed.
Subscribers provide the signatures and not a certificationauthority.
BLS signatures are compatible with our blinding, so weintegrate them with our cut & choose version of the protocol.
Detailed protocol is in the paper.
What is Protocol 2 useful for?
Prove overlap of subscribers without revealing their identity.
Key authentication in non-public Web-of-Trust (1-hop only).
Unlike PSI-CA from De Cristofaro (2016), no need for a CA!
Privacy-preserving features

     Feature                          Falsification/Adaptation  Crypto helps?
5.1  # lists                          trivial                   n/a
     # subscriptions                  trivial                   n/a
     # subscriptions / age            trivial                   n/a
     # subscriptions / # subscribers  trivial                   n/a
5.2  # mentions                       costly                    n/a
     # hashtags                       costly                    n/a
     # mentions / age                 costly                    yes
     # mentions / # messages          costly                    n/a
5.3  message invasive                 hard                      n/a
5.4  # messages / age                 costly                    yes
     # retweets                       costly                    n/a
     # favorited messages             costly                    n/a
5.5  age of account                   hard                      yes
5.6  # subscribers                    possible                  minimally
     # subscribers / age              possible                  minimally
5.7  subscription ∩ subscription      costly                    w. privacy
5.8  subscriber ∩ subscriber          possible                  w. privacy
5.9  subscribers ∩ subscription_r     very hard                 yes
     subscriptions ∩ subscriber_r     possible                  w. privacy
Decision Trees with privacy
Objective function is to maximize the area under the P-R curve (AUC).
[Figure: Precision-Recall curve (AUC = 0.48); x-axis: Recall; y-axis: Precision]
Objective function is to minimize FP and FN rates.
[Figure: confusion matrix; rows: true label; columns: predicted label]
            acceptable  abusive
acceptable  0.915       0.085
abusive     0.355       0.645
Random Forest with privacy
Objective function is to maximize the area under the P-R curve (AUC).
[Figure: Precision-Recall curve (AUC = 0.48); x-axis: Recall; y-axis: Precision]
Objective function is to minimize FP and FN rates.
[Figure: confusion matrix; rows: true label; columns: predicted label]
            acceptable  abusive
acceptable  0.937       0.063
abusive     0.419       0.581
Extra Trees with privacy
Extra Trees is the most balanced between FP and FN, and has the best P-R curve.
Objective function is to maximize the area under the P-R curve (AUC).
[Figure: Precision-Recall curve (AUC = 0.49); x-axis: Recall; y-axis: Precision]
Objective function is to minimize FP and FN rates.
[Figure: confusion matrix; rows: true label; columns: predicted label]
            acceptable  abusive
acceptable  0.795       0.205
abusive     0.194       0.806
Gradient Boosting with privacy
Gradient Boosting = Gradient Descent + Boosting.
Objective function is to maximize the area under the P-R curve (AUC).
[Figure: Precision-Recall curve (AUC = 0.45); x-axis: Recall; y-axis: Precision]
Objective function is to minimize FP and FN rates.
[Figure: confusion matrix; rows: true label; columns: predicted label]
            acceptable  abusive
acceptable  0.972       0.028
abusive     0.581       0.419
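To compare the four classifiers on the second objective, the short sketch below reads the FP and FN rates off the confusion matrices reported on these slides. It assumes that each matrix is row-normalised, that rows are true labels and columns predicted labels in the order (acceptable, abusive), and that "abusive" is the positive class; the helper names `fp_fn` and `imbalance` are hypothetical.

```python
# Confusion matrices as reported on the slides: [true acceptable, true abusive] rows,
# [predicted acceptable, predicted abusive] columns (assumed layout).
matrices = {
    "Decision Trees":    [[0.915, 0.085], [0.355, 0.645]],
    "Random Forest":     [[0.937, 0.063], [0.419, 0.581]],
    "Extra Trees":       [[0.795, 0.205], [0.194, 0.806]],
    "Gradient Boosting": [[0.972, 0.028], [0.581, 0.419]],
}

def fp_fn(m):
    """FP rate: acceptable predicted as abusive; FN rate: abusive predicted as acceptable."""
    return m[0][1], m[1][0]

def imbalance(m):
    """Absolute gap between FP and FN rates (smaller means more balanced)."""
    fp, fn = fp_fn(m)
    return abs(fp - fn)

for name, m in matrices.items():
    fp, fn = fp_fn(m)
    print(f"{name}: FP={fp:.3f} FN={fn:.3f} gap={imbalance(m):.3f}")

most_balanced = min(matrices, key=lambda name: imbalance(matrices[name]))
print(most_balanced)  # Extra Trees
```

Under this simple gap measure, Extra Trees comes out as the most balanced of the four, in line with the remark on its slide.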
Method can protect privacy.
Method can handle adaptive adversaries.
Using reduced ground truth almost as Human Score.
Data minimisation
MinHashes
Data minimisation
Intuition: we can use the intersection to estimate the approximate Jaccard index $J(A,B)$ by counting the number of indexes $i$ such that $h^A_{\min}(\cdot) = h^B_{\min}(\cdot)$.
Evaluate approximate, privacy-preserving PSI in terms of:
(i) computation time
(ii) accuracy of classification.
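The counting intuition can be illustrated with a plain, non-private MinHash sketch. It assumes $k$ hash functions derived from SHA-256 with an integer seed; the helper names (`minhash_signature`, `estimate_jaccard`, `exact_jaccard`) and the example sets are hypothetical, and the blinding used by the privacy-preserving protocol is left out.

```python
import hashlib

def _hash(seed, item):
    """Seeded 64-bit hash of an item (SHA-256 based, one 'hash function' per seed)."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def minhash_signature(items, k=128):
    """For each of k hash functions keep the minimum hash over a non-empty set."""
    return [min(_hash(seed, x) for x in items) for seed in range(k)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of indexes i where the two minimum hashes agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

def exact_jaccard(a, b):
    return len(a & b) / len(a | b)

# Hypothetical contact sets with 30 common elements out of 90 overall.
A = {f"user{i}" for i in range(0, 60)}
B = {f"user{i}" for i in range(30, 90)}

estimate = estimate_jaccard(minhash_signature(A), minhash_signature(B))
print(exact_jaccard(A, B), estimate)
```

Each party only needs to handle $k$ values instead of its full set, which is where the saving in computation comes from; the price is an estimation error that shrinks as $k$ grows.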
The Efficient Privacy-Preserving Protocol
Approximate Jaccard index estimation with MinHashes reduces the computational footprint.
In addition, Data Minimisation provides our PP Protocol for DOSN with just a fingerprint of the one-hop graph metadata.
Note that even centralised Social Network providers such as LinkedIn stop counting contacts beyond 500 on their site.
Results and performance
Features                     Timing (ms)   # of hash func. (k)  Error bound
All (using J index)          3 018 632.98  –                    –
All (using approx. J index)  2 626 971.92  64                   $O(1/\sqrt{k})$
All (using approx. J index)  2 642 225.02  128                  $O(1/\sqrt{k})$
We see a reduction in computation time for the same set sizes (details in the ASONAM ’17 article).
Supervised learning results come close to those we obtained with the non-approximated PSI features, now using the Jaccard index estimated with MinHashes.
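For orientation, the reported timings and the $1/\sqrt{k}$ term can be put side by side with a few lines of arithmetic. The timings are those from the table; the constant hidden by the $O(\cdot)$ notation is not given, so `error_term` below is only the bare $1/\sqrt{k}$ value, and both function names are hypothetical.

```python
import math

# Timings in milliseconds as reported in the results table.
timings_ms = {"exact": 3018632.98, 64: 2626971.92, 128: 2642225.02}

def error_term(k):
    """Bare 1/sqrt(k) term of the O(1/sqrt(k)) error bound (constant factor unknown)."""
    return 1 / math.sqrt(k)

def relative_reduction(k):
    """Fraction of computation time saved relative to the exact Jaccard index."""
    return 1 - timings_ms[k] / timings_ms["exact"]

for k in (64, 128):
    print(f"k={k}: 1/sqrt(k)={error_term(k):.4f}, time saved={relative_reduction(k):.1%}")
```

With these figures, doubling $k$ from 64 to 128 tightens the error term by a factor of $\sqrt{2}$ while the time saved stays in the same range of roughly an eighth of the exact computation.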
Conclusions & Future Work
Our protocol is resistant against malicious adversaries.
Data minimisation reduces the exposure of the training process to malicious adversaries tampering with training samples.
Use our protocol to support a decentralised Web-of-Trust that provides trust but also privacy.
Q & A
QUESTIONS?
Contact: algarecu.wordpress.com
Repos: github.com/algarecu
Publications
Á. García-Recuero. Efficient Privacy-Preserving Adversarial Learning in Decentralized Online Social Networks. In 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Sydney, Australia.
Á. García-Recuero, J. Burdges, and C. Grothoff. Privacy-preserving abuse detection in future decentralized online social networks. In 11th International ESORICS Workshop in Data Privacy Management, DPM 2016. Springer Lecture Notes in Computer Science.