Alessandro Acquisti and Ralph Gross

  • Alessandro Acquisti and Ralph Gross

    Heinz College / CyLab, Carnegie Mellon University

    Research support from National Science Foundation, U.S. Army Research Office (through CyLab), Carnegie Mellon Berkman Fund, and Pittsburgh Supercomputing Center

    Black Hat USA 2009

  • 1. Show that Social Security numbers (SSNs) are predictable from publicly available data

    Knowledge of an individual's birthday and birthplace can be exploited to infer narrow ranges of values likely to include that individual's SSN

    This is due in part to well-meaning, but counter-effective, public policy initiatives

    2. Highlight associated risks and implications

    3. Discuss possible risk-mitigating strategies & policies

  • SSNs were designed and issued by the Social Security Administration (SSA) for the first time in 1936 as identifiers for accounts tracking individual earnings

    Unfortunately, over time they started being used, and abused, as authentication devices

    Notwithstanding warnings by SSA, FTC, GAO, scholars, and so forth

    Naturally, the same number can't be used securely both as identifier and for authentication

  • The wide availability of SSNs, and their dual use as identifiers and authenticators, make identity theft easy and widespread

    Knowledge of somebody's name, DOB, and SSN is often sufficient condition for access to financial, medical, and other services

    Sometimes, even applications with just 7 out of 9 correct digits are accepted as valid (FTC 2004)

  • Each SSN has 9 digits: XXX-YY-ZZZZ

    and is composed of three parts:

    Area number: XXX

    Group number: YY

    Serial number: ZZZZ

    The SSN issuance scheme is complex, but not stochastic

    The SSA itself has for a long time publicly revealed its details
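
    The three-part layout on this slide can be illustrated with a short Python sketch. It only splits a 9-digit string into area, group, and serial parts; the names `SSNParts` and `split_ssn` and the sample value `123-45-6789` are made up for the example and do not come from the talk.

```python
from typing import NamedTuple


class SSNParts(NamedTuple):
    """The three fields named on the slide (hypothetical container)."""
    area: str    # XXX
    group: str   # YY
    serial: str  # ZZZZ


def split_ssn(ssn: str) -> SSNParts:
    """Split a 9-digit SSN, written with or without hyphens, into its parts."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected 9 digits, optionally written as XXX-YY-ZZZZ")
    return SSNParts(area=digits[:3], group=digits[3:5], serial=digits[5:])


# Placeholder value, not a real person's number.
parts = split_ssn("123-45-6789")
print(parts.area, parts.group, parts.serial)
```

    Keeping the parts as strings preserves leading zeros, which matter because each field has a fixed width.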

  • This is well known

    In fact, inference of the likely time and location of SSN applications based on their digits has been exploited to catch fraudsters and impostors

    However, the SSA also states that the SSN assignment process is, effectively, random:

    "SSNs are assigned randomly by computer within the confines of the area numbers allocated to a particular state based on data keyed to the Modernized Enumeration System" (RM00201.060)

  • [Figure: Alaska and New York. Panel labels: "First 5 digits with 1 guess", "All 9 digits with"]

  • In the last 30 years, SSN issuance has become more regular

    Increasing computerization of the public administration, including SSA and its various field offices

    After 1972, SSN assignment centralized from Baltimore

    Tax Reform Act of 1986 (P.L. 99-514)

    After 1989, Enumeration at Birth Process (EAB)

    Prior to 1989, only a small percentage of people received an SSN when they were born

    Currently at least 90 percent of all newborns receive an SSN via EAB together with their birth certificate

  • 1. We expected SSN issuance patterns to have become more regular over the years, i.e. increasingly correlated with an individual's birthday and birthplace

    This should be detected through analysis of available SSN data

    2. We expected these patterns to have become so regular that it is possible to infer unknown SSNs based on the patterns detected on available SSNs

    This should be verified by contrasting estimated SSNs against known SSNs

  • Outside the SSA, the current understanding of the assignment of the first 3 digits was incorrect, and the relationship between demographic patterns and the sequentiality of the last 4 digits was unexplored

    Hence, previous work in this area focused on inferring the likely year or years and state of SSN issuance of a known SSN (e.g., [Wessmiller, 2002], [Sweeney, 2004], [EPIC, 2008])

    We focused on the inverse, harder, and much more consequential inference: exploiting the presumptive day and location of SSN application to predict unknown SSNs

  • [Figure: Alaska, 1998 and New York, 1998. Panel labels: "First 5 digits with 1 guess", "All 9 digits with"]

  • The Social Security Administration's Death Master File is a publicly available database of the SSNs of individuals who are deceased

    One of the purposes of making this data available was to combat fraud

    Unfortunately, it can also be analyzed to find patterns in the SSN issuance scheme

    We used DMF data to find patterns in the issuance of SSNs by date of birth and State of SSN issuance for deceased individuals

    Namely, we sorted records by reported DOB and grouped them by reported State of issuance

    An iterative process
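
    The sort-and-group step named on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the function name `group_by_state_sorted_by_dob`, and the sample rows are all invented for the example, and the state labels and numbers are placeholders, not DMF entries.

```python
from collections import defaultdict
from datetime import date

# Made-up records: (reported state of issuance, reported date of birth, SSN).
records = [
    ("State A", date(1990, 3, 2), "900-00-0002"),
    ("State B", date(1990, 1, 15), "901-00-0001"),
    ("State A", date(1990, 1, 9), "900-00-0001"),
]


def group_by_state_sorted_by_dob(rows):
    """Sort rows by date of birth, then bucket them by state of issuance.

    Because the rows are sorted before bucketing, each state's list is
    already in date-of-birth order.
    """
    groups = defaultdict(list)
    for state, dob, ssn in sorted(rows, key=lambda row: row[1]):
        groups[state].append((dob, ssn))
    return dict(groups)


grouped = group_by_state_sorted_by_dob(records)
for state, rows in grouped.items():
    print(state, rows)
```

    Laying the records out this way makes it possible to inspect, state by state, how assigned numbers progress as birth dates advance, which is the kind of pattern the slide refers to.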

  • Name | Birth | Death | Last Residence | SSN | Issued

    JOHN SMITH | 21 Jun 1904 | Oct 1979 | 33540 (Zephyrhills, Pasco, FL) | 022-10-3459 | Massachusetts

  • 1. TEST 1: We used more than half a million DMF records to detect patterns in SSN issuance based on birthplace and state of issuance, and used those patterns to predict (and verify) individual SSNs in the DMF

    2. TEST 2: We mined data from an online social network to retrieve individuals' self-reported birthdays and birthplaces, and estimated their SSNs by interpolating that data with DMF patterns. We verified the estimates against official Enrollment data using a protected (and IRB-approved) protocol

  • 1. Whether we could predict the first 5 digits of an individual's SSN with one attempt

    2. Whether we could predict the entire SSN with fewer than 10, 100, and 1,000 attempts

    Note: 1,000 attempts is equivalent to a 3-digit PIN

    That is, very insecure and vulnerable to brute-force attacks
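
    The PIN comparison in the note is a counting argument: a decimal PIN with n digits has 10^n possible values, so trying 1,000 candidates is the same work as exhausting a 3-digit PIN. The sketch below spells that out; the function names are invented for the example.

```python
def pin_keyspace(digits: int) -> int:
    """Number of distinct values of a decimal PIN with `digits` digits."""
    return 10 ** digits


def pin_digits_covered(attempts: int) -> int:
    """Longest PIN length whose entire keyspace fits within `attempts` guesses."""
    digits = 0
    while pin_keyspace(digits + 1) <= attempts:
        digits += 1
    return digits


for attempts in (10, 100, 1000):
    print(attempts, "attempts exhaust a", pin_digits_covered(attempts), "digit PIN")
```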

  • [Figure: ME and CA, 1973 to 2003, with the marker "EAB starts here (1989)"]

  • With a single attempt (first five digits only): 7% (1973-1988), 44% (1989-2003)

    With 10 attempts (complete 9-digit SSNs): 0.01% (1973-1988), 0.1% (1989-2003)

    With 1,000 attempts (complete 9-digit SSNs): 0.8% (1973-1988), 8.5% (1989-2003)

    These are weighted averages; for smaller states and recent years, prediction rates are higher. E.g., 1 out of 20 SSNs in DE, 1996, are identifiable with 10 or fewer attempts
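
    To put these rates in perspective, they can be compared with blind guessing. The sketch below assumes, purely for illustration, that every decimal string of the right length is equally likely (it ignores values that are never issued), and compares that baseline with the 1989-2003 rates reported on this slide. The function names and the `reported` table layout are invented for the example.

```python
from fractions import Fraction


def uniform_hit_rate(attempts: int, digits: int) -> Fraction:
    """Chance that `attempts` distinct blind guesses include one fixed
    `digits`-digit decimal string, assuming all 10**digits strings are
    equally likely (a simplifying assumption of this sketch)."""
    return Fraction(attempts, 10 ** digits)


def lift_over_uniform(rate: Fraction, attempts: int, digits: int) -> Fraction:
    """How many times higher `rate` is than the blind-guessing baseline."""
    return rate / uniform_hit_rate(attempts, digits)


# 1989-2003 rates from the slide, as exact fractions:
# (label, attempts, digits predicted) -> reported success rate.
reported = {
    ("first 5 digits", 1, 5): Fraction(44, 100),
    ("all 9 digits", 10, 9): Fraction(1, 1000),
    ("all 9 digits", 1000, 9): Fraction(85, 1000),
}

for (label, attempts, digits), rate in reported.items():
    lift = lift_over_uniform(rate, attempts, digits)
    print(f"{label}, {attempts} attempt(s): {float(lift):,.0f}x the uniform baseline")
```

    Exact fractions are used instead of floats so the ratios are not affected by rounding.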

  • In Test 2 we used birthday data of 621 alive individuals to predict their SSNs, based on interpolation with DMF data

    Our sample: born in 1986-1990 (i.e., mostly before EAB)

    In most populous states (i.e., worst-case scenario)

    Birthday and birthplace data can be obtained from several sources, but most easily, and in mass amounts, from online social networks

    It is trivial for an attacker to write scripts to penetrate OSN communities and download massive amounts of data

  • Test 2 confirmed results of Test 1 (for same mix of years/states of birth)

    This validates that interpolation of SSN data for deceased individuals and birthday data for alive individuals can lead to the prediction of the latter's SSNs

    Extrapolating to the US living population, this would imply the identification of around 40 million SSNs' first 5 digits and almost 8 million individuals' complete SSNs

    Caveat: Assuming knowledge of birth data!

  • Personal knowledge

    Online social networks

    Voter registration lists

    Free online people search services

    Commercial databases

  • Statistical predictions do not amount, alone, to identity theft

    How can you test 10, 100, or 1,000 variations of an SSN without raising red flags?

    Using botnets and distributed online services for brute-force verification attacks

    Tumbling attacks have been documented by ID Analytics

  • Phishing

    SSNVS: SSN Verification Service (SSA)

    eVerify (DHS)

    Instant credit approval services

    DOB/SSN match often is sufficient condition to get approved for several services

  • SSN predictability

    Online verification systems: Instant credit approvals, eVerify, SSNVS

    SSNs as authenticators: CRAs, Financial institutions, Medical services []

    Availability of birth data: Commercial databases, Free online people searches, Voter registration lists, Online social networks

    Distributed attacks: Botnets

    Randomize assignment scheme (all digits)?

    Improve personal computer security?

    Be on the alert for distributed attacks? Improve real-time coordination? (ID Analytics 2003)

    Stop using SSNs for authentication, revert to single use as identifiers?

    Change default settings? Change access/security policies?

    Improve lax verification procedures?

  • Short term

    Randomize scheme

    But, this alone is not enough: does not protect issued SSNs; does not resolve authenticator/identifier issue

    Long term

    Reconsider legislative initiatives focusing on redacting/removing SSNs from documents/public exposure

    Phase out authentication usage

    Negligence argument for businesses that use them as such?

    Sunset solution? E.g., make all SSNs public by year 2014; transition to secure, private, efficient authentication methods in the meanwhile?

  • Research support from the National Science Foundation under Grant 0713361, from the U.S. Army Research Office under Contract DAAD190210389, from Carnegie Mellon Berkman Development Fund, and from the Pittsburgh Supercomputing Center is gratefully acknowledged