38
Anaconda: Detecting private-information leaks in Android apps using static data-flow analysis Rijksuniversiteit Groningen Authors: Stephan Groenewold Klaas L. Winter Jan Veldthuis Supervisor: dr. Doina Bucur Second reviewer: dr. Arnold Meijster Abstract With the advent of the smartphone, new ways of communicating and connecting with the internet have opened up. For many people, smartphones have replaced their watch, their address book, and their calendar. This means that the average smartphone has a lot of private information stored on it, for example: SMS or MMS messages, what web-pages were visited in the browser, or the call history. Unfortunately, it is not clear what apps on smartphones do with this information, and whether this information is used maliciously. Anaconda addresses these issues by using static data-flow analysis on Android apps, reporting whether these apps request private information, and reporting whether this information is sent to remote servers (i.e. leaked). In 14 popular apps, Anaconda found 572 requests of private information, of which 243 cases led to a possible leak. While some of these information leaks are legitimate uses, a large number of the leaks are not legitimate or at least suspicious. Most leaked information is either sent to ad servers or the app developers’ servers, and may not even be used by the app itself. 1

Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Anaconda: Detecting private-information leaks inAndroid apps using static data-flow analysis

Rijksuniversiteit Groningen

Authors:Stephan GroenewoldKlaas L. WinterJan Veldthuis

Supervisor:dr. Doina Bucur

Second reviewer:dr. Arnold Meijster

Abstract

With the advent of the smartphone, new ways of communicating and connecting with the internethave opened up. For many people, smartphones have replaced their watch, their address book, andtheir calendar. This means that the average smartphone has a lot of private information stored on it,for example: SMS or MMS messages, what web-pages were visited in the browser, or the call history.Unfortunately, it is not clear what apps on smartphones do with this information, and whether thisinformation is used maliciously. Anaconda addresses these issues by using static data-flow analysis onAndroid apps, reporting whether these apps request private information, and reporting whether thisinformation is sent to remote servers (i.e. leaked). In 14 popular apps, Anaconda found 572 requestsof private information, of which 243 cases led to a possible leak. While some of these informationleaks are legitimate uses, a large number of the leaks are not legitimate or at least suspicious. Mostleaked information is either sent to ad servers or the app developers’ servers, and may not even beused by the app itself.

1

Page 2: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Contents

1 Introduction 4

2 Background 4

3 Concept 63.1 Why static analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.1.1 Dynamic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63.1.2 Checking outgoing data at runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . 83.1.3 Static analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.2 Sources and Sinks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93.2.1 Direct sources and sinks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93.2.2 Indirect sources and sinks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3.3 Tracking from source to sink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.3.1 Tracking forward from source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113.3.2 Tracking sinks forward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4 Realisation 124.1 Finding sinks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124.2 Finding sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124.3 Finding a path from source to sink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4.3.1 Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134.3.2 Difficulties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144.3.3 Optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144.3.4 Track paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.4 HTML report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154.5 Algorithmic complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

5 Further research 175.1 Tracking references . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175.2 Conditional statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185.3 Usage of source to sink paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

6 Evaluation and results 196.1 Analysed apps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

6.1.1 Dropbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206.1.2 Barcode Scanner (zxing) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206.1.3 Humble Bundle Downloader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206.1.4 Tweakers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206.1.5 Sudoku Free . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216.1.6 Reddit Sync . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216.1.7 Word Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216.1.8 gReader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216.1.9 Shazam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226.1.10 Buienradar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226.1.11 Twitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226.1.12 WhatsApp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236.1.13 iGroningen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236.1.14 NS Reisplanner Xtra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

6.2 Resulting HTML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246.3 Discussion of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246.4 Analysis time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2

Page 3: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

7 Future work 277.1 Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277.2 Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287.3 Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287.4 Intents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287.5 Content providers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

8 Related work 298.1 Androguard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298.2 TaintDroid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298.3 SAAF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298.4 Stowaway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

9 Conclusion 30

References 30

Appendix 32

A Leak type glossary 32

B List of recognised sinks 33

C List of recognised sources 34

D Leak occurence tables 38

3

Page 4: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

1 Introduction

More and more people are starting to use smart-phones. Smartphones are, for most people, idealdevices, since they offer the same functionality thatbefore required multiple devices or items. Exam-ples of the devices and items that are being replacedby smartphones include: calendars, address books,cameras, flashlights, watches, multi-media players,game devices, etcetera. The list of functionalitiessmartphones possess is only increasing with newtechnology and new apps. Because a large numberof the tasks done by a smartphone are personal innature, a big amount of personal data is stored ona smartphone. Furthermore, because smartphonesare getting equipped with more and more sensors,such as GPS, motion sensors and a microphone,information from these sensors also becomes avail-able. Examples of personal information that couldbe acquired because of this are: SMS and MMSmessages, contacts, photos that were made, where-abouts of the phone, conversations that happenin close proximity of the phone, browser history,etcetera. Everyone that has access to the phone,including people and installed apps, could poten-tially access this information.

Although smartphones with different operatingsystems installed are available on the market to-day, this paper focuses on the Android OS. Oneimportant reason for this is the open source natureof Android, which makes working and developingwith it easy. Another important reason is the factthat Android is by far the most popular mobile OSbeing sold today, with 74.4% of sold smartphonesin the first quarter of 2013 using Android [1].

The Android OS tries to limit third-party apps’access to personal information by requiring specificpermissions for different types of information, suchas permission to connect to the internet or per-mission to access the user’s address book. Thesepermissions have to be given by the user upon in-stalling a third-party app. However, because An-droid cannot notify the user what the app will ac-tually do with that information, many users simplyaccept whatever permissions the app requests. Be-cause the warnings produced by Android are oftenignored, the system does not help much in securingusers’ data.

Not all apps are very careful with your infor-mation either. Take for example WhatsApp, thewell known messaging service. In the year 2012, ithas been found to not encrypt your messages, bothwhen stored or send, allowing any other app withSD card access, or on the same network to be ableto read your messages [2]. Numerous more securityissues have been discovered since. Not only that, it

sends all the phone numbers in your address bookto the WhatsApp servers, to check which of themhave been registered with WhatsApp. Who knowswhat happens with that information. Clearly, evenwidely used popular apps can not always be trustedwith your personal information.

This paper presents Anaconda, a static data-flow analyser that is capable of detecting personal-information usage in apps, and is capable of deter-mining whether this personal information is leakedby apps. Anaconda can be provided with an An-droid app installation package which it decompiles.Anaconda then searches for usage of personal in-formation and tracks this information through thedecompiled code. This tracking occurs in a depth-first-search fashion, and during the tracking a tracktree is generated that shows what was done withthe personal information, and most importantlywhether the information is possibly leaked.

By providing information about whether appsacquire or even leak personal data, developers canmake their apps more secure and privacy aware. Itcould also provide users with more detailed infor-mation about what the apps they install do withtheir personal information. This way they couldlook for alternative apps or use a solution such asTISSA [3] to not allow the leaking to happen.

In the next chapter we will first give a briefoverview of the Android operating systems andsome of the most important related terms we willuse in this document. After that, we take a look atthe theory behind some commonly used approachesto app analysis, and look into static analysis, andhow Anaconda performs static analysis in more de-tail. A detailed description of each of the stepsperformed by Anaconda to analyse an app followsin the Realisation section. In Further research, weanalyse some solutions to problems we encounteredwhile developing Anaconda. Having discussed howAnaconda works, we look at the results generatedby Anaconda. We discuss each of the 14 apps weanalysed, in which Anaconda found a total of 572requests of private information, of which 243 areleaked. The performance of Anaconda is also exam-ined. Finally, we talk about some potential futureimprovements for Anaconda, compare our solutionto what already has been done, and conclude ourreport with a conclusion.

The source code for Anaconda is available athttps://github.com/KPWhiver/Anaconda.

2 Background

Android is an operating system, created by AndroidInc., specially designed to be run on mobile devices,

4

Page 5: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Figure 1: Android Architecture

such as smartphones and tablets. In 2005, AndroidInc. was bought by Google, who continued the de-velopment of Android as an open source project.Android uses a modified Linux kernel as its kernel.One of the features that were added to the Linuxkernel by Google is the binder driver, which allowsprocesses to communicate with each other. Some ofthe components under “Linux Kernel” in Figure 1were also added by Google, such as more aggressivepower management. Android supplies a set of C li-braries, such as an implementation of OpenGL, agraphics library, and libc. Figure 1 shows a list ofsome available libraries. The libc provided by An-droid is based on the BSD C library and modifiedby Google. Android’s libc is called Bionic and it ishighly optimised for mobile devices.

Applications that are written for Android are,at least partially, written in Java. Android appli-cations are commonly referred to as apps. To runthis Java code efficiently, Google created the DalvikVirtual Machine, which is a Java Virtual Machineoptimised for mobile devices. The big difference be-tween a normal Java VM and the Dalvik VM is thatDalvik works with register-based bytecode, whilenormal Java VMs work with stack-based bytecode.Both Dalvik and the standard Java libraries arepart of the “Android Runtime” in Figure 1.

On top of these libraries and Dalvik, Androidsupplies a set of Java libraries for use by apps. This

can be seen under “Application Framework” in Fig-ure 1. These libraries include features like: a win-dow manager, http clients, sensor access, etcetera.To obtain private information about the user ofa smartphone, the supplied Java libraries can beused. When writing an app for Android, it is possi-ble to use the provided Java libraries and the stan-dard Java library, but it is also possible to use the Clibraries that are available on the Android system.

For more information about the inner workingsof the Android OS, see “A survey on Android vs.Linux” [4].

Several Android related terms will be used inthis document. An explanation for what thoseterms mean can be found here.

APK: An APK (Application Package) file is anarchive containing the code of the app in DEX for-mat along with resources the app requires. Thesefiles are used to distribute Android apps. An APKis in reality just a zip archive, so the contents caneasily be retrieved by using any unpacking tool.

DEX: DEX is a bytecode-format that is based onregisters. Register-based bytecode means that ifyou look at a readable form of DEX you will seethat all the data in a certain function is storedin certain registers, much like the way a typicalJava program may store data inside local vari-

5

Page 6: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

ables. DEX furthermore offers a list of differentinstruction types (opcodes) [5] to be used, suchas method invokes, binary and unary operators,jumps, etcetera. A typical instruction in DEX isa certain opcode followed by a set of registers theopcode applies to. An instruction might also needother parameters such as how far an opcode shouldjump (in the case of a goto or an if statement) orwhat method to invoke (in the case of an opcodethat invokes methods). Data about what classesare used and what methods they contain is stillavailable in DEX.

Smali: Smali is a human readable form of DEXbytecode. The name originates from a DEX assem-bler with the same name, taking smali as input.The format of the data that Androguard [6] (seesection 8) creates is very similar to smali, althougha lot of information that is present in smali is notpresent in Androguard’s format.

Activity: An Activity is an important componentof any Android app and represents a single screenthat has an interface which can be interacted withby the user. A single app can have many activities.For example, in a mail app, one activity mightshow a list of the emails in your inbox, and an-other might contain a form for sending an email.Each activity works independently, but can com-municate with other activities, even ones outsideof the current app, if that app allows it. All thestarted activities form a stack, with the one ontop being the active and visible one. Activities inthe background will be paused automatically, andcan be stopped by Android to free up resources.Whenever an activity ends or the back button ispressed, the activity is popped off the stack and theprevious activity is shown. If needed, it is restarted.

Service: A Service is also a component of an An-droid app. Like activities, multiple services can becontained in a single app. Unlike activites, they donot provide a user interface, and run in the back-ground. Services are often used for long backgroundtasks like downloading a file or playing music thatshould not stop when changing activities. Any appcan communicate with a service (unless declaredprivate). Examples of some system services arethe notification service, volume service and alarmservice.

Intent: Intents are messages that can be sent be-tween Activities or to a Service. These messagescontain an action the receiver needs to perform anddata on which the action should be performed. Anintent is either explicit, meaning the exact recipient

is given, or implicit, when no recipient is specifiedand only criteria for the recipient is provided. An-droid uses this information to deliver the intentto the correct receiver, asking the user or pickingone randomly when multiple are available. For ex-ample, an app might want to provide the user theability to share something with other people. Theapp could then send an intent containing the action“ACTION SEND”. Android then allows the userto pick an app that provides this functionality, likeyour mail app or the facebook app.

Reflection: Reflection is a technique that allowsa program to determine, at runtime, what kindof methods and fields classes have. In the case ofJava it also allows determining what classes theprogram has, and it can be used to read and writeto class members, including private class members.Reflection is posing difficulties in the context ofstatic code analysis, because it allows a program todecide at runtime which function it will call. Staticcode analysis can only look at what is known atcompile-time, making reflection hard to deal with.

Listener: A listener is an object intended torespond to certain events. For example, a de-veloper can define a class with a method calledonMouseClick(). After defining the class hecan pass an object of the class to the class thathandles the mouse, for example, a Window class.The Window class will store the passed object,and every time a mouse click happens it will callonMouseClick(). In this example the class withthe onMouseClick method is the listener.

NDK: The NDK (Native Development Kit) is away for developers on Android to call C/C++ codefrom Java. The C/C++ code that is called has ac-cess to the C libraries that are mentioned in thebeginning of this section.

3 Concept

3.1 Why static analysis

There are several approaches to analyse apps forthe purpose of information-leak detection. Let uslook at the different approaches and see why wedecided on using static code analysis.

3.1.1 Dynamic analysis

One way to check whether an app is leaking privatedata is by tracking private data at runtime through

6

Page 7: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

the app. TaintDroid is a good example of this tech-nique. As explained in Section 8, TaintDroid [7]modifies Android’s Virtual Machine. Thanks to themodifications, the Virtual Machine is able to markwhether a piece of data in memory is tainted, and itis able to track this tainted data as it flows throughthe Android system. Because almost all of theapps on Android are actually being run on the Vir-tual Machine (the exception being apps completelyrun through the NDK), it even becomes possible totrack data sent from one process to another process.

Code obfuscationThe big advantage of tracking taint dynamicallyis that when running the code you know exactlywhat code is being executed and what code is not,making obfuscating the code almost impossible. Awriter of a malicious app may for example do thefollowing:

Example 1: Calling a methodthrough reflection

1 // encryptedMethodName is ”getDeviceId”2 // when decrypted3 String encryptedMethodName =4 ”v84yvb4yt2rbc3rcnb832n08vb”;5

6 String methodName =7 decrypt(encryptedMethodName);8

9 //methodName now equals ”getDeviceId”10 Method deviceIdGetter = TelephonyManager.11 class.getMethod(methodName);12

13 String imei = deviceIdGetter.invoke();

In the example an encrypted version of thestring “getDeviceId” is decrypted. After decrypt-ing the string, it is used to make a call to thefunction with the same name through reflection.Because the function name is encrypted, if we justlook at the code it is not clear what method is beingcalled. If we were tracking dynamically, the VirtualMachine would see that the function that’s actuallybeing called here is getDeviceId(), which returnsthe phone’s IMEI (an unique identifier associatedwith the phone), because that’s the function theVirtual Machine looks up through reflection. Usingstatic analysis it is almost, if not entirely, impossi-ble to figure out which method is being called here.

Control flowThe fact that dynamic code analysis only looks atthe code being executed is both a strength and aweakness. Why this is, will be illustrated in thenext two examples concerning control flow. Exam-

ple 2 shows a strength of dynamic analysis.

Example 2: Leaking based on aboolean

1 int taintedData;2 // someBooleanValue is set to false somewhere,3 // because the code should never run4 if(someBooleanValue)5 leak(taintedData);

In this example we only leak ifsomeBooleanValue is set to true. BecausesomeBooleanValue is always false, the VirtualMachine will never get to the leak code and noleak will be reported. With static code analy-sis, someBooleanValue must be tracked to fig-ure out whether the boolean variable could pos-sibly be true. Unfortunately, it can not al-ways be determined whether a boolean variablecan actually become true. The topic of track-ing conditional variables is further discussed inSection 5.2. An example of a leak that occursbased on some boolean value can be found inthe Amazon ad-API, where location data is onlysent to Amazon if the developer who is using theAPI calls enableGeoLocation(boolean) in theAdTargetingOptions class, with the boolean be-ing true. In this case it is very easy to see that,although the developer never intents to leak infor-mation, static code analysis might still report aleak because code that is able to leak is present.

In the following example, a weakness of dy-namic code analysis can be seen:

Example 3: Leaking only if possible

1 Location GPSloc = getLastKnownLocation();2

3 if(GPSloc != null)4 leak(GPSloc);

In this example we only leak data if requestingthe data did not fail. Dynamic code analysis willonly see the leakage if the request does not fail.If the request does fail, the leak code will neverbe executed and dynamic code analysis will nevercome across it. Static code analysis can detect theleakage because it will just assume the worst case,in this case that GPSloc can be unequal to null.

Other disadvantagesIn the case of TaintDroid (although this also goesfor dynamic analysis in general), it is very hardto dynamically track data through a C/C++ pro-gram, if not impossible. This is because unlike

7

Page 8: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Java, C/C++ code is not run inside a Virtual Ma-chine, instead it runs directly on the hardware.This means that TaintDroid can only make assump-tions about what happens to private informationafter it is passed to C/C++ code.

Other disadvantages of dynamic code analysisare that it introduces CPU and memory overheadat app runtime. Also, because you generally do notfollow all possible execution paths when runningan app, you could potentially miss certain leakages.Example 3 is a simple example of this behaviour.Because the execution path usually depends heavilyon runtime conditions, it becomes very likely thatwe do not follow all execution paths.

3.1.2 Checking outgoing data at runtime

Another runtime technique to detect informationleakage is by sniffing the data packages that are ac-tually leaving the phone. This could mean that youfilter the data that is being sent, to see if, for ex-ample, the phone’s IMEI is present in the output.Although this system will probably not give a lotof false positives, it is very limited in what it candetect.

Some data, like data that shows where you are,e.g., longitude and latitude, will constantly change,and it will therefore be very difficult to see whethersomething is an irrelevant piece of data or an actuallocation coordinate. One way to get around this isto modify the Android-API in such a way that itfeeds the app false data, for example, false locationdata. By feeding the app false location data, wecan match against the false data to see if the appleaks it. The TISSA system, is a system that doesexactly this: it provides apps with bogus private in-formation so no real private information is leaked.A downside to this is that providing bogus privateinformation might render some apps useless, e.g.,an app which tells an user his or her location needsaccess to location data.

Unfortunately this technique fails completelywhen the data that is sent out is encrypted, whichis becoming more and more commonplace in soft-ware, for example through techniques such as SSLand HTTPS.

3.1.3 Static analysis

A third way, and the way we chose, to detect privateinformation leakages is by using static code analy-sis. Static code analysis is a way of analysing thebehaviour of a program without running the pro-gram. To achieve this, some analysable version ofthe program needs to be available, for example, thesource code or the assembly version of the program.

Static code analysis is used a lot in detecting bugsin source code [8], but it is sometimes also used todetect malicious behaviour in programs, as is thecase with some of the projects in Section 8 (Re-lated Work).

There are a number of benefits static code anal-ysis has that other analysis techniques may nothave. Also, since any analysis that looks at code,without actually running the code, falls under staticcode analysis, there are many ways in which staticcode analysis can be done.

Benefits of static code analysisBecause static code analysis is not restricted inwhat code it can look at, all the code of a pro-gram can be analysed. This means that issues thatwould only occur on, for example, a different CPUarchitecture can still be detected. Even code thatwill never be executed, e.g., a function that is nevercalled, can be analysed. Example 3 is a good ex-ample of how static code analysis can detect issuesin code that is not executed.

Because static code analysis can analyse all thecode at once, it only needs to be run once. Depend-ing on what kind of analysis is being done runningthe analysis might still take long, but it will have noeffect on the runtime performance of the analysedapp.

Using static code analysis, it is possible to buildan analyser which reports no false negatives, e.g.,it is possible to have a static analyser that doesnot miss any leaks. This is mostly due to the factthat static code analysis offers the ability to lookat all the code at once, not just at the code beingrun. As such all leaks can potentially be spotted,not just leaks in the code that is actually beingrun. Even when reporting no false negatives, it isstill, almost always, possible to give detailed infor-mation about reported leaks, for example, in theform of an execution path which causes a leak. Theonly cases in which no detailed information can begiven are the cases in which it is only possible toassume that leaking occurs, e.g., when informationabout the leaking is only available at runtime, as isthe case with reflection. A downside to giving nofalse negatives is that static code analysers usuallygive, at least some, false positives. Chances are bigthat when an analyser is made to give fewer falsenegatives, more false positives are generated.

Static code analysis methodsDifferent methods to do static code analysis usuallyrange from giving in-depth leak reports to just giv-ing an superficial overview of program behaviour.This can clearly be seen in the previous attemptsof statically analysing Android apps.

8

Page 9: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Tools like Androwarn [9] and Andrubis [10] onlylook at whether private information is being re-quested in a program. The upside to doing thisis that it only requires a lightweight analysis. Thedownside to this method is that the results are notvery extensive. SAAF [11] goes a step further be-cause it first applies program slicing to the code, todetermine whether the code to request private in-formation can be reached and is not simply “deadcode”. Program slicing is a technique that “slices”all the code away that is not relevant to the con-crete problem currently being looked at, in this casecode that can not be executed. In the end, thesethree tools only give an overview of what privateinformation is requested.

Anaconda gives more information about whatan app does with private information, because italso determines whether requested private informa-tion leaves the phone. Anaconda achieves this byalso tracking the private information through thecode of the app. The results Anaconda producesare much more precise, but it takes more effort andtime to get these results.

For even more precision, it may be possibleto use model checking to analyse an app. Modelchecking creates a mathematical model of the pro-gram code. After creating the mathematical model,a check is performed to see whether the model sat-isfies a given formula. The given formula wouldin this case be a formula that describes maliciousbehaviour. Model checking can be more precisethan simpler forms of static code analysis, suchas the analysis Anaconda performs, but it is alsomuch more expensive in terms of the CPU time andmemory usage it needs to perform its analysis [12].

Why static code analysisIn the end, we chose to use static code analysis toattack the problem of leaks in Android apps. Eventhough static code analysis is limited in what it cananalyse [13], static code analysis has several ben-efits over other techniques that allows it to detectleaks other techniques can not detect. Furthermore,dynamic code analysis has already extensively beenlooked at with taintdroid, while existing static codeanalysis solutions are still severely lacking (as canbe seen in Related Work, Section 8). All in all wefelt that the best results and the best progress couldbe achieved by using static code analysis.

3.2 Sources and Sinks

To be able to detect whether an app leaks infor-mation, we need to know a few things. First, weneed to know whether an app is actually acquiringprivate information, and where it takes place. Sec-

ond, we need to know where the places are that theinformation could leave the system and thereforebe leaked. To approach this problem, we introducethe terms source and sink, where a source is theprogram location that provides private information,and a sink is a program location from where datacan be sent to the internet.

Basically there are two types of sources andsinks that we have to deal with: direct sources anddirect sinks, and indirect sources and indirect sinks.Direct sources are sources that we can be sure ofthat they provide private information, while directsinks are sinks that we can be sure of that they willsend the provided data to the internet. Indirectsources and sinks are ways in which informationcan leave or enter an app to or from the rest ofthe operating system, for example, file reads andwrites.

3.2.1 Direct sources and sinks

The only direct sources in the Android sys-tem are certain methods and structures inthe Android-API. An example of this is theTelephonyManager.getDeviceId() call, which re-turns the phone’s IMEI number. The Android-APIhas quite a lot of direct sources, both in the formof methods that directly return private information,but also in the form of methods that can be passedlistener objects. The passed listener objects willeventually be passed private information.

A direct sink in the Android system might be,for example, a socket or an http-request.

3.2.2 Indirect sources and sinks

Example 4 shows an example of indirect source andsink usage in an app:

Example 4: Leaking through a file

1 function1:2 int privateData;3 writeToFile(”file.txt”, privateData);4 function2:5 int data;6 data = readFromFile(”file.txt”);7 leak(data);

When we are reading data from outside the app,such as in function2, we do not know where thisdata is coming from. The data we are reading mightcome from a direct source, such as in function1.We can not be sure that data that is written out-side the app will not, later on, be read and leaked.Because of this, we have to deal with these poten-tial sources and sinks, which we call indirect sourcesand sinks.

9

Page 10: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

There are numerous ways in which an app canmake data “leave” the app and, later on, “enter” itagain:

• Files: File writes and reads, such as in Ex-ample 4.

• NDK: By using the NDK a developer couldpass data to a C/C++ method and later re-quest that data again. Since this is nativecode it becomes very hard to track this codestatically. It is also possible for native codeto read and write data to and from files andto use sockets.

• Intents: An app could send or receive anintent containing private information, thatcould later on be leaked.

• Reflection: Although reflection is not reallya way for getting data outside of a runningapp, it is still useful to treat reflection as anindirect source or sink. Every function calldone through reflection could potentially re-turn private information or leak private infor-mation.

There are several ways to deal with indirectsources and sinks, ranging from algorithmicallysimple producing a large number of false positivesto algorithmically complex producing almost nofalse positives.

A. Track whether an indirect sink is accessedagainOne way to deal with this problem is to trackwhether anything we put in an indirect sink isaccessed again. For example, if we write some-thing to a file, search if we also read from thatfile, if we do read from that file we can con-tinue tracking the data that is read from thatfile. The downside to this method is that it canbe hard to find whether we access an indirectsink again. We would have to find out whether,for example, a file that is read from is exactlythe same as the file we wrote to. Especiallyfile names can be easily obfuscated. For sometypes of indirect sources and sinks such as theNDK this method is close to impossible to im-plement without looking at the C/C++ codethat is used. Statically analysing C/C++ codeis certainly possible, but it is not something wehave looked into.

In Figure 2, the apps in set 4 correspond to theapps reported as leaking with this solution.

B. Assume all indirect sources of the sametype, are sourcesAnother way to deal with this problem wouldbe to simply state that whenever we write some-thing to a file or another indirect sink, all readsof the same type return tainted information.For files this would mean that whenever pri-vate information is written to a file, all file readsare assumed to return private information. Thisway of dealing with the problem is much sim-pler, because we do not need to be sure that weare dealing with the same file or intent. Thedownside of this approach is that it can resultin a lot more false positives.

In Figure 2, the apps in the intersection of set2 and set 3, correspond to the apps reported asleaking with this solution.

C. Make all indirect sinks directBy far the easiest solution is to simply statethat if something is leaked through an indirectsink it is leaked, and when something is accessedfrom an indirect source it is private information.The big downside of this is obviously the hugeamount of false positives it can create. It mustbe noted that in the case of reflection, it is alsonecessary to treat the indirect sources as directsources. The reason that indirect sources needto be treated as direct for reflection is becausea call using reflection which returns data (indi-rect source) can potentially be a direct source,an example of this is Example 1.

In Figure 2, the apps in set 2 correspond to theapps reported as leaking with this solution. Inthe case of reflection, the apps reported as leak-ing will be a union of the apps in set 2 and set3.

Anaconda uses solution C to solve the problemof indirect sources and sinks, with the exceptionthat reflection is currently not dealt with. Chang-ing the solution used to solution B or A is some-thing for potential future work.

3.3 Tracking from source to sink

To actually make the connection between acquir-ing data from a source and passing that data toa sink, we need to track what happens to the datathat is provided by sources. This whole process notonly involves tracking the data from source, but italso involves tracking sinks and tracking referencesto the data. The subject of tracking references isdiscussed in Section 5.1.

10

Page 11: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Figure 2: Venn diagram

1.

2. 3.4.

1: All apps.2: Apps that leak private data to an indirect sink.3: Apps that leak data acquired from an indirectsource.4: Apps that leak private data through an indirectsource and sink pair.

3.3.1 Tracking forward from source

The first step in figuring out whether private in-formation is leaked is by finding out if and wherein the code this information is acquired. Find-ing where private information is accessed can beas simple as figuring out where certain functions,such as getDeviceId(), are called. After find-ing where this data is accessed, we can track thedata through the app, to find out if it eventuallyleaves the phone. To track data we simply lookat all the instructions that use the data, and takethe appropriate action. Let us look at an exam-ple of an attempt to leak the phone’s IMEI number.

Example 5: Simple leak

1 String imei = manager.getDeviceId();2

3 Socket socket = new Socket(”hostname”, port);4 DataOutputStream out =5 new DataOutputStream(6 socket.getOutputStream());7

8 out.write(imei);

In the example, getDeviceId() is called. SincegetDeviceId() returns the phone’s IMEI, we con-sider it to be a source and track the result. Theresult of the function call is stored in the localvariable imei, so we start tracking imei. Even-tually imei is passed to an output stream con-nected to a socket, so we can now report a leak.In this example the instructions that we track aresimple: an instruction that stores the result ofgetDeviceId(), and an instruction that invokes

the method DataOutputStream.write(String).In more complex examples there might be instruc-tions that return the data, data might be storedinside a class member, etcetera.

3.3.2 Tracking sinks forward

One problem that appears in the previous exam-ple is the fact that, while tracking, we do not re-ally see the link between the DataOutputStream

that we write to, and the Socket that thisDataOutputStream eventually sends its data to.To still be able to detect whether data will ac-tually end up in, in this example, a socket, weneed to identify which instructions cause data tobe passed to a sink (such as the invoke of theDataOutputStream.write(String) method).

Identifying which instructions pass data to asink can be done by tracking from the place wherethe sink is defined. Finding where a sink is definedcan, for example, be done by looking at where theconstructor of the sink is called. After having foundthe sink, we look for instructions that call a methodof the sink. The only thing left to do after we havefound such a method is marking this instructionas passing data to a sink. Furthermore, we alsoneed to handle the other instructions that try todo something with the sink we are tracking, for ex-ample, when the function we are in returns the sink.Luckily we can handle this by using the same ruleswe used when tracking forward from source.

In Example 5 we treat the OutputStream

that Socket.getOutputStream returns as the sink,since this is the object that we will pass datato. While tracking this OutputStream we wouldnotice that it is passed to a DataOutputStream.

11

Page 12: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Because of this, we also start tracking theDataOutputStream. Eventually a method callof a method of the sink we are tracking, theDataOutputStream is called, meaning we can markthis instruction as passing data to a sink. When thetracking from source occurs we will find that imei ispassed to the method we marked as passing data toa sink (DataOutputStream.write(String)), and aleak can be reported.

In the end, we only need one algorithm for bothtracking data forward from a source and trackingsinks forward. The main exception this algorithmmakes is that while tracking sinks, instructions maybe marked as passing data to a sink, which is notthe case when tracking from source.

4 Realisation

To determine if an app leaks, Anaconda [14] triesto find a path between a source and a sink. To beable to find said sources and sinks we first need toconvert the APK, and more specifically the DEXinside, into a format that we can use to analyse theapp. Instead of doing this ourselves from scratch,we used the Androguard tool. As explained in Sec-tion 8.1, Androguard is a collection of tools foranalysing APKs. Besides decompiling the DEX,it provides simple functionalities for looking up us-ages of fields and function calls. We use this tofind the locations of usage of certain functions orfields we are interested in. We then further anal-yse the instructions at these locations to track thedata through the code. The following four steps canroughly be distinguished in this tracking process inAnaconda:

1. Finding sinks

2. Finding sources

3. Finding a path from source to sink

4. Generate HTML report

We will take a more detailed look at each step be-low.

4.1 Finding sinks

The goal of this step is to find all the direct andindirect sinks created in the app and mark the in-structions where they are used. When tracking pri-vate data from sources to a sink in the third step,finding a path from a source to sink, we can usethis information to determine if the information isleaking or not. First we start looking up all thelocations where a sink is created. For this we use

Androguard. We created a list of sinks availableon the Android platform, containing 49 of the mostcommon sinks. In this list are, for example, thestandard Java socket but also the Near Field Com-munication (NFC) adapter. We pass each entry toAndroguard, which then tells us where the sink isused. The entire list can be viewed in appendix B.

After having found the used sinks, we trackthese sinks to find all leaking instructions as de-scribed in tracking sinks forward in Section 3.3.2.While tracking we take actions depending on theinstructions encountered as described in Table 1.After completing this step, each instruction involv-ing a function call on a sink has been marked.

It is important to note that in the case where afunction is called that is not defined in the APK,we only mark it as leaking if the call is made onthe sink object, not if it is used as a parameter.What this means is that we do not take cases wherea sink is passed as a parameter to a standard li-brary call in consideration. We could not find anycases where this would result in an actual leak, andlooking manually through cases found by Anacondaall cases appear to be false positives. As such, in-cluding these cases would only result in more falsepositives and we decided to exclude them. When,however, the code is available, we continue trackingthat parameter in the function itself.

4.2 Finding sources

Now that we know where all the sinks are, we needto check if any private information from sources isput into them. To find the sources, we again haveconstructed a list of sources we are interested in.This includes functions like getDeviceId(), whichreturns an unique device id, and getLastKnown-Location(). Calling a function, however, is not theonly way of accessing private information. Privateinformation could also be stored in fields of objects.An Account object, for instance, has a name field.No function call is needed to retrieve this informa-tion. So, besides function calls, this list also in-cludes fields that contain private information. An-other way to get private information is adding alistener. A LocationListener could be used to re-trieve the current location of the user. Each timethe location is updated this Listener is notified andpassed the new location. Some known listeners arealso included in this list. Combining these methods,fields and listeners results in a list of 115 knownsources, which is attached in appendix C.

Just like with the sinks, we feed this list intoAndroguard which tells us where the sources areused.

12

Page 13: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Table 1: Decision logic. X represents the register currently being tracked, where X iseither a sink or a source. Other registers are referenced by v. Constants are representedby c.

Instruction Action Descriptionmove X, v Stop Tracked register is overwrittenmove v, X Track(v) Information is copied into v, track v as wellreturn X Track(Method.usage) Continue tracking at every call of this methodconst X, c Stop Tracked register is overwritteninvoke {X, ...} Method Track(Method.return) First parameter is the instance, so no point

tracking in method. Track what is returnedinvoke1 {..., X, ...} Method Track(Method.parameter(X)) Continue tracking in the methodinvoke-virtual {..., X, ...} Method Track(Method.parameter(X)) Continue tracking in the method, and in this

method of subclasses as well.invoke2 {..., X, ...} Method Track(Method.return) Method is not defined in APK, track what is re-

turnedinvoke-static {..., X, ...} Method Track(Method.parameter(X)) Continue tracking in the methodinvoke-static2 {..., X, ...} Method Track(Method.return) Method is not defined in APK, track what is re-

turnedcheck-cast X Pass Tracked register is not modifiednew-instance X Stop Tracked register is overwritteniget X, v, Field Stop Tracked register is overwritteniget v, X, Field Track(v) Field of tracked object is accessediput X, v, Field Track(Field.usage) Continue tracking at every iget of this field of

any instance of this classiput v, X, Field Pass Field of the tracked object is changedsget X, Field Stop Tracked register is overwrittensput X, Field Track(Field.usage) Continue tracking at every sget of this fieldunary-operator X, v Track(v) Start tracking the resultunary-operator v, X Stop Tracked register is overwrittenbinary-operator X, v1, v2 Stop Tracked register is overwrittenbinary-operator v1, X, v2 Track(v1) Start tracking the resultbinary-operator v1, v2, X Track(v1) Start tracking the result

A 1 above an invoke instruction means, in case of a sink being tracked, this instruction will be marked as passing data toa sink. A 2 means the method is not defined in APK, and we can not continue tracking in the instructions of the method.

4.3 Finding a path from source tosink

When we have found a source, we need to deter-mine what happens with the private informationretrieved from it. As explained before, the DalvikVirtual Machine uses registers. By determiningwhat register the information is put into, we canlook for usages of the information by looking forusages of that register. Whenever the register isused, we first need to check if we previously markedthe instruction as passing data to a sink. If this isthe case, that means the private information is po-tentially being leaked! The instruction is markedas leaking and we continue with the next instruc-tion, as it might leak in more locations. If it is notmarked, we need to analyse this instruction to de-termine what happens with the information. Pos-sibly we need to start tracking other registers thatnow also contain the private information, or at leastparts or references to it. There are however a lotof different instructions, which all perform differentoperations on the register. Depending on the in-struction, a different action should be taken. Wedescribe these actions in Table 1.

4.3.1 Actions

There are three possible actions when an instruc-tion uses a tracked register:

• In the first case, we need to track an addi-tional register. This could be when, for exam-ple, a move instruction is performed. Thesekind of instructions move the contents of oneregister into another. Another example is afunction call which takes private informationas a parameter. It could return data which isbased on the private information, which thenneeds to be tracked. Tracking a new regis-ter does, however, not necessarily happen inthe same function we are currently tracking.Tracking a new register could very well hap-pen in a called function, with the currentlytracked register as a parameter.

• The second possibility is that we stop track-ing a register. Let us consider the move in-struction example again. What if the trackedregister is not the source register, but the tar-get register? That would mean the registerthat is being tracked is overwritten with new

13

Page 14: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

information. This reference or copy of the in-formation we were tracking is lost, so we nolonger track this register.

• The final possibility is the most trivial one.The register is used in an instruction, but thisdoes not result in the private information be-ing transferred to another register. An ex-ample of this is the check-cast instruction. Itchecks if the object referenced in the registercan be cast to a certain type. If this test failsan exception is thrown. The flow is possiblyinterrupted here, but the information in theregister is not transferred or modified in anyway.

These three actions can be represented by threedifferent functions. These are Track(x), Stop andPass respectively. We have used these three func-tions to present our decision logic in Table 1. Inthis table, X represents the register currently beingtracked. By v, v1, v2... we denote any other regis-ters used in the app. Method and Field representreferences to a specific method and field. Most ofthese actions are quite self-explanatory. For exam-ple, when the tracked register is overwritten, track-ing of it is stopped. When it is moved into an-other register (modified or not), the new register istracked as well.

There are many more instructions available inthe Dalvik Virtual Machine, but most are variationson the same instructions that we present in Table 1.In most cases the action taken does not change andas a result we omitted these instructions from thetable.

4.3.2 Difficulties

Although the mentioned cases could be handledwithout much complexity, there are more complexcases. Most important is the differentiation be-tween an invoke where the instructions are avail-able, and an invoke where the instructions are notavailable. It is possible for a function to be calledwhich is not defined within the APK we are cur-rently analysing. While included libraries are en-closed in DEX, standard Java or Android classesand methods are not. As a result, we have no accessto the instructions of that method and we can notdynamically find out what the function does. Theonly solution is specifying ourselves what a methodwill do with its parameters. We have done this forsome sinks and sources, but it is hard to do this forevery function available. In these cases, we have tomake an assumption. When a tracked register ispassed into a function we do not have the code of(and which is not marked as leaking data to a sink),

we assume the tracked register is contained in thereturned data in some form. While this clearly canresult in false positives, it appears to be the onlysound option in this case.

Another difficult case is instance fields. Instancefields are variables declared in each instance of acertain class. Each instance field is independentand can be different for each instance. Class fieldsare simpler. These are variables at the class level,and as such there is only one. When private infor-mation is put in a class field, we can look-up alllocations where this field is read again. At everylocation we then continue tracking the register itwas copied in. Instance fields, however, are harderto deal with. It is very hard to know which in-stance is used when an instance field is accessedwhile only tracking forward, as shown later in Sec-tion 5.1. To still be able to detect whether or not afield is used again, we look for all reads of the fieldinstead. This means we potentially start trackingfields from other instances of this class. As a result,false positives will be introduced.

Detecting where a function is used or a field isread is relatively simple. But we defined a thirdkind of source, listeners. Listeners are harder totrack down because they always are a subclass ofan interface or abstract class. Looking for objectspassed to AddListeners will often not be enougheither, as at compile time these can very well be ofthe interface type instead of the subclassed type.To detect these listeners, we therefore look for anysubclasses of the listeners. This is possible becauselisteners always need to be implemented by the de-veloper. This does not yet guarantee it is also addedas a listener, but when a listener is defined, it islikely to be used as well. We know which functionof that listener will be called, and which parametercontains the private information. From here on wecan treat it as a normal function call with a taintedparameter.

4.3.3 Optimisation

Because of the way tracking works in Anaconda,it is very likely that we start tracking paths thatalready have been tracked. Tracking paths thathave already been tracked is very detrimental forperformance, and should be avoided. There aretwo ways in which it is possible to start trackingpaths that already have been tracked: loops in thecode, and calling functions that were called before.The methods we have used to optimise these casesare very similar.

LoopsCode is bound to have loops in it, else the function-

14

Page 15: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

ality of an app would be severely limited. Loopscould appear in the form of a simple for or whileloop, but also in recursion or goto constructs. Toprevent Anaconda from going into an infinite loopwhile tracking, we need to check whether we arestarting to loop. We have solved this problem bykeeping track of every instruction we visited, andalso remembering the register we were trackingwhen we visited it. These combinations of instruc-tions and registers are remembered for as long as weare tracking the current function. Then, for eachinstruction we visit, we check if we have been therebefore with the register we are currently tracking.If we arrive at an instruction we have visited be-fore, we appear to be in a loop. When we havedetected this loop, we stop tracking in the currenttracked path, which is equivalent to the Stop actionin Table 1. Since the way we track is deterministic,when we encounter an instruction/register pair wehave encountered before, we can be sure that wewill, from there, follow an already followed path.We also do not have to be afraid that, when weencounter a loop, the register we are tracking con-tains something else than the first time, since weare still tracking the result of the same source.

Function callsWhenever a function is called that has been calledbefore, code that has already been tracked could betracked again. Tracking code that has been trackedbefore, because of function calls, can also causeAnaconda to go into an infinite loop. An infiniteloop, in this case, happens when, for example, func-tion A() calls function B(), which in turn calls func-tion A(). Not only can function calls cause infiniteloops, they can also cause very long tracked pathsto be tracked again, sometimes tens to hundreds oftimes, making some analyses take several minutesinstead of several seconds. To solve this problem,we also keep track of the instruction and registerwe look at when we first start to analyse a func-tion. These instruction/register pairs are stored ina global structure. Every time we start tracking ina function, we first check this global structure to seewhether we have started tracking in this functionbefore. If we have been in this function before, westop tracking the current path, which is, just likewith loops, equivalent to the Stop action in Table 1.

As opposed to the loops solution, it is possiblefor a register we are tracking to contain somethingdifferent than the first time we visited this instruc-tion/register pair. This could pose a problem when,for example, the code has a function leak() thatleaks data. The first time data is passed to thisfunction, e.g., the phone’s IMEI, a leak of the IMEIis reported. The second time data is passed to this

function, e.g. location data, no leak is reported,because we have been here before and thus executethe Stop action. Because we Stop the second time,we never report that location data is being leaked aswell. To fix this problem, we need to know whethertracking from a certain instruction/register pair re-sults in a leak. For this reason, this information isalso stored in the global structure. When we finda leak, we mark every instruction we inspected inthe path from the root to this leaking instructionas leaking. Now, when we encounter an instruc-tion/register pair we have already tracked, all thathas to be done is check if this instruction/registerpair leads to a leak. If this instruction/register pairleads to a leak, we can also report the private in-formation we are currently tracking to be leaking.

4.3.4 Track paths

Figure 3 shows an example of how different pathsof tracking form a tree. The root of the tree isformed by the first instruction analysed. Each col-umn represents a track path, and each row showsthe actions taken for a single instruction. A singlepath only looks at a single register. If that registeris overwritten, that path ends. Whenever the infor-mation is copied and/or modified, and then put ina different register, a new path is created in whichtracking continues. This could be compared withdepth first search. Each time a new register needsto be tracked, that register is tracked first. Whenthat path ends, either because it is overwritten orthe method ends, the algorithm continues on theprevious path.

4.4 HTML report

Figure 4: HTML report

To help us get insights into the results of Anaconda,we generate an HTML report from the results. Todo this, we build a tree structure like in Figure 3for each source and sink from our lists found by An-droguard, resulting in a list of trees. We keep track

15

Page 16: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Figure 3: Example track tree

of all the decisions made and all the instructionswhich have been marked as leaking and use pri-vate information, i.e., instructions that leak. Fromthis information we build a report in HTML. Anoverview of such a report is shown in Figure 4. Thereport consists of four sections. On the left side isa list of all the different track trees for all the sinksand sources. When they contain a leak, they aremarked as leaking. Selecting a tree allows you toview more detailed information about that tree onthe right side of the page. At the top the tree itselfis shown. This tree is, however, a simplified versionof the tree shown in Figure 3. Instead of show-ing each path which tracks only one register, whichwould result in a lot of branches, we represent asingle method as a node. Each node in the tree ismarked as leaking if either it or one (or more) of itschildren leak private information.

Below the tree is the code of the currently se-lected node. Androguard provides functionalityto generate Java code from DEX. As Java codeis much more compact it is easier to read, butmany names are lost and sometimes even obfus-cated when compiled into DEX. It is only an ap-proximation of the original code. Under the Javacode are the separate instructions extracted fromDEX. They are grouped into code blocks so that theprogam flow becomes visible. The GOTOs indicate

which block follows. Some of the instruction havecomments, indicating where and which instructionsare tracked, and which actions are taken. Instruc-tions that are found to be leaking are marked aswell.

While these paths can be quite lengthy, they dogive good insight into where the private informa-tion comes from and which sink or sinks it is leakedinto, including the entire path between them. Asthe instructions and actions are provided, it alsoallows for manual verification.

4.5 Algorithmic complexity

The time it takes for Anaconda to do its analy-sis heavily depends on the code of the app it isanalysing. If the app that is being analysed doesnot use a big number of sources or sinks not muchtracking needs to be done, as such the time neededfor analysis is short. The time needed to anal-yse can also greatly differ between apps, based onhow deep we need to track. Sometimes a trackis stopped early, because the data that is beingtracked is, for example, overwritten.

Even though analysis time depends on the appcode, it is possible to give an upper limit to theamount of instructions we need to look at to anal-yse an app. Thanks to the earlier mentioned op-timisations, an instruction will not be looked at

16

Page 17: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

more than twice, once for finding sinks, and once forfinding sources. The time it takes to handle an in-struction depends on the number of arguments theinstruction takes. For each argument an instruc-tion takes, a new track could possibly be started.The number of arguments is limited by the numberof arguments a function in this app can have. Assuch the upper complexity limit is O(i ∗ a), wherei equals the total number of instructions used, anda equals the number of arguments functions in thisapp can have. The lower complexity limit is O(1),this is the case where no sources or sinks can befound. Please note though that it is not known tous what the complexity is of the decompiling pro-cess of Androguard. Because of this the lower com-plexity limit, and even the upper complexity limit,could be higher.

5 Further research

Unfortunately time is limited. As a result we werenot able to implement everything we would haveliked to. Below we look into a number of problemsin our current program. We did research solutionsfor these problems, and formulated the results intoconcrete solutions. Unfortunately, we did not havetime to implement any of these solutions into oursystem. Still, we would like to share our proposalsto resolve these problems.

5.1 Tracking references

Tracking data forward, as is done with private in-formation and sinks, catches a lot of leakages thatappear in code. There is, unfortunately, a type ofleakage that we can not track by tracking forwardfrom a source. The problem that arises when usingmerely forward tracking from a source can be easilyshown with an example:

Example 6: Leaking through classfields

1 String imei = this.imeiMember;2 imei.append(manager.getDeviceId());3

4 // Leak something to the internet5 leak(this.imeiMember);

Because Java is based on references, theprevious described methods of tracking forwarddoes not detect the leak that occurs. Whenthis.imeiMember is assigned to the local stringimei, nothing is copied, instead imei now pointsto the same data this.imeiMember is pointing to.Because imei is a reference, when we append to it

we also append to this.imeiMember. When the ac-tual tracking occurs, we start with tracking the re-sult of getDeviceId(), which is appended to imei.Because of the append to imei we start trackingimei, however, imei is not used after the append,and no further tracking will be done. Eventuallythe code leaks this.imeiMember, which containsthe IMEI because it references the same data asimei did.

Basically, the problem occurs because we arenot sure who else possesses a reference to the datawe are tracking. In this example someone else,this.imeiMember, possessed that data. To solvethis problem we propose a number of solutions.

Tracking all class fieldsOne solution would be to track the field usages of allthe fields in the app. By tracking all the fields it ispossible to create lists of places where others referthe same data as the field we are tracking. Cre-ating these lists would be done by looking whereexactly a field is accessed, and after that trackingthis reference forward and noting where it is usedor copied.

In Example 6 we would first look wherethis.imeiMember is accessed. After finding theusage of this.imeiMember in our example, we canstart tracking this.imeiMember forward. Sincethis.imeiMember is copied to imei, we start track-ing imei. At all the occurrences of imei, we makea note stating that it refers to the same data asthis.imeiMember. Once we have done this, andonce we have determined that imei contains pri-vate information, we can also decide to track theusages of this.imeiMember, since we know thatit refers to the same data as imei. A potentialproblem with this solution is that it requires thatwe track all the field accesses in the app. Sincethe number of field accesses is generally quite high,this could potentially have a big negative impacton performance.

Tracking references backAnother solution is to track back when we en-counter the situation that private information isappended to a local variable. Appended in this con-text does not mean a literal append, as is the casein the example, it means the case where data isadded to a variable instead of overwriting it. Theidea here is to track back and see how this localvariable has been created. It could be the case thatthis local variable has been created by a new, inwhich case nothing special has to be done. The in-teresting case would be when the local variable iscreated by an assignment from some other variable.When this other variable is a class member, we can

17

Page 18: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

look for usages of this class member. If the othervariable is not a class member we have to keep ontracking both variables.

In the previous example we would see that pri-vate information is appended to a local variable,imei, and as such we track back to see what hap-pens to imei. While tracking back we will noticethat this.imeiMember is assigned to imei and de-cide to track usages of this.imeiMember forward,finding the leak. This approach will take extra per-formance, but since we do not have to track all thefields, as in the previous solution, this approach isa lot cheaper than the previous approach.

5.2 Conditional statements

A problem that can occur is that whether some-thing is actually leaked can depend on conditionalstatements. A common example of this behaviouris that APIs that provide ads for Android appssometimes only request certain private informationif the developer of the app has set a certain boolean.An example of this is Example 2 in Section 3.1.1.

In Example 2, we would start tracking imei be-cause the result of getDeviceId() is stored in it.Since imei is after that passed to leak(), a leak isreported. It is however clear to see that dependingon the value of someBooleanValue, the developermight not have had the intention to leak the IMEI,and leaking it might never actually happen.

Tracking conditional statusesTo see whether it is possible that the use of a con-ditional results in the leaking of data, we need toknow whether the value upon which the conditionalswitches can be a certain value. Let us stick withif statements for now. In the previous example,we want to know whether it is not the case thatsomeBooleanValue is always set to false. If wefind a place where someBooleanValue is set totrue, or if we cannot determine what the variableis set to, we will assume it is set to true.

Example 7: Leaking based on aclass field

1 function1:2 this.someBooleanValue = true;3 function2:4 this.someBooleanValue = false;5 function3:6 if(this.someBooleanValue)7 leak(privateInformation);

To figure out what values a variable can have,

we need to track back from the place we are usingthe variable, in this case the if statement. Whiletracking back we look for instructions that assignvalues to the variable we are tracking. When an as-signment to our variable is found, multiple thingscould happen: true is assigned, in which case we as-sume the variable is equal to true; false is assigned,in which case we assume the variable is equal tofalse, or something else is assigned. When some-thing other than true or false is assigned to ourvariable, we either track whatever is assigned, orwhen we cannot track it we assume the variable isequal to true. In the case that different values couldhave been assigned to the variable (see Example 7)we choose true. The reason we choose true in Ex-ample 7 is because it is very hard to determine whatthe value of this.someBooleanValue is at the mo-ment the leak occurs.

The problem with this approach is that trackingpotential values of one variable can quickly branchinto the tracking of multiple variables, which mightnot be booleans. Consider the following example:

Example 8: Leaking based on acomparison

1 int value1 = value2 + value3;2

3 if(value1 < value4)4 leak(privateInformation);

To determine, in the example, whether the leakcan occur, we need to determine whether value1

is smaller than value4. To determine this we al-ready need to track two variables. When we starttracking these two variables we will eventually seethe value1 = value2 + value3 expression. To beable to figure out whether our condition can be truewe now also have to track value2 and value3. Be-cause the amount of variables to track can be eas-ily expanded, tracking the values of variables canpotentially have quite an impact on performance.Another problem that occurs in this example isthat we are no longer tracking booleans but inte-gers or floats. Because we are now tracking integersand floats, we have to detect what possible valuesall the variables we track could have. Let us as-sume that value2 could be either eleven or three,and that value3 could be either one or seven, inthis case value1 could be twelve, four, eighteen orten. If more branching occurs, the amount of val-ues value1 could have, can increase exponentially,which will also have a negative impact on perfor-mance. In the end it might be a better solutionto assume that complex conditions, such as in theexample, can be true. With this solution, we as-

18

Page 19: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

sume that if code was meant to be possibly dead, aboolean would have been used to indicate this (suchas this.someBooleanValue in Example 7).

5.3 Usage of source to sink paths

Once we have found paths that lead from source tosink, we need to ask ourselves when we will actu-ally be leaking. Even though we have found a pathfrom source to sink, we are not sure yet whetherthe code in this path is actually reachable. An appcould, for example, have a function which acquiresthe phone’s IMEI, creates a socket, and passes theIMEI to the socket. However, if the mentionedfunction is never called, in which case it is “deadcode”, the IMEI is never leaked.

To be able to detect whether the leak actuallyoccurs, we will need to split the code in this pathin several parts and see if every part in this path isexecuted. To see how to choose these parts let uslook at a possible data path:

Example 9: Data path

1 function1:2 this.imei = manager.getDeviceId();3 function2:4 String localImei = this.imei;5 return localImei;6 function3:7 String localImei = function2();8 leak(localImei);

In this example, we will start tracking infunction1 since that is where getDeviceId() iscalled. Since the IMEI is stored in this.imei,we will have to look for usages of this.imei.this.imei is used in function2 where it isreturned, causing us to search for usages offunction2. function2 is used in function3 wherethe result of the function call is leaked. We cannotjust simply look whether function1 is called, be-cause just a call to function1 only implies thatthe IMEI is stored, not that it is leaked. We couldcheck whether every function in the path is called.This approach would be sufficient to show that thepath we have found can be executed. The problemwith the last approach is that it is a little too com-plex, we do not need to check if all the functionsare called, since calling function3 implies callingfunction2.

To reduce this problem and understand it, weneed to look at the control flow in this exam-ple. The problem with tracking data through anapp is that the path data takes is not necessar-ily the same as the execution path of the app. In

the given example, we can distinguish two differ-ent control flows. The first control flow is donein function1, because function1 is not called byfunction2 or function3 and it does not call thosefunctions either. Because function1 encompassesa control flow, we need to check if function1 is ac-tually called. The second control flow starts withfunction3. After the control flow is started, itcontinues in function2 because function3 callsfunction2. Because the second control flow startsin function3, we also need to check whetherfunction3 is called. To generalise the last controlflow, we could say that the function that starts thecontrol flow is the function that does not return thedata we are tracking, and the function that is notcalled by any of the other functions in the path.

Now that we know which functions need to becalled for a leak to happen, we can find out if theyare called. To find out if a function is called, wecan make a list of all the functions that call thisfunction. If the first function started in the appis in this list, we know our function is called. InAndroid the first function to be called is generallythe onCreate of an activity, although more func-tions exists that could start an Android app. Ifthe first function is not in the list, we repeat theprocess for the functions in the generated list. Thisis an example in pseudo-code of how this algorithmwould work:

Example 10: isFunctionCalled

algorithm

1 bool isFunctionCalled(function):2 var functionList = function.isCalledBy()3 if onCreate in functionList:4 return true5

6 for function in functionList:7 return isFunctionCalled(function)8

9 return false

6 Evaluation and results

We have tested Anaconda on a number of popu-lar apps on the Android Market. Our selection ofapps is comprised of popular apps that a large num-ber of Android users will have installed, and appsfor Dutch services that are also popular. We thenanalysed the results in further detail.

The different types of leaks are explained ingreater detail in Appendix A. Throughout thissection we use the simplified names for the leaksinstead of the names of the methods we actually

19

Page 20: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

track.

For some apps we made a distinction betweenintended usage and unexpected leaks. Some appssend out private information, but this is what theapp is supposed to do. That would be intendedusage. However, if an app sends out private infor-mation that it logically should not need in order tofunction, that would be an unexpected leak. Wecannot detect if a leak is intended or unexpected;both are marked as leaks by Anaconda.

In the following sections we discuss some of theinteresting leaks, behaviour of the app and falsepositives by Anaconda.

6.1 Analysed apps

6.1.1 Dropbox

Leaks:

Leak AmountAccount user data 1Query 5Wifi MAC address 1Network operator 1

We analysed the Dropbox app because it isa popular app to synchronise files between yoursmartphone and other devices. To use the app youneed to log in to your Dropbox account, so it isexpected to use account data like username andpassword.

The user’s network operator is leaked via anHTTP GET to a URL on Dropbox’s domain,https://api.dropbox.com/1/report host info?.The network operator name is sent to this address,along with other data such as the app version, theuser’s locale, the phone’s model name and Androidversion. This is very clearly a leak of the user’spersonal information. It seems that this code isused to gather statistics on the app’s users. Thisis an unexpected leak; gathering this data is notessential for the app to function. Because you needto log in to your Dropbox account to use the app,it might be possible that this information can betraced back to the specific user of the app ratherthan being anonymous.

The amount of obfuscation in the APK is quitehigh, which makes manual verification of the re-sults challenging. We can say for certain that theapp only uses the leaked data to create text stringsusing a StringBuilder, and only sends data to anaddress on their own domain, not to an external adserver.

6.1.2 Barcode Scanner (zxing)

Leaks: nothing.

Barcode Scanner is an app to scan barcodes andQR codes with the smartphone’s camera and is animplementation of the ZXing library. Other appscan also use this app to scan barcodes, instead ofhaving to write their own implementation.

We did not detect any leaks in this app. Thereare only a few sinks and sources found, and thereis not a path between any of them.

The app also uses the Wifi SSID and WEPKeys.One of the features of the app is to read Wifi set-tings from a QR code, so that you can automati-cally connect with a network without having to typethe password. We did not detect that this data isbeing leaked.

6.1.3 Humble Bundle Downloader

Leaks: nothing.

The Humble Bundle app allows you to easilydownload the games you have purchased throughthe Humble Bundle service. We did not detectany leaks in the Humble Bundle Downloader app.Again, only a few of the sources and sinks thatwe track were found, and none of the sources areleaked.

6.1.4 Tweakers

Leaks:

Leak AmountNetwork operator 1

Tweakers.net is a Dutch technology-relatedwebsite. The Tweakers app allows you to view thesite’s contents with an Android-native interface.

The Tweakers app is mostly clean; few of thesources and sinks that we track have been detected,in other words the app does not even use most ofyour private information, let alone leak it. How-ever, the app does leak your network operator. Thishappens in a method that appears to fill a hashmap with various types of information such as thesize of your screen, the Android version and lan-guage and a number, and other ad-related informa-tion retrieved from the com.google.ads (AdMob)package such as the session id and the position anad should appear.

In the end, a StringBuilder takes all the in-formation from this hashmap and concatenates itinto a single string. What is done with this result-ing string is unclear, but it will probably be used

20

Page 21: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

to retrieve ads targeted to the specific user’s deviceand preferences.

6.1.5 Sudoku Free

Leaks:

Leak AmountAccount name 1Query 1Last known location 6Wifi MAC address 2OS serial 1IMEI 5Phone number 1Network country iso 1Network operator 5Sim country iso 2

The Sudoku Free app has a staggering list of de-tected leaks. Closer inspection reveals that manyof these leaks come from the fact that Sudoku Freeuses multiple ad APIs at the same time. Amongthe list of sources we track, the APIs that leak thesesources were:

• Millennial Media

• AppLovin

• Mobclix

• MoPub

• Tapjoy

• Admob

Each of these APIs leaks some piece of privateinformation, with some overlap in what sourcesthey leak. Only one leak is caused by code fromthe app’s developers themselves. This was theIMEI. The developer seems to have his own pack-age to manage its ads and user information. Thispackage has a custom method com.genina.util.

version.VersionSpecificFunctionFacade.

getDeviceId() for getting the IMEI, and alsostores the IMEI in a field.

This method is eventually used, throughanother method getAndroidID() that returnsthe IMEI from this field, to create a stringPHONE MODEL that contains the Android modeland IMEI. This ID is finally leaked in a methodsubmitStackTraces(). Apart from this, the devel-opers do not leak any data themselves. It shouldalso be noted that the developers have a webpage ontheir policy for protecting private information [15];either they are unaware of the leaks or are breach-ing their own privacy guidelines.

Also interesting is to see which ad APIs acquiredtracked sources, but were not detected as leaking.These were:

• Greystripe

• Pontiflex

• Inneractive

• Kiip

Aside from these, it is possible the app mightuse more ad APIs that did not even use the sourcesthat we track.

6.1.6 Reddit Sync

Leaks:

Leak AmountQuery inputstream 1Query 2Network operator 1

Reddit is a social news site. The Reddit Syncapp allows you to browse the site with a moremobile-friendly interface.

The network operator is stored in a hash mapand then passed to the Admob API. A query in-putstream is used in decoding bitmap images, andbecause this was marked as a sink somewhere elsein the app, it is seen as leaking. Overall there isnot much interesting happening in the app.

6.1.7 Word Search

Leaks:

Leak AmountOn location changed 1Last known location 7IMEI 3Network operator 2

Word Search is an app to play, as the name sug-gests, a word search game. As with Sudoku Free,this app uses six different ad APIs. The cause ofthe leaks originates from these APIs. Three of themleak the user’s last known location, each in theirown way.

6.1.8 gReader

Leaks:

Leak AmountAccount name 2Query 28Network Operator 1

21

Page 22: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

The network operator is leaked via Google Ad-mob. Query is leaked many times, and called with-out leaking even more often. Interestingly, An-droidQuery, a library for manipulating UI elements,leaks Account Name. There does not seem to be areason for a UI library to request an account namefrom the device, so this is very odd.

6.1.9 Shazam

Leaks:

Leak AmountQuery 19On location changed 1Last known location 3OS fingerprint 2IMEI 5Network country iso 1Network operator 4Subscriber id 1

Shazam is an app that allows the user to iden-tify music by using the microphone. For this app, itmakes sense that it connects to a remote server todo the identification process, but for this it shouldnot require much private information. The appdoes use the last known location in order to sharewhere you have heard a song with your friends, sothat is intended usage. As before, Shazam uses anumber of ad APIs that leak the usual bits of pri-vate information such as the user’s location and theIMEI.

Of interest is the fact that Shazam leaks theIMEI and IMSI (device id and subscriber id), alongwith some other irrelevant information, into a cus-tom configuration class that had been marked asa sink. The app also uses two libraries for crashreporting, one of which, Crittercism, also leaks theIMEI.

Some library, which is unidentifiable due to ob-fuscation, leaks the network operator. It is notlikely to be code written by the app’s developers,because the rest of the app’s code is not obfuscated.

6.1.10 Buienradar

Leaks:

Leak AmountQuery 1On location changed 1Last known location 2OS serial 1

Buienradar is a Dutch app that displays theweather. The app can also show the weather inyour current area if you enable it. As such, it is

logical that the app can “leak” your current lo-cation, because it uses your location to check theweather in the area. This is intended usage. How-ever, the source “on location changed” is leakedby Adgoji, an ad API, which has methods likesetLongitude and setLatitude; it is obvious thatAdgoji is also requesting and leaking the user’s lo-cation aside from Buienradar itself.

An analytics API, comScore, is used in a verylong track that leaks the last known location, so itis possible that it is being leaked by that API too.

Also interesting is that the GSM CID and GSMLAC are also being used (though not leaked).These methods return the cell id, a unique num-ber for the mobile network cell the phone is locatedin (in other words which network tower the phoneis connected to), and the location area code. Thesenumbers are specific to the user’s network operator,and can be used to very roughly determine wherethe user is located. The cell radius for a networkcan range from hundreds of meters to multiple kilo-metres, so it is not going to be very precise. An-other problem with this method is that it requires adatabase of cell ids and area codes, which the net-work operators do not officially supply. There areunofficial databases [16], but those will not haveperfect accuracy. This is the only app in our testset that uses these methods.

6.1.11 Twitter

Leaks:

Leak AmountAccount name 1Account password 1Account user data 7Bluetooth address 1Query inputstream 3Query 4On location changed 1Wifi MAC address 1IMEI 1

The leaking of password and account name isprobably intended usage, because the app needs tosend your username and password to its authenti-cation servers. From the results it seems like theaccount name and password are indeed being usedin this way.

The app uses a crash reporter library calledCrashlytics, which leaks the Bluetooth address.However, to call this method the app requires theBLUETOOTH permission, which the app does not re-quest. The app cannot actually leak the Bluetoothaddress in this way because of this.

22

Page 23: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Twitter also has a feature “Tweet Location”,which adds geolocation data to your tweets. Thissetting is off by default, but Anaconda tracks itanyway regardless of this setting. The app leaks“on location changed”, and uses but does not leakthe last known location. It is unclear due to obfus-cation which of these two methods the app uses forgeolocation when you send a tweet. It seems morelikely to request the last known location upon send-ing a tweet, so it is more likely that that method isused. That raises the question what “on locationchanged” is being used for.

6.1.12 WhatsApp

Leaks:

Leak AmountAccount name 5Account type 6Query inputstream 7Query 38On location changed 2Wifi MAC address 1IMEI 2Network operator 3Sim operator 1

WhatsApp is one of the larger results, with 65instances of leaks, 38 of which come from leakingquery. This also resulted in the largest HTML pagebeing generated by Anaconda, at 421 MB in size.

The Wifi MAC address is leaked by the PayPalAPI, an online payment system, by appending itinto a string. PayPal also leaks the IMEI into astring.

6.1.13 iGroningen

Leaks:

Leak AmountAccount name 1Account password 1Account user data 1Bluetooth address 1Query inputstream 2Query 9Last known location 2Wifi MAC address 1IMEI 3Phone number 1Network country iso 1Network operator 4Sim country iso 1Sim operator 2

iGroningen, an app which gives University ofGroningen related information to students, leakseven more than Sudoku Free; 30 leaks in total. Itis quite difficult to analyse the results manually dueto the heavy obfuscation and the large size of theapp.

Particularly strange is the fact that the app usesand leaks the phone number. What this methodreturns is very device-dependant, but typically itreturns the main phone number (phone number forline 1) on the device. This depends on if the phonenumber is stored on the SIM card, and might faildepending on device and provider. Nevertheless,the app seems to use the phone number, along withthe device name, carrier, OS version and other in-formation in some login-related code.

The leak results for the account user data isa good example of a false positive result given byAnaconda. The leak in this case is found in amethod that makes the following call:

1 p7.equals(this.b.getUserData(v0, ”school id”))== 0

Here, p7 is a string passed to this method andv0 is the data being leaked. In this case, the userdata school id is retrieved and compared with thegiven string. String comparison obviously does notleak data out of the app. The reason this methodleaks is because p7 has been marked as a sink some-where in the app. It is difficult to find where andwhy exactly this was, but in the end it is clear thatthe method does not actually leak.

6.1.14 NS Reisplanner Xtra

Leaks:

Leak AmountAccount name 1On location changed 2Last known location 1

The NS is the “Nederlandse Spoorwegen”, themain railway operator in the Netherlands. TheirAndroid app allows the user to plan their journeyand check the departure times of trains per station.

The NS app also uses the AndroidQuery librarythat gReader uses. In this app, it leaks the accountname again, just like before.

This app requests the ACCESS FINE LOCATION

permission. For a travel planning app this makessense: for example, a user might want to find theearliest leaving train at the train station closest tohis current location. For the most part it seems likethe app is keeping track of the user’s location in or-der to use it elsewhere in the app. However, “onlocation changed” is overridden by AndroidQuery,

23

Page 24: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

which uses it to print the user’s location to a logevery time the location changes. This is not as badas sending the location to a server over the inter-net, but once the location is in the log anythingcould happen with it, for example, writing it to afile or sending it over the internet later once theapp closes.

6.2 Resulting HTML

Anaconda outputs its results to an organisedHTML file. Figure 5 shows what this webpage lookslike.

On the left side of the page is a bar withthe list of sinks and the list of sources. Inthis bar, it can be seen that the network op-erator, query inputstream and query are be-ing leaked by the app. Currently the treetrack for Landroid/content/ContentResolver;-

>openInputStream is selected.

At the top right part of the screen the treefor this track is displayed. Here there arefour nodes in this track tree. Currently thecode for the node b()Vstart-register:v1 is se-lected. This can be seen from the method nameabove the Java source code. The comments forthe top level node a(Landroid/net/Uri;)Vstart-register:v? indicates that query inputstream isbeing leaked into the field M of the class Lp/ag;.The three nodes below the top level node in thetree use this field.

The currently selected node can leak the datafrom the field M, which is indicated by the text“Leaks:” appearing before the method name inthe tree. The calls that use the field M can beseen in the Java source code for this method andin the annotated bytecode instructions. Here, theinstruction for this.V.setImageBitmap(this.M);can leak query inputstream through the field M.This is indicated by the comment “Data is put insink!” in the bytecode instructions.

6.3 Discussion of results

Figure 6 shows the amount of times each type ofleak was detected in the apps that we analysed.In general, many apps use and leak query. Thismethod can be used to retrieve data such as yourcontacts and call log, but it could be used in a harm-less way as well. We cannot track exactly what thismethod is being used for in an app; we discuss thisfurther in Section 7.5.

The table in Appendix D shows the number oftimes a source was leaked and the number of timesa source was requested for each of the apps. The

total number of leaks over the total number of ac-cess can be seen as an indicator of how many timesthe apps in general leak private information. Anytime a source is requested, it could potentially beleaked too.

Other popular leaks are user’s account names.This can be explained by the fact that many ofthese apps allow the user to log in to the service;it makes sense that an account name stored on thedevice is being sent to, for example, an authentica-tion server, which is intended usage.

We also analysed what kind of calls are made bythe various ad APIs used by these apps, and howmany times these calls result in leaks. The resultscan be found in Table 2. The most common leaksby these APIs are Last Known Location, Query,Network Operator and IMEI. Developers often usethe IMEI as a unique identifier for the user. Forexample, some apps add it to the list of parameterswhen making a request to a URL.

Anaconda does not track if the methods it de-tects as leaking are actually called by the app. Anapp could have defined a method that requests andleaks the IMEI, but if the app never calls thatmethod your data is not being leaked, even thoughAnaconda would detect that the app leaks.

6.4 Analysis time

In this subsection we will discuss the time efficiencyof Anaconda. The measured time consists of onlythe time to analyse, not the time to decompile theapp and the time to generate the HTML file.

Figure 7 plots the time to analyse the app vs.the size of the app (the size of the classes.dex filein the APK). The y-axis of this plot is in a logarith-mic scale. It looks like the analysis time increaseslinearly with the app size, but there is too muchvariance in the data to get a proper conclusion fromthis data.

Figure 8 is a similar graph that plots the anal-ysis time vs. the number of sources and sinks usedby the app. Here it can be clearly seen that analysistime increases linearly with the number of sourcesand sinks, as is to be expected. There is an inter-esting difference between gReader and WhatsApp:gReader has many more sinks and sources, but ashorter analysis time. An explanation for this couldbe that gReader uses sources and sinks many times,but the actual leak tracks are short resulting in ashort analysis time.

24

Page 25: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Figure 5: Full-sized screenshot of the HTML report

25

Page 26: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Table 2: Amount of times ad-APIs acquire and leak private information per app. a/bmeans: a out of b acquired pieces of information were leaked.

qu

ery

netw

ork

op

era

tor

on

locati

on

chan

ged

locati

on

Wifi

MA

Cad

dre

ss

IME

I

ph

on

enu

mb

er

netw

ork

cou

ntr

yIS

O

SIM

cou

ntr

yIS

O

accou

nt

nam

e

OS

fin

gerp

rint

OS

seri

al

nu

mb

er

SIM

op

era

tor

total

tweakersadmob 0/2 1/1 1/3

sudoku freeadmob 0/2 1/1 1/3mobclix 0/1 0/1 0/3 2/4 1/1 0/2 3/12inneractive 0/1 0/1 0/1 0/3millenialmedia 1/1 1/1 1/1 3/3mopud 0/1 2/2 2/3applovin 1/1 1/1 1/1 1/1 2/2 1/2 1/1 8/9tapjoy 3/6 1/1 1/1 1/1 6/9greystripe 0/1 0/1 0/1 0/3portiflex 0/1 0/1kiip 0/1 0/1 0/2

word searchjumptap 1/1 1/1 1/1 3/3adfonic 0/2 0/2 0/1 0/5inmobi 0/2 2/2 2/4mobfox 4/4 2/2 6/6millenialmedia 1/1 1/1admob 0/2 1/1 1/3

greaderadmob 0/2 1/1 1/3

shazamflurry 1/1 1/1 1/1 3/3admarvel 1/5 0/1 0/1 1/7sessionm 2/4 2/2 2/2 1/1 7/9admob 0/2 1/1 1/3

buienradaradgoji 0/1 1/1 0/1 0/1 1/1 2/5

reddit syncadmob 1/2 1/2

total* 3/23 12/22 3/8 16/24 2/2 10/13 1/3 2/2 2/2 1/2 0/2 2/2 0/1 53/105

26

Page 27: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Figure 6: Histogram of the number of leaks in our test set

0 20 40 60 80 100 120

Account name

Account type

Account password

Account userdata

Bluetooth address

Query inputstream

Query

On location changed

Last known location

Wifi mac address

OS fingerprint

OS serial

IMEI

Phone number

Network country iso

Network operator

Sim country iso

Sim operator

Subscriber id

11

6

2

9

2

13

107

8

21

6

2

2

19

2

3

22

3

3

1

Number of times leaked

Figure 7: Analysis time vs. app size

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

x 106

100

101

102

103

1

2 3

4

5

6 7

8

9

10

11

12

13

14

APK size in KB

Analy

ze tim

e in s

econds (

log s

cale

)

1: Dropbox

2: zxing

3: Humble Bundle Downloader

4: Tweakers

5: Sudoku Free

6: Reddit Sync

7: Word Search

8: gReader

9: Shazam

10: Buienradar

11: Twitter

12: WhatsApp

13: iGroningen

14: NS

7 Future work

There are still many possibilities to improve ouranalyser, some of which have already been de-scribed in Section 5. Static code analysis is a largefield, with a great number of possibilities for en-hancements. As a result, more improvements canalways be made to find more leaks, reduce the num-ber of false positives or optimise the analysis to

make it faster. Some ideas we came up with arediscussed here and could be investigated further inthe future.

7.1 Inheritance

While we do provide support for inheritance in thecase of function calls while tracking, we are cur-rently not looking for any subclasses of the sources,

27

Page 28: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Figure 8: Analysis time vs. number of sources and sinks

0 50 100 150 200 25010

0

101

102

103

1

23

4

5

6 7

8

9

10

11

12

13

14

Number of sources and sinks

Analy

se tim

e in s

econds (

log s

cale

)

1: Dropbox

2: zxing

3: Humble Bundle Downloader

4: Tweakers

5: Sudoku Free

6: Reddit Sync

7: Word Search

8: gReader

9: Shazam

10: Buienradar

11: Twitter

12: WhatsApp

13: iGroningen

14: NS

sinks, fields, or listeners we search for. Androguardunfortunately does not provide functionality to dothis. It should, however, be relatively simple to ex-tend our current implementation which finds sub-classes to make it usable for these other cases. Itwould allow our system to detect cases where, forexample, a Socket is subclassed, after which infor-mation is send by calling that class’ send meth-ods. These methods then call the Socket class’ sendmethods, leaking the information.

7.2 Permissions

In the results, we have seen many apps using ads.As we have shown, many of the ad APIs providefunctions which need to be called before data isleaked, which set a boolean for example. As weare currently unable to detect whether or not thesefunctions are called, and thus assume they are, falsepositives are introduced. A way to reduce the num-ber of false positives is to check whether or not therequired permission is requested by the app. If, forexample, the ad API is able to leak your location,but only does so when the developer calls a functionto set a boolean to true, a permissions is requiredto request this information. If the permission ismissing, it is impossible for the app to be leakingit. This could be used to filter the leaks dependingon whether the required permission is requested ornot.

7.3 Reflection

Anaconda has no support for reflection. The sim-plest solution would be to assume all functionscalled through reflection which take private infor-

mation as a parameter will leak it or return it ifthey do not. There are however many legitimateusages for reflection in Android, the most impor-tant one being backward compatibility. A bettersolution might be to try and resolve the reflection.While this is a very difficult problem in static anal-ysis, it is theoretically possible to solve when thestrings used are available in the code. These stringscould be used to look up the real function that iscalled. The strings could, however, be obfuscated,as shown in Example 1, or even send to the de-vice over the internet. In these cases it would beimpossible to determine the function being called,but it also almost guarantees something maliciousis being done with the information.

7.4 Intents

As explained in Section 2, intents are messages usedto communicate between processes, either withinthe same app or with another app installed on thephone. Information send with intents is hard tokeep track of. At compile time the receiver maynot even be known, for example, in the case ofa ”ACTION SEND” intent a receiver is selectedby the user. While it is very uncommon to sendprivate information this way, it could be a goodsolution to leak private information for maliciousapp developers trying to avoid detection. Whenthe receiver is explicitly specified and is part of theapp being analysed, this could be tracked in theory.The receiving activity or service can be determined,leading to the function called when the intent is re-ceived. When the function is known, tracking couldcontinue there. This would lead to less false pos-itives than marking any intent containing private

28

Page 29: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

information as leaking.

7.5 Content providers

Another feature of Android that we could analysefurther to reduce the amount of false positives isthe content provider. A content provider is a in-terface for data, where the data can be in a num-ber of forms like files or a database. It is oftenused to share data among apps. Information canbe retrieved by performing a query on the contentprovider using a content resolver.

Android provides numerous content providers,among which are content providers containing yourcontacts, your call history and your calendar.Clearly they hold private information. To recogniseif this information is accessed, we look for querieson content providers. We are, however, unable todetect which content provider is referenced. Thisleads to false positives. To reduce the number offalse positives, we could attempt to detect whichcontent provider is accessed. As they are referencesby a constant value, this should in most cases bepossible to detect. It does however require back-tracking to find out which content provider is ac-cessed. We could then only track the queries on thecontent providers containing private information.

8 Related work

8.1 Androguard

Androguard [6] is a combination of a set of tools foranalysing Android apps, written in Python. One ofthe things among what Androguard does is decom-piling Android package files (APKs). It can disas-semble dex files with a number of external tools,or use its own decompiler called DAD (DAD is adecompiler), which is included in Androguard andis also written in Python. Androguard parses thesedex files and forms readable instructions. These areput in a structure of classes, methods and blocks.It then provides an API, which can then be used tobuild your own analyser.

Androguard does not do much analysis by itself.It does however provide you with functionality todirectly search the APK. This allows you for exam-ple to look up usages of methods or fields. Othertools like Androwarn [9] and Andrubis [10] use thisto detect usages of known sources of private infor-mation, like telephone identifiers or the location ofyour phone. They then assume the informationis used in a malicious way. This is because An-droguard does not track information through theapp. While using private information does suggest

that the app could leak the information, it doesnot guarantee this information will actually leavethe phone, or the code in which it happens is evencalled. Especially in ad-APIs it is not uncommonto have functions that could leak private informa-tion, but only do so when called in the code of thedeveloper.

Anaconda goes a step further. Besides justchecking if and where a certain method is called,we then look for a path from this source of infor-mation to a location where it leaves the phone, likea socket. When such a path exists, we know this in-formation can be leaked. This process is explainedin more detail in Realisation (Section 4).

8.2 TaintDroid

While Anaconda is a static analysis tool, it is alsopossible to analyse an Android app dynamically.This dynamic analysis has been done with a toolcalled TaintDroid [7]. TaintDroid is an extension toAndroid which modifies Android’s virtual machineso it can track tainted data as it flows through theapp. The tracking of data is accomplished by giv-ing every piece of data a taint-tag, stating whetheror not the data is tainted with private informa-tion. The nice thing about this dynamic approachis that it is very precise and it can find leakages thatare otherwise hard to detect, for example, when aleaking method is called through reflection or whentainted data is passed from one app to another app.The disadvantage of such a system is that it takes atoll on CPU and RAM usage at app runtime, in thecase of TaintDroid: about 14% CPU overhead andabout 4.4% RAM overhead. Another disadvantageis that such a system can only track the data-flowof an app and not the control-flow, whereas staticanalysis is capable of also tracking the control-flow.

8.3 SAAF

SAAF [11] (Static Android Analysis Framework)is a static analysis tool that uses program slicingto cut away irrelevant parts of the APK, and thenanalyses the remaining code for patterns, to detectif an app is leaking private information. Like Ana-conda, SAAF also analyses smali code. SAAF looksfor certain method usages and uses data flow anal-ysis on the parameters of the methods. By track-ing parameters back SAAF can determine whetherthese are constant values, which is useful to de-termine risk factors of certain function calls. Forexample, a function could be executing a harmlesscall to “ls”, or it might be trying to gain superuserrights with a call to “sudo”. SAAF is thanks to thisalso capable of detecting if, for example, an app is

29

Page 30: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

trying to send text messages to a hard-coded SMSnumber, and SAAF can also give you this number.

Although SAAF does some data-flow analysis,it does not try to determine whether actual leakingof private information occurs. Because of this theactual results produced by SAAF are more compa-rable to the results of tools like Androwarn [9], thanto the results of Anaconda.

8.4 Stowaway

Stowaway [17] analyses the permissions an appuses and checks whether or not the app is over-privileged. Requesting more permissions than nec-essary may increase the impact of a bug or a vulner-ability in the app. To find out which permissionsare really needed it statically analyses the app andsearches for API calls. It then compares these APIcalls with the required permissions. This howeveris not as trivial as it sounds. Android documen-tation regarding permissions is limited. Only 78methods for Android 2.2 are documented to needpermissions, while Felt et al. found 1259 API callsto generate a permission check. They constructeda map from this data resulting in a more completelist of required permissions.

9 Conclusion

Many apps have access to private information onyour smartphone, but it is often unclear what ex-actly is done with it. Even when there is a clearlegitimate use for it, there is no guarantee it is notleaked to a third party. With Anaconda we haveshown that many apps leak private information toremote servers. While in some cases this is to beexpected due to the functionality of the app requir-ing the data, in many cases this is clearly not thecase. We found that 7 of the 14 apps we tested tocontain ads, of which each and every single one wasrequesting and leaking private information. For 243of the 572 requests of private information found byAnaconda, a path was found from the source to asink, indicating the information leaves the system.

With the results given by Anaconda, we haveshown that static code analysis is a viable wayto detect private-information leaks in apps. Ana-conda already detects numerous potential leaks,and suggested future improvements will most prob-ably only increase the amount of detected leaks.All in all, Anaconda has proven itself to be a usefultool for analysing Android app behaviour and fordetecting leaks in apps.

References

[1] Gartner. Gartner Says Asia/Pacific LedWorldwide Mobile Phone Sales to Growth inFirst Quarter of 2013. http://www.gartner.

com/newsroom/id/2482816.

[2] WhatsApp is broken, really broken.http://fileperms.org/whatsapp-is-

broken-really-broken/, September 2012.

[3] Yajin Zhou, Xinwen Zhang, Xuxian Jiang,and Vincent W. Freeh. Taming Information-Stealing Smartphone Applications (on An-droid). In TRUST’11 Proceedings of the 4thinternational conference on Trust and trust-worthy computing, pages 93–107, 2011.

[4] Frank Maker and Yu-Hsuan Chan. A Surveyon Android vs. Linux. Technical report, Uni-versity of California, 2009.

[5] Gabor Paller. A list of Dalvik opcodes.http://pallergabor.uw.hu/androidblog/

dalvik_opcodes.html.

[6] Anthony Desnos. Reverse engineering, Mal-ware and goodware analysis of Android ap-plications. https://code.google.com/p/

androguard/.

[7] William Enck, Peter Gilbert, Byung-GonChun, Landon P. Cox, Jaeyeon Jung, PatrickMcDaniel, Anmol N. Sheth, and Leslie Lam-port. TaintDroid: An Information-Flow Track-ing System for Realtime Privacy Monitoringon Smartphones. In 9th USENIX Symposiumon Operating Systems Design and Implemen-tation, 2010.

[8] CERT. Static source code analysistools. https://www.cert.org/secure-

coding/tools.html.

[9] Thomas Debize. Androwarn - Yet anotherstatic code analyzer for malicious Androidapplications. https://github.com/maaaaz/

androwarn.

[10] Andrubis: A Tool for Analyzing UnknownAndroid Applications. http://blog.

iseclab.org/2012/06/04/andrubis-a-

tool-for-analyzing-unknown-android-

applications-2/.

[11] Johannes Hoffmann, Martin Ussath, MichaelSpreitzenbarth, and Thorsten Holz. SlicingDroids: Program Slicing for Smali Code. InSAC’13, March 2013.

30

Page 31: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

[12] Kostyantyn Vorobyov and Padmanabhan Kr-ishnan. Comparing Model Checking and StaticProgram Analysis: A Case Study in Error De-tection Approaches. In SSV ’10, 2010.

[13] Andreas Moser, Christopher Kruegel, and En-gin Kirda. Limits of Static Analysis for Mal-ware Detection. In 23rd Annual Computer Se-curity Applications Conference, 2007.

[14] J. Veldthuis, S. Groenewold, and K. L. Winter.Anaconda. https://github.com/KPWhiver/

Anaconda, 2013.

[15] Genina. Privacy Policy for genina.com.http://www.genina.com/pages/privacy/

privacy.jsf.

[16] OpenCellId. http://opencellid.org/.

[17] Adrienne Porter Felt, Erika Chin, SteveHanna, Dawn Song, and David Wagner. An-droid Permissions Demystified. In CCS ’11Proceedings of the 18th ACM conference onComputer and communications security, pages627–638, 2011.

31

Page 32: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Appendix

A Leak type glossary

This is a list of terms used to describe the different kinds of data that was acquired by apps. Please notethat this list only includes data that was actually found by Anaconda as being acquired, other possiblekinds of data exists, but these were never acquired.

Query: A way for an app to get information from a local databases stored on the device. These databasescontain, amongst other things: stored SMS and MMS messages, contacts, bookmarks, and the call log.

Query inputstream: Allows an app to get the same data as with a query, only through an InputStream.

Wifi MAC address: The MAC (media access control) address is a unique number associated with theWifi adaptor of a phone.

Wifi SSID: The SSID (service set identifier) is the name of a Wifi network that the phone can connectto. This information can be accessed by requesting a list of configured Wifi networks, and looking forthis value in that list.

Wifi WEP keys: WEP (wireless equivalent privacy) keys that are used to access configured Wifinetworks on the phone. Just like the SSID, this information can be accessed from a list of configurednetworks.

Bluetooth address: The MAC address of the phone’s Bluetooth adaptor.

OS serial: A hardware serial number.

OS fingerprint: A string that uniquely identifies this build of the OS.

Account user data: Data associated with a user account on the phone. Google has not clearly de-fined what this data should encompass. The user account of this request, and all other account relatedrequests, could be an account made by this application, but it could also be, for example, a Facebook ora Google account.

Account password: The password associated with a user account on the phone.

Account type: The type associated with a user account on the phone.

Account name: The name associated with a user account on the phone.

On accounts updated: A listener that is passed accounts as soon as they are updated, e.g., when anew account is added.

On location changed: A listener that is passed the location of the phone whenever a new location isfound, e.g., through GPS or GSM triangulating.

Last known location: The last known location of the phone.

IMEI: The IMEI (international mobile station equipment identity) is a unique number associated withthe phone, as such it can be used to identify the phone.

Phone number: The phone number associated with the phone.

32

Page 33: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

Network operator: The company that provides you access to the current mobile network, for example,T-Mobile and AT&T.

Network country ISO: The country of which the phone is currently using the network.

SIM country ISO: The country in which the SIM’s (subscriber identity module) provider resides.

SIM operator: The company that provided the phone’s SIM.

GSM CID: The GSM CID (cell ID) is a unique number referring to the base transceiver station, whichfacilitates communicating between the phone and a network.

GSM LAC: The GSM LAC (location area code) corresponds to a set of base transceiver stations. Thebase transceiver station identified by the GSM cid is part of this set.

On call state changed: A listener that is passed information when: the phone is being called, thephone is off the hook, or the phone is idle. If possible, an incoming phone number is also passed.

On cell location changed: A listener that is passed information about the new GSM cid and GSMlac whenever it changes.

Subscriber id: This is equivalent to the IMSI (international mobile subscriber identity). The IMSI isa number associated with the network the phone is allowed access to.

B List of recognised sinks

1 # These functions return sinks2 java.net.DatagramSocketImplFactory −> createDatagramSocketImpl3 java.net.http.AndroidHttpClient −> newInstance4 android.net.http.HttpResponseCache −> install5 android.net.http.HttpResponseCache −> getInstalled6

7 javax.net.ssl.SSLSocket −> getOutputStream8

9 java.net.URLStreamHandler −> openConnection10 java.net.URL −> openConnection11

12 java.nio.channels.ServerSocketChannel −> accept13 java.net.ServerSocket −> accept14 java.nio.channels.DatagramChannel −> socket15 java.nio.channels.DatagramChannel −> connect16 java.nio.channels.DatagramChannel −> disconnect17 java.nio.channels.DatagramChannel −> open18 java.nio.channels.SocketChannel −> open19

20 java.sql.DriverManager −> getConnection21

22 org.apache.http.conn.ClientConnectionRequest −> getConnection23 org.apache.http.impl.conn.SingleClientConnManager −> getConnection24 org.apache.http.impl.conn.AbstractClientConnAdapter −> getWrappedConnection25

26 java.net.Socket −> getOutputStream27 java.net.SocketImpl −> getOutputStream28 android.bluetooth.BluetoothSocket −> getOutputStream29

30 android.nfc.NfcAdapter −> getDefaultAdapter

33

Page 34: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

31

32 # These classes are sinks33 java.net.DatagramSocket34 java.net.DatagramSocketImpl35 java.net.MulticastSocket36

37 java.io.FileOutputStream (indirect sink)38 java.io.FileWriter (indirect sink)39

40 java.net.HttpURLConnection41 java.net.URLConnection42

43 android.net.http.AndroidHttpClient44

45 java.nio.channels.DatagramChannel46 java.nio.channels.SocketChannel47

48 javax.net.ssl.HttpsURLConnection49

50 org.apache.http.impl.AbstractHttpClientConnection51 org.apache.http.impl.AbstractHttpServerConnection52 org.apache.http.impl.DefaultHttpClientConnection53 org.apache.http.impl.DefaultHttpServerConnection54 org.apache.http.impl.SocketHttpClientConnection55 org.apache.http.impl.SocketHttpServerConnection56

57 org.apache.http.impl.client.DefaultHttpClient58 org.apache.http.impl.client.AbstractHttpClient59 org.apache.http.impl.client.DefaultRequestDirector60 org.apache.http.impl.client.BasicResponseHandler61

62 org.apache.http.impl.conn.AbstractClientConnAdapter63 org.apache.http.impl.conn.AbstractPooledConnAdapter64 org.apache.http.impl.conn.tsccm.BasicPooledConnAdapter65 org.apache.http.impl.conn.SingleClientConnManager.ConnAdapter66 org.apache.http.impl.conn.DefaultClientConnection67 org.apache.http.impl.conn.DefaultClientConnectionOperator

C List of recognised sources

1 # These methods are sources2 android.accounts.AccountManager −> getPassword3 android.accounts.AccountManager −> getUserData4

5 android.accounts.AccountManagerService −> getPassword6 android.accounts.AccountManagerService −> getUserData7

8 android.accounts.IAccountManager$Stub$Proxy −> getPassword9 android.accounts.IAccountManager$Stub$Proxy −> getUserData

10

11 android.accounts.Account −> toString12

13 android.bluetooth.BluetoothAdapter −> getAddress14 android.bluetooth.BluetoothAdapter −> getName15 android.bluetooth.BluetoothDevice −> getName16 android.bluetooth.BluetoothDevice −> getAddress17 android.bluetooth.BluetoothDevice −> toString18

34

Page 35: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

19 android.bluetooth.IBluetooth$Stub$Proxy −> getAddress20 android.bluetooth.IBluetooth$Stub$Proxy −> getName21 android.bluetooth.IBluetooth$Stub$Proxy −> getRemoteName22

23 android.server.BluetoothService −> getAddress24 android.server.BluetoothService −> getAddressFromObjectPath25 android.server.BluetoothService −> getName26 android.server.BluetoothService −> getRemoteName27

28 android.location.LocationManager −> getLastKnownLocation29 android.location.ILocationManager$Stub$Proxy −> getLastKnownLocation30

31 android.telephony.gsm.GsmCellLocation −> getCid32 android.telephony.gsm.GsmCellLocation −> getLac33 android.telephony.gsm.GsmCellLocation −> getPsc34 android.telephony.gsm.GsmCellLocation −> toString35

36 android.telephony.cdma.CdmaCellLocation −> getSystemId37 android.telephony.cdma.CdmaCellLocation −> getNetworkId38 android.telephony.cdma.CdmaCellLocation −> getBaseStationLongitude39 android.telephony.cdma.CdmaCellLocation −> getBaseStationLatitude40 android.telephony.cdma.CdmaCellLocation −> getBaseStationId41 android.telephony.cdma.CdmaCellLocation −> toString42

43 android.telephony.NeighboringCellInfo −> getCid44 android.telephony.NeighboringCellInfo −> getLac45 android.telephony.NeighboringCellInfo −> getPsc46 android.telephony.NeighboringCellInfo −> getRssi47 android.telephony.NeighboringCellInfo −> toString48

49 com.android.internal.telephony.ITelephony$Stub$Proxy −> getCellLocation50

51 android.net.wifi.WifiConfiguration −> toString52

53 android.net.wifi.ScanResult −> toString54

55 android.net.wifi.WifiInfo −> getBSSID56 android.net.wifi.WifiInfo −> getIpAddress57 android.net.wifi.WifiInfo −> getMacAddress58 android.net.wifi.WifiInfo −> getSSID59 android.net.wifi.WifiInfo −> toString60

61 android.provider.Browser −> getAllBookmarks62 android.provider.Browser −> getAllVisitedUrls63 android.provider.Browser −> getVisitedHistory64 android.provider.Browser −> getVisitedLike65

66 android.provider.Calendar$CalendarAlerts −> query67 android.provider.Calendar$Calendars −> query68 android.provider.Calendar$EventDays −> query69 android.provider.Calendar$Events −> query70 android.provider.Calendar$Instances −> query71

72 android.provider.CalendarContract.Attendees −> query73 android.provider.CalendarContract.EventDays −> query74 android.provider.CalendarContract.Reminders −> query75

76 android.provider.CallLog$Calls −> getLastOutgoingCall77 android.provider.Contacts$People −> loadContactPhoto78 android.provider.Contacts$People −> openContactPhotoInputStream79 android.provider.Contacts$People −> queryGroups

35

Page 36: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

80 android.provider.ContactsContract$Contacts −> openContactPhotoInputStream81

82 com.android.internal.telephony.CallerInfo −> getCallerId83

84 android.provider.Telephony$Mms −> query85 android.provider.Telephony$Sms −> query86

87 android.telephony.TelephonyManager −> getDeviceId88 android.telephony.TelephonyManager −> getLine1Number89 android.telephony.TelephonyManager −> getSimSerialNumber90 android.telephony.TelephonyManager −> getSubscriberId91 android.telephony.TelephonyManager −> getNetworkCountryIso92 android.telephony.TelephonyManager −> getNetworkOperator93 android.telephony.TelephonyManager −> getNetworkOperatorName94 android.telephony.TelephonyManager −> getSimCountryIso95 android.telephony.TelephonyManager −> getSimOperator96 android.telephony.TelephonyManager −> getSimOperatorName97 android.telephony.TelephonyManager −> getSimSerialNumber98

99 com.android.internal.telephony.IPhoneSubInfo$Stub$Proxy −> getDeviceId100 com.android.internal.telephony.IPhoneSubInfo$Stub$Proxy −> getDeviceSvn101 com.android.internal.telephony.IPhoneSubInfo$Stub$Proxy −> getIccSerialNumber102 com.android.internal.telephony.IPhoneSubInfo$Stub$Proxy −> getLine1Number103 com.android.internal.telephony.ISms$Stub$Proxy −> getAllMessagesFromIccEf104

105 android.content.ContentResolver −> openTypedAssetFileDescriptor106 android.content.ContentResolver −> openInputStream107 android.content.ContentResolver −> openFileDescriptor108 android.content.ContentResolver −> query109

110 android.app.ContextImpl$ApplicationContentResolver −> openInputStream111 android.app.ContextImpl$ApplicationContentResolver −> openTypedAssetFileDescriptor112 android.app.ContextImpl$ApplicationContentResolver −> openFileDescriptor113 android.app.ContextImpl$ApplicationContentResolver −> query114

115 # These fields are sources116 android.accounts.Account −> name117 android.accounts.Account −> type118

119 android.app.ActivityManager.RecentTaskInfo −> description120 android.app.ActivityManager.RecentTaskInfo −> origActivity121

122 android.app.ActivityManager.RunningAppProcessInfo −> processName123 android.app.ActivityManager.RunningAppProcessInfo −> importanceReasonComponent124

125 android.app.ActivityManager.RunningServiceInfo −> process126 android.app.ActivityManager.RunningServiceInfo −> service127

128 android.app.ActivityManager.RunningTaskInfo −> baseActivity129 android.app.ActivityManager.RunningTaskInfo −> description130 android.app.ActivityManager.RunningTaskInfo −> topActivity131

132 android.net.wifi.WifiConfiguration −> SSID133 android.net.wifi.WifiConfiguration −> BSSID134 android.net.wifi.WifiConfiguration −> preSharedKey135 android.net.wifi.WifiConfiguration −> wepKeys136

137 android.net.wifi.ScanResult −> BSSID138 android.net.wifi.ScanResult −> SSID139

140 android.os.Build −> FINGERPRINT

36

Page 37: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

141 android.os.Build −> SERIAL142

143 # These listeners are sources, the first listed function is the function that adds the listener,144 # the ”listener:” functions are the actual listeners145 android.accounts.AccountManager −> addOnAccountsUpdatedListener(Landroid.accounts.

OnAccountsUpdateListener; Landroid.os.Handler; Z)146 listener: android.accounts.OnAccountsUpdateListener −> onAccountsUpdated147

148 android.telephony.TelephonyManager −> listen(Landroid.telephony.PhoneStateListener; I)149 listener: android.telephony.PhoneStateListener −> onCallStateChanged150 listener: android.telephony.PhoneStateListener −> onCellInfoChanged151 listener: android.telephony.PhoneStateListener −> onCellLocationChanged152

153 android.location.LocationManager −> requestLocationUpdates(J F Landroid.location.Criteria; android.location.LocationListener; Landroid.os.Looper;)

154 listener: android.location.LocationListener −> onLocationChanged155

156 android.location.LocationManager −> requestLocationUpdates(Ljava.lang.String; J F Landroid.location.LocationListener;)

157 listener: android.location.LocationListener −> onLocationChanged158

159 android.location.LocationManager −> requestLocationUpdates(Ljava.lang.String; J F Landroid.location.LocationListener; Landroid.os.Looper;)

160 listener: android.location.LocationListener −> onLocationChanged161

162 android.location.ILocationManager$Stub$Proxy −> requestLocationUpdates(Ljava.lang.String; J F Landroid.location.LocationListener;)

163 listener: android.location.LocationListener −> onLocationChanged164

165 android.location.LocationManager −> requestLocationUpdates(Ljava.lang.String; J F Landroid.location.LocationListener; Landroid.os.Looper;)

166 listener: android.location.LocationListener −> onLocationChanged167

168 android.provider.Browser −> requestAllIcons(Landroid.content.ContentResolver; Ljava.lang.String; Landroid.webkit.WebIconDatabase.IconListener;)

169 listener: android.webkit.WebIconDatabase$IconListener −> onReceivedIcon170 listener: android.webkit.WebIconDatabase$IconListener −> onReceivedIcon

37

Page 38: Anaconda: Detecting private-information leaks in Android ... · Figure 1: Android Architecture such as smartphones and tablets. In 2005, Android Inc. was bought by Google, who continued

D Leak occurence tables

Table 3: Amount of times apps acquire and leak private information. a/b means: a outof b acquired pieces of information were leaked.

accou

nt

use

rd

ata

accou

nt

pass

word

accou

nt

nam

e

accou

nt

typ

e

on

accou

nts

up

date

d

qu

ery

qu

ery

inp

uts

tream

Wifi

MA

Cad

dre

ss

Wifi

SS

ID

Wifi

WE

Pkeys

OS

seri

al

OS

fin

gerp

rint

sub

scri

ber

id

Dropbox 1/2 0/2 0/1 5/7 1/3 0/1zxing 0/2 0/1 0/5 0/1Humble Bundle 0/1Tweakers 0/2Sudoku Free 1/2 1/6 2/2 1/1 0/2Reddit Sync 2/3 1/2Word Search 0/5gReader 2/6 0/1 28/177 0/2Shazam 0/1 19/27 2/3 1/1Buienradar 1/2 0/2 1/3 0/1Twitter 7/7 1/1 1/22 0/4 4/8 3/3 1/1WhatsApp 5/11 6/11 38/48 7/10 1/2 0/2iGroningen 1/1 1/1 1/1 9/9 2/2 1/1NS 1/4 0/1 0/1 0/1total 9/10 2/2 12/53 6/13 0/5 107/298 13/23 6/9 0/5 0/1 2/5 2/6 1/3

Blu

eto

oth

ad

dre

ss

on

locati

on

chan

ged

last

kn

ow

nlo

cati

on

GS

MC

ID

GS

ML

AC

IME

I

on

call

state

chan

ged

on

cell

locati

on

chan

ged

ph

on

enu

mb

er

netw

ork

cou

ntr

yIS

O

netw

ork

op

era

tor

sim

cou

ntr

yIS

O

SIM

op

era

tor

total

Dropbox 1/2 8/18zxing 0/9Humble Bundle 0/1Tweakers 0/1 0/1 1/1 1/5Sudoku Free 0/4 6/10 5/7 1/3 1/1 5/10 2/2 25/50Reddit Sync 0/1 1/1 4/7Word Search 1/1 7/9 3/3 2/4 0/1 13/23gReader 1/1 31/187Shazam 1/2 3/5 5/6 1/1 4/6 37/53Buienradar 1/1 2/3 0/1 0/1 0/1 0/2 5/17Twitter 1/1 1/1 0/1 1/1 20/50WhatsApp 2/2 2/4 0/2 0/1 0/1 0/4 3/5 0/1 1/2 65/108iGroningen 1/1 2/2 3/3 1/1 1/1 4/4 1/1 2/2 30/30NS 2/2 1/5 4/14total 2/2 8/14 21/36 0/1 0/1 19/26 0/2 0/1 2/5 3/7 22/36 3/5 3/4 243/572

38