Security in mobile banking apps

Security in mobile banking applications

Alexandre Teyar

Tutor: Prof. Abdelmalek Benzekri

Supervisors:

Prof. Christian Rohner

Prof. Thiemo Voigt

A thesis presented for the degree of

Master Engineering

Department of Information Technology

Uppsala University

Sweden

September 18, 2015



Acknowledgment

At the very outset of this thesis report, I would like to express my special gratitude to Mr. Christian Rohner and Mr. Thiemo Voigt, who gave me the great opportunity to come to Sweden to pursue my master's thesis within the Department of Information Technology at Uppsala University. I would also like to thank them for their guidance and supervision throughout the thesis period and for their support in completing the project.

I express my profound and sincere gratitude to Mr. Abdelmalek Benzekri for his support and understanding of the issues that I encountered during my stay in Uppsala.

I extend my acknowledgements to Mrs. Jackie Leroux and Mrs. Anna-Lena Forsberg for their involvement in my administrative procedures, and for the support that Anna-Lena showed in my constant quest to find accommodation.

I also acknowledge, with a deep sense of reverence, my parents and the members of my family, who have always supported me.

Last but not least, my thanks and appreciation go to all the staff of the Department of Information Technology, who made me feel comfortable among them during my stay at Uppsala University.


Abstract

It has become a habit for more and more users to manage their bank accounts using dedicated banking applications on their smart-phones. Consequently, the necessary precautions must be taken to protect sensitive data, including banking credentials, from cyber-criminal attacks and to guarantee secure and safe communications.

The purpose of this thesis is to examine and evaluate the wireless security of mobile application communications with a focus on banking applications.

We have seen in a large static code analysis that there are widespread security flaws regarding the wireless communications of mobile applications. We have also seen that the tool that performs this static code analysis generates a significant number of false positives. In this thesis, we try to reduce that number. We address this issue with a new method of analysis that combines a dynamic code analysis and a manual analysis of log files. To that end, we trace the applications' runtime method calls (dynamic code analysis) and then follow a procedure to identify, based on those traces, the critical functions involved in the (in)security of the applications' wireless communications (manual log file analysis).



Contents

1 Introduction
2 Background and Related Work
    2.1 Secure connection
        2.1.1 Certificates
        2.1.2 SSL
        2.1.3 Man in the middle attack
    2.2 SSL implementation in the Android’s platform[4]
        2.2.1 Pinning
        2.2.2 Blacklisting
    2.3 Android Platform[7][8]
        2.3.1 Linux Kernel
        2.3.2 Libraries and Android Runtime
        2.3.3 Application Frameworks
        2.3.4 Applications
    2.4 Related Work
3 Static code analysis on the Google Play Store
    3.1 Analysis context
        3.1.1 Challenges
    3.2 Static code analysis[12]
    3.3 Outcomes
    3.4 Manual analysis of insecure banking applications
4 Identifying the critical code section in the certificate validation process
    4.1 Reverse engineering[13]
    4.2 Generation of the applications traces
    4.3 SSL certificate validation critical code section identification
5 Conclusion
    5.1 My feedback
A Appendix Graphs
B Appendix Tables

List of Figures

2.1 The structure of a certificate
2.2 HTTP vs HTTPS
2.3 SSL handshake
2.4 Man in the middle attack in an SSL context
2.5 Browser certificate warning message
2.6 Interfaces of the javax.net.ssl library
2.7 Android stack
2.8 Android Sandboxing
2.9 Compilation
2.10 Android Application Framework Bundles
2.11 Android Activity Lifecycle
2.12 Eclipse Project Explorer
2.13 The structure of an APK
3.1 Overview of the hermes script
3.2 Types of verifiers observed
3.3 Applications containing a bad verifier grouped by the year of publication or last updated, and percentage of such applications in that group
3.4 Applications containing a bad verifier grouped by download, and percentage of such applications in that group
3.5 Applications containing a bad verifier grouped by rating, and percentage of such applications in that group
3.6 (In)Security cases
4.1 Decompiling process of an Android app
4.2 Disassembly process of an Android app
4.3 Overview of the smali-code-injector script
4.4 A method call entry
4.5 Application trace
4.6 Intra-method calls concept
4.7 Indented trace
4.8 The manual analysis procedure
4.9 Dynamic code analysis tool chain
A.1 Usage of the Internet permission
A.2 Distribution of verifier types over categories

List of Tables

B.1 Distribution of applications with internet permission over categories
B.2 Distribution of applications over verifiers types
B.3 Applications with bad verifier grouped by the year they were published or last updated, and share of such applications in that group
B.4 Applications with bad verifier grouped by their download number, and share of such applications in that group
B.5 Applications with bad verifier grouped by their rating, and share of such applications in that group

Chapter 1

Introduction

Nowadays, smart-phones can be viewed as “pocket computers” or “digital wallets”. We carry them with us almost every day, and their processing power and memory capacity have increased to the point that they are now able to run any kind of application.

They also contain personal and sensitive data including, but not limited to, the user's real-time location, passwords and banking credentials. Moreover, smart-phones are designed to be nomadic devices that move from one network to another, including untrusted networks where their data are exposed to wireless attacks, making them an easy and privileged target for cyber criminals.

The purpose of this thesis is to examine and evaluate the wireless security of mobile application communications with a focus on banking applications, which are critical financial applications. This thesis uses the Android platform, which is currently the world's most used smart-phone operating system, with an 84.6 percent share of global smart-phone shipments in 2014 according to research by Strategy Analytics, and over 1,400,000 available applications as of November 2014 according to AppBrain Stats.

A key to the success of Android smart-phones resides in the Google Play Store and Android software development, which are relatively open and unrestricted (see Chapter 3). Anyone can publish an application on the Google Play Store without any strong verification from Google. This offers both developers and users more flexibility and freedom, but also creates significant security challenges.

The Android ecosystem is all about communicating: a large number of mobile applications use web services or are client-server based. This is motivated by the developers' will to extend their applications' functionalities and to provide custom services for their clients.

Applications can communicate using the HTTP protocol, which provides no security at all and makes it easy to intercept data. They can also use the HTTPS protocol, which is basically HTTP over SSL. In theory, HTTPS makes it much harder, if not impossible, to intercept data.

However, in 2012, Fahl et al. performed an analysis of 13,500 applications in the Android Market (known today as the Google Play Store) and found that 1,074 applications use HTTPS incorrectly, with configurations that accept either any certificate or any certificate signed by a trusted Certificate Authority, whatever hostname it was issued for.

It is in this context that we decided to write this thesis on “Security in mobile banking applications”. In this thesis, the security aspect of mobile applications is limited to the wireless communications of Android devices. In other words, we study the security threats posed by the misuse of SSL in Android wireless communications, with a focus on banking applications.

In Chapter 2, Background and Related Work, we provide an overview of the technologies used in this thesis as well as the related work on Android SSL security.

In Chapter 3, Static code analysis on the Google Play Store, we perform a static code analysis on 4,795 different applications from the Google Play Store to detect whether or not they implement broken certificate verifiers that may make them vulnerable to wireless attacks. We also manually analyse some banking applications that were labelled as vulnerable by this static code analysis. The manual analysis consists of man in the middle attacks carried out in real conditions to check the wireless security of the banking applications. It showed that the static code analysis is unreliable: the selected banking applications are in fact secure against wireless attacks even though they contain incorrect certificate verifiers.

We noticed a high false-positive rate in the static code analysis. To improve the analysis, we carried out an additional code analysis. In Chapter 4, Identifying the critical code section in the certificate validation process, we aim to identify the critical code section in the applications' certificate validation process using dynamic code analysis. We also attempt to automate the identification of this critical code section in order to provide a new method of evaluating the wireless security of Android applications.

Finally, in Chapter 5, Conclusion, we summarise the contributions and the improvements that can be made to complete the dynamic code analysis method that we started to develop.


Chapter 2

Background and Related Work

This background chapter reviews the main concepts related to the establishment of secure connections and their different implementations on the Android platform. We also provide a quick introduction to the Android platform stack.

2.1 Secure connection

A secure connection protects the communication between the communicating parties from being eavesdropped on or tampered with through a man in the middle attack. A secure connection relies on the SSL protocol¹, which uses a certificate system to guarantee authentication, data integrity, and data confidentiality.

2.1.1 Certificates

Authentication is a central concern in the SSL protocol and is ensured by the use of digital certificates. “Authentication is the process of determining whether someone or something is, in fact, who or what it is declared to be.” [1]

“A digital certificate certifies the ownership of a public key by the named subject of the certificate” [2]. It also contains a number of mandatory or optional fields and attributes related to the subject's identity (see Figure 2.1). In short, a digital certificate can be viewed as a sort of “identity card”. This allows others (relying parties) to rely upon signatures or assertions made by the private key that corresponds to the certified public key.

¹ https://tools.ietf.org/html/rfc6101


Figure 2.1: The structure of a certificate

Source: https://i-technet.sec.s-msft.com/dynimg/IC196464.gif

The X.509 certificate model² is the most commonly used certificate model; nonetheless, the OpenPGP certificate model is also supported in SSL.

An X.509 certificate can either be self-signed or signed by a Certificate Authority (see the Certificate Authorities paragraph below). If the other party's certificate is self-signed, the verifier needs to have this self-signed certificate hard-coded into its trust pool. In all other cases, the verifier needs to trust the CA that signed the other party's certificate, or a CA that signed the intermediate CA (chain of trust).

Certificate Authorities

“In cryptography, a Certificate Authority or Certification Authority (CA) is an entity that issues digital certificates” [2]. A CA is a third party trusted both by the subject (owner) of the certificate and by the party relying upon the certificate. Client software such as browsers usually includes a set of trusted CA certificates.

2.1.2 SSL

This section reviews the Secure Sockets Layer (SSL) protocol and how it is used in the HTTPS protocol.

HTTP is a protocol³ for sending requests and receiving answers, each request and answer consisting of detailed headers and (possibly) some content. HTTP traffic is unencrypted: all the data are sent in plain text, which allows an attacker to intercept and modify them without being detected by any of the participants in the communication. The HTTP protocol is therefore vulnerable to eavesdropping and tampering.

To deal with these issues, the HTTPS protocol⁴ was created in 1994 by Netscape Communications and was published as an RFC in May 2000. HTTP is meant to run over a bidirectional tunnel for arbitrary binary data; when that tunnel is an SSL/Transport Layer Security (TLS) connection assuring the security of the communication between two hosts, the whole thing is called “HTTPS”.

² https://tools.ietf.org/html/rfc5280
³ https://tools.ietf.org/html/rfc7230
⁴ https://tools.ietf.org/html/rfc2818

Figure 2.2: HTTP vs HTTPS

Source: https://www.instantssl.com/images/http-vs-https.png

The terms SSL and TLS⁵ are often used interchangeably or in conjunction with each other (SSL/TLS), as TLS is the standardized successor of SSL.

SSL handshake[3]

Before the client and the server can begin to exchange application data over SSL, the encrypted tunnel must be negotiated: the client and the server must agree on the version of the SSL protocol, choose the cipher suite, and verify certificates if necessary. The agreement upon these parameters can be compared to a handshake, hence the name SSL handshake (see Figure 2.3).

Figure 2.3: SSL handshake

Source: https://www.identrustssl.com/images/learn ssl diagram.gif

⁵ https://www.ietf.org/rfc/rfc5246.txt



• The client sends a ClientHello specifying information such as the highest supported TLS version, a random number, and a list of suggested cipher suites and compression methods.

• The server answers with a ServerHello containing the chosen protocol version, a random number, and the cipher suite and compression method selected from the choices offered by the client. The server may also send its X.509 Certificate⁶ message and a ServerKeyExchange message (depending on the selected cipher suite, this may be omitted by the server). When the server is done with the handshake negotiation, it sends a ServerHelloDone.

• The client responds with a ClientKeyExchange message, which may contain a PreMasterSecret, a public key, or nothing (again, this depends on the selected cipher suite). The PreMasterSecret is encrypted using the public key of the server certificate.

• After this negotiation phase, both the server and the client compute a common secret key called the MasterSecret using the random numbers and the PreMasterSecret.

During the second phase, the client sends a ChangeCipherSpec message, which basically means: “Everything I tell you from now on will be authenticated (and encrypted if encryption parameters were present in the server certificate)”.

• Finally, the client sends an authenticated and encrypted Finished message containing a hash and MAC over the previous handshake messages.

• The server then follows the same procedure.

This is an example of one-way SSL, in which only the server authenticates itself with a certificate. There is also two-way SSL, in which the client also has to authenticate itself with a certificate. One-way SSL is the most used SSL handshake method.

At this point, the “handshake” is complete and both parties can start to send application data securely (encrypted and authenticated). However, if an error occurs during the SSL handshake, for instance when the client fails to verify the server's certificate, the SSL handshake is aborted.
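The message order of the one-way handshake described above can be summarised in a small data model. This is only an illustrative sketch: the types `Sender` and `HandshakeMessage` are hypothetical names introduced here, not part of javax.net.ssl, and the optional messages are merely flagged rather than negotiated.

```java
import java.util.ArrayList;
import java.util.List;

/** Which side emits a handshake message (hypothetical helper type). */
enum Sender { CLIENT, SERVER }

/** Messages of a one-way SSL handshake, declared in the order given in the text. */
enum HandshakeMessage {
    CLIENT_HELLO(Sender.CLIENT, false),
    SERVER_HELLO(Sender.SERVER, false),
    SERVER_CERTIFICATE(Sender.SERVER, true),     // "may also send"
    SERVER_KEY_EXCHANGE(Sender.SERVER, true),    // depends on the cipher suite
    SERVER_HELLO_DONE(Sender.SERVER, false),
    CLIENT_KEY_EXCHANGE(Sender.CLIENT, false),
    CLIENT_CHANGE_CIPHER_SPEC(Sender.CLIENT, false),
    CLIENT_FINISHED(Sender.CLIENT, false),
    SERVER_CHANGE_CIPHER_SPEC(Sender.SERVER, false),
    SERVER_FINISHED(Sender.SERVER, false);

    final Sender sender;
    final boolean optional;

    HandshakeMessage(Sender sender, boolean optional) {
        this.sender = sender;
        this.optional = optional;
    }

    /** Returns, in handshake order, the messages emitted by one side. */
    static List<HandshakeMessage> sentBy(Sender side) {
        List<HandshakeMessage> result = new ArrayList<>();
        for (HandshakeMessage message : values()) {
            if (message.sender == side) {
                result.add(message);
            }
        }
        return result;
    }
}
```

Calling `HandshakeMessage.sentBy(Sender.SERVER)` lists the server's part of the exchange; in two-way SSL the client would additionally send its own certificate, which this sketch leaves out.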

2.1.3 Man in the middle attack

A man in the middle attack (MITMA) is an attack in which the attacker is in a position to intercept the messages sent between two parties who believe they are communicating directly with each other (see Figure 2.4). In a passive MITMA, the attacker can only eavesdrop (spy) on the communication; in an active MITMA, the attacker can also tamper with (alter) the communication.

⁶ https://tools.ietf.org/html/rfc5280



Figure 2.4: Man in the middle attack in an SSL context

Source: http://www.consumer.ftc.gov/sites/default/files/blog/spoofed-security-certificate.png

MITMAs against mobile devices are somewhat easier to execute than against traditional desktop computers, since mobile devices are frequently used in changing and untrusted networks.

Specifically, open access points and evil twin attacks make MITMAs against mobile devices a serious threat, not to mention other tricks such as DNS or ARP poisoning that can be used to obtain a MITM position within a given network.

In addition, most of the effective defences against MITMAs can only be deployed on the router or server side.

A failure in certificate validation may indicate that someone is eavesdropping on or tampering with the communication (see Figure 2.5). In this case, the peer receives the attacker's certificate instead of the other peer's certificate (see Figure 2.4). However, a successful certificate validation does not always imply that the communication is safe.

Figure 2.5: Browser certificate warning message

Source: http://img15.hostingpics.net/pics/509896certwarning.png

2.2 SSL implementation in the Android’s platform[4]

An application might use SSL incorrectly, such that malicious entities may be able to intercept the application's data over the network (see Section 2.1.3). This is a consequence of the possibility of overriding the native implementation of the SSL interfaces in the javax.net.ssl library, which is extensively used by Android (see Figure 2.6).



Figure 2.6: Interfaces of the javax.net.ssl library

Source: http://oi57.tinypic.com/2s775z6.jpg

The two main Java entities responsible for certificate validation on Android clients are:

• TrustManagers: “TrustManagers are responsible for managing the trust material that is used when making trust decisions, and for deciding whether credentials presented by a peer should be accepted.”[5]

• HostnameVerifier: “Verifies that the specified hostname is allowed within the specified SSL session.”[6]
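To make the second entity more concrete, the following sketch shows the kind of comparison a hostname verifier performs between the hostname the client asked for and a name found in the server certificate. It is a deliberately simplified illustration under stated assumptions: the class `HostnameMatchSketch` and its method are hypothetical, only a single leftmost wildcard label is supported, and IP addresses and the extraction of names from the certificate are not handled. It is not the platform's actual verifier.

```java
import java.util.Locale;

/** Simplified hostname comparison (hypothetical helper, not the platform verifier). */
final class HostnameMatchSketch {

    /**
     * Returns true if the requested hostname is covered by the certificate name.
     * A name such as "*.example.com" covers exactly one additional label.
     */
    static boolean matches(String hostname, String certificateName) {
        if (hostname == null || certificateName == null) {
            return false;
        }
        // DNS names are case-insensitive.
        String host = hostname.toLowerCase(Locale.ROOT);
        String name = certificateName.toLowerCase(Locale.ROOT);

        if (!name.startsWith("*.")) {
            return host.equals(name);            // exact match required
        }
        String suffix = name.substring(1);       // e.g. ".example.com"
        if (!host.endsWith(suffix)) {
            return false;
        }
        // The wildcard must stand for one non-empty label without dots.
        String label = host.substring(0, host.length() - suffix.length());
        return !label.isEmpty() && !label.contains(".");
    }
}
```

A custom `HostnameVerifier` whose `verify` method returns `true` unconditionally skips this comparison entirely, which is one of the insecure patterns discussed in this section.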

The native implementation is by nature not vulnerable to MITMAs. However, Android developers must use a custom implementation in order to use self-signed, untrusted or unknown certificates. The main reason why developers use self-signed certificates is that being certified by a trusted CA costs money, and most Android developers do not want to spend money when they are not even sure about the profitability of their applications. The safe way to use a self-signed certificate is to add the CA used to sign it to the application's trusted CAs. This method, known as certificate pinning (see Section 2.2.1), requires implementing custom verifiers. A custom verifier therefore does not necessarily mean an insecure application; the actual problem is that there are still developers who are not aware of those methods and choose the easy way of accepting any certificate without any verification.

Some implementations of the SSL interfaces are very dangerous for applications running on untrusted networks, because their lack of security opens the door to MITMAs. The implementations that offer no security are the following:

• The use of trust managers that do not check certificate chains from remote servers, making it possible for a MITMA to succeed.

Verifying certificates to ensure that they are signed by a known and trusted Certificate Authority (CA) is an integral part of certificate-based client-server communication.

• The replacement of platform hostname verifiers by application hostname verifiers that do not verify the hostname of the remote server.

Having a trust manager that checks certificates is not sufficient in this case, as the attacker may have a certificate signed by a trusted Certificate Authority and may present a valid certificate chain. Therefore, to prevent a MITMA, the hostname of the server extracted from the CA-issued certificate must match the hostname of the server that the application intends to connect to.

• Applications ignoring SSL errors when they use WebKit to render server pages in mobile applications.

With server communications that use SSL/TLS, any errors generated should be caught and handled rather than ignored. Otherwise, the vulnerable applications are opened up to MITM attacks that may exploit vulnerabilities such as JavaScript Binding Over HTTP (JBOH).
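The first insecure pattern in the list above can be illustrated with a minimal sketch. The class names `TrustAllManager` and `TrustManagerSketch` are hypothetical; the javax.net.ssl and java.security types are the standard ones. The trust-all manager is shown purely as an anti-pattern to recognise, and the empty-chain probe is an assumption chosen so that the example runs without a network or a real certificate.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** ANTI-PATTERN, for illustration only: silently accepts every certificate chain. */
final class TrustAllManager implements X509TrustManager {
    @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
    @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
    @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
}

final class TrustManagerSketch {

    /** Returns the platform default trust manager, backed by the system CA store. */
    static X509TrustManager platformDefault() {
        try {
            TrustManagerFactory factory =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            factory.init((KeyStore) null);       // null selects the default trust store
            for (TrustManager manager : factory.getTrustManagers()) {
                if (manager instanceof X509TrustManager) {
                    return (X509TrustManager) manager;
                }
            }
            throw new IllegalStateException("no X509TrustManager available");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Probe: does the manager accept a server that presents no certificate at all? */
    static boolean acceptsEmptyChain(X509TrustManager manager) {
        try {
            manager.checkServerTrusted(new X509Certificate[0], "RSA");
            return true;
        } catch (CertificateException | RuntimeException e) {
            return false;                        // the chain was rejected
        }
    }
}
```

The probe returns `true` for `TrustAllManager`, since its check methods never throw, whereas the default manager is expected to reject the empty chain. An application that installs a manager of the first kind accepts an attacker's certificate just as readily.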

2.2.1 Pinning

“An application can further protect itself from fraudulently issued certificates by a technique known as pinning”[4]. This technique basically restricts “an application's trusted CAs to a small set known to be used by the application's servers. This prevents the compromise of one of the other 100+ CAs in the system from resulting in a breach of the application's secure channel”[4].
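The idea of restricting trust to a small known set can be sketched as follows. Note the assumption: instead of building a dedicated set of trusted CAs, this variant pins by digest, comparing the SHA-256 hash of an encoded certificate against a list shipped with the application. The class `PinSketch` and its method names are hypothetical and the sketch is not the method used in this thesis.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Set;

/** Digest-based pin check (hypothetical helper, simplified for illustration). */
final class PinSketch {

    /** Base64-encoded SHA-256 digest of the given bytes. */
    static String sha256Base64(byte[] encoded) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(encoded);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** True only if the certificate's digest is one of the pins shipped with the app. */
    static boolean isPinned(byte[] encodedCertificate, Set<String> pins) {
        return pins.contains(sha256Base64(encodedCertificate));
    }
}
```

In an application, the bytes would typically come from `X509Certificate.getEncoded()` for a certificate in the chain presented by the server, and the check would run in addition to, not instead of, the normal chain validation.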

2.2.2 Blacklisting

“SSL relies heavily on CAs to issue certificates to only the properly verified owners of servers and domains. In rare cases, CAs are either tricked or, in the case of Comodo or DigiNotar, breached, resulting in the certificates for a hostname to be issued to someone other than the owner of the server or domain.

In order to mitigate this risk, Android has the ability to blacklist certain certificates or even whole CAs. While this list was historically built into the operating system, starting in Android 4.2 this list can be remotely updated to deal with future compromises”[4].

2.3 Android Platform[7][8]

Android is a mobile operating system (OS) currently developed by Google. The Android OS consists of several layers, each with its own purpose (see Figure 2.7). In this section, we give a high-level overview of the Android operating system stack. The focus of this work is on layer 4 (Applications) and layer 2 (Libraries and Android Runtime).

Figure 2.7: Android stack

Source: http://img15.hostingpics.net/pics/301709androidstack.jpg



2.3.1 Linux Kernel

Android is built on top of the Linux kernel. This choice was made because Linux is a mature open-source operating system that provides a pre-built, already maintained kernel, which includes many readily available hardware abstraction components. Linux is also regarded as a highly secure system: all Android applications run as separate Linux processes with permissions set by the Linux system (see the Sandboxing paragraph below).

This gives Android developers a complete, sane basis to start with and the possibility to modify the Linux kernel to fit their needs (limited processing power, memory capacity and battery life, for instance).

Sandboxing

Running each application as a separate process with different permissions is known as sandboxing. The sandbox isolates an application's data and code execution from other applications. One application cannot access data in another application's sandbox without explicit permission.

Figure 2.8: Android Sandboxing

Source: http://www.pix-host.com/allimages/49100858.jpg

2.3.2 Libraries and Android Runtime

Native Libraries

The native libraries are C/C++ libraries, often taken from the open source community, that provide necessary services to the Android application layer. Some of these libraries are:

• WebKit: A fast web-rendering engine used by Safari, Chrome, and other browsers.

• SQLite: A full-featured SQL database.

• OpenGL: 3D graphics libraries.

• OpenSSL: The Secure Sockets Layer library.

• And many others.

Android Dalvik Virtual Machine

Android uses Java as its primary programming language. Java source code is compiled into bytecode, which is then executed on a Java Virtual Machine (VM). Thanks to the Java VM, it is possible to execute Java code on every machine that runs a Java VM without having to recompile it.

For Android, a team at Google created a new virtual machine named Dalvik, designed specifically for mobile devices, which have constraints on battery life and processing power. This makes the compilation process slightly different from standard Java (see Figure 2.9). After the Java bytecode has been created, it is compiled once again, using the Dalvik compiler, into Dalvik bytecode. It is this Dalvik bytecode that is then executed on the Dalvik VM.

Figure 2.9: Compilation

Source: http://img15.hostingpics.net/pics/253629dvmjvm.png

Dalvik is based on JIT (just-in-time) compilation. This means that each time an application is run, the part of the code required for its execution is translated (compiled) to machine code at that moment. As the user progresses through the application, additional code is compiled and cached, so that the system can reuse it while the application is running. Since JIT compiles only a part of the code, it has a smaller memory footprint and uses less physical space on the device. However, there is a lag between the moment the application is launched and the moment it is running.

2.3.3 Application Frameworks

The application framework sits on top of the native libraries, the Android runtime and the Linux kernel. This framework comes pre-installed with high-level building blocks that developers can use to program applications. The following are the most important application framework components for Android development in general.

Figure 2.10: Android Application Framework Bundles

Source: http://ows.edb.utexas.edu/sites/default/files/users/nqamar/AppFW.JPG



Activity Manager

An activity is a single, focused thing that the user can do. Activities can run in the foreground and interact directly with the user (e.g. the current window or tab), they can run as background services, or they can be embedded in other activities.

The entire lifecycle is defined by certain methods or states, as shown in Figure 2.11.

Figure 2.11: Android Activity Lifecycle

Source: http://ows.edb.utexas.edu/sites/default/files/users/nqamar/activity.JPG

Content Provider

Content providers handle data across applications globally. Android comes with a set of built-in content providers to handle multimedia data, contacts, etc. Developers can create their own providers for flexibility, or they can incorporate their data into one of the existing providers. For our specific application, we are interested in content://browser to access online data through the browser interface.

View System

The view system binds together all the classes that handle graphical user interface (GUI) elements. All view elements are arranged in a single hierarchical tree. They can be created from Java code or included in XML layout files. One good thing that we noticed about Android development is the extensive use of XML files. These files provide a nice abstraction between the backend Java code and the layout elements. Designing UI elements is done in the same fashion as HTML-based web design.

Resource Manager

The resource manager handles all non-code resources, which can be anything from icons and graphics to text. Such resources reside under the res directory, as can be seen in the Eclipse Project Explorer in Figure 2.12.

All the icons and design work that we have done so far using Adobe Illustrator will reside under these layout directories.

Location Services

This bundle supports fine-grained location providers such as GPS and coarse-grained location providers such as cell phone triangulation. The LocationManager system service is the central component of the location framework. This system service provides an underlying API to access device location information. Besides the LocationManager class, several other classes from android.location are important to location-aware applications, such as Geocoder, GpsSatellite and LocationProvider.

Figure 2.12: Eclipse Project Explorer

Source: http://ows.edb.utexas.edu/sites/default/files/users/nqamar/explorer.jpg

2.3.4 Applications

At the top layer of the Android stack there is the Applications (or apps) layer. These applications are what end users find valuable about Android. They can come pre-installed on the device or can be downloaded from the Google Play Store.

The file format used to distribute and install application software and middleware onto Google's Android operating system is the Application Package File (APK) (see Figure 2.13). APK files are essentially ZIP archives based on the JAR file format, with the .apk file extension.
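Because an APK is a ZIP archive, its content can be listed with any ZIP library. The following minimal sketch uses Python's standard zipfile module; the helper names (list_apk_entries, has_dalvik_code, build_demo_apk) are hypothetical and only serve this illustration, and the demo archive is an empty stand-in rather than a real application.

```python
import io
import zipfile


def list_apk_entries(apk_file):
    """Return the sorted names of the files stored in an APK (a ZIP archive)."""
    with zipfile.ZipFile(apk_file) as apk:
        return sorted(apk.namelist())


def has_dalvik_code(apk_file):
    """Tell whether the archive contains a top-level classes.dex file."""
    return "classes.dex" in list_apk_entries(apk_file)


def build_demo_apk():
    """Build an in-memory stand-in for an APK (hypothetical, empty files)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as apk:
        apk.writestr("AndroidManifest.xml", b"")
        apk.writestr("classes.dex", b"")
        apk.writestr("res/layout/main.xml", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_apk_entries(build_demo_apk()):
        print(name)
```

With a real application, the path of the .apk file can be passed instead of the in-memory buffer.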

Figure 2.13: The structure of an APK

Source: http://www.etechtube.com/wp-content/uploads/2014/09/directories-of-apk-file.jpg

The classes.dex file is the Dalvik Executable file that contains the compiled classes of the application. This file is the most important material in the context of our work. Indeed, using reverse engineering techniques on it, we can obtain decompiled Java source code and disassembled (assembly) code for any application.

This Java and assembly code is then used in Chapter 4 to evaluate the wireless security of the studied applications with the help of well-known Android security tools.



2.4 Related Work

In 2012, Georgiev et al. showed in "The most dangerous code in the world: validating SSL certificates in non-browser software" [9] that the certificate validation process in SSL, the de facto standard for secure Internet communications, is completely broken in many security-critical applications and libraries. They highlighted the importance of proper authentication when dealing with encryption: an improper use of certificates results in a badly authenticated communication in which an attacker can remove the encryption without any party noticing the attack, thereby gaining access to sensitive data such as passwords and being able to modify the content of the traffic. They demonstrated with concrete examples that the root causes of these vulnerabilities are badly designed APIs of SSL implementations and data-transport libraries, which present developers with a confusing array of settings and options. They focused their work on non-browser software, including diverse applications and libraries on Linux, Windows, Android, and iOS.

Later in 2012, Fahl et al., in "Why Eve and Mallory Love Android: An Analysis of Android SSL (In)Security" [10], performed a static code analysis on a total of 13,500 applications from the Android Market (now called the Google Play Store) using mallodroid, a Python script developed as part of their work. This static code analysis was designed to detect applications using an incorrect HTTPS configuration that accepts either any certificate or any certificate signed by a trusted Certificate Authority regardless of the hostname. They found that 1,074 of the 13,500 analysed applications are vulnerable to known wireless attacks such as Man-In-The-Middle Attacks (MITMAs). They limited their study to incorrect uses of the SSL configuration in Android applications.

This thesis focuses on the wireless security of secure communications on the Android platform. It differs from the related papers above in that we do not study the Android API and its documentation, even though we provide an overview of the concepts that are critical to understanding the certificate validation process on the Android platform. We do, however, repeat the static code analysis that was developed by Fahl et al. in 2012 and then performed by Christoffer Brodd-Reijer, a previous master thesis student at Uppsala University, in 2014. The purpose of repeating the Fahl et al. static code analysis is to evaluate the wireless security of the Google Play Store applications at the time of writing. Furthermore, we manually test the results and the reliability of the static code analysis method in the context of banking applications. We also explore the possibility of developing another method, based on dynamic code analysis, to detect broken certificate validation in Android applications.


Chapter 3

Static code analysis on the Google Play Store

In this thesis work, we examine a large sample of applications, all downloaded from the Google Play Store. This examination is performed using static code analysis.

The Google Play Store, originally called the Android Market, is Google's official digital distribution platform. Among other things, it allows users to browse and download applications published through Google. In 2015, it "has reached more than 1.43 million applications published and over 50 billion downloads" [11].

The static code analysis consists in determining which classes and methods a given application contains and what those methods return. This allows the detection of any certificate validator inside the code of the application that would accept an invalid certificate.

First, we analyse 4,795 applications from all the categories. Then we manually test the accuracy of the obtained results for the most used Scandinavian banking applications that have been labelled as potentially vulnerable by the static code analysis. The manual test consists in exposing the applications to MITMAs using pandora-box¹, a Python script that has been created as part of this research to perform wireless attacks.

3.1 Analysis context

The exclusive nature of the Google Play Store as the Android digital distribution platform, together with the huge number of downloaded applications, makes it a critical link in the evaluation of the wireless security of Android applications. Therefore, it is important to draw statistics on the Google Play Store.

The Google Play Store is structured into the following 27 categories:

• Book and References

• Business

• Comics

• Communication

• Education

• Entertainment

• Finance

• Games

• Health and Fitness

• Libraries and Demo

• Lifestyle

• Live Wallpaper

• Media and Video

• Medical

• Music and Audio

• News and Magazines

• Personalization

• Photography

• Productivity

• Shopping

• Social

• Sports

• Tools

• Transportation

• Travel and Local

• Weather

• Widget

¹ https://github.com/AresS31/pandora-box

Each category is then divided into three listings, each of them split into free and paid applications, which gives the following six sub-categories:

• Popular

– Free

– Paid

• New

– Free

– Paid

• Top grossing

– Free

– Paid

We select applications from different categories and with different years of publication, download numbers, and user ratings as material for the static code analysis. With these criteria, we obtain all the data required to analyse how the wireless security of the applications distributed through the Google Play Store evolves. Perhaps more importantly, we also obtain an estimate of the number of users using insecure applications.

3.1.1 Challenges

We try to keep an even number of applications across categories, but some of the categories contain very few applications. Moreover, the Google Play Store only allows retrieving 500 applications under each of the six sub-categories of the 27 categories, which gives a theoretical maximum of 81,000 applications.

In addition, hermes, the Python script that we use to carry out the static code analysis, is obsolete in the sense that it is not compatible with the latest version of the Google protocols. We had to spend a considerable amount of time making it operable.

Indeed, some of the requests that we send to the Google Play Store servers are rejected because of a protection against crawlers (programs that automatically browse and download content) that has been added to these servers. This protection requires solving a CAPTCHA after too many requests have been sent. We work around this protection by throttling our requests (waiting X seconds between each request).
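The throttling idea can be sketched as follows. This is a minimal illustration and not the code used by hermes: the function name fetch_all, its parameters and the fetch callback are hypothetical, and the delay is left as a parameter because the text does not state its value.

```python
import time


def fetch_all(items, fetch, delay_seconds, sleep=time.sleep):
    """Call fetch(item) for every item, pausing between consecutive requests.

    delay_seconds is the waiting time between two requests; the value used
    in the thesis is not specified, so the caller has to choose it.
    """
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(item))
    return results


if __name__ == "__main__":
    # Stand-in for a network request: the package name is simply upper-cased.
    print(fetch_all(["app.one", "app.two"], str.upper, delay_seconds=0.1))
```

The sleep function is injectable so that the pacing logic can be exercised without actually waiting.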

Nevertheless, the fairness of the static code analysis is compromised by the unpredictable behaviour of the Google Play servers, which do not always deliver the requested applications. Still, we obtained a sample of 4,795 different applications (roughly seventeen times smaller than the theoretical maximum) with at least 100 applications for each of the 27 categories. This sample size is sufficient in the context of our work, and it takes around 50 processing hours to analyse it.
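The sample-size figures quoted in this section can be checked with a few lines of arithmetic. The constant names below are chosen for this illustration only; the values are the ones given in the text.

```python
APPS_PER_SUBCATEGORY = 500      # retrieval limit of the Google Play Store
SUBCATEGORIES_PER_CATEGORY = 6  # popular, new, top grossing; free and paid
CATEGORIES = 27
COLLECTED_APPS = 4795           # size of the sample actually obtained

theoretical_maximum = APPS_PER_SUBCATEGORY * SUBCATEGORIES_PER_CATEGORY * CATEGORIES
ratio = theoretical_maximum / COLLECTED_APPS

print(theoretical_maximum)  # 81000
print(round(ratio, 1))      # 16.9
```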

3.2 Static code analysis[12]

We carry out the static code analysis of the Google Play Store using the following existing tools:

• hermes²: A Python script that automates the download, the analysis and the generation of statistics for Android applications.

• googleplay-api³: An unofficial Python API that lets you search, browse and download Android applications from the Google Play Store.

• mallodroid⁴: A Python script that searches for broken SSL certificate validation in given applications.

• androguard⁵: A reverse engineering framework for Android applications.

² https://github.com/ephracis/hermes
³ https://github.com/egirault/googleplay-api
⁴ https://github.com/sfahl/mallodroid

We perform the static code analysis in three steps. First, we browse the Google Play Store and list the applications to process. Second, we download and analyse the listed applications one at a time before erasing them. Finally, we convert the results into graphs and tables.

The hermes script performs these three steps. Christoffer Brodd-Reijer, a previous master thesis student at Uppsala University, developed this script. The hermes script uses several third-party tools to perform these tasks. It uses the googleplay-api to interact with the Google Play Store. For the static code analysis, it uses mallodroid, created by Fahl et al., which is built on top of the androguard framework. Figure 3.1 shows how these components are connected.

Figure 3.1: Overview of the hermes script

Source:[12]

hermes fetches metadata for each application when browsing the Google Play Store. This metadata includes the author of the application, the date of its last update, its rating, the number of times it has been downloaded, whether or not it requires the Internet permission, its price, and the category it belongs to.

Once the list of applications to analyse in each category is created, hermes starts the static code analysis, but only on the free applications that require the Internet permission. We discard all the applications that do not require the Internet permission because they do not connect to any network, so it would be a waste of time and resources to study what type of SSL certificate verifier they implement. We also discard all the non-free applications because we would have to pay on the Google Play Store to get them, and since we run the static code analysis on several thousand applications the cost would be too high. The remaining applications are then downloaded and analysed one by one using the imported mallodroid module. Finally, the results are saved as additional metadata for each processed application.
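The selection rule described above (free applications that request the Internet permission) can be sketched as a simple filter. The AppMeta record and its field names are hypothetical and do not reflect the actual data model of hermes; android.permission.INTERNET is the name of the Android permission.

```python
from dataclasses import dataclass

INTERNET_PERMISSION = "android.permission.INTERNET"


@dataclass
class AppMeta:
    """Hypothetical subset of the metadata collected for one application."""
    package: str
    price: float
    permissions: tuple


def should_analyse(app):
    """Keep only free applications that request the Internet permission."""
    return app.price == 0 and INTERNET_PERMISSION in app.permissions


def select_for_analysis(apps):
    """Return the applications that go through the static code analysis."""
    return [app for app in apps if should_analyse(app)]


if __name__ == "__main__":
    catalogue = [
        AppMeta("com.example.bank", 0.0, (INTERNET_PERMISSION,)),
        AppMeta("com.example.paidgame", 1.99, (INTERNET_PERMISSION,)),
        AppMeta("com.example.flashlight", 0.0, ()),
    ]
    print([app.package for app in select_for_analysis(catalogue)])
```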

The mallodroid static code analysis consists in searching the application code for certain patterns in order to detect insecure code. These insecure patterns are the following:

• Custom TrustManager: A custom TrustManager is required for handling self-signed certificates or situations where the signing CA is unknown to the platform. A TrustManager is insecure if its checkServerTrusted method always returns normally (its return type is void) and never throws an exception. This is highly insecure because any certificate is accepted without proper validation.

• Custom HostnameVerifier: Similarly to the TrustManager, a custom HostnameVerifier can sometimes be necessary but must not itself be insecure. A HostnameVerifier whose verify method always returns true is insecure. Another type of insecurity arises when an AllowAllHostnameVerifier object is instantiated.

• Insecure SSLCertificateSocketFactory: Applications are also scanned for code that calls the static getInsecure method of the SSLCertificateSocketFactory class, as this returns an SSLSocketFactory with all the verification checks disabled.

⁵ https://github.com/androguard/androguard
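To make the first two patterns concrete, the sketch below scans disassembled Smali text for a checkServerTrusted method whose body is only return-void and for a verify method that only loads the constant 1 and returns it. It is a simplified heuristic written for this illustration, not mallodroid's implementation (which relies on androguard); the function names and the embedded Smali sample are assumptions.

```python
import re

METHOD_START = re.compile(r"^\.method\s+(?P<signature>.+)$")
CONST_TRUE = re.compile(r"const/4 (\w+), 0x1")


def iter_methods(smali_text):
    """Yield (name, instructions) for each method found in a Smali listing."""
    name, instructions = None, []
    for raw_line in smali_text.splitlines():
        line = raw_line.strip()
        match = METHOD_START.match(line)
        if match:
            # "public verify(Ljava/lang/String;...)Z" -> "verify"
            name = match.group("signature").split("(")[0].split()[-1]
            instructions = []
        elif line == ".end method":
            if name is not None:
                yield name, instructions
            name = None
        elif name is not None and line and not line.startswith((".", "#", ":")):
            # Keep real instructions only: skip directives, comments and labels.
            instructions.append(line)


def is_trivially_accepting(name, instructions):
    """Heuristic: does the method accept everything without any check?"""
    if name == "checkServerTrusted":
        return instructions == ["return-void"]
    if name == "verify" and len(instructions) == 2:
        match = CONST_TRUE.fullmatch(instructions[0])
        return bool(match) and instructions[1] == "return " + match.group(1)
    return False


def find_trivial_verifiers(smali_text):
    """Return the names of the methods flagged by the heuristic."""
    return [name for name, body in iter_methods(smali_text)
            if is_trivially_accepting(name, body)]


DEMO_SMALI = """
.method public checkServerTrusted([Ljava/security/cert/X509Certificate;Ljava/lang/String;)V
    .locals 0
    return-void
.end method

.method public verify(Ljava/lang/String;Ljavax/net/ssl/SSLSession;)Z
    .locals 1
    const/4 v0, 0x1
    return v0
.end method

.method public getAcceptedIssuers()[Ljava/security/cert/X509Certificate;
    .locals 1
    const/4 v0, 0x0
    return-object v0
.end method
"""

if __name__ == "__main__":
    print(find_trivial_verifiers(DEMO_SMALI))
```

Like any purely static check, this heuristic only reports that such a method exists in the code, not that it is ever executed.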

3.3 Outcomes

During the static code analysis, a total of 4,795 applications were found and downloaded from the Google Play Store using the googleplay-api. Of these applications, 4,419 (92.16%) required the Internet permission, which means that they are likely to establish network communications with remote servers. Among these 4,419 applications, 3,754 were available free of charge. As stated in Section 3.2, only the applications that are free of charge and require the Internet permission are analysed. As a result, 3,754 applications were analysed using the static code analysis method.

The results that follow use some terms that first need to be defined:

• Custom verifier: An application containing either a custom TrustManager or acustom HostnameVerifier is classified as containing a custom verifier.

• Naïve verifier: A custom verifier is considered naïve if it contains an empty checkServerTrusted or verify method.

• Bad verifier: A custom verifier is considered bad if it contains an AllowAllHostnameVerifier or a call to SSLCertificateSocketFactory->getInsecure().

A full breakdown, using the above terminology, of the applications that have undergone the static code analysis is shown in Figure 3.2. This figure shows that 51.31% of the applications contain either a native verifier or no verifier at all. Only 8.74% of the applications contain a correct (secure) custom verifier. The remaining 39.95% are applications that make an incorrect use of SSL and thus open the door to possible wireless attacks. In 2014, Christoffer Brodd-Reijer [12] carried out a similar static code analysis and observed that 29.06% of the applications in his sample implemented insecure code. Keeping in mind the differences in the analysis context (refer to Section 3.1), we notice an increase in insecure applications.

Figure 3.2: Types of verifiers observed (verifier types: Native, Custom, Naive, Bad; axis: number of applications, 0 to 4,000)



This increase in insecure applications over time is also confirmed by the results obtained when the applications containing bad verifiers are grouped by their publication year (see Figure 3.3). In Figure 3.3, we can only compare the applications released or last updated in 2014 and in 2015, since we do not have enough data for the other years (see Section 3.1). In 2015, we notice an increase of 11.71% in the applications containing bad verifiers in comparison with 2014. This number is similar to the increase of 10.89% in insecure applications observed in the static code analysis performed by Christoffer Brodd-Reijer in February 2014 [12].

Figure 3.3: Applications containing a bad verifier grouped by the year of publication or last update, and percentage of such applications in that group (groups: Unknown, 2008 to 2015; axes: number of applications, 0 to 800, and percentage, 0 to 100)

It is also interesting to notice that the most downloaded applications are, surprisingly, not more secure than the less downloaded ones. It is even the opposite: there are 9.87% more insecure applications among the applications that have been downloaded more than 100,000,000 times than among the applications that have only been downloaded between 0 and 99 times (see Figure 3.4).



Figure 3.4: Applications containing a bad verifier grouped by number of downloads, and percentage of such applications in that group (groups: 0-99; 100-9,999; 10,000-999,999; 1,000,000-99,999,999; 100,000,000+; axes: number of applications, 0 to 400, and percentage, 0 to 100)

The case of application ratings is similar to the case of downloads. An application is not more secure because it has more stars than another one. This makes sense, since users usually rate applications regardless of their security. In Figure 3.5, we observe that between the applications rated with zero or one star and the applications rated with four or five stars, there is an increase of 19.57% in the applications containing a bad verifier.

Figure 3.5: Applications containing a bad verifier grouped by rating, and percentage of such applications in that group (groups: 0-1, 1-2, 2-3, 3-4, 4-5, Unknown; axes: number of applications, 0 to 600, and percentage, 0 to 100)

Finally, these results show that roughly two out of five applications (39.95%) in our Google Play Store sample are likely to be insecure, and that neither the download numbers, the publication years nor the ratings can be used as a trust index to check whether or not it is safe to download an application. Users who wish to use the applications distributed on the Google Play Store need to be aware of their security flaws and should only connect to trusted networks. However, all the results shown in this Section and in Section 5.1 must be taken with a grain of salt: it happens that applications containing a bad or naïve verifier are in reality secure against wireless attacks (see Section 3.4). More graphs and tables are available in Section 5.1.



3.4 Manual analysis of insecure banking applications

From the collected results (see Section 3.3), we focus on the insecure applications classified under the Business category. Several banking applications are among them. Therefore, we compile a list of a few of the biggest Scandinavian banking applications in order to carry out real wireless attacks on each of them.

To perform the wireless attacks, we develop and use pandora-box, a Python script created as part of this thesis work. The pandora-box script allows a user to do the following:

• To create a rogue access-point.

• To perform an evil twin attack.

• To sniff all kinds of traffic, including SSL traffic.

• To de-authenticate users from their legitimate Access Point (AP).

• To monitor the rogue AP.

• To boost the transmission power of the wireless network interface.

• To spoof the MAC addresses of the used network interface cards.

After testing our banking applications with the pandora-box script, we are surprised to observe that all of the applications labelled as insecure by mallodroid are in fact secure against our attacks (see Figure 3.6). Indeed, according to the static code analysis with mallodroid, we should have been able to intercept the SSL traffic of the selected banking applications, which was not the case. These applications do contain insecure SSL certificate verifiers, but it seems that those insecure verifiers are never used at runtime. Therefore, they can be considered false positives, since the main purpose of mallodroid is to detect applications that are vulnerable to wireless attacks by checking whether they contain incorrectly implemented SSL certificate verifiers.

Figure 3.6: (In)Security cases

Under a MITMA, the banking applications do not connect through our AP and instead display an error message basically saying that an error occurred and that the network settings should be checked. This message means that the verifier implemented in those applications does not validate the certificate presented by our AP during the SSL handshake. The reason in this case is a hostname mismatch: the applications expect a certificate from XXX.XXX (the banking server) and instead receive one from YYY.YYY (the address of the computer running the pandora-box script).
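As an analogy only (the studied applications are written in Java and use the Android TrustManager and HostnameVerifier classes), the same distinction between a strict and a permissive client configuration can be shown with Python's standard ssl module. The function names are chosen for this illustration.

```python
import ssl


def strict_context():
    """Default client settings: the certificate chain and the hostname are checked."""
    return ssl.create_default_context()


def permissive_context():
    """Insecure settings, comparable to a verifier that accepts everything."""
    context = ssl.create_default_context()
    # Hostname checking has to be disabled before the chain validation
    # can be turned off.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


if __name__ == "__main__":
    for build in (strict_context, permissive_context):
        context = build()
        print(build.__name__, context.check_hostname, context.verify_mode)
```

A client using the strict settings aborts the handshake on a hostname mismatch, which corresponds to the behaviour observed for the banking applications; a client using the permissive settings would have accepted the certificate of the rogue AP.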

Following this observation, we find it interesting to understand why mallodroid labels these banking applications as insecure. It turns out that mallodroid is not one hundred percent reliable: it can generate false positives, since the static code analysis takes into account neither whether a piece of code is actually used nor the conditions under which the code is executed (for instance: if the user is an administrator then run an insecure verifier, else run a secure verifier).

From this point, finding a way to improve the accuracy of the analysis becomes relevant in the context of our work, and we decide to perform a dynamic code analysis (see Chapter 4).


Chapter 4

Identifying the critical code section in the certificate validation process

As stated in Section 3.4, we cannot entirely rely on mallodroid, the script that detects broken SSL certificate validation. This script basically checks whether insecure TrustManager, HostnameVerifier or SSLCertificateSocketFactory classes are present in the code, without verifying whether this code is actually called or whether the certificate validation is done by other means.

Therefore, we decided to develop another analysis technique that is more reliable because it only considers code that is actually executed and therefore avoids this kind of false positive. This technique is based on a dynamic code analysis that consists in generating log files of the applications during their runtime. Those log files are then manually analysed to detect any sign of broken SSL certificate validation.

To do this analysis, we first need to generate log files of the execution flow of the applications that we want to examine (we call those log files traces). We do this by adding a stack trace call to every method contained in those applications. We use reverse engineering and code injection techniques to achieve this first step. Once the applications are traced, we run them to generate two dynamic log files (with and without undergoing a MITMA) containing their runtime method calls. Finally, we proceed to a manual analysis of the generated traces, following a specific procedure that we developed in order to locate the critical code sections involved in the SSL certificate validation for further analysis.

4.1 Reverse engineering[13]

In this thesis work, we used reverse engineering techniques in order to study how SSL validation works in the selected banking applications.

Reverse engineering is the ability to recover source code from an executable. This technique is used, among other things, to examine how a program works or to evade security mechanisms.

In practice, reverse engineering is also used as a method for modifying a program in order to make it behave in the manner that the reverse engineer desires.

We used the following popular Android reverse engineering tools:

• For the decompilation process, we used d2j-dex2jar¹ to generate a .jar file from the classes.dex file of an application. Then, we used JD-gui² to decompile the Java byte code (contained in the .jar file) into human-readable Java source code.

Figure 4.1: Decompiling process of an Android app.

• For the disassembly process, we used baksmali³ to generate the .smali files of an application from its classes.dex file. Baksmali is an Icelandic word that literally means Disassembler.

Figure 4.2: Disassembly process of an Android app.

• For the recompilation process, we used smali⁴ to assemble .smali files into a valid application. Smali is also an Icelandic word and literally means Assembler.

¹ https://github.com/pxb1988/dex2jar
² http://jd.benow.ca/

d2j-dex2jar, JD-gui, smali and baksmali are all available and pre-installed on Santoku⁵, which is a Linux distribution.

We used both methods, decompilation and disassembly, in order to get the disassembled Smali source code and the decompiled Java source code. However, we mainly worked on the disassembled Smali source code because, in spite of the difficulty of dealing with a large number of files in assembly language, it is easy to reassemble .smali files back into a valid .apk. This is not the case for decompiled Java source code: to our knowledge, there is at this time no technique to recompile decompiled Java byte code back into an .apk.

Therefore, we only used the decompiled Java source code to get an overview and a better understanding of how the applications are implemented. Since Java is a high-level programming language, .java files are easier to read and understand than .smali files, which are written in Smali, a low-level assembly language.

Nevertheless, in addition to the impossibility of compiling the decompiled Java source code back into an application, decompiled Java source code is not reliable. Decompiling the code of an application can be compared to translating a Chinese text to English, then to Hebrew and finally back to Chinese. The resulting source code often makes no sense, with for example no method termination; Java decompilers handle loops and multiple conditions very poorly.

4.2 Generation of the applications traces

To generate the application traces, we used the Dalvik Debug Monitor Server (DDMS)⁶, which is an Android debugging tool. We used its logcat function, which allows users to print and log application debugging messages using the Java method Log.d(String tag, String msg) or a similar method from the android.util.Log class.

³ https://github.com/JesusFreke/smali
⁴ https://github.com/JesusFreke/smali
⁵ https://santoku-linux.com/
⁶ http://developer.android.com/tools/debugging/ddms.html



We developed smali-code-injector⁷, a Python script that performs the code injection as well as the entire reverse engineering chain of Android applications (see Figure 4.3):

Figure 4.3: Overview of the smali-code-injector script

• Disassembly of the Android applications.

• Code injection (inside the .smali files).

• Re-assembly of the .smali files into Android applications.

• Signing of the Android applications with jarsigner⁸.

The smali-code-injector script is capable of injecting any kind of code. The code is injected directly into the .smali files of the applications, and the script deals with register allocation and dependencies by parsing the assembly files and extracting the relevant information, such as the number of registers, the register types, the method names, and so on.

smali-code-injector can also be used for other purposes, such as hijacking genuine Android applications in order to obtain unlimited access to the resources and data of the smartphones on which the applications are installed. However, for this thesis work we restricted the usage of smali-code-injector to its tracing feature (the printStackTrace feature). We used this feature to inject a stack trace call into every method contained in the studied applications; commonly, several thousand methods are traced.
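The injection step can be pictured as a text transformation on the .smali files. The sketch below is not the smali-code-injector implementation: to avoid any register bookkeeping, it inserts a call to a hypothetical static helper (Lhypothetical/Tracer;->trace()V) that takes no argument, right after the .locals or .registers directive of each method. The helper class would have to be added to the application, and the output has not been validated against the smali assembler.

```python
# Call to a hypothetical tracing helper; it uses no register at all.
TRACE_CALL = "invoke-static {}, Lhypothetical/Tracer;->trace()V"


def inject_trace_calls(smali_text, call=TRACE_CALL):
    """Insert `call` at the start of every method that has a body.

    Returns the patched text and the number of patched methods. Abstract
    and native methods have no .locals/.registers directive and are
    therefore left untouched.
    """
    output = []
    awaiting_directive = False
    injected = 0
    for line in smali_text.splitlines():
        stripped = line.strip()
        output.append(line)
        if stripped.startswith(".method "):
            awaiting_directive = True
        elif stripped == ".end method":
            awaiting_directive = False
        elif awaiting_directive and stripped.startswith((".locals", ".registers")):
            indent = line[: len(line) - len(line.lstrip())]
            output.append(indent + call)
            injected += 1
            awaiting_directive = False  # inject only once per method
    return "\n".join(output), injected


SAMPLE = """.method public abstract foo()V
.end method

.method public bar()V
    .locals 0
    return-void
.end method"""

if __name__ == "__main__":
    patched, count = inject_trace_calls(SAMPLE)
    print(count)
    print(patched)
```

Injecting code that needs registers, such as a Log.d call with two string arguments, additionally requires raising the register count of the method, which is the bookkeeping described in the text.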

The application traces consist of a sequence of entries, each representing a method call (see Figure 4.4). The traces keep growing as long as the applications are running and calling methods.

Figure 4.4: A method call entry

⁷ https://github.com/AresS31/smali-code-injector
⁸ jarsigner



Figure 4.5 shows an example of an application trace.

Figure 4.5: Application trace

For a human reader, traces such as the one shown in Figure 4.5 may not be easy to read and understand. Their main shortcomings are that they neither highlight the intra-method calls (refer to Figure 4.6 for the concept of intra-method calls) nor announce the end of a method.

Figure 4.6: Intra-method calls concept

Therefore, to address those problems, we created indentation-fixer⁹, a Python script that formats the traces generated with the printStackTrace feature of smali-code-injector and the Android debugger into a friendlier format. This new formatting gives users a better understanding of the execution flow of the applications by using an indentation system as well as a tag system (see Figure 4.7 for an example of a log file generated by smali-code-injector and then processed with indentation-fixer).
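The indentation idea can be sketched under a strong assumption about the input, since the exact trace format is only shown in the figures: here each trace entry is assumed to be available as a list of stack frames, innermost frame first, so that the length of the list gives the call depth. The function name indent_trace is hypothetical and the code is not taken from indentation-fixer.

```python
def indent_trace(entries, indent="  "):
    """Render one line per entry, indented according to its call depth.

    entries: list of non-empty stack traces, each a list of method names
    with the innermost (currently executing) method first. This input
    format is an assumption made for the illustration.
    """
    if not entries:
        return []
    base_depth = min(len(frames) for frames in entries)
    lines = []
    for frames in entries:
        depth = len(frames) - base_depth
        lines.append(indent * depth + frames[0])
    return lines


if __name__ == "__main__":
    demo = [
        ["main"],
        ["connect", "main"],
        ["verify", "connect", "main"],
        ["close", "main"],
    ]
    print("\n".join(indent_trace(demo)))
```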

Figure 4.7: Indented trace

⁹ https://github.com/AresS31/indentation-fixer



4.3 SSL certificate validation critical code section identification

We designed a procedure to find the method(s) handling the SSL certificate validation in a given application. This procedure is based on the dynamic traces that we generated in Section 4.2.

This procedure involves the generation of two different traces for the same application. The first trace is generated with the application running in a normal (neutral) environment. The second trace is generated whilst performing a MITMA on the application; the MITMA is performed with the pandora-box script (see Section 3.4).

The traces that are generated with smali-code-injector and the Android debugger are usually very long; they usually contain hundreds of thousands of lines. Manually analysing such long traces is a considerable amount of work. We addressed this issue by adding a new feature to the smali-code-injector script. This filtering feature allows users to inject their code only into the methods containing specific keywords. Some specificities of the Smali language made this feature very easy to implement. Indeed, Smali, like most low-level assembly languages, represents the arguments and the returned value of a method only by their types, and object types are represented by the path of the class (Ljava/lang/Integer, Ljava/lang/Boolean, and so on). Consequently, we compiled a list of keywords that includes all the types dealing with networking and SSL.
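The keyword filter can be sketched as a substring test on Smali method signatures. The keyword list below is a hypothetical example (package prefixes of standard Java networking, SSL and certificate classes); the list actually compiled for the thesis is not given in the text.

```python
# Hypothetical keyword list: type descriptors related to networking and SSL.
KEYWORDS = (
    "Ljavax/net/ssl/",
    "Ljava/security/cert/",
    "Ljava/net/",
)


def mentions_keyword(method_signature, keywords=KEYWORDS):
    """Tell whether a Smali method signature mentions one of the keywords."""
    return any(keyword in method_signature for keyword in keywords)


def select_methods(signatures, keywords=KEYWORDS):
    """Keep only the methods whose argument or return types match."""
    return [s for s in signatures if mentions_keyword(s, keywords)]


if __name__ == "__main__":
    demo = [
        "verify(Ljava/lang/String;Ljavax/net/ssl/SSLSession;)Z",
        "onCreate(Landroid/os/Bundle;)V",
        "checkServerTrusted([Ljava/security/cert/X509Certificate;Ljava/lang/String;)V",
    ]
    for signature in select_methods(demo):
        print(signature)
```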

With this additional step, we reduced the length of the traces from hundreds of thousands of entries to "only" a few thousand entries without losing any data relevant to our study of the SSL certificate validation.

Then, we used the Linux diff command to generate a log file containing the differences between the two traces. The resulting log file, which we call the diff-log, is almost never empty: even when an application is secure against wireless attacks, there are always differences between two running instances of the application. For example, a method may generate a different identifier for each session, return the current time, or generate random numbers. Those differences are noise in the context of our study.
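The comparison of the two traces can be reproduced with Python's standard difflib module, used here as a stand-in for the Linux diff command. The function name diff_traces and the demo traces are made up for the illustration.

```python
import difflib


def diff_traces(normal_trace, attacked_trace):
    """Return the entries that differ between two traces.

    Lines prefixed with "- " only appear in the normal trace, lines
    prefixed with "+ " only appear in the trace recorded under a MITMA.
    """
    return [
        line
        for line in difflib.ndiff(normal_trace, attacked_trace)
        if line.startswith(("- ", "+ "))
    ]


if __name__ == "__main__":
    normal = ["connect", "verify", "send"]
    attacked = ["connect", "verify", "showError"]
    for line in diff_traces(normal, attacked):
        print(line)
```

As with diff, entries that change at every run (session identifiers, timestamps, random numbers) show up as noise and have to be discarded by the analyst.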

Finally, we have to proceed to a manual analysis of the diff-log. The procedure is to start at the beginning of the diff-log and to look at the first differences; applications whose methods have explicit names are easier to handle in this analysis. For each difference that we judge relevant, we note the line where the method appears in the corresponding trace, and we go back to the original trace to examine the execution flow of the application before this line and to see what may cause the difference. We also open the decompiled Java source code of the application (or the Smali source code, for people skilled in the Smali language) and look at what the method does. We repeat this operation as many times as necessary until we find the method responsible for the SSL certificate validation (see Figure 4.8).



Figure 4.8: The manual analysis procedure

We automated most of this analysis process with our own tool chain, which contains smali-code-injector, pandora-box, indentation-fixer and the Linux diff command (see Figure 4.9). Only the manual analysis procedure remains to be automated (see Figure 4.8). In Chapter 5, we discuss possible improvements of this method.

Figure 4.9: Dynamic code analysis tool chain


Chapter 5

Conclusion

We performed a static code analysis of 4,795 applications from the Google Play Store and obtained disturbing results regarding the wireless security of the applications distributed through this platform. However, manual analyses of some Scandinavian banking applications showed that static code analysis is not a reliable method for detecting insecure applications, because it generates numerous false positives. The main cause of these false positives is that mallodroid, the script that searches for broken certificate verifiers, takes as input the whole source code of an application, including code that is never called at runtime (noise).

From this observation, we decided to develop a new analysis method that would not generate any false positives, namely a dynamic analysis method. First, we traced all the methods contained in the studied banking applications in order to generate log files of their execution flow, which gave us all the functions that are actually called by those applications. We then analysed the traces manually. Three traces were generated for each studied application: one with the application undergoing a man-in-the-middle attack (MITMA), one with the application running in a normal environment, and a third that was simply a log file containing the differences between the first two. Then, following a strict procedure that we developed, which consists in tracing back all the differences that seem relevant, we were able to locate the method(s) responsible for the SSL certificate validation.

Future work could improve this new method. For example, the manual analysis of the log files, which is currently heavy and time-consuming, could be automated. We could also develop a hybrid method that gives mallodroid as input only the code that is actually executed at runtime. Now that our dynamic code analysis method can generate a list of all the methods called at runtime, this would not be very difficult. It would remove the noise and thereby significantly improve the accuracy of mallodroid. Finally, the results obtained regarding the wireless security of Android applications could be linked to a database, so that users can check the wireless security of the applications they wish to install (this would be particularly important for critical applications such as banking applications).
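The trace comparison and the hybrid idea described above can be illustrated with a minimal Python sketch. It is not the tooling used in this thesis. It assumes that each trace is a plain list of fully qualified method names, one per line, and that the "difference" log is a simple set difference; the function names (load_trace, trace_difference, filter_findings) and all the com/example method names are hypothetical.

```python
def load_trace(lines):
    """Return the distinct method names of a trace, in first-call order."""
    seen, ordered = set(), []
    for line in lines:
        method = line.strip()
        if method and method not in seen:
            seen.add(method)
            ordered.append(method)
    return ordered


def trace_difference(normal, mitm):
    """Third log: methods that appear in only one of the two runs."""
    normal_set, mitm_set = set(normal), set(mitm)
    return {
        "only_normal": [m for m in normal if m not in mitm_set],
        "only_mitm": [m for m in mitm if m not in normal_set],
    }


def filter_findings(findings, executed):
    """Hybrid idea: keep only static findings whose method was executed."""
    executed_set = set(executed)
    return [f for f in findings if f in executed_set]


# Hypothetical traces of the same application in both environments.
normal_run = load_trace([
    "Lcom/example/bank/Login;->onCreate",
    "Lcom/example/bank/net/Verifier;->checkServerTrusted",
    "Lcom/example/bank/net/Client;->sendCredentials",
])
mitm_run = load_trace([
    "Lcom/example/bank/Login;->onCreate",
    "Lcom/example/bank/net/Verifier;->checkServerTrusted",
    "Lcom/example/bank/ui/ErrorDialog;->show",
])
diff = trace_difference(normal_run, mitm_run)

# Hypothetical static findings; the first one is never executed (noise).
static_findings = [
    "Lcom/example/ads/TrustAll;->checkServerTrusted",
    "Lcom/example/bank/net/Verifier;->checkServerTrusted",
]
kept = filter_findings(static_findings, normal_run + mitm_run)

print(diff)
print(kept)
```

In this toy example the two runs diverge after the certificate check: the normal run goes on to send the credentials, while the run under attack shows an error dialog. Tracing back from these differences leads to the verifier method, and the static finding located in code that is never executed is discarded.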

5.1 My feedback

This thesis was really interesting, and I do not regret having chosen it over the other proposal that I received at the time. The topic interested me a lot and, since I wish to work in IT security, doing good work was my main motivation.

However, the approach of the thesis surprised me at first. Having no specific mission or problem statement was a little disturbing, since I was not used to this. After a while, however, we were able to define precisely the direction to take and what to do. Moreover, I think that six months was too short a period for one person to complete such a big project. Nevertheless, I think that I laid the first brick of an ambitious project, and I hope that someone will continue this work. All in all, this thesis was a great insight into the world of research and development and of doctoral studies.

Through this thesis, I acquired and extended a lot of valuable knowledge, including but not limited to Python, shell scripting, Smali, Java, debugging, the Android operating system, reverse engineering, wireless security and Android security.

On a non-technical level, I learnt how to work autonomously, mainly because of the distance between where I lived and the University and my lack of transportation, although I had weekly meetings with my supervisor. I also learnt how to be self-motivated and how to manage my time. Finally, I gained a lot on a personal level by living in another country and meeting people from all around the world.




Appendix A

Appendix Graphs

[Figure: per-category plot of the studied applications; axes: Applications (0 to 400) and Percentage (0 to 100)]

Figure A.1: Usage of the Internet permission



[Figure: per-category application counts (axis "Apps", 0 to 400), broken down by verifier type: Native, Custom, Naive, Bad]

Figure A.2: Distribution of verifier types over categories.


Appendix B

Appendix Tables

Category              Total   With Internet permission   Share
Travel and Local        151                        151   100.00%
Comics                   65                         65   100.00%
News and Magazines      267                        265    99.25%
Social                  302                        299    99.01%
Game                    400                        396    99.00%
Shopping                142                        138    97.18%
Widgets                 353                        343    97.17%
Lifestyle               136                        132    97.06%
Entertainment           336                        326    97.02%
Education               229                        221    96.51%
Sports                  141                        136    96.45%
Transportation           78                         75    96.15%
Books and Reference     263                        251    95.44%
Health and Fitness      109                        104    95.41%
Finance                 109                        104    95.41%
Business                258                        245    94.96%
Music and Audio         191                        181    94.76%
Media and Video         304                        287    94.41%
Weather                 195                        184    94.36%
Photography             220                        202    91.82%
Communication           278                        252    90.65%
Tools                   285                        258    90.53%
Medical                 165                        144    87.27%
Productivity            233                        200    85.84%
Live Wallpaper          268                        226    84.33%
Personalization         272                        216    79.41%
Libraries and Demo      123                         70    56.91%
Total                  4795                       4419    92.16%

Table B.1: Distribution of applications with internet permission over categories



Verifier type    Apps   Percentage
Native or none   1926       51.31%
Custom            328        8.74%
Naive             546       14.54%
Bad               954       25.41%

Table B.2: Distribution of applications over verifier types

Year      Bad verifier (apps)   Share of group
Unknown                     0            0.00%
2008                        0            0.00%
2009                        0            0.00%
2011                        0            0.00%
2010                        0            0.00%
2013                        7            6.42%
2012                        3           13.64%
2014                       93           16.26%
2015                      851           27.97%

Table B.3: Applications with bad verifier grouped by the year they were published or last updated, and share of such applications in that group

Downloads              Bad verifier (apps)   Share of group
0-99                                   169           22.99%
100-9,999                              140           22.22%
10,000-999,999                         281           22.88%
1,000,000-99,999,999                   341           31.26%
100,000,000+                            23           32.86%

Table B.4: Applications with bad verifier grouped by their download number, and share of such applications in that group

Rating    Bad verifier (apps)   Share of group
0-1                        11            7.33%
1-2                         1            5.00%
2-3                        41           21.24%
3-4                       332           26.02%
4-5                       569           26.90%
Unknown                     0            0.00%

Table B.5: Applications with bad verifier grouped by their rating, and share of such applications in that group
