
A Method for Mobile Download Conversion Rate Measurement based on Device Fingerprinting

Alexander Schuch, Clemens Holzmann, Florian Lettner
University of Applied Sciences Upper Austria
Department of Mobile Computing
Softwarepark 11, 4232 Hagenberg, Austria

{firstname.lastname}@fh-hagenberg.at

ABSTRACT
Users download mobile applications after being drawn to the application stores, including through referrals from advertising campaigns on websites. To determine whether a mobile application install originates from a specific campaign, mobile devices need to be uniquely identified before referral to the mobile application store and after a successful install. However, the mobile sandbox environment makes it impossible to exchange device identifiers between a mobile web browser and other mobile applications on the same device. This paper introduces an alternative approach that creates identifiers based on measurable device characteristics. The proposed use of device fingerprinting makes it possible to uniquely identify devices even across multiple mobile applications and regardless of the mobile operating system. The presented comparison algorithm is capable of finding two independently created identifiers that were measured on the same mobile device, which makes it possible to determine successful mobile application installs that originated from a specific website.

Categories and Subject Descriptors
H.3.5 [Information Storage and Retrieval]: Online Information Services - commercial services, web-based services; I.5.5 [Pattern Recognition]: Design Methodology - feature evaluation and selection

General Terms
Measurement, Algorithms

1. INTRODUCTION
The Apple AppStore¹ and the Google PlayStore² are the two largest mobile platform-specific application stores in terms of the number of applications and the number of downloads made by smartphone users. At the moment, both stores only provide statistical data about the total number of applications downloaded. However, neither of them offers analytical data to determine how a user was referred to one of the store's download pages. Knowledge about the origin of application downloads is crucial in order to measure the number of downloads for a specific marketing campaign.

¹ http://itunes.apple.com
² http://play.google.com

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MoMM2013, 2-4 December, 2013, Vienna, Austria. Copyright 2013 ACM 978-1-4503-2106-8/13/12...$15.00.

Because the stores themselves provide so little information, it is difficult to associate mobile application downloads with a specific advertising campaign. Being able to compare the number of downloads of different marketing campaigns with each other puts advertisers in control of the potential outcome of similar campaigns in the future. Comparing the number of referrals for one origin to the number of users actually downloading and installing an application helps to determine whether the money for a campaign was “well spent”.

In this paper we propose a methodology, based on so-called device fingerprinting, to uniquely identify smartphones across a native mobile browser and a native mobile application in order to measure the number of application installs from a specific marketing campaign. Device fingerprinting gathers minor technical nuances and variations of different devices. These device-specific variations can then be utilised to uniquely identify or distinguish smartphones from each other.

Device fingerprinting is already a reliable way to distinguish and identify users among different browsers on desktop computers. First, we introduce a scenario for device fingerprinting on mobile devices and explain why it can be important to distinguish mobile devices from each other. Second, we present related fingerprinting approaches for mobile devices and how our approach differs. Third, we explain the proposed fingerprinting approach in detail, showing which features are available and how the matching process works. Additionally, some details on the implementation are provided. Fourth, we present the results of an evaluation, which consists of two parts: a controlled experiment to evaluate the quality of the selected feature set and the performance of the applied matching strategy, and a field study to measure the entropy of the selected features.

2. UNIQUE USER IDENTIFICATION
To measure and compare a user's activity of downloading a mobile application, users have to be identified at discrete points during the course of the activity. The ability to uniquely identify users is mandatory in order to establish a connection between separately taken actions and to analyse the behaviour of users within a given context.

2.1 Scenario
One common option to increase application downloads is to introduce marketing campaigns (e.g. advertising on social media platforms) that refer users to the appropriate mobile application store to download the advertised application. A download that is generated through a specific marketing campaign can generally be referred to as a conversion from that campaign [10]. It is important for marketers and advertisers to measure different marketing campaigns and their impact on downloads and sales. To compare different marketing campaigns, the conversion rate is defined as the percentage of all visitors of a campaign who performed a desired action and can be calculated using a simple formula.

\[
\text{Conversion Rate} = \frac{\text{Desired Actions}}{\text{Visits}} \tag{1}
\]

Figure 1: Formula to calculate conversion rates. [7]
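As a small worked instance of Formula 1, the following Python sketch computes a conversion rate from two counters. The function name, the zero-visit handling and the example numbers are illustrative assumptions and do not come from the paper.

```python
def conversion_rate(desired_actions: int, visits: int) -> float:
    """Return desired_actions / visits as a fraction (Formula 1).

    Returning 0.0 for a campaign without visits is an assumption made
    here to avoid a division by zero; the paper does not cover this case.
    """
    if desired_actions < 0 or visits < 0:
        raise ValueError("counts must be non-negative")
    if visits == 0:
        return 0.0
    return desired_actions / visits


# Hypothetical campaign: 400 referral clicks and 25 matched installs.
rate = conversion_rate(desired_actions=25, visits=400)
print(f"{rate:.2%}")  # prints 6.25%
```

In the setting of this paper, the visits would be the clicks on a campaign link and the desired actions the installs attributed to that campaign.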

Similar marketing metrics can be calculated to give insights into the cost per click and cost per install of a campaign, thus indicating whether a campaign is actually worth the money spent [9].

A conversion funnel can be described as a sequence of actions that finally leads to a successful conversion. A mobile application download conversion funnel can be simplified into two steps: the user being referred to the mobile platform-specific application store, and the purchase and install of an application. During each step of the conversion funnel, the user has to be uniquely identified in order to find out whether the final download of an application was the direct consequence of a digital marketing campaign.

2.2 Fingerprinting
To calculate the conversion rate of downloads from a marketing campaign using Formula 1, the number of clicks (i.e. visits) for a campaign and the number of application installs (i.e. desired actions) have to be measured. In order to obtain the number of installs, the user needs to be uniquely identified in the referring mobile browser and later within the downloaded mobile application.

Figure 2: Illustration of the concept for a digital device fingerprint.

Device fingerprinting is a way to create unique identifiers even on mobile devices, where communication and exchange of identifiers between applications is restricted by a sandboxed environment. Many electronic devices have subtle but measurable variations that make it possible to create a fingerprint. A set of variations can be measured in order to uniquely identify or distinguish devices from each other [11, 3]. In terms of fingerprinting, there are several different sets of data that can be collected from the connection of the device to a web server, a mobile web browser, and the device's native software development kit (SDK):

• User settings: Specific settings to customise the user experience on the device, such as language, time zone, font size or text color settings.

• Operating system: Settings beneath the application layer, such as TCP/IP stack implementation or operating system version, which cannot be modified by the user of the device.

• Mobile device hardware: Screen dimensions, screen pixel depth or MAC address.

2.2.1 Considerations for Mobile Devices
Device fingerprinting capabilities on mobile devices are much more limited compared to desktop computers due to limited device capabilities and the sandboxed environment [3, 4, 5]:

• Mobile devices have limited JavaScript capabilities, and features like Adobe Flash³ support might be missing.

• Mobile operating systems like iOS or Android also appear to have more static parameters in comparison to the many user settings on desktop computers.

• IP addresses of mobile devices change more often compared to desktop computers.

• Mobile devices tend to be updated more quickly than desktop computers, which again makes them appear more uniform to fingerprinting.

• HTTP cookies cannot be accessed from applications other than the mobile web browser, and unique identifiers provided by the mobile SDKs are not accessible in the mobile web browser.

Despite these limitations, the methodology of this paper requires the ability to retrieve fingerprinting features across the mobile web browser and a native application. Only the intersection of features available in both environments can be used to create identifiers that can be matched at a later point in time.

3. RELATED WORK
Besides the approach presented in this paper, there are other techniques to track conversions within a web browser or native mobile applications that are widely used for marketing purposes.

³ http://get.adobe.com/flashplayer

3.1 Conversion Measurement using Cookies
The most common way to track conversions throughout web browsers is via HTTP cookies [3]. For example, Google Analytics⁴ and AdWords⁵ use cookies to measure conversions on advertisements displayed on websites and can even be used to track other conversion goals. HTTP cookies store information within the client's web browser cache. When a user visits a website, this data can be retrieved from the cache and is usually used to maintain state between requests or for personalisation [6]. For privacy reasons, HTTP cookies can only be accessed by the website that initially created them [5]. However, content from a tracking server can be placed on a website in order to give the tracking server access to cookies created by another website. Cookies are also used by third-party web tracking and statistics services in order to identify a user even across multiple websites [8].

Besides standard HTTP cookies, websites can make use of Flash cookies for better persistence [13]. For user identification, Flash cookies also have a lot of advantages in comparison to standard HTTP cookies. Flash cookies are available through Adobe Flash and are not directly stored in the web browser's cache, but in some other central location on the device. This makes Flash cookies harder to find and delete, but also makes them accessible not only to the application (e.g. web browser) that initially set the cookie, but to all applications on the same device [13].

With the rise of the HTML5 standard⁶, new possibilities to store content on the client side are supported by modern web browsers out of the box. Like Flash cookies, HTML5 local storage is a more persistent way to store tracking data or regenerate HTTP cookies [1]. However, one big advantage over Flash cookies is that no third-party add-ons have to be installed.

3.2 Cookies in the Mobile Domain
HTTP cookies can also be set and accessed within mobile browsers and used to measure mobile download conversions, e.g. in the Hotels.com iOS application⁷. The user is first redirected to a website that sets an HTTP cookie with a unique identifier for the campaign to be tracked. After redirection to the mobile application store, the user downloads the mobile application to the device. However, upon the first launch of the application, the mobile browser has to be launched programmatically to redirect the user to a website that checks if a campaign cookie was previously set. The website then triggers a custom URL scheme that returns back to the mobile application.

3.3 Discussion
To measure mobile download conversions, it is impossible to use HTTP cookies or HTML5 local storage directly, as HTTP cookie or local storage information cannot be shared between the mobile web browser and a native mobile application. Flash cookies are not an option either, because Flash is not supported on iOS devices and is increasingly being removed from Android devices.

The approach of using cookies for mobile download conversion measurement redirects to another application, which is clearly visible to the user and might be very slow depending on the device's internet connection. In addition, the mobile browser opened by the application needs to be the same browser that redirected the user to the mobile application store. The approach presented in this paper is much more user friendly and unobtrusive, as the user does not even notice that conversion tracking takes place.

⁴ http://www.google.com/analytics
⁵ http://www.google.com/adwords
⁶ http://www.w3schools.com/html/html5_intro.asp
⁷ https://itunes.apple.com/us/app/id284971959

4. FINGERPRINTING CONCEPT
A set of 7 fingerprinting features was found to be available through both the mobile web browser and the native mobile application, and can thus be used for cross-application fingerprinting. These features only represent the intersection of features that can be measured within both applications. An additional set of 6 features is exposed to a web server upon an HTTP request of the mobile device and is therefore available across applications. Table 1 shows an overview of the fingerprinting features found within the mobile browser and a native application that can be used to uniquely identify mobile devices to measure mobile application installs.

B = Mobile Web-Browser, N = Native Application, C = Client Side, S = Server Side

                        Available     Used
Feature                 B     N       C     S
----------------------------------------------
HTTP cookies            √
Flash cookies
Device model            √     √
SDK UDID                      √
Browser plugins         √
Screen pixel depth      √     √
Screen color depth      √     √
MAC address                   √
Operating system        √     √       √
OS version              √     √       √
System language         √     √       √
Screen width            √     √       √
Screen height           √     √       √
Device pixel ratio      √     √       √
IP address              √     √             √
Name servers            √     √             √
SYN time to live        √     √             √
SYN options             √     √             √
SYN window size         √     √             √
SYN DF bit              √     √             √
Timezone offset         √     √       √

Table 1: Fingerprinting features that are available in a mobile web browser, a native mobile application and on a web server.

The client-side features represent the intersection of available information exposed within the mobile web browser (via JavaScript) and within the native mobile SDKs. HTTP cookies, Flash cookies, browser plugins, unique device identifiers and MAC addresses could not be measured within both of the required environments and were therefore not taken into further consideration. Screen pixel depth and screen color depth would theoretically be available to the browser and a native application, but could not be accessed by the respective Android or iOS APIs. The device model name is not always reported by the HTTP User Agent and was therefore not incorporated into the final fingerprint.

TCP/IP stacks are implemented on the operating system layer and every HTTP request in the application layer is processed by the operating system's TCP/IP stack. Different operating systems implement differing TCP/IP stacks, which sometimes even change between versions of the same OS [2]. Name servers of the IP address are measured to see differences in uniqueness compared to the IP address itself.

4.1 Overview
Mobile application marketing campaigns usually provide a link to the mobile application to be promoted, referring users to one of the platform-specific mobile application stores. After clicking this referral link, users are redirected to the mobile application store, where they can purchase and download the promoted mobile application. However, with a direct referral link, there is no possibility to gather any identifying information about the user who was referred to the store. Figure 3 gives a general overview of a newly proposed workflow that makes it possible to identify a user before arrival at the mobile application store and to determine whether a user actually installed a mobile application after being referred by a specific marketing campaign.

Figure 3: Proposed workflow of the concept to measure and persist browser and device fingerprints.

Instead of directly referring the user to the mobile application store, the user is referred to a specific campaign short URL provided by a web service. This additional step makes it possible to measure data within the mobile web browser that identifies this user before arrival at the mobile application store, and marks the first step in the marketing campaign's conversion funnel for a mobile application download conversion.

After download and successful install of the mobile application, the user has to be uniquely identified within the installed application to make sure the last step in the download conversion funnel has been reached. The fingerprint measured within the native mobile application can be compared to fingerprints previously measured within the mobile browser. If two matching identifiers are found, the user has passed from the top to the bottom end of a specific conversion funnel, and thus a successful conversion took place.

4.2 Browser Fingerprint
To measure a browser fingerprint of a device, users are first referred to a website that collects and persists subtle nuances and variations of the device that can be used as an identifier. Identifying TCP/IP information is exposed on the server backend upon an HTTP request to the campaign's short URL. The device's operating system and operating system version can be extracted from the HTTP User Agent string that is sent through HTTP headers. An additional set of information that is not available to the server backend is available via client-side JavaScript. A simple HTML view is presented to the web browser, which executes the view's embedded JavaScript. JavaScript is able to query features (i.e. screen dimensions, device pixel ratio, system language and time zone offset) directly through the window and navigator objects provided by the DOM. The collected features are asynchronously sent to a web server endpoint, which merges the two collected sets of features into a final browser fingerprint. All the collected data has to be persisted to a database to make it possible to retrieve browser fingerprints at a later point in time. The persisted browser fingerprints in the database are later used to find a match for a specific device fingerprint.

After the browser fingerprint has been measured successfully, the user is redirected to the appropriate mobile application store detail page to purchase and download the mobile application.

4.3 Device Fingerprint
After the user is referred to the mobile application store, it is assumed that the user downloads and installs the mobile application to their mobile phone. Until this point in time, there is no way to identify a successful conversion, even if the mobile application was installed through a specific campaign referral. Another identifier (i.e. device fingerprint) has to be created within the installed native application to identify the user at the last step of the download conversion funnel.

In order to create a device fingerprint and identify the device within a mobile application, the mobile application has to be opened by the user at least once. A native client library that gathers device fingerprinting data has to be included in the mobile application to be tracked. Upon start of the mobile application, identifying data can be collected through the client library by querying native functions provided by the Android or iOS software development kit. In order to compare to previously measured browser fingerprints, device fingerprints with the same set of data have to be collected from the native mobile application. The data collected within the native mobile application should resemble the data of the browser fingerprint that was collected within the client-side browser (via JavaScript) as closely as possible. The collected device fingerprint information is sent to the web service, which persists the device fingerprint to a database. In addition to the information available on the device itself, the same server-side TCP/IP information that is collected for the browser fingerprint can be collected from the request of the device. Combining server-side and client-side information forms the final device fingerprint, which includes the same set of features as previously collected browser fingerprints.

4.4 Matching Fingerprints
Until this point, browser and device fingerprints have been collected independently and have no meaningful relationship to each other. However, this relationship is the key to measuring successful conversions for a specific marketing funnel. A browser fingerprint and a device fingerprint measured on the same device truly belong together and have to be assigned to each other to represent this relationship, which corresponds to a successful install.

After a device fingerprint is successfully registered with the web service, the correct browser fingerprint that belongs to the device fingerprint has to be found. This is generally the hardest part of the whole process, as the matching algorithm has to make sure that only fingerprints that truly belong together are actually matched and assigned to each other.

All browser fingerprints that were measured prior to a device fingerprint are possible candidates for a match. However, the number of possible browser fingerprint candidates can be reduced by taking a few facts into consideration:

• It is only possible to match browser fingerprints that are associated with a marketing campaign for the same mobile application in which the device fingerprints are measured.

• It is assumed that the referral of the user to the mobile application store and the first launch of the application happen within a time frame of less than 15 minutes. This time frame seems reasonable, since the user has already performed an action that indicates a willingness to download the application. Thus, all browser fingerprints that were measured more than 15 minutes before the measurement of the device fingerprint can be excluded as well.

• Only browser fingerprints that have not already been assigned to a device fingerprint are considered for a possible match.

Considering all of the above criteria, only a subset of possible browser fingerprint candidates for a match remains. Each of the remaining browser fingerprint candidates is compared to a single device fingerprint. If a browser fingerprint matches all of the identifying properties, it is assigned to the corresponding device fingerprint and a successful conversion for the campaign's source is measured. If no match is found, it can be assumed that the user downloaded and installed the mobile application through another source.
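The three candidate criteria listed above can be sketched as a single filter. This is a minimal Python sketch under stated assumptions: the record type `BrowserFingerprint` and its field names are hypothetical, and only the 15-minute window is taken from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

MATCH_WINDOW = timedelta(minutes=15)  # time frame stated in the text


@dataclass
class BrowserFingerprint:
    """Hypothetical record; field names are assumptions for illustration."""
    id: int
    app_id: str                               # application promoted by the campaign
    measured_at: datetime
    matched_device_id: Optional[int] = None   # set once a match was assigned


def select_candidates(browser_fps: List[BrowserFingerprint],
                      app_id: str,
                      device_measured_at: datetime) -> List[BrowserFingerprint]:
    """Keep browser fingerprints that may belong to the device fingerprint."""
    earliest = device_measured_at - MATCH_WINDOW
    return [
        fp for fp in browser_fps
        if fp.app_id == app_id                                   # same application
        and fp.matched_device_id is None                         # not yet assigned
        and earliest <= fp.measured_at <= device_measured_at     # recent enough
    ]
```

In a deployed service this filter would more likely be expressed as a database query; the list comprehension only makes the three conditions explicit.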

Problems might arise if certain features of the compared fingerprints do not match, even if these two fingerprints truly belong together. This might be because browser and device fingerprints are measured in two completely different environments (i.e. mobile browser and native application) and features are not reported in the same way. The matching algorithm therefore has to account for possible errors in the measurement process and compensate for them appropriately.

4.4.1 Compare using Hashes
The first implementation of a matching algorithm was based on the assumption that browser and device fingerprints that truly belong together would share the exact same features. All identifying features of a single fingerprint are concatenated to a string and hashed using the MD5 hash function, which returns a single 16-byte value. The fingerprint hash is only computed once and stored in a database along with the corresponding browser or device fingerprint features. Once a new device fingerprint is registered and browser fingerprint candidates are selected, the hash of the newly created device fingerprint is used to look up browser fingerprints with the exact same hash value.

Whilst the implementation of this algorithm is fairly easy, problems arise when the identifying features of browser and device fingerprints do not exactly match. This might be due to one of the following reasons:

• Android screen width and height: Android browsers, especially on Android 2.3, seem not to report the actual screen size of the device, but rather a virtual viewport that changes inconsistently after some time.⁸

• Device rotation: If a device is rotated, the reported screen resolution might change.

• Change of IP: The device might switch its IP address between the measurement of browser and device fingerprint (e.g. by switching to a Wireless LAN connection).

The approach of hashed fingerprints is therefore unable to determine fingerprints that truly belong together if not all of the identifying features match exactly. A pattern matching algorithm can be used to eliminate some of these problems and account for slightly changing features in fingerprints (see Section 4.4.2).
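The hash-based comparison and its weakness can be illustrated with a short Python sketch. The feature names, their order and the `|` separator are assumptions made for this example; the paper only states that the features are concatenated and hashed with MD5.

```python
import hashlib

# Assumed feature order; both fingerprints must use the same order.
FEATURE_ORDER = (
    "operating_system", "os_version", "language",
    "screen_width", "screen_height", "device_pixel_ratio",
    "timezone_offset", "ip_address",
)


def fingerprint_hash(features: dict) -> str:
    """Concatenate all features and return the MD5 digest as a hex string."""
    joined = "|".join(str(features[name]) for name in FEATURE_ORDER)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Hypothetical measurements of the same device.
browser = {
    "operating_system": "iOS", "os_version": "6.1", "language": "de",
    "screen_width": 320, "screen_height": 480, "device_pixel_ratio": 2,
    "timezone_offset": -60, "ip_address": "203.0.113.7",
}
# Same device after rotation: width and height are reported swapped.
rotated = dict(browser, screen_width=480, screen_height=320)

print(fingerprint_hash(browser) == fingerprint_hash(dict(browser)))  # True
print(fingerprint_hash(browser) == fingerprint_hash(rotated))        # False
```

A single differing feature yields a completely different digest, so the lookup by hash value cannot express that two fingerprints are almost identical.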

4.4.2 Finding the Nearest Neighbour
In order to solve the problem of matching tuples of browser and device fingerprints whose features differ only slightly, a nearest neighbour algorithm was implemented. The input of the algorithm is a single device fingerprint and an array of possible browser fingerprint candidates. It determines the single most similar browser fingerprint to the device fingerprint and provides a percentage that expresses the confidence of the match. Because the collected fingerprinting features are of different types, different features need to be compared using different comparison methods. For example, two IP addresses are compared with a different method than two values for the system language.

The array of possible browser fingerprint candidates is iterated to compare every browser fingerprint to the provided device fingerprint feature by feature. The features of the two current fingerprints are compared one by one using specific comparison methods based on the type and characteristics of the feature as listed below. Each comparison method is given a single feature of each of the two fingerprints being compared and returns a floating point value between 0 and 1, where 1 indicates that the two features match exactly. Every return value below 1 means that the two compared features differ from each other, and 0 indicates that the values are completely different.

• Exact similarity: The two given values are checked for complete equality. 1 is returned if the two features match, 0 otherwise.

⁸ http://tripleodeon.com/2011/12/first-understand-your-screen/

Feature               Comp. method     Max. num.
--------------------------------------------------
IP address            IP
Name servers          Word array
Operating system      Word
OS version            Word
Device pixel ratio    Number           3
Language              Word
Screen width          Number           1980
Screen height         Number           1080
Timezone offset       Number           14 × 60
SYN time to live      Number           255
SYN Window size       Number           65535
SYN options           Word
SYN DF bit            Exact equality

Table 2: Fingerprinting features and the comparison methods used by the nearest neighbour algorithm to determine the similarity of values between a browser and a device fingerprint.

• Word similarity: Calculates the similarity of two given strings. The strings are split into separate parts (i.e. tokens) of two characters each. These tokens are pushed to an array which is then sorted in descending order. The two resulting arrays of tokens (i.e. a and b) are compared to find the tokens that match exactly (i.e. x).

\[
\text{similarity}_{\text{word}} = \frac{2 \times \operatorname{length}(x)}{\operatorname{length}(a) + \operatorname{length}(b)} \tag{2}
\]

• Word array similarity: Calculates the similarity of two arrays of words. Each word in the two given input arrays is compared using the word similarity method outlined above. The return value is calculated as the average of all the single return values of the word similarity method.

• IP similarity: Each octet of the two given IPv4 address strings is compared. Each IP address is split into its four octets and each of the octets in the resulting array is subtracted from 255. The difference between each octet of the first and the second IP address is determined. The sum of all differences is divided by the maximum possible difference of 4 × 255 and subtracted from 1 to obtain the similarity of the two given IP addresses.

• Number similarity: Calculates the similarity of two given numbers. The two numbers are each subtracted from the largest expected number (e.g. 255 for TTL). The difference of these two resulting numbers is divided by the largest expected number and subtracted from 1 to obtain the similarity of the two given numbers.
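The comparison methods listed above can be sketched in Python as follows. Several details are assumptions, because the text leaves them open: tokens are taken as overlapping two-character substrings, word arrays are sorted and compared position by position (missing words score 0), differences are taken as absolute values, and the number similarity is clamped at 0.

```python
from collections import Counter
from itertools import zip_longest
from typing import Sequence


def exact_similarity(a, b) -> float:
    return 1.0 if a == b else 0.0


def _tokens(text: str) -> Counter:
    # Assumption: overlapping two-character tokens (character bigrams).
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def word_similarity(a: str, b: str) -> float:
    """Formula 2: 2 * |matching tokens| / (|tokens of a| + |tokens of b|)."""
    tokens_a, tokens_b = _tokens(a), _tokens(b)
    total = sum(tokens_a.values()) + sum(tokens_b.values())
    if total == 0:  # both strings are shorter than two characters
        return exact_similarity(a, b)
    shared = sum((tokens_a & tokens_b).values())  # multiset intersection
    return 2.0 * shared / total


def word_array_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    # Assumption: words are paired after sorting; unpaired words score 0.
    pairs = list(zip_longest(sorted(a), sorted(b), fillvalue=""))
    if not pairs:
        return 1.0
    return sum(word_similarity(x, y) for x, y in pairs) / len(pairs)


def ip_similarity(a: str, b: str) -> float:
    octets_a = [int(part) for part in a.split(".")]
    octets_b = [int(part) for part in b.split(".")]
    if len(octets_a) != 4 or len(octets_b) != 4:
        raise ValueError("expected dotted-quad IPv4 addresses")
    # |(255 - x) - (255 - y)| equals |x - y|, so the octets are compared directly.
    difference = sum(abs(x - y) for x, y in zip(octets_a, octets_b))
    return 1.0 - difference / (4 * 255)


def number_similarity(a: float, b: float, largest: float) -> float:
    # Clamped at 0 in case a value exceeds the largest expected number.
    return max(0.0, 1.0 - abs(a - b) / largest)


print(word_similarity("night", "nacht"))                 # 0.25
print(ip_similarity("192.168.0.10", "192.168.0.10"))     # 1.0
```

Under these assumptions, "night" and "nacht" share one of their four tokens each ("ht"), which gives 2 × 1 / (4 + 4) = 0.25.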

Table 2 shows a list of features and the comparison methods used to compare two features of different fingerprints with each other. The average of the resulting values from all the comparison methods is calculated to obtain the total similarity of a browser fingerprint compared to the single given device fingerprint. This process is applied to all the browser fingerprints in the array of possible browser fingerprint candidates. The browser fingerprint with the highest similarity score (ideally 1 for a 100% match) is returned by the nearest neighbour algorithm. If a browser fingerprint is found and has a similarity score above a certain threshold, it is assumed to be a match to the device fingerprint and counted as a successful conversion.
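The overall selection step can be sketched as follows: average the per-feature scores for every candidate and keep the best one if it exceeds a threshold. The threshold value of 0.9, the dictionary representation of fingerprints and the two example comparators are assumptions for illustration; the paper does not state the threshold it used.

```python
from typing import Callable, Dict, List, Optional, Tuple

Comparator = Callable[[object, object], float]
Fingerprint = Dict[str, object]


def total_similarity(device: Fingerprint, browser: Fingerprint,
                     comparators: Dict[str, Comparator]) -> float:
    """Average of the per-feature similarity scores (each between 0 and 1)."""
    scores = [compare(device[name], browser[name])
              for name, compare in comparators.items()]
    return sum(scores) / len(scores)


def nearest_neighbour(device: Fingerprint, candidates: List[Fingerprint],
                      comparators: Dict[str, Comparator],
                      threshold: float = 0.9  # hypothetical value
                      ) -> Tuple[Optional[Fingerprint], float]:
    """Return the most similar candidate and its score, or (None, score)."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = total_similarity(device, candidate, comparators)
        if best is None or score > best_score:
            best, best_score = candidate, score
    if best is None or best_score < threshold:
        return None, best_score
    return best, best_score


# Two example comparators; a full set would cover every row of Table 2.
comparators = {
    "language": lambda a, b: 1.0 if a == b else 0.0,
    "screen_width": lambda a, b: 1.0 - abs(a - b) / 1980,
}
device = {"language": "de", "screen_width": 320}
candidates = [
    {"id": 1, "language": "en", "screen_width": 320},
    {"id": 2, "language": "de", "screen_width": 320},
]
match, score = nearest_neighbour(device, candidates, comparators)
print(match["id"], score)  # 2 1.0
```

Because every feature contributes equally to the average, a single strongly deviating feature (for example a changed IP address) lowers the score without ruling out the match, which is the behaviour the hash comparison could not provide.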

5. PROTOTYPE IMPLEMENTATION
Based on the proposed concepts for browser and device fingerprinting as well as a nearest neighbour algorithm to match fingerprints, a web service is set up to measure mobile application download conversions.

5.1 Server Applications
To measure, compare and match devices, browser and device fingerprints have to be persisted to a database. Two separate database tables for browser and device fingerprints are used to save the collected fingerprinting data. The two database tables mostly have the same set of columns, which represent the different sets of fingerprinting features that are measured (see Section 4). In order to assign a device fingerprint to a previously measured browser fingerprint, a third database table operates as a join table to assign tuples of browser and device fingerprints. Every entry in the join table is also assigned to a specific campaign, to keep track of the context of the campaign the fingerprints were measured in.
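The three-table layout described above could look as follows. This is a minimal sketch using SQLite from Python; the database engine, all table and column names, and the uniqueness constraints are assumptions, since the paper does not specify its schema.

```python
import sqlite3

# Columns shared by both fingerprint tables (hypothetical names).
FEATURE_COLUMNS = """
    ip_address TEXT, name_servers TEXT,
    syn_ttl INTEGER, syn_options TEXT, syn_window_size INTEGER, syn_df_bit INTEGER,
    operating_system TEXT, os_version TEXT, system_language TEXT,
    screen_width INTEGER, screen_height INTEGER,
    device_pixel_ratio REAL, timezone_offset INTEGER
"""

SCHEMA = f"""
CREATE TABLE browser_fingerprints (
    id INTEGER PRIMARY KEY,
    measured_at TEXT NOT NULL,
    user_agent TEXT,            -- only available for browser requests
    {FEATURE_COLUMNS}
);
CREATE TABLE device_fingerprints (
    id INTEGER PRIMARY KEY,
    measured_at TEXT NOT NULL,
    {FEATURE_COLUMNS}
);
-- Join table: one row per matched pair, tagged with its campaign.
CREATE TABLE fingerprint_matches (
    browser_fingerprint_id INTEGER NOT NULL UNIQUE
        REFERENCES browser_fingerprints(id),
    device_fingerprint_id INTEGER NOT NULL UNIQUE
        REFERENCES device_fingerprints(id),
    campaign_id INTEGER NOT NULL
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the three tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The UNIQUE constraints in the join table encode the rule from Section 4.4 that a browser fingerprint which is already assigned to a device fingerprint is not matched again.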

As shown in Fig. 4, browser fingerprints are created from data available on the server backend as well as client-side data, which is collected from the browser via JavaScript. Upon arrival of the user at the campaign's short URL, the IP address and incoming port are recorded on the server backend. An array of name servers is directly queried by the IP address. The iptables⁹ firewall is configured to write incoming SYN packets to a log file. The previously gathered IP address and incoming port are used to find the corresponding line for the current request in the iptables logs, which is then parsed using regular expressions to obtain the SYN fingerprinting features. For browser fingerprints, the HTTP User Agent is parsed as well.
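The log lookup described above can be sketched with regular expressions in Python. The sample line follows the general shape of the iptables LOG target output with TCP options enabled, but it is an illustrative assumption; the exact format depends on the kernel version and the logging rule, and the result field names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Illustrative log line, not taken from the paper.
SAMPLE = (
    "Dec  2 10:15:01 host kernel: IN=eth0 OUT= "
    "MAC=00:11:22:33:44:55:66:77:88:99:aa:bb:08:00 "
    "SRC=203.0.113.7 DST=198.51.100.1 LEN=60 TOS=0x00 PREC=0x00 TTL=52 "
    "ID=4711 DF PROTO=TCP SPT=51234 DPT=80 WINDOW=65535 RES=0x00 SYN "
    "URGP=0 OPT (020405B40402080A0000000000000000010303)"
)

FIELD = re.compile(r"\b([A-Z]+)=(\S*)")           # KEY=value pairs
OPTIONS = re.compile(r"\bOPT \(([0-9A-Fa-f]*)\)")  # raw TCP options as hex


def parse_syn_log_line(line: str) -> Optional[dict]:
    """Extract the SYN features from one log line, or None if not a TCP SYN."""
    fields = dict(FIELD.findall(line))
    flags = line.split()
    required = ("SRC", "SPT", "TTL", "WINDOW")
    if (fields.get("PROTO") != "TCP" or "SYN" not in flags
            or any(key not in fields for key in required)):
        return None
    options = OPTIONS.search(line)
    return {
        "src_ip": fields["SRC"],
        "src_port": int(fields["SPT"]),
        "ttl": int(fields["TTL"]),
        "window_size": int(fields["WINDOW"]),
        "df_bit": "DF" in flags,   # don't-fragment flag appears as a bare token
        "options": options.group(1) if options else "",
    }


def find_syn_for_request(lines: Iterable[str], src_ip: str,
                         src_port: int) -> Optional[dict]:
    """Find the SYN packet that belongs to a request by source IP and port."""
    for line in lines:
        parsed = parse_syn_log_line(line)
        if parsed and parsed["src_ip"] == src_ip and parsed["src_port"] == src_port:
            return parsed
    return None


print(find_syn_for_request([SAMPLE], "203.0.113.7", 51234))
```

Matching on both source IP and source port is what ties a log entry to one specific HTTP request, since several devices may share an IP address behind the same gateway.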

5.1.1 JavaScript Fingerprint
After the server-side fingerprinting measurements have been performed and the results saved to the database, the client-side browser fingerprinting features are gathered inside the mobile web browser using JavaScript. In order to perform these measurements, a visually empty HTML view is loaded. The collected client-side fingerprinting features then have to be assigned to the browser fingerprint that was previously created during collection of server-side data.

A unique identifier to the previously measured (server-side) browser fingerprint in the database is passed to theJavaScript implementation in order to assign client-side data.A request to the web server is issued to send the collectedclient-side data to a web service endpoint for further pro-cessing as shown in the last part of Fig. 4. After the requestwas sent, the user is immediately redirected to the appropri-ate mobile application store by setting the browser’s win-

dow.location to the URL of the correct mobile applicationstore.

9 http://www.netfilter.org/projects/iptables/index.html

Figure 4: Architecture of the proposed process of measuring client-side browser fingerprint features and updating a previously saved browser fingerprint.

The web service receives the request from the browser, which contains the client-side browser fingerprinting information in JSON format. A previously measured (server-side) browser fingerprint with the unique identifier sent in the request URL is looked up in the database and updated with the client-side data that was received.
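The update step can be illustrated with a minimal sketch in which a plain dictionary stands in for the database. The feature keys and the function name are hypothetical, as the paper does not list the JSON field names used by its endpoint.

```python
import json

# Hypothetical names for the client-side features sent by the browser.
CLIENT_SIDE_FEATURES = {
    "screen_width", "screen_height", "device_pixel_ratio",
    "time_zone_offset", "language",
}

def update_browser_fingerprint(store, fingerprint_id, request_body):
    """Merge client-side JSON data into a stored server-side fingerprint."""
    if fingerprint_id not in store:
        raise KeyError(f"unknown browser fingerprint: {fingerprint_id}")
    client_data = json.loads(request_body)
    record = store[fingerprint_id]
    for key, value in client_data.items():
        if key in CLIENT_SIDE_FEATURES:  # ignore keys that are not expected
            record[key] = value
    return record
```

Restricting the merge to a known set of keys keeps a client from overwriting server-side measurements such as the IP address.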

5.2 Mobile Client Libraries

The iOS and Android mobile client libraries are the counterpart to the remote web service and are integrated into the mobile application that is to be tracked. Device fingerprints are created from data available on the server backend as well as natively available features of the mobile device. In contrast to the collection of browser fingerprints, the collection of device fingerprints cannot make use of data from the HTTP User Agent or of data collected via JavaScript. The native collection of fingerprinting features uses a variety of information that is available through the mobile operating system SDK on the mobile device. The natively collected features for the device fingerprint try to mirror the browser fingerprinting features collected in the mobile browser using JavaScript and the features queried from the HTTP User Agent.

As shown in Fig. 5, after successful collection of the available fingerprinting features on the device, the features are sent to the remote web service for persistence. Further fingerprinting data, which is available from the connection to the server (see Section 5.1), is collected and merged with the natively available data to form a single device fingerprint.

Figure 5: Architecture of the proposed process to measure client-side device fingerprinting features and send them to a remote web service for further collection of data and persistence.

6. EVALUATION

Based on the implementation of the web service to measure download conversion rates, an experiment was set up to evaluate the process of collecting browser and device fingerprints as well as testing the proposed algorithms for their accuracy in a real-world environment.

6.1 Test Scenario

Two separate experiments have been conducted. First, an experiment was carried out under controlled conditions to determine the performance of the proposed matching algorithm. Second, a field test, in which fingerprints from real users were collected, has been carried out in order to measure the entropy of the identified features on the one hand and to identify fault-prone features on the other hand.

6.1.1 Lab Experiment

A set of 5 browser and 5 device fingerprints was collected in order to test the accuracy of the proposed nearest neighbour algorithm that measures successful installs. Two Sony Xperia U Android devices, an Apple iPhone 4S, an Apple iPhone 5 and an iPad 2 were used to collect these fingerprints. One single browser and one device fingerprint were measured for each of the test devices. Notes about the affiliation of the test device and the measured browser and device fingerprint were kept. This set of data collected in a controlled environment makes it possible to verify if a browser or device fingerprint belongs to a specific test device or not. For every device fingerprint, the nearest neighbour algorithm is performed on the whole set of browser fingerprints, resulting in a total of 5 × 5 comparisons.

      B1      B2      B3      B4      B5
D1    0.9558  0.7788  0.5273  0.6412  0.4709
D2    0.8557  0.8809  0.5874  0.7241  0.5424
D3    0.4950  0.6320  1       0.7954  0.9032
D4    0.6185  0.6281  0.7954  0.9901  0.7397
D5    0.5031  0.6516  0.9032  0.7397  1

B1, D1: Sony Xperia U #1
B2, D2: Sony Xperia U #2
B3, D3: Apple iPhone 4S
B4, D4: Apple iPhone 5
B5, D5: Apple iPad 2

Table 3: Results of 25 nearest neighbour algorithm comparisons.

6.1.2 Field Study

In order to evaluate the quality of the 13 selected features, a bigger set of data needs to be analysed. A simple marketing website was set up which encouraged visitors to click on a button to download a mobile application. On click of the download button, a browser fingerprint was created and visitors were redirected to the appropriate mobile application store to download the mobile application. An iOS and an Android application were submitted to the Apple AppStore and Google PlayStore, which each included native client libraries to measure device fingerprints and send them back to the web service.

The collected browser fingerprints from this study are used to pollute the set of controlled browser fingerprints in order to see if the nearest neighbour algorithm is able to find the correct browser fingerprint to a given controlled device fingerprint, even in a larger set of fingerprints. Furthermore, the collected data is used to analyse differences in browser and device measurements and find the most identifying fingerprinting features. In total, 63 browser and 671 device fingerprints from different mobile devices were collected. 39 browser fingerprints were measured from Android devices, 24 from visitors on iOS devices. The majority of 638 device fingerprints was measured by the iOS client library, compared to 33 device fingerprints measured within the mobile Android application.

6.2 Accuracy of Nearest Neighbour

6.2.1 Match controlled set of fingerprints

Within the controlled set of 5 browser and 5 device fingerprints, all device fingerprints could be correctly matched to their corresponding browser fingerprint as shown in Tab. 3. The proposed nearest neighbour algorithm is 100% accurate in this small set of fingerprints, without a single false positive match. The similarity score between comparisons of fingerprints from the two Sony Xperia U test devices is higher than average, which is due to the fact that they share the same operating system and vendor. Nevertheless, the fingerprint tuple that truly belongs together could still be found successfully. In a similar manner, comparisons of features of the iPhone 4S and iPad 2 test devices returned a high similarity score around 0.9, but the comparison of the correct fingerprint tuples resulted in an even higher similarity score over 0.99. This shows that even between similar devices (i.e. same vendor, operating system or same device model), the nearest neighbour algorithm is able to correctly determine the fingerprint tuple that truly belongs together.
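The selection step on the controlled set can be reproduced from the published scores alone. The following sketch takes the similarity values from Tab. 3 as given and, for every device fingerprint, picks the browser fingerprint with the highest score. It does not reimplement the similarity computation itself, and the function and variable names are illustrative only.

```python
# Similarity scores from Tab. 3: one row per device fingerprint (D1..D5),
# one column per browser fingerprint (B1..B5).
BROWSERS = ["B1", "B2", "B3", "B4", "B5"]
SIMILARITY = {
    "D1": [0.9558, 0.7788, 0.5273, 0.6412, 0.4709],
    "D2": [0.8557, 0.8809, 0.5874, 0.7241, 0.5424],
    "D3": [0.4950, 0.6320, 1.0, 0.7954, 0.9032],
    "D4": [0.6185, 0.6281, 0.7954, 0.9901, 0.7397],
    "D5": [0.5031, 0.6516, 0.9032, 0.7397, 1.0],
}

def best_matches(similarity, browsers):
    """Map each device fingerprint to its most similar browser fingerprint."""
    result = {}
    for device, scores in similarity.items():
        best_index = max(range(len(scores)), key=scores.__getitem__)
        result[device] = (browsers[best_index], scores[best_index])
    return result
```

The closest call in this table is D2, where the correct browser fingerprint B2 (0.8809) beats B1 of the identical second Xperia U (0.8557) by a margin of only about 0.025.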

6.2.2 Controlled and real-world fingerprints

It is expected that the number of false positive matches returned from the nearest neighbour algorithm increases exponentially with the size of the set of possible browser fingerprints for a single device fingerprint. 4 of the 5 (80%) controlled device fingerprints could be correctly matched even within a larger set of 68 browser fingerprints (5 controlled and 63 real-world), with only one false positive match.

6.2.3 Match real-world set of fingerprints

In contrast to the experiments on the set of controlled fingerprints, the accuracy of the nearest neighbour algorithm within the set of real-world fingerprints cannot be exactly determined. Because no notes about the affiliation of browser and device fingerprints were taken, there is no way to make sure that a successfully matching fingerprint tuple truly belongs together. It is shown that device fingerprints are very uniform, as over 61% (411 of 671) found a result with an average similarity score > 0.8. In comparison, the number of successful matches with a similarity score over 0.85 decreases from 411 to 243 (−59%). Even 32% fewer fingerprints could be matched with a similarity score > 0.9. However, all of these results match a higher number of browser fingerprints than the actual number of 63 browser fingerprints that were collected, which indicates that the matching algorithm did not return correct results in this case.

6.3 Time makes a difference

When matching all real-world fingerprints, the average time between the creation dates of matching browser and device fingerprints is over 7 days. Although there are no studies about the time between a referral to the mobile application store and an actual download, it is very unlikely that someone installed the mobile application over a week after being referred to the store.

Within the set of real-world fingerprints, limiting the set of possible browser fingerprint candidates to a 15-minute time span between the creation date of the browser and the device fingerprint has the positive effect of a 0.03 increase of the average similarity score. In addition, the average time between a referral to the mobile application store and the first open of the downloaded application decreases to 3 minutes and 13 seconds. Just over 3 minutes to download, install and open the mobile application is much more reasonable, and the average similarity score also rose, which leads to the assumption that a time restriction is beneficial in order to increase the accuracy of the overall result.
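Such a time restriction can be applied as a filter before the nearest neighbour comparison. The sketch below assumes that each fingerprint is a dictionary with a created_at timestamp and that the browser fingerprint must have been created before the device fingerprint. Both are assumptions, as is the function name.

```python
from datetime import datetime, timedelta

def candidates_within_window(browser_fingerprints, device_created_at,
                             window=timedelta(minutes=15)):
    """Keep browser fingerprints created at most `window` before the device fingerprint."""
    return [
        fingerprint
        for fingerprint in browser_fingerprints
        if timedelta(0) <= device_created_at - fingerprint["created_at"] <= window
    ]
```

Only the candidates that pass this filter would then be handed to the similarity comparison, which shrinks the candidate set and with it the room for false positive matches.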

Only 4% of all device fingerprints (versus 54% without limiting browser fingerprint candidates by time) could be matched to a browser fingerprint with a similarity score over 0.8, whilst the number of matches over 0.85 decreases from 32 to 29 (−10%). 17% fewer fingerprints could be matched with a similarity score > 0.9. All results are below the total number of 63 browser fingerprints collected in this study, which is a plausible result.

The initially low matching rate of 4% is due to the fact that both of the mobile applications were publicly available on the mobile application stores, thus a lot of people downloaded the mobile applications without knowing about the experiment. Therefore the majority of device fingerprints does not belong to one of the measured browser fingerprints and the maximum number of matches is 63 or 9% of 671 collected device fingerprints. This indicates that much more accurate results can be achieved by limiting the number of possible browser fingerprinting candidates by their creation time.

6.4 Differences between browser and device

Browser and device fingerprints are measured in the mobile browser and within native mobile applications, respectively. Measurements of certain features return different values in these two environments. All iOS devices with a similarity score > 0.85 could be exactly matched. This shows that measurement of browser fingerprints in the mobile browser is as accurate as measuring fingerprinting features within a native iOS application.

On Android devices, 56% had mismatched screen width and 65% mismatched screen height features, due to the fact that they were not correctly measured within the mobile browser. The measured JavaScript screen dimension features did not return correct results. Instead of the screen resolution, the current size of the website view (actual height minus tool bars, etc.) was returned. 17% of Android devices did not exactly match the SYN options or language features, which is most likely because of some false positive matches within the nearest neighbour algorithm. This also comes close to the 20% false positive rate of matches outlined in Section 6.2.2.

For 5 of the 13 measured fingerprinting features (38%) a mismatch could be counted. None of the returned fingerprinting tuples had mismatching features for IP address, name servers, time zone offset, operating system, operating system version, window size, SYN time to live and SYN don’t fragment bit. This leads to the conclusion that these features are measured very accurately in both of the tested environments and are valuable parts of the final fingerprints.
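Per-feature mismatch rates of this kind can be obtained by comparing the two fingerprints of every matched tuple feature by feature. The following is a minimal sketch under the assumption that fingerprints are dictionaries keyed by feature name. The names are hypothetical and the data in the paper's evaluation is not reproduced here.

```python
from collections import Counter

def mismatch_rates(matched_pairs, features):
    """Return, per feature, the share of matched tuples whose values differ."""
    mismatches = Counter()
    for browser_fp, device_fp in matched_pairs:
        for feature in features:
            if browser_fp.get(feature) != device_fp.get(feature):
                mismatches[feature] += 1
    total = len(matched_pairs)
    return {
        feature: (mismatches[feature] / total if total else 0.0)
        for feature in features
    }
```

A feature with a rate of 0.0 over a large number of tuples is measured consistently in both environments, whereas a high rate points at a measurement problem such as the Android screen dimensions discussed above.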

6.5 Most identifying fingerprinting features

Based on a set of fingerprints, it can be determined which fingerprinting features have the most unique values and therefore most identifying characteristics, thus giving an indication on the necessity to measure and incorporate a specific feature into the final fingerprint and to prove that the chosen features are actually useful to form unique device identifiers. The measure of entropy was chosen to be the measure of uniqueness amongst all measured fingerprinting features.

6.5.1 Entropy Measurement

The entropy of a set of features is the level of disorder within this given set of features. The entropy therefore gives a good indication on the overall uniqueness of the features within a given set. Minimal entropy is measured if all features in the given set have the exact same value. Maximal entropy is measured if every single feature in the set has a different value. Based on the formula shown in Fig. 6, the entropy of every single fingerprinting feature can be calculated in bits.

\[
\mathrm{Entropy}(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \tag{3}
\]

Figure 6: Formula to calculate the entropy of a set of values (X) in bits. [12]

p(x_i) refers to the probability of the distinct occurrence of a single feature value x_i in the given set of features. For example, the probability of a value that occurs twice in a set of 10 features is 0.2. In order to compare different entropy calculations for different sizes of datasets, the results for each dataset have to be normalised. All resulting entropy calculations of a single dataset (e.g. browser fingerprints on iOS) are divided by the maximum entropy found in this set, thus returning a value in the range of 0 to 1, where 1 refers to the highest entropy in this dataset.
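The entropy calculation and the normalisation step can be sketched as follows. The function names and the dictionary-of-lists input format are assumptions made for illustration, while the computation follows Equation (3) and the normalisation described above.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Entropy of a list of feature values in bits, as in Equation (3)."""
    total = len(values)
    counts = Counter(values)
    # p * log2(1 / p) equals -p * log2(p) and avoids a negative zero result.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def normalised_entropies(dataset):
    """Divide every feature's entropy by the largest entropy in the dataset."""
    raw = {feature: entropy_bits(values) for feature, values in dataset.items()}
    maximum = max(raw.values())
    if maximum == 0:
        return {feature: 0.0 for feature in raw}
    return {feature: value / maximum for feature, value in raw.items()}
```

For example, four fingerprints with four different IP addresses yield an entropy of 2 bits, four identical screen widths yield 0 bits, and after normalisation these two features receive the values 1 and 0.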

6.5.2 Results

As shown in Fig. 7, the IP address, SYN options and name servers are the most identifying fingerprinting features for browser and device fingerprints measured for both of the mobile operating systems in this study. In fact, 57 of 63 (90%) browser fingerprints and 664 of 671 (98%) device fingerprints in the set of real-world fingerprints have a unique IP address. This matches the result of the study by Flood et al. [4], which also identified the IP address as the most identifying fingerprinting feature. However, the data collected in this study shows that this does not only apply to desktop computers, but to mobile devices as well.

The entropy of the screen width and screen height is more than twice as high for fingerprints measured on Android devices compared to iOS devices, due to the fact that there are many different Android phones from different vendors available, all featuring different screen sizes. Screen sizes for iOS devices are shown to be much more uniform, which also corresponds to the fact that there are only 3 different screen sizes available for Apple mobile devices (i.e. iPhone ≤ 4, iPhone 5, iPad). The measured screen width of all iOS device fingerprints has an entropy of 0 bits, which is due to the fact that the iOS application was only available for the iPhone, which has a screen width of 320px over the range of all available iPhone device models to date. The entropy of the screen width for iOS browser fingerprints is higher, as iPad devices are also contained in the analysed dataset. iOS device fingerprint screen height is a bit less uniform as iPhone 5 devices feature a different screen height than previous models.

Figure 7: Comparison of the normalised entropy of browser and device fingerprints separated by their operating system.

The device pixel ratio also seems to be less uniform on Android compared to iOS devices, which is due to the fact that Android devices in the real-world dataset feature device pixel ratios of 0.75, 1, 1.5 and 2, whilst iOS devices only have device pixel ratios of 1 and 2. The operating system version is a more identifying fingerprinting feature for Android devices, whilst measured iOS operating system versions are much more uniform. This also corresponds to the conclusion by Flood et al. [4] that mobile devices are updated more quickly than desktop computers. The operating system itself has the minimum entropy, as the graph’s data is separated by this feature. In a nutshell, it can be said that iOS devices are generally much more uniform compared to Android devices, which makes it harder to uniquely distinguish them from each other.

7. CONCLUSIONS

In this paper we presented an approach for download conversion rate measurement based on device fingerprinting. To this end, we identified 21 features of which 13 could be measured both within mobile browsers (i.e. with JavaScript) and in native applications. For fingerprint matching, we combined a nearest neighbour search algorithm with a rule-based approach in a preliminary step, as some constraints could be identified during the evaluation (e.g. time constraint). As a consequence of our evaluation, we also found out that not all of the 13 selected features can be considered suitable. On the one hand we showed that there is too little entropy for some features (e.g. all iPhone 3G devices have the same screen resolution) to distinguish between users, and on the other hand we found out that some features cannot be measured reliably (e.g. Android browsers report wrong screen sizes).

Although the proposed system to measure mobile download conversions did report plausible results, more research has to be done in order to improve the nearest neighbour matching algorithm. It is necessary to further test the algorithm with a larger set of controlled fingerprints in order to see how it performs and to make improvements in terms of its accuracy and robustness against possible false positive results. An evaluation of more sophisticated matching algorithms will also be performed in the future to find out if the accuracy can be increased.

However, the proposed system can help to compare different marketing campaigns and determine the overall benefit in terms of the number of downloads versus the amount of money spent. Although the number of measured downloads is only an estimate of the actual number of mobile application downloads, it is accurate enough to give a better understanding of the success or failure of future mobile application marketing campaigns.

8. ACKNOWLEDGEMENTS

The research presented is conducted within the project “AUToMAte – Automated Usability Testing of Mobile Applications” funded by the Austrian Research Promotion Agency (FFG) under contract number 839094.

9. REFERENCES

[1] M. Ayenson, D. J. Wambach, A. Soltani, N. Good, and C. J. Hoofnagle. Flash cookies and privacy ii: Now with html5 and etag respawning. Social Science Research Network Working, pages 2–7, 2011.

[2] R. Beverly. A robust classifier for passive tcp/ip fingerprinting. In Passive and Active Network Measurement. Springer Berlin / Heidelberg, 2004.

[3] P. Eckersley. How unique is your web browser? Technical report, Electronic Frontier Foundation, 2009.

[4] E. Flood and J. Karlsson. Browser fingerprinting. Master’s thesis, Chalmers University of Technology, University of Gothenburg, Sweden, 2012.

[5] G. Gulyas, R. Schulcz, and S. Imre. Comprehensive analysis of web privacy and anonymous web browsers: Are next generation services based on collaborative filtering?

[6] A. Juels, M. Jakobsson, and T. N. Jagatic. Cache cookies for browser authentication. In IEEE Symposium on Privacy and Security, pages 301–305. IEEE Computer Society, 2006.

[7] C. Juon, D. Greiling, and C. Buerkle. Internet Marketing Start to Finish. Que Biz-Tech Series. Que, 2011.

[8] B. Krishnamurthy and C. E. Wills. Generating a privacy footprint on the internet. In Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, IMC ’06, pages 65–70. ACM, 2006.

[9] J. Michie. Street Smart Internet Marketing: Tips, Tools, Tactics & Techniques to Market Your Product, Service, Business Or Ideas Online. Performance Marketing, 2006.

[10] B. Mordkovich and E. Mordkovich. Pay-Per-Click Search Engine Marketing Handbook: Low Cost Strategies to Attracting New Customers Using Google, Yahoo & Other Search Engines. MordComm, 2005.

[11] K. Mowery, D. Bogenreif, S. Yilek, and H. Shacham. Fingerprinting information in javascript implementations. In H. Wang, editor, Proceedings of W2SP 2011. IEEE Computer Society, 2011.

[12] V. Singh. Entropy Theory and its Application in Environmental and Water Engineering. Wiley, 2013.

[13] A. Soltani, S. Canty, Q. Mayo, L. Thomas, and C. J. Hoofnagle. Flash cookies and privacy. In AAAI Spring Symposium: Intelligent Information Privacy Management. AAAI, 2010.
