An empirical study of third party APK’s URL using scriptable API and fast identifier-specific filter Ruo Ando, National Institute of Informatics, Japan Yuuki Takano, Shinsuke Miwa, National Institute of Information and Communications Technology, Japan ICCSN 2017: 2017 9th IEEE International Conference on Communication Software and Networks Guangzhou University city Guangdong University of Technology May 6-7


Page 1: Iccsn2017 slideshare

An empirical study of third party APK’s URL using scriptable API and fast identifier-specific filter

Ruo Ando, National Institute of Informatics, Japan

Yuuki Takano, Shinsuke Miwa, National Institute of Information and Communications Technology, Japan

ICCSN 2017: 2017 9th IEEE International Conference on Communication Software and Networks, Guangzhou University City, Guangdong University of Technology, May 6-7

Page 2: Iccsn2017 slideshare

Abstract: URLs of Android third party’s APK files

• With the rising popularity of Android applications, third-party APK markets have become an attractive target for attackers. In this paper, we present a framework that inspects the URL strings to which third-party APKs connect, using a headless browser and a fast URL filter. In our system, APK files are collected with navigation scripting in JavaScript, which enables more interactive web page crawling so that results can be fetched after dynamic page loading.

• In addition, FARIS (fast uniform resource identifier-specific filter) is applied to match the URL strings found in APKs against the blacklists of AdBlock Plus, one of the most popular ad blockers.

http://www.cmcm.com/blog/en/security/2016-01-20/925.html

Android App Stores Become Significant Sources for Malware

Page 3: Iccsn2017 slideshare

System Overview: mining destination URL in APK’s

Diagram components: CasperJS, PhantomJS, AdBlock Plus, FARIS VM

Enhanced APK crawler

Enhanced string matching

Collected 800,000 APK files and extracted 12,000 URLs

Chart: Ranking of destination (top 40); axis from 0 to 250,000 in steps of 50,000

AdBlock's syntax                    Regular expression
*                                   /.*/
| at the beginning of the line      /^/
| at the end of the line            /$/
|| at the beginning of the line     /[\w\-]+\/+/

FARIS is a bytecode interpreter for regular expressions, but for simplicity, it provides only four instructions.

Page 4: Iccsn2017 slideshare

Overview of D2 (Droid Dowser)

Diagram labels: Perl; XPath templates; loop generation; lightweight DSL for each distribution site; CasperJS; API invocation; PhantomJS; Qt metacall invocation; Qt WebKit; AWS CloudFormation; stack templates; deploy; crawler deployment for parallel retrieval

Page 5: Iccsn2017 slideshare

PhantomJS - sendEvent

Diagram labels: event loop; Qt metacall; sendEvent

void WebPage::qt_static_metacall(QObject *_o, QMetaObject::Call _c, int _id, void **_a)
{
    switch (_id) {
    case 0: _t->initialized(); break;
    case 31: _t->sendEvent((*reinterpret_cast< const QString(*)>(_a[1])),
                           (*reinterpret_cast< const QVariant(*)>(_a[2])),
                           (*reinterpret_cast< const QVariant(*)>(_a[3])),
                           (*reinterpret_cast< const QString(*)>(_a[4])),
                           (*reinterpret_cast< const QVariant(*)>(_a[5])));
        break;
    // (excerpt; the remaining cases are not shown on the slide)
    }
}

* - eventType: "keypress", "keyup" or "keydown" (default: "keypress")

#4 0x000000000041b603 in WebPage::sendEvent (this=0x2cd5370, type=..., arg1=..., arg2=..., mouseButton=..., modifierArg=...) at webpage.cpp:1449
#5 0x000000000041b7a2 in WebPage::sendEvent (this=0x2cd5370, type=..., arg1=..., arg2=..., mouseButton=..., modifierArg=...) at webpage.cpp:1465
#6 0x0000000000467c4f in WebPage::qt_static_metacall (_o=0x2cd5370, _c=QMetaObject::InvokeMetaMethod, _id=33, _a=0x7fffffffd9f0) at moc_webpage.cpp:265
#7 0x00000000004687d6 in WebPage::qt_metacall (this=0x2cd5370, _c=QMetaObject::InvokeMetaMethod, _id=33, _a=0x7fffffffd9f0) at moc_webpage.cpp:361
#8 0x0000000000543b9f in JSC::Bindings::QtRuntimeMetaMethod::call(JSC::ExecState*) ()

https://software.intel.com/zh-cn/forums/topic/289577

Page 6: Iccsn2017 slideshare

CasperJS – navigation scripting without callbacks

Diagram labels: CasperJS; start(), then(), run(); evaluate(); execute function; callbacks; Qt metacalls; PhantomJS; query selector; DOM elements; response (async); send event; passing function; return native type

Page 7: Iccsn2017 slideshare

Headless Browser with Scriptable JavaScript API

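var x = require('casper').selectXPath;

casper.options.viewportSize = {width: 1300, height: 700};

casper.test.begin('test', 1, function(test) {
    // Open the distribution site.
    casper.start('http://www.freewarelovers.com/android', function() { });

    // waitFor polls this check, which clicks the link located by the XPath expression.
    casper.waitFor(function check() {
        return this.click(x('//*[@id="fieldset"]/table/tbody/tr[2]/td[1]/p[1]/a[1]')) != 0;
    }, function then() {
        // The body of this step is not shown on the slide.
    });

    casper.run(function() { test.done(); });
});

// Second fragment on the slide. ARGV[1] is not defined in CasperJS;
// presumably it is a placeholder filled in by the generator script.
casper.start(ARGV[1], function() {
    this.capture('google.png');
});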

Page 8: Iccsn2017 slideshare

Perl: XPath templates, loop generation and timeout

Generating JavaScript

Sample of the generated output:

this.click(x("//*[\@id=\"fieldset\"]/table/tbody/tr[2]/td[1]/p[1]/a[1]")),215
this.click(x("//*[\@id=\"fieldset\"]/table/tbody/tr[2]/td[1]/p[1]/a[2]")),255

Loop generation from an XPath template:

# Emit one casper.waitFor() step per row of the download table.
for ($counter = 1; $counter < $item; $counter++) {
    print "casper.waitFor(function check() { \n";
    print "return this.click(x(";
    print "\"";
    print "//*[\@id=";
    print "\\";
    print "\"fieldset";
    print "\\";
    print "\"]";
    print "/table/tbody/tr[1]/td[3]/table/tbody/tr[";
    print $counter . "]/td/p/b/a\")) !=0; \n";
    print "},";
    print "function then() { \n";
    # (the rest of the loop body is not shown on the slide)

Running each generated script with a timeout:

for ($counter = 1; $counter < $item; $counter++) {
    $TIMEOUT = 10;
    eval {
        # Abort the eval block when the alarm fires.
        local $SIG{ALRM} = sub { die };
        alarm($TIMEOUT);
        $str = "/home/ubuntu/casperjs/bin/casperjs test ";  # test script argument not shown on the slide
        $pid = fork;
        if ($pid == 0) {
            exec($str);   # child: run CasperJS
        }
        else {
            wait;         # parent: wait for the child to finish
        }
        my $timeleft = alarm(0);
    };
    if ($@) {
        # timeout
        kill('KILL', $pid);
    }
}
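The loop generation on this slide can also be sketched in JavaScript. The function name `generateSteps` and the placeholder comment inside `then()` are assumptions made for illustration; only the XPath template and the loop bounds are taken from the Perl code on the slide.

```javascript
// Hypothetical generator: emit one CasperJS waitFor step per table row.
// The loop bounds mirror the Perl loop ($counter = 1; $counter < $item).
function generateSteps(rowCount) {
  const steps = [];
  for (let row = 1; row < rowCount; row++) {
    // XPath template from the slide, with the row index filled in.
    const xpath =
      '//*[@id="fieldset"]/table/tbody/tr[1]/td[3]/table/tbody/tr[' +
      row + ']/td/p/b/a';
    steps.push(
      'casper.waitFor(function check() {\n' +
      // JSON.stringify produces a quoted, escaped JavaScript string literal.
      '    return this.click(x(' + JSON.stringify(xpath) + ')) != 0;\n' +
      '}, function then() {\n' +
      '    // step body not shown on the slides\n' +
      '});'
    );
  }
  return steps.join('\n');
}

console.log(generateSteps(3));
```

Letting `JSON.stringify` quote the XPath avoids the hand-written backslash escaping that the Perl `print` statements need.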

Page 9: Iccsn2017 slideshare

FarisVM and AdBlock’s syntax

FARIS VM has two registers, the string pointer (SP) and the program counter (PC), as well as a frame stack for the SP and PC.

AdBlock's syntax                    Regular expression
*                                   /.*/
| at the beginning of the line      /^/
| at the end of the line            /$/
|| at the beginning of the line     /[\w\-]+\/+/

With this syntax, URL filters can be expressed efficiently and practically. For example, ads.com, which is an exact pattern, does not distinguish between http://ads.com/b.gif and http://ads.com/idx.html; however, ads.com^*.gif filters only the former.
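The mapping from AdBlock syntax to regular expressions can be illustrated with a small translator. This is a minimal sketch, not the authors' implementation: `filterToRegExp` is a hypothetical name, the separator class used for `^` follows the Adblock Plus convention (anything other than a letter, digit, `_`, `-`, `.` or `%`, or the end of the URL), and the expansion of `||` into a scheme plus optional subdomains is an assumption rather than the exact expression in the table.

```javascript
// Hypothetical helper: translate one AdBlock-style URL filter into a RegExp.
function filterToRegExp(filter) {
  let prefix = '';
  let suffix = '';
  let body = filter;

  if (body.startsWith('||')) {
    // Assumption: a scheme, then optional subdomain labels.
    prefix = '^[\\w\\-]+:\\/+(?:[^\\/]+\\.)?';
    body = body.slice(2);
  } else if (body.startsWith('|')) {
    prefix = '^'; // | at the beginning of the line
    body = body.slice(1);
  }
  if (body.endsWith('|')) {
    suffix = '$'; // | at the end of the line
    body = body.slice(0, -1);
  }

  let out = '';
  for (const ch of body) {
    if (ch === '*') out += '.*'; // wildcard
    else if (ch === '^') out += '(?:[^\\w\\-.%]|$)'; // separator (assumed class)
    else out += ch.replace(/[.+?$(){}|[\]\\]/, '\\$&'); // escape regex metacharacters
  }
  return new RegExp(prefix + out + suffix);
}

// The example from the slide:
console.log(filterToRegExp('ads.com').test('http://ads.com/b.gif'));          // true
console.log(filterToRegExp('ads.com').test('http://ads.com/idx.html'));       // true
console.log(filterToRegExp('ads.com^*.gif').test('http://ads.com/b.gif'));    // true
console.log(filterToRegExp('ads.com^*.gif').test('http://ads.com/idx.html')); // false
```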

Page 10: Iccsn2017 slideshare

FARIS VM

• FARIS is based on a virtual machine approach for regular expressions, but for simplicity, it provides only four instructions.

• FARIS is a bytecode interpreter. To perform pattern matching, AdBlock Plus's rules are translated into its machine instructions. FARIS interprets four instructions: char, skip_to, skip_scheme, and match.

AdBlock's syntax                    Regular expression
*                                   /.*/
| at the beginning of the line      /^/
| at the end of the line            /$/
|| at the beginning of the line     /[\w\-]+\/+/

input          instruction
*c             skip_to c
*^             skip_to separator
c              char c
^              char separator
|| + line      skip_scheme
| + line       char head
line + |       char tail
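The translation table above can be turned into a small backtracking interpreter. The sketch below is an illustration under stated assumptions, not the FARIS implementation: only the four instruction names and the SP, PC and frame-stack registers come from the slides, while the operand encoding, the separator class (borrowed from the Adblock Plus convention), the handling of subdomains in `skip_scheme`, and the rule that an unanchored filter may match anywhere are assumptions. The names `compile` and `run` are hypothetical.

```javascript
// Assumed separator class: anything except letters, digits, '_', '-', '.', '%'.
const isSep = (ch) => !/[\w\-.%]/.test(ch);
const hit = (ins, ch) => (ins.kind === 'sep' ? isSep(ch) : ch === ins.ch);
const operand = (c) => (c === '^' ? { kind: 'sep' } : { kind: 'lit', ch: c });

// Translate one AdBlock-style filter into a list of instructions.
function compile(filter) {
  const prog = [];
  let body = filter;
  if (body.startsWith('||')) {
    prog.push({ op: 'skip_scheme' });
    body = body.slice(2);
  } else if (body.startsWith('|')) {
    prog.push({ op: 'char', kind: 'head' });
    body = body.slice(1);
  } else if (!body.startsWith('*')) {
    body = '*' + body; // assumption: unanchored filters match anywhere
  }
  const tail = body.endsWith('|');
  if (tail) body = body.slice(0, -1);

  let openEnd = false; // true when the body ends with '*'
  for (let i = 0; i < body.length; i++) {
    let op = 'char';
    if (body[i] === '*') {
      while (body[i] === '*') i++; // collapse runs of '*'
      if (i >= body.length) { openEnd = true; break; }
      op = 'skip_to';
    }
    prog.push({ op, ...operand(body[i]) });
  }
  if (tail && !openEnd) prog.push({ op: 'char', kind: 'tail' });
  prog.push({ op: 'match' });
  return prog;
}

// Execute a program against a URL using SP, PC and a frame stack.
function run(prog, url) {
  const stack = []; // saved { sp, pc } frames used for backtracking
  let sp = 0;
  let pc = 0;
  for (;;) {
    const ins = prog[pc];
    let ok = true;
    if (ins.op === 'match') return true;
    if (ins.op === 'char') {
      if (ins.kind === 'head') ok = sp === 0;
      else if (ins.kind === 'tail') ok = sp === url.length;
      else if (sp < url.length) { ok = hit(ins, url[sp]); if (ok) sp++; }
      else ok = ins.kind === 'sep'; // end of input counts as a separator
    } else if (ins.op === 'skip_to') {
      let i = sp;
      while (i < url.length && !hit(ins, url[i])) i++;
      if (i < url.length) { stack.push({ sp: i + 1, pc }); sp = i + 1; }
      else if (ins.kind === 'sep') sp = i;
      else ok = false;
    } else { // skip_scheme: jump past "scheme://", then try each subdomain boundary
      const m = /^[\w\-]+:\/+/.exec(url);
      ok = m !== null;
      if (ok) {
        const start = m[0].length;
        let end = url.indexOf('/', start);
        if (end < 0) end = url.length;
        for (let i = end - 1; i > start; i--) {
          if (url[i - 1] === '.') stack.push({ sp: i, pc: pc + 1 });
        }
        sp = start;
      }
    }
    if (ok) pc++;
    else if (stack.length > 0) ({ sp, pc } = stack.pop());
    else return false;
  }
}

console.log(run(compile('ads.com^*.gif'), 'http://ads.com/b.gif'));    // true
console.log(run(compile('ads.com^*.gif'), 'http://ads.com/idx.html')); // false
```

Each `skip_to` pushes a frame that resumes the search after the current hit, so a failed continuation falls back to the next candidate position instead of rescanning from the start.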

Page 11: Iccsn2017 slideshare

Experiments: MATCHING URL WITH ADBLOCK

list                   FARIS (ms)   grep with regex (ms)
easylist_france        62416        3079
easylist_germany       487361       50454
easylist_italy         58318        1978
easyprivacy            4745         6740
fanboy_annoyance       4760         11276
japanese               56241        6992
japanese_tohu          1090         1383
malwaredomains_full    1032         15407

FARIS should be quite suitable for Web browsers or browser extensions. AdBlock Plus is one of the most popular browser extensions, but it is implemented inefficiently. Using FARIS could increase AdBlock Plus’s performance and reduce its large memory utilization. Thus, embedding FARIS into Web browsers or JavaScript engines is a good choice for improving overall performance.

Table VI compares the processing time for matching strings against the AdBlock Plus lists. We measured the computing time for processing the strings with basic regular expressions and with FARIS. The results differ from list to list. However, it can be concluded that the proposed method with FARIS works in reasonable processing time compared with conventional regular expressions.

Page 12: Iccsn2017 slideshare

Conclusion: investigating URLs of Android third party’s APK files using Faris VM

With the rising popularity of Android applications, third-party APK markets have become an attractive target for attackers. Unfortunately, there have been very few empirical studies of the large number of APKs distributed by third-party markets. In this paper, we present a framework that inspects the URL strings to which third-party APKs connect, using a headless browser and a fast URL filter.

In our experiment, we collected 800,000 APK files and extracted 12,000 URLs. To match the URLs against AdBlock Plus, we applied FARIS to inspect the URL strings with lists such as easylist, easyprivacy and malwaredomains_full. The experiments show that FARIS can process these strings in reasonable computing time compared with the conventional regex method.