Taint Tracking Through UTF Extension by Bože Zekan supervised by Dr. Mark Shtern, Dr. Vassilios Tzerpos Computer Science and Engineering Faculty York University funded by NSERC USRA Grant


Page 1: Taint Tracking Through UTF Extension

Taint Tracking Through UTF Extension

by
Bože Zekan

supervised by
Dr. Mark Shtern, Dr. Vassilios Tzerpos

Computer Science and Engineering Faculty
York University

funded by
NSERC USRA Grant

Page 2: Taint Tracking Through UTF Extension

Topics To Be Covered

• Some threats from user input

• Taint tracking

• Previous work

• Our work

Page 3: Taint Tracking Through UTF Extension

Topics To Be Covered

Our work

• Unicode

• Implementations

• Results

Page 4: Taint Tracking Through UTF Extension

The Problem We Are Addressing

• It is estimated that more than 80% of web services contain security vulnerabilities [1]

• Many of these (50 to 82%) are user command injection vulnerabilities [1]

[1] Chin, Erika, and Wagner, David. Efficient Character-level Taint Tracking for Java. In Proceedings of SWS’09, November 13, 2009, Chicago, Illinois, USA. ACM 978-1-60558-789-9/09/11

Page 5: Taint Tracking Through UTF Extension

Our Goal

Reduce security vulnerabilities that may occur when dealing with user input

User input:
- input from an actual physical person
- input from another program, file, database, etc.

OR
- any data that is neither a literal constant in our program nor generated by the manipulation of literal constants in our program

Page 6: Taint Tracking Through UTF Extension

Some User Command Injection Threats:

• SQL injection

• Cross-site scripting (XSS)

• Path traversal

• Shell injection attacks, HTTP response splitting, ...

Page 7: Taint Tracking Through UTF Extension

SQL Injection

query = "SELECT * FROM students WHERE name = '" + studentName + "'";

SELECT * FROM students WHERE name = 'bobby'

Page 8: Taint Tracking Through UTF Extension

SQL Injection

From: Exploits of a Mom webcomic at http://xkcd.com/327/

Page 9: Taint Tracking Through UTF Extension

SQL Injection

SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'

query = "SELECT * FROM students WHERE name = '" + studentName + "'";
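The concatenation on this slide can be reproduced in a few lines of Java. This is only an illustrative sketch: the class and method names (`SqlInjectionDemo`, `buildQuery`) are hypothetical, and nothing is sent to a database.

```java
final class SqlInjectionDemo {
    // Builds the query the way the slide does: by plain string concatenation.
    static String buildQuery(String studentName) {
        return "SELECT * FROM students WHERE name = '" + studentName + "'";
    }

    public static void main(String[] args) {
        // Benign input yields the intended single statement.
        System.out.println(buildQuery("bobby"));
        // Input containing a quote closes the literal early and appends a second statement.
        System.out.println(buildQuery("bobby'; DROP TABLE students; --"));
    }
}
```

The second call yields the `DROP TABLE` text shown above, because the program cannot tell which characters of the final string came from the user.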

Page 10: Taint Tracking Through UTF Extension

Cross-Site Scripting (XSS)

<p>Anonymous </br>0 Hours Ago </br> Have you noticed that Soros spelled backwards is still Soros? Coincidence, I think not!</p>

html="<p>" + name + " </br>" + when + " </br>" + comment + "</p>";

Page 11: Taint Tracking Through UTF Extension

Cross-Site Scripting (XSS)

<p>Anonymous </br>0 Hours Ago </br> <script> window.location="http://www.mybadsite.com/"</script></p>

html="<p>" + name + " </br>" + when + " </br>" + comment + "</p>";

Page 12: Taint Tracking Through UTF Extension

Path Traversal

filename: /srv/www/users/bobby/myhomework1.doc

filename = "/srv/www/users/bobby/" + filename;

Page 13: Taint Tracking Through UTF Extension

Path Traversal

filename: /srv/www/users/bobby/../cse3000/tentativetestquestions.doc /srv/www/users/cse3000/tentativetestquestions.doc

filename = "/srv/www/users/bobby/" + filename;
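A small Java sketch of the same traversal, using the standard `java.nio.file` API to collapse the `..` segment. The class name `PathTraversalDemo` and the `staysInside` containment check are assumptions added for illustration; they are not part of the presented work.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathTraversalDemo {
    // Directory the application intends to serve from (taken from the slide).
    static final String USER_DIR = "/srv/www/users/bobby/";

    // Builds the path as the slide does, then collapses any ".." segments.
    static Path resolve(String filename) {
        return Paths.get(USER_DIR + filename).normalize();
    }

    // True when the resolved path is still inside the user's directory.
    static boolean staysInside(String filename) {
        return resolve(filename).startsWith(Paths.get(USER_DIR));
    }

    public static void main(String[] args) {
        System.out.println(staysInside("myhomework1.doc"));
        System.out.println(staysInside("../cse3000/tentativetestquestions.doc"));
    }
}
```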

Page 14: Taint Tracking Through UTF Extension

To Prevent the Propagation of Malicious Data

Possible solution #1: Carefully parse/sanitize/analyze all data being sent to a sensitive data sink

SELECT * FROM students WHERE name = 'bobby'

SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'

<p>Anonymous </br>0 Hours Ago </br> Have you noticed that Soros spelled backwards is still Soros? Coincidence, I think not!</p>

<p>Anonymous </br>0 Hours Ago </br> <script>window.location = "http://www.mybadsite.com/"</script></p>

/srv/www/users/bobby/myhomework1.doc

/srv/www/users/bobby/../cse3000/tentativetestquestions.doc

... and hope that you catch everything from among all the possible combinations, and don't discard any valid requests

Page 15: Taint Tracking Through UTF Extension

To Prevent the Propagation of Malicious Data

Possible solution #2: Carefully parse/sanitize/analyze all user supplied data being sent to a sensitive data sink

SELECT * FROM students WHERE name = 'bobby'

SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'

<p>Anonymous </br>0 Hours Ago </br> Have you noticed that Soros spelled backwards is still Soros? Coincidence, I think not!</p>

<p>Anonymous </br>0 Hours Ago </br> <script>window.location = "http://www.mybadsite.com/"</script></p>

/srv/www/users/bobby/myhomework1.doc

/srv/www/users/bobby/../cse3000/tentativetestquestions.doc

... and hope that you catch everything from among all the possible combinations, and don't discard any valid requests

Page 16: Taint Tracking Through UTF Extension

Taint Tracking Makes Solution #2 Possible

Taint tracking consists of three main steps:

1. Identifying untrusted input at the point that it enters the program and marking it as untrusted (i.e., tainted).

2. Propagating the taint information. At each subsequent computation, mark as tainted all data that is derived from an untrusted source.

3. Checking all data going into sensitive data sinks (e.g., a database, an output response, or a file). Use the taint information to identify potential attacks.

Page 17: Taint Tracking Through UTF Extension

Taint Tracking

Taint tracking comes in two possible flavours:

1. String level
   - mark the entire string as tainted

2. Character level
   - mark individual characters as tainted
   - allows for finer granularity

Page 18: Taint Tracking Through UTF Extension

How Can Character Level Tainting Be Achieved?

One method, by Chin and Wagner of UC Berkeley [1]:

Expand the structure of the Java String class to include a boolean array which stores the taint status for each character in the string.

[1] Chin, Erika, and Wagner, David. Efficient Character-level Taint Tracking for Java. In Proceedings of SWS’09, November 13, 2009, Chicago, Illinois, USA. ACM 978-1-60558-789-9/09/11
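The idea of a per-character boolean array can be pictured with a stand-alone wrapper class. This is a hedged sketch only: Chin and Wagner modify the Java String class itself, whereas `BoolTaintedString` below is a hypothetical name and a simplified model written for this illustration.

```java
import java.util.Arrays;

final class BoolTaintedString {
    private final String value;
    // taint[i] is true when value.charAt(i) came from untrusted input.
    private final boolean[] taint;

    // Creates a string whose characters are all tainted or all untainted.
    BoolTaintedString(String value, boolean tainted) {
        this.value = value;
        this.taint = new boolean[value.length()];
        Arrays.fill(this.taint, tainted);
    }

    private BoolTaintedString(String value, boolean[] taint) {
        this.value = value;
        this.taint = taint;
    }

    // Concatenation propagates taint character by character.
    BoolTaintedString concat(BoolTaintedString other) {
        boolean[] merged = Arrays.copyOf(taint, taint.length + other.taint.length);
        System.arraycopy(other.taint, 0, merged, taint.length, other.taint.length);
        return new BoolTaintedString(value + other.value, merged);
    }

    boolean isTainted(int index) {
        return taint[index];
    }

    String value() {
        return value;
    }
}
```

Every string carries an extra array alongside its characters, which is where the added memory cost of this approach comes from.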

Page 19: Taint Tracking Through UTF Extension

The Chin and Wagner method

Their achievement: Implementing a solution which minimizes the need to rewrite existing application code while transparently decreasing the vulnerability of applications to threats

Their shortcomings:

• Specific to Java

• Increases the memory required to store a string in Java

• The taint status of the Java char primitive cannot be determined

• Not readily adapted to other programming languages

• Their taint information cannot propagate onwards to a database, or an application, script, or procedure running in another programming language.

Page 20: Taint Tracking Through UTF Extension

How can character level tainting be achieved?

Our method: Expand Unicode to include tainted characters

Our achievements:

· Implement a solution which minimizes the need to rewrite existing application source code while transparently decreasing the vulnerability of applications to threats.

· Is not specific to Java

· Does not increase the memory required to store a string in Java

· The taint status of the Java char primitive can be determined

· Is readily adapted to other programming languages

· The taint information can propagate onwards to a database, or an application, script, or procedure running in another programming language

Page 21: Taint Tracking Through UTF Extension

What is Unicode?

• A scheme that assigns a codepoint to each character in current use throughout the world

• Has been implemented in XML, Java, Microsoft .NET, web browsers, databases, and modern operating systems.

Page 22: Taint Tracking Through UTF Extension

Unicode

• Can accommodate 1,114,112 codepoints in 17 “planes” of 65,536 characters each

• Most of the codespace is still unassigned

• Mechanisms (ex. UTF-8, UTF-16 ...) exist that already allow software to manipulate and store all these codepoints even if no characters have been assigned to them

Page 23: Taint Tracking Through UTF Extension

Our Design, Part 1
Tainting & Propagating Taint

• We create a “tainted” character for every character and assign it an unused codepoint

Ex.
    Untainted                              Tainted
    A  (ASCII: 41 hex, Unicode: U+0041)    A  (Unicode: U+E041)
    z  (ASCII: 7A hex, Unicode: U+007A)    z  (Unicode: U+E07A)

• Now wherever a character’s codepoint goes, its tainted or untainted status goes with it

Page 24: Taint Tracking Through UTF Extension

Tainting Algorithms

• To taint a user input character x:
    codepoint(tainted x) = codepoint(x) + OFFSET

• To check if character x is tainted or not:
    if (codepoint(x) is in tainted codepoint range)
        character x is tainted // is user supplied
    else
        character x is untainted

• To remove taint from tainted character x:
    codepoint(x) = codepoint(tainted x) - OFFSET
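The three operations above translate almost directly into Java. In this sketch the value `OFFSET = 0xE000` is inferred from the A → U+E041 example, and restricting tainted twins to the ASCII range is a simplifying assumption; the class and method names are hypothetical.

```java
final class UtfTaint {
    // Assumption: offset inferred from the example mapping U+0041 -> U+E041.
    static final int OFFSET = 0xE000;
    // Assumption: only ASCII characters get a tainted twin in this sketch.
    static final int MAX_PLAIN = 0x7F;

    // A codepoint is tainted when it lies in the shifted range.
    static boolean isTainted(int codePoint) {
        return codePoint >= OFFSET && codePoint <= OFFSET + MAX_PLAIN;
    }

    // Taint: shift the codepoint up by OFFSET (characters without a twin pass through).
    static int taintCodePoint(int codePoint) {
        boolean hasTwin = codePoint >= 0 && codePoint <= MAX_PLAIN;
        return hasTwin ? codePoint + OFFSET : codePoint;
    }

    // Untaint: shift a tainted codepoint back down by OFFSET.
    static int untaintCodePoint(int codePoint) {
        return isTainted(codePoint) ? codePoint - OFFSET : codePoint;
    }

    static String taintString(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().map(UtfTaint::taintCodePoint).forEach(out::appendCodePoint);
        return out.toString();
    }

    static String untaintString(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().map(UtfTaint::untaintCodePoint).forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String tainted = taintString("bobby");
        System.out.println(isTainted(tainted.codePointAt(0)));
        System.out.println(untaintString(tainted));
    }
}
```

Because the taint lives in the codepoint itself, ordinary concatenation and substring operations carry it along without any extra bookkeeping.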

Page 25: Taint Tracking Through UTF Extension

Our Design, Part 2
The Transparent Protection Framework

Consider a typical vulnerable web application:

Page 26: Taint Tracking Through UTF Extension

Designing The Added Transparent Protection Framework

Consider a less vulnerable web application:

• User’s OS has fonts which incorporate tainted characters

• Request Intercept Wrapper uses custom taint-aware classes/functions and is generic for a given technology

• Application is on a server with taint awareness built into its library functions

• Database Driver Intercept Wrapper uses custom taint-aware classes/functions specific to the database to check for SQL injection, and drops malicious queries
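One way a database-side check could use the taint information is sketched below. The slides do not spell out the detection rule, so the policy here (user-supplied characters may appear as data but never as quotes, semicolons, or backslashes) is an assumption, as are the names `SqlTaintGuard`, `isMalicious`, and `prepare`, and the offset 0xE000.

```java
final class SqlTaintGuard {
    // Assumption: tainted characters are ASCII shifted up by 0xE000.
    static final int OFFSET = 0xE000;

    static boolean isTainted(char c) {
        return c >= OFFSET && c <= OFFSET + 0x7F;
    }

    // Helper for the demo: taints every ASCII character of user input.
    static String taint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            out.append(c <= 0x7F ? (char) (c + OFFSET) : c);
        }
        return out.toString();
    }

    // Hypothetical policy: a tainted character must never act as SQL syntax.
    static boolean isMalicious(String query) {
        for (char c : query.toCharArray()) {
            if (!isTainted(c)) continue;
            char plain = (char) (c - OFFSET);
            if (plain == '\'' || plain == '"' || plain == ';' || plain == '\\') {
                return true;
            }
        }
        return false;
    }

    // Drops malicious queries; otherwise strips taint before the real driver sees the text.
    static String prepare(String query) {
        if (isMalicious(query)) {
            throw new SecurityException("tainted SQL syntax; query dropped");
        }
        StringBuilder out = new StringBuilder(query.length());
        for (char c : query.toCharArray()) {
            out.append(isTainted(c) ? (char) (c - OFFSET) : c);
        }
        return out.toString();
    }
}
```

Under this policy a legitimate name containing an apostrophe would also be rejected, so a production wrapper would more likely escape such characters than drop the query.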

Page 27: Taint Tracking Through UTF Extension

Implementation Details: The Font

For a final, universally adopted application:

• System fonts would be expanded to include tainted characters, which would look identical to their untainted counterparts

Ex. untainted ABCDE ... vs tainted ABCDE ...

For our proof of concept:

• Tainted and untainted characters appear different
  – to easily distinguish them on computer screens and in documents

Ex. untainted ABCDE ... vs tainted ...

Page 28: Taint Tracking Through UTF Extension

Implementation Details: The Font

• We used Type-Light freeware to modify Windows' Courier New font

  - installed it by dragging the original ttf file out of the Fonts directory, and dragging in our new ttf file

Page 29: Taint Tracking Through UTF Extension
Page 30: Taint Tracking Through UTF Extension

Implementation Details: The Application

• Has no knowledge of taint

• Counts the number of visits of this user

• 1st query to db checks if the user’s name is in the db.

  If no, then insert the name into the db and set the visits count to 1

  If yes, then increment the visits count by 1 in the db

• 2nd query to db outputs the number of visits for the user’s name from the db’s record

Page 31: Taint Tracking Through UTF Extension

Implementation Details: The Transparent Protection Framework

We implemented our framework on our typical web application in four different technologies:

1. PHP/MySQL on Apache (under Windows XP)

2. PHP/DB2 on Apache (under Linux)

3. Java Servlet/DB2 on Tomcat7 (under Linux)

4. PHP on Apache (under Linux) calling Java Servlet/DB2 on Tomcat7 (under Linux)

To do this we set the UTF-8 or Unicode encoding option everywhere it was available, and Courier New as the selected font wherever possible.

Page 32: Taint Tracking Through UTF Extension

Implementation Details: The Transparent Protection Framework

Page 33: Taint Tracking Through UTF Extension

Implementation Details: The Form Page

Page 34: Taint Tracking Through UTF Extension

Implementation Details: The Transparent Protection Framework

Page 35: Taint Tracking Through UTF Extension

Implementation Details: The Request Intercept Wrapper

• Two versions were used:
  1. PHP version which uses cURL to interact with the application
  2. Java Servlet version which uses a connection to interact with the application

• Both versions handled both POST and GET requests.

• Browser only sees the wrapper's URL, never the application page's URL

• Both will work with any form, no matter the combination of controls
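Assuming the wrapper's role is the first taint-tracking step (marking input where it enters), its core could look like the sketch below. It works on a plain `Map` rather than a real servlet request, and the names `RequestTaintSketch` and `taintParameters`, as well as the offset 0xE000, are assumptions; forwarding to the application is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RequestTaintSketch {
    // Assumption: tainted characters are ASCII shifted up by 0xE000.
    static final int OFFSET = 0xE000;

    // Marks every ASCII character of a user-supplied value as tainted.
    static String taint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            out.append(c <= 0x7F ? (char) (c + OFFSET) : c);
        }
        return out.toString();
    }

    // Taints all parameter values regardless of which form controls produced them;
    // parameter names are left untouched so the application can still look them up.
    static Map<String, String> taintParameters(Map<String, String> params) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            out.put(e.getKey(), taint(e.getValue()));
        }
        return out;
    }
}
```

Treating every value uniformly is what lets such a wrapper stay independent of any particular form.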

Page 36: Taint Tracking Through UTF Extension

Implementation Details: The Transparent Protection Framework

Page 37: Taint Tracking Through UTF Extension

Implementation Details: PHP Application & Db Driver Intercept

• Four applications exist

- essentially the same code with minor variations

• Two Database Driver Intercept Wrappers exist

- essentially the same code with minor variations

- they are PHP include files

- each file has taint-aware functions that wrap the query and fetch array functions of their respective databases

Page 38: Taint Tracking Through UTF Extension

Implementation Results: PHP Application & Db Driver Intercept

• Was not totally transparent
  - application needed modification to specify the include files, and rename two functions

• But we did successfully:
  - propagate taint from user input all the way back to the user output
  - transparently detect and stop SQL injection
  - show our method works on different databases and different operating systems
  - produce an easy-to-implement solution to increase the security of legacy programs

Page 39: Taint Tracking Through UTF Extension

Implementation Results: PHP Application & Db Driver Intercept

Page 40: Taint Tracking Through UTF Extension

Implementation Results: PHP Application & Db Driver Intercept

Page 41: Taint Tracking Through UTF Extension

Implementation Results: PHP Application & Db Driver Intercept

Page 42: Taint Tracking Through UTF Extension

Implementation Details: Java Application

• One application, reachable in two ways

• Has modified String & Character classes that will not break the application at ("A").equals("[tainted A]") or ('A').equals('[tainted A]'), where [tainted A] stands for the tainted counterpart of A (U+E041)
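The comparison behaviour described here can be sketched as a helper that ignores taint when testing equality. The project modifies Java's own String and Character classes; the stand-alone `TaintAwareEquals` class below, its method names, and the offset 0xE000 are assumptions made for illustration.

```java
final class TaintAwareEquals {
    // Assumption: tainted characters are ASCII shifted up by 0xE000.
    static final int OFFSET = 0xE000;

    // Maps a tainted character back to its untainted counterpart.
    static char plain(char c) {
        return (c >= OFFSET && c <= OFFSET + 0x7F) ? (char) (c - OFFSET) : c;
    }

    // Two characters are equal if they match once taint is ignored.
    static boolean equalsIgnoringTaint(char a, char b) {
        return plain(a) == plain(b);
    }

    // String version: compare position by position, ignoring taint.
    static boolean equalsIgnoringTaint(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (!equalsIgnoringTaint(a.charAt(i), b.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Without such a comparison, an application testing user input against a literal would see a tainted A and an untainted A as different characters.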

Page 43: Taint Tracking Through UTF Extension

Implementation Details: Java DB2 Database Intercept Wrapper

• Is a collection of custom taint aware classes

• The original ibm.db2.jdbc.app.DB2Driver class is wrapped with our taint aware Db2DriverIntercept class

• We then drill down and also wrap the Connection, PreparedStatement, and ResultSet interfaces and augment their existing methods to provide transparent SQL injection protection

Page 44: Taint Tracking Through UTF Extension

Implementation Results: Java Application & Db Driver Intercept

• Was not totally transparent
  - application needs to call our driver instead of IBM’s database driver

• But we additionally showed that our character level taint method could:

  - work on different programming languages (PHP and Java) and paradigms (procedural and OOP)

  - propagate between different languages and different servers

  - be handled transparently by modifying Java’s String and Character class operations

Page 45: Taint Tracking Through UTF Extension

Application Breaks & Workarounds

• Java: the char is a primitive
  if ('A' == '[tainted A]') … is as far as we can keep taint information accurate. Thereafter, taint information is lost and there is no further propagation.

  - if allowed to alter source code, then replace ('A' == '[tainted A]') with a taint-aware custom method ('A'.equals('[tainted A]')) to allow taint to propagate even further within an application.

Page 46: Taint Tracking Through UTF Extension

Application Breaks & Workarounds

• PHP: strings are considered primitive
  if ("AB" == "[tainted AB]") … is as far as we can keep taint information accurate. Thereafter, taint information is lost and there is no further propagation.

  - if allowed to alter source code, then replace ("AB" == "[tainted AB]") with a taint-aware custom method ("AB".equals("[tainted AB]")) to allow taint to propagate even further within an application.

NB! If our method were to be adopted universally, the above could be overcome by modifying the JVM or PHP engine

Page 47: Taint Tracking Through UTF Extension

Other Possible Uses of Our Character Level Tainting Method

• Tainting and tracking of multiple input sources
  – there are a lot of unassigned codepoints
  – many tainted character sets could be created to indicate different data sources (ex. keyboard, file, database, remote login, ...)

• Storing tainted characters in log files to make user input immediately recognizable

• Tainted characters can be stored in a database & retrieved by using taint in queries
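The multiple-source idea could be modelled by giving each source its own offset, as in the sketch below. The specific offsets (0xE000, 0xE080, 0xE100), the ASCII-only restriction, and the `TaintSource` enum are all hypothetical choices made for this example.

```java
import java.util.Optional;

enum TaintSource {
    // Assumption: each source gets its own 128-codepoint block of tainted ASCII.
    KEYBOARD(0xE000), FILE(0xE080), DATABASE(0xE100);

    private final int offset;

    TaintSource(int offset) {
        this.offset = offset;
    }

    // Marks every ASCII character of s as coming from this source.
    String taint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            out.append(c <= 0x7F ? (char) (c + offset) : c);
        }
        return out.toString();
    }

    // Recovers the source of a character, or empty if it is untainted.
    static Optional<TaintSource> sourceOf(char c) {
        for (TaintSource src : values()) {
            if (c >= src.offset && c <= src.offset + 0x7F) {
                return Optional.of(src);
            }
        }
        return Optional.empty();
    }
}
```

A log viewer or query filter could then call `sourceOf` on each character to tell keyboard input apart from data read back from a file or database.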

Page 48: Taint Tracking Through UTF Extension

Other Possible Uses of Our Character Level Tainting Method