Journal of Digital Forensics, Security and Law
Volume 14, Number 4, Article 4
April 2020

Teaching Data Carving Using The Real World Problem of Text Message Extraction From Unstructured Mobile Device Data Dumps

Gary D. Cantrell, Southern Utah University, [email protected]
Joan Runs Through, [email protected]

Follow this and additional works at: https://commons.erau.edu/jdfsl
Part of the Computer Law Commons, Curriculum and Instruction Commons, Educational Methods Commons, and the Information Security Commons

Recommended Citation
Cantrell, Gary D. and Runs Through, Joan (2020) "Teaching Data Carving Using The Real World Problem of Text Message Extraction From Unstructured Mobile Device Data Dumps," Journal of Digital Forensics, Security and Law: Vol. 14 : No. 4 , Article 4.
DOI: https://doi.org/10.15394/jdfsl.2019.1603
Available at: https://commons.erau.edu/jdfsl/vol14/iss4/4

This Article is brought to you for free and open access by the Journals at Scholarly Commons. It has been accepted for inclusion in Journal of Digital Forensics, Security and Law by an authorized administrator of Scholarly Commons. For more information, please contact [email protected].

(c)ADFSL



TEACHING DATA CARVING... JDFSL V14N4

TEACHING DATA CARVING USING THE REAL-WORLD PROBLEM OF TEXT MESSAGE EXTRACTION FROM UNSTRUCTURED MOBILE DEVICE DATA DUMPS

Dr. Gary Cantrell1 and Joan Runs Through2, ED.S., M.S.

1Southern Utah University
Computer Science and Information System
[email protected] State University
Digital Forensics Crime [email protected]

ABSTRACT

Data carving is a technique used in data recovery to isolate and extract files based on file content without any file system guidance. It is an important part of data recovery and digital forensics. However, it is also useful in teaching computer science students about file structure and the binary encoding of information, especially within a digital forensics program. This work demonstrates how the authors teach data carving using a real-world problem they encounter in digital forensics evidence processing: extracting text messages from unstructured small device binary extractions. The authors have used this problem for instruction in digital forensics courses and other computer science courses.

Keywords: Mobile Forensics, Small Device Forensics, Mobile Triage, Digital Triage, Data Recovery, Data Carving, Binary Files.

1. INTRODUCTION

Data carving is the search and extraction of lost files based on internal file structure or content instead of external file system metadata. This process is often necessary when the file system has been damaged, corrupted, or reformatted. It is also essential to data recovery when the extraction of digital memory leaves the technician with a binary image that does not have a file system to assist in isolating individual files within that image. Data carving is an important topic in a digital forensics course or any technology course that discusses data recovery. It can also be used in any computer science course to facilitate understanding of file structure and binary file IO. This manuscript presents a method used by the authors to teach data carving to undergraduate and law enforcement students using a real-world problem encountered in a digital forensics laboratory.

© 2020 JDFSL


The foundation of the problem introduced to the students is that of mobile triage and text message extraction. Short Message Service files, often referred to as SMS or text messages, do not have a standardized file format or an easy-to-predict structure; therefore, automated tool extraction is limited. Using this real-world problem as a basis for instruction demonstrates that learning how a technique works is more important than learning about tools that perform a technique with little user direction.

The remainder of this manuscript will present the information in the same order used to instruct undergraduate students. It is presented as a real-world problem and solution as commonly encountered by the authors. Presenting the information in this manner helps undergraduates not only understand data carving but also comprehend how it could be used in real-world situations.

2. SMALL DEVICE TRIAGE PROCESS

In addition to digital forensic education, the authors of this work specialize in the forensic analysis of mobile devices for law enforcement agencies, including the direct extraction of the data on the NAND memory chip, often called chip off forensics. The end result of the chip off forensics process, when performed on devices programmed with proprietary or unsupported operating systems, is an unstructured physical memory dump. The process presented in the classroom is the method used by the authors in the recovery and presentation of SMS and MMS text messages from these unstructured physical memory dumps. This process involves:

• Isolating potential messages.

• Filtering out redundant and meaningless data.

• Applying client feedback to identify and verify message threads of importance.

This method is applicable when no file system support or SMS/MMS format is available. In other words, this activity demonstrates how data carving works without the blind aid of automated tools. This simple model demonstrates data carving at a base level and can be adapted to any type of electronically stored information stored in an unsupported file system or file format.

In the digital forensics arena, this process would be classified as a triage process due to the limitations inherent in data carving. The resulting SMS messages have no file system assigned time stamp, no information is available about their storage structure, and this technique does not reverse engineer the file system in any way. Its primary purpose is quick extraction and analysis without file system support. Digital triage techniques are those techniques applied to live or dead systems either on-scene for quick intelligence or in-house for evidence evaluation (Cantrell et al. 2012). Digital triage concentrates on finding the most useful information in the least amount of time. Mobile device triage is often discussed as a need during search and seizure (Richard III et al. 2005; Walls et al. 2011). However, the techniques discussed in this work would not be useful for on-scene analysis. Therefore, the authors describe this method as a digital triage technique for in-house evaluation.

Since it is a digital triage technique, there are several things that should be clearly explained to students. First, the resulting information should be used cautiously, if at all, in full court proceedings. It does not always produce timestamps and ownership information. Secondly, there is no reverse engineering of the file formats or file system attempted. The information is gathered by isolating potential messages without considering any file or file system format. Thirdly, an additional evaluation would likely be necessary to confirm the validity of the extracted information if it is needed for use in court proceedings. Finally, this method, although not practical in the field, is intended to gather as much information as possible in the least amount of time for in-house evaluation.

For class purposes, pre-owned phones can be purchased from "the wild." For example, two websites for purchasing used phones are www.ebay.com and www.shopgoodwill.com. Phone image extractions were made with software/hardware forensic tools already in house for evidence examination or instructional purposes. Images were scanned for adult content to prevent exposure of such material in the classroom. However, a classroom disclosure is highly recommended, as initial scans can fail and it is nearly impossible to prevent adult language exposure from phones collected this way.

3. TEXT MESSAGE EXTRACTION, FILTERING AND PRESENTATION

Since this technique utilizes data carving, it is assumed that we are starting with a binary extraction without a file system. There will be no individual files to be examined. Thus, the first step in this process is to isolate possible text messages from within the physical image. The goal is to create a file per text message. This will allow for easier filtering and report generation later.

The standard data carving method is to identify where a file was once stored by searching for the first few bytes of known standard file formats as a header and then to extract bytes until a footer is found, a predetermined length is reached, or the data no longer appears valid. Individual text messages, by their nature, are predictably short when compared to other communications. Instead of setting a predefined footer, setting a 200-400 byte limit is generally sufficient to extract the entire message. Unfortunately, most text messages do not have a standardized header. Often, the found text messages are simply proprietary database fragments in unallocated space. Therefore, it is necessary to manually determine a binary marker to use for carving. This can be accomplished using any raw file viewer that allows for searching, or Linux command line tools such as xxd and grep. These tools allow a technician or student to view the file as it is stored on the disk, independent of the file type. They display the file as hexadecimal values and typically provide a window showing the interpretation of those values as text characters.

Text message isolation is performed by carving out each individual message based on a discovered binary marker. This can result in a significant number of files, including partial and duplicate messages. However, with the use of some simple tools, the resulting files can be easy to organize and filter for useful information. For more advanced computer science students, this carving application can be written by the student as an assignment involving binary file IO. For more introductory level courses or more vocational law enforcement training where students have not yet mastered a programming language, any existing data carver that allows user-defined headers and footers can be used. The open source file carver Scalpel is a useful example (Richard III et al. 2005). Its included configuration file contains many standard file formats that users can use for file carving. It also allows the user to add carvers of their own design by editing the configuration file. Newer versions also allow for the use of regular expressions instead of set values for headers and footers. Regular expressions can be especially useful with text messages as they allow for the use of a phone number as a marker for data carving, either as a header or footer depending on the cell phone make/model.

Step one in this model is to identify a binary marker common to all text messages located within the binary dump in question. This marker will likely be unique to the make/model of the particular device from which the binary dump was extracted. The subject binary dump, or raw image, can be loaded in any hex editor that will handle large files. One such example is the Windows tool FTK Imager by AccessData. This tool is provided free and is useful for this purpose. In the Linux environment, the basic commands xxd and grep suffice for image searching. For classroom instruction, the authors use both a graphical tool and the Linux command line depending on preference and the level of technical expertise of the class.
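For classes in which students write the carver themselves, the header-plus-fixed-length approach described above might be sketched in Python roughly as follows. This is a minimal illustration and not the authors' application: the phone-number pattern, the 400-byte length, the function names, and the output file naming are all assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical header: a U.S.-style phone number such as 435-555-0100.
# Both the pattern and the default length are illustrative assumptions.
PHONE_HEADER = re.compile(rb"(?<![0-9])[2-9][0-9]{2}-[2-9][0-9]{2}-[0-9]{4}(?![0-9])")


def carve(data: bytes, header: re.Pattern, length: int = 400) -> list[tuple[int, bytes]]:
    """Return (offset, chunk) pairs: `length` bytes starting at each header hit."""
    return [(m.start(), data[m.start():m.start() + length]) for m in header.finditer(data)]


def write_carved(chunks: list[tuple[int, bytes]], out_dir: str) -> None:
    """Write each carved chunk to its own file, named by its offset in the image."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for offset, chunk in chunks:
        (out / f"sms_{offset:010d}.bin").write_bytes(chunk)


if __name__ == "__main__":
    # Synthetic stand-in for a raw dump; a real image would be read from disk,
    # ideally in pieces, since this sketch holds everything in memory.
    image = b"\x00" * 32 + b"435-555-0100\x00\x01hello there" + b"\xff" * 600
    for offset, chunk in carve(image, PHONE_HEADER):
        print(offset, len(chunk))
```

No footer is used here; each hit simply yields a fixed-size window, which mirrors the length-limited carving described in the text and produces the same kind of partial and duplicate output.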

If the text of the SMS/MMS messages is not in standard ASCII, then additional effort is required, the process changes from triage to analysis, and this model is no longer applicable. For example, some phones use the PDU format or 7-bit GSM format for storing text messages. This involves a 7-bit shifting alphabet that is not interpreted by most hex editors, and thus it will not be searchable (Henry-Labordere 2004). However, most of the phones examined by the authors store text messages in plain ASCII. Identifying text messages within the unstructured data dump can be done with simple word searches. Knowing the phone details is useful for constructing word search lists. However, common words and acronyms are often sufficient. Some examples include: lol, lmao, dude, hi, love you, etc. It is also often useful to search for common expletives. The longer and more unique the word, the lower the chance of false positives, but also the lower the chance of finding any hits at all. At this step the user is trying to locate a sample set of text messages to analyze, as opposed to all text messages on the raw image.

Once several text message examples have been identified, it is useful to print a hard copy of some of the results, including both hexadecimal and ASCII, and to examine this hard copy with highlighters in hand in an attempt to identify markers for carving. The authors have had consistent success with this process as an in-class activity. Once the marker has been identified in the hard copy, students are led through an activity using the Linux commands xxd, grep, and wc to get an estimate of how often the marker appears.

xxd "image name" | grep "keyword" | wc

If keyword searches fail, it is necessary to escalate and determine whether any ASCII or Unicode messages exist at all. A phone should not be dismissed from this method until it has been tested in this way. A program that filters binary images for text is useful in this determination. For example, the command line program Strings effectively pulls ASCII/Unicode from a binary dump. Depending on instructor preference, the analysis can be done from a Linux prompt, or open source versions of Strings are available for Windows. The command filters through a binary image and extracts all consecutive text characters of a predefined length. The default length is typically 4 characters, but the user should confirm this with the program documentation. For example, the following command will create an output file containing all characters that occur in groups of 4 or more:

strings "image file" > out.txt

Commands can vary depending on the version of Strings being used. With the phones tested, four consecutive characters typically results in a very large output file. The consecutive character length can be adjusted by the user during execution to reduce the output. For example:

strings -n 10 "image file" > out.txt

will return all characters that occur in groups of 10 or more and store them in out.txt. Setting this value too low will result in more false positives, and setting this value too high might cause it to miss some information of interest. Utilizing the flag -o can also be helpful, as this causes Strings to output the raw image offset along with the text string, allowing the user to more easily identify that memory location.

Using Strings is a very useful method for extracting information and deciding if there is information worth parsing, but the output can be formidable to work through. Using Strings for text message extraction or searching is a last-ditch effort. In this process, it is used to confirm the absence of messages. Absence, in this case, could be due to encryption, a non-ASCII/non-Unicode message format, or simply a lack of text messages. This can also be performed prior to searching for messages as a check to determine if the image contains useful information.

Assuming that the keyword search does not fail, the student is able to identify markers for carving by examining the areas above and below the extracted sample, as the machine language/file format coding used by the file system to store these files leaves residual and identical hex patterns prior to each SMS message. These markers can then be used to code a solution or used within Scalpel or any other customizable carver to extract each message into a separate file.

Image 1 and Image 2 (Figures 1 and 2) are screenshots of individual files carved in this manner from a chip off extraction performed on a ZTE x501 Cricket phone. (Personal information has been redacted in these images for legal purposes.) This is a phone model commonly used by the authors in class. This technique is especially useful on feature phones marketed by pay-as-you-go providers, such as the ZTE x501. Phones of this type are known as disposable or burner phones. They often do not have working data ports, and direct binary extraction from the NAND memory chip (chip off forensics) is the only option to retrieve a binary image, which often results in an image that has to be carved for data.

Figure 1. Hex editor view of ZTE x501 carved text message

Figure 2. Hex editor view of ZTE x501 carved text message
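Where xxd, grep, or Strings are not available to a class, the keyword count and the text-filtering check described earlier in this section can be approximated in a few lines of Python. The sketch below is a simplified stand-in rather than a reimplementation of Strings: it only recognizes printable ASCII, and the function names and sample bytes are invented for illustration.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for m in pattern.finditer(data):
        yield m.start(), m.group().decode("ascii")


def keyword_hits(data: bytes, keywords: list[str]) -> dict[str, int]:
    """Count case-insensitive occurrences of each keyword in the raw bytes."""
    return {
        kw: len(re.findall(re.escape(kw.encode("ascii")), data, re.IGNORECASE))
        for kw in keywords
    }


if __name__ == "__main__":
    # Invented sample standing in for a slice of a raw image.
    sample = b"\x00\x01lol see you\x00\xffhi\x00LOL\x02dude, call me\x00"
    for offset, text in ascii_strings(sample, 4):
        print(f"{offset:#010x}  {text}")
    print(keyword_hits(sample, ["lol", "dude"]))
```

Unlike the xxd and grep pipeline, this searches the raw bytes directly, so a keyword is not missed merely because it straddles two lines of hex dump output. Raising `min_len` trades fewer false positives for a greater chance of missing short messages, as noted above.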

In the case of this example, no header was identified, but as is common, each text message includes a phone number, and in this case, the phone number occurs at the top of the message. This was entered into the Scalpel configuration file as:

SMS y 400 /[^0-9][2-9][0-9][0-9]-[2-9][0-9][0-9]-[0-9][0-9][0-9][0-9][^0-9]/

This line tells Scalpel to carve using a U.S. format phone number, represented by a regular expression, as a header, and to carve out 400 bytes at a time. No footer was used in this example. However, the reverse can also be performed, using the phone number as a footer and a hex pattern as a header, in cases where the phone number comes after the message. For this example, students recover approximately 400 individual hits. The majority (85%) of these are false hits and duplicates. Having a vast majority of recovered messages not be useful is not uncommon in real practice. Depending on the class, some simple filtering can be explored that greatly reduces the results. If students code the application themselves, this problem can be introduced as a next step. If the students have not mastered a programming language yet, a simple script can be demonstrated that eliminates all duplicates and all files that do not contain genuine messages. In this example, the introduced script reduces the results down to 60 individual files. Further manual examination brings the amount down to 30 unique messages. It is important to incorporate scripting and intelligent searching at this phase and teach students to work smarter, not harder.

Prior to the development of physical processes such as chip off extraction and JTAG, text messages were often recorded by investigators who took pictures of the phone screen. A report compiled using this method on the same phone prior to memory extraction is shown to the class to demonstrate that not all of these messages were available through the cell phone interface, indicating that some of the messages recovered were deleted messages. This adds value to the method demonstrated to the students and emphasizes that it is useful even if it is not the only option available.

As previously mentioned, phones used in class are purchased used from various sources such as eBay and Goodwill. The phone models can be chosen based on the applicability of this technique. For institutes that do not have access to chip off equipment, there are software methods for obtaining images that do not require direct data extraction from the NAND chips.

The steps are introduced as a triage-type process and are especially useful when the primary goal is to determine if the phone is worth additional effort. It is important to emphasize that this is triage, not forensic analysis. No file system interpretation is performed, and certain assumptions are made about the data. In summary, the steps are:

1. Searching for keywords to identify message examples within the raw file (if no keywords are found, Strings can be used to verify the absence of messages).

2. Identifying markers that can be used as headers and/or footers.

3. Carving out individual messages using these markers with an automated carving tool or student-developed application.

4. Filtering the resulting files for duplicates and uninteresting output.
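Step 4 lends itself to a short scripted demonstration. The following Python sketch is one possible way to do it, not the script used by the authors: it treats a carved chunk as "genuine" only if it contains a minimum amount of printable text, and it removes duplicates by hashing that text. The 20-character threshold and the function names are assumptions chosen for the example.

```python
import hashlib
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def message_text(chunk: bytes) -> str:
    """Join the printable ASCII runs in a carved chunk into one string."""
    return " ".join(run.decode("ascii") for run in PRINTABLE_RUN.findall(chunk))


def filter_chunks(chunks: list[bytes], min_text: int = 20) -> list[str]:
    """Drop chunks with too little text, then drop chunks whose text repeats."""
    seen = set()
    kept = []
    for chunk in chunks:
        text = message_text(chunk)
        if len(text) < min_text:  # assumed threshold for "no genuine message"
            continue
        digest = hashlib.sha256(text.encode("ascii")).hexdigest()
        if digest in seen:  # same text already reported once
            continue
        seen.add(digest)
        kept.append(text)
    return kept


if __name__ == "__main__":
    a = b"435-555-0100\x00\x01see you at noon\x00" + b"\xff" * 50
    b = b"435-555-0100\x00\x01see you at noon\x00" + b"\x00" * 50
    c = b"435-555-0100" + b"\xff" * 80
    print(filter_chunks([a, b, c]))
```

Hashing the extracted text rather than the raw bytes is a deliberate choice here: two carved windows can hold the same message but differ in their trailing padding, and a byte-for-byte comparison would keep both.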

The recovered individual files can easily be combined into a basic report using Strings to extract all text content, echo to write a line of asterisks between each record, and redirection to combine the outcome into a text document. The development of this command is part of the in-class work of the students.

In a classroom setting, the authors introduce a short script to combine all files into a single output file, with each line representing the text of a single file. This results in a very basic report containing one message per line. In an effort to generate a more useful report, it is advisable to also recover the time the message was sent/received, if possible. The following section will discuss this important additional step.
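A one-message-per-line report of the kind described above could be produced along the following lines. This is only a sketch under stated assumptions: the directory layout, the function names, and the choice to keep printable ASCII runs of four or more characters are all hypothetical, and the authors' own script is not reproduced here.

```python
import re
from pathlib import Path

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def report_lines(chunks: list[bytes]) -> list[str]:
    """Reduce each carved chunk to a single line of its printable text."""
    return [
        " ".join(run.decode("ascii") for run in PRINTABLE_RUN.findall(chunk))
        for chunk in chunks
    ]


def build_report(carved_dir: str, report_path: str) -> int:
    """Write one report line per carved file; return the number of lines written.

    The report should be written outside carved_dir so it is not read as input.
    """
    files = sorted(p for p in Path(carved_dir).iterdir() if p.is_file())
    lines = report_lines([p.read_bytes() for p in files])
    Path(report_path).write_text("\n".join(lines) + "\n", encoding="ascii")
    return len(lines)


if __name__ == "__main__":
    demo = [b"435-555-0100\x00hello there\xff", b"\x01801-555-0199\x00call me back\x00"]
    for line in report_lines(demo):
        print(line)
```

Sorting the file names keeps the report in image-offset order when the carved files are named by offset, which makes it easier to trace a report line back to its location in the dump.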

4. TIME STAMP IDENTIFICATION

Section 3 discusses a method for basic data carving. Classroom instruction could stop prior to time stamp identification if demonstrating the mechanics of data carving is the only goal. Instruction could also logically move from the recovery of ASCII to the extraction of non-text-based files and the more advanced techniques used by automated tools. However, if this process is being introduced as part of a digital forensics course or the instructor wants to complete the real-world scenario, the next logical step would be to extract additional information to chronologically order and corroborate the messages with the case. This means finding and interpreting the message timestamp, if present. This section will discuss that process.

Timestamp information, although important, adds a more complicated level of analysis. Timestamps are typically stored as the number of time intervals from a specific date, commonly referred to as an epoch. For example, Unix time, also called POSIX time, is the number of seconds since January 1st, 1970. Windows time is stored as the number of 100 nanosecond intervals since January 1st, 1601. Mobile devices, which often utilize their own proprietary file systems, use the epoch or time interval of their choosing. For example, the Brew file system (an open source pseudo operating system) utilizes a time epoch of January 6, 1980. Other systems, such as many Samsung devices, use a FAT file system derivative and fall back on a January 1, 1980 time epoch (Henry-Labordere 2004). This can complicate hex value interpretations. Table 1 (shown as Figure 3) lists some sample time epochs for known systems to demonstrate this obstacle.

The authors often use the same method for locating timestamps as they do for identifying binary markers: print out hard copies of the found samples and use highlighters to examine the printouts. The key is observing changing patterns. Logically, the timestamp should occur at close to the same binary offset within each message and should change only slightly between text messages.
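To make the epoch problem concrete, the short Python sketch below interprets the same raw number under the epochs mentioned in the text. It is an illustration only: treating every epoch as UTC and treating the January 1, 1980 value as a plain count of seconds are simplifying assumptions, and the dictionary keys and function names are invented.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Second-based epochs named in the text (assumed to be UTC for this sketch).
SECOND_EPOCHS = {
    "unix": datetime(1970, 1, 1, tzinfo=UTC),
    "jan1_1980": datetime(1980, 1, 1, tzinfo=UTC),
    "brew": datetime(1980, 1, 6, tzinfo=UTC),
}
WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=UTC)


def interpret_seconds(raw: int) -> dict[str, datetime]:
    """Interpret a raw count of seconds under each second-based epoch."""
    return {name: epoch + timedelta(seconds=raw) for name, epoch in SECOND_EPOCHS.items()}


def interpret_filetime(raw: int) -> datetime:
    """Interpret a raw count of 100-nanosecond intervals since January 1, 1601."""
    return WINDOWS_EPOCH + timedelta(microseconds=raw // 10)


if __name__ == "__main__":
    for name, when in interpret_seconds(846759600).items():
        print(f"{name:10s} {when.isoformat()}")
```

The same four bytes land roughly ten years apart depending on which epoch is assumed, which is why the choice of epoch has to be justified rather than guessed.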

The timestamp information can be stored in little-endian order or big-endian order. This means the hexadecimal values observed in the editor will be read left to right or right to left, depending on how the file system stores the timestamp. This is a common complication in file analysis when a file map is not available. This ordering is most commonly at the byte level, which is represented by two hexadecimal digits per byte in hexadecimal editors (although it can be in reverse nibble, which would reverse the entire hexadecimal sequence). For example, the date October 31st, 1996, 11:00:00 AM would be stored as 846759600 seconds in POSIX time. In a hex editor, it would look like 0x327886B0 in big endian and 0xB0867832 in little endian. Note that the pattern is reversed in sequences of two. Once the ordering is understood, the conversion can be done within a variety of hex editors. Many open source tools and Web pages will also perform these conversion tasks easily. It is also an interesting problem to provide as an assignment for more advanced students.

Figure 3. Timestamp epoch samples with associated systems
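For students who take on the conversion as an assignment, the byte-order example above can be checked with Python's standard `struct` module. The helper names below are made up for this sketch; the values are the ones given in the text.

```python
import struct
from datetime import datetime, timezone


def decode_u32(raw: bytes) -> dict[str, int]:
    """Decode 4 bytes as an unsigned 32-bit integer in both byte orders."""
    return {
        "big": struct.unpack(">I", raw)[0],
        "little": struct.unpack("<I", raw)[0],
    }


def swap_nibbles(raw: bytes) -> bytes:
    """Swap the high and low nibble of every byte (0xB0 becomes 0x0B)."""
    return bytes(((b & 0x0F) << 4) | (b >> 4) for b in raw)


if __name__ == "__main__":
    seconds = decode_u32(bytes.fromhex("327886B0"))["big"]
    print(seconds, datetime.fromtimestamp(seconds, tz=timezone.utc))
    print(decode_u32(bytes.fromhex("B0867832"))["little"])
    print(swap_nibbles(bytes.fromhex("B0867832")).hex())
```

Applying `swap_nibbles` to the little-endian bytes yields the hexadecimal digits of the big-endian value in fully reversed order, which is the reverse nibble case mentioned above.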


The pattern to look for within a hex editor is a number, as described, that does not change much from message to message. The length will vary depending on the time unit used, but seconds stored in 4 bytes is very common. Time intervals between text messages within a conversation will not span long periods of time. The previous example was based in the year 1996. In POSIX time, 1996 starts at 820454400 seconds and ends at 852076799 seconds. These are hex strings 0x30E72400 and 0x32C9A8FF, respectively. Within the entire year, the highest order byte goes from 0x30 to 0x32. The month of October would be 844128000 seconds through 846806399 seconds. These hex strings would be 0x32505F00 and 0x32793D7F, respectively. In this case, the highest order byte, 0x32, is the same for the entire month. The closer the text messages are in time, the more higher order values will be the same. Searching for and identifying timestamps is a great in-class exercise and demonstrates concepts within file structure and time stamp storage.
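The range arithmetic above is easy to reproduce, and doing so gives students a way to work out which leading byte values to look for in any date window. The following Python sketch assumes 32-bit POSIX seconds in UTC; the function name is invented for the example.

```python
from datetime import datetime, timezone

UTC = timezone.utc


def leading_bytes(start: datetime, end: datetime) -> list[int]:
    """High-order byte values a 32-bit POSIX timestamp can take in [start, end]."""
    lo = int(start.timestamp()) >> 24
    hi = int(end.timestamp()) >> 24
    return list(range(lo, hi + 1))


if __name__ == "__main__":
    year = leading_bytes(datetime(1996, 1, 1, tzinfo=UTC),
                         datetime(1996, 12, 31, 23, 59, 59, tzinfo=UTC))
    october = leading_bytes(datetime(1996, 10, 1, tzinfo=UTC),
                            datetime(1996, 10, 31, 23, 59, 59, tzinfo=UTC))
    print([hex(b) for b in year], [hex(b) for b in october])
```

One high-order byte covers 2^24 seconds, a little over 194 days, so timestamps from a single conversation will almost always share it.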

As part of the real-world scenario, it should be explained that unless documentation can be found and cited that describes how timestamps are stored on the system under examination, there should always be a disclaimer included with the reported results. This disclaimer should explain that the timestamps displayed are only an estimate and need to be confirmed, or the method used to confirm their accuracy should be presented. As this is used as a triage-type method, this is an acceptable compromise as long as the requesting agent understands the limitations of the timestamps provided. Composing such a disclaimer can and should be included as part of the instruction.

For example, Image 3 and Image 4 (Figures 4 and 5) are the same as Image 1 and Image 2, respectively, with the perceived timestamps underlined. Image 3's timestamp is 0x3E2BE48D, which translates to a date of Mon Jan 20 04:59:09 2003 in POSIX time, and Image 4's timestamp is 0x3E25CA47, which translates to a date of Wed Jan 15 13:53:27 2003 in POSIX time. (It should also be noted that the timestamps do not occur at exactly the same offset, causing difficulty with automated extraction.)

Figure 4. Hex editor view of ZTE x501 carved text message with timestamp underlined

Figure 5. Hex editor view of ZTE x501 carved text message with timestamp underlined

Once found, these dates can be corroborated with other messages. In the absence of file system corroborative information, the following three questions should be asked by the examiner/student:

1. Could the timestamps be a product of random chance, or do the timestamps occur at the same or similar offsets?

2. Does ordering the messages chronologically by this timestamp appear to produce a logical conversation?

3. Do the timestamps found provide reasonable dates based on the make/model of the phone and the case details?

In this example, question 3 fails. The year 2003 is not reasonable based on the case information and phone type. The make and model of the device examined (ZTE X501) was not approved for use by the FCC until February 24, 2012, and therefore was not in use in the year 2003. However, questions 1 and 2 indicate that the time interval is reasonably correct; therefore, the epoch is the likely problem. As stated earlier, the Brew file system, commonly used on many CDMA feature phones, uses seconds as time intervals, but its epoch is 10 years and 5 days after the Unix epoch, which was the epoch originally used in this example. Advancing the timestamp in this example by the Brew offset moves the dates to a reasonable range in the year 2013.

Even if not exact, any timestamps recovered can be useful. They can be used for the chronological ordering of messages, and if the phone is still in working order, comparing the timestamps of a few raw text messages to their counterparts as viewed on the live device can reveal the adjustment necessary to make the recovered timestamps accurate. Once the epoch offset is obtained, time stamps can be more accurately reported, taking care that the adjustments consider leap years and other caveats. This activity emphasizes that timestamps should be used with great care when presenting the results to the requesting agent. When presented with data obtained through this triage model, the requesting agent must be made to understand that the provided timestamps must be confirmed prior to use in court. The importance of exact timestamp values is dependent on the case under evaluation, but it is vital that these timestamps are not referred to as the correct time unless they have been validated, and even then, they should be reported as computed timestamps, not actual timestamps.
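The adjustment described above can be worked through in code. The Python sketch below derives an epoch offset from pairs of raw values and known times, then applies it. The function names are hypothetical, and the "known" times in the demonstration are synthesized from the Brew offset for illustration; they are not readings from the actual device.

```python
from datetime import datetime, timezone
from statistics import median

UTC = timezone.utc


def estimate_epoch_offset(pairs: list[tuple[int, datetime]]) -> int:
    """Median difference, in seconds, between known times and raw stored values."""
    return int(median(int(known.timestamp()) - raw for raw, known in pairs))


def apply_offset(raw: int, offset: int) -> datetime:
    """Convert a raw value to a computed UTC time using an epoch offset."""
    return datetime.fromtimestamp(raw + offset, tz=UTC)


if __name__ == "__main__":
    # Seconds between the Unix epoch and January 6, 1980 (10 years and 5 days).
    brew = int((datetime(1980, 1, 6, tzinfo=UTC) - datetime(1970, 1, 1, tzinfo=UTC)).total_seconds())
    for raw in (0x3E2BE48D, 0x3E25CA47):
        print(hex(raw), apply_offset(raw, brew).isoformat())
```

Using the median of several pairs, rather than a single comparison, limits the effect of one mismatched message. Because the arithmetic goes through calendar-aware date objects, leap years are handled by the library rather than by hand, and the output is still a computed timestamp in UTC that needs to be reported as such.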

5. PROCESS AUTOMATION

The lengthier part of this method is the examination of the individual files created during the carving process. The example presented produced 500 individual files that were eventually reduced to 30 actual messages. This is not necessarily typical. The number of individual messages can range from a few hundred to thousands. The quantity of messages found on a device is dependent on the size of the memory on the phone, the operating system installed on the device, the tendency of the user to write and receive SMS, and the markers used for carving the data. Extracting corrupted files and duplicates is a common problem with any type of data carving and should be discussed with students. Recovering the data is only part of the solution. Practitioners, whether doing data recovery or digital forensics, need to consider the client's needs. Presenting the client with an overwhelming number of files to sort through is not good practice, and providing a simple report with just text messages and no timestamps may not be sufficient.

While processes such as finding the markers, carving out the data, and extracting the timestamp can be performed manually, time constraints hold that it is better if the filtering, timestamp extraction, and reporting are automated. The authors typically demonstrate this automation with the use of Perl scripts. Perl was chosen for its strong regular expression ability that can facilitate the filtering of messages (Christiansen et al. 2012). However, any scripting or programming language with which the instructor is familiar should suffice. Published examples exist of similar uses for scripting during digital forensics (Garfinkel 2009; Cantrell et al. 2013). The authors provide sample scripts and teach this process during academic, collegiate, and law enforcement training. Including how to create these automation scripts here is beyond the scope of this work.

How much scripting to include in the classroom environment depends on the technical level of the class. Scripts can be introduced as complete with instructions on how to run them, scripts can be provided with the instruction that they need to be adapted or edited to fit the subject binary dump, or scripts can be created from scratch by the students.

During script execution, each file is accessed; a check is run to see if it has been encountered previously; some simple tests are run to see if it is a valid message. The output is then filtered for ASCII characters and divided into at least phone number and message, but it can also include incoming/outgoing, timestamp, and to/from columns. This output is then sent to a comma-separated value file that can be further manually edited for readability with a spreadsheet program.

Optionally, timestamps are extracted and converted to human-readable dates. This is a simple matter if they occur at predictable offsets, but as shown in the previous example, this is not always the case. As also discussed, the higher-order values should fall within a predictable range once identified. For example, as already described, if one were looking for timestamps within the year 1996, the script could be written to scan for the hex values 0x30, 0x31, and 0x32. This can alert the program that the following, or the previous, depending on ordering, 3 bytes can be combined with this byte to form a plausible date. In the case of reverse nibble, the search would simply be reversed to 0x03, 0x13, and 0x23.
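The high-order byte scan described above might look like the following Python sketch. It is an illustration under stated assumptions rather than the authors' Perl: it assumes a 4-byte value counted in seconds from the Unix epoch, and the names `swap_nibbles` and `find_candidate_timestamps` and their parameters are hypothetical. The byte order and the reverse-nibble option are left as switches because they vary by device.

```python
from datetime import datetime, timezone


def swap_nibbles(byte):
    """Exchange the high and low 4 bits of one byte (0x31 becomes 0x13)."""
    return ((byte & 0x0F) << 4) | (byte >> 4)


def find_candidate_timestamps(data, high_bytes=(0x30, 0x31, 0x32),
                              big_endian=True, reverse_nibble=False):
    """Scan every offset for a 4-byte value whose high-order byte is plausible.

    Returns a list of (offset, value, utc_datetime) tuples. The default
    high_bytes cover the 1996 example from the text, assuming a Unix epoch.
    """
    hits = []
    for offset in range(len(data) - 3):
        chunk = data[offset:offset + 4]
        if reverse_nibble:
            # Undo the nibble swap before interpreting the bytes.
            chunk = bytes(swap_nibbles(b) for b in chunk)
        high = chunk[0] if big_endian else chunk[3]
        if high in high_bytes:
            value = int.from_bytes(chunk, "big" if big_endian else "little")
            when = datetime.fromtimestamp(value, tz=timezone.utc)
            hits.append((offset, value, when))
    return hits
```

Each hit is only a candidate: the caller would still check that the decoded date falls in the expected range before placing it in a report.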

The narrower the user can make the search, the lower the chance of a false positive. In addition, because the carved files are so small, the probability of a random match is reduced, leading to few or no false positives. Of the phones evaluated by the authors at present, few have produced false positives due to random chance when searching in this manner.

Automating the process is introduced as adaptive scripting, not as complete software development. In the experience of the authors, it is more useful to have multiple adjustable scripts than to try to create a single robust application. The variety of phones an examiner will see in the lab, combined with constantly changing technology, makes it difficult for one application to suffice.
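As one illustration of the kind of small, adjustable script this section describes, the Python sketch below walks a directory of carved files, skips duplicates, applies a simple validity test, keeps printable ASCII, and writes a CSV report. It is a sketch under assumptions, not the authors' script: the phone-number pattern, the rule that the message is the longest printable run after the number, and the names `parse_carved` and `build_report` are all hypothetical and would need adapting to each device.

```python
import csv
import hashlib
import re
from pathlib import Path

# Assumptions: a number is 7 to 15 digits with an optional leading '+',
# and message text is a run of at least 4 printable ASCII characters.
PHONE_RE = re.compile(rb"\+?\d{7,15}")
PRINTABLE_RE = re.compile(rb"[\x20-\x7e]{4,}")


def parse_carved(blob):
    """Return (phone_number, message) or None if the blob fails the tests."""
    phone = PHONE_RE.search(blob)
    if phone is None:
        return None
    runs = PRINTABLE_RE.findall(blob[phone.end():])
    if not runs:
        return None
    message = max(runs, key=len).decode("ascii")
    return phone.group().decode("ascii"), message


def build_report(carved_dir, report_path):
    """Deduplicate and filter carved files, then write a CSV report."""
    seen = set()
    rows = []
    for path in sorted(Path(carved_dir).iterdir()):
        if not path.is_file():
            continue
        blob = path.read_bytes()
        digest = hashlib.sha256(blob).hexdigest()
        if digest in seen:  # content already encountered in another file
            continue
        seen.add(digest)
        parsed = parse_carved(blob)
        if parsed is not None:
            rows.append((path.name, *parsed))
    with open(report_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["source_file", "phone_number", "message"])
        writer.writerows(rows)
    return rows
```

Keeping the device-specific patterns at the top of the file is what makes such a script easy to adjust for the next phone without rewriting the surrounding logic.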

6. AGENT FEEDBACK

This process has been described numerous times in this manuscript as a triage method. As such, it is presented to the client as an informative report only. It is vital that students include disclaimers and make this clear when presenting the information. Part of the process is to present the information to the client. If the client wishes to use the information as court-admissible evidence, further analysis can be done. In the lab, the authors typically request that the client mark any messages of importance so that further analysis can be done if needed. For example, further analysis could be the reverse engineering of the phone by purchasing a like phone, planting messages at known times, and extracting them for analysis.

It is the experience of the authors that such further analysis is seldom requested. Even if the triage report has limits as court-admissible evidence, it still serves as good intelligence. Often the client will not come back for further analysis, either because they learn the phone text messages do not contain useful evidence or because the information suffices as intelligence only. In either case, the triage report often suffices, freeing up valuable lab time for the examination of other devices.

7. DIGITAL TRIAGE METHOD SUMMARY

As a small device digital triage method, the entire process presented is as follows:

1. Text messages are searched for using keywords.

2. Text messages are evaluated for markers.


3. Text messages are isolated into individual files using custom carvers.

4. Individual files are filtered for duplicates and viable content.

5. Timestamp evaluation is performed.

6. Report is compiled.

7. Report is presented to the requesting agent for feedback.

8. Further evaluation is done if requested.
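Step 3 of the list above, isolating messages with a custom carver, can be sketched as follows. This is a minimal illustration, not the authors' carver: the header and footer byte markers are placeholders that would come from the marker evaluation in step 2, the 512-byte cap is an arbitrary assumption, and the name `carve_records` is hypothetical.

```python
def carve_records(dump, header, footer=None, max_len=512):
    """Carve one record per occurrence of the header marker.

    Each record runs from the header to the first footer found within
    max_len bytes, or for max_len bytes if no footer is given or found.
    Returns a list of (offset, record_bytes) tuples.
    """
    records = []
    start = dump.find(header)
    while start != -1:
        limit = min(start + max_len, len(dump))
        end = limit
        if footer is not None:
            found = dump.find(footer, start + len(header), limit)
            if found != -1:
                end = found + len(footer)
        records.append((start, dump[start:end]))
        # Resume the search one byte on so overlapping headers are not missed.
        start = dump.find(header, start + 1)
    return records
```

Each returned record could then be written to its own file, named by its offset in the dump, so that the filtering in step 4 can trace every message back to its location in the original data.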

As a classroom exercise, steps 7 and 8 will likely not occur, but as a real-world problem it should be made clear that these steps are necessary. Triage does not result in admissible evidence, but the information presented can be used to determine if further analysis is warranted, to establish a plea bargain, or to provide quick intelligence in emergency situations.

8. CONCLUSION AND REFLECTION

Data carving is the extraction of useful information from within unstructured binary data. It is commonly used for data recovery. This manuscript presented a real-world problem used by the authors to teach how data carving works. At several points, the authors tried to present options for how an educator might take the problem further by requiring students to develop their own full application or simple scripts. Coding details were left out of this manuscript to prevent students from using this paper as a solution guide.

The authors have used this real-world scenario to teach data carving in both digital forensics based courses and more traditional computer science courses. They have observed that the use of a real-world problem increases student interest and participation. In fact, the authors have been able to apply student innovation to their digital forensics practice and develop more elegant and efficient solutions for actual case work.

It is the hope of the authors of this manuscript that this teaching method will be of use to other digital forensics and computer science educators.

REFERENCES

[1] Breeuwsma, M., de Jongh, M., Klaver, C., van der Knijff, R., and Roeloffs, M. (2007), “Forensic Data Recovery from Flash Memory,” Small Scale Digital Device Forensics Journal, vol. 1, no. 1, pp. 1–17.

[2] Cantrell, G. and Dampier, D. (2013), “Implementing the automated phases of the partially-automated digital triage process model,” Journal of Digital Forensics, Security and Law, vol. 7, no. 4.

[3] Cantrell, G., Dampier, D., Dandass, Y., Niu, Y., and Bogen, C. (2012), “Research Toward a Partially-automated, and Crime Specific Digital Triage Process Model,” Computer and Information Science, vol. 5, no. 2, pp. 29–38.

[4] Christiansen, T., D Foy, B., Wall, L., and Orwant, J. (2012), “Programming perl: Unmatched power for text processing and scripting, Fourth edition,” O’Reilly Media, Sebastopol, CA.

[5] Garfinkel, S. L. (2009), “Automating Disk Forensic Processing with SleuthKit, XML and Python,” Systematic Approaches to Digital Forensic Engineering, 2009, pp. 73–84.

[6] Henry-Labordere, A. (2004), “SMS and MMS interworking in mobile networks,” Artech House, Norwood, MA.

[7] Gilbert, Adam M., Hunt,nd Winch, Kenneth C., (1997,

[8] Lessard, J. and Kessler, G. (2010), “Android forensics: Simplifying cell phone examinations,” Small Scale Digital Device Forensics Journal, vol. 4, no. 1.

[9] McCarthy, P. (2005), Forensic Analysis of Mobile Phones. University of South Australia, School of Computer and Information Science. Mawson Lakes: University of South Australia.

[10] Mislan, R.P., Casey, E., and Kessler, G.C. (2010), “The growing need for on-scene triage of mobile devices,” Digital Investigation, vol. 6, no. 3-4, pp. 112–124.

[11] Richard III, G. and Roussev, V. (2005), “Scalpel: A frugal, high performance file carver,” Digital Forensics Research Workshop, New Orleans, LA.

[12] Walls, R., Levine, B., and Learned-Miller, E. (2011), “Forensic triage for mobile with DEC0DE,” USENIX Symposium (2011). Available at: works.bepress.com/erik_learned_miller/52

[13] Zimmermann, C., Spreitzenbarth, M., and Schmitt, S. (2011), Reverse Engineering of the Android File System (YAFFS2). Technical Report CS-2011-06, Friedrich-Alexander-University of Erlangen-Nuremberg.