9
Forensic analysis of video le formats Thomas Gloe a, * , André Fischer a , Matthias Kirchner b a dence GmbH, Dresden, Germany b University of Münster, Department of Information Systems, Münster, Germany Keywords: Digital forensics File format Metadata Video Authentication abstract Video le format standards dene only a limited number of mandatory features and leave room for interpretation. Design decisions of device manufacturers and software vendors are thus a fruitful resource for forensic video authentication. This paper explores AVI and MP4-like video streams of mobile phones and digital cameras in detail. We use customized parsers to extract all le format structures of videos from overall 19 digital camera models, 14 mobile phone models, and 6 video editing toolboxes. We report considerable differ- ences in the choice of container formats, audio and video compression algorithms, acquisition parameters, and internal le structure. In combination, such characteristics can help to authenticate digital video les in forensic settings by distinguishing between original and post-processed videos, verifying the purported source of a le, or identifying the true acquisition device model or the processing software used for video processing. ª 2014 The Authors. Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Introduction Methods to verify the authenticity of media data are of growing relevance in our digital world. While most con- sumer devices lack practical authentication support at all, attacks against professional camera authentication systems have demonstrated weaknesses of existing in-device solu- tions. 1 Forensic techniques to infer the provenance and the processing history of media les ex post have thus gained more and more interest among researchers and practi- tioners. Forensic image analysis has been the main driver of the eld (Sencar and Memon, 2013), but also video les have recently been brought to the forefront (Milani et al., 2012b). Resembling the evolution of digital image foren- sics, an already ample body of literature approaches the problem of video forensics through the analysis of inherent device characteristics or processing artifacts in the video data (Chen et al., 2007; Wang and Farid, 2007; Hsu et al., 2008; Conotter et al., 2011; Stamm et al., 2012; Vázquez- Padín et al., 2012, amongst others). File format information and metadata are another source of forensic evidence, but have generally received less attention. Existing works mainly focus on digital images. Here, basic JPEG (ISO/IEC 10918-1, ITU-T Recommendation T.81, 1992) and EXIF (Japan Electronics and Information Technology Industries Association, 2002) metadata proper- ties have gained major interest. Digital cameras and image processing software (or groups thereof) use customized quantization tables. Differences therein can narrow down the source device of a questioned image (Farid, 2008; Kornblum, 2008). Characteristics of thumbnail images (often saved as JPEG images themselves) have been reported to be another pool of forensically relevant features (Murdoch and Dornseif, 2004; Kee and Farid, 2010). In one of the most elaborate approaches, Kee et al. (2011) combine image and thumbnail compression parameters, image and thumbnail dimensions and the number of EXIF entries into signatures of camera model or processing software cong- urations. By testing against a reference database, images of unknown or uncertain provenance can be attributed to a * Corresponding author. E-mail addresses: [email protected] (T. Gloe), matthias.kirchner@ uni-muenster.de (M. Kirchner). 1 http://www.elcomsoft.de/nikon.html http://www.elcomsoft.de/ canon.html. Contents lists available at ScienceDirect Digital Investigation journal homepage: www.elsevier.com/locate/diin http://dx.doi.org/10.1016/j.diin.2014.03.009 1742-2876/ª 2014 The Authors. Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http:// creativecommons.org/licenses/by-nc-nd/3.0/). Digital Investigation 11 (2014) S68S76

Forensic Analysis of Video Formats

  • Upload
    axyy

  • View
    406

  • Download
    1

Embed Size (px)

DESCRIPTION

Forensic Analysis of Video Formats - stalni sudski vještak za informatiku i telekomunikacije dr.sc. Saša Aksentijević

Citation preview

  • ats

    rchn

    ny

    andartion.esourams ol le

    y of mrld.ticatiouthenxistin

    problem of video forensics through the analysis of inherentdevice characteristics or processing artifacts in the video

    ameras and imagef) use customizedcan narrow downage (Farid, 2008;thumbnail imageshave been reported

    to be another pool of forensically relevant features(Murdoch and Dornseif, 2004; Kee and Farid, 2010). In oneof the most elaborate approaches, Kee et al. (2011) combineimage and thumbnail compression parameters, image andthumbnail dimensions and the number of EXIF entries intosignatures of camera model or processing software cong-urations. By testing against a reference database, images ofunknown or uncertain provenance can be attributed to a

    * Corresponding author.E-mail addresses: [email protected] (T. Gloe), matthias.kirchner@

    uni-muenster.de (M. Kirchner).1 http://www.elcomsoft.de/nikon.html http://www.elcomsoft.de/

    canon.html.

    Contents lists available at ScienceDirect

    Digital Inve

    journal homepage: www.e

    Digital Investigation 11 (2014) S68S76more and more interest among researchers and practi-tioners. Forensic image analysis has been themain driver ofthe eld (Sencar and Memon, 2013), but also video leshave recently been brought to the forefront (Milani et al.,2012b). Resembling the evolution of digital image foren-sics, an already ample body of literature approaches the

    ties have gained major interest. Digital cprocessing software (or groups thereoquantization tables. Differences thereinthe source device of a questioned imKornblum, 2008). Characteristics of(often saved as JPEG images themselves)tions.1 Forensic techniques to infer the provenance and theprocessing history of media les ex post have thus gained

    T.81, 1992) and EXIF (Japan Electronics and InformationTechnology Industries Association, 2002) metadata proper-attacks against professional camera ahave demonstrated weaknesses of eIntroduction

    Methods to verify the authenticitgrowing relevance in our digital wosumer devices lack practical authenhttp://dx.doi.org/10.1016/j.diin.2014.03.0091742-2876/ 2014 The Authors. Published by Elsevcreativecommons.org/licenses/by-nc-nd/3.0/).help to authenticate digital video les in forensic settings by distinguishing betweenoriginal and post-processed videos, verifying the purported source of a le, or identifyingthe true acquisition device model or the processing software used for video processing. 2014 The Authors. Published by Elsevier Ltd on behalf of DFRWS. This is an open accessarticle under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

    edia data are ofWhile most con-n support at all,tication systemsg in-device solu-

    data (Chen et al., 2007; Wang and Farid, 2007; Hsu et al.,2008; Conotter et al., 2011; Stamm et al., 2012; Vzquez-Padn et al., 2012, amongst others).

    File format information and metadata are another sourceof forensic evidence, but have generally received lessattention. Existing works mainly focus on digital images.Here, basic JPEG (ISO/IEC 10918-1, ITU-T Recommendationences in the choice of container formats, audio and video compression algorithms,acquisition parameters, and internal le structure. In combination, such characteristics can14 mobile phone models, and 6 video editing toolboxes. We report considerable differ-Forensic analysis of video le form

    Thomas Gloe a,*, Andr Fischer a, Matthias Kiadence GmbH, Dresden, GermanybUniversity of Mnster, Department of Information Systems, Mnster, Germa

    Keywords:Digital forensicsFile formatMetadataVideoAuthentication

    a b s t r a c t

    Video le format stroom for interpretaare thus a fruitful rMP4-like video streparsers to extract alier Ltd on behalf of DFRWSer b

    ds dene only a limited number of mandatory features and leaveDesign decisions of device manufacturers and software vendorsce for forensic video authentication. This paper explores AVI andf mobile phones and digital cameras in detail. We use customizedformat structures of videos from overall 19 digital camera models,

    stigation

    lsevier .com/locate/di in. This is an open access article under the CC BY-NC-ND license (http://

  • 2006; Milani et al., 2012a). The Adobe Premiere toolboxwas a sole exception in our test set in this regard. Thecommercial software is one of the major professional videoediting tools, but does not support lossless processing.

    We have written our own customized le parser(s) to

    T. Gloe et al. / Digital Investigation 11 (2014) S68S76 S69class of source congurations. Images are agged as suspi-cious if no match is found. In a different approach, Fan et al.(2013) exploit noise characteristics to determine whetherimage content and EXIF data are consistent. However, astampering with compression parameters or EXIF entries isonly a question of using proper software tools (which areoften publicly available), concerns have frequently beenexpressed that high-level le format information and met-adata are easily replaceable and/or forgeable (Sencar andMemon, 2008). On the contrary, existing image processingsoftware and metadata editors do not allow users to accessor to modify core le structures. Along these lines, Gloe(2012) reports that peculiarities in the specic internalorder of JPEG and EXIF structures are particularly valuableand distinctive information for digital image authentication.Such low-level characteristics thus offer a much increasedreliability and relaxdto some degreedthe commonassumption that le format and metadata information shallnot be trusted per se.

    Following this trail, this paper extends the idea of leformat forensics to popular digital video data containerformats, for whichdto the best of our knowledgednosystematic exploration of le format and/or metadataspecics has been reported in the forensic literature so far.Similar to differences in the JPEG le structure, we identifymanufacturer- and model-specic video le format char-acteristics and point to traces left by processing software.Such traces can be used to authenticate digital videostreams and to attribute recordings of unknown or ques-tionable provenance to (groups of) video camera models.Note that this differs from video le carving (Pal andMemon, 2009; Lewis, 2012), where it is usually sufcientto nd valid video streams in (fragmented) mass storagedumps. Based on a description of our (Test setup) and(General observations) on video le format forensics, thefollowing sections demonstrate that peculiarities of the(AVI Container format), of (Quicktime and related containerformats (MP4, 3GP)), and of (MJPEG Compressionparameters) can yield important insights about prove-nance and processing history of digital videos. The papercloses with a (Summary and concluding remarks).

    Test setup

    We report ndings from the examination of each onedevice of overall 19 digital camera models and 14 mobilephone models, all of them equipped with video capturingfunctionality. We acquired 3 to 14 videos per device byiterating over all available video quality settings (e.g., framesize and frame rate). Mobile phones were also switchedbetween regular and MMS (Multimedia Message Service)mode where available. All devices were subject to slightmotion during the video capturing process. Table 1 sum-marizes our test setup.

    For a selected number of camera models, video editingsoftware was used to cut short sequences (length: 10 s)from the recorded video streams. All software in our testssupports non-intrusive (lossless) video editing, i.e., wesaved les without re-compressing the original stream.Hence, the edited videos are presumably not detectable bymeans of double compression artifacts (Wang and Farid,read and extract all available le format information andmetadata from the videos in our database.2 As it isimpossible to detail all model- or vendor-specic singu-larities within the scope of this paper, the following sec-tions focus on selected results and observations that webelieve are particularly relevant for practical forensic ana-lyses of common video container formats.

    General observations

    The majority of digital cameras in our database storesvideos in the AVI container format. Only a few of the testdevices use Apple Quicktime MOV containers. We foundthat most digital cameras compress video data usingMotion JPEG (MJPEG), where every video frame is handledas independently JPEG-compressed image. Only threecamera models in our sample use more sophisticated andefcient compression algorithms (DivX, Xvid or H.264).Before compression, frames are generally converted to theYUV color space. We encountered 4:2:2 and 4:2:0 sub-sampling to reduce the resolution of chroma channels.MJPEG compressed video streams utilize the full intensityrange of 256 intensity levels for 8-bit encoded frames(yuvj422p or yuvj420p), whereas cameramodels with DivXor Xvid support (yuv420p) use only a reduced number of220/225 intensity levels for the luminance/chrominancechannel(s) (ITU-R Recommendation BT.601-7, 2011). Audiodata in the video container is usually stored as raw data(PCM), using linear (8-/16-bit) or logarithmic (m-law)quantization.

    All mobile phones in our database store video data inMOV-based container formats (MOV, 3GP, MP4). TheLG KU990 camera phone is an exception and alsosupports AVI containers. Interestingly, none of the mobilephones uses MJPEG compression. Instead, more sophisti-cated compression algorithms nd application, e.g., H.263,H.264, MPEG-4 video (simple prole) or DivX. Subsamplingalways follows a 4:2:0 scheme. In contrast to videos fromdigital cameras, the audio track of mobile phone videos istypically also subject to lossy compression. We foundMPEG-based audio compression or the AdaptiveMulti-Rateaudio codec (AMR-NB) to be most common. The latter is astandard optimized for speech coding (3rd GenerationPartnership Project, 1999), which is very common inmobile phones designed for GSM- and UMTS-networks.

    AVI Container format

    Microsoft introduced AVI (Audio Video Interleave) in1992 as a multimedia container format, which can contain

    2 Also the free software exiftool (available at: http://www.sno.phy.queensu.ca/wphil/exiftool) can be used to extract metadata and high-level le format information, but it does not provide access to all infor-mation that is of interest here.

  • dy.

    iner Video stream Audio stream

    MJPEG (yuvj422p) PCM (8-bit)

    AVIMotorola MileStone 3GP

    MP4MP4MP4

    T. Gloe et al. / Digital Investigation 11 (2014) S68S76S70Nokia 6710 3GP,Nokia E61i 3GP,Nokia E65 3GP,Table 1Digital cameras, mobile phones and video editing software used in this stu

    Make Model Conta

    Digital camera modelsAgfa DC-504, DC-733s, AVI

    DC-830i, Sensor530sAgfa Sensor505-X AVICanon S45, S70, A640, Ixus IIs AVICanon EOS-7D MOVCasio EX-M2 AVIKodak M1063 MOVMinolta DiMAGE Z1 MOVNikon CoolPix S3300 AVIPentax Optio A40 AVIPentax Optio W60 AVIPraktica DC2070 MOVRicoh GX100 AVISamsung NV15 AVIMobile phone modelsApple IPhone 4 MOVBenq Siemens S88 3GPBlackBerry 8310 3GPGoogle Nexus 7 3GPLG KU990 3GP,both video and audio streams (Microsoft DeveloperNetwork).

    General le structure

    AVI les start with a mandatory AVI RIFF (ResourceInterchange File Format) header. All following data isorganized and stored in so-called lists and chunks. A fourcharacter code (FourCC) is used to identify these data seg-ments. The FourCC for a list is LIST. Different FourCCs existfor chunks, e.g., JUNK or idx1. There is no strict specica-tion that denes the sequence and occurrence of lists andchunks.

    Fig. 1 illustrates the basic le structure that we found toappear similarly in all examined AVI les captured by dig-ital cameras or mobile phones. All devices store themandatory list hdrl directly after the AVI RIFF header. Thislist segment comprises all the information that is necessaryto decompress the video and audio data stored in the le.The third mandatory AVI segment after header and hdrllist is the movi list. It contains the actual video and audiodata. Depending on the specic camera or phone model,additional lists and/or JUNK chunks may exist between the

    Nokia X3-00 3GPPalm Pre MP4Samsung GT-5500i MP4Samsung SGH-D600 MP4Sony Ericsson K800i 3GP

    Software VersionFFmpeg 1.1.4 allAvidemux 2.6.2 allFreeVideoDub 2.0.17.320 allVirtualDub 1.9.11 AVIYamb 2.1.0.0 beta2 no AVIAdobe Premiere CS 5MJPEG (yuvj422p) PCM (m-law)MJPEG (yuvj422p) PCM (8-bit)H.264 (yuvj420p) PCM (16-bit)MJPEG (yuvj420p) PCM (8-bit)MJPEG (yuvj420p) PCM (m-law)MJPEG (yuvj422p) PCM (8-bit)MJPEG (yuvj422p) PCM (16-bit)DivX (yuv420p) MP2MJPEG (yuvj422p) PCM (8-bit)MJPEG (yuvj420p) MJPEG (yuvj422p) PCM (m-law)Xvid (yuv420p) PCM (m-law)

    H.264 (yuv420p) MP4AH.263 (yuv420p) AMR-NBMP4V, H.263 (yuv420p) AMR-NBH.264 (yuv420p) MP4ADivX, MP4V, H.263 (yuv420p) MP3, AMR-NBMP4V (yuv420p) MP4AMP4V, H.263 (yuv420p) MP4A, AMR-NBMP4V, H.263 (yuv420p) MP4A, AMR-NBMP4V, H.263 (yuv420p) MP4A, AMR-NBhdrl and movi lists. These optional data segments areeither used for padding or to store metadata. The idx1chunk indexes the data chunks and their location in the le.It is mentioned as an optional element in the AVI lereference (Microsoft Developer Network, 0000), yet wefound it in all video les in our database.

    Camera model specics

    An inspection of our video database revealed consider-able differences between device models and softwarevendors, a selection of which we discuss in the following.These and related characteristics can be tested for to verifyor to infer the source of questioned video content.

    Fig. 2 exemplarily illustrates four typical AVI le struc-tures of video recordings from Canon A640, Canon S45,Nikon CoolPix S3300, and Ricoh GX100 digital cameras. Thetree-like representations reect the nested character of the

    Fig. 1. Internal structure of AVI les acquired with our cameras.

    MP4V, H.263 (yuv420p) AMR-NBMP4V (yuv420p) MP4AMP4V, H.263 (yuv420p) AMR-NBMP4V (yuv420p) MP4AH.263 (yuv420p) AMR-NB

  • T. Gloe et al. / Digital Investigation 11 (2014) S68S76 S71AVI container format and depict all major lists and chunksin accordance to their actual order in the respective les.Corresponding elements are arranged on the same hori-zontal level. The gure indicates that not all les share thesame components, and that both the content and theformat of specic chunks may vary. The Nikon camera, forinstance, does not use the IDIT chunk to store therecording date, but adds a dedicated ncdt list to organizeits metadata. All four cameras insert an INFO list withdistinct content after the stream header. The two Canoncameras further differ in the format of the IDIT datespecication, but none of them uses the same format as theRicoh GX100.

    We also found that digital cameras with MJPEGcompression may specify the video codec in lowercase oruppercase letters. Canon Ixus IIs, Canon PowerShot (A640,S45, S70), Nikon CoolPix, Optio W60, Ricoh GX100, andCASIO EX-M2 cameras use mjpg, whereas all other testedcameras prefer the string MJPG. Further variations exist inthe hdrl list. The Agfa Sensor505-X camera adds anadditional stream description that refers to the Ven-dorName. The Pentax Optio A40 camera explicitly species

    Fig. 2. AVI le structure of videos acquired with Canon A640, Canon S45, Nikon Cranged on the same horizontal level. Not all nested sub-elements are listed.the StreamName (Video or Audio). Video and audiostream descriptions in hdrl lists from LG KU990 mobilephones are followed by JUNK chunks. Similar to the ex-amples in Fig. 2, most camera models add their own INFOlist after the hdrl list to provide additional metadata. Alsohere, padding with JUNK data chunks is common.

    Video editing

    All software tools in our test setting leave distinct tracesin the le structure of edited AVI videos, which do notmatch any of the camera characteristics in our database.Even losslessly edited videos are thus detectable bycomparing the le structure of a questioned le with areference of the purported source.

    Fig. 3 gives two specic examples. The familiar tree-likegraphs depict AVI le structures of a Canon A640 videoafter editing with the tools AVIDemux and VirtualDub,respectively. Differences to the structure of the original le(shown in the leftmost part of Fig. 2) are printed on graybackground. Observe that both tools place JUNK chunks inthe hdrl list at the end of audio and video stream

    oolPixS3300, and Ricoh GX100 digital cameras. Equivalent elements are ar-

  • source camera, as we will discuss in the section on (MJPEG

    Fig. 4. Internal structure of most MP4-like les acquired with our cameras.

    T. Gloe et al. / Digital Investigation 11 (2014) S68S76S72descriptions. VirtualDub adds another LIST withOpenDML3 AVI header data. The same tool extends the top-level le hierarchy by its own identier JUNK chunk(VirtualDub build 32842/release). AVIDemuxchanges the video codec information to capitalized letters.Other editing softwares leave their own traces. FFmpeg, forinstance, also inserts additional JUNK chunks. The contentof these entries is xed and specic to FFmpeg. In addition,a list info with details about the employed encoder soft-ware and version (here Lavf54.59.106) is present in videosedited with FFmpeg.

    Metadata lists and JUNK chunks that would hint to therecording device are typically lost in edited les. Virtual-Dub is an exception and appends the device-specic INFOlist of the original le to the edited video (cf. Fig. 3). Ingeneral, however, losslessly edited video streams mightstill contain distinct compression characteristics of the

    dene any particular order or number of data segments, we

    beyond the scope of this paper to detail all the differences

    Fig. 3. AVI le structure of a Canon A640 video after editing with AVIDemuxand VirtualDub. Equivalent elements are arranged on the same horizontallevel. Not all nested sub-elements are listed. Differences to the original leare printed on gray background (see also Fig. 2).

    3 OpenDML is an AVI-like le format specication that supports largerle sizes, amongst other extensions.we observed, and we rather focus on select indicativeexamples.

    Most MP4-like videos in our database start with theftyp atom, yet we also found a number of exceptions.Kodak videos, for instance, start with a skip atom, whereasMinolta Z1 videos place a pnot atom with a reference topreview data at the beginning of a le. The Praktica DC2070camera does not use a header at all and starts with themdat atom directly. As MP4-like container formats arecompatible with a plethora of different data formats,codecs and parameterizations, the ftyp atomdiffound the general structure in Fig. 4 to be present inmost ofthe examined les. MP4-like video les usually start withthe ftyp atom, which refers to the le type specicationsthe le is compatiblewith.4 The actual data stream is storedin the mdat atom, which is accompanied by correspondingmetadata in the moov atom.We also encountered les withmoof atoms, which contain shorter data chunks ofelementary streams.

    Camera model specics

    Reecting the by far more complex le format, MP4-likevideos exhibit an even larger degree of source-dependentinternal variations than AVI les.5 This is only to theadvantage of forensic investigators, who will nd thesecharacteristics useful for authentication purposes. Yet it isCompression parameters).

    QuickTime and related container formats (MP4, 3GP)

    The MOV container format was introduced by Apple forits QuickTime multimedia framework in 1991. It was laterused as a basis for the MP4 and 3GP container specica-tions (Apple Computer, Inc., 2001; ISO/IEC 14496, 2003; 3rdGeneration Partnership Project, 2004). For the sake ofsimplicity, we refer to all three container formats as MP4-like formats.

    General le structure

    Similar to the AVI format, MP4-like containers consist ofindividual data structures (atoms or boxes), which areidentied via unique 4-byte sequences. Information aboutthe atom size precedes the corresponding identier. Atomscan be nested. Although the format explicitly does not4 As per the 2008 ISO media le standard ISO/IEC 14496 (2008), theftyp atom is mandatory and must be placed as early as possible in thele.

    5 We refrain from detailed graphical illustrations because of thisincreased complexity.

  • presentdstates which specication is the best use of thele in the major_brand eld. Additional elds then list theminor version and compatible brands. Table 2 reportscombinations of ftyp major and compatible brand speci-

    Table 2Major and compatible brands stored in the MP4 ftyp atom.

    Model/container: model Majorbrand

    Compatible brands

    Apple IPhone 4 qt qtBenq S88 isom isom, 3g2aBlackBerry 8310, Palm Pre 3gp4 3gp5, 3gp4, isomCanon 7D qt qt, CAEPGoogle Nexus 7 3gp4 isom, 3gp4Kodak M1063 LG KU990 3gp5 3gp5, 3gp4LG KU990, Samsung

    SGH-D6003gp5 3gp5, isom

    Minolta DiMAGE Z1 Motorola MileStone 3gp4 3gp4, mp41, 3gp6Samsung GT-5500i (MP4V)3GP: Nokia 6710, E61i, E65 3gp4 3gp4, 3g2a, isomMP4: Nokia 6710, E61i, E65 mp42 mp42, 3gp4, isomNokia X3-00 3gp5 3gp5, 3gp4, 3g2a, isomPraktica DC2070 Samsung GT-5500i (H.263) 3gp4 3gp4, 3gp6SonyEricsson K800i 3gp5 vfj1, 3gp4, 3gp5, mp42

    FFmpeg isom isom, iso2, mp41YAMB mp42 isom, mp42, 3gp5Adobe Premiere CS 5 3gp5 isom, 3gp4, mp41, mp42

    Table 4Additional atoms in MP4-like les.

    Model Atoms

    Benq S88 mvex, mdat le end, moofBlackBerry 8310Canon 7D udtaGoogle Nexus 7 udta

    T. Gloe et al. / Digital Investigation 11 (2014) S68S76 S73cations in our database. It indicates that these specica-tions can differ to a large degree between differentrecording devices. We do not report the minor_versioneld, as we usually found it set to 0.

    Table 3Values stored in the MP4 moov atom time_scale eld, as observed byiterating over all available quality settings per device. Symbol . indicatesthat mvhd and mdhd entries do not differ.Model time_scale

    mvhd mdhd

    (video)mdhd

    (audio)

    Benq S88 90000 . 8000BlackBerry 8310 1000 . 1000Canon 7D 25000, 24000, 50000 . 48000Google Nexus 7 1000 90000 44100IPhone 4 600 . 44100Kodak M1063 150191279991 . 11025LG KU990 1000 . 8000,1000Minolta Z1 600 . 7875Motorola MileStone 1000 . 44100Nokia 6710 10000 30000 3GP: 8000,

    MP4:48000, 16000

    Nokia E61i, E65 10000 30000 3GP: 8000,MP4: 16000

    Nokia X3-00 15750 . 15750Palm Pre 1000 . 44100Praktica DC2070 60 . Samsung GT-5500i 1000 . 8000Samsung SGH-D600 1000 . 11025SonyEricsson K800i 1000 . 8000

    FFmpeg 1000 variable 48000YAMB 600 30000 48000Adobe Premiere CS5 90000 29970 32000The moov atom is one of the most complex data struc-tures in MP4-like les. It species the decoding parametersfor parsing the mdatdata streamcorrectly. The atom itself isorganized in a number of sub-structures, such as the generalmvhdmovie header atom or the more specic mdhd mediaheader atom. We found that parameter settings largelydepend on the specic camera model. Table 3 exemplarilysummarizes settings for the time_scale eld, whichcontrols both frame rate and audio sampling rate. Observethat mvhd and mdhd atom may dene different values forthis eld, and that corresponding combinations thereofvary between different recording devices.

    Besides the (quasi-)standard ftyp, mdat and moovatoms, we also encountered a variety of additional ele-ments in MP4-like video les. The overview in Tab. 4 in-

    IPhone 4 wide, free, metaKodak M1063 skip, edtsLG KU990Minolta Z1 pnot, PICTMotorola MileStone udtaNokia 6710Nokia E61i, E65Nokia X3-00 free, mdat le endPalm Pre udtaPraktica DC2070Samsung GT-5500i udtaSamsung SGH-D600 iodsSonyEricsson K800i udta

    FFmpeg free, edts, udtaYAMB iods, tref, nmhd, free, mdat le endAdobe Premiere CS5 iods, udta, uuid, mdat le enddicates, for instance, that some sources store metadata inudta user data atoms, or use free atoms for padding.Because the specic order of atoms is generally notexplicitly dened, it is nally not surprising that there existdifferences here, too. Notable deviations are Benq S88 andNokia X3-00 videos. The former follow a ftyp moov mdatsequence (and optionally multiple subsequent pairs ofmoof mdat). The latter are organized as ftyp moov freemdat free.6 Motorola MileStone exhibit another inter-esting peculiarity. Here, the audio stream precedes thevideo stream.

    Video editing

    Like in the case of AVI editing, none of the MP4-like lesproduced by editing software in our test set is compatible

    6 On a side note, we mention that the apparent prevalence of the mdatmoov order may pose a problem to forensic le carving scenarios. If thelast part of an MP4 le (including the moov atom) is lost, its recon-struction in the absence of metadata becomes considerably morecomplicated. This is generally not the case for AVI le containers or JPEGles.

  • (APP0(AVI1)) instead. Also the selection of JPEG markersegmentsdand in some cases even their sequencedisdifferent from the JPEG still images of the same digitalcamera.

    Kodak M1063 video frames, for instance, contain a

    Table 5Common JPEG le markers. Symbol indicatesmandatory markers. (JIFsentropy encoding table is not restricted to DHT.)

    Marker id Short value JIF JFIF EXIF Description

    SOI 0xFF D8 Start of imageAPPn 0xFF En Application dataAPP0 0xFF E0 (e.g., JFIF)APP1 0xFF E1 (e.g., EXIF)DQT 0xFF DB Quant. tablesDHT 0xFF C4 () Huffman tablesSOF 0xFF Cn Start of frameSOF 0xFF C0 (baseline DCT)SOS 0xFF DA Start of scanDRI 0xFF DD Restart intervalRSTn 0xFF Dn nth restartCOM 0xFF FE CommentEOI 0xFF D9 End of image

    Table 7Number of unique quantization tables in MJPEG videos.

    Camera model Y/CbCr

    Agfa DC-504 1/1Agfa DC-733s 589/390Agfa DC-830i 489/314Agfa Sensor505-X 893/286Agfa Sensor530s 1/1Canon Ixus IIs 5/5Canon A640 6/6Canon S45 6/6Canon S70 8/8Casio EX-M2 121/121Kodak M1063 10/10Minolta DiMAGE Z1 13/13Nikon CoolPix S3300 465/111Pentax Optio W60 73/73

    T. Gloe et al. / Digital Investigation 11 (2014) S68S76S74with any recording device in our database. The threebottommost rows in Tables 24 exemplify some of thedifferences for FFmpeg, YAMB, and Adobe Premiere CS5.Observe that each software has its own unique signature,which emphasizes the usefulness of such characteristics inauthentication scenarios.

    Table 2 suggests that the combination and order ofmajor and compatible brands is particularly indicative. Theinspection of the time_scale eld revealed that FFmpeg,AVIDemux and FreeVideoDub always set the mvhd value to1000, with a mdhd (video) value depending on the originalframe rate. Other tools employ different specications(cf. Table 3). Table 4 indicates that Samsung SGH-D600videos share the existence of additional iods objectdescriptor atoms with edited YAMB and Adobe Premierevideos, whereas the latter two also use extra tref anduuid atoms, respectively. FFmpeg-based software is theonly one in our test set that adds an edts edit list atom tothe le. We further observed that AVIDemux sets theacquisition time and modication time of edited lesincorrectly to 01-01-1904.

    MJPEG Compression parameters

    Independent of the supported container format, themajority of digital cameras in our database uses MotionJPEG (MJPEG) to encode recorded videos (cf. Table 1). In-dividual MJPEG frames are stored as JPEG compressed still

    images, which itself consist of marker segments. Such

    Table 6JPEG marker segment sequences in MJPEG compressed videos.

    Model/thumbnail: model Sequence of J

    Agfa DC-504, Sensor530s SOI, DQT, SOFAgfa DC-733s, DC-830i SOI, APP0(AVAgfa Sensor505-X, Nikon CoolPix S3300 SOI, APP0(AVCanon PowerShot A640 SOI, APP0(AVCanon S45, S70, Ixus IIs SOI, APP0(AVCasio EX-M2, Ricoh GX100 SOI, APP0(AVKodak M1063 SOI, APP0(AVMinolta DiMAGE Z1 SOI, DHT, DHTPentax Optio W60 SOI, APP0(AVPraktica DC2070 SOI, APP1(0xthumbnail: Nikon CoolPix S3300 SOI, DQT, DHTthumbnail: Pentax Optio W60, Ricoh GX100 SOI, DQT, SOFsegments are internally identied by 2-byte numbers,usually abbreviated with character sequences known fromthe JPEG standard (cf. Tab. 5). Based on our prior work onJPEG le forensics (Gloe, 2012), we expect that the exis-tence and the order of specic segments here depends onthe recording device, too.

    For all MJPEG videos in our test set, we observed thesame sequence of JPEG marker segments for all frames ofone video. Differences do exist between groups of cameramodels. Within these groups, the sequence of JPEG markersegments is independent of respective quantization set-tings. Table 6 summarizes our ndings for cameras withMJPEG support and the corresponding thumbnails(if available). Interestingly, MJPEG frames do not strictlyfollow the JFIF (Hamilton, 1992) or JPEG/Exif (JapanElectronics and Information Technology IndustriesAssociation, 2002) standards. Most cameras identify JPEGframes by means of an AVI application data segment

    Praktica DC2070 1/1Ricoh GX100 924/338 (2)

    Punique quantization tablesa 2914/1279

    a The total number of unique tables is not the sum of unique tables percamera model.combination of AVI and JFIF application data segments,

    PEG marker segments

    0, DHT, COM, SOS, EOII1), DQT, DHT, SOF0, SOS, EOII1), DRI, DQT, DHT, SOF0, SOS, EOII1), DRI, DQT, SOF0, SOS, EOII1), DRI, DQT, SOF0, APP2, SOS, EOII1), DQT, SOF0, SOS, EOII1), DRI, APP0(JFIF), DQT, DQT, SOF0, DHT, DHT, DHT, DHT, SOS, EOI, DHT, DHT, DQT, DQT, SOF0, SOS, EOII1), DRI, DQT, SOF0, DHT, SOS, EOI0000 mjpg), DQT, DHT, SOF0, SOS, EOI, SOF0, SOS, EOI0, DHT, SOS, EOI

  • T. Gloe et al. / Digital Investigation 11 (2014) S68S76 S75whereas the Praktica DC2070 digital camera uses a specicAPP1(0x0000 mjpg) marker segment. Differences alsoexist in the organization of Huffman and quantization ta-bles. Older Canon cameramodels (S45, S70, Ixus IIs) use theAPP2marker instead of the dedicated DHTmarker segmentto store the Huffman tables for entropy coding. CanonA640, Casio EX-M2, and Ricoh GX-100 video frames rely onstandardized Huffman tables and do not store tables alongwith the frames. On the contrary, Kodak M1063 and Min-olta Z1 cameras employ four DHT marker segments, i. e.,one segment per Huffman table instead of a single DHTsegment with all four tables. The same camera models usetwo DQT segments instead of one to store the two requiredquantization tables. Another characteristic of Minoltavideos are padding bits between subsequent frames.

    It is also interesting to note that we identied theimpressive number of 2914 unique JPEG quantization tables(luminance only) in our relatively small set of test videos.We refer to Table 7 for an overview. This large amount canbe mainly attributed to cameras with scene-dependentadaptive quantization settings. We found that these cam-eras may also switch quantization tables between frames ofthe same video. Backed with similar experiences withadaptive quantization tables of thumbnail images in theDresden Image Database (Gloe, 2012), our observations forMJPEG video frames strongly emphasize the challengesthat forensic investigators have to deal with whenattempting to create a comprehensive database of quanti-zation parameters.

    We close this section with the remark that markersegment characteristics (and quantization tables) are mostuseful to verify or to determine the source device of anMJPEG video (or a group of devices). Lossless video editingdoes notmodify the internal structure ofMJPEG frames. Thisis different from the analysis of JPEG les, where we re-ported software-specic traces in the le structure of pro-cessed images (Gloe, 2012). At the same time, however, anyvideo editing software is likely to change the structure of thecontainer format in its very own specic way. It is thus stillpossible to detect lossless video editing based on lestructure information, cf. our discussion of (Camera modelspecics) of the AVI container format.

    Summary and concluding remarks

    This paper has presented a rst systematic explorationof popular video container formats from a forensic view-point. Specically, we have focused on the internal lestructure of AVI andMP4-like (MOV, 3GP, MP4) multimediacontainers. Our examination of videos from 19 digitalcameras, 14 mobile phones, and various video editing toolsindicate a profusion of model- and software-specic pe-culiarities. The identied characteristics complement thetoolbox of forensic investigators and provide valuable cluesto verify the authenticity of digital video streams.

    Our main ndings can be summarized as follows:

    Videos from digital cameras and mobile phones oftenemploy different container formats and compressioncodecs. Mobile phones opt for sophisticated compres-sion algorithms (MP4V, H.26x). Most digital cameras inour test set prefer a combination of AVI containers andbasic MJPEG compression.

    The structure of AVI and MP4-like containers is notstrictly dened. We observed considerable differencesboth in the order and in the presence of individual datasegments. AVI les often contain specic INFO lists orJUNK chunks. MP4-like les may employ various non-standard atoms and different parametrizations of spe-cic atom entries.

    Different camera models implement different JPEGmarker segment sequences for MJPEG-compressedvideo frames. Content-adaptive quantization tablesseem to be more frequent than for JPEG images. MJPEGframe and JPEG still image marker sequences andcompression settings of the same camera model can becompletely disjoint.

    Lossless video editing leaves compression settings of theoriginal video stream untouched, but introduces its veryown distinctive artifacts in the structure of containerles. While le format peculiarities of the genuinesource device are typically lost after video editing, alltested software tools have unique le format signaturesthroughout our test set.

    We note that our test set did not comprise multipledevices per camera model, so we can only surmise that ourobservations generalize to arbitrary devices of the samemodel as well. Yet this is to be expected, as our observationsresemble similar reports about the structure of JPEG stillimages (Kee et al., 2011; Gloe, 2012), RAW image formats(Kalms et al., 2012), and MP3 audio les (Bhme andWestfeld, 2005). Because our analysis relies on le struc-ture internals, we are currently not aware of any publiclyavailable software that would allow users to consistentlyforge such information without advanced programmingskills. This makes the creation of plausible forgeries un-doubtedly a highly non-trivial undertaking and thereby re-emphasizes that le characteristics and metadata must notbe dismissed as unreliable source of evidence for the pur-pose of le authentication per se.

    We note that further examinations will have to showhow distinctive video le format characteristics are on alarger scale. It is without question that the identied gen-eral differences between different recording devices andediting softwares could also be combined with a signature-based quantitative approach as proposed by Kee et al.(2011) for JPEG les. From the viewpoint of practical casework, however, we believe that such condensed signa-tures are never optimal, as they do not exploit all availableinformation. This is particularly so when we consider thatvideo container formats are by far more complex (and lessstrictly dened) than the JPEG format. At the same time, itis an open question how a more holistic signature wouldscale with all the unknown le format variations thatdoubtlessly still exist in the wild, but also to what degreemodern streamlined smartphone designs are built aroundstandard open source or vendor-based reference imple-mentations. We can only surmise that a collection ofsource-specic le parsers, which mimic the carefulmanual inspection of a forensic investigator, could help in

  • authentication scenarios. Future work will have to inves-tigate how the specication of such parsers can be auto-mated and how practical this approach is also for large-scale source identication applications.

    References

    3rd Generation Partnership Project. Mandatory speech CODEC speechprocessing functions; AMR speech codec; General description, 3GPPTS 26.071; 1999.

    3rd Generation Partnership Project. Transparent end-to-end packetswitched streaming service (PSS); 3GPP le format (3GP), 3GPP TS26.244; 2004.

    Apple Computer, Inc. Quicktime le format; 2001.Bhme R, Westfeld A. Feature-based encoder classication of compressed

    audio streams. Multimed Syst 2005;11:10820.Chen M, Fridrich J, Goljan M, Luks J. Source digital camcorder identi-

    cation using sensor photo-response non-uniformity. In: Delp EJ,Wong PW, editors. Proceedings of SPIE: Security and Watermarking ofMultimedia Content IX. SPIE Press; 2007. 65051G.

    Conotter V, OBrien J, Farid H. Exposing digital forgeries in ballisticmotion. IEEE Trans Inf Forensics Security 2011;7:28396.

    Fan J, Cao H, Kot AC. Estimating EXIF parameters based on noise featuresfor image manipulation detection. IEEE Trans Inf Forensics Security2013;8:60818.

    Farid H. Digital image ballistics from JPEG quantization: a followup study[Technical Report TR2008638]. Hanover, NH, USA: Department ofComputer Science, Dartmouth College; 2008.

    Gloe T. Forensic analysis of ordered data structures on the example of

    Japan Electronics and Information Technology Industries Association.JEITA CP-3451, Exchangeable image le format for digital still cam-eras: Exif version 2.2; 2002.

    Kalms M, Gloe T, Laing C. File forensics for raw camera image formats. In:6th Conference on Software, Knowledge, Information Managementand Applications (SKIMA); 2012.

    Kee E, Farid H. Digital image authentication from thumbnails. In:Memon ND, Dittmann J, Alattar AM, Delp EJ, editors. Proceedings ofSPIE: Media Forensics and Security II; SPIE Press; 2010. 75410E.

    Kee E, JohnsonMK, Farid H. Digital image authentication from JPEG headers.IEEE Trans Inf Forensics Security 2011;6:106675.

    Kornblum JD. Using JPEG quantization tables to identify imagery pro-cessed by software. Digit Investig 2008;5:S215.

    Lewis AB. Reconstructing compressed photo and video data [TechnicalReport UCAM-CL-TR-813]. Cambridge, United Kingdom: University ofCambridge, Computer Laboratory; 2012.

    Microsoft Developer Network. AVI RIFF le reference. http://msdn.microsoft.com/en-us/library/ms779636(VS.85).aspx.

    Milani S, Bestagini P, Tagliasacchi M, Tubaro S. Multiple compressiondetection for video sequences. In: IEEE International Workshop onMultimedia Signal Processing (MMSP); 2012a. pp. 1127.

    Milani S, Fontani M, Bestagini P, Barni M, Piva A, Tagliasacchi M, et al. Anoverview on video forensics. APSIPA Transactions Signal InformationProcess 2012b;1:e2.

    Murdoch SJ, Dornseif M. Hidden data in internet published documents.In: 21st Chaos communication congress; 2004.

    Pal A, Memon N. The evolution of le carving. IEEE Signal Process Mag2009;26:5971.

    Sencar HT, Memon N. Overview of state-of-the-art in digital image fo-rensics. In: Bhattacharya BB, Sur-Kolay S, Nandy SC, Bagchi A, editors.Algorithms, architectures and information systems security. Statisti-cal Science and Interdisciplinary Research, vol. 3. World ScienticPress; 2008. pp. 32548 [chapter 15].

    T. Gloe et al. / Digital Investigation 11 (2014) S68S76S76JPEG les. In: IEEE international Workshop on Information Forensicsand Security (WIFS). IEEE; 2012. pp. 13944.

    Hamilton E. JPEG le interchange format, Version 1.02; 1992.Hsu CC, Hung TY, Lin CW, Hsu CT. Video forgery detection using corre-

    lation of noise residue. In: IEEE workshop on Multimedia SignalProcessing (MMSP). IEEE; 2008. pp. 1704.

    ISO/IEC 10918-1, ITU-T Recommendation T.81. Information technology.Digital compression and coding of continuous-tone still images;1992.

    ISO/IEC 14496. Information technology. coding of audio-visual objects,Part 14: MP4 le format; 2003.

    ISO/IEC 14496. Information technology. Coding of audio-visual objects,Part 12: ISO base media le format. 3rd ed. 2008.

    ITU-R Recommendation BT.601-7. Studio encoding parameters of digitaltelevision for standard 4:3 and wide-screen 16:9 aspect ratio; 2011.Sencar HT, Memon N, editors. Digital image forensics. Springer; 2013.Stamm MC, Lin WS, Liu KJR. Temporal forensics and anti-forensics for

    motion compensated video. IEEE Trans Inf Forensics Security 2012;7:131529.

    Vzquez-Padn D, Fontani M, Bianchi T, Comesaa P, Piva A, Barni M.Detection of video double encoding with GOP size estimation. In:IEEE international Workshop on Information Forensics and Security(WIFS). IEEE; 2012. pp. 1516.

    Wang W, Farid H. Exposing digital forgeries in video by detecting doubleMPEG compression. In: ACM workshop on multimedia and security.ACM Press; 2006. pp. 3747.

    Wang W, Farid H. Exposing digital forgeries in interlaced and dein-terlaced video. IEEE Trans Inf Forensics Security 2007;2:43849.

    Forensic analysis of video file formatsIntroductionTest setupGeneral observationsAVI Container formatGeneral file structureCamera model specificsVideo editing

    QuickTime and related container formats (MP4, 3GP)General file structureCamera model specificsVideo editing

    MJPEG Compression parametersSummary and concluding remarksReferences