Sogang University
Advanced Operating Systems
(Process Management - Linux)
Sang Gue Oh, Ph.D.
Email : [email protected]
Process Descriptor in Linux
Each process is represented by a task_struct data structure.
The structure is rather complex: it contains not only many fields but also pointers to other structures used to manage processes.
Process State
TASK_RUNNING : either executing on the CPU or waiting to be executed.
TASK_INTERRUPTIBLE : suspended (sleeping) until some condition becomes true. The process can be woken up by a signal or by the release of a resource it is waiting for.
TASK_UNINTERRUPTIBLE : similar to the previous state, except that delivering a signal to the sleeping process leaves its state unchanged.
TASK_STOPPED : execution has been stopped, e.g., by ctrl+z or in debugging mode.
TASK_ZOMBIE : a halted process whose parent process has not yet issued a wait()-like system call to release its resources. This process still has a task_struct in the task vector.
Task Array (up to Linux 2.2.x)
Task array : an array of pointers to every task_struct data structure.
Max. number of processes : the size of the task vector (default : 512 entries, defined by the NR_TASKS macro).
Linux 2.4 removes the task array to lift the hard-coded limit on the number of processes.
Initializing the task array
#define NR_TASKS 512 <include/linux/tasks.h>
#define init_task (init_task_union.task) <include/asm-i386/processor.h>
struct task_struct *task[NR_TASKS] = { &init_task, }; <kernel/sched.c>
--- init_task is the process descriptor of process 0 (the swapper process); its address is stored in the first entry of the array.
Storing a Process Descriptor
The task array only contains pointers to process descriptors.
Process descriptors are stored in dynamic memory (alloc_task_struct()).
For each process, Linux stores two different data structures in a single 8 KB memory area: the process descriptor and the kernel mode stack.
union task_union {
struct task_struct task;
unsigned long stack[2048];
};
--- include/asm/processor.h
Accessing a Process Descriptor
The esp register is the CPU stack pointer.
Right after switching from user mode to kernel mode, the kernel stack of a process is always empty.
The process descriptor pointer of a process can easily be obtained from the esp register by masking out its 13 least significant bits. This is done by the current macro.
The current macro yields the process descriptor pointer of the currently running process (e.g., current->pid : pid of the process currently running).
movl $0xffffe000, %ecx
andl %esp, %ecx
movl %ecx, p
--- after these instructions, p holds the process descriptor pointer (current).
Process List
Process list for all existing processes
Process list of TASK_RUNNING state
Process list of task free entries
PID Hash Table
Used to derive the process descriptor pointer from a pid.
In order to speed up the search, a hash table (pidhash variable) consisting of PIDHASH_SZ elements (default is 128 in Linux 2.2.x, 1024 in Linux 2.4.x) has been introduced.
Hash function: #define pid_hashfn(x) ((((x) >> 8) ^ (x)) & (PIDHASH_SZ - 1))
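Uses chaining to handle collisions.
[Figure: the pidhash table with entries 0 to 127. Entry 100 chains PID 228 and PID 27535; entry 123 holds PID 27536. Chained descriptors are linked through the pidhash_next and pidhash_pprev fields.]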
Parenthood Relationship
Every task_struct keeps pointers to its parent process and to its siblings as well as to its own child processes (see the pstree command).
p_opptr : original parent
p_pptr : parent
p_cptr : youngest child
p_ysptr : younger sibling
p_osptr : older sibling
/*
 * pointers to (original) parent process, youngest child, younger sibling,
 * older sibling, respectively. (p->father can be replaced with
 * p->p_pptr->pid)
 */
struct task_struct *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr;
Example of Parenthood Relationship
Wait Queues (include/linux/wait.h and sched.h)
Process list of processes in the TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state.
Wait queue pointer : initialization receives the address q of a wait queue pointer and sets that pointer to q - 4.
struct wait_queue {
struct task_struct *task;
struct wait_queue *next;
};
add_wait_queue(struct wait_queue **q, struct wait_queue *entry);
remove_wait_queue(struct wait_queue **q, struct wait_queue *entry);
Using Wait Queues
sleep_on(struct wait_queue **p) : sets the process state to TASK_UNINTERRUPTIBLE and inserts the process into the wait queue. The process can be woken up via the wake_up macro.
interruptible_sleep_on(struct wait_queue **p) : sets the process state to TASK_INTERRUPTIBLE and inserts the process into the wait queue. The process can be woken up via the wake_up_interruptible macro or by receiving a signal.
sleep_on_timeout(struct wait_queue **p, long time) : similar to sleep_on, except that the process can also be awakened by a timeout.
interruptible_sleep_on_timeout(struct wait_queue **p, long time) : similar to interruptible_sleep_on, except that the process can also be awakened by a timeout.
wake_up(struct wait_queue **p) : macro that wakes up all the sleeping processes in the wait queue, whether in the TASK_INTERRUPTIBLE or the TASK_UNINTERRUPTIBLE state. It invokes __wake_up(queue, mode), where mode specifies the process states to be woken up.
wake_up_interruptible(struct wait_queue **p) : macro that wakes up only the sleeping processes in the TASK_INTERRUPTIBLE state. It also invokes the __wake_up(queue, mode) function.
Process Usage Limits
Processes are associated with sets of usage limits, which specify the amount of system resources they can use.
The limits are stored in the rlim field of the process descriptor. The field is an array of elements of type struct rlimit (e.g., struct rlimit rlim[]).
struct rlimit { /* defined in include/linux/resource.h */
long rlim_cur;
long rlim_max;
};
Examples of elements are RLIMIT_CPU (maximum CPU time), RLIMIT_FSIZE (maximum file size), RLIMIT_NPROC (maximum # of processes that the user can own), etc. (defined in include/asm/resource.h).
current->rlim[RLIMIT_CPU] : CPU time limit of the current process.
Creating Processes
Traditional UNIX systems treat all processes in the same way: resources owned by the parent process are duplicated and a copy is granted to the child process.
This approach makes process creation very slow and inefficient. Three approaches address the problem:
Copy on Write (COW) : allows both the parent and the child to read the same physical pages. Whenever either one tries to write to a physical page, the kernel copies its contents into a new physical page that is assigned to the writing process. fork() is implemented with COW.
Lightweight Processes : allow both the parent and the child to share many per-process kernel data structures.
vfork() system call : creates a process that shares the memory address space of its parent. The parent's execution is blocked until the child exits or executes a new program. In Linux, vfork() is identical to fork().
Copy on Write
[Figure: before a write, the parent and child page tables (PT) point to the same shared physical memory pages. After a write, the affected page is copied, so that the parent and the child each have their own copy.]
Creating Processes in Linux
Creating a process (kernel/fork.c - do_fork())
New processes are created by cloning old processes.
A new task_struct is allocated (alloc_task_struct() to get 8 KB).
The contents of the parent's process descriptor are copied into the new descriptor.
The usage limits are checked.
A new process identifier is created.
The new task_struct is entered into the task vector.
All the process descriptor fields that cannot be inherited from the parent process are updated, such as the fields that specify the process parenthood relationships.
• Create new data structures and copy into them the values of the corresponding parent data structures.
• Update the various process lists and the hash table.
Example (Process Creation - User Program)
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t childpid;

    if ((childpid = fork()) < 0) {
        printf("can't fork\n");
        exit(-1);
    } else if (childpid > 0) { /* parent process */
        /* ..... Parent_Service_routine(); OR execvp("prog_name1", argv); OR: */
        while (wait((int *) 0) != childpid)
            ;
        /* ..... */
        exit(0);
    } else { /* child process */
        /* ..... Child_Service_Routine(); OR execvp("prog_name2", argv); ..... */
        exit(0);
    }
}
Kernel Threads
Since some of the system processes run only in Kernel Mode, modern operating systems delegate their functions to kernel threads (kernel-level lightweight processes).
What is different from a regular process?
Each kernel thread executes a single specific kernel function, while regular processes execute kernel functions only through system calls.
Kernel threads run only in Kernel Mode, while regular processes run alternately in Kernel Mode and in User Mode.
Since kernel threads run only in Kernel Mode, they use only linear addresses greater than PAGE_OFFSET. Regular processes, on the other hand, use all 4 GB of linear addresses, either in User Mode or in Kernel Mode.
Kernel threads share kernel data structures.
Both kernel threads and regular processes have a pid and a corresponding process descriptor.
Created via kernel_thread(int (*fn) (void *), void *arg, unsigned long flags).
Process Address Space (Virtual Memory)
The address space of a process consists of all linear addresses that the process is allowed to use.
Each process sees a different set of linear addresses.
The kernel dynamically modifies a process address space by adding or removing intervals of linear addresses (memory regions).
For reasons of efficiency, both the starting linear address and the length of a memory region must be multiples of 4 KB.
All information related to the process address space (the memory descriptor) is included in a table referenced by the mm field of the process descriptor.
The table contains information about all the memory regions of the process.
Memory Descriptor (mm_struct)
[Figure: mm_struct definition. Annotated field: pointers to the memory regions.]
Memory Region (vm_area_struct)
[Figure: vm_area_struct definition. Annotated fields: start address of a memory region, end address of a memory region, next memory region.]
Adding or Removing Memory Regions
Memory Mapping (1)
The executable binary file of a process needs to be mapped into the virtual address space (4 GB) before executing (using the do_mmap() function).
Procedure
Generate a set of vm_area_struct (include/linux/mm.h).
Each vm_area_struct represents a part of the executable image: the executable code, initialized data, uninitialized data (BSS), etc.
Associate the correct set of virtual memory operations.
All mapped areas (vm_area_struct) belonging to the same process are connected using a tree structure.
AVL tree structure - for efficient search (O(n) -> O(log n)). n is typically around 6 but may reach 3000 in some cases.
Memory Mapping (2)
[Figure: the mm field of task_struct points to an mm_struct (fields count, pgd, mmap, mmap_avl). mmap points to a list of vm_area_struct entries (fields vm_start, vm_end, vm_flags, vm_inode, vm_ops, vm_next) that describe the Data and Code regions. vm_ops points to the Virtual Memory Operations: open(), close(), ...., nopage(), swapin(), swapout().]
Memory Mapping - Example
The vm_area_list of a process can be seen at /proc/pid/maps. An example for the init process:
$ cat /proc/1/maps
08048000-0804e000 r-xp 00000000 08:03 52838   # /sbin/init - code
0804e000-0804f000 rw-p 00005000 08:03 52838   # /sbin/init - data
0804f000-08054000 rwxp 00000000 00:00 0       # bss
40000000-40012000 r-xp 00000000 08:03 36578   # /lib/ld-2.1.2.so - code
40012000-40013000 rw-p 00012000 08:03 36578   # /lib/ld-2.1.2.so - data
40018000-40103000 r-xp 00000000 08:03 36585   # /lib/libc-2.1.2.so - code
40103000-40107000 rw-p 000ea000 08:03 36585   # /lib/libc-2.1.2.so - data
40107000-4010b000 rw-p 00000000 00:00 0       # bss
bfffe000-c0000000 rwxp fffff000 00:00 0       # stack
Getting a New Memory Region
When the user types a command at the console -> new process.
A running process may decide to load an entirely different program.
A running process may perform a “memory mapping” on a file.
A process may keep adding data on its user mode stack.
A process may expand its dynamic area (heap) through a function call such as malloc().
A process may create an IPC shared memory region.
Memory Region Access Rights (vm_flags)
Memory Region Handling
Finding the closest region to a given address : find_vma(), find_vma_prev()
Finding a region that overlaps a given address interval : find_vma_intersection()
Finding a free address interval : get_unmapped_area()
Inserting a region in the memory descriptor list : insert_vm_struct()
Merging contiguous regions : merge_segments()
Overall Scheme for Page Fault Handler
Flow Diagram of the Page Fault Handler
Demand Paging - Page Fault Handling (1)
do_page_fault() (arch/i386/mm/fault.c) - page fault handler
Find the vm_area_struct in which the page fault occurred (find_vma()).
If it is not found, the access is illegal: send a SIGSEGV signal. Otherwise, call handle_mm_fault().
handle_mm_fault() (mm/memory.c)
Check whether the page table entry exists. If not, allocate a new page table entry.
Call handle_pte_fault().
handle_pte_fault() (mm/memory.c)
switch (cause) {
case "memory not present" : call do_no_page(); break;
case "protection violation" : call do_wp_page(); break;
case "swap out" : call do_swap_page(); break;
}
Demand Paging - Page Fault Handling (2)
do_no_page() (mm/memory.c)
Allocate a new page and update the page table entry.
do_wp_page() (mm/memory.c)
Fault caused by copy-on-write.
Allocate a new page and copy the old page into the new page.
Decrease the map count of the old page by 1.
do_swap_page() (mm/memory.c)
Load the appropriate pages into memory from the swap area.
Page Cache Management (1)
Purpose
To speed up access to files on disk.
Stores pages from memory-mapped files.
Page Hash Table
A vector of pointers to mem_map_t.
Indexed by the VFS inode and the file offset.
If the page is found, return its mem_map_t. Otherwise, allocate a physical page and read it from the file.
Single Page Read Ahead
Applies when the pages in the file are accessed serially.
Page Cache Management (2)
[Figure: entries of page_hash_table point to chains of mem_map_t structures, each with inode, offset, next_hash and prev_hash fields. The example shows two pages of inode 12, at offsets 0x2000 and 0x8000.]
struct page *page_hash_table[HASH_SIZE]
When a read request for a page occurs:
- the hash table is checked for the existence of that page
- if the page exists, there is no need to read it from the file system
Swapping Out and Discarding Pages (1)
Kernel Swap Daemon (kswapd - mm/vmscan.c)
Kernel thread.
Ensures that there are enough free pages in the system.
Started by the kernel init process at startup.
Kernel swap timer (awakened periodically - basically once a second).
Variables
free_pages_high
free_pages_low
nr_async_pages : number of pages waiting to be written to the swap file.
Swapping Out and Discarding Pages (2)
Swap Daemon Operation
If free pages > free_pages_high, do nothing. Otherwise, try three ways to free physical pages:
Reducing the size of the buffer and page caches.
Swapping out System V shared memory pages.
Swapping out and discarding pages.
If free pages < free_pages_low, try to free 6 pages and sleep for half the usual time.
If free_pages_low < free pages < free_pages_high, try to free 3 pages.
Reducing the Size of the Caches
Why shrink caches? (mm/filemap.c)
Page cache and buffer cache entries are good candidates.
Relatively easy, since we don't need to write to physical devices.
All processes suffer equally.
Method
Examine a block of pages in the mem_map page vector in a cyclical manner (clock algorithm).
If a page is cached in either the page cache or the buffer cache, remove it from the cache.
If all the buffers in a page are freed, then the page is also freed (in the case where the page itself is cached in the buffer cache).