Operating System Architectures - Unix
Beuth Hochschule, Summer Term 2014
Operating Systems I PT / FF 2014
Modern UNIX Systems
• System V Release 4 (SVR4) was a major milestone
• AT&T and Sun Microsystems (R.I.P.) combined their previously diverging Unix flavors
• Intended to provide a uniform platform for commercial UNIX deployment
• Added a preemptive kernel, virtual memory concepts, and virtual file system support
• Solaris is the successor of Sun's SVR4-based UNIX release
• 4.4BSD was the final release from the University of California, Berkeley
• Since then, many successful derivatives, including Mac OS X
• Most modern UNIX kernels are monolithic
• All functional components of the kernel have access to all internal data and functions
• Loadable modules (object files) can be linked into / unlinked from the kernel at runtime and can be stacked
System Programming in Unix
• The Unix system interface is a mixture of C library, POSIX, and custom functions
• Linux
• POSIX 1003.1 (mostly) + Standard C library + SVR4 + BSD functions
• Every system call has a symbolic name and a platform-dependent number, defined as a symbolic constant in asm-<arch>/unistd.h
• Classes: process management, time-related functions, signal processing, scheduling, kernel modules, file system, memory management, IPC, network, monitoring, security
• Mac OS X
• BSD portion derived from FreeBSD (4.4BSD) + Standard C library + Objective-C specifics
• FreeBSD
• POSIX 1003.1 (mostly) + Standard C library + BSD functions
Unix: Everything Is A File
• "The UNIX Time-Sharing System", D. M. Ritchie and K. Thompson, 1974
• Hierarchical namespace of special files, ordinary files, and directories
• Support for mountable subtrees within one hierarchy
• Today typically referred to as the Virtual File System (VFS) concept
• Each supported I/O device is associated with at least one special file in /dev
• Read and written like ordinary files, but accesses lead to device interaction
• Protection relies on file system mechanisms
• "Everything can have a file descriptor" is a better description than "Everything is a file" [Brown2007]
• /proc
• Special file system mounted by the kernel at boot time (since SVR4 / BSD)
• Represents kernel information as files, enabling interaction between user mode and kernel (e.g. the ps tool)
Linux
• Unix variant initially targeting the IBM PC, now broadly adopted
• Large number of supported platforms, source code available as "free" software
• "Free as in speech, not as in beer" [FSF]
• Monolithic kernel compiled per platform
• Architecture-specific code in the /linux/arch/* directory of the source code tree
• Kernel is extensible at run time by loadable kernel modules (LKM)
• The API / ABI for such modules is not stable: module binaries must match the kernel version being executed
• Support for versioning of kernel modules and for "tainting" the kernel when non-GPL drivers are loaded
• Graphics system traditionally runs entirely in user mode
Linux Kernel Components
Linux
Anatomy of a Linux System Call [Mauerer]
• Handler implementations in portable C code ("sys_" prefix), spread throughout the sources
• Example: sys_getuid(void) in kernel/timer.c
• Kernel code performs the mode switch and the conversion of function parameters
• System call number and parameters are passed in processor registers (architecture-specific assembler code)
• errno.h and errno-base.h define positive error codes; the kernel returns them as negative numbers to indicate a failure, and libc converts this into a -1 return value plus errno
[Figure: Application → libc → Kernel → Kernel handler]
• Kernel entry instructions: int $0x80 software interrupt (IA32), SYSENTER / SYSEXIT (IA32 since Pentium II), call_pal PAL_callsys (Alpha), sc (PowerPC), syscall (AMD64)
Anatomy of a Linux System Call
• strace tool, based on the ptrace system call
• Intercepts the traced process at the system call boundary
• Can access the traced process's address space
• Hardware-supported breakpoints possible
• Mac OS X: dtruss
• Solaris: truss
troeger@dfw:~$ strace -f -T pwd
execve("/bin/pwd", ["pwd"], [/* 14 vars */]) = 0 <0.000279>
brk(0) = 0x80d5000 <0.000012>
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) <0.000018>
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7761000 <0.000014>
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000015>
open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000016>
fstat64(3, {st_mode=S_IFREG|0644, st_size=48165, ...}) = 0 <0.000012>
mmap2(NULL, 48165, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7755000 <0.000014>
close(3) = 0 <0.000011>
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) <0.000015>
open("/lib/i686/cmov/libc.so.6", O_RDONLY) = 3 <0.000019>
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0n\1\0004\0\0\0"..., 512) = 512 <0.000013>
fstat64(3, {st_mode=S_IFREG|0755, st_size=1327556, ...}) = 0 <0.000012>
mmap2(NULL, 1337704, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb760e000 <0.000014>
mprotect(0xb774e000, 4096, PROT_NONE) = 0 <0.000017>
mmap2(0xb774f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x140) = 0xb774f000 <0.000018>
mmap2(0xb7752000, 10600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7752000 <0.000015>
close(3) = 0 <0.000012>
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb760d000 <0.000013>
set_thread_area({entry_number:-1 -> 6, base_addr:0xb760d8d0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0 <0.000012>
mprotect(0xb774f000, 8192, PROT_READ) = 0 <0.000015>
mprotect(0xb777f000, 4096, PROT_READ) = 0 <0.000014>
munmap(0xb7755000, 48165) = 0 <0.000018>
brk(0) = 0x80d5000 <0.000011>
brk(0x80f6000) = 0x80f6000 <0.000012>
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3 <0.000023>
fstat64(3, {st_mode=S_IFREG|0644, st_size=108793664, ...}) = 0 <0.000011>
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb740d000 <0.000014>
mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0xf37) = 0xb7760000 <0.000014>
close(3) = 0 <0.000012>
getcwd("/net/pao/export/home/staff/troeger", 4096) = 35 <0.000016>
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0 <0.000011>
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb775f000 <0.000013>
write(1, "/net/pao/export/home/staff/troeg"..., 35/net/pao/export/home/staff/troeger
) = 35 <0.000016>
close(1) = 0 <0.000011>
munmap(0xb775f000, 4096) = 0 <0.000016>
close(2) = 0 <0.000011>
exit_group(0) = ?
Linux Modules
• Support for dynamically loaded and linked binary kernel parts (modules)
• Reduces the size of the compiled monolithic kernel binary
• Allows driver integration without recompiling the kernel
• Also sidesteps some GPL licensing issues with modern hardware drivers
• Modules are relocatable object files that are linked into the kernel
• The kernel keeps a table of its symbols with their addresses (/proc/kallsyms)
• The module loader relocates the code accordingly, similar to what the dynamic linker (ld.so) does for user programs (more later)
• The modprobe tool relies on insmod, which uses the init_module system call
• Considers module dependencies determined by the depmod utility (modules.dep)
• The kernel can trigger the kmod daemon to load a missing module automatically (request_module)
Linux Modules
• Versioning
• (Binary) drivers may break with updated kernel versions
• Optional solution: generate checksums of kernel function signatures (genksyms)
• Module compilation stores the checksums of all kernel functions the module uses
• The kernel may become "tainted" if a module uses a symbol without demanding a specific version
Mac OS X / Darwin
• Mac OS X kernel is Darwin
• Kernel environment derived from FreeBSD + Mach
• Available as open source
• Mach components: low-level functionality (IPC, SMP, virtual memory, paging, modularity)
• I/O Kit: framework for simplified driver development
• Network Kernel Extensions (NKE)
• Add / remove kernel modules for networking without interruption or recompilation
(C) developer.apple.com
Mac OS X / Darwin
• The switch between user mode and kernel mode is called boundary crossing
• Darwin supports several mechanisms for it
• Mach IPC / RPC: low-level, low-latency, low bandwidth
• Mach Interface Generator (MIG) implements C API from interface description
• RPC routines are grouped in subsystems (e.g. virtual memory)
• BSD syscall: not pluggable, only intended for filesystem and networking
• BSD sysctl / sysctlbyname: supersedes the syscall interface, pluggable
• Typically used to read / write kernel variables
• BSD ioctl: sends commands directly to device drivers (/dev)
• Classical mechanism from BSD
Summary
• Modern operating systems tackle three major tasks
• Hide complexity and heterogeneity of the underlying hardware
• Manage system resources
• Ensure flexibility, portability and security through layering
• Fundamental concepts are processes and virtual memory
• All operating systems use hardware support for protection rings to implement user mode and kernel mode
• Applications use the system API to access kernel-mode functionality
• Operating systems support pluggable hardware device drivers
• All operating systems have common roots in history