WINDOWS OS
Goals
• Hardware-portable
  – Used to support MIPS, PowerPC, and Alpha
  – Currently supports x86, ia64, and amd64
  – Multiple vendors build hardware
• Software-portable
  – POSIX, OS/2, and Win32 subsystems
    • OS/2 is dead
    • POSIX is still supported, as a separate product
    • Lots of Win32 software out there in the world
Goals
• High performance
  – Anticipated PC speeds approaching minicomputers and mainframes
  – Async IO model is standard
  – Support for large physical memories
  – SMP was an early design goal
  – Designed to support multi-threaded processes
  – Kernel has to be reentrant
Process Model
• Threads and processes are distinct
• Process:
  – Address space
  – Handle table (handles play the role of file descriptors)
  – Process default security token
• Thread:
  – Execution context
  – Optional thread-specific security token
Tokens
• “Who you are”: list of identities
  – Each identity is a SID
• Also contains privileges
  – Shutdown, Load drivers, Backup, Debug…
• Can be passed through LPC ports and named pipe requests
  – Server side can use this to selectively impersonate the client.
Object Manager
• Uniform interface to kernel mode objects.
• Handles are 32-bit opaque integers
• Per-process handle table maps handles to objects and permissions on the objects
• Implements refcount GC
  – Pointer count: total number of references
  – Handle count: number of open handles
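The two counts can be modelled roughly as follows. This is a minimal sketch, not the kernel's implementation: the struct and function names are hypothetical, and it assumes that every open handle also contributes one pointer reference and that the object is destroyed when the pointer count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an object header carrying the two counts. */
typedef struct {
    long pointer_count; /* total references, including one per open handle */
    long handle_count;  /* number of open handles */
    bool destroyed;     /* set once the last reference goes away */
} object_header;

static void object_init(object_header *o) {
    o->pointer_count = 1; /* the creator's reference */
    o->handle_count = 0;
    o->destroyed = false;
}

/* Take a raw pointer reference (e.g. kernel code holding the object). */
static void object_reference(object_header *o) {
    assert(!o->destroyed);
    o->pointer_count++;
}

/* Drop a pointer reference; the last one destroys the object. */
static void object_dereference(object_header *o) {
    assert(o->pointer_count > 0);
    if (--o->pointer_count == 0)
        o->destroyed = true; /* a real object manager would free it here */
}

/* Opening a handle bumps both counts (assumption stated above). */
static void handle_open(object_header *o) {
    o->handle_count++;
    object_reference(o);
}

/* Closing a handle drops both counts. */
static void handle_close(object_header *o) {
    assert(o->handle_count > 0);
    o->handle_count--;
    object_dereference(o);
}
```

In this model an object outlives its creator's reference for as long as any handle stays open, which is the behaviour the two counts are meant to capture.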
Object Manager
• Implements an object namespace
  – Win32 objects are under \BaseNamedObjects
  – Devices under \Device
    • This includes filesystems
  – Drive letters are symbolic links
    • \??\C: => the appropriate filesystem device
• Some things have other names
  – Processes and threads are opened by specifying a CID: (Process.Thread)
Standard operations on handles
• CloseHandle()
• DuplicateHandle()
  – Takes source and destination process
  – Very useful for servers
• WaitForSingleObject(), WaitForMultipleObjects()
  – Wait for something to happen
  – Can wait on up to 64 handles at once
Security Descriptors
• Each object has a Security Descriptor
  – Owner (special SID: CREATOR_OWNER)
  – Group (special SID: CREATOR_GROUP)
  – DACL
    • Discretionary Access Control List
    • List of SIDs and granted or denied access rights
  – SACL
    • System Access Control List
    • List of SIDs and access rights to be audited
Access Rights
typedef struct _ACCESS_MASK {
    USHORT SpecificRights;
    UCHAR  StandardRights;
    UCHAR  AccessSystemAcl : 1;
    UCHAR  Reserved : 3;
    UCHAR  GenericAll : 1;
    UCHAR  GenericExecute : 1;
    UCHAR  GenericWrite : 1;
    UCHAR  GenericRead : 1;
} ACCESS_MASK;
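In practice the mask is handled as a plain 32-bit integer and tested with bit operations. The sketch below assumes the struct fields are laid out from the low bits upward. The constants carry the same values as their winnt.h counterparts, but they are redeclared with a hypothetical MY_ prefix so the example compiles without the Windows headers. The helper names are also hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the winnt.h definitions; MY_ prefix is illustrative only. */
#define MY_SPECIFIC_RIGHTS_ALL    0x0000FFFFu
#define MY_STANDARD_RIGHTS_ALL    0x001F0000u
#define MY_SYNCHRONIZE            0x00100000u
#define MY_ACCESS_SYSTEM_SECURITY 0x01000000u
#define MY_GENERIC_ALL            0x10000000u
#define MY_GENERIC_EXECUTE        0x20000000u
#define MY_GENERIC_WRITE          0x40000000u
#define MY_GENERIC_READ           0x80000000u

/* Low 16 bits: rights whose meaning depends on the object type. */
static uint16_t specific_rights(uint32_t mask) {
    return (uint16_t)(mask & MY_SPECIFIC_RIGHTS_ALL);
}

/* Bits 16..23: the byte the struct calls StandardRights. */
static uint8_t standard_rights(uint32_t mask) {
    return (uint8_t)((mask >> 16) & 0xFFu);
}

/* True if every bit in `wanted` is present in `granted`. */
static bool has_all(uint32_t granted, uint32_t wanted) {
    return (granted & wanted) == wanted;
}
```

For example, a mask built from MY_GENERIC_READ, MY_SYNCHRONIZE, and the specific right 0x0002 satisfies has_all() for MY_SYNCHRONIZE but not for MY_GENERIC_WRITE.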
Security Use
• Objects are referred to via handles
• Security checks occur when an object is opened
  – Open requests contain a mask of requested access rights
  – If granted to the token by the DACL, the handle contains those access rights
• Access rights are checked on use
  – Just a bit test: very fast
Object Open
evt = OpenEvent(EVENT_MODIFY_STATE, FALSE, "SomeName");
– Finds the event object by name
– Walks the DACL, looking for token SIDs
– Keeps looking until all permissions are granted
– If access is granted, inserts a handle to the object into the process’s handle table, with EVENT_MODIFY_STATE access
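The DACL walk described above can be sketched as a loop over access control entries. This is a simplified model under stated assumptions, not the actual access check. SIDs are reduced to small integers, all type and function names are hypothetical, and the handling of deny entries (a matching deny for a still-needed right ends the walk) goes beyond what the slide states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACE_ALLOW, ACE_DENY } ace_type;

/* Hypothetical access control entry: one SID and the rights it covers. */
typedef struct {
    ace_type type;
    int      sid;   /* a SID reduced to a small integer for illustration */
    uint32_t mask;  /* rights this entry allows or denies */
} ace;

static bool token_has_sid(const int *sids, size_t n, int sid) {
    for (size_t i = 0; i < n; i++)
        if (sids[i] == sid) return true;
    return false;
}

/* Walks the DACL in order until every requested right has been allowed.
   On success stores the granted mask and returns true. */
static bool access_check(const ace *dacl, size_t n_aces,
                         const int *sids, size_t n_sids,
                         uint32_t desired, uint32_t *granted) {
    uint32_t remaining = desired;
    *granted = 0;
    for (size_t i = 0; i < n_aces && remaining != 0; i++) {
        if (!token_has_sid(sids, n_sids, dacl[i].sid))
            continue;                       /* entry is for someone else */
        if (dacl[i].type == ACE_DENY) {
            if (dacl[i].mask & remaining)
                return false;               /* a still-needed right is denied */
        } else {
            remaining &= ~dacl[i].mask;     /* these rights are now allowed */
        }
    }
    if (remaining != 0)
        return false;                       /* some right was never allowed */
    *granted = desired;
    return true;
}
```

The granted mask is what would be stored alongside the object pointer in the handle table, so later uses of the handle never need to consult the DACL again.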
Object Use
SetEvent(evt);
– SetEvent() requires EVENT_MODIFY_STATE access, and an event object.
– The kernel looks up the handle in the process’s handle table.
– Checks to make sure that it maps to an event object, and that the granted access bits contain the EVENT_MODIFY_STATE bit.
– If all is good, the event is set.
Object Use
WaitForSingleObject(evt, INFINITE);
– WaitForSingleObject() requires a synchronization object (like an event) and SYNCHRONIZE access.
– evt maps to an event object
– SYNCHRONIZE access was not requested when the handle was inserted.
– Even if the DACL permits it, the wait fails.
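The check on use can be modelled as a table lookup followed by a type comparison and a bit test. The sketch below is illustrative only: the table layout, names, and fixed size are hypothetical, and the two access constants reuse the winnt.h values under a MY_ prefix so the code builds without Windows headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MY_EVENT_MODIFY_STATE 0x0002u      /* same value as winnt.h */
#define MY_SYNCHRONIZE        0x00100000u  /* same value as winnt.h */

typedef enum { OBJ_NONE = 0, OBJ_EVENT, OBJ_FILE } object_type;

/* Hypothetical handle table entry: object type plus rights granted at open. */
typedef struct {
    object_type type;
    uint32_t    granted;
} handle_entry;

#define TABLE_SIZE 8
typedef struct { handle_entry entries[TABLE_SIZE]; } handle_table;

/* Called after a successful open; returns the new handle, or -1 if full. */
static int handle_insert(handle_table *t, object_type type, uint32_t granted) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->entries[i].type == OBJ_NONE) {
            t->entries[i].type = type;
            t->entries[i].granted = granted;
            return i;
        }
    }
    return -1;
}

/* The check made on every use: right object type, and all required bits set. */
static bool handle_check(const handle_table *t, int h,
                         object_type type, uint32_t required) {
    if (h < 0 || h >= TABLE_SIZE) return false;
    const handle_entry *e = &t->entries[h];
    return e->type == type && (e->granted & required) == required;
}
```

A handle inserted with only MY_EVENT_MODIFY_STATE passes the check a SetEvent-style call would make, and fails the MY_SYNCHRONIZE check a wait would make, regardless of what the DACL would have allowed at open time.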
Types of Objects
• Events
  – State is set or clear.
  – Can clear when a wait completes (auto-reset)
• Mutexes
  – Can be acquired by a single thread at a time.
  – Automatically released when the owner exits.
• Semaphores
  – Maintain a count
  – Waits decrement the count
More objects
• Threads, Processes, Timers: like events
• Registry Keys
  – Manipulate data in the registry, a centralized store of system configuration info.
• LPC Ports
  – Fast local RPC
  – Security tokens can transfer over LPC calls
• Files
Files & IO
• File objects maintain a current offset, and a pointer to the underlying stream.
• Default internal model is asynchronous
  – Synchronous IO just waits for the IO to complete
  – Async IO can set an event, or run a callback in the thread which queued the IO, or post a message to an IO completion port.
• Each request is an IRP
IRPs
• Maintain state of IO requests, independent of the thread working on the IO
• IRPs are handed off through the device stack to their destinations
  – Threads process IRPs
  – Initiating thread processes the IRP until a device returns STATUS_PENDING
  – Subsequent processing can be done in kernel worker threads
Interrupts
IRQL: Interrupt Request Level
0 => PASSIVE_LEVEL
  • Processor is running threads
  • All usermode code is at IRQL 0
1 => APC_LEVEL: threads, APCs disabled
2 => DISPATCH_LEVEL
  • Running as the processor: can’t stop!
  • Can’t take a page fault
  • Only locks available are KSPIN_LOCKs
Interrupts
3-26 => Device Interrupt Service Routines
  • Device interrupts are mapped to an IRQL and an interrupt service routine; ISR is called at that IRQL
27 => PROFILE_LEVEL: profiling
28 => CLOCK2_LEVEL: clock interrupt
29 => IPI_LEVEL: interprocessor interrupt
  • Requests another processor to do something
30 => POWER_LEVEL: power failure
31 => HIGH_LEVEL: interrupts disabled
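The levels above act as a priority mask: while a processor runs at a given IRQL, interrupts at that level or below are held off. A minimal model of that rule follows. The enum values are the x86 numbers listed on the slides; the processor struct and helper functions are hypothetical and only illustrate the comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* IRQL values as listed above (x86 numbering); 3..26 are device levels. */
typedef enum {
    PASSIVE_LEVEL  = 0,
    APC_LEVEL      = 1,
    DISPATCH_LEVEL = 2,
    PROFILE_LEVEL  = 27,
    CLOCK2_LEVEL   = 28,
    IPI_LEVEL      = 29,
    POWER_LEVEL    = 30,
    HIGH_LEVEL     = 31
} irql;

/* Hypothetical per-processor state: just the current level. */
typedef struct { irql current; } processor;

/* An interrupt is delivered only if its level is above the current one. */
static bool interrupt_deliverable(const processor *p, irql source) {
    return source > p->current;
}

/* Raise the level and hand back the old one so the caller can restore it. */
static irql raise_irql(processor *p, irql new_level) {
    assert(new_level >= p->current);
    irql old = p->current;
    p->current = new_level;
    return old;
}

static void lower_irql(processor *p, irql old_level) {
    assert(old_level <= p->current);
    p->current = old_level;
}

/* Page faults can only be serviced below DISPATCH_LEVEL. */
static bool may_page_fault(const processor *p) {
    return p->current < DISPATCH_LEVEL;
}
```

This is why an ISR running at a device level cannot be interrupted by a DPC or by thread scheduling, and why code at DISPATCH_LEVEL or above must touch only non-paged memory.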
Interrupts
• Hardware signals an interrupt
• Interrupt’s ISR runs at device IRQL
  – Has to be fast; get off the processor and allow other ISRs to run
  – Typically queues a DPC, acknowledges the interrupt, and returns
• DPC: Deferred Procedure Call
  – Further processing at DISPATCH_LEVEL
  – Queues work to kernel worker threads
IO Completion
• Driver calls IO Manager to complete the IRP
• IO Manager queues a kernel mode APC to the initiating thread
• APC: Asynchronous Procedure Call
  – Kernel mode APC preempts thread execution
  – Writes data back to user mode in the context of the thread which initiated the IO
  – Signals completion of the IO
IO Cache
• Classic: block cache
  – Page mappings translate directly to blocks on the underlying partition.
• Windows: stream cache
  – Page mappings are offsets within a stream.
  – IO Cache Manager uses the same mappings.
  – All cache management (trimming) is centralized in the memory manager
  – All modifications show up in mapped views.
Virtual Memory
• Sections: another object type
  – Can be created to map a file
  – Can also be created off the pagefile
  – Optionally named, for shared memory
• Reservation
  – Range of VA which will not be handed out for some other purpose
• Committed
  – VA which actually maps to something
Aside: CreateProcess
• Just a user mode Win32 API
{
    NtCreateFile(&file, szImage);
    NtCreateSection(&sec, file);
    NtCreateProcess(&proc, sec);
    NtCreateThread(&thrd, proc);
}
WaitForSingleObject(proc);
Virtual Memory
• Memory Manager maintains processor-specific page table entry mappings.
  – Some parts of the address space are shared between processes: for instance, the kernel’s address space and the per-session space.
• On a pagefault, mm reads in the data
• Pages can be mapped without the appropriate access… what to do?
Signals
• With threads, signals don’t work very well.
• Some software designs expect to touch inaccessible memory.
  – Large structured files
  – Concurrent garbage collection
  – SLists
• Single global handler has to somehow know about all possible situations.
Structured Exception Handling
• Exceptions unwind the stack
  – Almost like C++!
  – C++ matches against a type hierarchy
  – SEH calls exception filter code; filters are Turing-complete.
• Two ways to deal with exceptions:
  – try/finally
  – try/except
try/finally
res = AllocateSomeResource();
try {
    SomeOperation(res);
} finally {
    if (AbnormalTermination()) {
        FreeSomeResource(res);
    }
}
return res;
try/except
try {
    SomeOperationWhichMayAV();
} except (Filter(GetExceptionCode(),
                 GetExceptionInformation())) {
    DoSomethingElse();
}
try/except
• GetExceptionCode()
  – A code indicating the cause of the exception
• GetExceptionInformation()
  – Additional code-specific info
  – The full processor context
• Filter decides what to do
  – EXCEPTION_EXECUTE_HANDLER
  – EXCEPTION_CONTINUE_SEARCH
  – EXCEPTION_CONTINUE_EXECUTION
Structured Exception Handling
• On x86, TEB points to stack of EXCEPTION_REGISTRATION_RECORD
  – auto structs, pointing to handler code
  – pushed by function prolog
  – popped by function epilog
• On exception, RtlDispatchException() walks the list.
  – Runs the filters to figure out what to do
  – Calls handler functions
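The list walk can be illustrated with a small portable model. This is a sketch of the idea only, not the real RtlDispatchException() or the real EXCEPTION_REGISTRATION_RECORD layout. The record type, the two sample filters, and the return convention are hypothetical. The three disposition values and the access violation status code match the Windows headers and are redeclared with a MY_ prefix.

```c
#include <assert.h>
#include <stddef.h>

/* Filter dispositions; same numeric values as the Windows definitions. */
#define MY_EXCEPTION_EXECUTE_HANDLER      1
#define MY_EXCEPTION_CONTINUE_SEARCH      0
#define MY_EXCEPTION_CONTINUE_EXECUTION (-1)

#define MY_STATUS_ACCESS_VIOLATION 0xC0000005ul

typedef int (*filter_fn)(unsigned long code);

/* Hypothetical registration record: one per active try/except frame. */
typedef struct registration_record {
    struct registration_record *next; /* next enclosing (outer) frame */
    filter_fn filter;                 /* decides what to do with the exception */
    int id;                           /* positive tag identifying the frame */
} registration_record;

/* Sample filter: never handles anything, keeps the search going. */
static int pass_filter(unsigned long code) {
    (void)code;
    return MY_EXCEPTION_CONTINUE_SEARCH;
}

/* Sample filter: handles access violations only. */
static int av_filter(unsigned long code) {
    return code == MY_STATUS_ACCESS_VIOLATION
               ? MY_EXCEPTION_EXECUTE_HANDLER
               : MY_EXCEPTION_CONTINUE_SEARCH;
}

/* Walks from the innermost record outward, running each filter.
   Returns the id of the frame whose handler should run, 0 if a filter
   asked to resume execution, or -1 if no frame handled the exception. */
static int dispatch_exception(const registration_record *head,
                              unsigned long code) {
    for (const registration_record *r = head; r != NULL; r = r->next) {
        int disposition = r->filter(code);
        if (disposition == MY_EXCEPTION_EXECUTE_HANDLER)
            return r->id;
        if (disposition == MY_EXCEPTION_CONTINUE_EXECUTION)
            return 0;
    }
    return -1;
}
```

With an inner frame using pass_filter and an outer frame using av_filter, an access violation is handled by the outer frame, while any other code falls off the end of the list, which is the case the top-level filter exists for.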
Structured Exception Handling
• On x86, there’s some overhead with pushing and popping the registration record
• On ia64, there is no overhead
  – Stack traces are reliable
  – It’s always possible to look up the handler
• Exception handling is very slow
  – Especially on ia64
• Used only for truly exceptional conditions
Structured Exception Handling
• Used in kernel mode too!
  – Most user mode access will just work
  – Still need to validate address ranges & data
  – Works great for SMP when another thread might be in the middle of modifying the address space
  – Expected read exceptions are returned as status codes from system calls
  – Expected writes are returned as SUCCESS
  – Unexpected => buggy kernel => blue screen
Top-level Exception Filter
• Top frame on each thread defines a catchall exception filter
• Top-level exception filter:
  – Notifies the debugger (if being debugged)
  – Launches a just-in-time debugger (if set up)
  – Loads faultrep.dll to report the failure
Faultrep.dll
• faultrep.dll offers to report the failure back to Microsoft
• We analyze the failures
  – A significant number are recognized instantly; we can tell the user what happened and how to fix it.
  – The others go through the standard triage process; developers analyze the dumps and figure out what happened.
OCA
• 67 million machines running XP
• Tens of thousands of drivers
• Over 100 drivers on any given machine
• One bug in one driver => Crash
• A significant number of crashes come from third-party drivers (some of which ship on the CD)
• Lots of different problems, though
Driver Verifier
• Controlled by verifier.exe
• Uses special pool for allocations
  – Detects allocation overruns & use after free
• Validates some behaviors
  – IRQL: touching paged memory?
  – DMA buffers
• Can inject failures—useful for testing behavior under sub-optimal conditions
Stress
• Every night, a couple hundred machines run stress on the latest build
• Stress exercises filesystems, memory, GUI, scheduler, etc., trying to uncover low-memory handling problems and race conditions
• Every morning, the stress test team triages failed machines
• Developers debug the failures