Using Projects Based on Internal NT APIs to Teach OS Principles Microsoft Research/Asia - Beijing January 2005 Dave Probert, Ph.D. Architect, Windows Kernel Group Windows Core Operating Systems Division Microsoft Corporation


Using Projects Based on Internal NT APIs to Teach OS Principles

Microsoft Research/Asia - Beijing
January 2005

Dave Probert, Ph.D.
Architect, Windows Kernel Group

Windows Core Operating Systems Division
Microsoft Corporation


© Microsoft Corporation 2005 2

Overview of Presentation

• Microsoft’s motivation / my motivation
• Overview of the Oz Project Course
• NT Architecture overview
• Relevant NT features for Oz
• NT API overview
• OS principles and terminology
• Oz project areas
• The plan


The Oz Project Course

Objectives
• Provide an environment to build a rich set of projects that explore OS principles by leveraging the NT subsystem model for implementing OS personalities
• Use real OS features rather than a ‘toy’ simulation
• Reduce the complexity required to learn and build experiments
• A simple development environment, using standard tools for building, debugging, and instrumentation
• Encourage ‘out-of-the-box’ thinking by students


The Oz Project Course

Caveat
• We are presenting some of our plans very early in their development to entice feedback from faculty teaching OS principles to undergrads
• Anything presented now may easily change as the project evolves and we incorporate the advice we are given


The Oz Project Course

Oz Summary
• Library of functions that wrap the native NT APIs to provide access to low-level primitives for address spaces, threads, exceptions, and IPC
– Languages: C/C++, C#, Java – but not Scheme
• A runtime of support functions that simplify student projects
• Documentation for the Oz functions/runtime
• A rich set of projects, with many variations, that allow students to explore qualitatively (& quantitatively) a large assortment of OS principles
• Tools for instrumentation and measurement


Overview of Windows Architecture

• NT is not a microkernel, but does support user-mode OS personalities (i.e. for POSIX, OS/2, Win32)
• Primary supported programming interface: Win32
• Win32 and other subsystems built on native NT APIs
• NT APIs generally not documented (not intended as the supported programming model) – but specific APIs are documented in the DDK
• Kernel implementation organized around the object manager
• NT APIs are rich (many parameters) and need refactoring and simplification for student use


Windows Architecture

[Figure: Windows architecture block diagram.
User-mode: Applications; Subsystem servers; System Services; Critical services; Login/GINA; DLLs (Kernel32, User32 / GDI); ntdll / run-time library.
Kernel-mode: Trap interface / LPC; Security refmon; IO Manager; Virtual memory; Procs & threads; Win32 GUI; File filters; File systems; Volume mgrs; Device stacks; Cache mgr; Scheduler; exec synchr; FS run-time; Object Manager / Configuration Management; Kernel run-time / Hardware Adaptation Layer.]


Processes and Threads


Process

Container for an address space and threads

Represented by EPROCESS that includes a KPROCESS

Associated User-mode Process Environment Block (PEB)

Primary Access Token

Quota block, Debug port, Handle Table, etc.

Unique process ID

MM structures like the WorkingSet, VAD tree, AWE, etc.

Queued to the global process list and session list

Optionally queued to a job’s process list


Thread

• Fundamental schedulable entity in the system
• Represented by ETHREAD that includes a KTHREAD
• Queued to the process (both E and K thread)
• IRP list
• Impersonation Access Token
• Unique thread ID
• Associated User-mode Thread Environment Block (TEB)
• User-mode stack
• Kernel-mode stack


Process/Thread structure

[Figure: process/thread structure. Labels: Object Manager; Any Handle Table; Process Object; Process’ Handle Table; Virtual Address Descriptors; six Thread boxes; Files; Events; Devices; Drivers.]


System DLL

Core user-mode functionality in the system dynamic link library (DLL) ntdll.dll

Mapped during process address space setup by the kernel

Contains all core system service entry points

User-mode trampoline points for:
– Process/thread startup
– Exception dispatch
– User APC dispatch
– Kernel-user callouts


Scheduling
• Windows schedules threads, not processes
• Scheduling is preemptive, priority-based, and round-robin at the highest priority
• 16 real-time priorities above 16 normal priorities
• Scheduler tries to keep a thread on its ideal processor/node to avoid perf degradation of cache/NUMA-memory
• Threads can specify affinity mask to run only on certain processors
• Each thread has a current & base priority
– Base priority initialized from process
– Non-realtime threads have priority boost/decay from base
– Boosts for GUI foreground, waking for event
– Priority decays, particularly if thread is CPU bound (running at quantum end)
• Scheduler is a complex state machine
– Responds to thread priority change, quantum expiration, blocking, etc.
• Priority inversions can lead to starvation
– Balance Set Manager periodically boosts non-running ready threads
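The priority rules on this slide can be tried out with a small simulation. The sketch below is a toy model, not the NT dispatcher: the names `Thread`, `boost`, and `run` are hypothetical, every thread is always ready, there is one processor, and a boosted normal thread is assumed to decay by exactly one level at each quantum end.

```python
from collections import deque

REALTIME_MIN = 16  # model: priorities 16-31 are real-time, 0-15 are normal


class Thread:
    def __init__(self, name, base, work):
        self.name = name
        self.base = base      # base priority, initialized from the process
        self.current = base   # current priority (base plus any boost)
        self.work = work      # remaining quanta of CPU work


def boost(thread, amount):
    """Boost a normal thread; real-time threads are never boosted here."""
    if thread.base < REALTIME_MIN:
        thread.current = min(REALTIME_MIN - 1, thread.current + amount)


def run(threads):
    """Run one quantum at a time and return the order in which threads ran."""
    queues = {p: deque() for p in range(32)}
    for t in threads:
        queues[t.current].append(t)
    trace = []
    while True:
        ready = [p for p in range(31, -1, -1) if queues[p]]
        if not ready:
            return trace
        t = queues[ready[0]].popleft()   # highest priority, FIFO within a level
        trace.append(t.name)
        t.work -= 1
        # Quantum end: a boosted normal thread decays one level toward its base.
        if t.base < REALTIME_MIN and t.current > t.base:
            t.current -= 1
        if t.work > 0:
            queues[t.current].append(t)  # back of its queue gives round-robin
```

With two threads at base priority 8 and a third at base 4 boosted by 6, the boosted thread runs first, decays into the same queue as the others, and then takes its turn round-robin.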


Thread scheduling states

[Figure: Kernel Thread Transition state diagram, 2003/04/06 v0.4b.
States: Initialized; DeferredReady; Ready; Standby; Running; Waiting; Terminated; Transition (k stack swapped); Ready (process swapped).
Routines labelling the transitions: KeInitThread; PspCreateThread; KiReadyThread; KiInsertDeferredReadyList; KiProcessDeferredReadyList; KiDeferredReadyThread; KiSelectNextThread; KiRetireDpcList; KiSwapThread; KiExitDispatcher; KiQuantumEnd; KiIdleSchedule; NtYieldExecution; KiUnwaitThread; KiSetAffinityThread; KiSetpriorityThread; KeTerminateThread.
Conditions labelling the transitions: idle processor or preemption; preemption; no avail. processor; affinity ok; affinity not ok.]


Thread scheduling states
• Main quasi-states:
– Ready – able to run
– Running – current thread on a processor
– Waiting – waiting for an event
• For scalability Ready is three real states:
– DeferredReady – queued on any processor
– Standby – will imminently start Running
– Ready – queued on target processor by priority
• Goal is granular locking of thread priority queues
• Red states related to outswapped stacks and processes


NT Subsystem model


Subsystem implementation

Four components:

1. Hook in system CreateProcess

2. Subsystem run-time libraries

3. Subsystem server process

4. Kernel-level support for inter-op of certain subsystem features

Communication via NT LPC (Lightweight remote Procedure Calls)

System integration can vary between runtime and subsystem process


Interix Architecture

[Figure: Interix architecture. The POSIX/UNIX Subsystem and the Win32 Subsystem are drawn above the Windows NT Kernel and the Hardware Abstraction Layer.
Labels on the UNIX side: UNIX / POSIX APIs; BSD Sockets; UNIX, XPG, POSIX.2 commands & utilities; UNIX shells; telnetd; Workshop tools: gcc, g++; perl, Apache, Tcl/Tk, bash, etc.; X11; Motif; UNIX SDK; UNIX Applications.
Labels on the Windows side: Win32 APIs; Windows NT sysadmin, commands & networking; Win95 GUI; winsock; Windows NT command Shell; X11R6.3 server; NFS Client, Server, Gateway; Windows Applications.]


Interix Process Startup

• W calls CreateProcess(“I.EXE”)
• API changes command line to POSIX.EXE /C I.EXE
• POSIX.EXE sets up console, environment, argv, etc. and sends LPC to PSXSS
• PSXSS calls NtCreateProcess() to start I

[Figure: the four processes involved: W, POSIX.EXE, PSXSS.EXE, and I.]


Virtual Memory


Virtual Memory Manager Features

• Provides 4 GB flat virtual address space (IA32)
• Manages process address space (RESERVE)
• Handles pagefaults
• Manages process working sets
• Manages physical memory & pagefile (COMMIT)
• Provides memory-mapped files
• Allows pages shared between processes
• Facilities for I/O subsystem and device drivers
• Supports file system cache manager
• Provides direct address space / memory control (AWE)
• Implements kernel-mode heap (aka POOL)


Paging Overview
• Working Sets: list of valid pages for each process (and the kernel)
• Pages ‘trimmed’ from a working set go onto lists
– Standby list: pages backed by disk
– Modified list: dirty pages to push to disk
– Free list: pages not associated with disk
– Zero list: supply of demand-zero pages
• Modified/standby pages can be faulted back into a working set w/o disk activity (soft fault)
• Background system threads trim working sets, write modified pages and produce zero pages based on memory state and config parameters
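The difference between a soft fault and a fault that needs new page contents can be sketched with a toy model of the lists. This is an illustration only: `PagingModel` and its methods are hypothetical names, the free and zero lists are left out, and "hard" here simply means the page was on none of the in-memory lists.

```python
class PagingModel:
    """Toy model of one working set plus the standby and modified lists."""

    def __init__(self):
        self.working_set = set()
        self.dirty = set()      # working-set pages written since last cleaned
        self.standby = set()    # trimmed clean pages, still in memory
        self.modified = set()   # trimmed dirty pages, waiting to go to disk

    def touch(self, page, write=False):
        """Access a page and report 'hit', 'soft', or 'hard'."""
        if page in self.working_set:
            kind = "hit"
        elif page in self.standby or page in self.modified:
            # Soft fault: the page is still in memory, so no disk activity.
            if page in self.modified:
                self.dirty.add(page)    # it is still dirty
            self.standby.discard(page)
            self.modified.discard(page)
            kind = "soft"
        else:
            kind = "hard"               # contents must come from elsewhere
        self.working_set.add(page)
        if write:
            self.dirty.add(page)
        return kind

    def trim(self, page):
        """Trim a page from the working set onto the matching list."""
        self.working_set.remove(page)
        if page in self.dirty:
            self.dirty.remove(page)
            self.modified.add(page)
        else:
            self.standby.add(page)

    def write_modified(self):
        """Background writer: once written, modified pages become standby."""
        self.standby |= self.modified
        self.modified.clear()
```

A page that is written, trimmed, and touched again takes a soft fault whether or not the background writer has already moved it from the modified list to the standby list.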


Managing Working Sets
• Aging pages: Increment age counts for pages which haven't been accessed
• Estimate unused pages: count in working set and keep a global count of estimate
• When getting tight on memory: replace rather than add pages when a fault occurs in a working set with significant unused pages
• When memory is tight: reduce (trim) working sets which are above their maximum
• Balance Set Manager: periodically runs Working Set Trimmer, also swaps out kernel stacks of long-waiting threads
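Page aging can be illustrated with two small functions. This is a sketch under assumptions the slide does not state: an accessed page has its age reset to zero, and the trimmer picks the oldest pages first. The names `age_pages` and `pick_trim_victims` are hypothetical.

```python
def age_pages(ages, accessed):
    """One aging pass: reset accessed pages to 0, increment all the others."""
    return {page: 0 if page in accessed else age + 1
            for page, age in ages.items()}


def pick_trim_victims(ages, count):
    """Choose the `count` oldest pages (ties broken by page number)."""
    return sorted(ages, key=lambda page: (-ages[page], page))[:count]
```

After two passes in which page 3 is never accessed, it has the highest age and is the first candidate for trimming.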


Object Manager & Handle Tables


Kernel Object Manager (OB)

• Provides underlying NT namespace

• Unifies kernel data structure referencing

• Unifies user-mode referencing via handles

• Simplifies resource charging

• Central facility for security protection


OBJECT_HEADER

PointerCount

HandleCount

pObjectType

pQuotaBlockCharged

pSecurityDescriptor

OBJECT BODY

offNameInfo offHandleInfo offQuotaInfo Flags

CreateInfo + NameInfo + HandleInfo + QuotaInfo


DISPATCHER_HEADER
• Fundamental kernel synchronization mechanism
• Equivalent to a KEVENT

[Figure: DISPATCHER_HEADER layout. Fields: Absolute; Type; Size; Inserted; SignalState; WaitListHead.flink; WaitListHead.blink.]


[Figure: wait-block linkage between threads and the objects they wait on. Labels: KPRCB (WaitListHead); two Threads (each with WaitListEntry and WaitBlockList); six WaitBlocks (each with WaitListEntry and NextWaitBlock); four Object->Header boxes (each with WaitListHead and Signaled).]


Object Methods

OPEN: Create/Open/Dup/Inherit handle

CLOSE: Called when each handle closed

DELETE: Called on last dereference

PARSE: Called looking up objects by name

SECURITY: Usually SeDefaultObjectMethod

QUERYNAME: Return object-specific name

OKAYTOCLOSE: Give veto on handle close
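How the OPEN, CLOSE, and DELETE methods relate to handle and pointer counts can be sketched as follows. This is a simplified model, not the Object Manager's implementation: `KernelObject` and `ToyHandleTable` are hypothetical names, the handle values are arbitrary integers, and each open handle is assumed to hold one pointer reference.

```python
class KernelObject:
    def __init__(self, name, log):
        self.name = name
        self.pointer_count = 1   # the creator's reference
        self.handle_count = 0
        self.log = log           # records which object methods were called

    def reference(self):
        self.pointer_count += 1

    def dereference(self):
        self.pointer_count -= 1
        if self.pointer_count == 0:
            self.log.append(("DELETE", self.name))  # last dereference


class ToyHandleTable:
    def __init__(self):
        self.entries = {}
        self.next_handle = 1

    def open(self, obj):
        obj.reference()
        obj.handle_count += 1
        obj.log.append(("OPEN", obj.name))
        handle = self.next_handle
        self.next_handle += 1
        self.entries[handle] = obj
        return handle

    def close(self, handle):
        obj = self.entries.pop(handle)
        obj.handle_count -= 1
        obj.log.append(("CLOSE", obj.name))  # called for each handle closed
        obj.dereference()
```

In this model CLOSE runs once per handle, while DELETE runs only once, when the final reference of any kind goes away.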


Process/Thread Types
• Job - JOB
– DeleteProcedure = PspJobDelete
– CloseProcedure = PspJobClose
• Process - EPROCESS
– DeleteProcedure = PspProcessDelete
• Profile - EPROFILE
– DeleteProcedure = ExpProfileDelete
• Section - SECTION
– DeleteProcedure = MiSectionDelete
• Thread - ETHREAD
– DeleteProcedure = PspThreadDelete
• Token - TOKEN
– DeleteProcedure = SepTokenDeleteMethod


Synchronization Types
• Event - KEVENT
• EventPair - EEVENT_PAIR
• KeyedEvent - KEYED_EVENT_OBJECT
• Mutant - KMUTANT
– DeleteProcedure = ExpDeleteMutant
• Port - LPCP_PORT_OBJECT
– DeleteProcedure = LpcpDeletePort
– CloseProcedure = LpcpClosePort
• Semaphore - KSEMAPHORE
• Timer - ETIMER
– DeleteProcedure = ExpDeleteTimer


Handle Table (Executive)

• Efficient, scalable object index structure

• One per process containing ‘open’ objects

• Kernel handle table (system process)

• Also used to allocate process/thread IDs


Process Handle Tables

[Figure: process handle tables. Labels: two EPROCESS blocks, each with a pHandleTable pointing to a Handle Table; the System Process, whose table holds the Kernel Handles; five objects referenced from the tables.]


One level: (to 512 handles)

[Figure: Handle Table → TableCode → A: Handle Table Entries [512] → Objects]


Two levels: (to 512K handles)

[Figure: Handle Table → TableCode → B: Handle Table Pointers [1024] → A, C: Handle Table Entries [512] → Objects]


Three levels: (to 16M handles)

[Figure: Handle Table → TableCode → D: Handle Table Pointers [32] → B, E: Handle Table Pointers [1024] → A, C, F: Handle Table Entries [512] → Objects]
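The capacities quoted for one, two, and three levels follow directly from the table sizes in the figures (512 entries per leaf, 1024 pointers per middle table, 32 pointers in the top table). The sketch below works through that arithmetic and splits an entry index into per-level indices. The function names are hypothetical, and the input is assumed to be a zero-based entry index, not a raw handle value.

```python
ENTRIES_PER_LEAF = 512     # handle table entries per leaf table (A, C, F)
POINTERS_PER_MID = 1024    # pointers per middle table (B, E)
POINTERS_PER_TOP = 32      # pointers in the top table (D)


def capacity(levels):
    """Maximum number of entries reachable with 1, 2, or 3 levels."""
    sizes = [ENTRIES_PER_LEAF, POINTERS_PER_MID, POINTERS_PER_TOP]
    total = 1
    for size in sizes[:levels]:
        total *= size
    return total


def decompose(index):
    """Split a zero-based entry index into (top, mid, leaf) table indices."""
    if not 0 <= index < capacity(3):
        raise ValueError("index out of range for a three-level table")
    leaf = index % ENTRIES_PER_LEAF
    mid = (index // ENTRIES_PER_LEAF) % POINTERS_PER_MID
    top = index // (ENTRIES_PER_LEAF * POINTERS_PER_MID)
    return top, mid, leaf
```

So 512 × 1024 gives the 512K limit for two levels, and multiplying by 32 gives the 16M limit for three.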


Handle Table Functions
• ExCreateHandleTable – create non-process tables
• ExDupHandleTable – called creating processes
• ExSweepHandleTable – for process rundown
• ExDestroyHandleTable – called destroying processes
• ExCreateHandle – setup new handle table entry
• ExChangeHandle – used to set inherit and/or protect
• ExDestroyHandle – implements CloseHandle
• ExMapHandleToPointer – reference underlying object
• ExReferenceHandleDebugInfo – tracing handles
• ExSnapShotHandleTables – handle searchers (oh.exe)


Object Manager Summary

• Manages the NT namespace
• Common scheme for managing resources
• Extensible method-based model for building system objects
• Memory management based on reference counting
• Uniform/centralized security model
• Support handle-based access of system objects
• Common, uniform mechanisms for using system resources


Input/Output system


Windows I/O Model
• Asynchronous, Packet-based, Extensible
• Device discovery supports plug-and-play
– volumes automatically detected and mounted
– power management support (ACPI)
• Drivers attach to per-device driver stacks
– Drivers can filter actions of other drivers in each stack
• Integrated kernel support
– Memory Manager provides DMA support
– HAL provides device access, PnP manages device resources
– Cache manager provides file-level caching via MM file-mapping
• Multiple I/O completion mechanisms:
– synchronous
– update user-mode memory status
– signal events
– callbacks within initiating thread
– reaped by threads waiting on an I/O Completion Port


IO Request Packet (IRP)

IO operations encapsulated in IRPs

IO requests travel down a driver stack in an IRP

Each driver gets an IRP stack location which contains parameters for that IO request

IRP has major and minor codes to describe IO operations

Major codes include create, read, write, PNP, devioctl, cleanup and close

IRPs are associated with the thread that made the IO request


IRP Fields

[Figure: IRP layout. Labels: Flags; Buffer Pointers (System, User, MDL); MDL Chain; Thread’s IRPs; Completion/Cancel Info; Completion APC block; Driver Queuing & Comm.; Thread; IRP Stack Locations.]


Object Relationships

[Figure: object relationships. Labels: two Driver Objects; five Device Objects; two File Objects; Volume.]


Layering Drivers
• Device objects attach one on top of another using IoAttachDevice* APIs creating device stacks
– IO manager sends IRP to top of the stack
– drivers store next lower device object in their private data structure
– stack tear down done using IoDetachDevice and IoDeleteDevice
• Device objects point to driver objects
– driver objects represent driver state, including dispatch table
• File objects point to open files
• File systems are drivers which manage file objects for volumes (described by VolumeParameterBlocks)


Each IRP Stack Location

Major/Minor Function Codes

Flags & Control

MDL Chain

Parameters:

DeviceObject

FileObject

Completion Routine & Parameter

Create: security, options

Read: len, key, offset

DevObj

FileObj

DrvrObj


IRP flow of control (synchronous)
• IOMgr (e.g. IopParseDevice) creates IRP, fills in top stack location, calls IoCallDriver to pass to stack
– driver determined by top device object on device stack
– driver passed the device object and IRP
• IoCallDriver
– copies stack location for next driver
– driver routine determined by major function in drvobj
• Each driver in turn
– does work on IRP, if desired
– keeps track in the device object of the next stack device
– calls IoCallDriver on next device
• Eventually bottom driver completes IO and returns on callstack
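The synchronous flow can be mimicked with a few lines of Python. This is only a model of the control flow described above, not the kernel interfaces: `ToyIrp`, `Device`, `call_driver`, `pass_down`, and `complete` are hypothetical stand-ins, and the dispatch table is a plain dictionary keyed by major function.

```python
class ToyIrp:
    def __init__(self, major, depth):
        self.major = major
        self.stack = [None] * depth  # one stack location per driver
        self.current = -1            # index of the current stack location
        self.trace = []              # drivers that saw the IRP, in order


class Device:
    def __init__(self, name, dispatch, lower=None):
        self.name = name
        self.dispatch = dispatch     # major function -> driver routine
        self.lower = lower           # next lower device in the stack


def call_driver(device, irp, params):
    """Stand-in for IoCallDriver: fill the next stack location, then dispatch."""
    irp.current += 1
    irp.stack[irp.current] = dict(params)
    routine = device.dispatch[irp.major]
    return routine(device, irp)


def pass_down(device, irp):
    """A layered driver: note the IRP, then hand it to the next lower device."""
    irp.trace.append(device.name)
    return call_driver(device.lower, irp, irp.stack[irp.current])


def complete(device, irp):
    """The bottom driver: finish the IO and return up the call stack."""
    irp.trace.append(device.name)
    return "STATUS_SUCCESS"
```

Building a three-device stack (file system over volume over disk) and calling `call_driver` on the top device visits each driver once and returns the bottom driver's status back up the Python call stack, which mirrors the synchronous case.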


IRP flow of control (asynch)

• Eventually a driver decides to be asynchronous
– driver queues IRP for further processing
– driver returns STATUS_PENDING up call stack
– higher drivers may return all the way to user, or may wait for IO to complete (synchronizing the stack)
• Eventually a driver decides IO is complete
– usually due to an interrupt/DPC completing IO
– each completion routine in device stack is called, possibly at DPC or in arbitrary thread context
– IRP turned into APC request delivered to original thread
– APC runs final completion, accessing process memory
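The asynchronous case can be modelled in the same spirit. In this sketch, which uses hypothetical names throughout, the bottom driver queues the request and returns a pending status; a later simulated interrupt completes it, runs the completion routines from the bottom of the stack upward (an assumption about ordering), and queues an APC that the original thread runs later.

```python
STATUS_PENDING = "STATUS_PENDING"


class RequestingThread:
    def __init__(self):
        self.apc_queue = []
        self.results = []

    def deliver_apcs(self):
        """Run queued APCs in the context of this thread."""
        while self.apc_queue:
            self.apc_queue.pop(0)()


class AsyncIrp:
    def __init__(self, thread):
        self.thread = thread
        self.completion_routines = []  # registered top-down on the way down
        self.status = None


def complete_request(irp, status):
    """Model of completion: routines run bottom-up, then an APC is queued."""
    irp.status = status
    for routine in reversed(irp.completion_routines):
        routine(irp)
    irp.thread.apc_queue.append(lambda: irp.thread.results.append(irp.status))


class BottomDriver:
    def __init__(self):
        self.pending = []

    def dispatch(self, irp):
        self.pending.append(irp)       # queue for further processing
        return STATUS_PENDING

    def on_interrupt(self):
        complete_request(self.pending.pop(0), "STATUS_SUCCESS")
```

The final result reaches the requesting thread only when its APCs are delivered, which models the point that final completion needs the original thread's context to touch process memory.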


Path of an Async IO request

[Figure: path of an async IO request. Labels: NtReadFile(Handle, …); Handle; File object; Security and access validation; Allocate IRP; Devobj1 and its dispatch routine; Devobj2 and its dispatch routine; Interrupt service routine; DPC routine; IoCompleteRequest; IO Special APC; User APCs; Completion ports.]


File System Device Stack

[Figure: file system device stack, top to bottom: Application; Kernel32 / ntdll (user/kernel boundary); NT I/O Manager; File System Filters; File System Driver (alongside the Cache Manager and Virtual Memory Manager); Partition/Volume Storage Manager; Disk Class Manager; Disk Driver; DISK.]


Relevant NT features


• Object Manager
– unified synchronization
– handles on objects can be duplicated cross-process
– automatic clean-up (internal reference counting)
– built-in namespace management
• NT API
– most APIs take arbitrary handles on objects (e.g. processes, threads)
– operations are fairly low-level (for handles, virtual memory, threads, exceptions)
• Virtual Memory
– Separation of address space and physical memory management
– Rich functionality (shared memory, access to dirty bits, AWE)
• IO
– Inherently asynchronous
• LPC (lightweight IPC)
– supports efficient cross-process communication
– cross-process exception processing (exception ports)


Native NT APIs


Native NT Process/Thread APIs

• NtCreateProcess()
• NtTerminateProcess()
• NtQuery/SetInformationProcess()
• NtGetNextProcess()
• NtGetNextThread()
• NtOpenProcess()
• NtOpenThread()
• NtSuspendProcess()
• NtResumeProcess()
• NtCreateThread()
• NtTerminateThread()
• NtSuspendThread()
• NtResumeThread()
• NtGet/SetContextThread()
• NtQuery/SetInformationThread()
• NtAlertThread()
• NtQueueApcThread()
• NtCreateJobObject()
• NtOpenJobObject()
• NtAssignProcessToJobObject()
• NtQuery/SetInformationJobObject()
• NtTerminateJobObject()
• NtIsProcessInJob()
• NtCreateJobSet()


Virtual Memory Manager NT Internal APIs

NtCreatePagingFile

NtAllocateVirtualMemory (Proc, Addr, Size, Type, Prot)
– Process: handle to a process
– Protection: NOACCESS, EXECUTE, READONLY, READWRITE, NOCACHE
– Flags: COMMIT, RESERVE, PHYSICAL, TOP_DOWN, RESET, LARGE_PAGES, WRITE_WATCH

NtFreeVirtualMemory(Process, Address, Size, FreeType)
– FreeType: DECOMMIT or RELEASE

NtQueryVirtualMemory
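The RESERVE/COMMIT and DECOMMIT/RELEASE distinction can be made concrete with a per-page state model. This is a simplified sketch, not the NT semantics in full: `AddressSpace` and its methods are hypothetical, flags are plain strings, and rules such as releasing only whole reservations are ignored.

```python
PAGE = 4096


class AddressSpace:
    """Tracks each page as free (absent), 'reserved', or 'committed'."""

    def __init__(self):
        self.pages = {}

    def _range(self, addr, size):
        return range(addr // PAGE, (addr + size - 1) // PAGE + 1)

    def allocate(self, addr, size, flags):
        """flags is a set containing 'RESERVE' and/or 'COMMIT'."""
        for page in self._range(addr, size):
            state = self.pages.get(page)
            if "RESERVE" in flags and state is None:
                state = "reserved"       # claim address space only
            if "COMMIT" in flags:
                if state is None:
                    raise ValueError("commit needs reserved address space")
                state = "committed"      # now backed by memory or pagefile
            if state is not None:
                self.pages[page] = state

    def free(self, addr, size, free_type):
        for page in self._range(addr, size):
            if free_type == "DECOMMIT":
                if self.pages.get(page) == "committed":
                    self.pages[page] = "reserved"
            elif free_type == "RELEASE":
                self.pages.pop(page, None)
            else:
                raise ValueError("free_type must be DECOMMIT or RELEASE")

    def state(self, addr):
        return self.pages.get(addr // PAGE, "free")
```

Reserving three pages and committing only the first leaves the other two reserved but unbacked; DECOMMIT returns a page to the reserved state, and RELEASE gives the address range back entirely.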


Virtual Memory Manager NT Internal APIs

NtProtectVirtualMemory

Pagefault

NtLockVirtualMemory, NtUnlockVirtualMemory
– locks a region of pages within the working set list
– requires PROCESS_VM_OPERATION on target process and SeLockMemoryPrivilege

NtReadVirtualMemory, NtWriteVirtualMemory (Proc, Addr, Buffer, Size)

NtFlushVirtualMemory


Virtual Memory Manager NT Internal APIs

NtCreateSection
– creates a section but does not map it

NtOpenSection
– opens an existing section

NtQuerySection
– query attributes for section

NtExtendSection

NtMapViewOfSection (Sect, Proc, Addr, Size, …)

NtUnmapViewOfSection


Virtual Memory Manager NT Internal APIs

APIs to support AWE (Address Windowing Extensions)
– Private memory only
– Map only in current process
– Requires LOCK_VM privilege

NtAllocateUserPhysicalPages (Proc, NPages, &PFNs[])
NtMapUserPhysicalPages (Addr, NPages, PFNs[])
NtMapUserPhysicalPagesScatter
NtFreeUserPhysicalPages (Proc, &NPages, PFNs[])

NtResetWriteWatch
NtGetWriteWatch
– Read out dirty bits for a section of memory since last reset
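The write-watch idea (ask which pages were written since the last reset) can be modelled in a few lines. The class below is a hypothetical illustration of the concept, not a wrapper for the NT calls: writes are reported to it explicitly, whereas the real facility relies on the memory manager tracking dirty bits.

```python
class WriteWatchRegion:
    """Remembers which pages of a region were written since the last reset."""

    def __init__(self, base, npages, page_size=4096):
        self.base = base
        self.npages = npages
        self.page_size = page_size
        self.written = set()

    def write(self, addr):
        """Record a write to `addr`, which must lie inside the region."""
        end = self.base + self.npages * self.page_size
        if not self.base <= addr < end:
            raise ValueError("address outside the watched region")
        self.written.add((addr - self.base) // self.page_size)

    def get_write_watch(self):
        """Return the base addresses of pages written since the last reset."""
        return [self.base + p * self.page_size for p in sorted(self.written)]

    def reset_write_watch(self):
        self.written.clear()
```

Several writes to the same page are reported once, and a reset clears the record so that only later writes show up.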


Generic object services

• NAMESPACE ops: directories, symlinks
• NtQueryObject
• NtQuery/SetSecurityObject
• NtWaitForSingle/MultipleObjects
• ObOpenObjectByName/Pointer
• ObReferenceObjectByName/Handle
• NtDuplicateObject
• NtClose
• ObDereferenceObject


NT IO APIs

Establish IO handles
• NtCreateFile
• NtOpenFile
• NtCreateNamedPipeFile
• NtCreateMailslotFile

IO Completion APIs
• NtCreateIoCompletion
• NtOpenIoCompletion
• NtQueryIoCompletion
• NtSetIoCompletion
• NtRemoveIoCompletion

Actual IO operations
• NtReadFile
• NtReadFileScatter
• NtWriteFile
• NtWriteFileGather
• NtCancelIoFile
• NtFlushBuffersFile

File operations
• NtLockFile
• NtUnlockFile
• NtDeleteFile


NT IO APIs - 2

Meta IO operations
• NtFsControlFile
• NtDeviceIoControlFile
• NtQueryDirectoryFile
• NtQueryAttributesFile
• NtQueryFullAttributesFile
• NtQueryEaFile
• NtSetEaFile
• NtQueryInformationFile
• NtSetInformationFile
• NtNotifyChangeDirectoryFile

Administrative operations
• NtLoadDriver
• NtUnloadDriver
• NtQueryVolumeInformationFile
• NtSetVolumeInformationFile
• NtQueryQuotaInformationFile
• NtSetQuotaInformationFile


OS Principles


Basic Distinctions

Libraries vs Kernel
– protecting shared state
– access to privileged architecture

Kernel vs Services
– primarily performance and existing processors
– shared address space

Abstractions: Necessity, Convenience, Efficiency
– access and protection vs.
– managing the kernel/user boundary vs.
– performance


OS Services as ‘User-mode’ Services

Microkernel approach
– limited kernel services: threads, VM, IPC
– all other services in user-mode process

NT approach
– rich kernel – primarily for subsystems (originally)
– range of integration (e.g. Posix vs Win32)
– heavy reliance on services (daemons)

SPACE (and Pebble)
– abstracts physical resources (processors, mmu, traps)
– kernel is just another ‘user-mode’ service
– can build more conventional systems on top

The Oz Project Course
– Use NT threads, VM, LPC/exceptions to abstract processors, address spaces (mmu), and traps


Use of SPACE abstractions in Oz

[Diagram: Processor, MMU (VA → Phys Addr), Trap Architecture (Trap → Trap Handler)]

Fundamental SPACE abstractions:

Processor: executes code resulting in traps and memory references

MMU: maps virtual → physical addressing

Traps: fundamental IPC mechanism

Conventional OS kernel abstractions:

Threads, VM, IPC => changes require kernel changes

Instead, use them to simulate Processors/MMU/Traps
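
As a rough illustration of the three abstractions above, the following Python sketch models an MMU as a page-number lookup and a trap as a call into a registered handler. All names here (`SimMMU`, `PageFault`, `run`, `demand_map`) and the 4 KB page size are assumptions made for the example; they are not part of Oz or NT, where NT threads, VM, and LPC/exceptions would play these roles.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageFault(Exception):
    """Trap raised by the simulated MMU when a virtual page is unmapped."""

    def __init__(self, vaddr):
        super().__init__(f"page fault at {vaddr:#x}")
        self.vaddr = vaddr


class SimMMU:
    """Maps virtual page numbers to physical frame numbers."""

    def __init__(self):
        self.table = {}  # virtual page number -> physical frame number

    def map(self, vpn, pfn):
        self.table[vpn] = pfn

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.table:
            raise PageFault(vaddr)
        return self.table[vpn] * PAGE_SIZE + offset


def run(references, mmu, trap_handler):
    """'Processor' loop: issue memory references, deliver traps to the handler."""
    physical = []
    for vaddr in references:
        try:
            physical.append(mmu.translate(vaddr))
        except PageFault as trap:
            trap_handler(mmu, trap)                 # handler repairs the mapping
            physical.append(mmu.translate(vaddr))   # retry the reference
    return physical


def demand_map(mmu, trap):
    """Trap handler: map the faulting page to the next unused frame number."""
    mmu.map(trap.vaddr // PAGE_SIZE, len(mmu.table))


mmu = SimMMU()
result = run([0x1000, 0x1004, 0x5008], mmu, demand_map)
```

The point of the sketch is that the policy (here, demand mapping) lives entirely in the trap handler, so it can be changed without touching the "processor" or the "MMU".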


Oz Projects (in no particular order)


Loading Program Images

Take the sections in a file and load into an address space
– dynamic relocation
– shared libraries
– creating stacks and initial thread contexts
– create environment and other parental state
– Initialize IO descriptors from parent

Using
– NtCreateProcess to create an NT process container
– NT VM to manage/modify process contents
– NT library functions to take apart images and set environment
– NT thread APIs to execute thread in new process
– NT handle duplication to set IO descriptors
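
To make the "dynamic relocation" step concrete, here is a minimal Python sketch of copying sections into an address space and patching absolute pointers when the image lands at a different base than it was linked for. The image layout (a list of `(rva, bytes)` sections plus a list of relocation RVAs holding 32-bit little-endian addresses) and the function name `load_image` are simplifying assumptions for illustration; this is not the PE format and not the Oz loader.

```python
import struct


def load_image(sections, relocations, preferred_base, actual_base, memory):
    """Copy sections into `memory` and apply base relocations.

    sections:     list of (rva, bytes) pairs
    relocations:  list of RVAs holding 32-bit little-endian absolute addresses
    memory:       dict mapping absolute address -> byte value (the 'address space')
    """
    for rva, data in sections:
        for i, byte in enumerate(data):
            memory[actual_base + rva + i] = byte

    # Every absolute address baked into the image is off by the same delta.
    delta = actual_base - preferred_base
    for rva in relocations:
        addr = actual_base + rva
        raw = bytes(memory[addr + i] for i in range(4))
        (value,) = struct.unpack("<I", raw)
        patched = struct.pack("<I", (value + delta) & 0xFFFFFFFF)
        for i, byte in enumerate(patched):
            memory[addr + i] = byte
    return memory


# A 'code' section containing one absolute pointer to the data section.
preferred, actual = 0x400000, 0x500000
code = struct.pack("<I", preferred + 0x2000)   # pointer to data at RVA 0x2000
data = b"Oz"
space = load_image([(0x1000, code), (0x2000, data)], [0x1000], preferred, actual, {})
pointer = struct.unpack("<I", bytes(space[actual + 0x1000 + i] for i in range(4)))[0]
```

After loading, `pointer` refers to the data section at its actual location rather than the preferred one.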


System Calls

Communicate requests for system services
– implement basic system calls
– access data across domains
– parameter validation and penetration testing
– asynchronous operations
– servicing models

Using
– NT LPC to perform RPC
– NT VM to access/modify child process state
– NT threads to support flow-of-control
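
The dispatch-and-validate pattern behind these bullets can be sketched in a few lines of Python. Everything here is hypothetical: the `Server` class stands in for the Oz service that receives requests (over NT LPC in the real course), a `bytearray` stands in for the child's address space, and the system call numbers and the value returned by `sys_getpid` are arbitrary.

```python
class SyscallError(Exception):
    """Returned to the client when a request is rejected."""


class Server:
    """Plays the role of the Oz 'kernel' servicing requests from a client."""

    def __init__(self, client_memory):
        self.client_memory = client_memory   # bytearray standing in for the child's VM
        self.table = {0: self.sys_write, 1: self.sys_getpid}
        self.output = []

    def dispatch(self, number, *args):
        handler = self.table.get(number)
        if handler is None:
            raise SyscallError(f"bad system call number {number}")
        return handler(*args)

    def sys_write(self, addr, length):
        # Validate the client-supplied range before touching client memory.
        if not (isinstance(addr, int) and isinstance(length, int)):
            raise SyscallError("arguments must be integers")
        if addr < 0 or length < 0 or addr + length > len(self.client_memory):
            raise SyscallError("buffer outside client address space")
        self.output.append(bytes(self.client_memory[addr:addr + length]))
        return length

    def sys_getpid(self):
        return 42  # arbitrary placeholder value


memory = bytearray(b"hello, oz")
server = Server(memory)
written = server.dispatch(0, 0, 5)
```

A penetration-testing exercise would then feed `dispatch` malformed numbers, negative lengths, and out-of-range buffers and check that each is rejected.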


Manage address spaces, physical memory & ptes

Build data structures for managing VM-related resources
– basic data structures and algorithms
– support various page-table models

Using
– NT VM to implement actual mappings (and shared memory) for address spaces
– NT VM to implement dirty bits
– NT VM shared memory provides page sharing and simulates physical pages with virtual pages
– NT threads provide DMA (for IO)
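
One of the page-table models a student might build is a two-level table backed by a simple frame allocator. The Python sketch below is an illustration only: the 10/10/12 bit split, the PTE fields, and the class names `FrameAllocator` and `PageTable` are assumptions, and the dirty bit is set in software here, whereas the slide suggests using NT VM for it.

```python
PAGE_SHIFT = 12   # 4 KB pages (assumed)
LEVEL_BITS = 10   # 10 bits per level, as in a classic 32-bit layout


class FrameAllocator:
    """Hands out simulated physical frame numbers from a free list."""

    def __init__(self, nframes):
        self.free = list(range(nframes))

    def alloc(self):
        if not self.free:
            raise MemoryError("out of physical frames")
        return self.free.pop(0)

    def release(self, pfn):
        self.free.append(pfn)


class PageTable:
    """Two-level table: directory index -> leaf table -> PTE dict."""

    def __init__(self):
        self.directory = {}

    @staticmethod
    def split(vaddr):
        vpn = vaddr >> PAGE_SHIFT
        return vpn >> LEVEL_BITS, vpn & ((1 << LEVEL_BITS) - 1)

    def map(self, vaddr, pfn, writable=True):
        top, low = self.split(vaddr)
        leaf = self.directory.setdefault(top, {})
        leaf[low] = {"pfn": pfn, "writable": writable, "dirty": False}

    def lookup(self, vaddr):
        top, low = self.split(vaddr)
        return self.directory.get(top, {}).get(low)

    def write(self, vaddr):
        """Translate a write access and record it in the PTE's dirty bit."""
        pte = self.lookup(vaddr)
        if pte is None or not pte["writable"]:
            raise PermissionError(f"cannot write {vaddr:#x}")
        pte["dirty"] = True
        return (pte["pfn"] << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))


frames = FrameAllocator(4)
table = PageTable()
table.map(0x00401000, frames.alloc())
paddr = table.write(0x00401234)
```

Swapping `PageTable` for an inverted or hashed table while keeping the same `map`/`lookup`/`write` interface is one way to compare page-table models.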


Implementing Virtual Memory

Build various kinds of virtual memory systems
– use previous resource management for addresses, mem, page tables
– page file organization
– replacement algorithms
– shared memory, object/file-backed memory regions
– IO/DMA simulation
– performance measurement of algorithms

Using
– NT LPC/exceptions to handle page faults
– NT VM to control actual mappings
– NT IO for access to page file
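
For the "replacement algorithms" and "performance measurement" bullets, a small Python sketch can count page faults for FIFO and LRU over a reference string. The function names and the reference string (the textbook sequence commonly used to show Belady's anomaly) are choices made for this example and do not come from the Oz materials.

```python
from collections import OrderedDict, deque


def fifo_faults(refs, nframes):
    """Count page faults when the oldest-loaded page is evicted first."""
    resident, order, faults = set(), deque(), 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(resident) == nframes:
            resident.discard(order.popleft())
        resident.add(page)
        order.append(page)
    return faults


def lru_faults(refs, nframes):
    """Count page faults when the least recently used page is evicted first."""
    resident, faults = OrderedDict(), 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)      # mark as most recently used
            continue
        faults += 1
        if len(resident) == nframes:
            resident.popitem(last=False)    # evict least recently used
        resident[page] = True
    return faults


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
```

On this reference string FIFO takes 9 faults with 3 frames but 10 with 4 frames, while LRU goes from 10 faults to 8, which makes it a convenient test case when measuring replacement policies.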


Create Processes

Implement processes
– create/delete processes in the Oz world
– use loader project for loading programs
– experiment with different models for process creation and program execution
– implement handle tables for referencing objects
– manage run-down and synchronization issues

Using
– NT processes to create real VM state (but for little else)
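
The handle-table bullet can be sketched as a per-process dictionary from small integers to reference-counted objects. In the Python sketch below, the `HandleTable` class, the dict-based object with a `refs` field, and the handle stride of 4 are all assumptions for illustration rather than the Oz or NT design.

```python
class HandleTable:
    """Per-process table mapping small integer handles to shared objects."""

    def __init__(self):
        self.entries = {}
        self.next_handle = 4   # arbitrary starting value and stride for this sketch

    def insert(self, obj):
        handle = self.next_handle
        self.next_handle += 4
        self.entries[handle] = obj
        obj["refs"] += 1
        return handle

    def lookup(self, handle):
        return self.entries[handle]       # KeyError models an invalid handle

    def close(self, handle):
        obj = self.entries.pop(handle)
        obj["refs"] -= 1
        return obj["refs"] == 0           # True when the object can be destroyed

    def duplicate(self, handle, target):
        """Give another process's table a handle to the same object."""
        return target.insert(self.lookup(handle))


parent, child = HandleTable(), HandleTable()
stdout = {"name": "stdout", "refs": 0}
h_parent = parent.insert(stdout)
h_child = parent.duplicate(h_parent, child)
```

Run-down then amounts to closing every remaining handle in a dying process's table and destroying any object whose count reaches zero.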


Create Threads / Scheduling

Implement threads
– create/delete threads in an Oz process
– build appropriate data structures to represent thread
– tie into Oz process data structures
– create stacks and manage thread execution context (PCB)
– experiment with ideas like scheduler activations
– implement a scheduler, using different policies, priorities, etc
– add multi-processor support

Using
– NT threads to represent processors and execute Oz threads on NT threads (Oz threads are essentially user-mode threads)
– NT APCs (Asynchronous Procedure Calls) to deliver timer interrupts to running Oz threads
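
A round-robin policy over user-mode threads can be sketched with Python generators. This is a cooperative approximation under stated assumptions: each `yield` marks a point where a timer tick could arrive, whereas the slide describes real preemption via NT APCs on NT threads. The names `oz_thread` and `round_robin` are hypothetical.

```python
from collections import deque


def oz_thread(name, steps, log):
    """A user-mode 'thread': each yield is a point where a timer tick may preempt it."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def round_robin(threads, quantum):
    """Run each thread for up to `quantum` steps, then move it to the back of the queue."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            for _ in range(quantum):
                next(thread)
        except StopIteration:
            continue            # thread exited; do not requeue
        ready.append(thread)


log = []
round_robin([oz_thread("A", 3, log), oz_thread("B", 1, log)], quantum=2)
```

Replacing the `deque` with a priority queue, or running several `round_robin` loops on separate NT threads, would be natural next steps toward the priority and multi-processor bullets.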


Synchronization

Implement various synchronization primitives
– build user-mode, kernel-mode, and hybrid synchronization primitives of various kinds (events, rw locks)
– extend Oz scheduler to support blocking of threads
– experiment with deadlock issues, priority inversion

Using
– NT compare&swap primitives
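
As a sketch of building a lock from compare&swap, the Python code below implements a spinlock on top of a simulated atomic word. Python has no compare-and-swap primitive, so the `AtomicCell` class is an assumption that uses an internal `threading.Lock` purely to model the atomicity the hardware would provide; the class and function names are hypothetical.

```python
import threading
import time


class AtomicCell:
    """Simulated compare-and-swap word; the internal lock only models hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


class SpinLock:
    def __init__(self):
        self._state = AtomicCell(0)       # 0 = free, 1 = held

    def acquire(self):
        while not self._state.compare_and_swap(0, 1):
            time.sleep(0)                 # yield to other threads while spinning

    def release(self):
        self._state.compare_and_swap(1, 0)


counter = 0
lock = SpinLock()


def worker(iterations):
    global counter
    for _ in range(iterations):
        lock.acquire()
        counter += 1                      # critical section
        lock.release()


workers = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

A hybrid primitive would spin like this for a short time and then block the thread in the scheduler instead of continuing to poll.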


Other project areas

IO Architecture

File Systems

Networking

System bootstrap

Performance measurement & instrumentation (e.g. tracing)

Error handling

Exceptions and traps, stack unwinding

Management of physical resources

Management of resource sharing policies

NUMA multiprocessors

Security, authentication, ACLs

Object support

Namespaces

Virtual machines


The Plan


Oz is long-envisioned, recently staffed

More detail and initial projects in April

Pre-announced at Microsoft Faculty Summits this year

Intend to beta test in several universities during the 2005/2006 academic year (US and China and ??)

Make widely available mid 2006

Will particularly target current users of Nachos

Considering project textbooks
– May supplement OS Principles textbook in China
– May supplement major OS textbook in US or independently


Q&A