Windows Server 2008 Kernel Advances
Mark Russinovich
Technical Fellow
Microsoft Corporation
Content of this talk was co-developed with Dave Solomon (www.solsem.com)
Scope Of Talk
This talk covers key enhancements to the Windows Server 2008 kernel and related core components
Many of the enhancements were introduced in Windows Vista
Does not cover client-focused kernel changes such as
Multimedia Class Scheduler
Protected Processes
User Account Control, Mandatory Integrity Controls
Superfetch, ReadyBoost, ReadyDrive
I/O priorities, BitLocker
Scope of Talk (Cont.)
Does not cover important enhancements in
Networking (e.g., new TCP/IP stack, Windows Filtering Platform)
Installation (e.g., WIM, Component Based Servicing)
Management (group policy and AD improvements)
Diagnostics and Monitoring (WDI)
Data protection (BitLocker)
Agenda
Platform Support
Processes and Threads
I/O and File System
Memory Management
Startup and Shutdown
Reliability and Recovery
Security
Kernel Variants
Windows Server 2008 has both 32-bit and 64-bit kernels
64-bit includes x64 (AMD64, Intel 64) and Itanium
This is the last 32-bit server release
Uniprocessor kernel variants are gone in Windows Server 2008
Performance impact of running MP on UP is insignificant
Multiprocessor systems are becoming the norm
Kernel variants therefore reduce to:

Kernel                        32-bit  64-bit
Multiprocessor                x       x
Multiprocessor Checked        x       x
Multiprocessor PAE*           x
Multiprocessor PAE* Checked   x

* For 32-bit systems with >4GB RAM or hardware no execute support
Dynamic Partitioning
Before, hardware upgrades and maintenance have required a shutdown, resulting in downtime
Windows Server 2008 reduces the need for downtime by supporting these hardware configuration changes without a reboot
Hot plug PCI Express
Some vendor proprietary Windows Server 2003 configurations support hot plug PCI Express
Hot replace of memory
Windows Server 2003 supports hot add memory
Hot add and replace of processors
Adding a Processor
When a new processor is added
Drivers that requested notification are called
System-wide plug and play rebalance is done to include new processor interrupts
Processes must opt-in for affinity updates to have the new processor added to their affinity mask
Some applications make decisions at initialization based on the number of processors
The System, Smss, and Svchost processes have affinity updates enabled
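The opt-in rule for affinity updates can be illustrated with a small model. This is only a sketch of the behavior described on the slide, not the kernel's implementation; the `Process` class, its `affinity_updates_enabled` field, and `hot_add_processor` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    affinity_mask: int                      # one bit per processor the process may run on
    affinity_updates_enabled: bool = False  # hypothetical opt-in flag


def hot_add_processor(processes, cpu_index):
    """Add the new processor only to the masks of processes that opted in."""
    new_bit = 1 << cpu_index
    for process in processes:
        if process.affinity_updates_enabled:
            process.affinity_mask |= new_bit


procs = [
    Process("smss", 0b0011, affinity_updates_enabled=True),
    Process("legacy_app", 0b0011),  # sized its data structures for 2 CPUs at startup
]
hot_add_processor(procs, 2)  # processor 2 is hot-added
```

In this model the opted-in process gains the new processor, while the application that made decisions at initialization keeps its original two-processor mask.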
Hardware Error Reporting
Before, hardware errors were not reported on in a uniform way
No standards for hardware error reporting
No common mechanism in kernel to collect and report hardware errors
Windows Server 2008 introduces a new common error reporting infrastructure called Windows Hardware Error Architecture (WHEA)
Supports hardware error standards via plug-ins
Common error format for all error types
Error source discovery
Hypervisor
Windows Server 2008 is the basis for Microsoft’s new virtualization offering
Hypervisor to control low level access to system resources
Processors, Local APICs, physical memory
Kernel enlightenments for improved performance and scalability
Root partition implements device support using Server Core
Different from VMware ESX, where the hypervisor implements drivers
[Figure: Hardware at the bottom with the Windows hypervisor above it; a root partition running Server Core, and child partitions running OS 1 and OS 2 with their apps]
Time Accounting
Before, Windows accounted for CPU time based on the interval clock timer
10-15ms resolution
Thread quantum expiration was not always fair
A thread might get almost no turn or up to three turns
Threads also were charged for interrupts that occurred while they were running
[Figure: timeline divided into time slice intervals showing Idle, T1, and T2; T1 and T2 come out of wait and T1 begins]
Cycle Time Counter
Windows Server 2008 reads Time Stamp Counter (TSC) at context switch
Actual CPU cycles consumed charged to thread
Interrupt time not charged to the interrupted thread
Allows for more accurate quantum accounting
Thread gets at least 1 turn and can get at most a turn + 1 tick
Also provides accurate time accounting for thread execution
[Figure: timeline divided into time slice intervals showing Idle, T1, T1, and T2]
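The difference between the two accounting schemes can be shown with a toy model. This is a simplified sketch, not the scheduler's actual logic: time is measured in abstract cycles, a clock tick fires every `tick` cycles, and the function names and sample timeline are made up for illustration.

```python
from collections import defaultdict


def charge_by_tick(segments, tick):
    """Interval-timer model: the thread running when a tick fires is charged a whole tick."""
    charged = defaultdict(int)
    end = max(stop for _, _, stop in segments)
    for t in range(tick, end + 1, tick):
        for name, start, stop in segments:
            if start < t <= stop:
                charged[name] += tick
    return dict(charged)


def charge_by_cycles(segments, interrupts=()):
    """Cycle-counter model: charge cycles actually run, minus time spent in interrupts."""
    charged = defaultdict(int)
    for name, start, stop in segments:
        stolen = sum(
            min(stop, i_stop) - max(start, i_start)
            for i_start, i_stop in interrupts
            if i_start < stop and i_stop > start
        )
        charged[name] += (stop - start) - stolen
    return dict(charged)


# (thread, start cycle, stop cycle): T1 runs briefly just after each tick
segments = [("T1", 0, 2), ("T2", 2, 10), ("T1", 10, 12), ("T2", 12, 20)]

by_tick = charge_by_tick(segments, tick=10)
by_cycles = charge_by_cycles(segments)
with_interrupt = charge_by_cycles(segments, interrupts=[(4, 6)])
```

In this timeline the tick model charges everything to T2 and nothing to T1, because T1 is never running when a tick fires. The cycle model charges each thread for what it ran, and the interrupt during T2's first segment is not billed to T2.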
Other Infrastructure Changes
Enhanced Thread Pool mechanism
Convenient library for efficient use of threads and CPU
New synchronization APIs
Allows for faster DLL loading and easier multithreading development
Private namespaces
More secure protection of application objects such as synchronization and shared memory objects
Hard resource quotas
Provides ability to prevent resource exhaustion of critical shared resources including paged pool and non-paged pool
Self-Healing NTFS
Before, NTFS corruptions required running Chkdsk, which often could only be done on the next reboot
In Windows Server 2008, an NTFS worker thread performs background Chkdsk-type corrections when NTFS detects a corrupt file or directory
Minor disk errors are transparent to the user
Only corrupted files and folders are inaccessible during repairs, unlike a lock of the entire volume
No need to reboot to repair corruptions
SMB2
SMB is the original Windows remote file system protocol
Can’t adapt to new NTFS features
Not designed for today’s large data sizes
SMB2 introduced in Windows Vista and Windows Server 2008
Supports NTFS client-side symbolic links
Operations can be batched to minimize client/server round trips
Support for arbitrary buffer sizes for more efficient copies results in 30-40x throughput improvement
SMB2 Performance
Performance gains are realized for full SMB2 connections between Server 2008 and Vista SP1
I/O Completion Port Improvements
I/O completion ports allow threads to wait efficiently for completion of multiple I/O requests
Completed I/Os queue on the completion port
Before, each completion caused an unnecessary context switch to the issuing thread
This could add delay, since the thread might not run immediately to process the completion
Windows Server 2008 defers I/O completion to when the thread pulls the I/O off the completion port
Avoids context switch, thus improving performance
Static System Address Space
Before, system virtual address space divided into fixed regions
Reason for limits on nonpaged, paged pool, system page table entries
[Figure: 32-bit address space with 2 GB User Mode (Application) and 2 GB Kernel Mode (Paged Pool, Nonpaged Pool, System PTEs, Session Pool)]
Dynamic System Address Space
In 32-bit Windows Server 2008, virtual memory assigned as needed
Kernel page tables allocated on demand instead of at boot
Components still cannot exceed 2 GB on 32-bit systems
Kernel stack usage is reduced through “stack jumping”
Stacks can grow and shrink as required
Benefits:
Supports more users on terminal servers
Up to 64GB supported when booted /3GB (instead of 16GB)
[Figure: 2 GB Kernel Mode region containing Paged Pool, Nonpaged Pool, System PTEs, and Session Pool]
Memory Manager Performance Improvements
Fewer and larger disk reads for page faults and system cache readahead
64 KB limit removed; large I/Os done despite valid page range
Readahead done directly into page cache lists
Fewer, faster, and larger disk writes for pagefile and mapped file I/O
Larger cluster size (average 1MB), reduced fragmentation
Elimination of zero writes
Concurrency improvements in many areas
More lock free searches, better parallelism
NUMA Enhancements
More memory allocations are NUMA aware:
Initial nonpaged pool has separate address ranges for each node
Per-node look-asides for full pages
Page table allocation for system PTEs, the system cache, etc. distributed across nodes
I/O system directs interrupt completion to node that initiated I/O
NUMA Enhancements (Cont.)
Ideal node used more effectively for process memory allocations
Page faults for threads running on non-ideal processor go into ideal node’s memory
Prefetch pages to ideal node for application, instead of ideal node for prefetch thread
Pages migrated to ideal node on soft page fault
NUMA topology used to influence locality for memory allocations
New NUMA APIs allow applications to specify preferred node number for memory allocations and file mappings
NUMA Memory Allocation
Thread T gets scheduled on another node, but memory allocations go to its ideal node
[Figure: Node 1 and Node 2, each with CPU 0 through CPU 3 and its own RAM; thread T runs on one node while its ideal CPU and ideal node, where its memory is allocated, are on the other]
Startup Processes On Server 2003
Session Manager (SMSS) created Winlogon and Csrss for each session
Session creation was done serially
Was bottleneck for Terminal Services
Winlogon, the interactive logon manager, created
Local Security Authority (Lsass.exe)
Service Control Manager (Services.exe)
Startup Processes On Server 2008
In Windows Server 2008
Initial Smss.exe creates an instance of itself to initialize each session
Permits parallel session creation
Minimum parallel session startups is 4
Maximum is number of processors
Session 0 Smss runs Wininit.exe (new)
Wininit starts what Winlogon used to start: Services, Lsass
Also starts a new process, Local Session Manager (Lsm.exe)
Session 1-n Smss instances create and initialize interactive sessions
Session-specific instance of Csrss.exe and Winlogon.exe
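The parallel session startup limit can be sketched as follows. This assumes one reading of the two limits on the slide, namely that the number of concurrent startups is the larger of 4 and the processor count; the function names are hypothetical and a thread pool stands in for the per-session Smss instances.

```python
from concurrent.futures import ThreadPoolExecutor


def max_parallel_session_startups(processor_count):
    """Assumed reading of the slide: at least 4, otherwise one per processor."""
    return max(4, processor_count)


def start_sessions(session_ids, processor_count):
    """Initialize sessions concurrently, bounded by the startup limit."""
    def init_session(session_id):
        # Stand-in for a per-session Smss starting Csrss.exe and Winlogon.exe
        return f"session {session_id}: Csrss and Winlogon started"

    limit = max_parallel_session_startups(processor_count)
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(init_session, session_ids))


results = start_sessions(range(1, 6), processor_count=2)
```

Compared with the serial creation on Server 2003, the bounded pool lets several sessions initialize at once, which is the property that matters for Terminal Services.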
Clean Service Shutdown
Before, services had no way to extend the time allowed for shutdown
After a fixed timeout (default 20 seconds), SCM was killed and system halted (while services were running)
This was a problem for services that needed to flush data
In Windows Server 2008, services can request preshutdown notification
Can take as long as they want to shut down as long as they are responsive
If the service stops responding the system gives up on it after 3 minutes
After pre-shutdown services stop, the system performs Windows XP-style shutdown for other services
Kernel Transaction Manager (KTM)
Before, applications had to work hard to recover from errors during modification of files and Registry keys
Windows Server 2008 implements a generalized transaction manager
Provides all or nothing transaction semantics
Kernel Transaction Manager coordinates between transaction clients (applications) and Resource Managers
Builds on Common Log File System (Clfs.sys) introduced in Windows Server 2003 R2 for efficient transaction logging facilities
Third parties can write user-mode or kernel-mode Resource Managers
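The coordination between a transaction manager and its resource managers can be sketched with an in-memory model. This is an illustration of all-or-nothing semantics only, not the KTM API; the `ResourceManager` and `TransactionManager` classes, the `fail_on_prepare` hook, and the sample key and path are all hypothetical.

```python
class ResourceManager:
    """Toy resource manager: committed state plus changes staged in a transaction."""

    def __init__(self, name):
        self.name = name
        self.committed = {}
        self.staged = {}
        self.fail_on_prepare = False  # hook to simulate a failure before commit

    def write(self, key, value):
        self.staged[key] = value

    def prepare(self):
        return not self.fail_on_prepare

    def commit(self):
        self.committed.update(self.staged)
        self.staged.clear()

    def rollback(self):
        self.staged.clear()


class TransactionManager:
    """Commits on every resource manager, or on none of them."""

    def __init__(self, *resource_managers):
        self.resource_managers = resource_managers

    def commit(self):
        if all(rm.prepare() for rm in self.resource_managers):
            for rm in self.resource_managers:
                rm.commit()
            return True
        for rm in self.resource_managers:
            rm.rollback()
        return False


registry = ResourceManager("registry")
files = ResourceManager("ntfs")
tm = TransactionManager(registry, files)

registry.write(r"HKLM\Software\Example\Version", "2")
files.write(r"C:\Example\app.dll", b"new bits")
files.fail_on_prepare = True   # the file system side reports a problem
first_attempt = tm.commit()    # nothing is applied to either store
```

Because the failing resource manager vetoes the commit, the Registry change is discarded along with the file change, so the application has no partial update to clean up.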
Kernel Transaction Manager (KTM) (Cont.)
Registry and NTFS enhanced to provide transaction semantics across Registry and file system operations
Transactions can span modifications across one or many Registry keys, files, and volumes
Using DTC and Windows Server 2008, transactions can coordinate changes across files, registry, SQL Server, Oracle, MSMQ
Used by Windows Update and System Protection
No more unbootable systems when there’s a system failure during a hotfix
Better Handling of Process Crashes
Before, unhandled exception handling was executed in context of thread incurring exception
Relied on thread stack being valid
Corrupt thread stacks resulted in “silent process death”
In Windows Server 2008, unhandled exceptions send a message to the Windows Error Reporting (WER) service
WER launches Werfault.exe, which replaces Dwwin.exe
Permits WER to be invoked for threads whose stack is too corrupted to invoke the unhandled exception filter
Bottom line: all process crashes will get recorded (and reported if configured to do so)
Improved Crash Dump Support
Before, crashes during early part of system startup would not result in a crash dump
Crash dumps are written to the paging file on the boot partition
Paging files were not opened until the Smss process started (after kernel initialization)
Now, the paging file is opened before system start drivers initialize
Crashes that might result are therefore recorded
Before, default crash dump type on servers was set to full memory dumps
Now, default dump type on all systems is kernel dumps
Address Space Layout Randomization (ASLR)
Prior to Windows Server 2008
Kernel, HAL, executables and DLLs loaded at fixed locations
Buffer overflows commonly relied on known system function addresses to cause specific code to execute
Windows Server 2008 loader bases modules at one of 256 random points in the address space
Operating System (OS) images now include relocation information
Relocation performed once per image and shared across processes
User stack locations are also randomized
Drivers, Kernel and HAL also randomized
[Figure: address space layouts of Exe, User32, Kernel32, and NTDLL for "Server 2003 1" and "Server 2003 2", where the modules sit at the same positions, and for "Server 2008 1" and "Server 2008 2", where the module positions differ]
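The "one of 256 random points, chosen once per image" behavior can be modeled in a few lines. This is a simplified sketch and not the Windows loader's algorithm: the 64 KB slot size, the `region_start` address, and the `Loader` class are assumptions made for illustration; only the count of 256 comes from the slide.

```python
import random

SLOT_COUNT = 256     # from the slide: one of 256 random points
SLOT_SIZE = 0x10000  # assumption: slots are spaced 64 KB apart


class Loader:
    """Picks a random base per image once, then reuses it for every process."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._image_base = {}  # image name -> base chosen for this boot

    def load(self, image_name, region_start=0x70000000):
        if image_name not in self._image_base:
            slot = self._rng.randrange(SLOT_COUNT)
            self._image_base[image_name] = region_start + slot * SLOT_SIZE
        return self._image_base[image_name]


boot = Loader(seed=1)
base_in_process_a = boot.load("ntdll.dll")
base_in_process_b = boot.load("ntdll.dll")  # same base: relocated once, then shared
```

Caching the chosen base per image mirrors the point that relocation is performed once and shared across processes, while a new `Loader` (a new boot in this model) may pick different bases.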
Service Security Hardening
Before, service bugs allowed for privilege elevation attacks
In Windows Server 2008, services apply principle of least-privilege to limit system exposure in case of compromise
Service-specific SIDs permit a service’s access to objects to be limited
Only required objects give SID access
Firewall policy can be applied to service SID (and many Windows Server 2008 services have this specified)
Write-restricted service processes further limit write access
Can only modify objects that allow restricted write access
Service Security Hardening (Cont.)
Services can specify which privileges (e.g., shutdown, audit, etc.) they require
Limits power of service processes
Specified in MULTI_SZ registry value under service key called RequiredPrivileges
On service start, SCM computes union of all required privileges for service(s) inside service process
If process token does not contain one, service start fails
Privileges not explicitly specified are removed from token
If no required privileges specified, assumes all privileges in process token are needed
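The privilege rules above can be written out as a short function. This is a sketch of the stated rules rather than the SCM's code; the function name and the use of `None` to mean "no RequiredPrivileges value" are assumptions made for illustration.

```python
def compute_service_process_privileges(process_token, required_by_service):
    """Return the privilege set left in the token of a shared service process.

    process_token: privileges present in the process token
    required_by_service: service name -> list of required privileges,
                         or None if the service specifies none
    """
    # A service with no list is assumed to need everything in the token.
    if any(required is None for required in required_by_service.values()):
        return set(process_token)

    # Union of the required privileges of all services in the process.
    needed = set().union(*required_by_service.values())

    # A required privilege missing from the token makes service start fail.
    missing = needed - set(process_token)
    if missing:
        raise PermissionError(f"service start fails, token lacks {sorted(missing)}")

    # Everything not explicitly requested is removed.
    return needed


token = {"SeShutdownPrivilege", "SeAuditPrivilege",
         "SeChangeNotifyPrivilege", "SeDebugPrivilege"}
services = {
    "svc_a": ["SeAuditPrivilege"],
    "svc_b": ["SeChangeNotifyPrivilege", "SeAuditPrivilege"],
}
remaining = compute_service_process_privileges(token, services)
```

Here the shared process keeps only the two privileges its services asked for, so a compromise of either service no longer yields the shutdown or debug privileges.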
Summary and More Information
Lots of exciting kernel changes in Windows Server 2008 for performance, scalability, reliability, security
Further reading:
Article on Vista Kernel Changes in February, March, April 2007 issues of TechNet magazine
Windows Server 2008 Kernel Changes article (March 2008)
Windows Internals 5th Edition (Summer 2008)
My Other Sessions
Security Panel: 10:45 tomorrow
CLI301 The Case of the Unexplained...: 9:00 tomorrow, 15:15 tomorrow
© 2007 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only.
MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.