OCA Crash Analysis

Andre Vachon
Software Development Lead
Windows Product Feedback
Microsoft Corporation

What Is OCA
- Online Crash Analysis
  - Free failure analysis service, supported on Windows XP and later operating systems
  - Collects data about Windows crashes directly from customers
  - Helps Microsoft and IHVs understand customer problems

Goals Of OCA Data Analysis
- Provide feedback to customers to improve overall satisfaction
  - Real-time feedback about what caused the problem on their machine
  - Links to help customers solve problems
- Make Windows a more reliable platform
  - Find and fix bugs for all kernel-mode blue screens
- Make crash data more actionable for developers
  - Help Microsoft and IHVs prioritize problems

OCA Data Analysis Process
- Fully automated
  - No human intervention
  - Runs in 2-3 seconds
- Takes dumps received from customers and sends them to the debugger
  - Executes !analyze in the debugger
  - Generates a bucket ID
  - Stores the output of the analysis in the OCA database
  - If the bucket ID has a solution, sends the solution back to the customer
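
The flow above can be sketched as follows. This is a minimal illustration, not OCA's implementation: `analyze` is a hypothetical hook standing in for loading the dump in the debugger and running !analyze, and an in-memory dict and list stand in for the solution table and the OCA database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AnalysisResult:
    """What one !analyze run yields for a dump: the bucket ID plus the raw output text."""
    bucket_id: str
    output: str


@dataclass
class OcaPipeline:
    # Hypothetical hook: "load the dump in the debugger and run !analyze".
    analyze: Callable[[bytes], AnalysisResult]
    # Bucket ID -> solution link; only buckets with a verified response appear here.
    solutions: Dict[str, str] = field(default_factory=dict)
    # Stand-in for the OCA database: every analysis result is kept.
    database: List[AnalysisResult] = field(default_factory=list)

    def process(self, dump: bytes) -> Optional[str]:
        """Analyze one dump, store the result, and return a solution if one exists."""
        result = self.analyze(dump)
        self.database.append(result)
        return self.solutions.get(result.bucket_id)
```

Every dump is stored whether or not a solution exists, so the database keeps growing even for buckets nobody has investigated yet.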

What Does OCA Collect
- Dump files
  - Minidumps by default
  - Optionally, customers can submit full dumps
- XML data
  - List of .sys files on the machine
  - List of PnP IDs enumerated by PnP on the machine
- All the data is packaged in a .cab file

What Is In A Kernel Minidump
- Header, basic OS information, PRCB
- OS module list (loaded and unloaded)
- Faulting EPROCESS, ETHREAD, stack and context
- Data pages pointed to by the context
- Data pages pointed to by the bugcheck parameters (Windows XP SP1)
- Some optional data pages, if space is available in the dump file
- Optional bugcheck callback data
- Minidumps will never contain all the information (neither will full dumps)
  - Targeted data collection allows analysis of the majority of failures
  - We ask specific customers to send us additional data when needed
- User minidumps contain different types of information

Minidump Improvements
- Windows XP SP1 minidump improvements
  - Sysdata.xml contains PnP IDs
  - Save data pages pointed to by bugcheck parameters
  - KeBugCheck routine improvements in Windows XP SP1 and SP2 to collect more targeted data for crashes
    - More data pages pointed to by registers
- Windows XP SP2 minidump improvements
  - More accurately save the context of the crash
  - Save all the pages pointed to by those registers
  - SMBIOS data tables
  - MM pool changes better isolate a number of pool corruptions

Debugging A Kernel Minidump
- Debuggers
  - Kernel minidumps require KD or WinDbg
  - Both WinDbg and Visual Studio support debugging user-mode minidumps
- Step 1: Get the images
  - A minidump contains minimal data, so code images must be loaded at debug time
  - Use the module timestamps stored in the dump file to find the correct images
  - All Microsoft kernel-mode code for recent OSes is on the Internet symbol server
- Step 2: Extract PDB information from the images
  - The debug record stored in an image is used to look for the symbols
  - If you have the wrong image, the wrong symbols will be loaded
- Step 3: Get symbols
  - The symbol server is again the best solution
- Data in the minidump is limited
  - Look at what you can
  - Some minidumps will not yield useful results if critical information is missing
- Read the docs for details on loading a minidump in the debugger
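
Step 2 relies on the debug record inside the image. As a sketch, assuming the common "RSDS" CodeView layout (4-byte signature, 16-byte GUID, 32-bit age, NUL-terminated PDB path), the symbol server key for the PDB is the GUID written as hex digits without dashes followed by the age in hex; the PDB is then looked up as `<pdb name>/<key>/<pdb name>`. This shows only the key derivation, not how the debugger locates the record in the PE debug directory.

```python
import struct
from typing import Tuple


def parse_rsds(record: bytes) -> Tuple[str, str]:
    """Return (pdb_name, symbol_server_key) from an RSDS CodeView debug record."""
    # Little-endian, no padding: signature, GUID (Data1, Data2, Data3, Data4[8]), age.
    signature, data1, data2, data3, data4, age = struct.unpack_from("<4sIHH8sI", record, 0)
    if signature != b"RSDS":
        raise ValueError("not an RSDS CodeView record")
    # The PDB path follows the fixed 24-byte header and is NUL-terminated.
    path = record[24:].split(b"\0", 1)[0].decode("utf-8", errors="replace")
    pdb_name = path.replace("\\", "/").rsplit("/", 1)[-1]
    # GUID fields as hex digits (no dashes), followed by the age in hex.
    key = f"{data1:08X}{data2:04X}{data3:04X}{data4.hex().upper()}{age:X}"
    return pdb_name, key
```

If the wrong image was mapped, this record names the wrong GUID and age, which is exactly why the wrong symbols get loaded.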

What Is A Bucket?
- Identifies the component most likely responsible for the crash
  - Based on heuristics in !analyze
  - Heuristics are continually improved
- Represents a unique bug or problem
  - If multiple bugs map to a bucket, we split the bucket
- Responses and solutions are associated with a bucket
  - A human has to verify the analysis results before a response can be attached to a bucket

Sample Buckets
- OLD_IMAGE_FOO.SYS: Crash caused by an old version of foo.sys
- OLD_IMAGE_foo.sys_DEV_3577: Crash caused by an old version of foo.sys on device ID 3577
- 0x44_BUGCHECKING_DRIVER_foo: Driver foo.sys is known to commonly cause bugcheck 0x44
- POOL_CORRUPTION_foo: Driver foo.sys is known to cause pool corruption
- 0xBE_foo!bar+1a: Driver foo.sys crashed in routine bar
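
The sample names suggest a few simple string patterns. The helpers below are hypothetical and only reproduce the shapes of the samples above (bugcheck code in hex with a `0x` prefix, offsets in lowercase hex without a prefix); they are not the rules !analyze actually uses.

```python
from typing import Optional


def old_image_bucket(driver: str, device_id: Optional[str] = None) -> str:
    # e.g. OLD_IMAGE_foo.sys, optionally narrowed to one device ID
    bucket = f"OLD_IMAGE_{driver}"
    return f"{bucket}_DEV_{device_id}" if device_id else bucket


def bugchecking_driver_bucket(bugcheck: int, module: str) -> str:
    # e.g. 0x44_BUGCHECKING_DRIVER_foo
    return f"0x{bugcheck:X}_BUGCHECKING_DRIVER_{module}"


def pool_corruption_bucket(module: str) -> str:
    # e.g. POOL_CORRUPTION_foo
    return f"POOL_CORRUPTION_{module}"


def faulting_routine_bucket(bugcheck: int, module: str, routine: str, offset: int) -> str:
    # e.g. 0xBE_foo!bar+1a: bugcheck code, then module!routine+offset
    return f"0x{bugcheck:X}_{module}!{routine}+{offset:x}"
```

For example, `faulting_routine_bucket(0xBE, "foo", "bar", 0x1A)` yields `0xBE_foo!bar+1a`. The more fields a bucket embeds (such as a device ID), the narrower the problem it represents.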

Customer Interaction
- Send customers information about their problem in real time
- Currently a Web-based interaction
  - Contains links to web pages hosted by third parties
  - Better integration in the OS in the future
- Two categories of feedback
  - Responses: link to a page describing a problem we know about but have not solved yet
    - General troubleshooting steps or KB article
    - Company wants direct customer feedback
  - Solutions: content that describes how to “fix” a problem
    - New drivers, hosted by the ISV, IHV, OEM, or Windows Update
    - Service pack
    - Tools to resolve a problem
    - End-of-life statements are acceptable when hosted by the company

Creating Responses
- Responses are linked by the OCA team
  - Send mail to pfat @ microsoft.com when you find the root cause of a bucket and have a fix for it
- Microsoft has generic templates for various solutions and responses
  - Redirection to third-party sites
  - Redirection to Windows Update
  - KB articles, etc.
- IHVs and ISVs need to provide static web pages for redirects

Customer Connection
- We collect very limited user feedback today
  - We collect whether or not responses were helpful to the customer
- OCA intends to improve interaction with customers
  - Collect customer repro steps
  - Enable direct contact between customer and developer
  - Ability for customers to get updated status on past crashes

OCA Crash Investigation
- Data collected by OCA is stored in a large database for crash analysis purposes
- Primary categorization is the bucket ID
- Additional crash data stored in the OCA DB:
  - OS version
  - Failure date
  - Faulting driver
  - Faulting driver timestamp
  - OEM name
  - CPU information
  - Bug number
  - More data as we scale our SQL implementation

OCA Data Sharing
- IHVs
  - https://winqual.microsoft.com hosts the Error Reporting Site
  - Secure data sharing with any IHV signed up with WinQual
  - Data sharing is based on file name and file version
  - Statistics and actual customer dump files are shared with IHVs
  - More improvements coming to the site
  - If you need more information to debug problems, send us mail
- OEMs
  - OCA data is shared with OEMs on a regular basis
  - OEMs see a list of all the crashes that happen on their machines
  - Expect to hear from your OEM if you have a lot of OCA crashes

OCA Data Normalization
- OCA data cannot be normalized to determine the absolute quality of a driver
- OCA is an anonymous, opt-in system
  - We don’t know how many users send in reports, or how often
  - We don’t know customers’ software usage scenarios
  - We don’t get reports for “success” scenarios
  - We don’t know what the actual problem was until it’s fixed
- Just fix the largest buckets first
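
Since the counts cannot be normalized, "largest bucket" simply means the most raw reports. A minimal sketch, assuming a record type that carries a subset of the OCA database fields listed earlier (the `CrashRecord` type is hypothetical):

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CrashRecord:
    # Hypothetical record holding a few of the per-crash fields the OCA database keeps.
    bucket_id: str
    os_version: str
    faulting_driver: str
    driver_timestamp: int


def largest_buckets(records: Iterable[CrashRecord], top: int = 10) -> List[Tuple[str, int]]:
    """Rank bucket IDs by raw report count, largest first (no normalization)."""
    return Counter(r.bucket_id for r in records).most_common(top)
```

The ranking says nothing about how many machines run a driver, only which problems customers report most often.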

What Is !analyze
- Debugger extension designed to find the root cause of bugs
  - Automated analysis
  - Simplifies analysis of known problems
- Understands various states of the OS
  - Provides a good starting point to analyze complex problems
- Extracts commonly used debugging information
- Results of the analysis are:
  - “Bucket ID”: a unique string representing the bug
  - An owner for the problem, extracted from triage.ini
  - In verbose mode, a detailed list of all the data found during the analysis

!analyze Output

    kd> !analyze -v
    THREAD_STUCK_IN_DEVICE_DRIVER (ea)
    <text>
    Debugging Details:
    ------------------
    FAULTING_THREAD: 82493da8
    DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_FAULT
    BUGCHECK_STR: 0xEA
    LAST_CONTROL_TRANSFER: from bf9c148e to bf9c1c8f
    STACK_TEXT:
    ae328db0 bf9c148e af0df9c0 013bca06 ae328df0 xxxxxx!vDmaCopy_r6+0x495
    ae328dfc bf9a94ef 00000026 ae328ec0 ae329304 xxxxxx!vCopyFBToDMABuffer+0x17a
    …
    STACK_COMMAND: .thread ffffffff82493da8 ; kb
    FOLLOWUP_IP:
    xxxxxx!vDmaCopy_r6+495
    bf9c1c8f 3b1f cmp ebx,[edi]
    FOLLOWUP_NAME: xxxxxx
    SYMBOL_NAME: xxxxxx!vDmaCopy_r6+495
    MODULE_NAME: xxxxxx
    IMAGE_NAME: xxxxxx.dll
    DEBUG_FLR_IMAGE_TIMESTAMP: 3edc0abb
    BUCKET_ID: 0xEA_xxxxxx!vDmaCopy_r6+495
    INTERNAL_BUCKET_URL: http://dbgportal/DBGPortal_ViewBucket.asp?BucketID=0xEA_xxxxxx!vDmaCopy_r6%2b495&FrameID=undefined
    OCA_CRASHES: xxxx
    INTERNAL_RAID_BUG: http://watson/bug.aspx?DB=6&BugID=840654
    Followup: xxxxxx
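
Values such as DEBUG_FLR_IMAGE_TIMESTAMP are image link timestamps written in hex; the underlying value is seconds since 1970-01-01 UTC. A small helper for reading them (the function name is illustrative):

```python
from datetime import datetime, timezone


def link_date(hex_timestamp: str) -> datetime:
    """Convert a hex link timestamp (seconds since 1970-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(int(hex_timestamp, 16), tz=timezone.utc)


print(link_date("3edc0abb").isoformat())  # 2003-06-03T02:40:59+00:00
```

The same conversion applies to the hex timestamps used in triage.ini entries.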

!analyze Algorithm
- Multi-step algorithm
  - Uses the bugcheck or verifier code as initial input
  - Does stack analysis
  - Uses additional data about known problems provided by developers
  - Iterates on all the data above to determine the root cause

Analysis Step 1
- Use bugcheck parameters to extract basic information
  - Each bugcheck is processed by a separate routine that understands the meaning of each parameter
- Save the trap frame, context record, faulting thread, etc.
- If a specific follow-up or faulting driver is found, report the results
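
The "one routine per bugcheck" structure maps naturally onto a dispatch table. A minimal sketch: the table and function names are hypothetical, and only bugcheck 0xEA is filled in, using its documented first parameter (the stuck thread), which matches FAULTING_THREAD in the sample output.

```python
from typing import Callable, Dict, Tuple

Params = Tuple[int, int, int, int]
Findings = Dict[str, str]


def analyze_0xEA(params: Params) -> Findings:
    # THREAD_STUCK_IN_DEVICE_DRIVER: parameter 1 points to the stuck thread object.
    return {"BUGCHECK_STR": "0xEA", "FAULTING_THREAD": f"{params[0]:08x}"}


def analyze_unknown(code: int) -> Findings:
    # No specialized routine: record the code and leave the rest to stack analysis.
    return {"BUGCHECK_STR": f"0x{code:X}"}


# Hypothetical table with one routine per bugcheck code.
HANDLERS: Dict[int, Callable[[Params], Findings]] = {0xEA: analyze_0xEA}


def step1(code: int, params: Params) -> Findings:
    """Dispatch to the routine that understands this bugcheck's parameters."""
    handler = HANDLERS.get(code)
    return handler(params) if handler else analyze_unknown(code)
```

Adding support for a new bugcheck then means registering one more routine without touching the dispatcher.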

Analysis Step 2
- Use information from step 1 to get the faulting stack
  - Scan the stack for special functions such as Trap0E to find an alternate stack
- Analyze frames on the final stack to determine the most likely culprit
  - Different weights are assigned to routines
    - Internal kernel routines have the lowest weight
    - Device drivers have the highest weight
    - Fine-grained control is provided by triage.ini
  - The highest-weight frame found on the stack is treated as the culprit
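
The weighting idea can be sketched as follows. The numeric weights, the set of "kernel" modules, and the override dictionary (standing in for triage.ini entries) are all assumptions for illustration; only the ordering (kernel low, drivers high, overrides win) follows the slide. Frames are listed top-first, as in a `kb` stack, so ties go to the frame nearest the crash.

```python
from typing import Dict, List

KERNEL_MODULES = {"nt", "hal"}  # assumption: modules treated as internal kernel code
KERNEL_WEIGHT = 1               # hypothetical weights; only their ordering matters
DRIVER_WEIGHT = 5


def frame_weight(frame: str, overrides: Dict[str, int]) -> int:
    """Weight one 'module!routine+offset' (or 'module+offset') stack frame."""
    symbol = frame.split("+", 1)[0]   # drop the offset
    module = symbol.split("!", 1)[0]
    # triage.ini-style overrides: a routine entry wins over a module entry.
    if symbol in overrides:
        return overrides[symbol]
    if module in overrides:
        return overrides[module]
    return KERNEL_WEIGHT if module.lower() in KERNEL_MODULES else DRIVER_WEIGHT


def pick_culprit(frames: List[str], overrides: Dict[str, int]) -> str:
    """Return the highest-weight frame; max() keeps the first of equal weights."""
    return max(frames, key=lambda f: frame_weight(f, overrides))
```

With a hypothetical override that lowers Fastfat below third-party drivers, the top of the symbol-less stack shown later in this deck resolves to `SYMEVENT+0x61cb`, matching its bucket.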

Symbol Server And Minidumps
- Minidumps store the timestamps of images
  - The debugger uses the file name, timestamp, and image size to map the image
  - The debugger looks for the symbol file name in the mapped image
  - If the wrong image is loaded by the debugger, the symbols will also be wrong
- Storing images and symbols on a symbol server is the best way for the debugger to get the correct version of the image
  - Also simplifies archiving of driver versions
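
Symbol stores index an image by exactly the three values a minidump records: file name, link timestamp, and image size. A sketch, assuming the usual symstore layout `name/TIMESTAMPsize/name` (timestamp as 8 uppercase hex digits, size in hex); the dict-based store is a hypothetical stand-in for a real server.

```python
from typing import Dict, Optional


def image_index_path(file_name: str, timestamp: int, size_of_image: int) -> str:
    """Relative symbol-store path for an image: name/TIMESTAMPsize/name."""
    return f"{file_name}/{timestamp:08X}{size_of_image:x}/{file_name}"


def find_image(store: Dict[str, bytes], file_name: str, timestamp: int, size_of_image: int) -> Optional[bytes]:
    """Look up an image; a timestamp or size mismatch simply finds nothing."""
    return store.get(image_index_path(file_name, timestamp, size_of_image))
```

Because the key includes the timestamp and size, a different build of the same driver never satisfies the lookup, which keeps the debugger from silently mapping the wrong image.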

IHV And ISV Symbols
- Symbols greatly help with the automated analysis of failures
  - Don’t lose your symbols!
- Sharing symbols with Microsoft
  - You can submit symbols with driver submissions to WHQL
  - On-site vendors can host their own symbol server
  - Symbol data is stored securely
    - Symbols are not shared with other IHVs internally
    - Symbols are not shared on the external public symbol server
  - Sharing symbols is totally optional, but encouraged

Analysis Step 2 – IHV Symbols

With valid symbols:

    f18e7968 nt!KeBugCheckEx+0x19
    f18e7980 nt!IopfCallDriver+0x18
    f18e7990 Fastfat!FatSingleAsync+0x74
    f18e7a5c Fastfat!FatCommonRead+0x88e
    f18e7acc Fastfat!FatFsdRead+0x136
    f18e7adc nt!IopfCallDriver+0x31
    f18e7ae8 SYMEVENT!CSymIrp::IrpRead+0x4b
    f18e7af8 nt!IopfCallDriver+0x31
    f18e7b0c nt!IopPageReadInternal+0xf2
    f18e7b2c nt!IoPageRead+0x19
    f18e7b9c nt!MiDispatchFault+0x270
    f18e7bec nt!MmAccessFault+0x5b7
    f18e7bec nt!_KiTrap0E+0xb8
    f18e7cc4 nt!CcMapData+0xef
    f18e7cf0 Fastfat!FatReadVolumeFile+0x38
    f18e7e78 Fastfat!FatMountVolume+0x1f7
    f18e7e98 Fastfat!FatCommonFileSystemControl+0x47

    BUCKET_ID: POOL_CORRUPTION_Foo.sys

Without valid symbols:

    f18e7968 nt!KeBugCheckEx+0x19
    f18e7980 nt!IopfCallDriver+0x18
    f18e7990 Fastfat!FatSingleAsync+0x74
    f18e7a5c Fastfat!FatCommonRead+0x88e
    f18e7acc Fastfat!FatFsdRead+0x136
    f18e7adc nt!IopfCallDriver+0x31
    f18e7b0c SYMEVENT+0x61cb
    f18e7b2c nt!IoPageRead+0x19
    f18e7b9c nt!MiDispatchFault+0x270
    f18e7bec nt!MmAccessFault+0x5b7
    f18e7bec nt!_KiTrap0E+0xb8
    f18e7cc4 nt!CcMapData+0xef
    f18e7cf0 Fastfat!FatReadVolumeFile+0x38
    f18e7e78 Fastfat!FatMountVolume+0x1f7
    f18e7e98 Fastfat!FatCommonFileSystemControl+0x47
    f18e7ee4 Fastfat!FatFsdFileSystemControl+0x85
    f18e7ef4 nt!IopfCallDriver+0x31
    f18e7f44 nt!IopMountVolume+0x1d1

    BUCKET_ID: 0x35_SYMEVENT+61cb

Analysis Step 3
- If the stack does not yield an interesting frame, analyze raw stack data
  - Iterate on all stack values using the same weight algorithm
  - The 'dps' command shows that output
- This finds drivers that corrupt the stack
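
Raw stack analysis treats every pointer-sized value on the stack (what `dps` lists) as a possible return address. A minimal sketch: values that fall inside a loaded module's range become candidates, and the heaviest wins. The module list format, the kernel module set, and the weights are assumptions.

```python
from typing import Iterable, List, Optional, Tuple

Module = Tuple[str, int, int]  # (module name, base address, image size)

KERNEL_MODULES = frozenset({"nt", "hal"})  # assumption: low-weight kernel modules


def module_for(address: int, modules: List[Module]) -> Optional[str]:
    """Return the loaded module whose image range contains address, if any."""
    for name, base, size in modules:
        if base <= address < base + size:
            return name
    return None


def scan_raw_stack(values: Iterable[int], modules: List[Module]) -> Optional[str]:
    """Keep the highest-weight module referenced by any raw stack value."""
    best, best_weight = None, 0
    for value in values:
        name = module_for(value, modules)
        if name is None:
            continue  # plain data, not an address inside a module
        weight = 1 if name.lower() in KERNEL_MODULES else 5  # hypothetical weights
        if weight > best_weight:
            best, best_weight = name, weight
    return best
```

Because it ignores frame structure entirely, this scan still finds a driver's return addresses after that driver has overwritten the saved frame pointers.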

Analysis Step 4
- Check for the presence of memory- or pool-corrupting drivers
- Check for corrupted code streams using !chkimg
  - Bad RAM
- Check for other possible problems, such as invalid call sequences
  - Possible CPU problem
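
!chkimg compares the code in the dump against the original image file. The sketch below shows only the comparison idea, not !chkimg's own algorithm: list differing bytes, then apply a hypothetical heuristic that a single byte differing by exactly one bit looks more like a memory bit flip than a software patch.

```python
from typing import List, Tuple

Diff = Tuple[int, int, int]  # (offset, expected byte, actual byte)


def diff_code(expected: bytes, actual: bytes) -> List[Diff]:
    """List every offset where the in-memory code differs from the file image."""
    return [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


def looks_like_bit_flip(diffs: List[Diff]) -> bool:
    """Heuristic (assumption): exactly one byte differs, and by exactly one bit."""
    if len(diffs) != 1:
        return False
    _, e, a = diffs[0]
    return bin(e ^ a).count("1") == 1
```

For example, 0x74 (`je`) corrupted to 0x75 (`jne`) differs in one bit and would be flagged, while a multi-byte overwrite would not.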

Pool Corruption
- Pool corruption is very bad
  - Driver A crashes because of driver B’s bug
  - Very hard to identify the culprit
  - We estimate that about 15% of all crashes are caused by pool corruption
- Many OCA failures are due to pool corruption
  - Every vendor has buckets assigned to them that are due to another driver
- Run Driver Verifier!
  - Track down all pool corruptions and fix them!

Hardware Issues
- Hardware problems are quite common
- Heating issues
  - Investigating data in SMBIOS and ACPI to help with this
- Bad DMA
  - May be detectable in the future with new hardware support in the processor
- Bad disk
  - Diagnosis tools are being investigated
- Chipset problems (timing issues)
  - No known detection mechanisms
- CPU bugs
  - No known detection mechanisms
- Power glitches, surges
  - No known detection mechanisms
- Bad memory
  - Developing algorithms to detect bad memory from a minidump
  - Shipping a stand-alone memory checker: http://oca.microsoft.com/en/windiag.asp

Analysis Step 5
- Generate the final bucket ID and follow-up based on all gathered information
  - Determine which fields need to be embedded in the bucket ID
- Assign ownership of the failure
  - Look up the bug ID or solution for this bucket in the OCA database

Triage.ini
- Data file used to drive !analyze heuristics. It contains:
  - Lists of known bad drivers
  - Reliability of certain routines within a driver
  - Who owns a particular module or routine
  - How certain bucket IDs should be generated
- !analyze parses all the data in triage.ini to generate the final results
- Data updated on a daily basis
  - New tokens to control bucketing added regularly

Triage.ini Tokens
- Timestamp: link date, in hex format
- Driver: full name of the image
- Module: name of the image without the extension
- Name: owner of that routine or module

    poolcorruptors!<driver> = <timestamp>
    memorycorruptors!<driver> = <timestamp>
    oldimages!<driver> = <timestamp>
    bugcheckingdriver!0x6_<driver> = <timestamp>
    Additional_DriverInfo!<driver> = Build, deviceID, Offset
    <module>!<routine> = Ignore_
    <module>!<routine> = maybe_<name>
    <module>!<routine> = specific_<name>
    <module>!<routine> = last_<name>
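
A sketch of reading entries in the shapes listed above. Several details are assumptions, not the documented triage.ini grammar: `;` starting a comment, case-insensitive keys, the action being the text before the first underscore, and an `oldimages` entry meaning "images linked before this timestamp are old".

```python
from typing import Dict, Tuple


def parse_triage(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines into a dict with lowercased keys."""
    entries: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # assumption: ';' starts a comment
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        entries[key.lower()] = value
    return entries


def routine_rule(entries: Dict[str, str], module: str, routine: str) -> Tuple[str, str]:
    """Return (action, owner) for module!routine, e.g. ('maybe', 'alice')."""
    value = entries.get(f"{module}!{routine}".lower())
    if value is None:
        return ("none", "")
    action, _, owner = value.partition("_")
    return (action.lower(), owner)


def is_old_image(entries: Dict[str, str], driver: str, timestamp: int) -> bool:
    """Assumption: an image linked before the listed hex timestamp is 'old'."""
    cutoff = entries.get(f"oldimages!{driver}".lower())
    return cutoff is not None and timestamp < int(cutoff, 16)
```

Keeping the rules as data is what allows them to change daily without shipping a new !analyze.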

Changing Your OCA Buckets
- Images and symbols
  - Sharing images and symbols with Microsoft can allow your buckets to be merged, or routines ignored
- Triage.ini change
- Algorithm changes
  - !analyze is not directly extensible by third parties yet
  - !analyze can call driver-specific analysis routines, which can be used to parse the bugcheck data block
- For any improvements, send mail to pfat @ microsoft.com

Retriaging
- Process of re-analyzing crashes
  - Re-execute !analyze on the dump file and update the database information
- Done when a developer gives us an analysis change
  - Triage.ini
  - New !analyze heuristic
- Dumps that are retriaged can go into new buckets
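
Retriaging amounts to re-running the analysis over stored dumps and recording which ones changed buckets. A minimal sketch with hypothetical stand-ins: a dict maps dump IDs to their current bucket, and `analyze` represents the updated !analyze run.

```python
from typing import Callable, Dict, List, Tuple


def retriage(
    db: Dict[str, str],
    dumps: Dict[str, bytes],
    analyze: Callable[[bytes], str],
) -> List[Tuple[str, str, str]]:
    """Re-analyze every stored dump; update its bucket and report the moves."""
    moved = []
    for dump_id, old_bucket in list(db.items()):
        new_bucket = analyze(dumps[dump_id])
        if new_bucket != old_bucket:
            moved.append((dump_id, old_bucket, new_bucket))
            db[dump_id] = new_bucket
    return moved
```

Reporting the moves, rather than silently overwriting, makes it possible to see which buckets shrank after a triage.ini or heuristic change.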

Call To Action
- Look at your OCA failures
  - These are REAL customer problems
- Fix your pool corruption problems
- Tell us about the bugs you fix, so we can update !analyze and point customers to your driver updates
- Attend the WinDbg Ask the Experts sessions

Resources
- Debugger URL and download site
  - http://www.microsoft.com/whdc/ddk/debugging
- Debugger e-mail, for debugger bug reports and feature requests
  - windbgfb @ microsoft.com
  - We try to fix all the bugs people report
  - We do not provide general debugging support on this alias
- Debugger newsgroup
  - microsoft.public.windbg
  - Good place for general debugging issues