How to proceed when 1000 call agents tell you: 'My ... · PDF filecpvmonitorandperl/CPAN...

Preview:

Citation preview

How to proceed when 1 000 call agents tell you:’My Computer is slow‘

Tobias Oetiker <tobi@oetiker.ch>

OETIKER+PARTNER AG

22nd Large Installation System Administration Conference

boot up

I users blame IT performanceI stop watch and heisenbugsI sysinternals toolsI autoit and winspyI sorry, no quick fixI but we can monitor it

boot up

I users blame IT performanceI stop watch and heisenbugsI sysinternals toolsI autoit and winspyI sorry, no quick fixI but we can monitor it

boot up

I users blame IT performanceI stop watch and heisenbugsI sysinternals toolsI autoit and winspyI sorry, no quick fixI but we can monitor it

boot up

I users blame IT performanceI stop watch and heisenbugsI sysinternals toolsI autoit and winspyI sorry, no quick fixI but we can monitor it

boot up

I users blame IT performanceI stop watch and heisenbugsI sysinternals toolsI autoit and winspyI sorry, no quick fixI but we can monitor it

boot up

I users blame IT performanceI stop watch and heisenbugsI sysinternals toolsI autoit and winspyI sorry, no quick fixI but we can monitor it

design goals

I passive monitoring from users perspectiveI let users give their inputI minimal impactI simple setup and updateI central data store

design goals

I passive monitoring from users perspectiveI let users give their inputI minimal impactI simple setup and updateI central data store

design goals

I passive monitoring from users perspectiveI let users give their inputI minimal impactI simple setup and updateI central data store

design goals

I passive monitoring from users perspectiveI let users give their inputI minimal impactI simple setup and updateI central data store

design goals

I passive monitoring from users perspectiveI let users give their inputI minimal impactI simple setup and updateI central data store

three tools

I CPV monitor: observe the systemI CPV reporter: easy problem reportingI CPV explorer: view the results

three tools

I CPV monitor: observe the systemI CPV reporter: easy problem reportingI CPV explorer: view the results

three tools

I CPV monitor: observe the systemI CPV reporter: easy problem reportingI CPV explorer: view the results

cpv monitor and perl/CPAN

Look it’s perl honey!I AutoItI use Win32::GuiTest;I use Win32::API;I use Win32::OLE;I use Win32::GUI;I use FSA::Rules;I use threads;

cpv monitor and perl/CPAN

Look it’s perl honey!I AutoItI use Win32::GuiTest;I use Win32::API;I use Win32::OLE;I use Win32::GUI;I use FSA::Rules;I use threads;

cpv monitor and perl/CPAN

Look it’s perl honey!I AutoItI use Win32::GuiTest;I use Win32::API;I use Win32::OLE;I use Win32::GUI;I use FSA::Rules;I use threads;

cpv monitor and perl/CPAN

Look it’s perl honey!I AutoItI use Win32::GuiTest;I use Win32::API;I use Win32::OLE;I use Win32::GUI;I use FSA::Rules;I use threads;

cpv monitor and perl/CPAN

Look it’s perl honey!I AutoItI use Win32::GuiTest;I use Win32::API;I use Win32::OLE;I use Win32::GUI;I use FSA::Rules;I use threads;

cpv monitor and perl/CPAN

Look it’s perl honey!I AutoItI use Win32::GuiTest;I use Win32::API;I use Win32::OLE;I use Win32::GUI;I use FSA::Rules;I use threads;

cpv monitor and perl/CPAN

Look it’s perl honey!I AutoItI use Win32::GuiTest;I use Win32::API;I use Win32::OLE;I use Win32::GUI;I use FSA::Rules;I use threads;

cpv system overview

cpv monitor structure

lesson #1: fsm are cool

lesson #1: seemingly simple

lesson #1: complexity trap

cpv monitor

cpv monitor monitor

cpv reporter

cpv explorer

thinking BIG

wants

I ∼ 1500 clients in the call-centerI dynamic configurationI individual profiles

infrastructure

data store : PostgreSQLconfiguration : Apache, CPVservice.cgi

analysis : Apache, Qooxdoo, CPVjson.cgi, Gnuplot

thinking BIG

wants

I ∼ 1500 clients in the call-centerI dynamic configurationI individual profiles

infrastructure

data store : PostgreSQLconfiguration : Apache, CPVservice.cgi

analysis : Apache, Qooxdoo, CPVjson.cgi, Gnuplot

thinking BIG

wants

I ∼ 1500 clients in the call-centerI dynamic configurationI individual profiles

infrastructure

data store : PostgreSQLconfiguration : Apache, CPVservice.cgi

analysis : Apache, Qooxdoo, CPVjson.cgi, Gnuplot

thinking BIG

wants

I ∼ 1500 clients in the call-centerI dynamic configurationI individual profiles

infrastructure

data store : PostgreSQLconfiguration : Apache, CPVservice.cgi

analysis : Apache, Qooxdoo, CPVjson.cgi, Gnuplot

thinking BIG

wants

I ∼ 1500 clients in the call-centerI dynamic configurationI individual profiles

infrastructure

data store : PostgreSQLconfiguration : Apache, CPVservice.cgi

analysis : Apache, Qooxdoo, CPVjson.cgi, Gnuplot

thinking BIG

wants

I ∼ 1500 clients in the call-centerI dynamic configurationI individual profiles

infrastructure

data store : PostgreSQLconfiguration : Apache, CPVservice.cgi

analysis : Apache, Qooxdoo, CPVjson.cgi, Gnuplot

observation tools

I GetWindowText and friendsI Reading log filesI Windows WMI (Load, Processes)I Active Probing (Ping, HTTP)I HTTPAnalyzer ($$$) for http(s)I Full Custom Probes

observation tools

I GetWindowText and friendsI Reading log filesI Windows WMI (Load, Processes)I Active Probing (Ping, HTTP)I HTTPAnalyzer ($$$) for http(s)I Full Custom Probes

observation tools

I GetWindowText and friendsI Reading log filesI Windows WMI (Load, Processes)I Active Probing (Ping, HTTP)I HTTPAnalyzer ($$$) for http(s)I Full Custom Probes

observation tools

I GetWindowText and friendsI Reading log filesI Windows WMI (Load, Processes)I Active Probing (Ping, HTTP)I HTTPAnalyzer ($$$) for http(s)I Full Custom Probes

observation tools

I GetWindowText and friendsI Reading log filesI Windows WMI (Load, Processes)I Active Probing (Ping, HTTP)I HTTPAnalyzer ($$$) for http(s)I Full Custom Probes

observation tools

I GetWindowText and friendsI Reading log filesI Windows WMI (Load, Processes)I Active Probing (Ping, HTTP)I HTTPAnalyzer ($$$) for http(s)I Full Custom Probes

lesson #2: finding outlook errors

I outlook modal popup send button does not workI GetAsyncKeyState: Although the least significant bit of the

return value indicates whether the key has been pressed since thelast query, due to the pre-emptive multitasking nature ofWindows, another application can call GetAsyncKeyState andreceive the “recently pressed” bit instead of your application.The behavior of the least significant bit of the return valueis retained strictly for compatibility with 16-bit Windowsapplications (which are non-preemptive) and should not berelied upon.

I GetClassName(WindowFromPoint(GetCursorPos()))eq ’MsoCommandBar’;

lesson #2: finding outlook errors

I outlook modal popup send button does not workI GetAsyncKeyState: Although the least significant bit of the

return value indicates whether the key has been pressed since thelast query, due to the pre-emptive multitasking nature ofWindows, another application can call GetAsyncKeyState andreceive the “recently pressed” bit instead of your application.The behavior of the least significant bit of the return valueis retained strictly for compatibility with 16-bit Windowsapplications (which are non-preemptive) and should not berelied upon.

I GetClassName(WindowFromPoint(GetCursorPos()))eq ’MsoCommandBar’;

lesson #2: finding outlook errors

I outlook modal popup send button does not workI GetAsyncKeyState: Although the least significant bit of the

return value indicates whether the key has been pressed since thelast query, due to the pre-emptive multitasking nature ofWindows, another application can call GetAsyncKeyState andreceive the “recently pressed” bit instead of your application.The behavior of the least significant bit of the return valueis retained strictly for compatibility with 16-bit Windowsapplications (which are non-preemptive) and should not berelied upon.

I GetClassName(WindowFromPoint(GetCursorPos()))eq ’MsoCommandBar’;

lesson #3: WMGetText

I GetWindowText or WMGetTextI Application becomes real busy with WMGetTextI stay with GetWindowText

lesson #3: WMGetText

I GetWindowText or WMGetTextI Application becomes real busy with WMGetTextI stay with GetWindowText

lesson #3: WMGetText

I GetWindowText or WMGetTextI Application becomes real busy with WMGetTextI stay with GetWindowText

lesson #4: server issues

I 2008-10-27: 1,459 devices sent 2,417,807 samplesI 4 Core / 32-bit / 4 GB ramI 40 days of data 100,000,000 samplesI index does not fit in ramI too much data for processing

lesson #4: server issues

I 2008-10-27: 1,459 devices sent 2,417,807 samplesI 4 Core / 32-bit / 4 GB ramI 40 days of data 100,000,000 samplesI index does not fit in ramI too much data for processing

lesson #4: server issues

I 2008-10-27: 1,459 devices sent 2,417,807 samplesI 4 Core / 32-bit / 4 GB ramI 40 days of data 100,000,000 samplesI index does not fit in ramI too much data for processing

lesson #4: server issues

I 2008-10-27: 1,459 devices sent 2,417,807 samplesI 4 Core / 32-bit / 4 GB ramI 40 days of data 100,000,000 samplesI index does not fit in ramI too much data for processing

lesson #4: server issues

I 2008-10-27: 1,459 devices sent 2,417,807 samplesI 4 Core / 32-bit / 4 GB ramI 40 days of data 100,000,000 samplesI index does not fit in ramI too much data for processing

lesson #5: index compaction

I function based indexI hours since 2007 is good for 7 years with 2 byteI 2 byte for metric idI 2 byte for workstation idI two WHERE conditions

lesson #5: index compaction

I function based indexI hours since 2007 is good for 7 years with 2 byteI 2 byte for metric idI 2 byte for workstation idI two WHERE conditions

lesson #5: index compaction

I function based indexI hours since 2007 is good for 7 years with 2 byteI 2 byte for metric idI 2 byte for workstation idI two WHERE conditions

lesson #5: index compaction

I function based indexI hours since 2007 is good for 7 years with 2 byteI 2 byte for metric idI 2 byte for workstation idI two WHERE conditions

lesson #5: index compaction

I function based indexI hours since 2007 is good for 7 years with 2 byteI 2 byte for metric idI 2 byte for workstation idI two WHERE conditions

lesson #6: random data reduction

I too much data for statisticsI how to get 12% of the samples?I add 2 byte random value to each sampleI select all sample with rand < maxrand 12

100

lesson #6: random data reduction

I too much data for statisticsI how to get 12% of the samples?I add 2 byte random value to each sampleI select all sample with rand < maxrand 12

100

lesson #6: random data reduction

I too much data for statisticsI how to get 12% of the samples?I add 2 byte random value to each sampleI select all sample with rand < maxrand 12

100

lesson #6: random data reduction

I too much data for statisticsI how to get 12% of the samples?I add 2 byte random value to each sampleI select all sample with rand < maxrand 12

100

lesson #7: threaded perl

I works very well on win32I full copy — lots of memoryI save require modules after creating the threadI only thread where really necessary

lesson #7: threaded perl

I works very well on win32I full copy — lots of memoryI save require modules after creating the threadI only thread where really necessary

lesson #7: threaded perl

I works very well on win32I full copy — lots of memoryI save require modules after creating the threadI only thread where really necessary

lesson #7: threaded perl

I works very well on win32I full copy — lots of memoryI save require modules after creating the threadI only thread where really necessary

lesson #8: measuring boot and logon time

t

GWP boot

WMI SystemUpTime

Services.exestarted

WMI ProcessCreateDate

Logon

WMI LogonSessionStartTime

Explorer.exeor CPV.exe

started

WMI ProcessCreateDate

Load.Gwp.StartUp.Logon2CpvLoad.Gwp.StartUp.Boot2Service

Load.Gwp.StartUp.Logon2Explorer

lesson #9: detecting crashes

I no wait but process handleI no signals only exit codesI 0xC0000005 - segfaultI 0x00000103 - still runningI TerminateProcess can define exit code

ImplementationI find active windowI attach process handleI poll for exit code

lesson #9: detecting crashes

I no wait but process handleI no signals only exit codesI 0xC0000005 - segfaultI 0x00000103 - still runningI TerminateProcess can define exit code

ImplementationI find active windowI attach process handleI poll for exit code

lesson #9: detecting crashes

I no wait but process handleI no signals only exit codesI 0xC0000005 - segfaultI 0x00000103 - still runningI TerminateProcess can define exit code

ImplementationI find active windowI attach process handleI poll for exit code

lesson #9: detecting crashes

I no wait but process handleI no signals only exit codesI 0xC0000005 - segfaultI 0x00000103 - still runningI TerminateProcess can define exit code

ImplementationI find active windowI attach process handleI poll for exit code

lesson #9: detecting crashes

I no wait but process handleI no signals only exit codesI 0xC0000005 - segfaultI 0x00000103 - still runningI TerminateProcess can define exit code

ImplementationI find active windowI attach process handleI poll for exit code

lesson #9: detecting crashes

I no wait but process handleI no signals only exit codesI 0xC0000005 - segfaultI 0x00000103 - still runningI TerminateProcess can define exit code

ImplementationI find active windowI attach process handleI poll for exit code

lesson #9: detecting crashes

I no wait but process handleI no signals only exit codesI 0xC0000005 - segfaultI 0x00000103 - still runningI TerminateProcess can define exit code

ImplementationI find active windowI attach process handleI poll for exit code

lesson #9: detecting crashes

I no wait but process handleI no signals only exit codesI 0xC0000005 - segfaultI 0x00000103 - still runningI TerminateProcess can define exit code

ImplementationI find active windowI attach process handleI poll for exit code

lesson #10: application hangs - symptoms

lesson #10: application hangs - symptoms

lesson #10: application hangs - symptoms

lesson #10: application hangs - symptoms

lesson #10: application hangs - symptoms

lesson #10: application hangs - detection

I dead apps don’t process messagesI explorer fakes responsiveness

ImplementationI find active windowI window ping: SendMessage WM_NULLI wait until the window is back

lesson #10: application hangs - detection

I dead apps don’t process messagesI explorer fakes responsiveness

ImplementationI find active windowI window ping: SendMessage WM_NULLI wait until the window is back

lesson #10: application hangs - detection

I dead apps don’t process messagesI explorer fakes responsiveness

ImplementationI find active windowI window ping: SendMessage WM_NULLI wait until the window is back

lesson #10: application hangs - detection

I dead apps don’t process messagesI explorer fakes responsiveness

ImplementationI find active windowI window ping: SendMessage WM_NULLI wait until the window is back

lesson #10: application hangs - detection

I dead apps don’t process messagesI explorer fakes responsiveness

ImplementationI find active windowI window ping: SendMessage WM_NULLI wait until the window is back

positive

I CPV reporter - being part of the solutionI CPV explorer - data accessibilityI case: CRM crash detectionI ongoing: webapp monitoringI structured problem solvingI closed feedback loopI SLA benchmarks

positive

I CPV reporter - being part of the solutionI CPV explorer - data accessibilityI case: CRM crash detectionI ongoing: webapp monitoringI structured problem solvingI closed feedback loopI SLA benchmarks

positive

I CPV reporter - being part of the solutionI CPV explorer - data accessibilityI case: CRM crash detectionI ongoing: webapp monitoringI structured problem solvingI closed feedback loopI SLA benchmarks

positive

I CPV reporter - being part of the solutionI CPV explorer - data accessibilityI case: CRM crash detectionI ongoing: webapp monitoringI structured problem solvingI closed feedback loopI SLA benchmarks

positive

I CPV reporter - being part of the solutionI CPV explorer - data accessibilityI case: CRM crash detectionI ongoing: webapp monitoringI structured problem solvingI closed feedback loopI SLA benchmarks

positive

I CPV reporter - being part of the solutionI CPV explorer - data accessibilityI case: CRM crash detectionI ongoing: webapp monitoringI structured problem solvingI closed feedback loopI SLA benchmarks

positive

I CPV reporter - being part of the solutionI CPV explorer - data accessibilityI case: CRM crash detectionI ongoing: webapp monitoringI structured problem solvingI closed feedback loopI SLA benchmarks

challenge

I CPV drama triangle - victim / rescuerI who is begin observedI mapping the human waysI side effectsI high observability assumptions

challenge

I CPV drama triangle - victim / rescuerI who is begin observedI mapping the human waysI side effectsI high observability assumptions

challenge

I CPV drama triangle - victim / rescuerI who is begin observedI mapping the human waysI side effectsI high observability assumptions

challenge

I CPV drama triangle - victim / rescuerI who is begin observedI mapping the human waysI side effectsI high observability assumptions

challenge

I CPV drama triangle - victim / rescuerI who is begin observedI mapping the human waysI side effectsI high observability assumptions

future work

I DLL injectionI webapps, webapps, webappsI dealing with the data

future work

I DLL injectionI webapps, webapps, webappsI dealing with the data

future work

I DLL injectionI webapps, webapps, webappsI dealing with the data

Questions

Tobi Oetiker <tobi@oetiker.ch>OETIKER+PARTNER AG

Commercial Contact:Claus Henning Simon <ClausHenning.Simon@swisscom.com>Swisscom IT Services AG

Recommended