
Inside Azure Diagnostics (DevLink 2014)


Discussion of diagnostic/troubleshooting options with the Azure diagnostic agent in Cloud Services.


Page 1: Inside Azure Diagnostics (DevLink 2014)

Inside Azure Diagnostics

Page 2

Michael S. Collier
Principal Cloud Architect

[email protected]
@MichaelCollier
www.MichaelSCollier.com

Page 3

CloudDevelop.org, Columbus, OH, October 17, 2014

Page 4

Today’s Agenda
1. The need for diagnostic data in cloud applications
2. Data we can monitor
3. Using the Azure Diagnostic Agent
4. Real-world guidance for troubleshooting Azure apps

Page 5

Successful projects share at least one common trait . . .

Success vs. Failure

node.js, C#, Java

Agile vs. Waterfall

Page 6

Successful projects share at least one common trait . . .

Success vs. Failure

Diagnostics Data / Telemetry

Page 7

A True Story

Scenario: 1 week before the date of production launch. “Am I ready?”

Well, we eventually log any fatal errors, but that’s all.

OH . . .

Logs? Yeah . . . we really don’t have logs.

Let’s run some tests and look at your logs.

I guess that’s better than nothing.

We looked at Azure diagnostic logging but didn’t see much value in it.

Page 8

A True Story

You’re kidding? Right?

Page 9

A True Story

Scenario
o Determine if solution is production ready
o Deployed as an Azure Cloud Service
o No load tests
o No performance tests
o No unit tests
o Very little instrumentation

We have a problem

Page 10

A True Story

Resolution
1. Enable Azure diagnostics
   – Set key performance counters
2. Add logging statements around key functionality
   – Especially external services
3. Test, test, test
4. Analyze
5. Fix it

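Step 2 of the resolution (logging around external services) can be sketched roughly as below. This is a hypothetical helper, not code from the talk. It times each call, logs transient faults and retries them with a short, growing delay, and logs the final failure. Two assumptions here are labeled in the code: that a TimeoutException counts as "transient", and that the name `CallExternal` and its retry policy suit your services. The calls use the standard System.Diagnostics.Trace API. In a Cloud Service, this output reaches WADLogsTable only when the diagnostics trace listener is configured.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: times an external call, logs each transient fault, retries.
// Treating TimeoutException as transient is an assumption; adjust per service.
T CallExternal<T>(string service, Func<T> call, int maxAttempts = 3)
{
    for (int attempt = 1; ; attempt++)
    {
        var timer = Stopwatch.StartNew();
        try
        {
            T result = call();
            Trace.TraceInformation("{0} OK in {1} ms (attempt {2})",
                service, timer.ElapsedMilliseconds, attempt);
            return result;
        }
        catch (TimeoutException ex) when (attempt < maxAttempts)
        {
            // Transient fault: log the details, back off briefly, try again.
            Trace.TraceWarning("{0} transient fault after {1} ms (attempt {2}): {3}",
                service, timer.ElapsedMilliseconds, attempt, ex.Message);
            Thread.Sleep(TimeSpan.FromMilliseconds(100 * attempt));
        }
        catch (Exception ex)
        {
            // Out of retries, or a non-transient error: log it and rethrow.
            Trace.TraceError("{0} failed after {1} ms (attempt {2}): {3}",
                service, timer.ElapsedMilliseconds, attempt, ex.Message);
            throw;
        }
    }
}

// Usage: a made-up service that times out once, then succeeds.
int calls = 0;
int price = CallExternal("pricing-service", () =>
{
    if (++calls < 2) throw new TimeoutException("request timed out");
    return 42;
});
Console.WriteLine($"{price} after {calls} calls"); // 42 after 2 calls
```

Recording a duration for every call, including the successful ones, gives you the baseline you need later when analyzing (step 4) or when questioning a vendor's behavior.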

Page 11

Instrumentation more important in “the cloud”
o Need to have good instrumentation for on-premises applications
o Cloud – it matters more!
o Distributed environments and services
o Composite applications
o Reliance on 3rd-party vendors . . . such as Microsoft for Azure
o Highly automated environments
o Scale-out model
o Massive amounts of data

Page 12

The Cloud Scales

(Diagram: web roles and worker roles)

Page 13

The Cloud Scales . . . You Do Not

(Diagram: web roles and worker roles; callout: Diagnostic Data – 4x)

Page 14

Diagnostic Data
What data do you gather today?

Performance Counters

Custom Logs (NLog, log4net, etc.)

IIS Logs

Windows Event Logs

Crash Dumps


Page 16

Diagnostic Data – Azure Not so Different

Performance Counters

Custom Logs (NLog, log4net, etc.)

IIS Logs

Windows Event Logs

Crash Dumps

Azure Storage

Page 17

Diagnostic Data Storage

| Diagnostic Item | Table Name | Blob Container Name |
|---|---|---|
| Windows Event Logs | WADWindowsEventLogsTable | |
| Performance Counters | WADPerformanceCountersTable | |
| Trace Log Statements | WADLogsTable | |
| Azure Diagnostic Infrastructure Logs | WADDiagnosticInfrastructureLogsTable | |
| Custom Logs (e.g., log4net, NLog) | | <custom> |
| IIS Logs | WADDirectoriesTable* | wad-iis-logfiles |
| IIS Failed Request Logs | WADDirectoriesTable* | wad-iis-failedreqlogfiles |
| Crash Dumps | WADDirectoriesTable* | |

* The Container field gives the blob container that holds the log file, and the RelativePath field gives the blob name. The AbsolutePath field contains the name of the file as it existed on the role instance.
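The footnote explains how to find a file from a WADDirectoriesTable row: combine the Container and RelativePath fields. The sketch below shows that combination. It assumes the standard blob endpoint format `https://{account}.blob.core.windows.net/{container}/{blob}` and a RelativePath that uses '/' separators. The storage account name and the path are made-up placeholders.

```csharp
using System;
using System.Linq;

// Hypothetical values standing in for one WADDirectoriesTable row (made up).
string storageAccount = "mydiagnostics";
string containerField = "wad-iis-logfiles";
string relativePathField = "example/path/u_ex14101712.log";
Console.WriteLine(BlobUri(storageAccount, containerField, relativePathField));

// Builds https://{account}.blob.core.windows.net/{container}/{blob}.
// Assumes RelativePath is '/'-separated; each segment is escaped on its own
// so the separators survive.
static Uri BlobUri(string account, string container, string relativePath)
{
    string blobName = string.Join("/",
        relativePath.Split('/').Select(segment => Uri.EscapeDataString(segment)));
    return new Uri($"https://{account}.blob.core.windows.net/{container}/{blobName}");
}
```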

Page 18

Diagnostic Monitor Agent
1. Role starts
2. Diagnostic monitor agent starts
3. Diagnostics configured
4. Data buffered locally
5. Data transferred to storage

wad-control-container
o Container in Azure blob storage
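Steps 4 and 5 (buffer locally, then transfer on a schedule) can be pictured with the toy model below. It is a simplification under stated assumptions and not the agent's implementation. Entries stay in the local buffer. At each scheduled transfer, only entries newer than the previous transfer whose level passes the filter are shipped, in the spirit of ScheduledTransferLogLevelFilter. The level names follow the diagnostics LogLevel values, ordered from most to least severe. The buffer, `lastTransfer`, and `ScheduledTransfer` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model, not the real agent. Levels run from most to least severe.
string[] levels = { "Critical", "Error", "Warning", "Information", "Verbose" };

// Step 4: data buffered locally on the role instance.
var buffer = new List<(DateTime Time, string Level, string Message)>();
DateTime lastTransfer = DateTime.MinValue;

// Step 5: each scheduled transfer ships entries newer than the previous
// transfer whose level passes the filter. Entries remain in the local buffer.
List<(DateTime Time, string Level, string Message)> ScheduledTransfer(DateTime now, string filter)
{
    int threshold = Array.IndexOf(levels, filter);
    var batch = buffer
        .Where(e => e.Time > lastTransfer && e.Time <= now)
        .Where(e => Array.IndexOf(levels, e.Level) <= threshold)
        .ToList();
    lastTransfer = now;
    return batch;
}

var t0 = new DateTime(2014, 10, 17, 12, 0, 0, DateTimeKind.Utc);
buffer.Add((t0.AddSeconds(10), "Verbose", "cache miss"));
buffer.Add((t0.AddSeconds(20), "Warning", "slow SQL query"));
buffer.Add((t0.AddSeconds(30), "Error", "storage timeout"));
var firstBatch = ScheduledTransfer(t0.AddMinutes(1), "Information");
Console.WriteLine(firstBatch.Count); // 2 (the Verbose entry is filtered out)
```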

Page 19

Diagnostic Monitor Agent

Page 20

Configuration Options

Default Configuration
o Trace logs
o IIS logs
o Infrastructure logs
o No transfer

Imperative Configuration
o OnStart()
o Overrides default

Declarative Configuration
o diagnostics.wadcfg
o Root of worker role or \bin of web role

Page 21

Imperative

using System;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

// Role entry point (a worker role here; a web role's OnStart() works the same way).
public class WorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Create the DiagnosticMonitorConfiguration object used to configure the monitoring agent.
        DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();

        // Performance counter configuration
        config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
        {
            CounterSpecifier = @"\Processor(_Total)\% Processor Time",
            SampleRate = TimeSpan.FromSeconds(30)
        });
        config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

        // Trace log configuration
        config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Information;
        config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

        // Windows Event Log configuration
        config.WindowsEventLog.DataSources.Add("Application!*");
        config.WindowsEventLog.DataSources.Add("System!*");
        config.WindowsEventLog.ScheduledTransferLogLevelFilter = LogLevel.Warning;
        config.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

        // Start the diagnostic monitor with the new configuration
        DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

        return base.OnStart();
    }
}

Impacts the local agent only!

Page 22

Imperative

(Screenshot; callout: Deployment ID)

Page 23

Declarative Configuration using Visual Studio

demo

Page 24

There’s a Precedence
1. wad-control-container
   a. Created for each role instance
2. Imperative code
   a. RoleInstanceDiagnosticManager.SetCurrentConfiguration(): updates the instance’s diagnostics.wadcfg only
   b. DiagnosticMonitor.Start(): impacts the current instance only; will not update diagnostics.wadcfg
3. Declarative configuration
   a. Root of worker role or bin of web role
   b. Updates to diagnostics.wadcfg take effect only if the wad-control-container blob has never been updated programmatically.
4. Default configuration
   a. Last resort
   b. Collects, but doesn’t transfer to Azure storage

Problem?
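The order of precedence works as a first-match rule. The sketch below is a simplified model that mirrors the slide's list and is not the agent's actual logic. Each boolean, all of them hypothetical, records whether that configuration source is present. The first source present in the list wins.

```csharp
using System;

// Simplified model of the four-level precedence on the slide (not agent code).
Console.WriteLine(ActiveSource(controlBlobExists: false, imperativeStart: false, hasWadcfg: true)); // declarative

static string ActiveSource(bool controlBlobExists, bool imperativeStart, bool hasWadcfg)
{
    if (controlBlobExists) return "wad-control-container"; // 1. per-instance blob wins
    if (imperativeStart) return "imperative";                // 2. code in OnStart() or via API
    if (hasWadcfg) return "declarative";                     // 3. diagnostics.wadcfg
    return "default";                                        // 4. last resort; collects, no transfer
}
```

Rule 3b shows up here as a consequence of the ordering. Once a wad-control-container blob exists, edits to diagnostics.wadcfg never reach the agent.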

Page 25

Update Diagnostic Configuration
o Deployment update
  o Change configuration and redeploy package
o Remotely
  o Visual Studio
  o API
  o Cerebrata Azure Management Studio

Page 26

On-Demand Transfer
o Instruct WAD to transfer specific data sources to storage
  o Specify which data sources
  o Specify time range to transfer
  o Specify a notification queue
  o Code or API (or tool)
o Overwrites current diagnostic configuration
o Use sparingly . . . with caution

More info: see http://mcollier.net/DiagOnDemand

Page 27

Bonus: Verbose Logging
o Additional host-level data (not collected by DiagnosticAgent.exe)
o Tables named WAD<deploymentID>PT<aggregation interval>[R|RI]Table
o Aggregation at 5-minute, 1-hour, and 12-hour intervals
o 10-day retention period
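The table-name pattern expands into six names per deployment: three aggregation intervals, each with an R and an RI variant. The sketch below enumerates them. It assumes the intervals appear as ISO 8601-style duration suffixes (PT5M, PT1H, PT12H) and that R and RI mean role-level and role-instance-level aggregates. Check both assumptions against the tables in your storage account. The deployment ID is a placeholder.

```csharp
using System;
using System.Collections.Generic;

// Placeholder deployment ID; substitute your own.
string deploymentId = "0123456789abcdef0123456789abcdef";
foreach (string table in VerboseTableNames(deploymentId))
    Console.WriteLine(table);   // six names, starting with WAD<id>PT5MRTable

// Expands WAD<deploymentID>PT<interval>[R|RI]Table. Assumed: interval suffixes
// 5M / 1H / 12H, and R / RI = role-level / role-instance-level aggregates.
static IEnumerable<string> VerboseTableNames(string id)
{
    foreach (string interval in new[] { "5M", "1H", "12H" })
        foreach (string scope in new[] { "R", "RI" })
            yield return $"WAD{id}PT{interval}{scope}Table";
}
```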

Page 28

Let’s Get Real
o Sample every 1-2 minutes*
o Transfer every 5 minutes*
o Transfer only what is needed
o Azure Diagnostics writes data in 60-second-wide partitions
o Too much data could overwhelm the partition

* Don’t take my word for it. You don’t know me. Test and validate for your situation.
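A little arithmetic shows why sampling rate and instance count matter for a 60-second partition. The sketch assumes that every sample of every counter on every instance is written into the same minute's partition. It is an estimate, not a measurement, so validate it as the slide says. The instance and counter counts are arbitrary examples.

```csharp
using System;

// Rough estimate: rows per 60-second partition, assuming all instances' samples
// for a given minute land in that minute's partition.
double at30s = RowsPerPartition(instances: 4, counters: 10, sampleRate: TimeSpan.FromSeconds(30));
double at5s = RowsPerPartition(instances: 20, counters: 10, sampleRate: TimeSpan.FromSeconds(5));
Console.WriteLine($"{at30s} rows vs {at5s} rows per partition"); // 80 rows vs 2400 rows per partition

static double RowsPerPartition(int instances, int counters, TimeSpan sampleRate) =>
    instances * counters * (60.0 / sampleRate.TotalSeconds);
```

Because every instance writes into the same partition, scaling out from 4 to 20 instances multiplies the load on that partition, even when the sampling settings stay the same.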

Page 29

Query Azure Diagnostic Data

demo
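A query over diagnostic data is fast when it targets a narrow PartitionKey range instead of scanning the whole table. The sketch below builds that kind of OData filter string. It assumes the commonly observed WAD key format: "0" followed by the UTC tick count, rounded down to the minute, which gives one partition per 60 seconds. Check a few rows in your own tables before relying on it. `PartitionKeyFor` and `PartitionFilter` are hypothetical helper names.

```csharp
using System;
using System.Globalization;

// Assumption: WAD PartitionKey = "0" + UTC ticks rounded down to the minute.
DateTime windowEnd = DateTime.UtcNow;
DateTime windowStart = windowEnd.AddMinutes(-30);
Console.WriteLine(PartitionFilter(windowStart, windowEnd));

static string PartitionKeyFor(DateTime utc)
{
    long minuteTicks = utc.Ticks - (utc.Ticks % TimeSpan.TicksPerMinute);
    // Keys are 19 characters for any date in this era, so ordinal string
    // order matches time order.
    return "0" + minuteTicks.ToString(CultureInfo.InvariantCulture);
}

// OData $filter covering partitions from the minute of `from` up to, but not
// including, the minute of `to`.
static string PartitionFilter(DateTime from, DateTime to) =>
    $"PartitionKey ge '{PartitionKeyFor(from)}' and PartitionKey lt '{PartitionKeyFor(to)}'";
```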

Page 30

Set Priorities
o Two separate channels for telemetry data
  o Vital information
    o Application or service failures; higher level of alerting
    o Fix and return to “normal” as soon as possible
    o Alert now: email, SMS, dashboard, ninjas from ceiling, etc.
  o Day-to-day operational data
    o Root cause analysis
    o How to prevent in the future
    o Azure diagnostics
o Fine-tune the alerts to reduce false alarms and noise
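The two-channel idea, plus alert fine-tuning, can be sketched as below. This is a hypothetical design, not something prescribed by the talk. Every event goes to the day-to-day channel. A failure also feeds an alert check, which fires only when a threshold number of failures falls within a time window. A threshold over a window is one simple way to cut false alarms. `Record`, the threshold, and the window are illustrative choices.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical two-channel router with a simple noise filter on alerts.
var operational = new List<string>();        // day-to-day channel
var recentFailures = new Queue<DateTime>();  // failures inside the alert window

bool Record(DateTime utc, bool isFailure, string message, int threshold, TimeSpan window)
{
    operational.Add($"{utc:o} {message}");       // kept for root-cause analysis
    if (!isFailure) return false;
    recentFailures.Enqueue(utc);
    while (utc - recentFailures.Peek() > window)
        recentFailures.Dequeue();                 // forget failures outside the window
    return recentFailures.Count >= threshold;     // true => alert now
}

var t0 = new DateTime(2014, 10, 17, 12, 0, 0, DateTimeKind.Utc);
var fiveMinutes = TimeSpan.FromMinutes(5);
bool a1 = Record(t0, true, "SQL timeout", 3, fiveMinutes);
bool a2 = Record(t0.AddMinutes(1), true, "SQL timeout", 3, fiveMinutes);
bool a3 = Record(t0.AddMinutes(2), true, "SQL timeout", 3, fiveMinutes);
Console.WriteLine($"{a1} {a2} {a3}"); // False False True
```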

Page 31

Define Key Metrics
o Compute node resource usage
o Windows Event logs
o Database query response times
o Application-specific exceptions
o Database connection & command failures
o Microsoft Azure Storage Analytics

The process for Azure-hosted solutions is not that different from the process for traditional, on-premises solutions.

Page 32

Considerations
o Log all calls to external services; the data helps if you need to challenge an SLA
o Log details of transient faults
o Partition telemetry data by date (or hour) to reduce the impact of data aggregation or reporting
o Use a different storage account for diagnostic data!
o Remove old / non-relevant telemetry data
o Apply to development, test, and QA versions to validate performance and ensure telemetry systems are operating correctly
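Partitioning custom telemetry by hour also makes cleanup easy. The sketch below shows a hypothetical layout, not a WAD format. Keys are UTC hours formatted as "yyyyMMddHH". Because these fixed-width keys sort chronologically, an ordinal comparison against a cutoff key finds the partitions that are old enough to delete whole. The seven-day retention is an arbitrary example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical custom telemetry layout: one partition per UTC hour.
var now = new DateTime(2014, 10, 17, 12, 0, 0, DateTimeKind.Utc);
string[] partitions = { "2014100712", "2014101011", "2014101712" };
var expired = ExpiredPartitions(partitions, now, TimeSpan.FromDays(7)).ToList();
Console.WriteLine(string.Join(", ", expired)); // 2014100712, 2014101011

static string HourPartition(DateTime utc) =>
    utc.ToString("yyyyMMddHH", CultureInfo.InvariantCulture);

// Fixed-width keys sort chronologically, so partitions before the cutoff are expired.
static IEnumerable<string> ExpiredPartitions(IEnumerable<string> keys, DateTime nowUtc, TimeSpan retention)
{
    string cutoff = HourPartition(nowUtc - retention);
    return keys.Where(k => string.CompareOrdinal(k, cutoff) < 0);
}
```

Reports then read only the hours they need, and retention becomes a matter of deleting whole partitions instead of deleting rows one at a time.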

Page 33

Considerations (cont.)
o Use declarative configuration (diagnostics.wadcfg) exclusively.
o Bring Azure diagnostic data into a relational database
  o Easier reporting
  o Periodically fetch from Azure table and insert into SQL Database table. Use PK and keep most recent.
  o Custom code
o Supplement Azure diagnostics with other tools
  o New Relic or AppDynamics
  o Cerebrata Azure Management Studio
  o AzureWatch (Paraleap)
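The periodic "fetch and insert" step might look roughly like the sketch below. It reads "PK" as the Azure table's PartitionKey, which is an assumption. An in-memory dictionary stands in for the SQL Database table, keyed by (PartitionKey, RowKey). Each run imports only rows at or after the last PartitionKey it saw, so the newest partition, which may still be filling, is read again and upserted rather than duplicated. `Import`, `watermark`, and the sample keys are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a SQL Database table keyed by the Azure table's (PartitionKey, RowKey).
var sqlTable = new Dictionary<(string PartitionKey, string RowKey), string>();
string watermark = "";   // highest PartitionKey imported so far

int Import(IEnumerable<(string PartitionKey, string RowKey, string Payload)> azureRows)
{
    var fresh = azureRows
        .Where(r => string.CompareOrdinal(r.PartitionKey, watermark) >= 0)
        .ToList();
    foreach (var r in fresh)
        sqlTable[(r.PartitionKey, r.RowKey)] = r.Payload;   // upsert: most recent wins
    if (fresh.Count > 0)
        watermark = fresh.Select(r => r.PartitionKey).OrderBy(k => k, StringComparer.Ordinal).Last();
    return fresh.Count;
}

// Made-up keys; real WAD partition keys are 19-digit strings.
var run1 = new[] { ("0001", "a", "cpu=40"), ("0002", "b", "cpu=55") };
var run2 = new[] { ("0001", "a", "cpu=40"), ("0002", "b", "cpu=57"), ("0003", "c", "cpu=61") };
int firstRun = Import(run1);    // imports both rows
int secondRun = Import(run2);   // skips partition 0001, updates b, adds c
Console.WriteLine($"{firstRun}, {secondRun}, {sqlTable.Count}"); // 2, 2, 3
```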

Page 34

Summary
o Instrumentation and telemetry are key to successful projects
o Cloud metrics are similar to metrics for traditional applications
o Be realistic and set priorities
o 3rd-party tools can be essential for troubleshooting

Page 35

Resources
o Diagnostics Configuration Order of Precedence – http://bit.ly/1eomek9
o Use the Azure Diagnostic Configuration File – http://bit.ly/1mVHN3u
o Cloud Service Fundamentals (wiki) – http://bit.ly/1k1YkjI
o Failsafe: Guidance for Resilient Cloud Architectures – http://bit.ly/Q33mkU
o Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services – http://bit.ly/1qp4omC

Page 36

Just Azure
www.JustAzure.com
o Multi-part series on Azure diagnostics
o Many other fantastic articles:
  o Getting Started with Azure Search
  o Azure storage queues
  o Cloud Services
  o Automated testing in Azure

Page 37

Questions?

Page 38

Thank You!