CodeFest 2014. Christopher Bennage — Semantic Logging. Avoiding the log chaos


Christopher Bennage
patterns & practices
microsoft.com/practices

Semantic Logging: Avoiding the Log Chaos

Christopher Bennage is a developer on Microsoft's patterns & practices team.

team: http://microsoft.com/practices
twitter: @bennage
blog: http://dev.bennage.com
email: [email protected]
github: http://github.com/bennage

No real structure
What's in there?
Sheer number of files & types of logs is overwhelming
Hard to consume/automate
Subject to compatibility/inconsistencies

Logs are Frustrating

First, let's acknowledge the painfulness of logging.

We need logging because it helps us understand what is going on inside our systems. This is especially true as our systems become more sophisticated. It is almost impossible to understand what is going on in a distributed or message-based system without some kind of logging.

Let's also acknowledge that logging is not a particularly interesting topic. Often, we are only concerned about it because we know that it will be important later.

Adding logging to a system can feel like a distraction from getting the real work done. Since it is annoying to add logging, we often rush through it.

The problem is compounded by the fact that we frequently don't know how we are going to use the logs until we need to use them.

The result is that our logs are often:
Inconsistent
Without structure
Hard to consume
Hard to automate
Overwhelming

Photo: http://www.flickr.com/photos/absent/2157057475/

Semantic?

Semantic logging is a way to address these problems. We use the word semantic to express that our logs have meaning. You might also call this approach structured logging, or perhaps strongly typed logging.

Even though this talk is about the Semantic Logging Application Block, what I am really interested in is changing the way you think about logging. Creating a structured or semantic log is essentially a different way of thinking about logging.

The primary idea is that we deliberately include structure in the entries we log from the very beginning. We acknowledge up front that we cannot predict exactly how our logs will be consumed. The entries in our log will carry more metadata describing what was logged. It also means that we use logging techniques that encourage more discipline as we set up our logging.

Our intention with SLAB is to help you put the effort in the right place. We want to remove as much of the friction as possible for taking a more structured and semantic approach to your logs.

Unstructured Log

An example

176 [main] INFO examples.Sort - Populating an array of 2 elements in reverse order.
225 [main] INFO examples.SortAlgo - Entered the sort method.
262 [main] DEBUG SortAlgo.OUTER i=1 - Outer loop.
276 [main] DEBUG SortAlgo.SWAP i=1 j=0 - Swapping intArray[0] = 1 and intArray[1] = 0
290 [main] DEBUG SortAlgo.OUTER i=0 - Outer loop.
304 [main] INFO SortAlgo.DUMP - Dump of integer array:
317 [main] INFO SortAlgo.DUMP - Element [0] = 0
331 [main] INFO SortAlgo.DUMP - Element [1] = 1
343 [main] INFO examples.Sort - The next log statement should be an error message.
346 [main] ERROR SortAlgo.DUMP - Tried to dump an uninitialized array.
467 [main] INFO examples.Sort - Exiting main method.

Let's take a look at a typical unstructured log.

Here we have output from some sort of sorting algorithm. Even though I am calling it unstructured, we should recognize that it is not entirely without structure. We see things like event ids, log levels, et cetera. Nevertheless, it is not structured enough. What we really care about is usually the log message in each entry. The messages are strings that were intended to be read by a human. The developer who added this logging understood what the names in the entries meant. However, it is not clear what those names mean when we look at the log after the fact. The problem is that the relevant information is flattened into the message. Unless you understand how to reverse-parse the message, you will have difficulty understanding what it really means.

Now, this is really a simple example. Imagine a more complex, distributed system. The loss of context when reading the log messages becomes even more pronounced.

Structured Log

An example using Azure

Query by payload argument

Now let's take a look at the output from a structured log. Here we have some log entries stored in an Azure table.

We still have a message that is intended for humans to read. However, we have also retained that same information in a more structured format. Notice the 5 in the log message. Now notice how that value appears in the payload for the log entry.

Likewise, in the name example you can see the approver name embedded in the human-readable message, but also broken out into a separate field.

This allows us to query by the payload argument without any additional parsing of the human-readable message.
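To make the idea concrete, here is a minimal sketch of querying by payload rather than by message text. The `LogEntry` type, the `PayloadQuery` helper, and the field names are hypothetical stand-ins for illustration; they are not the schema SLAB actually writes to an Azure table.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a structured entry (not SLAB's real table schema).
public sealed class LogEntry
{
    // Human-readable text, kept for people reading the log.
    public string Message { get; set; }

    // The same information broken out into named fields.
    public Dictionary<string, object> Payload { get; set; }
}

public static class PayloadQuery
{
    // Returns the entries whose payload field equals the given value.
    // The human-readable Message is never parsed.
    public static List<LogEntry> WherePayload(
        IEnumerable<LogEntry> entries, string field, object value)
    {
        return entries.Where(e =>
        {
            object found;
            return e.Payload.TryGetValue(field, out found)
                && object.Equals(found, value);
        }).ToList();
    }
}
```

A call such as `PayloadQuery.WherePayload(entries, "approverName", "alice")` would then select entries by approver, where `approverName` is an assumed field name. With the unstructured log, the same question would require a pattern match against every message string.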

Logging cannot be just a checkmark of doing something.
You have to think about consumption and purpose.
Allow appropriate decisions to be made at appropriate time, explicitly separating:
WHAT to log
WHEN to log it
WHERE to log

We are on a mission

Changing the way people think about logging

Our team, patterns and practices, experienced a lot of this pain first hand when we were building WASABI, the Auto Scaling Block for Azure. (I must confess that I was not part of that particular team.) We had a personal need to improve our logging.

Now, we want to share the things that helped us. Specifically, we want to change the way developers think about logging.

We want you to think about the consumption and purpose of the logging at the beginning. We allow the appropriate decisions to be made at the appropriate times by explicitly separating:
What to log
When to log it
Where the log is stored

Many logging frameworks already abstract away the idea of where the log is stored, so that's not necessarily something new.

demo >> EventSource & SLAB in-process


Technologies at Play

Event Tracing for Windows (ETW)

Native to Windows platform
Great performance & OK diagnostic tooling
Historically hard to publish events

EventSource class

Introduced in .NET Framework 4.5
Meant to ease authoring experience
Extensible but supports ETW-only out of the box

Semantic Logging Application Block (SLAB)

Provides several destinations for events published with EventSource
Does not require any knowledge of ETW
Additional tooling support for authoring events

One thing that can be confusing about SLAB is understanding the technologies that are being used and how they relate to one another.

Event Tracing for Windows, or ETW, is a service provided by the Windows platform. It is a general-purpose, high-speed tracing facility for Windows. It was first introduced all the way back in Windows 2000. It offers excellent performance, but most .NET developers are not familiar with it because it has historically been hard to use from managed code.

In .NET 4.5, the EventSource class was introduced. It eases the authoring experience for tracing events. It is extensible; however, it only supports ETW out of the box.

SLAB builds on top of EventSource, and allows you to provide different destinations for events that are published with EventSource. This means that you can use SLAB without any knowledge of ETW.

Technologies at Play

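[Diagram: a Custom Event Source, derived from the .NET Event Source, publishes events along two paths. SLAB in-process: an Observable EventListener passes events to Sinks. SLAB out-of-process: events go to ETW, where they are read by SLAB (via TraceEvent, then Sinks), by third party tools (e.g. PerfView), or through the Event Log.]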

Let's take a more visual look at the technologies involved.

First, you will begin by deriving from .NET's EventSource class. This custom class is where you will define what can be logged for your system. (Remember that we are trying to separate the what, when, and where. The when is defined as the location in our application where we invoke the custom event source.)
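As a minimal sketch of such a class, the following uses only `System.Diagnostics.Tracing` from .NET. The source name, event ids, method names, and message templates are hypothetical and chosen for illustration.

```csharp
using System.Diagnostics.Tracing;

// Hypothetical event source for an expense application.
// This class is the "what": every event the system can log is a typed method.
[EventSource(Name = "Sample-ExpenseEvents")]
public sealed class ExpenseEvents : EventSource
{
    // Single shared instance, following the usual EventSource convention.
    public static readonly ExpenseEvents Log = new ExpenseEvents();

    private ExpenseEvents() { }

    [Event(1, Level = EventLevel.Informational,
        Message = "Expense {0} approved by {1}")]
    public void ExpenseApproved(int expenseId, string approverName)
    {
        // The id and argument order must match the attribute and signature.
        if (IsEnabled()) WriteEvent(1, expenseId, approverName);
    }

    [Event(2, Level = EventLevel.Error,
        Message = "Expense {0} could not be saved: {1}")]
    public void ExpenseSaveFailed(int expenseId, string reason)
    {
        if (IsEnabled()) WriteEvent(2, expenseId, reason);
    }
}
```

The "when" is then simply the call site in application code, for example `ExpenseEvents.Log.ExpenseApproved(5, "alice")`. Nothing in this class says where the event ends up; that decision is made separately by whatever is listening.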

I haven't mentioned it yet, but with SLAB you have the choice of processing events either in-process or out-of-process.

In the in-process case, we don't really care about ETW at all. We set up something called an ObservableEventListener that takes events published from our custom event source and writes them to the sinks. In this case, the parts that are SLAB are the ObservableEventListener itself and the sinks. This may be a little confusing, because ObservableEventListener inherits from EventListener, which is part of .NET itself.
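The in-process flow can be sketched without SLAB by subclassing the base `EventListener` class directly. This is a stand-in, not SLAB's implementation: with SLAB you would use an `ObservableEventListener` and attach one of its sinks instead. `OrderEvents` and `InMemoryListener` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Hypothetical event source used only for this sketch.
[EventSource(Name = "Sample-Orders")]
public sealed class OrderEvents : EventSource
{
    public static readonly OrderEvents Log = new OrderEvents();

    private OrderEvents() { }

    [Event(1, Level = EventLevel.Informational, Message = "Order {0} shipped")]
    public void OrderShipped(int orderId)
    {
        WriteEvent(1, orderId);
    }
}

// Stand-in for "listener plus sink": keeps formatted entries in memory.
public sealed class InMemoryListener : EventListener
{
    public readonly List<string> Entries = new List<string>();

    // Called in-process for every event from a source this listener enabled.
    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        string payload = e.Payload == null ? "" : string.Join(", ", e.Payload);
        Entries.Add(string.Format("{0} {1} [{2}]", e.EventId, e.Level, payload));
    }
}

public static class InProcessDemo
{
    // Enables the source on a listener, logs one event, and returns what was captured.
    public static List<string> Run()
    {
        using (var listener = new InMemoryListener())
        {
            listener.EnableEvents(OrderEvents.Log, EventLevel.LogAlways);
            OrderEvents.Log.OrderShipped(42);
            return new List<string>(listener.Entries);
        }
    }
}
```

The event source knows nothing about the listener. Swapping the in-memory list for a file, console, or database writer changes only the "where".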

In the out-of-process story, the custom event source in our primary application is publishing the events to ETW. From ETW, we can access the events in a few different ways such as the Event Log or PerfView.

Generally though we have a secondary application running whose purpose is to receive the published events and write them to one or more sinks.

Sinks
Formatters
Out-Of-Process Service
Event Source Analyzer
Observable Event Listener

Sinks

Features of SLAB

Let's talk more specifically about the features that SLAB offers. There are five high-level features to understand:

Sinks are the destinations where log entries are ultimately persisted.
Formatters provide a pluggable way to format log entries before they are persisted.
Out-of-process service: we provide a ready-to-go service for working with events out-of-process.
Event Source Analyzer ensures that we have consistently and properly designed our event source.
Observable Event Listener allows for interesting filtering of events using Reactive Extensions (Rx).

Azure Tables
SQL Database
Flat file
Rolling flat file
Console
Elasticsearch

Sinks

Features of SLAB

In the box, we provide sinks for:

Azure Tables
SQL Database
Flat file
Rolling flat file
Console
Elasticsearch

The sink for Elasticsearch has just been released (March 28) as part of SLAB 1.1. We're very excited about it because it is the first release since going open source, and we had significant community involvement.

I should note that Windows Event Log is not supported by SLAB. However, it is now natively supported by EventSource.

JSON
XML
Natural (plain-text)

Formatters

Features of SLAB

The formatters are used for sinks such as flat file and console.

Hosted as a Windows Service or console
All sinks are supported
Configuration-driven with support for re-configuration

Benefits
Increased fault tolerance in case of application crash
Monitored application does not reference SLAB
Can monitor multiple processes from a single service
Moves the logging overhead from the application to a separate process (but the overhead is still there!)

Out of Process Service

Features of SLAB

In the box we provide both a Windows Service and a console application that can be used to monitor events out-of-process. In both cases, all of the sinks are supported. The hosts are configuration-driven and there is support for re-configuring on the fly.

The events in the originating process are sent to ETW. The host is a dedicated process, independent of your application, that is used just to persist the events to different destinations.

This has several benefits; however, the primary benefit is resiliency. There is increased fault tolerance in case your application crashes. In addition, there is no need to reference SLAB in the monitored application if you are only using out-of-process. Likewise, it removes some of the logging overhead from your primary application.

The output from multiple monitored applications can be sent to a single out-of-process host.

There are some downsides to this. It can be slightly more complicated to set up, and it is more difficult to work with at development time. We discuss the trade-offs in the documentation.

// can be run in a unit test
[TestMethod]
public void AnalyzeAExpenseEvents()
{
    EventSourceAnalyzer.InspectAll(AExpenseEvents.Log);
}

// will verify correctness of events
// this example has inconsistent ID and order of parameters
[Event(111)]
public void MyInvalidEvent(int someArgument, string otherArgument, int userId)
{
    this.WriteEvent(222, someArgument, userId, otherArgument);
}

Event Source Analyzer

Features of SLAB

There are still some rough edges when using EventSource. For example, you have to specify the event id in both the attribute decorating the logging method as well as in the call to WriteEvent. Likewise, you have to be consistent in the order of the parameters passed to these two methods. Since this can be error prone, SLAB provides the Event Source Analyzer. It allows you to inspect a specific instance of an event source for problems.

Here you can see some sample code where we inspect the event source during a unit test.
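For comparison with the invalid event on the slide, a corrected version might look like the sketch below. The containing class `SampleEvents` and the method name `MyValidEvent` are hypothetical. The only changes are that the id passed to `WriteEvent` matches the attribute and the arguments are passed in the order they are declared.

```csharp
using System.Diagnostics.Tracing;

// Hypothetical container for the corrected event.
public sealed class SampleEvents : EventSource
{
    public static readonly SampleEvents Log = new SampleEvents();

    private SampleEvents() { }

    [Event(111)]
    public void MyValidEvent(int someArgument, string otherArgument, int userId)
    {
        // Same id as the attribute, same order as the method signature.
        this.WriteEvent(111, someArgument, otherArgument, userId);
    }
}
```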

Event listener is IObservable.
Event sinks are IObservers.
Can leverage Reactive Extensions (Rx) to filter, pre-process or transform the event stream before it's persisted.

Based on Observable

Features of SLAB

SLAB's implementation of Event Listener implements IObservable. Likewise, the included sinks implement IObserver. These interfaces are the foundation of Reactive Extensions, also known as Rx. Because these interfaces are used, we can leverage Rx for filtering and transforming the event stream before we pass it along to a sink.

I suspect that many of you are not familiar with Rx. That's okay. I should also state that there is no hard dependency on Rx unless you want to use it for processing the events. Let me do a quick demonstration of Rx and how it can work with SLAB for those who may not be familiar with it.
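The observer relationship can be sketched with nothing but the `IObserver<T>` interface from .NET. The code below is a hand-rolled stand-in for what an Rx filtering operator does, not the demo shown in the talk and not SLAB code. `Entry`, `FilteringObserver`, and `ListSink` are hypothetical types, and SLAB's own entry type carries far more metadata.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Hypothetical, simplified log entry.
public sealed class Entry
{
    public EventLevel Level { get; set; }
    public string Message { get; set; }
}

// Sits between the event stream and a sink, forwarding only matching entries.
public sealed class FilteringObserver : IObserver<Entry>
{
    private readonly IObserver<Entry> _inner;
    private readonly Func<Entry, bool> _predicate;

    public FilteringObserver(IObserver<Entry> inner, Func<Entry, bool> predicate)
    {
        _inner = inner;
        _predicate = predicate;
    }

    public void OnNext(Entry value)
    {
        if (_predicate(value)) _inner.OnNext(value);
    }

    public void OnError(Exception error) { _inner.OnError(error); }

    public void OnCompleted() { _inner.OnCompleted(); }
}

// Stand-in sink that remembers what it was asked to persist.
public sealed class ListSink : IObserver<Entry>
{
    public readonly List<Entry> Received = new List<Entry>();

    public void OnNext(Entry value) { Received.Add(value); }

    public void OnError(Exception error) { }

    public void OnCompleted() { }
}

public static class FilterDemo
{
    // Pushes one informational and one error entry through an errors-only filter.
    public static List<Entry> Run()
    {
        var sink = new ListSink();
        // EventLevel orders severe levels first, so "<= Error" also admits Critical and LogAlways.
        var errorsOnly = new FilteringObserver(sink, e => e.Level <= EventLevel.Error);
        errorsOnly.OnNext(new Entry { Level = EventLevel.Informational, Message = "started" });
        errorsOnly.OnNext(new Entry { Level = EventLevel.Error, Message = "failed" });
        return sink.Received;
    }
}
```

Because the sink only ever sees an `IObserver<Entry>` call, it neither knows nor cares that a filter sits in front of it. Rx supplies ready-made operators for this kind of composition, so in practice you would not write the filtering class by hand.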

demo >> Flush on Error/Alarm Flood Throttle


Support for ActivityIds
Ability to capture events from source not publicly available
Sink for Elasticsearch
Performance improvements
Improved extensibility story
Minor bug fixes

http://aka.ms/slab1_1

SLAB 1.1

What's Coming Next Now

Originally, I was going to share what's next with SLAB. However, we just released the 1.1 update for SLAB (March 27). This new release includes support for ActivityIds. Activity Ids are a feature added to Event Source in .NET 4.5.1. They allow you to trace events across a logical transition. This is especially useful when using something like TPL (Task Parallel Library). I won't go into details here; there's more information about Activity Id support in the release notes.

In addition to some performance improvements and bug fixes, we have also included an additional sink for Elasticsearch. If you are not familiar with Elasticsearch, it's a data store with built-in full-text search and analysis capabilities. Again, more details are available in the release notes.

I would also like to emphasize that this release was largely due to community contributions. All of the projects under the Enterprise Library umbrella are now open sourced under Apache 2.0 and we are welcoming community contributions. If you are interested in contributing, please check http://aka.ms/entlibopen.

Evaluate SLAB and adopt it (search for slab in NuGet).
Read the docs - aka.ms/slab
Practice the Hands-on Labs & Quickstarts - aka.ms/el6hols
Engage with us by providing feedback and/or submitting contributions - slab.codeplex.com

Call to Action

If you are interested in SLAB, let me recommend the following:

Evaluate SLAB and adopt it (search for slab in NuGet).
Read the docs - aka.ms/slab
Practice the Hands-on Labs & Quickstarts - aka.ms/el6hols
Engage with us by providing feedback and/or submitting contributions - slab.codeplex.com

We are very interested in your feedback. Now that we are truly open source, you can be very direct with your feedback. You can send us a pull request!

http://slab.codeplex.com
http://aka.ms/slab
http://entlib.codeplex.com

Resources

microsoft.com/practices

@bennage
dev.bennage.com


?

Christopher Bennage
patterns & [email protected]

