Windows Azure 1.4 Diagnostics All Up Overview
I know there are already a number of posts and articles out there about using diagnostics in Windows Azure. That, in fact, was part of the problem when I recently went to flesh out the details of what’s available: I found a bunch of different articles, but they were spread across many different releases of Azure, so it was fairly time consuming to figure out what would work with the latest Azure SDK (1.4). This post will hopefully bring together the main points for using Azure diagnostics with version 1.4 of the SDK.
To begin with, as many of you probably know, Azure includes a built-in Trace listener that will take your Trace.* calls (like Trace.Write, Trace.WriteLine, Trace.TraceInformation, etc.) and store them in memory. However, you need to “do something” to move them from memory into persistent storage. That could be kicking off a manual transfer of the data, or configuring a schedule for the transfers to occur. You can also choose to transfer event log entries, performance counters, IIS logs, and custom logs.
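To make the Trace.* part concrete, here is a minimal sketch of what such calls might look like in application code. The OrderProcessor class, its method, and the messages are made up for illustration; only the System.Diagnostics.Trace methods themselves are real. As far as I know, TraceInformation and TraceError are recorded at the Information and Error levels, while plain Trace.WriteLine calls are recorded as Verbose.

```csharp
using System;
using System.Diagnostics;

// Hypothetical application class, used only to show the Trace.* calls.
public static class OrderProcessor
{
    public static void Process(int orderId)
    {
        // Informational message with format arguments
        Trace.TraceInformation("Processing order {0}", orderId);
        try
        {
            if (orderId < 0)
                throw new ArgumentOutOfRangeException("orderId");

            // Plain message with a category string
            Trace.WriteLine("Order " + orderId + " processed", "Orders");
        }
        catch (Exception ex)
        {
            // Error-level message
            Trace.TraceError("Order {0} failed: {1}", orderId, ex.Message);
        }
    }
}
```

Nothing in this code is Azure-specific; whichever listeners are registered at runtime decide where the messages end up.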
In addition to all of the typical logging and debugging tools like those mentioned above, you can also configure your deployment to allow you to RDP into the Azure server hosting your application, as well as enable IntelliTrace for limited debugging and troubleshooting in an application that’s been deployed. Let’s walk through these different pieces.
The easiest way to configure the different diagnostic components (how often they persist data to storage, how much storage should be allocated, which perfmon counters to capture, etc.) is to write some code in the WebRole.cs file that comes with a standard Web Role Azure application. I think most Azure features other than VM role have something analogous to a WebRole class, like the WorkerRole.cs file in a Worker Role project. Before we start looking at code, you should go into your Azure Role project and check the box on the Configuration tab that says “Specify the storage account credentials for the Diagnostics results:”. Use the picker button there to select a storage account you have in Azure; do not use the local development storage. This will save a new connection string to the project called Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString.
Now let’s look at the entire chunk of code in the web role class, and then I’ll break down some of the specifics:
    public override bool OnStart()
    {
        // For information on handling configuration changes
        // see the MSDN topic at
        try
        {
            //initialize the settings framework
            Microsoft.WindowsAzure.CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) =>
            {
                configSetter(RoleEnvironment.GetConfigurationSettingValue(configName));
            });
            //get the storage account using the default Diag connection string
            CloudStorageAccount cs =
                CloudStorageAccount.FromConfigurationSetting(
                    "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString");
            //get the diag manager
            RoleInstanceDiagnosticManager dm = cs.CreateRoleInstanceDiagnosticManager(
                RoleEnvironment.DeploymentId,
                RoleEnvironment.CurrentRoleInstance.Role.Name,
                RoleEnvironment.CurrentRoleInstance.Id);
            //get the current configuration
            DiagnosticMonitorConfiguration dc = dm.GetCurrentConfiguration();
            //if that failed, fall back to the default initial configuration
            if (dc == null)
                dc = DiagnosticMonitor.GetDefaultInitialConfiguration();
            //Windows Azure Logs
            dc.Logs.BufferQuotaInMB = 10;
            dc.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
            dc.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);
            //Windows Event Logs
            dc.WindowsEventLog.BufferQuotaInMB = 10;
            dc.WindowsEventLog.DataSources.Add("System!*");
            dc.WindowsEventLog.DataSources.Add("Application!*");
            dc.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(15);
            //Performance Counters
            dc.PerformanceCounters.BufferQuotaInMB = 10;
            PerformanceCounterConfiguration perfConfig =
                new PerformanceCounterConfiguration();
            perfConfig.CounterSpecifier = @"\Processor(_Total)\% Processor Time";
            perfConfig.SampleRate = System.TimeSpan.FromSeconds(60);
            dc.PerformanceCounters.DataSources.Add(perfConfig);
            dc.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(10);
            //Failed Request Logs
            dc.Directories.BufferQuotaInMB = 10;
            dc.Directories.ScheduledTransferPeriod = TimeSpan.FromMinutes(30);
            //Infrastructure Logs
            dc.DiagnosticInfrastructureLogs.BufferQuotaInMB = 10;
            dc.DiagnosticInfrastructureLogs.ScheduledTransferLogLevelFilter =
                LogLevel.Verbose;
            dc.DiagnosticInfrastructureLogs.ScheduledTransferPeriod =
                TimeSpan.FromMinutes(60);
            //Crash Dumps
            CrashDumps.EnableCollection(true);
            //overall quota; must be larger than the sum of all items
            dc.OverallQuotaInMB = 5000;
            //save the configuration
            dm.SetCurrentConfiguration(dc);
        }
        catch (Exception ex)
        {
            System.Diagnostics.Trace.Write(ex.Message);
        }
        return base.OnStart();
    }
Now let’s talk through the code in a little more detail. I start out by getting the value of the connection string used for diagnostics, so I can connect to the storage account being used. That storage account is then used to get down to the diagnostic monitor configuration class. Once I have that I can begin configuring the various logging components.
The Windows Azure Logs is where all of the Trace.* calls are saved. I configure it to buffer up to 10MB worth of data locally, and to transfer entries to the table that it uses every 5 minutes for all writes that are Verbose or higher. By the way, for a list of the different tables and queues that Windows Azure uses to store this logging data you can see here and here. Infrastructure logs and Diagnostics logs are virtually identical.
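If you don’t want to wait for the scheduled transfer, the diagnostic manager can also kick off a manual (on-demand) transfer. The following is a rough sketch rather than code from my project: the OnDemandTransfer class, the ten-minute window, and the notification queue name are assumptions made for illustration, and it expects the same RoleInstanceDiagnosticManager that OnStart creates.

```csharp
using System;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.Diagnostics.Management;

// Hypothetical helper; requires the Azure SDK diagnostics assemblies.
public static class OnDemandTransfer
{
    // Requests a one-off transfer of the last ten minutes of Trace.* output.
    public static Guid TransferRecentLogs(RoleInstanceDiagnosticManager dm)
    {
        OnDemandTransferOptions options = new OnDemandTransferOptions();
        options.From = DateTime.UtcNow.AddMinutes(-10);   // assumed window
        options.To = DateTime.UtcNow;
        options.LogLevelFilter = LogLevel.Verbose;
        // Assumed queue name; a message is posted here when the transfer completes
        options.NotificationQueueName = "wad-on-demand-transfers";

        // Returns a request id that identifies this transfer
        return dm.BeginOnDemandTransfer(DataBufferName.Logs, options);
    }
}
```

The same call should work for the other buffers by passing a different DataBufferName value, such as DataBufferName.PerformanceCounters.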
For event viewer entries, I have to add each log I want to capture to the list of DataSources for the WindowsEventLog class. The values that I could provide are Application!*, System!* or UserData!*. The other properties are the same as described for Windows Azure Logs.
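The part after the exclamation mark is an XPath-style event query, so in principle you can narrow what gets captured instead of taking everything with *. Here is a hedged sketch that keeps only the more severe entries; the EventLogSetup class is hypothetical, and the filter strings assume the standard Windows event levels (1 = Critical, 2 = Error, 3 = Warning), so test them against your own logs before relying on them.

```csharp
using Microsoft.WindowsAzure.Diagnostics;

// Hypothetical helper; call it with the configuration object from OnStart.
public static class EventLogSetup
{
    public static void AddFilteredSources(DiagnosticMonitorConfiguration dc)
    {
        // Application log: Critical, Error, and Warning entries only
        dc.WindowsEventLog.DataSources.Add(
            "Application!*[System[(Level=1 or Level=2 or Level=3)]]");

        // System log: Critical and Error entries only
        dc.WindowsEventLog.DataSources.Add(
            "System!*[System[(Level=1 or Level=2)]]");
    }
}
```

Filtering at the source like this should reduce how much of the event log buffer quota gets used between transfers.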
For perfmon counters, you have to describe which counters you want to capture and how frequently they should sample data. In the example above, I added a counter for CPU and configured it to sample data every 60 seconds.
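If you want more than one counter, the same pattern can be repeated in a loop. This is a minimal sketch under assumptions: the PerfCounterSetup class is hypothetical, and the two extra counter paths are just common examples that I have not verified on an Azure role instance.

```csharp
using System;
using Microsoft.WindowsAzure.Diagnostics;

// Hypothetical helper; call it with the configuration object from OnStart.
public static class PerfCounterSetup
{
    public static void AddCounters(DiagnosticMonitorConfiguration dc)
    {
        string[] counters =
        {
            @"\Processor(_Total)\% Processor Time",
            @"\Memory\Available MBytes",                       // example counter
            @"\ASP.NET Applications(__Total__)\Requests/Sec"   // example counter
        };

        foreach (string counter in counters)
        {
            // One configuration object per counter, sampled every 60 seconds
            PerformanceCounterConfiguration perfConfig =
                new PerformanceCounterConfiguration();
            perfConfig.CounterSpecifier = counter;
            perfConfig.SampleRate = TimeSpan.FromSeconds(60);
            dc.PerformanceCounters.DataSources.Add(perfConfig);
        }
    }
}
```

Each counter could also get its own SampleRate if some values need finer resolution than others.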
Finally, the last couple of things I did were to enable capturing crash dumps, change the overall quota for all the logging data to approximately 5GB, and save the changes. It’s very important that you bump up the overall quota, or you will likely get an exception that says you don’t have enough storage available to make the changes described above. So far 5GB has seemed like a safe value, but of course your mileage may vary.
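The arithmetic behind that rule is simple enough to check before saving the configuration. The sketch below is plain C# with no Azure dependencies; the QuotaCheck class is hypothetical, and the strict "larger than" comparison is taken from the comment in the code above rather than from any documentation.

```csharp
using System.Linq;

// Hypothetical helper for sanity-checking quota values before saving them.
public static class QuotaCheck
{
    // Adds up the per-buffer quotas, in MB.
    public static int SumOfBufferQuotas(params int[] bufferQuotasInMB)
    {
        return bufferQuotasInMB.Sum();
    }

    // True when the overall quota is strictly larger than the sum of the buffers.
    public static bool FitsWithin(int overallQuotaInMB, params int[] bufferQuotasInMB)
    {
        return SumOfBufferQuotas(bufferQuotasInMB) < overallQuotaInMB;
    }
}
```

With the five 10MB buffers configured above, QuotaCheck.SumOfBufferQuotas(10, 10, 10, 10, 10) is 50, so FitsWithin(5000, 10, 10, 10, 10, 10) is true while FitsWithin(50, 10, 10, 10, 10, 10) is false.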
Now it’s ready to go, so it’s time to publish the application. When you publish the application out of Visual Studio, there are a couple of other things to note:
In the Publish Settings dialog, you should check the box to Enable IntelliTrace; I’ll explain more on that later. In addition, I would recommend that you click on the link to Configure Remote Desktop connections…; at times I found this to be the only way I was able to solve an issue. Since the documentation on remote desktop is a bit out of date, let me just suggest that you use this dialog rather than manually editing configuration files. It brings up a dialog that looks like this:
The main things to note here are:
- You can seemingly use any certificate for which you have a PFX file. Note that you MUST upload this certificate to your hosted service before publishing the application.
- The User name field is whatever you want it to be; a local account with that user name and password will be created.
So now you complete both dialogs and publish your application. Hit your application once to fire it up and make sure your web role code executes. Once you do that, you should be able to go examine the diagnostics settings for the application and see your customizations implemented, as shown here (NOTE: I am using the free CodePlex tools for managing Azure):
After I have some code that has executed, and I’ve waited until the next Scheduled Transfer Period for the Windows Azure Logs, I can see my Trace.* calls showing up in the WADLogsTable as shown here:
Also, since I configured support for RDP into my application, when I click on the web role the option to make an RDP connection to it is enabled in the toolbar in the Azure Developer Portal:
So I have all the logs and traces from my application available to me now, and I can RDP into the servers if I need to investigate further. The other cool feature I enabled was IntelliTrace. Describing IntelliTrace is beyond the scope of this posting, but you can find out some great information about it here and here. When IntelliTrace is enabled, it says so when I view my hosted service in the Visual Studio Server Explorer:
I can then right click on an instance in my application and select the View IntelliTrace logs menu item. That downloads the IntelliTrace logs from Azure and opens them up in Visual Studio, which looks like this:
As you can see from the picture, I can see the threads that were used, any exceptions that were raised, System Info, the modules that were loaded, etc. I simulated an exception to test this out by setting my overall storage allocation for diagnostic info to 50MB. You may recall that I mentioned needing more like 5GB. I made the change and published my application, and then a few minutes later downloaded the IntelliTrace logs. Sure enough I found the error highlighted here in the second page of logs:
So there you have it – a good overview of diagnostics in Windows Azure 1.4. We’re capturing Trace events, event logs, perf counters, IIS logs, crash dumps, and any custom diagnostic log files. I can RDP into the server for additional troubleshooting if needed. I can download IntelliTrace logs from my application and have a limited debugging experience in my local instance of Visual Studio 2010.