Part 3 – Performance monitoring Microsoft Exchange in production

by Clive Williams on 12/04/2016 10:00

Exchange server performance health checker script to help with performance monitoring of Microsoft Exchange in production

In the previous posts in this series we covered the Performance risks with Microsoft Exchange and then Performance testing Microsoft Exchange using Jetstress and Load Generator.

Often a good approach to mitigating performance risk in Exchange implementations is to employ specialised monitoring in production. Even if you’ve done testing as described in the Performance testing Microsoft Exchange using Jetstress and Load Generator blog post, you probably need to plan to monitor as well during your upgrade. Exchange workloads can be extremely volatile. Changes such as a popular new feature can have significant impact. Changes for users can prompt new ways of using the system to help solve business issues.

This specialised monitoring goes beyond what you would normally use for the general health or capacity management of an environment. It can be resource intensive in terms of the manpower needed to collect, analyse and communicate results. You also need to be careful that the monitoring itself does not impact system resource usage, which means ‘monitoring the monitoring’. Thus it needs to be planned and managed. You are probably going to use this in the period immediately post-implementation or throughout a phased migration. Then it’s probably something you’re going to turn off once your implementation becomes stable. However, it’s a useful process to use periodically to ensure the ongoing health of the environment and to spot trends.

What to use for performance monitoring and analysis

There are a number of ways to measure performance and workload on Exchange. I’ll cover each in the sections below. You’re probably going to use a combination of these to get complete information and to provide a sanity check.

Exchange utilities and tools

Exchange has a number of tools and utilities that can help you verify that your Exchange environment is behaving correctly.

Best Practices Analyzer for Exchange Server 2013

The Best Practices Analyzer was introduced in Exchange 2013 Service Pack 1. It is very useful to run this against your installation to validate your Exchange setup. Details can be found at Office 365 Best Practices Analyzer for Exchange Server 2013.

Other tools

There are a number of tools or scripts that can be used to report on aspects of Exchange behaviour. Typically you can find these at the TechNet Gallery for Exchange. Last time I looked there were over 1,000 results, so not all may be useful or relevant to your environment! A lot of these are based on PowerShell scripts that can be adapted if you have the skills or time. One that has proved useful is the Exchange Server Performance Health Checker.

Note there are some older tools from Microsoft (e.g. ExMon) that measure only part of the workload. Be careful that you understand what is being reported by these tools, as you may be under-representing what is happening in your environment.

Exchange IIS log and Microsoft Log Parser Studio

There are a number of logs that are produced in an Exchange environment.

One log that should be looked at is the IIS log, especially for Exchange 2013 and later. All requests go through IIS for these versions. This can be optimally analysed using Microsoft’s Log Parser Studio.  Details of the tool and its use can be found at IIS Logs and Log Parser Studio Reports.

Some of the Exchange scripts referenced above will help you work through the Exchange logs.

Planning is needed to make sure that any required logs are enabled and that the logging itself does not impact resource requirements.
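As a rough illustration of the kind of summary you can get from the IIS log, the sketch below counts requests per top-level virtual directory. It is a minimal sketch and not a replacement for Log Parser Studio: the function name and the sample log lines are made up for illustration, and it assumes the log is in W3C format with the cs-uri-stem field enabled.

```python
from collections import Counter


def requests_per_vdir(lines):
    """Count requests per top-level virtual directory in W3C-format IIS log lines."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The #Fields directive names the space-separated columns that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or "cs-uri-stem" not in fields:
            continue  # other directives, or data with no usable #Fields line yet
        values = line.split(" ")
        if len(values) != len(fields):
            continue  # skip malformed rows
        record = dict(zip(fields, values))
        # "/EWS/Exchange.asmx" -> "ews"; a bare "/" is reported as "/".
        first_segment = record["cs-uri-stem"].strip("/").split("/")[0].lower()
        counts[first_segment or "/"] += 1
    return counts


# Made-up sample data, for illustration only.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 8.5
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2016-04-12 10:00:01 POST /EWS/Exchange.asmx 200 120
2016-04-12 10:00:02 POST /Microsoft-Server-ActiveSync/default.eas 200 45
2016-04-12 10:00:03 GET /owa/ 200 30
2016-04-12 10:00:04 POST /EWS/Exchange.asmx 200 95
""".splitlines()

for vdir, count in requests_per_vdir(SAMPLE_LOG).most_common():
    print(f"{vdir}: {count}")
```

Grouping by virtual directory gives a quick view of the mix between client types (web services, ActiveSync, browser access), which is often the first thing you want to compare against your design assumptions.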

Perfmon counters

Windows performance monitor counters provide a valuable source of information about both system resource consumption and processed workload. Exchange has a rich set of specialised counters (over 3000 in Exchange 2013) that can be used to help analyse performance and workload.

As standard, Exchange 2013 has two performance monitor jobs running:

  • ExchangeDiagnosticsDailyPerformanceLog
  • ExchangeDiagnosticsPerformanceLog

These produce files that are used by Exchange’s Managed Availability framework (which runs as the Microsoft Exchange Health Management service). Although you can use information captured in these logs, you are advised to set up your own, separate collection.

Microsoft have published recommended counters to monitor. This link shows what to do for Exchange 2013: Exchange 2013 Performance Counters. (Other versions have similar pages to determine what to collect.)

Given the number of counters that need to be analysed, you probably need some form of tool to assist you. The next section describes some tools.
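For a quick first pass before reaching for a dedicated tool, a counter log can be summarised in a few lines. The sketch below assumes the perfmon log has already been exported to CSV (with the timestamp in the first column and one column per counter path). The server name EXCH01, the instance dc01 and all the sample values are hypothetical.

```python
import csv
import io
from statistics import mean


def summarise_counters(csv_text):
    """Return {counter_path: (average, maximum, sample_count)} for a perfmon CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Column 0 is the timestamp; the remaining columns are counter paths.
    samples = {name: [] for name in header[1:]}
    for row in reader:
        for name, value in zip(header[1:], row[1:]):
            try:
                samples[name].append(float(value))
            except ValueError:
                pass  # blank entries mark samples that were not collected
    return {
        name: (mean(vals), max(vals), len(vals))
        for name, vals in samples.items()
        if vals
    }


# Hypothetical sample export, for illustration only.
SAMPLE_CSV = r'''"(PDH-CSV 4.0) (GMT Standard Time)(0)","\\EXCH01\Processor(_Total)\% Processor Time","\\EXCH01\MSExchange ADAccess Domain Controllers(dc01)\LDAP Search Time"
"04/12/2016 10:00:00.000","35.0","12.0"
"04/12/2016 10:00:15.000","85.0"," "
"04/12/2016 10:00:30.000","60.0","48.0"
'''

for path, (avg, peak, n) in summarise_counters(SAMPLE_CSV).items():
    print(f"{path}: avg={avg:.1f} max={peak:.1f} samples={n}")
```

This only gives averages and peaks. It says nothing about whether a value is a problem, which is where threshold-based analysis comes in.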

Performance Analysis of Logs (PAL) threshold tool

When looking at perfmon counters, one useful tool is the Performance Analysis of Logs (PAL) Tool. This tool takes perfmon logs and analyses them against provided threshold files supplied by industry experts (especially for Microsoft products but increasingly for others). It produces a comprehensive report highlighting possible exceeded thresholds. You can analyse against multiple threshold files. If you’re really brave, you can produce your own threshold recommendations!  It handles not only simple thresholds (if A is greater than B), but also multiple conditions (if A is greater than B and C is less than D) and trends (if rate of change in A is greater than B) so is very powerful.

This tool is written in PowerShell and has limitations, particularly with memory consumption for large perfmon files. It also isn’t the speediest to execute. For analysing larger Exchange installations, we at Wildstrait have written our own tool to handle this situation.
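To make the three kinds of threshold concrete, here is a minimal sketch of each in Python. This is not PAL’s implementation or its threshold file format: the function names, limits and sample values are all assumptions chosen for illustration.

```python
def simple_threshold(a, limit):
    """Simple rule: flag each sample of A that is greater than the limit."""
    return [x > limit for x in a]


def compound_threshold(a, a_limit, c, c_limit):
    """Compound rule: flag samples where A is above its limit and C is below its limit."""
    return [x > a_limit and y < c_limit for x, y in zip(a, c)]


def trend_threshold(a, interval_seconds, max_rate):
    """Trend rule: flag intervals where A's rate of change per second exceeds max_rate."""
    return [
        (later - earlier) / interval_seconds > max_rate
        for earlier, later in zip(a, a[1:])
    ]


# Illustrative samples only: CPU %, available memory in MB, 15-second sample interval.
cpu = [40.0, 55.0, 92.0, 95.0]
available_mb = [4000.0, 3500.0, 900.0, 400.0]

print(simple_threshold(cpu, 90.0))
print(compound_threshold(cpu, 90.0, available_mb, 1000.0))
print(trend_threshold(cpu, 15, 2.0))
```

Note that the trend rule returns one result per interval between samples, so its output is one element shorter than the input.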

Tuning Exchange performance

OK so now you’ve done the monitoring, what can you do about any performance problems that you may have detected?

  • First of all, if the major system resources (CPU, memory, disk latency) show symptoms of problems, these need to be addressed. There needs to be a systematic analysis of any detected problems as root causes can be many and not necessarily directly related to Exchange.
  • Review the Application log and System log on the computers running Exchange and Active Directory for any events related to the smooth running of Exchange and fix any issues.

Microsoft have published a number of recommendations for areas to look at. I’m summarising these here but beware some of these have been disputed by other parties! It’s definitely worth looking at the result of monitoring before diving in and changing any set-up.

  • Examine use of hyperthreading. Microsoft recommend turning off hyperthreading for Exchange. So it’s worth considering and investigating whether this can help.
  • Tune .NET. Microsoft recommend making sure various hotfixes are installed to prevent excessive garbage collection. Also ensure that the recommended version of .NET (4.5.1 was indicated previously) is installed. Check current documentation and your software levels for further details.
  • Look at network optimisation. For example, Microsoft do not recommend using multiple NICs for Exchange. Offload features, especially RSS, are advised.
  • Tune storage caching. Set DAS storage caching to 100% write cache for multi-role or Mailbox Exchange servers. Microsoft suggest that other roles (CAS, AD) may need different settings and benefit from tuning, with 25% write / 75% read as a starting point.
  • Use SSL offload if feasible and where suitable devices are available. This will reduce CPU consumption.
  • Set the pagefile size to recommended size and check key memory performance counters.
  • Adopt virtualisation best practices. Microsoft have made recommendations about what they believe is best for Exchange but I’ve seen there is some dispute between Microsoft and virtualisation providers about this.
  • Active Directory performance. AD bottlenecks can be a frequent cause of Exchange performance issues. Look for high query latencies (\MSExchange ADAccess Domain Controllers(*)\LDAP Search Time) within Exchange and high CPU (\Processor(_Total)\% Processor Time) on the AD environment. Use the ‘Active Directory Diagnostics’ data collector set to examine in detail. Look at memory on the AD servers to see whether caching can be optimised.
  • Log on tuning. If you’re using NTLM authentication, make sure the known issues are not happening. Look at some key counters related to Netlogon.

There are some areas that may appear to be beneficial to look into but Microsoft believe they’re not useful. So don’t bother about:

  • Tuning IIS as Microsoft believe this makes little difference for Exchange.
  • Using storage tiering as Exchange ‘hot blocks’ of storage will vary more than this mechanism can support.

Balancing Exchange

Microsoft strongly recommend using features in Exchange to balance load across different Exchange components. Specific areas of balancing include:

  • CAS layer
    • Use load balancing
    • Use suitable traffic distribution policy
    • Spread inbound requests across all CAS servers
  • DAG
    • Equal distribution of active copies
    • Redistribute active copies during localised high load events
  • Within databases
    • Aim for equal utilisation both in space and activity
    • Spread out heavy and light users
    • Ongoing re-balancing
  • Within Mailboxes
    • Aim for more folders, fewer messages
    • Mailbox shaping

The aim is to employ all your available system resources in the most effective way.
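As an illustration of spreading heavy and light users across databases, the sketch below uses a simple greedy heuristic: take mailboxes heaviest first and place each one on the database with the lowest load so far. This is my own illustrative approach, not a method prescribed by Microsoft, and the mailbox names, database names and load figures are hypothetical.

```python
import heapq


def spread_mailboxes(mailbox_load, database_names):
    """Assign each mailbox to the currently least-loaded database, heaviest first."""
    # Heap entries are (total_load, database_name) so the lightest database pops first.
    heap = [(0.0, name) for name in database_names]
    heapq.heapify(heap)
    placement = {name: [] for name in database_names}
    by_load_desc = sorted(mailbox_load.items(), key=lambda kv: kv[1], reverse=True)
    for mailbox, load in by_load_desc:
        total, name = heapq.heappop(heap)
        placement[name].append(mailbox)
        heapq.heappush(heap, (total + load, name))
    totals = {name: total for total, name in heap}
    return placement, totals


# Hypothetical activity figures (for example, messages per day), for illustration only.
loads = {"alice": 90, "bob": 70, "carol": 40, "dave": 30, "erin": 20, "frank": 10}
placement, totals = spread_mailboxes(loads, ["DB01", "DB02"])

for db in sorted(placement):
    print(db, totals[db], placement[db])
```

The same idea can be applied to mailbox size instead of activity. Balancing both at once needs a more careful approach than this sketch.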

Exchange workload management

Next it’s worth checking that the workload being supported by the system is what you expect it to be! There may be unexpected peaks or the mix between traffic types may vary from your design. There are some key counters that help show what the actual workload is. Look at RPC Operations/sec and message traffic counters to indicate overall workload transaction rate. Look at the RPC Average latency counter as an indicator of potential issues.
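A simple way to compare the observed workload with what you designed for is to look at the average, the peak and how often the design figure is exceeded. The sketch below does this for a series of RPC Operations/sec samples. The sample values and the design peak of 600 are assumptions for illustration.

```python
def workload_summary(ops_per_sec, design_peak):
    """Summarise a series of operations/sec samples against a design peak rate."""
    average = sum(ops_per_sec) / len(ops_per_sec)
    peak = max(ops_per_sec)
    return {
        "average": average,
        "peak": peak,
        "peak_to_average": peak / average,
        # Indexes of the samples that exceeded the rate the system was designed for.
        "samples_over_design": [
            i for i, value in enumerate(ops_per_sec) if value > design_peak
        ],
    }


# Illustrative samples only.
rpc_ops = [200, 250, 300, 900, 350]
summary = workload_summary(rpc_ops, design_peak=600)
print(summary)
```

A high peak-to-average ratio suggests a volatile workload, which is worth checking against the latency counters for the same period.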

Automated workload management

Exchange 2013 introduced built-in features to self-manage workloads based on the health of system resources. It’s worth understanding what’s included as it’s possible the supplied default configurations may need to be changed for your environment. This article, Exchange-workload-management, provides a good overview.

In particular, two mechanisms are used.

User message throttling

Microsoft describes this as:

Message throttling refers to a group of limits that are set on the number of messages and connections that can be processed by a Microsoft Exchange Server 2013 computer. These limits prevent the accidental or intentional exhaustion of system resources on the Exchange server.

Details can be found at Message throttling.

You can use this to alleviate performance issues in at least a couple of cases:

  • Where you have applications that use Exchange to send heavy loads of messages.
  • To ensure high priority, high workload users have a different policy applied to them.

Back pressure

Microsoft describes this as:

Back pressure is a system resource monitoring feature of the Microsoft Exchange Transport service that exists on Microsoft Exchange 2013 Mailbox servers and Edge Transport servers.

Exchange can detect when vital resources, such as available hard drive space and memory, are under pressure, and take action in an attempt to prevent service unavailability. Back pressure prevents the system resources from being completely overwhelmed, and the Exchange server tries to process the existing messages before accepting any new messages. When utilization of the system resource returns to a normal level, the Exchange server gradually resumes normal operation and starts accepting new messages again.

Details can be found at Back pressure. You will know when this has been triggered, as it writes to the Event log.

This is something that ideally you need to avoid. The settings for this are configurable. Although the defaults are by and large sensible, they should be reviewed against your system configuration.

Bringing it all together

In this blog post series we have seen that there are performance risks with Microsoft Exchange that can impact the business. These risks can be exacerbated by the changing usage of Exchange (e.g. increased browser and mobile access) and by steps like upgrading versions, re-platforming and adding plug-ins.

One way to mitigate the performance risks in Microsoft Exchange implementations is through performance testing, using tools such as Jetstress and Load Generator. As covered in this current blog post, another way to mitigate the performance risks with Microsoft Exchange implementations is production monitoring, which may also be done in conjunction with performance testing.

I recommend following the approaches set out in these posts if you need to mitigate the performance risks with Microsoft Exchange. Knowing what may be required and including it in plans well in advance of making changes should result in a Microsoft Exchange environment that performs to the needs of your organisation.

Please contact us if you need further help or advice on mitigating the performance risks with Microsoft Exchange in your organisation.

