Part 3 – Performance monitoring Microsoft Exchange in production
by Clive Williams on 12 April 2016
In the previous posts in this series we covered the Performance risks with Microsoft Exchange and then Performance testing Microsoft Exchange using Jetstress and Load Generator.
Specialised monitoring in production is often a good way to mitigate performance risk in Exchange implementations. Even if you’ve tested as described in the Performance testing Microsoft Exchange using Jetstress and Load Generator blog post, you should probably plan to monitor during your upgrade as well. Exchange workloads can be extremely volatile. A popular new feature can have a significant impact, and changes for users can prompt new ways of using the system to help solve business issues.
This specialised monitoring goes beyond what you would normally use for the general health or capacity management of an environment. It can be resource intensive in terms of the manpower needed to collect, analyse and communicate results. You also need to be careful that the monitoring itself does not impact system resource usage, which means ‘monitoring the monitoring’. It therefore needs to be planned and managed. You will probably use it during the period immediately after implementation or throughout a phased migration, and then turn it off once your implementation becomes stable. However, it’s also a useful process to repeat periodically to check the ongoing health of the environment and to spot trends.
What to use for performance monitoring and analysis
There are a number of ways to measure performance and workload on Exchange. I’ll cover each in the sections below. You’re probably going to use a combination of these to get complete information and to provide a sanity check.
Exchange utilities and tools
Exchange has a number of tools and utilities that can help you verify that your Exchange environment is behaving correctly.
Best Practices Analyzer for Exchange Server 2013
The Best Practices Analyzer was introduced in Exchange 2013 Service Pack 1. It is very useful to run this against your installation to validate your Exchange setup. Details can be found at Office 365 Best Practices Analyzer for Exchange Server 2013.
Other tools
There are a number of tools or scripts that can be used to report on aspects of Exchange behaviour. Typically you can find these at the TechNet Gallery for Exchange. Last time I looked there were over 1,000 results, so not all may be useful or relevant to your environment! A lot of these are based on PowerShell scripts that can be adapted if you have the skills or time. One that has proved useful is the Exchange Server Performance Health Checker.
Note there are some older tools from Microsoft (e.g. ExMon) that measure only part of the workload. Be careful that you understand what is being reported by these tools, as you may be under-representing what is happening in your environment.
Exchange IIS log and Microsoft Log Parser Studio
There are a number of logs that are produced in an Exchange environment.
One log that should be looked at is the IIS log, especially for Exchange 2013 and later. All requests go through IIS for these versions. This can be optimally analysed using Microsoft’s Log Parser Studio. Details of the tool and its use can be found at IIS Logs and Log Parser Studio Reports.
Some of the Exchange scripts referenced above will help you work through the Exchange logs.
Planning is needed to make sure that any required logs are enabled and that the logging itself does not have a noticeable impact on resource usage.
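As a rough illustration of the kind of summary that can be drawn from the IIS log, the sketch below counts requests and averages the time-taken value for each top-level virtual directory. It is written in Python rather than in Log Parser Studio. It assumes the log is in W3C extended format with a #Fields: header and that the cs-uri-stem and time-taken fields are enabled. The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def summarise_iis_log(lines):
    """Return {top-level path: (request count, mean time-taken in ms)}.

    Assumes W3C extended format: a '#Fields:' line names the columns and
    each data row is space separated.
    """
    fields = []
    totals = defaultdict(lambda: [0, 0])  # path -> [requests, total ms]
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # other directives, blank lines, or rows before a header
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed rows
        row = dict(zip(fields, values))
        taken = row.get("time-taken", "")
        if not taken.isdigit():
            continue  # field missing or logged as '-'
        # Group by the first path segment, e.g. /owa or /ews (case-insensitive).
        parts = row.get("cs-uri-stem", "/").lower().split("/")
        top = "/" + parts[1] if len(parts) > 1 else "/"
        totals[top][0] += 1
        totals[top][1] += int(taken)
    return {path: (n, ms / n) for path, (n, ms) in totals.items()}


# Hypothetical sample rows, for illustration only.
sample = [
    "#Fields: date time cs-method cs-uri-stem sc-status time-taken",
    "2016-04-12 09:00:01 POST /EWS/Exchange.asmx 200 120",
    "2016-04-12 09:00:02 POST /ews/exchange.asmx 200 80",
    "2016-04-12 09:00:03 GET /owa/ 200 40",
]
for path, (count, mean_ms) in sorted(summarise_iis_log(sample).items()):
    print(path, count, round(mean_ms, 1))
```

Grouping by the first path segment gives a quick view of the mix between client access types before you go on to more detailed queries.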
Perfmon counters
Windows performance monitor counters provide a valuable source of information about both system resource consumption and processed workload. Exchange has a rich set of specialised counters (over 3000 in Exchange 2013) that can be used to help analyse performance and workload.
As standard, Exchange 2013 has two performance monitor jobs running:
- ExchangeDiagnosticsDailyPerformanceLog
- ExchangeDiagnosticsPerformanceLog.
These produce files that are used by Exchange’s Managed Availability framework (which runs as the Microsoft Exchange Health Management service). Although you can use information captured in these logs, you are advised to set up your own, separate collection.
Microsoft have published recommended counters to monitor. This link shows what to do for Exchange 2013: Exchange 2013 Performance Counters. (Other versions have similar pages to determine what to collect.)
Given the number of counters that need to be analysed, you will probably need some form of tool to assist you. The next section describes some tools.
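Before reaching for a dedicated tool, a first pass over a counter log can be as simple as a minimum, average and maximum per counter. The sketch below is one way to do that in Python. It assumes the perfmon log has been exported to CSV (for example with relog), with the timestamp in the first column and one column per counter. The function name, the server name EX01 and the sample values are made up for illustration.

```python
import csv
import io
from statistics import mean


def summarise_counters(csv_text):
    """Return {counter path: {'min', 'avg', 'max'}} for a perfmon CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counters = header[1:]  # the first column holds the sample timestamp
    samples = {name: [] for name in counters}
    for row in reader:
        for name, value in zip(counters, row[1:]):
            try:
                samples[name].append(float(value))
            except ValueError:
                pass  # blank cell where a sample was missed
    return {
        name: {"min": min(vals), "avg": mean(vals), "max": max(vals)}
        for name, vals in samples.items()
        if vals
    }


# Hypothetical export with a single counter, for illustration only.
sample = r'''"(PDH-CSV 4.0) (UTC)(0)","\\EX01\Processor(_Total)\% Processor Time"
"04/12/2016 09:00:00.000","10"
"04/12/2016 09:00:15.000"," "
"04/12/2016 09:00:30.000","30"
'''
for counter, stats in summarise_counters(sample).items():
    print(counter, stats)
```

This only scales so far. With thousands of counters you still need thresholds to tell you which of the numbers matter.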
Performance Analysis of Logs (PAL) threshold tool
When looking at perfmon counters, one useful tool is the Performance Analysis of Logs (PAL) Tool. This tool takes perfmon logs and analyses them against provided threshold files supplied by industry experts (especially for Microsoft products but increasingly for others). It produces a comprehensive report highlighting possible exceeded thresholds. You can analyse against multiple threshold files. If you’re really brave, you can produce your own threshold recommendations! It handles not only simple thresholds (if A is greater than B), but also multiple conditions (if A is greater than B and C is less than D) and trends (if rate of change in A is greater than B) so is very powerful.
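To make the three kinds of threshold concrete, here is a minimal Python sketch of each. It is not how PAL implements them and it does not read PAL threshold files. The function names, limits and sample values are all hypothetical.

```python
def simple_threshold(samples, limit):
    """Simple: fires if any sample of A is greater than the limit B."""
    return any(value > limit for value in samples)


def compound_threshold(a_samples, a_limit, c_samples, c_limit):
    """Multiple conditions: fires if, at the same sample point,
    A is greater than B and C is less than D."""
    return any(a > a_limit and c < c_limit
               for a, c in zip(a_samples, c_samples))


def trend_threshold(samples, interval_seconds, max_rate_per_hour):
    """Trend: fires if the overall rate of change of A, measured from the
    first sample to the last, is greater than the allowed rate per hour."""
    if len(samples) < 2:
        return False
    elapsed_hours = (len(samples) - 1) * interval_seconds / 3600
    rate = (samples[-1] - samples[0]) / elapsed_hours
    return rate > max_rate_per_hour


# Hypothetical values: CPU %, available memory in MB, and a growing queue.
cpu = [35, 60, 95]
free_mb = [4000, 900, 300]
queue = [100, 200, 400]  # sampled every 30 minutes

print(simple_threshold(cpu, 90))
print(compound_threshold(cpu, 90, free_mb, 500))
print(trend_threshold(queue, 1800, 250))
```

The trend check here uses only the first and last samples. A real tool is likely to use something more robust, but the principle of alerting on the rate of change rather than the absolute value is the same.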
This tool is written in PowerShell and has limitations, particularly with memory consumption for large perfmon files. It also isn’t the speediest to execute. For analysing larger Exchange installations, we at Wildstrait have written our own tool to handle this situation.
Tuning Exchange performance
OK so now you’ve done the monitoring, what can you do about any performance problems that you may have detected?
- First of all, if the major system resources (CPU, memory, disk latency) show symptoms of problems, these need to be addressed. There needs to be a systematic analysis of any detected problems as root causes can be many and not necessarily directly related to Exchange.
- Review the Application log and System log on the computers running Exchange and Active Directory for any events related to the smooth running of Exchange and fix any issues.
Microsoft have published a number of recommendations for areas to look at. I’m summarising these here but beware some of these have been disputed by other parties! It’s definitely worth looking at the result of monitoring before diving in and changing any set-up.
- Examine use of hyperthreading. Microsoft recommend turning off hyperthreading for Exchange. So it’s worth considering and investigating whether this can help.
- Tune .NET. Microsoft recommend making sure various hotfixes are installed to prevent excessive garbage collection. Also ensure that the recommended version of .NET (4.5.1 was indicated previously) is installed. Check current documentation and your software levels for further details.
- Look at network optimisation. For example, Microsoft do not recommend using multiple NICs for Exchange. Offload features, especially RSS, are advised.
- Tune storage caching. Set DAS storage caching to 100% write cache for multi-role or Mailbox Exchange servers. Microsoft suggest that other roles (CAS, AD) may need different settings and may benefit from tuning, with 25% write / 75% read as a starting point.
- Use SSL offload if feasible and where suitable devices are available. This will reduce CPU consumption.
- Set the pagefile size to recommended size and check key memory performance counters.
- Adopt virtualisation best practices. Microsoft have made recommendations about what they believe is best for Exchange but I’ve seen there is some dispute between Microsoft and virtualisation providers about this.
- Active Directory performance. AD bottlenecks can be a frequent cause of Exchange performance issues. Look for high query latencies (\MSExchange ADAccess Domain Controllers(*)\LDAP Search Time) within Exchange and high CPU (\Processor(_Total)\% Processor Time) on the AD environment. Use the ‘Active Directory Diagnostics’ data collector set to examine in detail. Look at memory on the AD servers to see whether caching can be optimised.
- Log on tuning. If you’re using NTLM authentication, make sure the known issues are not happening. Look at some key counters related to Netlogon.
There are some areas that may appear to be beneficial to look into but that Microsoft believe are not useful. So don’t bother about:
- Tuning IIS as Microsoft believe this makes little difference for Exchange.
- Using storage tiering as Exchange ‘hot blocks’ of storage will vary more than this mechanism can support.
Balancing Exchange
Microsoft strongly recommend using features in Exchange to balance load across different Exchange components. Specific areas of balancing include:
- CAS layer
- Use load balancing
- Use suitable traffic distribution policy
- Spread inbound requests across all CAS servers
- DAG
- Equal distribution of active copies
- Redistribute active copies during localised high load events
- Within databases
- Aim for equal utilisation both in space and activity
- Spread out heavy and light users
- Ongoing re-balancing
- Within Mailboxes
- Aim for more folders with fewer messages in each
- Mailbox shaping
The aim is to employ all your available system resources in the most effective way.
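One of the checks above, equal distribution of active copies across a DAG, is easy to quantify once you have a list of which server currently hosts the active copy of each database. The Python sketch below assumes you have already exported that mapping from your environment. The function names, database names and server names are hypothetical.

```python
def active_copy_counts(active_server_by_db, servers):
    """Count active database copies per server, including servers with none."""
    counts = {server: 0 for server in servers}
    for server in active_server_by_db.values():
        counts[server] = counts.get(server, 0) + 1
    return counts


def imbalance(counts):
    """Difference between the most and least loaded servers (0 = even)."""
    return max(counts.values()) - min(counts.values())


# Hypothetical DAG with three members and four databases.
active = {"DB01": "EX01", "DB02": "EX01", "DB03": "EX01", "DB04": "EX02"}
counts = active_copy_counts(active, ["EX01", "EX02", "EX03"])
print(counts, "imbalance:", imbalance(counts))
```

A count of copies is only a first approximation. Databases differ in size and activity, so the same calculation is worth repeating with a workload measure per database in place of a simple count.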
Exchange workload management
Next it’s worth checking that the workload being supported by the system is what you expect it to be! There may be unexpected peaks, or the mix between traffic types may vary from your design. There are some key counters that help show what the actual workload is. Look at RPC Operations/sec and the message traffic counters to indicate the overall workload transaction rate. Look at the RPC Averaged Latency counter as an indicator of potential issues.
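A simple way to look for unexpected peaks is to average a workload counter by hour of day and compare the busiest hour with the overall average. The Python sketch below assumes you already have samples of RPC Operations/sec paired with the hour in which each was taken. The function names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(samples):
    """samples: iterable of (hour of day, operations per second).
    Returns {hour: mean rate for that hour}."""
    by_hour = defaultdict(list)
    for hour, rate in samples:
        by_hour[hour].append(rate)
    return {hour: mean(rates) for hour, rates in sorted(by_hour.items())}


def peak_to_average(profile):
    """Return (busiest hour, ratio of its rate to the mean across hours)."""
    busiest = max(profile, key=profile.get)
    return busiest, profile[busiest] / mean(profile.values())


# Hypothetical samples of RPC Operations/sec.
samples = [(9, 100), (9, 300), (10, 100), (11, 100)]
profile = hourly_profile(samples)
hour, ratio = peak_to_average(profile)
print(profile, "busiest hour:", hour, "ratio:", round(ratio, 2))
```

Comparing this profile with the one assumed in your design or used in testing shows quickly whether the production workload is what you planned for.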
Automated workload management
Exchange 2013 introduced built-in features to self-manage workloads based on the health of system resources. It’s worth understanding what’s included as it’s possible the supplied default configurations may need to be changed for your environment. This article, Exchange-workload-management, provides a good overview.
In particular, two mechanisms are used.
User message throttling
Microsoft describes this as:
Message throttling refers to a group of limits that are set on the number of messages and connections that can be processed by a Microsoft Exchange Server 2013 computer. These limits prevent the accidental or intentional exhaustion of system resources on the Exchange server.
Details can be found at Message throttling.
You can use this to alleviate performance issues in at least a couple of cases:
- Where you have applications that use Exchange to send heavy loads of messages.
- To ensure high priority, high workload users have a different policy applied to them.
Back pressure
Microsoft describes this as:
Back pressure is a system resource monitoring feature of the Microsoft Exchange Transport service that exists on Microsoft Exchange 2013 Mailbox servers and Edge Transport servers.
Exchange can detect when vital resources, such as available hard drive space and memory, are under pressure, and take action in an attempt to prevent service unavailability. Back pressure prevents the system resources from being completely overwhelmed, and the Exchange server tries to process the existing messages before accepting any new messages. When utilization of the system resource returns to a normal level, the Exchange server gradually resumes normal operation and starts accepting new messages again.
Details can be found at Back pressure. You will know when back pressure has been triggered because it writes to the Event log.
This is something that ideally you need to avoid. The settings for this are configurable. Although the defaults are by and large sensible, they should be reviewed against your system configuration.
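The behaviour described above, where new messages are refused under pressure and accepted again only once utilisation has returned to a normal level, can be pictured as a simple state machine with two thresholds. The Python sketch below is a deliberately simplified illustration and is not Exchange’s actual algorithm. The function name and the threshold values are hypothetical and are not Exchange defaults.

```python
def accepting_messages(utilisation_series, high=0.90, normal=0.80):
    """For each utilisation sample (0.0 to 1.0) return True if new messages
    would be accepted, False if they would be refused.

    Pressure starts when utilisation reaches `high` and ends only when it
    falls back to `normal`, so the state does not flap around one value.
    """
    under_pressure = False
    decisions = []
    for utilisation in utilisation_series:
        if not under_pressure and utilisation >= high:
            under_pressure = True
        elif under_pressure and utilisation <= normal:
            under_pressure = False
        decisions.append(not under_pressure)
    return decisions


# Hypothetical disk utilisation samples.
print(accepting_messages([0.50, 0.92, 0.85, 0.79, 0.85]))
```

The gap between the two thresholds is why a resource that hovers just below its limit can keep a server in back pressure for longer than you might expect, and why the configured values are worth reviewing against your own disk and memory sizes.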
Bringing it all together
In this blog post series we have seen that there are performance risks with Microsoft Exchange that can impact the business. These risks can be exacerbated by the changing usage of Exchange (e.g. increased browser and mobile access) and by steps like upgrading versions, re-platforming and adding plug-ins.
One way to mitigate the performance risks in Microsoft Exchange implementations is through performance testing, using tools such as Jetstress and Load Generator. As covered in this current blog post, another way to mitigate the performance risks with Microsoft Exchange implementations is production monitoring, which may also be done in conjunction with performance testing.
I recommend following the approaches set out in these posts if you need to mitigate the performance risks with Microsoft Exchange. Knowing what may be required, and including it in your plans well in advance of making changes, should result in a Microsoft Exchange environment that performs to the needs of your organisation.
Please contact us if you need further help or advice on mitigating the performance risks with Microsoft Exchange in your organisation.