Part two: Following the trail to performance problems in front of the webserver

by Ben Rowan on 23/02/2016 10:00

In part one of this series 'Finding the haystack: troubleshooting a hard to find software performance problem', we talked about a client with a post-release performance problem and discussed the first few steps of the troubleshooting phase. The analysis told us that the problem was not server side and narrowed the problem space to “everything outside the server.” So where does that leave us?

Complex environments, incomplete information

The client has large numbers of staff spread throughout the country. Most users connect via a “toaster” terminal on their desk and log on to a Citrix Hosted Desktop (CHD). A Citrix Delivered Desktop (CDD) is available for users with more complex needs, and a small number of power users have physical PCs. There are Citrix farms in two data centres, one co-located with the application, and one remote.

At this stage, the only source of information about how the system performed was service desk tickets raised by end users. This is a valuable source of information, but it is difficult to use as a basis for diagnosis of a problem: User reports are subjective and self-selecting to the extent that they border on “anecdotal.” However, analysis of the tickets showed one thing very clearly: users on CHD were most affected, followed by CDD users. Physical PCs were largely unaffected.
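As a rough illustration of that triage, a simple tally of tickets by desktop type is enough to show the skew. The ticket IDs, desktop labels, and counts below are invented for this sketch, not the client's real data:

```python
from collections import Counter

# Hypothetical ticket export: (ticket_id, desktop_type) pairs.
# These values are illustrative only.
tickets = [
    ("T-1001", "CHD"), ("T-1002", "CHD"), ("T-1003", "CDD"),
    ("T-1004", "CHD"), ("T-1005", "Physical"), ("T-1006", "CHD"),
]

# Count tickets per desktop type, most affected first.
by_type = Counter(desktop for _, desktop in tickets)
for desktop, count in by_type.most_common():
    print(f"{desktop}: {count}")
```

Even a crude count like this was enough to rank CHD ahead of CDD, with physical PCs barely featuring.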

These reports pointed us squarely at the performance of the Citrix environment, and what we wanted was an objective measure of user experience across all CHD and CDD instances. This would have been very valuable given the apparent transience and non-repeatability of the problem. Unfortunately there was no monitoring in place which could provide those figures, so it was up to us to fill the information gap.

Real users

We wanted to see for ourselves how things were performing for users who were logging service desk tickets, so we asked the service desk to put staff in touch with us. Sitting down with the end users, it was clear that the performance they were experiencing was unacceptably poor, and worse than what we’d expect as a result of the known changes in performance under this release. We observed that the performance problems came and went, lasting anything up to half an hour.

Armed with details of problematic sessions, we looked again at the webserver level response time figures, focussing on users we knew were experiencing problems. In every case, the server response times were consistent throughout the day: From the server’s perspective, the response time was the same when users reported bad performance as it was when performance was good. This supported the theory that the problem was outside the bounds of the webserver.
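The shape of that check can be sketched as follows. The per-hour figures are invented; in practice the response times came from the webserver's own logs, grouped around the windows where the user reported good or bad performance:

```python
from statistics import mean

# Hypothetical server-logged requests for one user: (hour_of_day, response_ms).
# Numbers are made up to illustrate the shape of the check.
requests = [
    (9, 180), (9, 210), (10, 195), (10, 205),    # user reported good performance
    (11, 190), (11, 200), (12, 185), (12, 215),  # user reported bad performance
]

# Group response times by hour.
per_hour = {}
for hour, ms in requests:
    per_hour.setdefault(hour, []).append(ms)

for hour in sorted(per_hour):
    print(f"{hour:02d}:00  mean={mean(per_hour[hour]):.0f} ms")

# A roughly flat profile across reported-good and reported-bad windows
# suggests the slowdown is happening outside the webserver.
```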

The next thing we wanted to confirm was that the network-level HTTP response times were close to the HTTP response times recorded by the webservers. If they weren’t, we could be confident we were dealing with a network issue. When the service desk next put us in touch with a user who reported poor performance, we used Wireshark to capture the network traffic between the client and the server. (We did this on CDD to keep the packet capture as noise-free as possible.) Comparing the Wireshark-measured response times to those recorded by the webserver showed a difference of a handful of milliseconds; far too small to suggest the problem was the network between the CDD instance and the webserver.
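Once both sets of timings are in hand, the comparison itself is simple. This sketch uses invented numbers and an assumed 50 ms threshold to show the shape of the check:

```python
# Hypothetical measurement pairs for the same requests:
# (server_logged_ms, client_captured_ms). Values and the 50 ms
# threshold are illustrative, not the client's real figures.
pairs = [
    (190, 193), (205, 209), (180, 184), (215, 218),
]

# The delta is the time unaccounted for between the webserver's view
# and the client's view -- i.e. the network path in between.
deltas = [client - server for server, client in pairs]
worst = max(deltas)
print(f"worst client/server delta: {worst} ms")

# A delta of a few milliseconds exonerates the network between the
# CDD and the webserver; tens or hundreds of ms would implicate it.
network_suspect = worst > 50
print("network suspect:", network_suspect)
```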

It must be Citrix…?

By this stage we were confident we could rule out problems with:

  • Webserver performance (encapsulating database and storage layer performance, application server resource consumption limits, etc.)
  • The path between the webservers and the CHDs (including the network, load balancers, and so on)

This narrows the problem space down to somewhere between receiving the response from the webserver and getting the rendered page on the glass, in front of the user. Still a lot of ground to cover. The obvious thing to look at is whether the Citrix servers are overloaded. But Citrix supports a wide variety of applications, and performance problems are only being reported in one of them. Wouldn’t an overloaded Citrix environment impact all applications?

Look out for the next article in this series 'Part three: Target sighted – is Citrix causing the performance problem?'

