EU DC Partial Outage Resolved: A Detailed RCA

Incident Summary

One of the nodes serving Zoho Desk in the EU data center (DC) became overloaded and could not handle the heavy load. Requests slowed down, resulting in a partial outage for customers whose data resides on that node.

Our team identified the incident on May 2 at 8:26 AM CEST. Our engineers determined the root cause and began addressing it by adjusting our system configurations. In parallel, we started transferring high-traffic organizations to other nodes to reduce the load on the affected node. Our secondary adjustments mitigated the issue and enabled the portals to load and work by May 2 at 03:19 PM CEST, but the issue was not completely resolved.

After completing the scheduled movement of the top traffic-generating organizations, we deployed a bug fix that prevents the system from holding connections for extended periods. Stability was fully restored on May 3 at 12:40 PM CEST. The incident and its history were also recorded on the service availability page (status.zoho.eu).
 
Technical Breakdown

We identified that our system had slowed down, causing a significant number of users to have difficulty connecting to Desk. This directly impacted the availability and performance of our service.
 
Further analysis of our monitoring and log data indicated that the high traffic was legitimate and not the result of a DDoS attack. The incident was solely a load-surge issue, and the partial outage caused no data loss or other data impact. Our engineering team tuned our system configuration to handle the traffic, but this did not completely resolve the issue. We also moved the top traffic-generating organizations off the impacted node to reduce its traffic, which brought some improvement.

Finally, we identified a bug in our code that held connections open for extended periods, and we quickly deployed a build to production to fix it.
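
To illustrate the general class of bug described here, the following is a minimal Python sketch, not the actual Zoho Desk code. It assumes a hypothetical toy pool (`TinyPool`) and a hypothetical request handler (`handle_request`); both names are invented for this example. The idea it shows is to hold a connection only for the query itself and to release it before any slow, non-database work begins.

```python
import time
from contextlib import contextmanager


class TinyPool:
    """Hypothetical toy pool: it only counts checkouts, it opens no sockets."""

    def __init__(self, size):
        self.size = size
        self.in_use = 0
        self.longest_hold = 0.0  # seconds, longest single checkout seen

    @contextmanager
    def connection(self):
        if self.in_use >= self.size:
            raise RuntimeError("pool exhausted")
        self.in_use += 1
        started = time.monotonic()
        try:
            yield object()  # placeholder for a real connection handle
        finally:
            # Always return the connection, even if the caller raises.
            self.in_use -= 1
            held = time.monotonic() - started
            self.longest_hold = max(self.longest_hold, held)


def handle_request(pool, slow_work):
    # Hold the connection only while the query runs...
    with pool.connection():
        rows = ["row-1", "row-2"]  # stands in for a real query result
    # ...and do slow, non-database work after it has been released.
    in_use_during_work = pool.in_use
    slow_work(rows)
    return in_use_during_work
```

In this sketch, a handler that kept `slow_work` inside the `with` block would keep the connection checked out for the full duration of that work, which is how a small pool can become exhausted under a traffic surge.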
 
Timeline (in CEST)
 
May 2, 09:26 AM - Incident identified
May 2, 11:40 AM - Root cause identified: high number of connections
May 2, 12:15 PM - System configurations tuned
May 2, 01:51 PM - Started moving top-traffic generating orgs
May 2, 03:54 PM - Second-level system configuration tuning
May 3, 12:12 PM - Preparation of build to fix code bug began
May 3, 12:29 PM - Movement of top-traffic generating orgs completed
May 3, 02:40 PM - Bug fix build went live, stabilizing the system
 
Future Preventive Measures to Avoid Recurrence of the Issue
  • Relocating selected organizations to other nodes to keep the connection count on the affected node low.
  • Monitoring the system connections proactively and re-balancing them as necessary.
  • Setting a lower connection-count alert threshold so that we are notified early when it is breached and can act promptly to avoid customer impact.
  • Incorporating a code-check rule that catches code holding connections for extended periods before it is shipped to production.
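
As a rough illustration of the monitoring and re-balancing measures above, here is a minimal Python sketch. It is not our production tooling. The function names (`alert_level`, `orgs_to_move`), the threshold values, and the organization names are all hypothetical, and the per-organization connection counts are assumed to be available from monitoring.

```python
def alert_level(connections, warn_at, critical_at):
    """Classify a node's connection count against two thresholds.

    A lower `warn_at` gives earlier notice before `critical_at` is reached.
    """
    if connections >= critical_at:
        return "critical"
    if connections >= warn_at:
        return "warn"
    return "ok"


def orgs_to_move(connections_by_org, target_total):
    """Pick the highest-traffic orgs to relocate until the node's
    projected connection count is at or below `target_total`.

    Returns the chosen orgs and the projected remaining count.
    """
    remaining = sum(connections_by_org.values())
    chosen = []
    by_traffic = sorted(
        connections_by_org.items(), key=lambda item: item[1], reverse=True
    )
    for org, count in by_traffic:
        if remaining <= target_total:
            break
        chosen.append(org)
        remaining -= count
    return chosen, remaining


# Hypothetical figures for one node.
node = {"org-a": 500, "org-b": 300, "org-c": 50, "org-d": 20}
level = alert_level(sum(node.values()), warn_at=600, critical_at=900)
moves, projected = orgs_to_move(node, target_total=400)
```

With these made-up numbers, the node is at the warning level, and moving only the single busiest organization would bring the projected count under the target.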

Regards,
Zoho Desk Team
