Update about delivery issues in Zoho Mail

Late last night, we saw sporadic reports in the forums of problems with email delivery.

My apologies for the inconvenience we have caused you all.

I want to take this opportunity to give you all some brief background on the source of this problem and how we are addressing it.

Over the past year, we have grown roughly as follows:
1. The number of users has grown by 35%
2. The number of emails processed per day has grown to 2.5x last year's volume

Our framework team anticipated this growth and has been working for more than 2 years on moving our infrastructure to a more scalable system. After a lot of research, and after proving the concept with our massive log data, we have been moving the actual data to the new system in phases and have migrated 65% of it so far.

While doing so, over the last 6 months, we have also made many enhancements to split the data into several clusters, based on the nature of the functionality as well as user types, so that any issue with one grid does not impact the other grids.

The load of this migration to the new system, added to the already increased volume of requests, compounded the stress on the system and triggered a rare bug in one of the open source components we use.

The priority was to fix the issue:
- without causing complete downtime to the Mail system 
- without corrupting any user data 
- without affecting mail reception (every single mail was stored in our queues and delivered after the issue was resolved)  

As a result, it took us a while to fix and restore the grid and to start processing mail from our queues. Going forward, we plan to split the data further into even smaller clusters so that, even if there is a problem, only a minimal percentage of users will be impacted. We will also have the option to quickly reroute storage to a different grid, so that the duration of any impact is minimal as well.

In addition, we are improving our monitoring system to forecast such issues well in advance, so that we can take proactive measures quickly.

We are focused on resolving these problems and making the system better, and we are doing our best toward that.

Again, my apologies for the inconvenience we have caused you all.

Radha
Product Manager - Zoho Mail