Crushing Bugs & Looking to the Future
Summary: We’ve made fixes to the app that address some issues that users were reporting, including updates to the new settings dashboard and folders. We also enhanced HipChat integration, and took big steps to avoid another outage like we had in February.
Hiccups with Settings
Some customers had reported issues with our new settings dashboard. The dashboard itself appeared to be working, but the settings weren't integrated into the rest of the app correctly. While we missed this in staging, our customers started seeing the effects immediately.
- The "My New & Open" folder would show tickets for the last user to edit it in settings
- Rules would break because they were looking for numeric IDs instead of email addresses
- Templates for automatic responses were saved with the wrong variable names
Fortunately we were able to squash the bulk of the bugs in a few days, and everything is back up and running smoothly.
HipChat Integration
After only two weeks, the new HipChat app is already #1 in our App Store. When we removed email body content from HipChat, users demanded it back. We’re happy to say that you can now select whether or not to include ticket bodies in your HipChat notifications.
Those bodies are now easier to read too, after a few new formatting changes for HTML emails.
Folder Count Issue
Internally, we're most excited about this fix, as it has been our most-reported bug from customers.
The elusive "Folder Count Issue" has dogged us for months, but we finally found the culprit this week. It turned out to be caused by four actions, which had to be taken by two separate users in an exact order.
It took eight hours to debug, but the issue is now fixed.
Upcoming Infrastructure Changes
With February’s outage burning in our minds, we spent a lot of time thinking about recovery and about making sure it won’t happen again. It's not something we ever want to put our customers (or ourselves) through again, so we now have a backup plan for all our key points of failure. While next week's post will go into more detail, here are some of the biggest pieces:
- Dedicated resources for outgoing email
- Backup database server ready to take over at a moment's notice
- Backup chat server so your customers are never left hanging
- Splitting resources between Amazon availability zones in case of system-wide outages
We're moving up to nine EC2 instances in total (from five), and have completely automated provisioning of our cluster from scratch. Next week we'll share the final plan, and what it took to switch over.
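For the curious, here is a rough sketch of the idea behind splitting instances between availability zones. It is an illustration only, not our actual provisioning code: the instance names, the zone names, and the choice of three zones are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import cycle


def spread_across_zones(instances, zones):
    """Assign instances to zones round-robin so each zone gets a near-equal share."""
    placement = defaultdict(list)
    for instance, zone in zip(instances, cycle(zones)):
        placement[zone].append(instance)
    return dict(placement)


# Hypothetical names: nine instances, three zones (the zone count is an assumption).
instances = [f"app-{i}" for i in range(1, 10)]
zones = ["zone-a", "zone-b", "zone-c"]

placement = spread_across_zones(instances, zones)
for zone, members in placement.items():
    print(zone, members)
```

With an even split like this, losing any single zone would still leave most of the instances running, which is the point of the exercise.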