Releasing With Confidence
This is a short story of how we handle deployments of big changes at Transifex, and of how we try to reduce the chances of affecting our customers.
Before anything goes into production, unit tests are in place, the code has been reviewed, and engineers have manually tested their work. Then the QA team tests our changes and brings issues back to the implementation team. If something is found, we follow the same flow once again.
However, sometimes things go wrong anyway. This is why we always try to be as prepared as we can for any kind of mishap.
Confidence-Based Deployments
Before proceeding with the deployment, we gather as a team and do a confidence check: “From 1 to 5, how confident are we to proceed?” If the confidence is high, we proceed. If someone is not feeling confident, the next thing to do is ask, “What do we need to do to get to 5?”
This creates the space for everyone to raise any last-minute concerns they might have and pushes us to create action items in case there is something more to do.
If confidence is high, we try to answer one more question: “What is the worst that could happen once we go live, and what can we do to get out of that difficult situation?” Asking this question helps us think through worst-case scenarios and decide what our contingency plan will be, which helps us move to the next step.
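The confidence check above can be sketched in a few lines of Python. This is only an illustration, not a tool we describe in this article: the `ConfidenceCheck` class, its method names, and the threshold of 4 are all assumptions made for the example (the article only says confidence should be "high").

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConfidenceCheck:
    """Collects 1-to-5 confidence votes from the team before a deployment."""

    # Assumed cut-off for "high" confidence; pick whatever suits your team.
    threshold: int = 4
    votes: Dict[str, int] = field(default_factory=dict)
    concerns: Dict[str, str] = field(default_factory=dict)

    def vote(self, person: str, score: int, concern: str = "") -> None:
        """Record (or update) one person's score and optional concern."""
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.votes[person] = score
        if concern:
            self.concerns[person] = concern
        else:
            # A new vote without a concern clears any earlier one.
            self.concerns.pop(person, None)

    def low_confidence(self) -> List[str]:
        """People to ask: 'What do we need to do to get to 5?'"""
        return sorted(p for p, s in self.votes.items() if s < self.threshold)

    def can_proceed(self) -> bool:
        """Proceed only if someone voted and nobody is below the threshold."""
        return bool(self.votes) and not self.low_confidence()
```

A team could record a vote per person, turn each open concern into an action item, and vote again once those items are done.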
Making a List and Checking It Twice
We usually do big initiative deployments as a team. We have automation in every stage of deployment, so the release could be done by one person, but being together while these steps turn green is a great moment for us. That specific group, at that specific point in time, is on a journey to release to the world the work we’ve been doing for some time.
Then we – engineers, QA testers and usually a member from the DevOps team – can also start testing things out on production once more when we go live.
One more thing that has proven helpful is to document the steps needed to get to production in a Release Plan, one or two days before the deployment. Especially in cases with many moving parts, this has been crucial for not forgetting things. If your team has worked with multiple services, you know the pain of missing one piece of the puzzle and then trying to find out what went wrong.
This list could be as detailed as needed. A simple example would be:
- [ ] Inform everyone that we are releasing a big change that might need to be reverted in case of an emergency. Engineers and Customer Support are usually on the front line of defence.
- [ ] Merge changes to the devel branch.
- [ ] Run database migrations.
- [ ] Release X service and bump X version.
- [ ] Release Y service.
…
- [ ] Inform everyone that the change is deployed.
…
- Emergency plan.
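A checklist like this can also be written down as an ordered list of steps that stops at the first failure and falls back to the emergency plan. The sketch below is a minimal illustration under that assumption; `run_release_plan` and the step callables are hypothetical names, not part of our actual tooling.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus a callable that performs it.
Step = Tuple[str, Callable[[], None]]


def run_release_plan(
    steps: List[Step], emergency_plan: Callable[[], None]
) -> List[str]:
    """Run steps in order; on the first failure, run the emergency plan and stop.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            print(f"[ ] {name} failed: {exc}")
            emergency_plan()
            break
        completed.append(name)
        print(f"[x] {name}")
    return completed
```

Each callable would wrap whatever your own pipeline does for that step (announce, merge, migrate, release), while the emergency plan would trigger the revert agreed on beforehand.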
After Release Monitoring
Even if the deployment goes through without any hiccups, we continue to monitor things for some time: Sentry errors, performance, and Slack messages, looking for any irregularity that needs our attention. If everything looks fine for a period of time, which depends on the change, then we can gradually step out of the “danger zone”.
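One way to make "everything looks fine for a period of time" concrete is to require a run of quiet monitoring intervals before declaring the danger zone over. The sketch below assumes you already collect an error count per interval from your monitoring tool; the function name, the baseline, and the number of quiet intervals are illustrative assumptions, and no real Sentry or Slack API is used.

```python
from typing import Sequence


def out_of_danger_zone(
    error_counts: Sequence[int], baseline: int, quiet_intervals: int
) -> bool:
    """Return True once the most recent `quiet_intervals` samples are all
    at or below `baseline` errors.

    `error_counts` holds one count per monitoring interval since the
    release, oldest first.
    """
    if quiet_intervals <= 0:
        raise ValueError("quiet_intervals must be positive")
    if len(error_counts) < quiet_intervals:
        # Not enough data yet; keep watching.
        return False
    recent = error_counts[-quiet_intervals:]
    return all(count <= baseline for count in recent)
```

How long the quiet period should be depends on the change, so `quiet_intervals` is best chosen per release rather than fixed once.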
Each Team Has Their Own Flow
Although this flow has been fairly successful so far, it is by no means mandatory or needed in all cases. At Transifex, we prefer to have the flexibility to try things out, experiment, share findings between teams and then try again.
No Plan Is Bulletproof
Automation is there to boost the team’s confidence and to reduce the chance of a change ruining a customer’s day. But even the best plan can go wrong because something was missed.
Having these small tools in the team’s quiver helps with staying focused on the moment, with morale and with accountability, and serves as a reminder that everyone’s opinion or gut feeling matters. A ten-minute detour like this can also save us hours of preventable firefighting.
Having an emergency plan always helps, right?
The Recipe For a Happier Deployment
To sum up, our suggestions for your next deployment of big and scary changes would be:
- Take a confidence pulse from your team.
- Think of worst-case scenarios and emergency plans.
- Make a list of all the things that you need to do to deploy your changes.
- Always keep in mind that a deployment is not the end goal. The team should stay on top of everything until you are out of the “danger zone”.