Continuous Rollback
Continuous delivery is the next step after continuous integration. When you practice continuous delivery, deploying the application should be easy and automated, and rolling back should be just as easy and just as automated. However, automated deployments and rollbacks can cause unexpected problems if you are not careful. One such problem brought our production system down. Here is what happened.
At Wix.com we built an automated deployment system based on Artifactory and Chef. Our system works like this: every few minutes, a Chef script checks whether the latest version in Artifactory is the one deployed on production. If the artifact version differs from the one deployed on production, Chef fetches the WAR from Artifactory and deploys it to all the appropriate machines.
Then we decided that Artifactory should have a replica in case one instance goes down, so we installed an old backup of Artifactory at a secondary location and created a script to replicate the master to the slave. You can probably guess what happened.
We had a bug in the script that made the master the slave and the slave the master. The backed-up Artifactory was from last month, so the replication went the wrong way and both repositories were rolled back to a month-old state.
Since Chef monitors the repository, it found that the artifact versions differed from the ones in production and deployed ALL the artifacts, sending our entire production system back in time (yay, we invented a time machine).
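The root of the problem is that the check only asks whether the versions are *different*, not whether the repository version is *newer*. The sketch below is a hypothetical illustration in Python, not our actual Chef code. It assumes dotted integer version strings such as `1.0.42`, and the function names (`needs_deploy`, `needs_deploy_forward_only`) are made up for this example. It shows why a month-old artifact still triggers a deployment, and how a hypothetical forward-only comparison would have skipped it.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '1.0.42' into a comparable tuple (assumed format)."""
    return tuple(int(part) for part in version.split("."))


def needs_deploy(repo_version: str, prod_version: str) -> bool:
    """The check described above: deploy whenever the versions differ, in either direction."""
    return repo_version != prod_version


def needs_deploy_forward_only(repo_version: str, prod_version: str) -> bool:
    """Hypothetical safeguard: deploy only when the repository version is strictly newer."""
    return parse_version(repo_version) > parse_version(prod_version)


# Production runs 1.0.42, but the replicated repository was rolled back to 1.0.17.
print(needs_deploy("1.0.17", "1.0.42"))               # True: the old artifact gets deployed
print(needs_deploy_forward_only("1.0.17", "1.0.42"))  # False: the downgrade is skipped
```

One trade-off: a forward-only rule would also block intentional rollbacks, so those would need an explicit override rather than happening automatically.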
You can imagine how much fun it was to bring the whole production system back to the future.
Of course, we are now putting safeguards in place so it won’t happen again.