Previous chapter: Cultural Change
It is about time we talk about how the actual continuous delivery process works: the application lifecycle, and how the code reaches production once development is done.
Phase 1: Git – Developers push the completed code (and the tests) to a Git repository.
Phase 2: Build – TeamCity (our CI build server) is triggered. It checks out the code from Git and runs the build, unit tests and integration tests. Once the build passes, a SNAPSHOT artifact is created and pushed to Artifactory.
So far the process has been automatic. At this point we decided to switch to a manual process for the actual deployments. However, this process should be as easy as pressing a button (which it actually is). We are now also automating deployment to staging, which will make staging a continuous deployment environment and production a continuous delivery environment.
Phase 3: Deploy to staging (optional) – Once a snapshot artifact is in Artifactory, it is ready for deployment to the staging servers. To manage and trigger deployments we built our own command and control dashboard called “Wix Lifecycle”. It connects to all our backend systems: TeamCity, Git, Noah, Chef, New Relic and our custom-built reverse proxy/load balancers (called “Dispatcher”).
Phase 4: Release candidate – To make a release candidate, “Lifecycle” triggers TeamCity to do a special build using our custom-built Maven release plugin. The Maven plugin copies the artifact from the snapshot repository to a separate repository while advancing the version number in the POM.
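To make the version handling concrete, here is a minimal sketch of how a release version and the next development version could be derived from a snapshot version. It assumes the common Maven convention of a `-SNAPSHOT` suffix and a numeric last segment; the function name `release_versions` is hypothetical, and our plugin's actual numbering scheme may differ.

```python
def release_versions(snapshot_version):
    """Derive (release, next_snapshot) from a version like '1.0.41-SNAPSHOT'.

    Assumption: the last dot-separated segment is a plain integer.
    """
    suffix = "-SNAPSHOT"
    if not snapshot_version.endswith(suffix):
        raise ValueError("not a snapshot version: " + snapshot_version)
    # The release candidate keeps the number and drops the suffix.
    release = snapshot_version[: -len(suffix)]
    # The POM then advances to the next development version.
    parts = release.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    next_snapshot = ".".join(parts) + suffix
    return release, next_snapshot


release, next_snapshot = release_versions("1.0.41-SNAPSHOT")
```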
Phase 5: Deploying to production (GA) – Using “Lifecycle” we trigger a production deployment. Lifecycle updates Noah with the artifact version we want to deploy. Chef agents are installed on all the production machines, and every 15 minutes they compare the artifact version installed on the machine against the version configured in Noah (by Lifecycle). If the versions do not match, Chef installs the configured version.
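The compare-and-install step can be sketched roughly as follows. This is an illustration of the idea only, not our Chef recipe: `Machine`, `converge` and `fake_install` are hypothetical names, and the configured version is passed in directly rather than read from Noah.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    installed_version: str


def converge(machine, configured_version, install):
    """One periodic agent run: install only when the versions differ.

    Returns True if an installation was performed.
    """
    if machine.installed_version == configured_version:
        return False  # already on the configured version, nothing to do
    install(machine, configured_version)
    machine.installed_version = configured_version
    return True


# Hypothetical stand-in for the real installation step.
install_log = []


def fake_install(machine, version):
    install_log.append((machine.name, version))


web1 = Machine("web1", "1.0.41")
changed_first = converge(web1, "1.0.42", fake_install)   # versions differ: installs
changed_second = converge(web1, "1.0.42", fake_install)  # versions match: no-op
```

Because the agent only acts on a mismatch, running it every 15 minutes is harmless when nothing has changed.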
Phase 6: Installation – Since our services are always clustered, the installation does not occur on the whole cluster at once. Chef selects one server out of the cluster and installs the new artifact. Then Chef calls a special controller called “SelfTest”, which exists on all the services built with our framework. This controller triggers a self test (or startup test) on the service. Only after the self test passes does Chef continue to install the next server. If the self test fails, the deployment is stopped until a human intervenes; that person can either force the deployment to the next server or roll back the current deployment. Services do not get traffic until their “is_alive” controller returns “true”, and that will not happen unless the self test has passed.
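The one-server-at-a-time logic can be sketched like this. It is a simplified model under stated assumptions: `rolling_deploy` is a hypothetical name, and the `install` and `self_test` callables stand in for the Chef installation and the call to the “SelfTest” controller.

```python
def rolling_deploy(servers, version, install, self_test):
    """Install `version` on one server at a time.

    Stops at the first server whose self test fails and returns
    (deployed, failed), where `failed` is None on full success.
    """
    deployed = []
    for server in servers:
        install(server, version)
        if not self_test(server):
            # Halt here: a human decides whether to force or roll back.
            return deployed, server
        deployed.append(server)
    return deployed, None


# Hypothetical stand-ins for the real install and self-test calls.
installed = {}


def fake_install(server, version):
    installed[server] = version


def fake_self_test(server):
    return server != "app2"  # simulate a failing self test on app2


deployed, failed = rolling_deploy(
    ["app1", "app2", "app3"], "1.0.42", fake_install, fake_self_test
)
```

In this run the deployment stops at `app2`, so `app3` is never touched and keeps serving the previous version.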
Since we stop the service during deployment, the load balancer identifies that the instance is down and does not send traffic its way. Before the instance can go back into the pool of services, the Dispatcher checks the service’s “is_alive” controller, which returns “false” until the self test has passed successfully.
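A toy model of this gating, assuming the load balancer simply filters its pool by each instance's “is_alive” answer. The class and function names are hypothetical and are not the Dispatcher's real interface.

```python
class ServiceInstance:
    def __init__(self, name):
        self.name = name
        self.self_test_passed = False  # a freshly (re)started instance

    def is_alive(self):
        # Reports "true" only once the self test has passed.
        return self.self_test_passed


def routable(instances):
    """Names of the instances the load balancer may send traffic to."""
    return [i.name for i in instances if i.is_alive()]


app1 = ServiceInstance("app1")
app2 = ServiceInstance("app2")
app1.self_test_passed = True           # app1 is healthy and serving

during_deploy = routable([app1, app2])  # app2 was just restarted

app2.self_test_passed = True            # app2's self test completes
after_deploy = routable([app1, app2])
```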
Phase 7: Monitoring – After the deployment, the person who pushed the deploy button is responsible for watching the monitoring systems for any unusual behavior. If we identify a problem with the deployment, a click of a button triggers a rollback (which is simply deploying the previous version).
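Treating a rollback as just another deployment keeps the mechanism simple. A minimal sketch, assuming the configured version is tracked as a history list; `DeploymentConfig` is a hypothetical class, not the way Lifecycle or Noah actually store versions.

```python
class DeploymentConfig:
    def __init__(self, initial_version):
        self.history = [initial_version]

    @property
    def current(self):
        return self.history[-1]

    def deploy(self, version):
        self.history.append(version)

    def rollback(self):
        """Deploy the version that was configured before the current one."""
        if len(self.history) < 2:
            raise RuntimeError("no previous version to roll back to")
        previous = self.history[-2]
        self.deploy(previous)  # a rollback is an ordinary deployment
        return previous


config = DeploymentConfig("1.0.41")
config.deploy("1.0.42")
rolled_back_to = config.rollback()
```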
Many deployments do not go to all servers at once. Instead, we can select a “Test Bed”: a single server that receives the new version and runs for a while so we can compare it to the other servers, which still run the previous version. If everything is OK, we continue with the deployment to all the servers. Having a “test bed” server allows us to limit problems to only one server out of the cluster.
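One way to picture the comparison is a simple check of the test bed's error rate against the rest of the cluster. This is purely illustrative: the metric, the `tolerance` threshold and the function name are assumptions, and in practice the comparison is done by a person looking at the monitoring systems.

```python
from statistics import mean


def is_test_bed_healthy(test_bed_error_rate, other_error_rates, tolerance=0.01):
    """True if the test bed is no worse than the cluster average plus a tolerance."""
    baseline = mean(other_error_rates)
    return test_bed_error_rate <= baseline + tolerance


others = [0.010, 0.011, 0.009]  # servers still on the previous version
looks_fine = is_test_bed_healthy(0.012, others)  # close to the baseline
looks_bad = is_test_bed_healthy(0.080, others)   # clearly worse
```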