Automated deployment is the holy grail of web development shops. Shops with only a small number of homogeneous servers have a much easier time than those of us with more complex environments, and there are many pitfalls to watch out for. In this talk, we discuss the good, the bad, and the ugly that we experienced in getting our continuous deployment process going.
Continuous integration via Jenkins had been running for months. We first used it to build from our SVN instance, and later from our Git master branch for dev, QA, and staging. But getting the code deployed to 10 data center servers and 6 Amazon servers, where different servers have different configurations, operating systems, and, yes, 32/64-bit issues, was quite a headache. Couple that with a 24×7 SaaS website serving 6 million daily visits, and we had a challenge. We considered open source and commercial products such as Puppet, Capistrano, and Maestro, but they really didn't meet our needs: they seemed either overkill or not flexible enough.
So we made the following changes to our build and deployment processes:
1. Moving from SVN cherry-picking releases to version-based promotion
In our previous workflow, a release master would cherry-pick the commits associated with tickets that had been successfully tested in the QA and staging environments, then make a big merge to the next environment, all the way to production.
We moved to a version-based workflow in which a new version is tagged every time new features are merged to our master branch (via GitHub pull requests).
Each time a version is approved in an environment (QA exploratory testing, product team acceptance, staging regression tests), it moves to the next one, ending up in production.
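The tagging step can be sketched with a throwaway local repo (the version number, commit message, and user identity here are illustrative, not our real values; in the real workflow the tag is also pushed to the shared remote):

```shell
# Sketch: tag the merge commit so this exact version can be promoted later.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev
echo "feature" > app.txt
git add app.txt
git commit -qm "Merge pull request #42: new feature"
# Annotated tag marks the version produced by this merge
git tag -a v1.2.3 -m "release candidate"
git describe --tags   # -> v1.2.3
```

In the real setup a `git push origin v1.2.3` would follow, so every environment can see the new version.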
2. Decoupling build process from deployments
Updating an environment used to mean running a Jenkins job that would check out a specific (environment-specific) branch from SVN, build it, and copy the output to a remote shared folder (using tools like robocopy or unison). To move towards continuous deployment, we needed a way to deploy faster.
With the new workflow established (step 1), we started moving to each environment versions that exactly match a version already deployed to the dev environment, so we can now build once, deploy anywhere.
The decoupled build process now consists of taking a tagged version, building it, and pushing the build output (a distributable) to a separate Git repository. Git does an excellent job of pushing only the deltas from the previous version, even for binary files.
Now versions are built only once (unit tests and static code analysis tools also run only once). Using Git as a store for our distributable versions allowed us to:
- Gain full traceability, not only of source code but of deployed files
- Map deployed files back to the original source with ease; e.g. this lets us know exactly the differences between the code deployed in each environment, plus other cool stuff like generating automatic changelogs from pull request info
- Reduce our build time from more than 20 minutes to under 2 minutes
Once a version is built and pushed to the Git build repo, it's ready to be deployed super fast (a git pull of file deltas) to any environment.
Finally, we added a second build process using the Release configuration that runs once before a version moves to the QA environments; it uses msbuild release optimizations and performs all client-side resource optimizations (minification, bundling, hashing, etc.).
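The build-once, push-to-a-build-repo step can be sketched like this, with a local bare repo standing in for the remote builds repository and a fake file standing in for the msbuild output (all paths and names are illustrative):

```shell
# Sketch: push a built distributable to a separate "builds" repo.
set -e
work=$(mktemp -d)
git init -q --bare "$work/builds.git"   # stands in for the remote build repo
git init -q "$work/dist"
cd "$work/dist"
git config user.email ci@example.com
git config user.name ci
# In the real pipeline this output comes from msbuild; fake it here
mkdir -p bin && echo "binary-payload" > bin/site.dll
git add .
git commit -qm "build of v1.2.3"
git tag v1.2.3
git remote add origin "$work/builds.git"
# Git transfers only the deltas against whatever the remote already has
git push -q origin HEAD v1.2.3
```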
3. Git-based deployments
Once versions are published to a Git repo, they can easily be deployed with a git clone and updated with a git pull.
To identify the versions we want to deploy to specific environments, we use additional Git tags (e.g. env-staging, env-production) that point at specific version tags.
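Promotion (and, as it turns out, rollback) is just moving one of these env-* tags, which can be sketched in a throwaway repo (version numbers are illustrative):

```shell
# Sketch: promote or roll back by moving an env-* tag between versions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email rel@example.com
git config user.name rel
echo a > f && git add f && git commit -qm "build v1.2.2" && git tag v1.2.2
echo b > f && git commit -aqm "build v1.2.3" && git tag v1.2.3
# Promote: point env-staging at the newly approved version
git tag -f env-staging v1.2.3
# Roll back: just move the same tag to the previous version
git tag -f env-staging v1.2.2
```

In the real setup the moved tag is force-pushed to the build repo, and the servers pick it up on their next fetch.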
To set up a new web server we now:
- Clone the build repo
- Set up the IIS website
- Run the PostDeploy PowerShell script (more details below)
And each update consists of manually moving the env-* tag (e.g. env-production) to a newer (or older) version, and then, on each target server:
1. Fetch, and check whether the env-* tag moved; stop the process otherwise
2. Run a predeploy PowerShell script that will:
   a. Take the server out of the load balancers (if it's in)
3. Git checkout the env-* tag
4. Run a postbuild PowerShell script that will:
   a. Transform the config files based on the current environment name
   b. Recycle the app pool if needed
   c. Run automated smoke tests
   d. Put the server back in the load balancers
Note: steps 2.a, 4.b, and 4.d are automatically omitted if the diff to the new version doesn't require an app restart (no .dll or .config changes).
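The "did the tag move?" check at the heart of each update can be sketched as follows, with a local repo standing in for the build remote and a clone standing in for a server's working copy (the real pre/postdeploy scripts are PowerShell; this is a hedged shell outline of the git portion only):

```shell
# Sketch: per-server update check -- fetch, compare the env tag, bail early.
set -e
# --- local stand-in for the remote build repo ---
src=$(mktemp -d)
git init -q "$src"
git -C "$src" config user.email t@example.com
git -C "$src" config user.name t
( cd "$src" && echo v1 > site.txt && git add . && git commit -qm v1 && git tag env-production )
# --- stand-in for the server's clone ---
server=$(mktemp -d)
git clone -q "$src" "$server"
cd "$server"
# --- the actual check a deploy run would perform ---
deployed=$(git rev-parse HEAD)
git fetch -q --tags --force origin              # --force lets moved tags update
target=$(git rev-parse "env-production^{commit}")
if [ "$deployed" = "$target" ]; then
  echo "env-production tag unchanged; nothing to deploy"
else
  git checkout -q env-production                # pre/postdeploy scripts run around this
fi
```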
Of course, performing this process by hand for each web server doesn't scale well, so to automate it we considered other popular deployment tools, but none matched our needs.
We built (and open-sourced) a simple "deploy hub" web app that runs on each target server and updates deployed instances (running the process described above) when triggered by a REST call; it can report the result through different notification mechanisms.
This REST call can be made manually (curl), from a central deploy dashboard, from a GitHub post-receive webhook, or at the end of a build process.
Using Git here allows us to update environments in no time (especially because the more often you deploy, the smaller the deltas are). This is very important for achieving zero downtime, one of the pillars of continuous deployment.
Another consequence is that rolling back is straightforward (move the env-* tag and trigger a new deploy), and it happens even faster because only the Git deltas from an already-fetched version need to be applied.
The overall result is that our TTR (time to release) has been reduced dramatically (from hours to a few minutes), and we got rid of the stressful release-day big merges and regression nightmares.
Feedback on features comes as fast as possible, and the number of bugs found in production (and the need for emergency hotfixes) has dropped drastically.
Since Git does such a great job of identifying what has changed from one tag to another, it was a natural way to get code deployed with only the differences applied. So our Git-based deployment goes like this:
1) We set up Jenkins to pull from the Git master branch of our main repository.
2) We used PowerShell to script commands such as msbuild, git pull, git push, config file transformations (CFT), etc.
3) With each build, we push the completely built website to a separate repository (Builds), which contains only the builds and a tag for each one.
4) To set up the Git deploy, we go to each server once and do a git clone and pull.
5) We also set an environment variable on each server that identifies the unique configuration for that server. The settings this variable selects include different connection strings, some third-party DLL versions, and some domain configuration.
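As a hedged illustration of how such a per-server variable can drive configuration, the sketch below simply copies a matching file into place; the variable name and file names are hypothetical, and the real setup uses config file transformations (CFT) rather than a plain copy:

```shell
# Hypothetical sketch: select per-server config via an environment variable.
set -e
site=$(mktemp -d)
cd "$site"
printf 'conn=qa-db\n'   > Web.qa.config
printf 'conn=prod-db\n' > Web.production.config
DEPLOY_ENV=production                    # set once per server in the real setup
cp "Web.$DEPLOY_ENV.config" Web.config   # stand-in for the CFT transform step
grep conn Web.config                     # -> conn=prod-db
```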
6) We built a web app that we install on each server and that exposes a REST call which executes a PowerShell script. The script does a git pull and runs CFT, which transforms the config files based on the environment variable to configure that specific server. Next, the script recycles the app pool if DLLs are being deployed, and then makes a few curl requests as simple smoke tests to ensure the site came back up correctly. Once the curl requests succeed, it makes a call to the load balancer to put the server back in.
7) We also built a web app that runs on a deployment server and is used to trigger deployments. This deployment app takes a server out of the load balancer and then makes the REST call to that server to trigger the deployment.
So, once QA finishes testing and blesses the release, we manually set that version with the production tag. Then we go to our deployment website, click a button, and it rotates through the servers: taking them out of the load balancer, calling the deployment app on each server, and putting them back in. Finally it emails us to say it is done and whether there were any issues.
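The rotation loop can be sketched as below; `lb_out`, `trigger_deploy`, and `lb_in` are hypothetical stubs standing in for the real load balancer calls and the REST call to each server's deploy hub, and the server names are made up:

```shell
# Hedged sketch of the dashboard's one-server-at-a-time rotation.
set -e
lb_out()         { echo "removed $1 from LB"; }   # stub: take server out of the LB
trigger_deploy() { echo "deployed $1"; }          # stub: REST call to the deploy hub
lb_in()          { echo "restored $1 to LB"; }    # stub: put server back in the LB
log=$(mktemp)
for server in web01 web02 web03; do
  lb_out "$server"
  trigger_deploy "$server"
  lb_in "$server"
done | tee "$log"
```

Rotating one server at a time keeps the rest of the pool serving traffic, which is what makes zero-downtime releases possible.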
Sounds clean, no? Well, to get there we had a bunch of pain points to work through.
1) Permissions. Since we used a deployment website that calls PowerShell, we had to run it under an account with the proper permissions.
2) Since the deployment websites are on each web server, we needed to make sure access was tightly controlled, so we locked the site down with IP restrictions and login credentials. We also only allow access to the deployment site through the internal IP and on a port other than port 80. We couldn't use port 80 anyway, because the main website accepts all IP addresses on the server, so we had to use another port to avoid a collision.
3) Because we use Git to deploy, we cannot manually deploy anything to the servers, or the git pull would fail since the server would no longer match the tag. In the past we would sometimes take a server out of the load balancer to test a production issue locally on the box. We need to be aware that if we do that, we must do a git reset to put the server back into a state where we can deploy again.
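Recovering a manually-modified server looks roughly like this, using a throwaway clone as a stand-in for the server's working copy (tag and file names are illustrative):

```shell
# Sketch: restore a drifted server working copy to a deployable state.
set -e
src=$(mktemp -d)
git init -q "$src"
git -C "$src" config user.email t@example.com
git -C "$src" config user.name t
( cd "$src" && echo v1 > site.txt && git add . && git commit -qm v1 && git tag env-production )
server=$(mktemp -d)
git clone -q "$src" "$server"
cd "$server"
echo "debug hack" >> site.txt        # ad-hoc edit made while testing a prod issue
echo "scratch" > notes.txt           # untracked leftovers
git reset --hard -q env-production   # discard tracked changes
git clean -qfd                       # discard untracked files
git status --porcelain               # empty output: clean and deployable again
```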
4) Since we run a post-deploy step that sets the configuration files for the environment, we needed to have Git ignore those files.
5) Taking servers in and out of our load balancer was not simple. There is no API for the Coyote Point, so we had two options for taking servers out: SSH into the balancer and edit config files, or make HTTP POST commands to the load balancer.
6) Git always saw DLLs as changed even when they were not, so it was always deploying the DLLs, which made simple file-only deployments that don't reset the app pool impossible. We had to put in a compare to verify whether each DLL had actually changed.
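One way to sketch such a guard is a content-hash comparison before committing a rebuilt DLL to the build repo; `git hash-object` hashes file contents only, so a byte-identical rebuild is detected as unchanged (file names here are illustrative, and a real guard may also need to ignore build metadata like embedded timestamps):

```shell
# Sketch: skip the DLL (and the app-pool recycle) when its bytes are unchanged.
set -e
dir=$(mktemp -d)
cd "$dir"
echo "payload-1" > deployed.dll
echo "payload-1" > rebuilt.dll            # rebuilt, but byte-identical
old=$(git hash-object deployed.dll)       # content hash of what is deployed
new=$(git hash-object rebuilt.dll)        # content hash of the fresh build
if [ "$old" = "$new" ]; then
  echo "dll unchanged; skip app-pool recycle"
else
  echo "dll changed; recycle app pool"
fi
```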
In all, it took us about two months of deployments to get to this point after we started; we evolved to it by adding different pieces each release. It has greatly simplified our dev process and allows us to release more often and with more confidence, since our releases are smaller. It also means we no longer have to get up early to release code, since we can schedule releases for quieter times. We love Git :)