Automatically upload assets to S3 when deploying to Elastic Beanstalk
This article follows my previous one about safely deploying sensitive information to Elastic Beanstalk. Today, we will see how to deploy your assets to S3 automatically.
Until now, this tended to be quite complicated. The easiest solution is to open an FTP-like client, manually upload all the files to S3, and use them in your code. But say you update a line of JS or add a new image: you have to remember it, open your client again, and upload the file. That sounds like something that could be automated.
Fortunately, it can.
There is a nice little tool called “s3cmd”, which can be installed pretty easily and lets you upload, delete, retrieve… any files on Amazon S3.
1. Allow your instance to upload files to S3
The first step is to allow your EC2 instances (which are created by Elastic Beanstalk) to interact with Amazon S3. The simplest and most secure way is to add an instance role when you create your Elastic Beanstalk environment, and to make sure this instance role has the right to write to your bucket on S3. You can learn more about EC2 instance roles in the AWS documentation.
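As a sketch, the instance role’s policy only needs write access to the target bucket. It could look something like this (the bucket name “my-bucket” matches the one used later in this article; “s3:PutObjectAcl” is needed because we will upload with a public ACL):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectAcl", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
    }
  ]
}
```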
2. Install S3CMD
As in the first article, we are going to use a configuration file to customize the instance without creating a new AMI. Add the following lines to your .config file (remember, it must be in a .ebextensions folder at the root of your project):
```yaml
sources:
  /home/ec2-user: http://downloads.sourceforge.net/project/s3tools/s3cmd/1.5.0-alpha3/s3cmd-1.5.0-alpha3.tar.gz
```
The sources key downloads a package and unzips it; here, I decided to unzip it in the “/home/ec2-user” folder. The link may not be the cleanest one, but it’s the only one I’ve found. Also note that you must use AT LEAST version 1.5.0-alpha1 (the first one that takes advantage of instance roles).
3. Upload the files
Here is what my public folder looks like:
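A minimal sketch of what such a public folder might contain (the file names here are hypothetical; only the three subfolders matter for what follows):

```
public/
├── images/
│   └── logo.png
├── javascripts/
│   └── app.js
└── stylesheets/
    └── style.css
```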
We are going to add three commands in our .config file:
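The commands look something like this (a sketch modeled on the gzip version shown later in this article; the bucket name “my-bucket” and the exact command names are assumptions):

```yaml
container_commands:
  01_s3_upload_images:
    command: /home/ec2-user/s3cmd-1.5.0-alpha3/s3cmd put -r public/images s3://my-bucket/assets/ --acl-public
    leader_only: true
  02_s3_upload_stylesheets:
    command: /home/ec2-user/s3cmd-1.5.0-alpha3/s3cmd put -r public/stylesheets s3://my-bucket/assets/ --acl-public
    leader_only: true
  03_s3_upload_javascripts:
    command: /home/ec2-user/s3cmd-1.5.0-alpha3/s3cmd put -r public/javascripts s3://my-bucket/assets/ --acl-public
    leader_only: true
```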
This code deserves some explanation. First, it lives under the “container_commands” key. Commands in this key are executed last by Elastic Beanstalk, after the application and the web server have been set up.
One more thing deserves attention: the trailing slash. The previous code uploads each folder itself, so the files end up under “assets/images”, “assets/stylesheets”, and so on in your bucket. However, if you add a slash in the s3cmd command (for instance, replacing “public/images” with “public/images/”), s3cmd will not upload the folder itself but only the files within it, so they land directly under “assets/”.
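For illustration, assuming a single file public/images/logo.png (a hypothetical name), the two resulting bucket layouts differ like this:

```
# without trailing slash: s3cmd put -r public/images s3://my-bucket/assets/
assets/images/logo.png

# with trailing slash: s3cmd put -r public/images/ s3://my-bucket/assets/
assets/logo.png
```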
Finally, you may wonder what the “leader_only” option means. It tells Elastic Beanstalk: “hey, if my environment has launched multiple instances, only execute these commands on one of them”. It would indeed be useless to upload the exact same files multiple times :)…
4. Add Gzip compression
One drawback of this approach is that we lose the built-in Gzip compression we would get from Apache, for instance. Amazon S3 does not gzip content automatically, so we must compress our files ourselves. Since we are doing it on our own, we can use the best compression level. We are going to automate this process with a few more commands:
```yaml
container_commands:
  01a_gzip_stylesheets:
    command: gzip -r --best public/stylesheets
    leader_only: true
  01b_rename_stylesheets:
    command: rename .gz '' `find public/stylesheets -name '*.gz'`
    leader_only: true
  01c_s3_upload_stylesheets:
    command: /home/ec2-user/s3cmd-1.5.0-alpha3/s3cmd put -r public/stylesheets s3://my-bucket/assets/ --acl-public --no-check-md5 --add-header "Content-Encoding:gzip" --config /home/ec2-user/s3cmd-1.5.0-alpha3
    leader_only: true
```
We split the previous command in three. The first one recursively compresses the files with the gzip tool, using the best compression available (the --best option). The problem is that gzip appends the “.gz” extension to each file, and we would like to avoid that so that we can use the same names in both development and production. The rename command takes care of it.
Then, we simply upload the files as before, except that we add a new header with the --add-header option. The “Content-Encoding: gzip” header tells the browser to decompress the files.
Also notice the --no-check-md5 option. I realized that when gzipping content, files were re-uploaded to S3 every time, even if they hadn’t changed. As a consequence, S3 created a new ETag for those resources, which may force some users to re-download a resource instead of using the copy in their browser’s cache. This happens because gzip stores the file’s modification time in the archive header: since the files are compressed on the fly at every deploy, the date is always different, and so is the md5 checksum that s3cmd computes to decide whether a file should be re-uploaded. With this option, s3cmd no longer computes an md5 checksum and only checks the file size (note that this can occasionally be a problem).
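You can observe gzip’s timestamp behavior yourself: compressing identical content at two different modification times yields two different archives, which is why a checksum-based sync always sees a change (file names here are hypothetical):

```shell
# Create two identical stylesheets, then give them different mtimes.
printf 'body { color: red; }\n' > a.css
cp a.css b.css
touch -d '2020-01-01 00:00:00' a.css
touch -d '2021-01-01 00:00:00' b.css

# gzip embeds each file's modification time in the archive header...
gzip --best a.css b.css

# ...so the archives differ even though the contents were identical.
cmp -s a.css.gz b.css.gz || echo "archives differ"
```

(The -d flag of touch is GNU-specific, which is fine on the Amazon Linux instances Elastic Beanstalk launches.)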
I’ve found this approach VERY useful for deploying, even more so when your filenames change often, or if you’re afraid of forgetting to upload an asset when deploying. The introduction of these config files to Elastic Beanstalk is indeed very powerful (if time-consuming: the YAML files are not easy to write, and it takes a lot of tries to achieve exactly what you want). On the other hand, you don’t need to create and maintain your own AMI just to install a few tools.