Michaël Gallego

This is my blog. What can you expect here? Well... Zend Framework 2, Amazon AWS...






Automatically upload assets to S3 when deploying to Elastic Beanstalk

This article follows my previous one about safely deploying sensitive information to Elastic Beanstalk. Today, we will see how to deploy your assets to S3 automatically.

Most of the time, your Zend Framework 2 application will have a public folder that contains not only your index.php file, but also several folders like images, javascripts, stylesheets… Those files are static by nature, so it’s a good idea to serve them from S3 (or even better, through a CDN like Amazon CloudFront) instead of hitting your server whenever a user requests such files.

However, this has tended to be quite complicated. The easiest solution is to open an FTP client, manually upload all the files to S3, and use them in your code. But let’s say you update a line of JS, or add a new image. You need to remember to open your FTP client again and upload the file. That sounds like something that could be automated.

Fortunately, it can.

There is a nice little tool called “s3cmd”, which can be installed pretty easily, and which lets you upload, delete, retrieve… any file on Amazon S3.

1. Allow your instance to upload files to S3

The first step is to allow your EC2 instances (which are created by Elastic Beanstalk) to interact with Amazon S3. The simplest and most secure way is to add an instance role when you create your Elastic Beanstalk environment, and to make sure you give this instance role the right to write to your bucket on S3. You can learn more about EC2 instance roles in the AWS documentation.
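To make “the right to write to your bucket” a bit more concrete, here is a minimal sketch that builds an IAM policy document in Python. The bucket name my-bucket and the assets/ prefix come from the commands used later in this article. The list of actions is my assumption (s3:PutObject to upload, s3:PutObjectAcl because the uploads are made public), and your setup may well need more, such as s3:ListBucket.

```python
import json


def build_upload_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy document allowing uploads under bucket/prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed minimal set of actions; adjust to your own needs.
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


policy = build_upload_policy("my-bucket", "assets/")
print(json.dumps(policy, indent=2))
```

The printed JSON is what you would attach to the instance role. Restricting the resource to the assets/ prefix keeps the instances from writing anywhere else in the bucket.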

2. Install s3cmd

As in the first article, we are going to use a configuration file to customize the instance without creating a new AMI. Add the following to your .config file (remember that it must be in a .ebextensions folder, at the root of your project):

sources:
  /home/ec2-user: http://downloads.sourceforge.net/project/s3tools/s3cmd/1.5.0-alpha3/s3cmd-1.5.0-alpha3.tar.gz

The sources key lets you download an archive and extract it. Here, I decided to extract it into the “/home/ec2-user” folder. The link may not be the cleanest one, but it’s the only one I’ve found. Also note that you must use AT LEAST version 1.5.0-alpha1 (the first one that takes advantage of instance roles).

3. Upload the files

Here is what my public folder looks like:

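public/
    images/
        sprite/
            // tons of small images
        background.jpg
        sprite_fjfrhbbv464.png
    javascripts/
    sass/
    stylesheets/
    index.php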

You may have a simpler structure than mine, but it illustrates some interesting points. I’m using SASS to generate CSS, so I’d like to upload everything in the /javascripts folder and everything in the /stylesheets folder, but ignore the /sass folder (people don’t need access to those files). Furthermore, I’d like to upload only the top-level files in the /images folder (here, the sprite_fjfrhbbv464.png and background.jpg files). All the files in the /sprite sub-folder are used by SASS to generate the optimized sprite, so I don’t want to make them publicly available (hey, do you understand why I don’t want to do this manually? :D).

We are going to add three commands to our .config file:

container_commands:
  # Command names are illustrative; Elastic Beanstalk runs them in alphabetical order by name.
  01-upload-stylesheets:
    command: /home/ec2-user/s3cmd-1.5.0-alpha3/s3cmd put -r public/stylesheets s3://my-bucket/assets/ --acl-public --add-header Cache-Control:max-age=290304000 --config /home/ec2-user/s3cmd-1.5.0-alpha3
    leader_only: true
  02-upload-javascripts:
    command: /home/ec2-user/s3cmd-1.5.0-alpha3/s3cmd put -r public/javascripts s3://my-bucket/assets/ --acl-public --add-header Cache-Control:max-age=290304000 --config /home/ec2-user/s3cmd-1.5.0-alpha3
    leader_only: true
  03-upload-images:
    command: /home/ec2-user/s3cmd-1.5.0-alpha3/s3cmd put -r public/images s3://my-bucket/assets/ --exclude 'sprite*/*' --acl-public --add-header Cache-Control:max-age=290304000 --config /home/ec2-user/s3cmd-1.5.0-alpha3
    leader_only: true

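This code deserves some explanation. First, it’s inside the “container_commands” key. Commands in this key are executed at the end by Elastic Beanstalk, after the application and web server have been set up, but before the new application version is deployed. They run from the folder where your application archive has been extracted, which is why a relative path like public/stylesheets works.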

The first and second commands are simple: they upload the whole stylesheets and javascripts folders to the /assets/ sub-folder of the bucket called “my-bucket”. The third command uploads everything in the images folder, except whatever is in the sprite folder (I’ve even added a wildcard after the sprite name, in case I have a retina sprite folder, like sprite-2x or something like that). Very powerful! As a result, only the top-level images in the images folder are uploaded. Notice the --acl-public option, which is needed to make your files public (and hence accessible to the outside world). We also add a far-future Cache-Control header, so that all those assets are cached for a very long time (290304000 seconds is roughly nine years; use 31536000 for one year). You can change the value, but I recommend keeping it very high and using techniques like query parameters to force invalidation.
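Two points from the paragraph above are easy to check with a few lines of Python: how long a given max-age really is, and what “query parameters to force invalidation” can look like. This is only a sketch. The bucket URL form, the main.css file name and the version value are hypothetical, chosen for illustration.

```python
from urllib.parse import urlencode

SECONDS_PER_DAY = 24 * 60 * 60


def max_age_in_days(max_age: int) -> float:
    """Convert a Cache-Control max-age value (in seconds) to days."""
    return max_age / SECONDS_PER_DAY


def versioned_url(base_url: str, path: str, version: str) -> str:
    """Append a version query parameter so each release gets a new URL."""
    query = urlencode({"v": version})
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}?{query}"


days = max_age_in_days(290304000)
url = versioned_url(
    "https://my-bucket.s3.amazonaws.com/assets",  # assumed URL form
    "stylesheets/main.css",                       # hypothetical file
    "42",                                         # hypothetical release number
)
print(days)  # 3360.0
print(url)   # https://my-bucket.s3.amazonaws.com/assets/stylesheets/main.css?v=42
```

Because the browser caches by full URL, changing the v parameter on each release makes it fetch the new file even though the old one is still cached.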

There is one more small thing to pay attention to. The previous code will generate the following structure in your bucket:

assets/
    images/
        // ...
    javascripts/
        // ...
    stylesheets/
        // ...

However, if you add a trailing slash in the s3cmd command (for instance, you replace “public/images” by “public/images/”), s3cmd will not upload the folder itself but only the files within it, so the bucket will look like this:

assets/
    javascripts/
        // ...
    // the image files
    stylesheets/
        // ...
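The trailing-slash rule can be sketched as a small Python function. This is a model of the behavior described in this article, not s3cmd’s actual code, and the function name is made up for the illustration.

```python
import posixpath


def destination_key(source_arg: str, local_file: str, dest_prefix: str) -> str:
    """Model how a recursive upload names the object for one local file.

    Without a trailing slash on source_arg, the last folder name is kept
    in the key; with a trailing slash, only the folder's contents are.
    """
    if source_arg.endswith("/"):
        base = source_arg
    else:
        parent = posixpath.dirname(source_arg)
        base = parent + "/" if parent else ""
    if not local_file.startswith(base):
        raise ValueError(f"{local_file!r} is not under {source_arg!r}")
    return dest_prefix + local_file[len(base):]


without_slash = destination_key(
    "public/images", "public/images/background.jpg", "assets/"
)
with_slash = destination_key(
    "public/images/", "public/images/background.jpg", "assets/"
)
print(without_slash)  # assets/images/background.jpg
print(with_slash)     # assets/background.jpg
```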

Finally, you may ask yourself what the “leader_only” option means. It tells Elastic Beanstalk: “hey, if several instances have been launched in my environment, run those commands on one instance only”. It would indeed be useless to upload the exact same files multiple times :)…

4. Add Gzip compression

One drawback of this approach is that we can no longer rely on the on-the-fly Gzip compression that a web server like Apache provides. Amazon S3 does not automatically gzip content, so we must compress our files ourselves. Because we are doing it on our own, we can use the best compression level. We are going to automate this process using some more commands:

container_commands:
  # Command names are illustrative; Elastic Beanstalk runs them in alphabetical order by name.
  01-gzip-stylesheets:
    command: gzip -r --best public/stylesheets
    leader_only: true
  02-rename-stylesheets:
    command: rename .gz '' `find public/stylesheets -name '*.gz'`
    leader_only: true
  03-upload-stylesheets:
    command: /home/ec2-user/s3cmd-1.5.0-alpha3/s3cmd put -r public/stylesheets s3://my-bucket/assets/ --acl-public --no-check-md5 --add-header "Content-Encoding:gzip" --config /home/ec2-user/s3cmd-1.5.0-alpha3
    leader_only: true

We split the previous command into three. The first one recursively compresses the files with the gzip tool, using the best compression possible (--best option). The problem is that gzip appends the “.gz” extension to each file, and we would like to avoid that, so that we can use the same names in both development and production. The rename command takes care of that.

Then, we simply upload the files as before, except that we add a new header using the --add-header option. The “Content-Encoding:gzip” header tells the browser to decompress the files.
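For readers who prefer a script to the gzip and rename pair, the same “compress but keep the name” step can be sketched in Python. This is an alternative I have not used in production. The folder layout and file content in the demo are made up, and it runs on a temporary folder.

```python
import gzip
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gzip_in_place(folder: str) -> int:
    """Compress every file under folder, keeping the original file names."""
    count = 0
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if data[:2] == GZIP_MAGIC:  # already compressed, leave it alone
            continue
        path.write_bytes(gzip.compress(data, compresslevel=9))
        count += 1
    return count


# Demo on a temporary folder with one hypothetical stylesheet.
with tempfile.TemporaryDirectory() as tmp:
    css = Path(tmp) / "stylesheets" / "main.css"
    css.parent.mkdir()
    original = b"body { margin: 0; }\n" * 50
    css.write_bytes(original)
    compressed_count = gzip_in_place(tmp)
    stored = css.read_bytes()
    file_names = sorted(p.name for p in Path(tmp).rglob("*") if p.is_file())

print(compressed_count, file_names, len(stored) < len(original))
```

The compresslevel=9 argument plays the role of gzip’s --best option, and because the file is overwritten under its own name, no rename step is needed.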

Also notice the --no-check-md5 option. I realized that when gzipping content, files were re-uploaded to S3 every time, even if they hadn’t changed. As a consequence, S3 created a new ETag for those resources, which may force some users to re-download a resource instead of using the copy in their browser’s cache. I suppose this happened because the MD5 checksum that is computed to decide whether the file should be re-uploaded takes the date into account: gzip stores a timestamp in the header of each compressed file, so when the files are gzipped again on every deployment the timestamp differs, and so does the hash. With this option, s3cmd no longer computes an MD5 checksum and only compares file sizes (note that this can be a problem: a change that leaves the compressed size identical will not be uploaded).
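The timestamp effect is easy to reproduce. The sketch below gzips the same bytes with two different timestamps and compares the MD5 checksums; the stylesheet content and the timestamps are made up for the demonstration.

```python
import gzip
import hashlib


def gzip_md5(data: bytes, mtime: int) -> str:
    """MD5 of data gzipped with the given timestamp in the gzip header."""
    compressed = gzip.compress(data, compresslevel=9, mtime=mtime)
    return hashlib.md5(compressed).hexdigest()


css = b"body { color: #333; }\n"  # hypothetical stylesheet content

first_deploy = gzip_md5(css, mtime=1_400_000_000)
second_deploy = gzip_md5(css, mtime=1_400_003_600)  # one hour later
pinned_a = gzip_md5(css, mtime=0)
pinned_b = gzip_md5(css, mtime=0)

print(first_deploy != second_deploy)  # True: same content, different checksum
print(pinned_a == pinned_b)           # True: a fixed timestamp gives a stable checksum
```

On the command line, gzip’s -n (--no-name) option leaves the original name and timestamp out of the header, which should make the output reproducible in the same way. I have not tried it with s3cmd, so treat it as something to test rather than a guaranteed fix.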


I’ve found this approach VERY useful for deploying. It is even more useful when you consider that your file names may change often, or if you’re afraid of forgetting to upload an asset when deploying. The introduction of those config files to Elastic Beanstalk is indeed very powerful (but very time consuming, because the YAML file is not easy to write, and it takes a lot of tries before achieving exactly what you want). On the other hand, you don’t need to create and maintain your own AMI just to install a few tools.