Asset pipeline craziness
Today I wanted to add two more servers to our Rails cluster. This is still a manual process. We’re in the process of automating it but are not quite there yet.
However, after adding the servers to the load balancer, we soon noticed that some requests for our main JavaScript file failed, leading to broken pages. Surprising! This used to work fine.
It turned out that the old and new servers disagreed on the fingerprints for the JS files. These need to be the same on all machines in the cluster, because a page loaded from one server can easily end up fetching its assets from a different server.
OK, so time to start digging. Why are the fingerprints different?
Here is the newest fingerprinted application.js from the old app server:
[ec2-user@ip-10-0-29-160 assets]$ ls -lat *.js | head -n 1
-rw-rw-r-- 1 deployer deployer 1027991 Oct 20 17:23 application-7e8b5168ffba576e9ae1c122ae51b056749021bf45567aac042cead13f69a2cc.js
And from the new server:
[ec2-user@ip-10-0-7-251 assets]$ ls -lat *.js | head -n 1
-rw-rw-r-- 1 deployer deployer 1027991 Oct 20 17:23 application-d4b406639785037a41d60173f525ee1491e3a80e90697de4518a010c4c700662.js
Notice that the file size is exactly the same, but the contents are different. I verified with openssl sha256 <filename> that the fingerprints calculated by the Rails asset pipeline were correct.
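The same check can be scripted. Here is a minimal Ruby sketch (run from the assets directory) that compares each fingerprint in a filename against the digest of the file's contents; it uses Ruby's Digest instead of the openssl CLI, but the result is the same:
require 'digest'

# Compare the SHA-256 digest of each compiled file against the fingerprint
# embedded in its filename (application-<64 hex chars>.js).
Dir.glob('application-*.js').each do |path|
  expected = path[/application-(\h{64})\.js/, 1]
  actual   = Digest::SHA256.file(path).hexdigest
  puts "#{path}: #{expected == actual ? 'OK' : 'MISMATCH'}"
end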
So what is the difference?
Minified JS is nasty to compare, so to narrow this down, I found the smallest file with differences. Here are the full contents of both versions of that file:
(function(){$(function(){var t,e,i,n,o,s,r,a,l,u,c,h,d,p,f,m,g,v,y,_,b,w;retur
n g=$(".js-slider"),a=$(".js-daily-rate"),h=$(".js-price-hourly"),n=$(".js-pri
ce-daily"),y=$(".js-price-3-days"),w=$(".js-price-weekly"),_=$(".js-price-two-
weeks"),c=$(".js-price-four-weeks"),l=$(".js-rental-prices-form"),v=$(".js-ren
tal-terms"),b=a.attr("value"),p=g.data("min"),d=g.data("max"),i=g.data("cutoff
"),s=g.data("factor"),e=new UserCurrencyResolution(window.countries,window.cur
rencies),t=e.getCurrency(currentUser),m=function(t){return t>i?i+(t-i)*s:t},o=
function(t){return t>i?i+(t-i)/s:t},u=function(t){return MoneyHelper.format(10
0*t)},f=function(t){return h.html(u(t.one_hour)),n.html(u(t.one_day)),y.html(u
(t.three_days)),w.html(u(t.one_week)),_.html(u(t.two_weeks)),c.html(u(t.four_w
eeks))},r=function(t){return this.apiClient=new Gomore.Api.Client,this.apiClie
nt.rentalAdPeriodPrice({rentalAdId:window.rentalAdId,price:t,callback:function
(t){return f(t.data)}})},g.slider({min:p,max:o(d),value:o(b),step:100*t.attrib
utes.step,selection:"true",formater:function(t){return Math.round(m(t)/100)}})
.on("slideStop",function(t){return a.attr("value",m(t.value)),r(m(t.value))}),
r(b),l.on("submit",function(t){return v.length&&!v.is(":checked")?(t.preventDe
fault(),alert(v.data("validation-message"))):void 0})})}).call(this);
And the other version:
(function(){$(function(){var t,e,i,n,o,s,r,a,l,u,c,h,p,d,f,m,g,v,y,_,b,w;retur
n g=$(".js-slider"),a=$(".js-daily-rate"),h=$(".js-price-hourly"),n=$(".js-pri
ce-daily"),y=$(".js-price-3-days"),w=$(".js-price-weekly"),_=$(".js-price-two-
weeks"),c=$(".js-price-four-weeks"),l=$(".js-rental-prices-form"),v=$(".js-ren
tal-terms"),b=a.attr("value"),d=g.data("min"),p=g.data("max"),i=g.data("cutoff
"),s=g.data("factor"),e=new UserCurrencyResolution(window.countries,window.cur
rencies),t=e.getCurrency(currentUser),m=function(t){return t>i?i+(t-i)*s:t},o=
function(t){return t>i?i+(t-i)/s:t},u=function(t){return MoneyHelper.format(10
0*t)},f=function(t){return h.html(u(t.one_hour)),n.html(u(t.one_day)),y.html(u
(t.three_days)),w.html(u(t.one_week)),_.html(u(t.two_weeks)),c.html(u(t.four_w
eeks))},r=function(t){return this.apiClient=new Gomore.Api.Client,this.apiClie
nt.rentalAdPeriodPrice({rentalAdId:window.rentalAdId,price:t,callback:function
(t){return f(t.data)}})},g.slider({min:d,max:o(p),value:o(b),step:100*t.attrib
utes.step,selection:"true",formater:function(t){return Math.round(m(t)/100)}})
.on("slideStop",function(t){return a.attr("value",m(t.value)),r(m(t.value))}),
r(b),l.on("submit",function(t){return v.length&&!v.is(":checked")?(t.preventDe
fault(),alert(v.data("validation-message"))):void 0})})}).call(this);
Again, same character count. The two files are almost identical; the only difference is that the minifier has swapped the variable names d and p.
So we run the exact same process on both servers but end up with slightly different results. Scary.
Next step: How were these files created? In our Rails configuration, we choose the :uglifier method for JS minification:
# Compress JavaScripts and CSS
config.assets.js_compressor = :uglifier
config.assets.css_compressor = :sass
In our Gemfile we include the uglifier gem to make this work:
gem 'uglifier', '~> 2.7'
Looking through the readme for Uglifier, we learn that it is simply a Ruby wrapper for UglifyJS.
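That also suggests a way to reproduce the problem outside the full asset pipeline: run Uglifier directly over the offending file on each server and compare the output. A rough sketch (the source path here is made up):
require 'uglifier'
require 'digest'

# Minify one already-compiled JS file with Uglifier and print a digest of
# the result, so the output can be compared across servers.
source   = File.read('app/assets/javascripts/rental_prices.js') # hypothetical path
minified = Uglifier.compile(source)
puts Digest::SHA256.hexdigest(minified)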
So the minifier itself is JavaScript code. To execute it from Ruby, we use the execjs gem:
gem 'execjs'
But execjs simply delegates to any of a handful of supported runtimes. We use therubyracer:
gem 'therubyracer'
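Which of those runtimes execjs actually picks is decided at load time, so it is worth confirming that both servers agree. A one-liner in a Rails console does it (the name string shown is what I would expect with therubyracer installed):
require 'execjs'
# Show which JS runtime ExecJS has selected on this machine.
puts ExecJS.runtime.name # something like "therubyracer (V8)"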
The Ruby Racer is also glue code; it depends on the libv8 gem to actually install the V8 engine, which describes itself as “A gem for distributing the v8 runtime libraries and headers”. So, another wrapper. That is an impressive number of layers in this setup…
But of course, Bundler has ensured that both servers have the same versions of all these gems. So how can they disagree on how to minify the JS?
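That claim is easy to double-check: a small sketch to run in a Rails console on each server, comparing the loaded gem versions:
# Print the versions of the gems involved in minification, so the old and
# new servers can be compared side by side.
%w(uglifier execjs therubyracer libv8).each do |name|
  spec = Gem.loaded_specs[name]
  puts "#{name}: #{spec ? spec.version : 'not loaded'}"
end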
Well, as mentioned, libv8 is a wrapper around the binary V8 distribution, meaning that it has a native component. Could there be differences in the way these binaries were compiled? Could this be related to versions of other libraries that happen to be installed on the servers?
Our provisioning setup is not ironclad. We have Chef scripts to install all the software we need on new instances, but versioning is not strict. For some Yum packages, we simply get the newest version when provisioning a new server.
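Tightening that up is straightforward in principle. In a Chef recipe, pinning a Yum package would look roughly like this (package name and version are invented for illustration):
# Pin an exact package version instead of taking whatever Yum resolves today.
package 'libstdc++-devel' do # hypothetical package
  version '4.8.5-4.el7'      # hypothetical version
  action :install
end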
The old server is literally old - more than a year older than the new one. So the Linux distributions are probably not identical either. Could it be that the new libv8 is compiled against different versions of some libraries?
Maybe if I did a pristine reinstall of the gems on both machines, it would work?
Wait. This is the time to stop and think. I’m chasing some subtle (or obvious but evading me) difference in our deployment pipeline, and I do have a strong desire to understand what is going on. But even if I find and fix the problem, the incident still reveals a major point of fragility in our setup.
We use a fairly standard Capistrano configuration to deploy our Rails app. This works by precompiling assets in parallel on each app server. And as my problem today showed, this can sometimes cause issues.
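Concretely, the standard setup boils down to a single line in the Capfile (ours may differ slightly in detail), which makes every deploy run the precompile step on the app servers:
# Capfile
require 'capistrano/rails/assets' # runs rake assets:precompile on the app servers during deploy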
In fact, this process has annoyed me for a while, because our CI server deploys 5-10 times per day, and every time we see a huge spike in CPU usage on all servers. Deploys are slow and users experience higher latency while we deploy. This sucks.
Why don’t we just precompile the assets on the build server and rsync them to the app servers? In fact, why is this not the way everyone precompiles assets?
I have no idea. But I will definitely try to spike this solution. It seems like a relatively easy thing to change in the Capistrano setup, and it would solve two problems for us.
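A first spike might look roughly like the custom task below. This is only a sketch: the task name, the deployer user, the rsync flags and the target path are assumptions, and the stock precompile step from capistrano-rails would have to be disabled in favour of it.
# lib/capistrano/tasks/local_assets.rake (hypothetical location)
namespace :deploy do
  desc 'Precompile assets locally and rsync them to every app server'
  task :push_local_assets do
    # Compile once on the machine running the deploy (e.g. the CI server).
    run_locally do
      execute :bundle, :exec, :rake, 'assets:precompile'
    end

    # Ship the compiled assets to each app server instead of compiling there.
    remote_dir = "#{fetch(:deploy_to)}/current/public/assets/"
    roles(:app).each do |host|
      run_locally do
        execute :rsync, '-az', 'public/assets/', "deployer@#{host.hostname}:#{remote_dir}"
      end
    end
  end
end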
Also, it would be an important step on the way to packaging our Rails app as an archive that can be copied to S3 (like we do for our Clojure app). This is a story for another day, but briefly, it would allow us to create an autoscaling group for the Rails app in which new EC2 instances could be spun up by AWS, fetch the newest code from S3 and “deploy” (mainly unpack) it using a simple shell script.
I’ll be sure to post about that once we get it working.