16 January 2017

Adventures in static site generation

Here's a fact: A static web page like 'about-me.html' is stored in a simple text file, so when a browser requests it from the web server, the server just has to open the file and send the contents back. It's a pretty fast operation. But a dynamic page like 'about-me.php' contains code that needs to be pre-processed before the server responds, so the server has to give the file to another piece of software, PHP, which takes a little bit of time to create the result.

Static site generation (which is very trendy at the moment) is the idea that if the content of your website's pages does not change every few minutes and does not need to be personalised for different users, then it is more efficient to occasionally generate your pages as a bunch of static files and save them, rather than rebuilding them on each request as happens with dynamic ones. (Spoiler alert: This turns out not to be so true in the grand scheme of things.)

To be clear, the scenario we are hoping to improve on is when thousands of requests for a page are coming in every minute from different browsers and every one of them -- every single request -- causes PHP to fire up and do a bunch of tasks, only to deliver the exact same result. It's a waste of time that we can and should eliminate.

Now that I have been clear, let's use a convoluted real-world analogy...

[scroll to next set of square brackets to skip]

Lots of browsers all want a page (imagine the browsers are hungry customers queuing at a small town food stall). The browser tells the web server (picture the web server as a snooty French restaurateur with a waxed mustache) what file (the files are menu items) she wants and the server claps his hands and shouts "Wait! Everyone stop! Zis file requires a pre-processor and therefore we must individually craft what I presume will be some magnificent unique creation especially for you!" and hands the task over to PHP (imagine PHP is a melodramatic 90s magician with a tuxedo and a wand, like before the new breed of street magicians came along with their tattoos, ripped jeans and shaky camera shots).

So PHP struts around a table, waving his hands evocatively over objects concealed under handkerchiefs (yes the analogy is a stage act in a food stall, it's intentionally ridiculous alright, just run with me) and starts by opening a header file that hasn't changed for 2 years. He glues it to an equally old template file, before connecting to a database to extract a paragraph that the site owner hasn't edited for 5 months. He then pulls a list of 15 stock items that are still the same as when the company was acquired, which he then glues to a footer which only changes on the 1st of January when the [pointless and legally toothless] year next to the [equally benign] copyright symbol increments by 1 and Voila!, PHP finishes by handing the web server the exact same bloody product that he has made for the past 100,000 browser requests.

The web server -- now desperately forcing a smile in a tragic attempt to hide his embarrassment -- passes the hindering clone to the patient browser and the browser glares, unimpressed, directly at the server, claps her hands slowly and says "Great. Well done. Very impressive. Thanks for keeping me waiting. Because you couldn't have just saved a copy of the identical one you made for the previous customer or indeed the identical one you made for me, yesterday? No. You had to make me stand here and waste a precious portion of my 30-minute lunch break, and by the way, when did lunch breaks stop being an hour? What vile spreadsheet-groomer ignored the results of the 7 billion modern studies about employee productivity and decided to implement a Dickensian system of corporate timekeeping that I can't conform to because I have to watch your precious pre-processor prance about before giving me my file! You've wasted my time and I hate you, and I appreciate you're French and everything, and I'm sorry about Brexit, I mean I voted remain so I'm not sorry like personally, just, ya know, like sorry on behalf of the UK, or at least 48% of us, anyway the point is I don't hate you because you are French, but you are a dick and I do hate you. Good bye.... probably see you tomorrow." The lunch break represents the 1-second limit that the employer, who is Google, tests web servers against when deciding whether to include them in its list of recommendations or not (the list of recommendations is the site's position in search results).

This situation is utter nonsense! (And not just because of my bloated analogy.) By employing PHP's arduous routine on every order the web server is wasting the browser's time and damaging its own reputation. (See, it all came together in the end *smug face*)

[end jokes, begin serious voice]

So why do most sites do it like this?

Most web developers learn HTML and CSS first. They learn it by building static files like homepage.html and style.css. Then, their first use of a pre-processor like PHP is for its ability to take a repeated element that is common to all 50 pages, such as a page header with a logo and menu, define it in a single file and automatically include that element inside all pages. This way, if they update the header, they only have to do it in 1 location instead of trawling through 50 separate pages. They've eliminated the risk of accidentally missing a page and breaking consistency across the site. That's big win number 1.

The next super helpful feature of a pre-processed page is its ability to load text content into pages from a database. This allows them to build a private content management system (CMS) where the administrator (usually the client) can edit the words on their website. Win number 2.

Those 2 techniques are so obviously useful that the developer starts using them everywhere and becomes conditioned to think every page must be a dynamic (pre-processed) page. They immediately categorise .html files as old-fashioned and .php files as the best way (Anecdote: I was explaining static site generation to a dev the other day and he genuinely responded to .html files with "Oh, like the old-fashioned way?"). So, our blossoming developer discovers PHP (or any of the other server-side pre-processors) which appears to give convenience and consistency at no cost; free power! And it is free. Sort of. Well, it's not free. It costs time... it costs milliseconds on each page load.

As a generalisation, from my experience, even a well-written PHP script can take twice as long as the same page being served from a standard .html file. On a reasonably complicated site this could be adding hundreds of milliseconds to the server response time.

But what is 100 milliseconds really?

  • The subconscious perceives it [citation needed, I think it's in Don't Make Me Think]. So it's the difference between users perceiving your site (and by association your business) to be fast and responsive or slow and unresponsive. Slow loading frustrates users, which is bad because happy users buy more [citation needed, I think that's in Emotional Design].
  • Google always prioritises user experience [citation needed], so its bots assess server response time when ranking pages. Therefore 100 milliseconds is the difference between your site sitting in the top spots in the search results or your competitor's site occupying them (assuming your content is equally relevant).

So in essence, the 100 milliseconds could be critical to your business's success. But relax, because it's easy to fix.

OK you've convinced me! Now for the love of sweet baby Yeezus how do I fix my own site?!

Rebuild your site's CMS so that when the administrator saves a change it generates the markup that the public see and saves it as static HTML files.

(Note: At this point it is important to point out that most of the stuff out there on the web about static site generation is not about content management systems that generate the files up on the live server whenever the admin changes something. Most of the hype about SSGs is about the type where you run the generator on your own computer in the comfort of your office, then upload the generated files to your web server. You have to be a developer to do the generation process, so it's not appropriate for any site where the client needs to be able to log in and change content themselves.)

OK, so your process of building the site will be remarkably similar: The CMS can still be all dynamic, database-backed pages and that's fine because those pages are only used every few weeks by the administrator on their speedy desktop computer. Your site's header will still be defined just once in an include file, your pages will still query the database to pull out content, your footer will still use the system date to show a year next to a copyright symbol. The only difference will be that the functions of the CMS that update a piece of content in the database when an administrator saves it will now call another new function, which will iterate over a list of pages, executing each one and writing the result to a static file that the public see.
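
To make that last step concrete, here is a minimal sketch of the "save, then regenerate" idea. It is written in Python rather than PHP purely for illustration, and every name in it (regenerate_static_site, save_content, render_page, the dictionary standing in for the database) is made up for this example. It is not the code from my CMS, just the shape of it.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List
import os


def regenerate_static_site(
    page_names: Iterable[str],
    render_page: Callable[[str], str],
    public_dir: Path,
) -> List[Path]:
    """Render every page once and save the result as a static .html file."""
    public_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in page_names:
        html = render_page(name)  # the slow, dynamic work happens here, once
        target = public_dir / f"{name}.html"
        # Write to a temporary file and then swap it into place, so a
        # visitor is never served a half-written page.
        temp = public_dir / f"{name}.html.tmp"
        temp.write_text(html, encoding="utf-8")
        os.replace(temp, target)
        written.append(target)
    return written


def save_content(database: Dict[str, str], page: str, text: str, public_dir: Path) -> None:
    """Hypothetical CMS save handler: update the content, then regenerate."""
    database[page] = text

    def render_page(name: str) -> str:
        # Stand-in for header include + template + database query + footer.
        return f"<h1>{name}</h1><p>{database[name]}</p>"

    regenerate_static_site(list(database), render_page, public_dir)
```

Regenerating every page on every save is the lazy option, but on a site that only changes every few weeks it is hard to justify anything cleverer.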

Put your money where your mouth is Martin! 

I recently experimented with an old website that I originally built for a client back in 2010 and it's not changed since. I made the small change to the CMS to execute the PHP pages and save the output in the public folder, thus generating a static version of the site. The pages were not particularly slow to respond in the first place, about 50 milliseconds, so this exercise was not about significant gains. It was just an exercise in building a static site generator and releasing it into the wild.

It took me less than a day to do the conversion and I halved the server response time. Chuffed with the progress, I committed my changes, uploaded them to the live server and started writing this blog post to preach the benefits of static site generation.

I wanted some screenshots of load times for comparison so I temporarily put one of the old PHP versions back up and used Chrome developer tools to measure the timings. To my surprise the static files were actually slower. The .html files took 37 milliseconds to load but the PHP equivalent only took 23ms! What the hell was going on?

So static site generators don't work yeah?  

Now I'll be honest with you, I didn't realise my hosting was running xVarnish. Sadly, in hindsight I realise Varnish had always been a bit of a dirty word to me. I first heard it used by a hacky developer to incorrectly justify a lazy decision. I asked why we were including 12 separate quite large .js files in the head of a site and their response was "It's OK, we'll just put Varnish in front of it". You don't have to understand why that answer is incorrect but just trust me, it's really not.

Unfortunately that experience had caused me to steer clear of it, holding the opinion that Varnish was a technology used only by cowboys to patch up bad work. That was wrong of me.

So I was digging around the response headers looking for answers to why responses from pre-processed files were being returned faster than static files, and spotted the line "X-Cache: MISS" on the response from the .html file but "X-Cache: HIT" on the .php file. I facepalmed and promptly did some reading.
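
If you would rather not squint at developer tools, the same check can be scripted. This is a rough sketch using only Python's standard library; the function names are my own invention, and it assumes your host reports cache status in an "X-Cache" header like mine did (other setups use different header names, or none at all).

```python
from typing import Mapping
from urllib.request import Request, urlopen


def cache_status(headers: Mapping[str, str]) -> str:
    """Return the X-Cache value (e.g. HIT or MISS), or UNKNOWN if absent."""
    for name, value in headers.items():
        if name.lower() == "x-cache":
            return value.strip().upper()
    return "UNKNOWN"


def fetch_cache_status(url: str) -> str:
    """Send a HEAD request to the URL and report what the cache said."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return cache_status(response.headers)
```

Call fetch_cache_status() a couple of times on both the .html and the .php version of a page. The first request for anything is expected to miss, so it is the repeat requests that tell you whether the cache is actually doing its job.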



I hadn't chosen to install xVarnish and was surprised to learn my sites were using it. All xVarnish does is cache the resource in memory after first execution so that when subsequent identical requests are received it need not process the file again; it doesn't even have to access the physical disk! But Varnish was not caching responses that came from static files, so requests to .html pages did have to access the disk and thus were taking longer.
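
The principle is simple enough to show as a toy. To be clear, this is not how Varnish is implemented, and the class and its names are invented for the example; it only illustrates "do the work once, then answer from memory".

```python
from typing import Callable, Dict


class InMemoryCache:
    """Toy response cache sitting in front of a slower backend."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self._backend = backend            # e.g. "run PHP" or "read from disk"
        self._store: Dict[str, str] = {}   # path -> cached response body
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        if path in self._store:
            self.hits += 1                 # served straight from memory
            return self._store[path]
        self.misses += 1                   # first request pays the full cost
        body = self._backend(path)
        self._store[path] = body
        return body
```

A real cache also has to decide when a stored copy is stale and which requests are safe to cache at all, which is exactly the sort of configuration that was skipping my .html files.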

Essentially any win I had made by using static files was being undermined by the existence of an in-memory cache that was doing an even better job of solving the problem. I'd been busy improving the aerodynamics around the wheel arches but hadn't noticed my competitor had fitted a nitrous oxide system. Or if you don't like motoring metaphors: I'd been so busy trying to find a less heavy pair of jeans I hadn't noticed my competitor was naked. Or if you want it a bit more Zen: He who looks too close at the size of the fallen leaves, fails to see the stream upon which they float.

Summary: Static site generation is still good but in context

You gotta make sure the static files are benefiting from all the bells and whistles that your server is providing to the dynamic ones. If your Varnish settings are not bothering with .html files you gotta sort that out before anything else!

Moral of the story: Measure in the live environment regularly. Don't assume tech trends are necessarily good for your particular case. Also, keep your local dev environment as close to production as possible.

Side lesson: If you hear a bad person mention a tool or technology, research that technology objectively and try not to let your opinion of the thing they introduced you to become associated with your opinion of them as a person.

But above all, keep your blog posts short.