I cofounded a business and spent about 4 years developing a web based product between 2013 and 2017. In this post, I’m writing down my thoughts on the technologies and development practices used for this product, as a kind of retrospective and notes to myself.
One of my goals was to keep a lid on complexity due to the extremely constrained resources (at the start, I was the only developer, although later the team grew to four developers). This dictated a lot of the choices that I made, and I’m glad to say that most of them paid off.
After making a brief start with Ruby on Rails, I realised that all the server would be doing was providing a JSON API to clients implemented as single-page web apps. This meant that Rails had a lot of unnecessary complexity and too many batteries included. Node.js seemed like a lighter alternative, with the added benefit of being able to use a single language (CoffeeScript) throughout the code base.
I switched to Node, and despite having to learn it on the fly, while delivering prototypes to show to the potential customer, I didn’t regret the switch. The combination of Node + Express.js provided a simple solution, worked really well, and was scalable on Heroku.
There are many database choices these days; however, I’d had positive experiences with Postgres, so I chose to use it again. I never had to wonder whether this was the right choice. I think the combination of Postgres with PostGIS is still the best solution for GIS data.
There are some intriguing choices of extensions and DBs built on top of Postgres specifically for time series data. They weren’t available when I started the project, but anyway, we’d be able to migrate if we needed to.
We had some performance issues as the volume of data we handled increased, but they were primarily driven by the lack of engineering resources and resource constraints caused by Heroku rather than inherent limitations of Postgres. Some of the issues got solved quite easily with a bit of query plan analysis and query optimisation.
Up to that point, most of my experience was without ORMs: I had used either query builders or plain SQL. More recently, I had used ActiveRecord, and of course various ORMs were quite popular at that point.
However, in the interest of keeping things simple, I decided not to use an ORM, and instead write plain SQL and do as much work as possible in the database (this was facilitated by choosing Postgres which is extremely capable). The migrations were also written as plain SQL.
This also turned out to be a success. It allowed me to take a good chunk of complexity out of the system. I just had a few simple functions to convert results from Postgres into JSON (so most endpoints were basically a single function call with a query and a few params), provide transaction support etc.
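As an illustration only (this is not the project's actual code, which was written in CoffeeScript), helpers of this kind might look roughly like the TypeScript sketch below. The `Db` interface, the function names and the `sites` table are all hypothetical, and the transaction helper assumes `db` represents a single connection rather than a pool.

```typescript
// Hypothetical minimal interface for a Postgres connection; it stands in for
// whatever client library is actually used.
interface Db {
  query(sql: string, params: unknown[]): Promise<{ rows: Record<string, unknown>[] }>;
}

// Run one parameterised query and return its rows serialised as JSON.
async function queryToJson(db: Db, sql: string, params: unknown[] = []): Promise<string> {
  const result = await db.query(sql, params);
  return JSON.stringify(result.rows);
}

// Run a unit of work between BEGIN and COMMIT, rolling back on any error.
// Assumes `db` is one connection, so all statements share the transaction.
async function inTransaction<T>(db: Db, work: (db: Db) => Promise<T>): Promise<T> {
  await db.query("BEGIN", []);
  try {
    const value = await work(db);
    await db.query("COMMIT", []);
    return value;
  } catch (err) {
    await db.query("ROLLBACK", []);
    throw err;
  }
}

// An endpoint then reduces to a single call with a query and a few params.
// The table and column names here are made up for the example.
function listSites(db: Db, customerId: number): Promise<string> {
  return queryToJson(
    db,
    "SELECT id, name FROM sites WHERE customer_id = $1 ORDER BY id",
    [customerId],
  );
}
```

The point of the sketch is how little sits between the HTTP handler and the SQL: the handler passes parameters in and sends the JSON string back.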
CTEs go a long way towards keeping queries readable - we had queries in the range of 100-200 lines (I’d say 200 lines is the upper limit for a single query, it becomes too complex to deal with otherwise). I think the corresponding CoffeeScript would have been more lines, and additionally, doing all the work in a single query meant we only needed a single roundtrip to the database per request.
We used PostGIS a lot, and dealing with that via an ORM would have been painful. We could rely on Postgres support for time zones without jumping through any hoops. When a query started becoming too complex, we could break it up into pieces and move some of the calculations into application code.
Not having an ORM also made it easy to add support for DB followers and to route queries to the master or the follower as we pleased. I suspect an ORM would have made us jump through hoops to achieve it.
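A minimal sketch of what explicit routing could look like, assuming each database is wrapped in an object with a `query` method. The `Conn` interface and `makeRouter` are hypothetical names for this example, not the project's code.

```typescript
// Hypothetical connection interface; stands in for a real Postgres pool.
interface Conn {
  label: string;
  query(sql: string, params: unknown[]): Promise<unknown[]>;
}

type Target = "master" | "follower";

// Build a query function in which every call site names its target explicitly.
function makeRouter(master: Conn, follower: Conn) {
  return (target: Target, sql: string, params: unknown[] = []): Promise<unknown[]> => {
    const conn = target === "follower" ? follower : master;
    return conn.query(sql, params);
  };
}
```

With plain SQL, the choice is one extra argument per query: heavy read-only queries can go to the follower, while writes and reads that must see the latest data go to the master.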
I also discussed ORMs in another post.
I can’t remember how prominent the alternatives to Express.js were. Koa was probably around. Anyway, Express.js worked fine for us. At some point we transitioned from 3.x to 4.x and it was mostly painless. The conceptual simplicity of Express meshed well with the rest of the project. I still quite like how few concepts are introduced in Express.
For the most part, Heroku worked as promised, so I still subscribe to the idea of abstracting away the computational resources as a way of freeing up developer time when it’s the primary constraint. Sure, it may get expensive as the product grows, but hiring developers is also expensive!
One limitation I found was that DB performance on Heroku was a lot worse than I’d expected. I had to add a follower DB and optimise queries a lot earlier than I would have liked. On the other hand, having essentially zero devops was a huge plus as we could mostly focus on development; logging and monitoring integration was also easy.
Nowadays, the computational resources can be abstracted even more via things like AWS Lambda, so that would definitely be worth exploring.
I initially looked for existing Express session middleware, but all of it was awkward for my purposes: I needed Postgres support, and what I found created a session table with different naming conventions etc. In the end, realising that my requirements for sessions were actually quite simple, I wrote my own implementation, and it worked just fine.
I remember spending some time reading up on possible API versioning schemes (should the version be in the URL? In the headers? Before or after specifying the resource? How does it align with REST?). I chose to go with a global (i.e. not per resource) version in the URL, and it didn’t cause any major issues over the course of a few years.
My implementation had potential downsides because the way I did it was to copy all the routes and corresponding handlers when adding a new version, and then start changing them in the new version. But the rest of the code was shared, which could become an issue (although in practice it didn’t).
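One way to picture this scheme is as a route table per version, where a new version starts as a copy of the previous one and then diverges. The sketch below is an assumption about the general shape, not the actual implementation; the route names, `loadSite` and `resolveRoute` are invented for the example.

```typescript
type Handler = (params: { id: string }) => unknown;
type RouteTable = Record<string, Handler>;

// Shared code used by handlers in every version (the part that is NOT copied).
function loadSite(id: string): { id: string; name: string } {
  return { id, name: "Site " + id };
}

const v1: RouteTable = {
  "GET /sites/:id": (p) => loadSite(p.id),
};

// A new version begins as a copy of the old routes, then individual
// handlers are replaced as the API changes.
const v2: RouteTable = {
  ...v1,
  "GET /sites/:id": (p) => ({ site: loadSite(p.id), apiVersion: 2 }),
};

const versions: Record<string, RouteTable> = { v1, v2 };

// Look up the handler for a URL version such as "v2"; undefined if unknown.
function resolveRoute(version: string, route: string): Handler | undefined {
  const table = versions[version];
  return table ? table[route] : undefined;
}
```

The downside mentioned above is visible here: both versions call the same `loadSite`, so a change to shared code changes the behaviour of every version at once.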
It sounds quaint now, but when I started the project, the async library was quite popular; it provided various ways of combining asynchronous function calls. Despite that, I chose the newfangled promises, specifically the Bluebird library.
I chose to go with promises because of their composability. Luckily, Bluebird is an amazing project that has been maintained and developed over the years, and has excellent documentation.
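To make "composability" concrete, here is a small sketch using native promises rather than Bluebird. The two lookup functions are hypothetical stand-ins for asynchronous calls.

```typescript
// Stand-ins for asynchronous calls (hypothetical names, not from the project).
function fetchUser(id: number): Promise<{ id: number; siteIds: number[] }> {
  return Promise.resolve({ id, siteIds: [1, 2] });
}

function fetchSite(id: number): Promise<string> {
  return Promise.resolve("site-" + id);
}

// Each step returns a promise, so a sequential step (then) and a parallel
// fan-out (Promise.all) combine into one value without nested callbacks.
// A rejection in any step propagates to whoever consumes the result.
function siteNamesForUser(userId: number): Promise<string[]> {
  return fetchUser(userId).then((user) => Promise.all(user.siteIds.map(fetchSite)));
}
```

Because `siteNamesForUser` itself returns a promise, it can be used as a step in a larger chain in exactly the same way.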
One weird issue that I had with promises somehow acquiring an ever-growing list of then handlers and subsequently causing a memory leak was fixed by upgrading from Bluebird 2.x to 3.x.
The mobile client we built had to work offline, which meant that it needed to cache data. The amount of data was significant for cellular connections, so I had to send partial updates whenever the server data changed. The initial implementation relied on sync timestamps and was of course buggy (see my PostgreSQL book for a discussion of why that approach doesn’t work). I had to introduce extra checks based on hashing IDs and record timestamps. If client hashes didn’t agree with the server hashes, the server would send the full dataset. This held up well due to the nature of data changes: the individual records could contain a lot of data, but updates were infrequent. However, this wasn’t a foolproof, scalable solution (and I didn’t get to implement one).
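The post doesn't spell out the exact protocol, so the following is only a sketch of the general idea under my own assumptions: the server sends records changed since the client's last sync together with a digest of the IDs and timestamps of its full dataset, and a mismatch after merging triggers a full download. All names are hypothetical, and the hash is a simple non-cryptographic one chosen to keep the example self-contained.

```typescript
interface Rec { id: number; updatedAt: number; payload: string }
interface Delta { changes: Rec[]; serverDigest: string }

// 32-bit polynomial rolling hash over "id:updatedAt" pairs, sorted by id.
function digestOf(records: Rec[]): string {
  const text = records
    .slice()
    .sort((a, b) => a.id - b.id)
    .map((r) => r.id + ":" + r.updatedAt)
    .join(",");
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) >>> 0; // keep the value within 32 bits
  }
  return h.toString(16);
}

// Server side: the timestamp-based partial update plus a digest of everything.
function buildDelta(serverRecords: Rec[], lastSync: number): Delta {
  return {
    changes: serverRecords.filter((r) => r.updatedAt > lastSync),
    serverDigest: digestOf(serverRecords),
  };
}

// Client side: merge the changes into the cache, then compare digests.
// A mismatch means the partial update was not enough and the full
// dataset should be fetched instead.
function applyDelta(cache: Rec[], delta: Delta): { cache: Rec[]; needsFullSync: boolean } {
  const changedIds = delta.changes.map((r) => r.id);
  const merged = cache
    .filter((r) => changedIds.indexOf(r.id) === -1)
    .concat(delta.changes);
  return { cache: merged, needsFullSync: digestOf(merged) !== delta.serverDigest };
}
```

One case that timestamps alone miss is a record deleted on the server: it never shows up in `changes`, but it makes the client's digest differ, so the fallback kicks in.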
The unfortunate thing about jQuery Mobile was that its development stalled some time after 2012. On the other hand, it didn’t have a lot of issues, mostly got out of the way, and provided a bunch of useful widgets which we combined into a decent looking UI. I wouldn’t use it on a new project, and the migration would be painful. Then again, in 2017 there still wasn’t a lot to migrate to unless you wanted to buy into a framework like React, which is a whole lot more than just a set of UI widgets.
To begin with, I used Backbone as a way of structuring UI, because jQuery by itself was way too messy and at the time Backbone was still fairly popular. It was good because it provided the concept of views, although later it got phased out.
Based on prior experience, I chose to use SCSS rather than plain CSS. It worked as advertised and didn’t cause any trouble. A+, would use again.
Back when we started, Mapbox was tiny and virtually unknown. But it did what we needed (in particular, it allowed us to upload our customer’s drone imagery and overlay it on the map), and it was cheaper than Google Maps. It acquired a lot more features over the years, although Mapbox marketers were definitely prone to overhyping barely-alpha API releases. We started moving to WebGL maps when they became available because it was the only pathway towards showing really large amounts of data (thousands of points, multiple megabytes when represented in KML format).
Back in 2009, I concluded that an SPA was architecturally superior to the traditional method of serving separate HTML pages from the backend; I still think so. While there is overhead in terms of having a separate framework and a separate app architecture on the front end, overall it makes managing state easier, allows for more complex UIs, sidesteps latency issues (relevant when hosting an app on Heroku for customers in Australia and New Zealand) and generally makes a broader range of functionality possible. Working offline is one example - it was an essential capability for our mobile client.
Initially I was quite pleased with Cordova as it was more straightforward compared to Steroids - a framework on top of Cordova which I’d started out with. I developed a number of reservations about Cordova after working with it for a few years (see this post) but overall it served its purpose of allowing us to develop an application that would work on both iOS and Android with the smallest possible development effort, and being able to reuse code between the web and mobile applications.
There were a few options for “hybrid” apps at the time but I’m pretty sure that Cordova was the best choice as it was the most popular and the most mature, and it’s still being developed and used.
Storing the data in plain files held up well and was a very simple solution (don’t underestimate the benefits of simplicity!). However, a “proper” solution would have been to use SQLite, which would have had advantages once we started doing more complicated things with the data. I could have used SQLite from the start, but the plain files were a holdover from the prototype phase that stayed because the advantage of SQLite would have been limited.
For a while, we used Underscore.js, until I saw a talk about Ramda and was convinced that it was a more powerful tool. The premise of Ramda was that Underscore.js functions took arguments in the wrong order (data first rather than last), and Ramda fixed that. The argument order goes hand in hand with currying, because without partial application there would be no way to realise the benefit of the argument order. It seems like a small thing, but it made a big difference: it allowed me to compose functions and build pipelines of data transformations easily. That eventually led to a broader transition towards functional programming throughout the code base, which in turn allowed us to solve fairly complex issues with a small amount of code and a fairly small amount of debugging and bug fixing.
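The effect of argument order is easier to see in code. The sketch below uses hand-rolled curried helpers in the spirit of Ramda (not Ramda's own implementation); `Reading`, `totalForSensor` and `pipe3` are names invented for the example.

```typescript
// Data-last, curried helpers: the function comes first and the data last,
// so supplying only the function yields a reusable transformation.
// (Underscore's _.map(list, fn) takes the data first, which prevents this.)
function map<A, B>(fn: (a: A) => B): (xs: A[]) => B[] {
  return (xs) => xs.map(fn);
}

function filter<A>(pred: (a: A) => boolean): (xs: A[]) => A[] {
  return (xs) => xs.filter(pred);
}

function sum(xs: number[]): number {
  return xs.reduce((acc, x) => acc + x, 0);
}

// Left-to-right composition of three unary functions.
function pipe3<A, B, C, D>(f: (a: A) => B, g: (b: B) => C, h: (c: C) => D): (a: A) => D {
  return (a) => h(g(f(a)));
}

interface Reading { sensor: string; value: number }

// A pipeline assembled entirely from partially applied functions;
// the data (an array of readings) is supplied last, by the caller.
function totalForSensor(sensor: string): (readings: Reading[]) => number {
  return pipe3(
    filter((r: Reading) => r.sensor === sensor),
    map((r: Reading) => r.value),
    sum,
  );
}
```

For example, `totalForSensor("a")` is itself a function that can be passed around, stored, or used as a step in a longer pipeline before any data exists.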
The move to Redux happened later in the game, mostly because Redux didn’t appear on my radar until 2016, I think. The change was motivated by developing a complex piece of UI which allowed replay of recorded data with filters, different speeds, a slider to jump to different times of day etc. It was daunting to implement it without some kind of state management system, so this was a great opportunity to try out Redux. Redux made the UI surprisingly robust - I was able to keep adding features with hardly any breakage.
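For a rough idea of why this style is robust, here is what a Redux-style reducer for such a replay UI might look like. It is written without the Redux library, and the state shape and action names are my assumptions, not the project's.

```typescript
interface ReplayState {
  playing: boolean;
  speed: number;     // playback speed multiplier
  timeOfDay: number; // slider position, in seconds since midnight
  filters: string[]; // names of the active filters
}

type ReplayAction =
  | { type: "PLAY" }
  | { type: "PAUSE" }
  | { type: "SET_SPEED"; speed: number }
  | { type: "SEEK"; timeOfDay: number }
  | { type: "TOGGLE_FILTER"; filter: string };

const initialReplayState: ReplayState = { playing: false, speed: 1, timeOfDay: 0, filters: [] };

// A pure function: it never mutates the old state, it returns a new one.
function replayReducer(state: ReplayState, action: ReplayAction): ReplayState {
  switch (action.type) {
    case "PLAY":
      return { ...state, playing: true };
    case "PAUSE":
      return { ...state, playing: false };
    case "SET_SPEED":
      return { ...state, speed: action.speed };
    case "SEEK":
      return { ...state, timeOfDay: action.timeOfDay };
    case "TOGGLE_FILTER": {
      const active = state.filters.indexOf(action.filter) !== -1;
      const filters = active
        ? state.filters.filter((f) => f !== action.filter)
        : state.filters.concat([action.filter]);
      return { ...state, filters };
    }
    default:
      return state; // unknown actions leave the state untouched
  }
}
```

Every state change goes through this one function, so adding a feature usually means adding an action and a case, without touching the existing ones.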
Both Ramda and Redux allowed us to write code in a more functional style, and to dream about having a fully FP code base. The gradual move in that direction was motivated by the complexity of managing state and the resulting bugs.
We started realising benefits in complex reports and complex UI; both Ramda and Redux made the code easier to manage. At one point, I was able to develop a report which was roughly twice as complex as the previous one, and I did it with very few bugs - practically all the tweaks were about refining the requirements, not bug fixes. It was a real surprise to me (a pleasant one!).
While the technology choices I made worked out quite well, overall it wasn’t smooth sailing at all! Technology where we had no choice caused significant problems. For example, we struggled a lot with keeping our app running in the background on iOS. The Bluetooth beacons that we used also didn’t perform anything like the marketing spiel suggested. Bluetooth didn’t work properly on some Android phones. Some cheap Android phones would even report an invalid GPS position. I could go on!
However, in terms of decision making, this experience reinforced my belief in the importance of limiting complexity - not just when writing code, but in a broader sense: when choosing technologies to use, and even when setting up development processes or setting the direction of the product.
I also remained in favour of PaaS products as a way of saving developer time, I kept my faith in the capabilities of PostgreSQL (I later wrote a book based on my work with Postgres), and I became very interested in functional programming (to the extent of writing a book about it).