Generating sitemaps with Elm
2020-02-25

I’m using Elm to create a website, and one thing websites need is a sitemap to help search engines index the content. A sitemap is an XML file with a list of URLs representing all the different “pages” on the site. An entry looks like this:

<url>
  <loc>https://example.com/</loc> 
  <changefreq>monthly</changefreq>
</url>

One option is to update this file manually as the set of pages changes, but this would create duplication (as all page URLs are effectively defined by my Elm code) and it would of course be error-prone (I don’t want to have to remember to update the sitemap).

So I wanted to find a way to generate the sitemap directly from my Elm code. Conveniently, I’ve already defined the page links in a single list for the purposes of displaying navigation:

navigationLinks : List NavItem
navigationLinks =
    [ NavGroup
        { label = text "For drivers"
        , items =
            [ { url = "/", label = text "The app" }
            , { url = "/rent-ev", label = text "Rent an EV" }
            ]
        }
    , NavGroup
        { label = text "For businesses"
        , items =
            [ { url = "/why-support-evs", label = text "Why support EVs?" }
            , { url = "/terms-and-conditions", label = text "Terms, privacy and details" }
            ]
        }
    , NavLink { url = "/contact", label = text "Contact" }
    , NavLink { url = "/licences", label = text "Licences" }
    ]

I’d be using Node to generate the sitemap, which meant that somehow I needed to make this information available to JavaScript.

One option would be to extract navigationLinks into a module, read in the .elm file from JavaScript and parse out the links. However, that seemed a bit messy.

Another option would be to define the links in a JavaScript file instead, and pass them in via flags. This way, I’d be able to import the links into my sitemap generation code. However, then I’d need to write a bunch of extra code to convert flags back into navigationLinks, so this wasn’t ideal either.
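For illustration, the JavaScript-first option might look roughly like the sketch below. The `links` structure and the `extractPaths` helper are hypothetical names, not code from the site; the point is only that the links would become plain data which the sitemap script could flatten directly.

```javascript
// Hypothetical links.js: navigation defined as plain data instead of Elm values.
// Labels are plain strings here because `text` nodes only exist on the Elm side.
const links = [
    { group: "For drivers", items: [
        { url: "/", label: "The app" },
        { url: "/rent-ev", label: "Rent an EV" }
    ] },
    { group: "For businesses", items: [
        { url: "/why-support-evs", label: "Why support EVs?" },
        { url: "/terms-and-conditions", label: "Terms, privacy and details" }
    ] },
    { url: "/contact", label: "Contact" },
    { url: "/licences", label: "Licences" }
]

// Flatten groups and standalone links into a single list of paths
function extractPaths(navItems) {
    return navItems.flatMap((item) =>
        Array.isArray(item.items) ? item.items.map((i) => i.url) : [item.url]
    )
}

console.log(extractPaths(links))
```

The same data would still have to be passed to Elm as flags and decoded back into NavItem values, which is the extra code this option requires.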

What if I could generate the sitemap XML directly in Elm? After all, I already generate HTML from Elm code in Elmstatic.

The XML generation part is easy because the html package can be pressed into service to produce XML tags:

module Sitemap exposing (main)

import Browser
import Html exposing (..)
import Html.Attributes exposing (..)
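-- NavItem(..) brings the NavLink and NavGroup constructors into scope
-- for the pattern match in extractUrls
import Main exposing (NavItem(..), navigationLinks)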

paths =
    let
        extractUrls navItem =
            case navItem of
                NavLink { url } ->
                    [ url ]

                NavGroup { items } ->
                    List.map .url items
    in
    navigationLinks
        |> List.concatMap extractUrls

urlNode siteUrl path =
    node "url"
        []
        [ node "loc" [] [ text <| siteUrl ++ path ]
        , node "changefreq" [] [ text "monthly" ]
        ]

main =
    Browser.document
        { init = \siteUrl -> ( siteUrl, Cmd.none )
        , view =
            \siteUrl ->
                { title = ""
                , body =
                    [ node "urlset" [ attribute "xmlns" "http://www.sitemaps.org/schemas/sitemap/0.9" ] <|
                        List.map (urlNode siteUrl) paths
                    ]
                }
        , update = \msg model -> ( model, Cmd.none )
        , subscriptions = \_ -> Sub.none
        }

When the Sitemap app is instantiated, it will populate the <body> of the HTML document with the sitemap XML.

But how could this content be extracted and written to disk? Here, I reused the approach from Elmstatic, mainly because I already had the code. I could probably just require elm.js and instantiate the Sitemap app, but then I’d need to add ports to the app to get the sitemap out, which would be extra work.

So here is the Node side of things, sitemap.js, which turned out to be really straightforward:

const Fs = require("fs-extra")
const JsDom = require("jsdom").JSDOM
const { Script } = require("vm")

function generateXml(elmJs, siteUrl) {
    const script = new Script(`
    ${elmJs}; let app = Elm.Sitemap.init({flags: "${siteUrl}"})
    `)
    const dom = new JsDom(`<!doctype html><html><body></body></html>`, {
        runScripts: "outside-only"
    })

    dom.runVMScript(script)
    if (dom.window.document.title == "error") 
        throw new Error(`Error:\n${dom.window.document.body.firstChild.attributes[0].value}`)
    else 
        return `<?xml version="1.0" encoding="UTF-8"?>${dom.window.document.body.innerHTML}`
}

console.log("Generating new sitemap")
const elmJs = Fs.readFileSync("./sitemap-elm.js").toString() 
const newSitemap = generateXml(elmJs, process.env.URL)
Fs.writeFileSync("public/sitemap.xml", newSitemap)

generateXml relies on the jsdom package to execute a JavaScript program made up of two parts: the JavaScript produced by compiling Sitemap.elm, and an extra line that instantiates the app (let app = Elm.Sitemap.init({flags: "${siteUrl}"})). If the document title comes back as "error", the function throws instead of returning XML.
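The mechanism can be tried without jsdom or a compiled Elm file. The sketch below uses only Node's built-in vm module; `fakeElmJs` and the hand-written `document` object are made-up stand-ins for the compiled output and for the DOM that jsdom provides. It also passes the flag through JSON.stringify, an optional variation (not what sitemap.js does) that keeps the generated line valid even if the URL contains a quote character.

```javascript
const vm = require("vm")

// Stand-in for the compiled Elm output (hypothetical): it defines a global
// Elm object whose init function writes markup into document.body.
const fakeElmJs = `
var Elm = {
    Sitemap: {
        init: function (options) {
            document.body.innerHTML = "<urlset><url><loc>" + options.flags + "/</loc></url></urlset>"
        }
    }
}
`

function runInSandbox(elmJs, siteUrl) {
    // Minimal fake DOM; jsdom supplies the real thing in sitemap.js
    const sandbox = { document: { title: "", body: { innerHTML: "" } } }
    vm.createContext(sandbox)

    // Compiled code followed by one extra line that starts the app
    const script = new vm.Script(
        `${elmJs}; let app = Elm.Sitemap.init({ flags: ${JSON.stringify(siteUrl)} })`
    )
    script.runInContext(sandbox)
    return sandbox.document.body.innerHTML
}

console.log(runInSandbox(fakeElmJs, "https://example.com"))
```

Whatever the script writes into the sandboxed document is then available to the surrounding Node code, which is all generateXml needs.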

Then it’s just a matter of reading the contents of <body> from the associated DOM object, and writing it to disk.

Pinging the search engines from Netlify

This part isn’t Elm-related but I thought I’d include it for completeness (and for my own future reference).

It appears that when a sitemap is updated, it’s useful to ping the search engines to let them know of any new pages, by making a GET request like this:

http://www.google.com/ping?sitemap=https://example.com/sitemap.xml
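Because the sitemap address is itself a URL passed inside a query string, it can be percent-encoded to avoid ambiguity. A minimal sketch using Node's built-in URL class; `buildPingUrl` is a hypothetical helper name:

```javascript
// Build a ping URL with the sitemap address percent-encoded as a query parameter
function buildPingUrl(pingEndpoint, sitemapUrl) {
    const url = new URL(pingEndpoint)
    url.searchParams.set("sitemap", sitemapUrl)
    return url.toString()
}

console.log(buildPingUrl("http://www.google.com/ping", "https://example.com/sitemap.xml"))
// prints "http://www.google.com/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml"
```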

But there is a bit of a problem: according to Google’s documentation, I’m only allowed to do that when the sitemap has actually changed. The script above generates the sitemap but tells me nothing about whether it’s different from before. Besides, as my site is deployed to Netlify, I needed to ping the search engines only on a successful deploy; if I pinged during the build and the deployment then failed, I’d be pinging the search engines with an unchanged sitemap. This meant that I’d need to ping them after the deploy completed, which was a bit of a conundrum because at that point I wouldn’t have access to the filesystem of the build or the previous version of the sitemap (for comparison).

The solution I came up with was to use Netlify Functions, which allowed me to run a bit of JavaScript triggered by a successful deploy.

In order for this additional script to know that the sitemap has in fact changed, in sitemap.js I had to output an extra file, sitemap-updated, when the sitemap was different from the previous version:

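// Additional imports needed at the top of sitemap.js for this part
const Axios = require("axios")
const R = require("ramda")

console.log("Retrieving current sitemap")
Axios.get(process.env.URL + "/sitemap.xml", { responseType: "text" })
    .then((res) => {
        // Plain string inequality is enough to detect a change
        if (res.data !== newSitemap) {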
            console.log("The sitemap has changed, writing sitemap-updated")
            // The presence of this file tells the post-deploy script
            // that it should ping the search engines
            Fs.writeFileSync("public/sitemap-updated", "true")             
        }
        else {
            console.log("Sitemap unchanged")
        }                    
    })
    .catch((err) => {
        if (!R.isNil(err.response) && err.response.status == 404) {  
            // Assume that the sitemap hasn't been generated yet
            console.log("No sitemap on the website, writing sitemap-updated")
            Fs.writeFileSync("public/sitemap-updated", "true")             
        }
        else {
            console.error("Failed to get sitemap: ", err)
            process.exitCode = 1
        }
    })
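The branching above boils down to a small decision table, which is easier to check when separated from the HTTP call. In this sketch, `decideMarker` is a hypothetical pure function that takes the outcome of the request (a status code and, when available, a body) and reports what sitemap.js should do:

```javascript
// Decide what to do given the outcome of fetching the currently deployed sitemap.
// `status` is the HTTP status, or null if no response was received at all.
function decideMarker(status, currentSitemap, newSitemap) {
    if (status === 200) {
        return currentSitemap === newSitemap ? "unchanged" : "write-marker"
    }
    if (status === 404) {
        // Nothing deployed yet, so the new sitemap counts as a change
        return "write-marker"
    }
    // Any other failure: we can't tell, so fail the build step
    return "fail"
}

console.log(decideMarker(200, "<urlset>a</urlset>", "<urlset>a</urlset>")) // unchanged
console.log(decideMarker(200, "<urlset>a</urlset>", "<urlset>b</urlset>")) // write-marker
console.log(decideMarker(404, null, "<urlset>b</urlset>"))                 // write-marker
console.log(decideMarker(null, null, "<urlset>b</urlset>"))                // fail
```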

The final step was to write the post-deploy script (based on a Netlify tutorial):

const Axios = require("axios")
const R = require("ramda")

const contextCondition = "production"
const stateCondition = "ready"
const sitemapUrl = process.env.URL + "/sitemap.xml"
const sitemapUpdatedUrl = process.env.URL + "/sitemap-updated"


exports.handler = (event, context, callback) => {
    try {
        const {payload} = JSON.parse(event.body)
        const {state, context} = payload

        if (state === stateCondition && context === contextCondition) {
            Axios.get(sitemapUpdatedUrl)
                .then(() => {
                    console.log(`Sending sitemap pings for ${sitemapUrl}`)
                    return Axios.all([
                        Axios.get(`http://www.google.com/ping?sitemap=${sitemapUrl}`),
                        Axios.get(`http://www.bing.com/ping?sitemap=${encodeURIComponent(sitemapUrl)}`)
                    ])
                        .then(() => {
                            console.log("Submitted sitemap successfully")
                            return callback(null, {statusCode: 200, body: "Submitted sitemap successfully"})
                        })

                })
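                .catch((err) => {
                    if (!R.isNil(err.response) && err.response.status == 404) {
                        // Assume the sitemap is unchanged
                        console.log("Sitemap unchanged, not submitting")
                        return callback(null, {statusCode: 200, body: "Sitemap unchanged, not submitting"})
                    }
                    else {
                        console.error("Failed to get sitemap-updated: ", err)
                        // Signal the failure explicitly instead of ending without a response
                        return callback(err)
                    }
                })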
        }
        else {
            console.log("Conditions not met, not submitting sitemap")
            return callback(null, {statusCode: 200, body: "Conditions not met, not submitting sitemap"})
        }
    } catch (err) {
        console.log(err)
        throw err
    }
}

As a result, the sitemap would only be submitted on a successful deploy and only if it changed from the previous deploy.
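The gating logic in the handler can be exercised locally with a hand-made event. The sketch below assumes only what the handler already relies on, namely that event.body is a JSON string containing a payload with state and context fields; `shouldCheckForUpdate` and the sample events are made up for illustration.

```javascript
// Returns true when the deploy event describes a finished production deploy
function shouldCheckForUpdate(event) {
    try {
        const { payload } = JSON.parse(event.body)
        return payload.state === "ready" && payload.context === "production"
    } catch (err) {
        // Malformed or missing body: treat as "conditions not met"
        return false
    }
}

// Hypothetical sample events, shaped like the payload the handler reads
const productionDeploy = {
    body: JSON.stringify({ payload: { state: "ready", context: "production" } })
}
const previewDeploy = {
    body: JSON.stringify({ payload: { state: "ready", context: "deploy-preview" } })
}

console.log(shouldCheckForUpdate(productionDeploy))     // true
console.log(shouldCheckForUpdate(previewDeploy))        // false
console.log(shouldCheckForUpdate({ body: "not json" })) // false
```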
