Generating elm/json decoders with elm/json decoders

I’ve recently released json2elm, a tool which allows you to generate elm/json JSON decoders and encoders (along with relevant type definitions) from a sample JSON string.

A curiously self-referential aspect of json2elm is that it uses elm/json decoders to generate elm/json decoders. How does that work?

The thing about elm/json is that it lets you decode JSON into arbitrary Elm data structures. In my case, the input itself is arbitrary: I need to parse any JSON string into a data structure. This means I need a data structure that can represent the structure of any JSON value, rather than something specific like User or StockItem.

The JSON specification in RFC 8259 is a good place to look for the most generic description of a JSON value. There, a value is defined as:

value = false / null / true / object / array / number / string

So there are four primitives (null, Boolean, Number, String) and two structures (Array and Object) which can in turn contain JSON values.

I can represent this as a custom Elm type:

type JsonValue
    = JString String
    | JFloat Float
    | JBool Bool
    | JNull
    | JList (List JsonValue)
    | JObj (List (String, JsonValue))

Note that this type is defined in terms of itself, which is fine for a custom type.

Correspondingly, I can define an elm/json decoder to produce a JsonValue:

import Json.Decode as Decode

jsonDecoder : Decode.Decoder JsonValue
jsonDecoder =
    Decode.oneOf
        [ Decode.map JString Decode.string
        , Decode.map JFloat Decode.float
        , Decode.map JBool Decode.bool
        , Decode.null JNull
        -- Recursive cases: Decode.lazy delays the reference to jsonDecoder
        , Decode.map JList <| Decode.list <| Decode.lazy (\_ -> jsonDecoder)
        , Decode.map JObj <| Decode.keyValuePairs <| Decode.lazy (\_ -> jsonDecoder)
        ]

Decode.oneOf tries the decoders in the list it's given, one by one, until one of them succeeds. So here, instead of decoding something specific like a user name, I’m giving oneOf a list of decoders that covers every possible kind of JSON value (and matches the JsonValue type). Decode.map applies a function to the result of a decoder, which I’m using to wrap each result in the appropriate JsonValue constructor.

In the last two decoders, for lists and objects, I have to use jsonDecoder recursively because a list or an object can in turn contain any kind of JSON value. Since a value isn’t allowed to be defined directly in terms of itself, I have to wrap jsonDecoder with Decode.lazy (\_ -> jsonDecoder).

That’s all I need to start parsing JSON strings:

import Html exposing (div, text)

sample = """
[ 1, "str", { "id": 1, "name": "Pete" } ]
"""

main =
    case Decode.decodeString jsonDecoder sample of
        Err err ->
            div [] [ text <| Decode.errorToString err ]

        Ok tree ->
            div []
                [ text <| Debug.toString tree ]

-- Output: JList [JFloat 1,JString "str",JObj [("id",JFloat 1),("name",JString "Pete")]]

The next step is to turn values like this into bits of Elm code that describe decoders. For the sample JSON in the example above, assuming we call the top-level value MyList, the decoders should look something like this:

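import Json.Decode

-- Type definitions these decoders assume (reconstructed for this example)
type MyList
    = MyList0 Entity
    | MyList1 Int
    | MyList2 String

type alias Entity =
    { id : Int
    , name : String
    }

decodeMyList : Json.Decode.Decoder (List MyList)
decodeMyList =
    Json.Decode.list decodeMyListMember

decodeMyListMember : Json.Decode.Decoder MyList
decodeMyListMember =
    Json.Decode.oneOf
        [ Json.Decode.map MyList0 <| decodeEntity
        , Json.Decode.map MyList1 <| Json.Decode.int
        , Json.Decode.map MyList2 <| Json.Decode.string
        ]

decodeEntity : Json.Decode.Decoder Entity
decodeEntity =
    Json.Decode.map2 Entity
        (Json.Decode.field "id" Json.Decode.int)
        (Json.Decode.field "name" Json.Decode.string)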

Because the JsonValue type is recursive, a JsonValue represents a tree. If you look closely at the decoders above, they mirror the tree structure of the decoded JSON:

  • decodeMyList is produced from the root node,
  • decodeMyListMember deals with the first level of child nodes, one of which happens to be an object, and
  • decodeEntity decodes that object (a second-level child node).

So essentially, by traversing the tree recursively and outputting specific strings for each kind of node, I can produce the decoders.
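To make the traversal idea concrete, here is a minimal sketch, not json2elm's actual code, that walks a JsonValue and emits the text of a decoder expression for each kind of node. The names decoderFor, fieldDecoder and mapFunction, the placeholder constructor Entity, and the choices made for whole numbers and null are all assumptions for illustration. It deliberately ignores naming and heterogeneous arrays.

```elm
module DecoderGen exposing (JsonValue(..), decoderFor)

-- Same shape as the JsonValue type defined earlier in the post


type JsonValue
    = JString String
    | JFloat Float
    | JBool Bool
    | JNull
    | JList (List JsonValue)
    | JObj (List ( String, JsonValue ))


{-| Hypothetical helper: emit the source text of a decoder for one node.
-}
decoderFor : JsonValue -> String
decoderFor value =
    case value of
        JString _ ->
            "Json.Decode.string"

        JFloat f ->
            -- Assumption: whole numbers are decoded as Int
            if toFloat (round f) == f then
                "Json.Decode.int"

            else
                "Json.Decode.float"

        JBool _ ->
            "Json.Decode.bool"

        JNull ->
            -- Hypothetical choice: decode null into the unit value
            "(Json.Decode.null ())"

        JList items ->
            case items of
                first :: _ ->
                    -- Recurse into the first item; only valid for homogeneous arrays
                    "(Json.Decode.list " ++ decoderFor first ++ ")"

                [] ->
                    "(Json.Decode.list Json.Decode.value)"

        JObj fields ->
            -- Placeholder constructor name; a real generator needs a unique name per object
            "("
                ++ mapFunction (List.length fields)
                ++ " Entity"
                ++ String.concat (List.map fieldDecoder fields)
                ++ ")"


fieldDecoder : ( String, JsonValue ) -> String
fieldDecoder ( name, fieldValue ) =
    -- Assumes the field name contains no quotes or backslashes
    " (Json.Decode.field \"" ++ name ++ "\" " ++ decoderFor fieldValue ++ ")"


{-| elm/json only provides map and map2 to map8, so objects with more
than eight fields would need a different approach.
-}
mapFunction : Int -> String
mapFunction fieldCount =
    case fieldCount of
        0 ->
            "Json.Decode.succeed"

        1 ->
            "Json.Decode.map"

        n ->
            "Json.Decode.map" ++ String.fromInt n
```

For example, calling decoderFor on the { "id": 1, "name": "Pete" } node from the sample produces (Json.Decode.map2 Entity (Json.Decode.field "id" Json.Decode.int) (Json.Decode.field "name" Json.Decode.string)), which has the same shape as decodeEntity above.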

The variants in JsonValue map directly to elm/json decoders (JString to Json.Decode.string, JBool to Json.Decode.bool, and so on). What remains is dealing with heterogeneous arrays (by decoding them to custom types) and a lot of name management: in the Elm code above, there are type names, constructor names, decoder value names and argument names.
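One way to spot a heterogeneous array, sketched here under my own assumptions rather than taken from json2elm, is to classify each item by a coarse kind and collect the distinct kinds. The helpers kind, variantKinds and isHomogeneous are hypothetical. The sketch treats all objects as a single kind and lists variants in order of first appearance, whereas the generated code above puts the object variant first, so the real tool's ordering may differ.

```elm
module Heterogeneity exposing (JsonValue(..), isHomogeneous, variantKinds)

-- Same shape as the JsonValue type defined earlier in the post


type JsonValue
    = JString String
    | JFloat Float
    | JBool Bool
    | JNull
    | JList (List JsonValue)
    | JObj (List ( String, JsonValue ))


{-| Coarse kind of a node. Objects with different fields count as the
same kind here, which a real generator would probably need to refine.
-}
kind : JsonValue -> String
kind value =
    case value of
        JString _ ->
            "string"

        JFloat _ ->
            "number"

        JBool _ ->
            "bool"

        JNull ->
            "null"

        JList _ ->
            "list"

        JObj _ ->
            "object"


{-| Distinct kinds in order of first appearance. For a heterogeneous
array, each entry would become one constructor of a generated custom type.
-}
variantKinds : List JsonValue -> List String
variantKinds items =
    List.foldl
        (\item seen ->
            if List.member (kind item) seen then
                seen

            else
                seen ++ [ kind item ]
        )
        []
        items


isHomogeneous : List JsonValue -> Bool
isHomogeneous items =
    List.length (variantKinds items) <= 1
```

For the sample array [ 1, "str", { "id": 1, "name": "Pete" } ], variantKinds returns [ "number", "string", "object" ]: three kinds, so the array needs a custom type with three constructors, like MyList0, MyList1 and MyList2.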

The input JSON provides limited information for naming: aside from field names in objects, we can get index values for items in arrays, but that’s it. Consider how we could name the decoder for the last address in this JSON example:

{
    "account": {
        "id": 1,
        "users": [
            {
                "name": "abc",
                "alias": "def",
                "addresses": [
                    {
                        "num": 1,
                        "street": "High St",
                        "postcode": 5024
                    },
                    {
                        "num": 15,
                        "street": "Low St",
                        "postcode": 2346
                    }
                ]
            }
        ]
    }
}

Using field names and array indexes, we can construct the following chain to refer to the last address:

["account", "users", 0, "addresses", 1]

This chain can then be turned into the decoder name decodeAccountUsersObjectAddressesItem, where Object and Item are obtained with Array.get index indexNoun from an array of nouns like “item”, “instance” and so on. The reason for this somewhat unwieldy naming convention is to avoid name clashes. For example, multiple objects can have a “users” field, so simply using decodeUsers would lead to clashes.
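The naming scheme can be sketched by mapping each path segment to a name fragment and concatenating the fragments. This is a minimal sketch, not json2elm's code: the PathSegment type and the capitalize, segmentName and decoderName helpers are hypothetical, and the contents of indexNoun are assumed. The post doesn't give the exact noun list, so the one below is chosen so that index 0 yields Object and index 1 yields Item, reproducing the name above; the fallback for indexes beyond the list is also made up.

```elm
module DecoderName exposing (PathSegment(..), decoderName)

import Array exposing (Array)


{-| One step in the path from the root: an object field or an array index.
-}
type PathSegment
    = Field String
    | Index Int


{-| Assumed noun list; only the idea of indexing into it comes from the post.
-}
indexNoun : Array String
indexNoun =
    Array.fromList [ "Object", "Item", "Instance" ]


capitalize : String -> String
capitalize word =
    String.toUpper (String.left 1 word) ++ String.dropLeft 1 word


segmentName : PathSegment -> String
segmentName segment =
    case segment of
        Field name ->
            capitalize name

        Index index ->
            -- Hypothetical fallback to a numbered name when the nouns run out
            Array.get index indexNoun
                |> Maybe.withDefault ("Item" ++ String.fromInt index)


decoderName : List PathSegment -> String
decoderName path =
    "decode" ++ String.concat (List.map segmentName path)
```

Calling decoderName [ Field "account", Field "users", Index 0, Field "addresses", Index 1 ] returns "decodeAccountUsersObjectAddressesItem". A fuller version would also need to strip characters that aren't valid in Elm identifiers, such as hyphens or spaces in field names.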

The rest of the implementation boils down to a lot of string manipulation to produce nicely formatted Elm code, which I’m not going to go into in this post as it isn’t very interesting.

Would you like to dive further into Elm?
📢 My book Practical Elm skips the basics and gets straight into the nuts-and-bolts of building non-trivial apps.
🛠 Things like building out the UI, communicating with servers, parsing JSON, structuring the application as it grows, testing, and so on.