Decoding JSON to nested record fields in Elm
2018-01-23

This post has been updated to reflect more recent changes in Json.Decode.Pipeline.

I’m working on a tool that handles PostgreSQL EXPLAIN output in JSON format.

The data consists of a tree of nodes representing different parts of a query execution plan:

[Diagram: a query plan tree with the nodes "Sort on zone_id", "Hash Join", "Seq Scan on zones", "Hash" and "Seq Scan on projects"]

Each node has a lot of attributes (more than 10), and a significant portion of them is common to all nodes.

The large number of attributes led me to use the Json.Decode.Pipeline package because it makes them easier to handle.

First attempt: universal decoder

My first attempt was to have a single decoder that could handle any type of node. This decoder would have a huge number of optional fields, each of which is only present for specific node types.

import Json.Decode as Decode
import Json.Decode.Pipeline exposing (..)

decodePlanNode : Decode.Decoder PlanNode
decodePlanNode =
    Decode.succeed PlanNode
        |> required "Actual Loops" Decode.int
        |> required "Actual Rows" Decode.int
        |> required "Actual Startup Time" Decode.float
        |> required "Actual Total Time" Decode.float
        |> optional "Alias" Decode.string ""
        |> optional "CTE Name" Decode.string ""
        |> required "Local Dirtied Blocks" Decode.int
        |> required "Local Hit Blocks" Decode.int
        |> required "Local Read Blocks" Decode.int
        |> required "Local Written Blocks" Decode.int
        |> required "Node Type" Decode.string
        |> required "Output" (Decode.list Decode.string)
        |> required "Parallel Aware" Decode.bool
        |> optional "Parent Relationship" Decode.string ""
        |> required "Plan Rows" Decode.int
        |> optional "Plans" (Decode.lazy (\_ -> decodePlans)) (Plans [])
        |> required "Plan Width" Decode.int
        |> optional "Relation Name" Decode.string ""
        |> optional "Schema" Decode.string ""
        |> required "Shared Dirtied Blocks" Decode.int
        |> required "Shared Hit Blocks" Decode.int
        |> required "Shared Read Blocks" Decode.int
        |> required "Shared Written Blocks" Decode.int
        |> required "Startup Cost" Decode.float
        |> optional "Subplan Name" Decode.string ""
        |> required "Temp Read Blocks" Decode.int
        |> required "Temp Written Blocks" Decode.int
        |> required "Total Cost" Decode.float
        -- There are still more fields which are not shown here

Of course, I’d be forgoing the benefits of types as a result, and I’d have to set missing fields to some default values. For string fields, empty strings are OK as a default, but for other types such as integers, things get decidedly icky: any default is indistinguishable from a real value. I wanted to find a better approach.
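To make the problem with defaults concrete, here is a minimal sketch. It is not code from the tool itself: the record names and the choice of 0 as the default are made up for illustration, and only two fields are shown. It contrasts a default value with decoding the same field into a Maybe.

```elm
module DefaultsSketch exposing (decodeWithDefault, decodeWithMaybe)

import Json.Decode as Decode
import Json.Decode.Pipeline exposing (optional, required)


-- Hypothetical record: a missing "Sort Space Used" becomes 0,
-- which cannot be told apart from a genuine value of 0.
type alias WithDefault =
    { nodeType : String
    , sortSpaceUsed : Int
    }


decodeWithDefault : Decode.Decoder WithDefault
decodeWithDefault =
    Decode.succeed WithDefault
        |> required "Node Type" Decode.string
        |> optional "Sort Space Used" Decode.int 0


-- Hypothetical alternative: keep "missing" visible in the type.
type alias WithMaybe =
    { nodeType : String
    , sortSpaceUsed : Maybe Int
    }


decodeWithMaybe : Decode.Decoder WithMaybe
decodeWithMaybe =
    Decode.succeed WithMaybe
        |> required "Node Type" Decode.string
        |> optional "Sort Space Used" (Decode.map Just Decode.int) Nothing
```

The Maybe version is honest about absent data, but every use site then has to handle Nothing, and with dozens of node-specific fields in one record that gets tedious quickly.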

Second attempt: extensible records

Since each node would have a bunch of common attributes with the addition of a few node-specific attributes, this seemed like a good scenario for employing Elm’s extensible records:

type alias GenericNode a =
    { a | actualLoops : Int
    , actualRows : Int
    , actualStartupTime : Float
    , actualTotalTime : Float
    , localDirtiedBlocks : Int
    , localHitBlocks : Int
    , localReadBlocks : Int
    , localWrittenBlocks : Int
    , nodeType : String
    , output : List String
    , parallelAware : Bool
    , planRows : Int
    , plans : Plans
    , planWidth : Int
    , relationName : String
    , schema : String
    , sharedDirtiedBlocks : Int
    , sharedHitBlocks : Int
    , sharedReadBlocks : Int
    , sharedWrittenBlocks : Int
    , startupCost : Float
    , subplanName : String
    , tempReadBlocks : Int
    , tempWrittenBlocks : Int
    , totalCost : Float
    }

type alias SortNode =
    GenericNode
        { sortKey : List String
        , sortMethod : String
        , sortSpaceUsed : Int
        , sortSpaceType : String
        }

type alias ResultNode =
    GenericNode
        { parentRelationship : String
        }

However, after some experimentation and research I learned that extensible records have a fatal flaw: the compiler doesn’t generate constructors for them, which makes them unusable in a pipeline decoder:

Decode.succeed GenericNode
    |> required "Actual Loops" Decode.int
    |> required "Actual Rows" Decode.int
    |> required "Actual Startup Time" Decode.float
    -- ... more decoding steps

-- Error: Cannot find variable `GenericNode`

The only workaround is to write a constructor function yourself, but due to the large number of attributes involved, this wasn’t feasible.
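For a sense of what that workaround involves, here is a minimal sketch with cut-down types. The names SmallGenericNode, SmallSortNode and smallSortNode are hypothetical, and only two common fields are kept.

```elm
module ConstructorSketch exposing (decodeSmallSortNode)

import Json.Decode as Decode
import Json.Decode.Pipeline exposing (required)


-- Cut-down, hypothetical version of GenericNode with two common fields.
type alias SmallGenericNode a =
    { a
        | actualLoops : Int
        , actualRows : Int
    }


type alias SmallSortNode =
    SmallGenericNode { sortKey : List String }


-- Hand-written stand-in for the constructor the compiler does not generate.
-- The argument order has to match the order of the pipeline steps below.
smallSortNode : Int -> Int -> List String -> SmallSortNode
smallSortNode actualLoops actualRows sortKey =
    { actualLoops = actualLoops
    , actualRows = actualRows
    , sortKey = sortKey
    }


decodeSmallSortNode : Decode.Decoder SmallSortNode
decodeSmallSortNode =
    Decode.succeed smallSortNode
        |> required "Actual Loops" Decode.int
        |> required "Actual Rows" Decode.int
        |> required "Sort Key" (Decode.list Decode.string)
```

With three fields this is fine. With 25 common fields plus the node-specific ones, every node type would need its own function with that many positional arguments, all repeated by hand.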

Third attempt: nested fields

Evan Czaplicki has expressed a strong preference for nested fields as a solution in situations like this. So I had to find a way to structure my decoders to direct one group of attributes into a nested field, while decoding the rest of them into top-level fields.

Once I worked it out, the solution turned out to be simple. It’s a matter of using the custom function from Json.Decode.Pipeline, which runs a separate decoder against the same JSON object, to populate the nested field:

type alias GenericFields =
    { actualLoops : Int
    , actualRows : Int
    , actualStartupTime : Float
    , actualTotalTime : Float
    , localDirtiedBlocks : Int
    , localHitBlocks : Int
    , localReadBlocks : Int
    , localWrittenBlocks : Int
    , nodeType : String
    , output : List String
    , parallelAware : Bool
    , planRows : Int
    , plans : Plans
    , planWidth : Int
    , relationName : String
    , schema : String
    , sharedDirtiedBlocks : Int
    , sharedHitBlocks : Int
    , sharedReadBlocks : Int
    , sharedWrittenBlocks : Int
    , startupCost : Float
    , subplanName : String
    , tempReadBlocks : Int
    , tempWrittenBlocks : Int
    , totalCost : Float
    }


type alias ResultNode =
    { generic : GenericFields
    , parentRelationship : String
    }


type alias CteNode =
    { generic : GenericFields
    , alias_ : String
    , cteName : String
    }


type alias SortNode =
    { generic : GenericFields
    , sortKey : List String
    , sortMethod : String
    , sortSpaceUsed : Int
    , sortSpaceType : String
    }

type Plan
    = PCte CteNode
    | PResult ResultNode
    | PSort SortNode

-- Decoder for common fields
decodeGenericFields : Decode.Decoder GenericFields
decodeGenericFields =
    Decode.succeed GenericFields
        |> required "Actual Loops" Decode.int
        |> required "Actual Rows" Decode.int
        |> required "Actual Startup Time" Decode.float
        |> required "Actual Total Time" Decode.float
        |> required "Local Dirtied Blocks" Decode.int
        |> required "Local Hit Blocks" Decode.int
        |> required "Local Read Blocks" Decode.int
        |> required "Local Written Blocks" Decode.int
        |> required "Node Type" Decode.string
        |> required "Output" (Decode.list Decode.string)
        |> required "Parallel Aware" Decode.bool
        |> required "Plan Rows" Decode.int
        |> optional "Plans" (Decode.lazy (\_ -> decodePlans)) (Plans [])
        |> required "Plan Width" Decode.int
        |> optional "Relation Name" Decode.string ""
        |> optional "Schema" Decode.string ""
        |> required "Shared Dirtied Blocks" Decode.int
        |> required "Shared Hit Blocks" Decode.int
        |> required "Shared Read Blocks" Decode.int
        |> required "Shared Written Blocks" Decode.int
        |> required "Startup Cost" Decode.float
        |> optional "Subplan Name" Decode.string ""
        |> required "Temp Read Blocks" Decode.int
        |> required "Temp Written Blocks" Decode.int
        |> required "Total Cost" Decode.float

-- Decoder for a specific node record with a nested field for common fields
decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    let
        innerDecoder =
            Decode.succeed SortNode
                |> custom decodeGenericFields
                |> required "Sort Key" (Decode.list Decode.string)
                |> required "Sort Method" Decode.string
                |> required "Sort Space Used" Decode.int
                |> required "Sort Space Type" Decode.string
    in
        Decode.map PSort innerDecoder
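The same pattern can be tried out on a much smaller scale. The following is a minimal sketch with hypothetical cut-down types (SmallGeneric and SmallSort) and a made-up JSON input, showing that custom reads the common keys from the same flat object as the node-specific ones.

```elm
module NestedSketch exposing (decodeSmallSort, example)

import Json.Decode as Decode
import Json.Decode.Pipeline exposing (custom, required)


-- Cut-down, hypothetical versions of GenericFields and SortNode.
type alias SmallGeneric =
    { nodeType : String
    , planRows : Int
    }


type alias SmallSort =
    { generic : SmallGeneric
    , sortKey : List String
    }


decodeSmallGeneric : Decode.Decoder SmallGeneric
decodeSmallGeneric =
    Decode.succeed SmallGeneric
        |> required "Node Type" Decode.string
        |> required "Plan Rows" Decode.int


decodeSmallSort : Decode.Decoder SmallSort
decodeSmallSort =
    Decode.succeed SmallSort
        -- custom runs decodeSmallGeneric against the same JSON object
        -- and passes its result on as the first constructor argument
        |> custom decodeSmallGeneric
        |> required "Sort Key" (Decode.list Decode.string)


-- Made-up input: common and Sort-specific keys sit side by side.
example : Result Decode.Error SmallSort
example =
    Decode.decodeString decodeSmallSort
        """{ "Node Type": "Sort", "Plan Rows": 7, "Sort Key": ["zone_id"] }"""
```

Here example evaluates to an Ok value whose generic field holds nodeType "Sort" and planRows 7, with sortKey [ "zone_id" ] next to it. The JSON stays flat; only the Elm record is nested.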

There is still some duplication between my decoders for specific node types:

decodeCteNode : Decode.Decoder Plan
decodeCteNode =
    let
        innerDecoder =
            Decode.succeed CteNode
                |> custom decodeGenericFields
                |> required "Alias" Decode.string
                |> required "CTE Name" Decode.string
    in
        Decode.map PCte innerDecoder


decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    let
        innerDecoder =
            Decode.succeed SortNode
                |> custom decodeGenericFields
                |> required "Sort Key" (Decode.list Decode.string)
                |> required "Sort Method" Decode.string
                |> required "Sort Space Used" Decode.int
                |> required "Sort Space Type" Decode.string
    in
        Decode.map PSort innerDecoder

Can this be generalised further?

It’s tempting to extract the common structure into a polymorphic function which takes the node-specific portion of the decoder as an argument, something like this:

decodeSomeNode nodeType planId decoderChain =
    let
        genericDecoder =
            custom decodeGenericFields (Decode.succeed nodeType)

        innerDecoder =
            decoderChain genericDecoder
    in
        Decode.map planId innerDecoder

However, because there is no way for me to tell the compiler that each of my node types has a field for common attributes, I cannot express the relationship between nodeType and GenericFields and so this function cannot compile. This is a typical tradeoff in Elm: if some code duplication is required in the absence of a more advanced type system, then so be it, because it’s better to keep the language conceptually simple. It remains to be seen whether I’m fully on board with this, but at least it’s a clearly expressed goal of the language.

Comments or questions? I’m @alexkorban on Twitter.

