Decoding JSON to nested record fields in Elm

2018-01-23 • software

This post has been updated to reflect more recent changes in Json.Decode.Pipeline.

I'm working on a tool that handles PostgreSQL EXPLAIN output in JSON format.

The data consists of a tree of nodes, each representing a different part of a query execution plan.

Each node has a lot of attributes (more than 10), and a significant portion of them are common to all nodes.

The large number of attributes led me to use the Json.Decode.Pipeline package because it makes them easier to handle.
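Plain Json.Decode only provides mapping functions up to Decode.map8, so records with more fields than that need chained andThen calls or a pipeline. A minimal pipeline decoder for three of the common attributes might look like this (the Timing record and decodeTiming are hypothetical names used only for illustration):

```elm
import Json.Decode as Decode
import Json.Decode.Pipeline exposing (required)


-- Hypothetical record holding three of the common EXPLAIN attributes.
type alias Timing =
    { actualLoops : Int
    , actualStartupTime : Float
    , actualTotalTime : Float
    }


-- Each `required` step reads one JSON key and feeds it to the next
-- argument of the Timing constructor, in declaration order.
decodeTiming : Decode.Decoder Timing
decodeTiming =
    Decode.succeed Timing
        |> required "Actual Loops" Decode.int
        |> required "Actual Startup Time" Decode.float
        |> required "Actual Total Time" Decode.float


example : Result Decode.Error Timing
example =
    Decode.decodeString decodeTiming
        """{"Actual Loops": 1, "Actual Startup Time": 0.012, "Actual Total Time": 0.015}"""
```

The order of the pipeline steps has to match the order of the record fields, because Decode.succeed Timing starts from the record's generated constructor function.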

First attempt: universal decoder

My first attempt was to have a single decoder that could handle any type of node. This decoder would have a huge number of optional fields, each of which is only present for certain node types.

import Json.Decode as Decode
import Json.Decode.Pipeline exposing (..)

Decode.succeed PlanNode
    |> required "Actual Loops" Decode.int
    |> required "Actual Rows" Decode.int
    |> required "Actual Startup Time" Decode.float
    |> required "Actual Total Time" Decode.float
    |> optional "Alias" Decode.string ""
    |> optional "CTE Name" Decode.string ""
    |> required "Local Dirtied Blocks" Decode.int
    |> required "Local Hit Blocks" Decode.int
    |> required "Local Read Blocks" Decode.int
    |> required "Local Written Blocks" Decode.int
    |> required "Node Type" Decode.string
    |> required "Output" (Decode.list Decode.string)
    |> required "Parallel Aware" Decode.bool
    |> optional "Parent Relationship" Decode.string ""
    |> required "Plan Rows" Decode.int
    |> optional "Plans" (Decode.lazy (\_ -> decodePlans)) (Plans [])
    |> required "Plan Width" Decode.int
    |> optional "Relation Name" Decode.string ""
    |> optional "Schema" Decode.string ""
    |> required "Shared Dirtied Blocks" Decode.int
    |> required "Shared Hit Blocks" Decode.int
    |> required "Shared Read Blocks" Decode.int
    |> required "Shared Written Blocks" Decode.int
    |> required "Startup Cost" Decode.float
    |> optional "Subplan Name" Decode.string ""
    |> required "Temp Read Blocks" Decode.int
    |> required "Temp Written Blocks" Decode.int
    |> required "Total Cost" Decode.float
    -- There are still more fields which are not shown here

Of course, as a result I'd be forgoing the benefits of types, and I'd have to set missing fields to some default value. For string fields, an empty string is an acceptable default, but in the case of, e.g., integers, things get decidedly icky. I wanted to find a better approach.
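To make the integer problem concrete: with a default of 0, a node that lacks a field decodes to exactly the same value as a node that reports 0. A minimal sketch with a hypothetical one-field record:

```elm
import Json.Decode as Decode
import Json.Decode.Pipeline exposing (optional)


-- Hypothetical cut-down record: "Sort Space Used" only appears on Sort nodes.
type alias Node =
    { sortSpaceUsed : Int }


-- With a default of 0, a missing key is indistinguishable from a real 0.
decodeWithDefault : Decode.Decoder Node
decodeWithDefault =
    Decode.succeed Node
        |> optional "Sort Space Used" Decode.int 0


missing : Result Decode.Error Node
missing =
    Decode.decodeString decodeWithDefault """{"Node Type": "Seq Scan"}"""


zero : Result Decode.Error Node
zero =
    Decode.decodeString decodeWithDefault """{"Node Type": "Sort", "Sort Space Used": 0}"""
```

Decoding into Maybe Int instead (optional "Sort Space Used" (Decode.map Just Decode.int) Nothing) keeps the two cases apart, but then every consumer has to handle Nothing for attributes that most node types never have.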

Second attempt: extensible records

Since each node would have a bunch of common attributes with the addition of a few node-specific attributes, this seemed like a good scenario for employing Elm's extensible records:

type alias GenericNode a =
    { a | actualLoops : Int
    , actualRows : Int
    , actualStartupTime : Float
    , actualTotalTime : Float
    , localDirtiedBlocks : Int
    , localHitBlocks : Int
    , localReadBlocks : Int
    , localWrittenBlocks : Int
    , nodeType : String
    , output : List String
    , parallelAware : Bool
    , planRows : Int
    , plans : Plans
    , planWidth : Int
    , relationName : String
    , schema : String
    , sharedDirtiedBlocks : Int
    , sharedHitBlocks : Int
    , sharedReadBlocks : Int
    , sharedWrittenBlocks : Int
    , startupCost : Float
    , subplanName : String
    , tempReadBlocks : Int
    , tempWrittenBlocks : Int
    , totalCost : Float
    }

type alias SortNode
    = GenericNode
    { sortKey : List String
    , sortMethod : String
    , sortSpaceUsed : Int
    , sortSpaceType : String
    }

type alias ResultNode
    = GenericNode
    { parentRelationship : String
    }
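Extensible records do work well as argument types: a function can ask only for the fields it needs and then accept any record that has them. A small example (timeAcrossLoops is a hypothetical helper, not part of the tool):

```elm
-- Accepts any record that has these two fields, such as SortNode or
-- ResultNode above, or a bare record literal.
-- PostgreSQL reports "Actual Total Time" as a per-loop average, so the
-- time spent in the node overall is that value times the loop count.
timeAcrossLoops : { a | actualTotalTime : Float, actualLoops : Int } -> Float
timeAcrossLoops node =
    node.actualTotalTime * toFloat node.actualLoops
```

The difficulty with extensible records lies only in constructing them.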

However, after some experimentation and research I learned that extensible records have a fatal flaw: the compiler only generates constructor functions for type aliases of plain records, not for extensible ones, which makes them unusable with Decode.succeed in a decoder:

Decode.succeed GenericNode
    |> required "Actual Loops" Decode.int
    |> required "Actual Rows" Decode.int
    |> required "Actual Startup Time" Decode.float
    -- ... more decoding steps

-- Error: Cannot find variable `GenericNode`

The usual workaround is to write a constructor function by hand, but with more than 25 common attributes that would have to be repeated for every node type, this wasn't feasible.
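For a sense of what the hand-written workaround involves, here is a sketch with the common fields cut down to two (this GenericNode is a reduced version of the one above, and sortNode is a hypothetical name). The full version would need every common attribute listed again, both as arguments and as record fields, for each node type:

```elm
import Json.Decode as Decode
import Json.Decode.Pipeline exposing (required)


-- Hypothetical, heavily cut-down version of GenericNode.
type alias GenericNode a =
    { a
        | actualLoops : Int
        , nodeType : String
    }


type alias SortNode =
    GenericNode
        { sortMethod : String
        }


-- Elm only generates constructors for aliases written as a plain record,
-- so this one is written by hand. Every common field has to be repeated
-- here, once per node type.
sortNode : Int -> String -> String -> SortNode
sortNode actualLoops nodeType sortMethod =
    { actualLoops = actualLoops
    , nodeType = nodeType
    , sortMethod = sortMethod
    }


decodeSortNode : Decode.Decoder SortNode
decodeSortNode =
    Decode.succeed sortNode
        |> required "Actual Loops" Decode.int
        |> required "Node Type" Decode.string
        |> required "Sort Method" Decode.string
```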

Third attempt: nested fields

Evan Czaplicki has expressed a strong preference for nested fields as a solution in situations like this. So I had to find a way to structure my decoders to direct one group of attributes into a nested field, while decoding the rest of them into top level fields.

Once I'd worked it out, the solution turned out to be simple. It's a matter of using Json.Decode.Pipeline's custom function, which runs a complete decoder against the same JSON object, to populate the nested field:

type alias GenericFields =
    { actualLoops : Int
    , actualRows : Int
    , actualStartupTime : Float
    , actualTotalTime : Float
    , localDirtiedBlocks : Int
    , localHitBlocks : Int
    , localReadBlocks : Int
    , localWrittenBlocks : Int
    , nodeType : String
    , output : List String
    , parallelAware : Bool
    , planRows : Int
    , plans : Plans
    , planWidth : Int
    , relationName : String
    , schema : String
    , sharedDirtiedBlocks : Int
    , sharedHitBlocks : Int
    , sharedReadBlocks : Int
    , sharedWrittenBlocks : Int
    , startupCost : Float
    , subplanName : String
    , tempReadBlocks : Int
    , tempWrittenBlocks : Int
    , totalCost : Float
    }


type alias ResultNode =
    { generic : GenericFields
    , parentRelationship : String
    }


type alias CteNode =
    { generic : GenericFields
    , alias_ : String
    , cteName : String
    }


type alias SortNode =
    { generic : GenericFields
    , sortKey : List String
    , sortMethod : String
    , sortSpaceUsed : Int
    , sortSpaceType : String
    }

type Plan
    = PCte CteNode
    | PResult ResultNode
    | PSort SortNode

-- Decoder for common fields
decodeGenericFields : Decode.Decoder GenericFields
decodeGenericFields =
    Decode.succeed GenericFields
        |> required "Actual Loops" Decode.int
        |> required "Actual Rows" Decode.int
        |> required "Actual Startup Time" Decode.float
        |> required "Actual Total Time" Decode.float
        |> required "Local Dirtied Blocks" Decode.int
        |> required "Local Hit Blocks" Decode.int
        |> required "Local Read Blocks" Decode.int
        |> required "Local Written Blocks" Decode.int
        |> required "Node Type" Decode.string
        |> required "Output" (Decode.list Decode.string)
        |> required "Parallel Aware" Decode.bool
        |> required "Plan Rows" Decode.int
        |> optional "Plans" (Decode.lazy (\_ -> decodePlans)) (Plans [])
        |> required "Plan Width" Decode.int
        |> optional "Relation Name" Decode.string ""
        |> optional "Schema" Decode.string ""
        |> required "Shared Dirtied Blocks" Decode.int
        |> required "Shared Hit Blocks" Decode.int
        |> required "Shared Read Blocks" Decode.int
        |> required "Shared Written Blocks" Decode.int
        |> required "Startup Cost" Decode.float
        |> optional "Subplan Name" Decode.string ""
        |> required "Temp Read Blocks" Decode.int
        |> required "Temp Written Blocks" Decode.int
        |> required "Total Cost" Decode.float

-- Decoder for a specific node record with a nested field for common fields
decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    let
        innerDecoder =
            Decode.succeed SortNode
                |> custom decodeGenericFields
                |> required "Sort Key" (Decode.list Decode.string)
                |> required "Sort Method" Decode.string
                |> required "Sort Space Used" Decode.int
                |> required "Sort Space Type" Decode.string
    in
        Decode.map PSort innerDecoder
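To see what custom does in isolation, here is a self-contained, cut-down version with just two common fields (the reduced records are hypothetical simplifications of the ones above). The JSON stays flat; only the Elm record is nested:

```elm
import Json.Decode as Decode
import Json.Decode.Pipeline exposing (custom, required)


-- Hypothetical cut-down common fields.
type alias GenericFields =
    { nodeType : String
    , actualLoops : Int
    }


type alias SortNode =
    { generic : GenericFields
    , sortMethod : String
    }


decodeGenericFields : Decode.Decoder GenericFields
decodeGenericFields =
    Decode.succeed GenericFields
        |> required "Node Type" Decode.string
        |> required "Actual Loops" Decode.int


-- `custom` runs decodeGenericFields against the same (flat) JSON object,
-- so the common keys end up in the nested `generic` field.
decodeSortNode : Decode.Decoder SortNode
decodeSortNode =
    Decode.succeed SortNode
        |> custom decodeGenericFields
        |> required "Sort Method" Decode.string
```

Code that uses the result then reads common attributes through the nested field, e.g. node.generic.actualLoops.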

There is still some duplication between my decoders for specific node types:

decodeCteNode : Decode.Decoder Plan
decodeCteNode =
    let
        innerDecoder =
            Decode.succeed CteNode
                |> custom decodeGenericFields
                |> required "Alias" Decode.string
                |> required "CTE Name" Decode.string
    in
        Decode.map PCte innerDecoder


decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    let
        innerDecoder =
            Decode.succeed SortNode
                |> custom decodeGenericFields
                |> required "Sort Key" (Decode.list Decode.string)
                |> required "Sort Method" Decode.string
                |> required "Sort Space Used" Decode.int
                |> required "Sort Space Type" Decode.string
    in
        Decode.map PSort innerDecoder
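Plans and decodePlans aren't shown here. One way to tie the per-node decoders together is to read "Node Type" first and then pick a decoder with Decode.andThen. Below is a self-contained, cut-down sketch; it assumes Plans is a custom type wrapping a list of plans (as Plans [] in the decoders suggests), and decodePlan plus the reduced records are hypothetical simplifications:

```elm
import Json.Decode as Decode
import Json.Decode.Pipeline exposing (custom, optional, required)


-- Hypothetical cut-down versions of the types above.
type Plans
    = Plans (List Plan)


type alias GenericFields =
    { nodeType : String
    , plans : Plans
    }


type alias SortNode =
    { generic : GenericFields
    , sortMethod : String
    }


type alias CteNode =
    { generic : GenericFields
    , cteName : String
    }


type Plan
    = PCte CteNode
    | PSort SortNode


-- Decode.lazy delays the recursive reference to decodePlans.
decodeGenericFields : Decode.Decoder GenericFields
decodeGenericFields =
    Decode.succeed GenericFields
        |> required "Node Type" Decode.string
        |> optional "Plans" (Decode.lazy (\_ -> decodePlans)) (Plans [])


decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    Decode.succeed SortNode
        |> custom decodeGenericFields
        |> required "Sort Method" Decode.string
        |> Decode.map PSort


decodeCteNode : Decode.Decoder Plan
decodeCteNode =
    Decode.succeed CteNode
        |> custom decodeGenericFields
        |> required "CTE Name" Decode.string
        |> Decode.map PCte


-- Peek at "Node Type" first, then run the decoder for that node type
-- against the same JSON object.
decodePlan : Decode.Decoder Plan
decodePlan =
    Decode.field "Node Type" Decode.string
        |> Decode.andThen
            (\nodeType ->
                case nodeType of
                    "Sort" ->
                        decodeSortNode

                    "CTE Scan" ->
                        decodeCteNode

                    _ ->
                        Decode.fail ("Unsupported node type: " ++ nodeType)
            )


decodePlans : Decode.Decoder Plans
decodePlans =
    Decode.map Plans (Decode.list decodePlan)
```

Failing on unknown node types keeps the sketch strict; a real tool might instead fall back to a generic variant that only carries GenericFields.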

Can this be generalised further?

It's tempting to extract the common structure into a polymorphic function which takes the node-specific portion of the decoder as an argument, something like this:

decodeSomeNode nodeType planId decoderChain =
    let
        genericDecoder =
            custom decodeGenericFields (Decode.succeed nodeType)

        innerDecoder =
            decoderChain genericDecoder
    in
        Decode.map planId innerDecoder

However, I couldn't find a way to tell the compiler that each of my node types has a field for common attributes, so I couldn't express the relationship between nodeType and GenericFields and didn't get this function to compile. This is a typical tradeoff in Elm: if the absence of a more advanced type system means some code duplication is required, then so be it; it's better to keep the language conceptually simple. It remains to be seen whether I'm fully on board with this, but at least it's a clearly expressed goal of the language.
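That said, the helper may not need to know about the generic field at all: custom only requires the constructor's first argument to be GenericFields, and a type signature can state exactly that while leaving the rest of the constructor and the finished record polymorphic. A sketch, assuming Elm 0.19 with Json.Decode.Pipeline 1.x and cut-down hypothetical types, where the node-specific steps are passed as a single function composed with >>:

```elm
import Json.Decode as Decode
import Json.Decode.Pipeline exposing (custom, required)


-- Hypothetical cut-down versions of the types above.
type alias GenericFields =
    { nodeType : String }


type alias SortNode =
    { generic : GenericFields
    , sortMethod : String
    , sortSpaceUsed : Int
    }


type Plan
    = PSort SortNode


decodeGenericFields : Decode.Decoder GenericFields
decodeGenericFields =
    Decode.succeed GenericFields
        |> required "Node Type" Decode.string


-- The only constraint is that the constructor's FIRST argument is
-- GenericFields; `a` (the rest of the constructor) and `b` (the
-- finished node record) stay fully polymorphic.
decodeSomeNode :
    (GenericFields -> a)
    -> (b -> Plan)
    -> (Decode.Decoder a -> Decode.Decoder b)
    -> Decode.Decoder Plan
decodeSomeNode nodeType planId decoderChain =
    Decode.succeed nodeType
        |> custom decodeGenericFields
        |> decoderChain
        |> Decode.map planId


-- The node-specific steps are composed with >> instead of applied with |>.
decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    decodeSomeNode SortNode
        PSort
        (required "Sort Method" Decode.string
            >> required "Sort Space Used" Decode.int
        )
```

Whether this reads better than the duplicated pipelines is a matter of taste: each per-node decoder shrinks, but the >> chain is less familiar than the usual |> style.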