This post has been updated to reflect more recent changes in Json.Decode.Pipeline.
I’m working on a tool that handles PostgreSQL EXPLAIN output in JSON format.
The data consists of a tree of nodes representing different parts of a query execution plan:
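Abridged, the output looks roughly like this (an illustrative excerpt with made-up values, not the post’s actual data):

[
  {
    "Plan": {
      "Node Type": "Sort",
      "Startup Cost": 20.82,
      "Total Cost": 21.07,
      "Actual Total Time": 0.33,
      "Sort Key": ["..."],
      "Plans": [
        {
          "Node Type": "CTE Scan",
          "CTE Name": "...",
          "Actual Total Time": 0.12
        }
      ]
    }
  }
]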
Each node has a lot of attributes (more than 10), and a significant portion of them is common to all node types.
The large number of attributes led me to use the Json.Decode.Pipeline package, which makes decoders with many fields easier to build.
My first attempt was to have a single decoder that could handle any type of node. This decoder would have a huge number of optional fields, each present only for a specific node type.
import Json.Decode as Decode
import Json.Decode.Pipeline exposing (..)
Decode.succeed PlanNode
    |> required "Actual Loops" Decode.int
    |> required "Actual Rows" Decode.int
    |> required "Actual Startup Time" Decode.float
    |> required "Actual Total Time" Decode.float
    |> optional "Alias" Decode.string ""
    |> optional "CTE Name" Decode.string ""
    |> required "Local Dirtied Blocks" Decode.int
    |> required "Local Hit Blocks" Decode.int
    |> required "Local Read Blocks" Decode.int
    |> required "Local Written Blocks" Decode.int
    |> required "Node Type" Decode.string
    |> required "Output" (Decode.list Decode.string)
    |> required "Parallel Aware" Decode.bool
    |> optional "Parent Relationship" Decode.string ""
    |> required "Plan Rows" Decode.int
    |> optional "Plans" (Decode.lazy (\_ -> decodePlans)) (Plans [])
    |> required "Plan Width" Decode.int
    |> optional "Relation Name" Decode.string ""
    |> optional "Schema" Decode.string ""
    |> required "Shared Dirtied Blocks" Decode.int
    |> required "Shared Hit Blocks" Decode.int
    |> required "Shared Read Blocks" Decode.int
    |> required "Shared Written Blocks" Decode.int
    |> required "Startup Cost" Decode.float
    |> optional "Subplan Name" Decode.string ""
    |> required "Temp Read Blocks" Decode.int
    |> required "Temp Written Blocks" Decode.int
    |> required "Total Cost" Decode.float
    -- There are still more fields which are not shown here
Of course, I’d be forgoing the benefits of types this way, and I’d have to set missing fields to default values. For string fields an empty string is an acceptable default, but for integers, say, things get decidedly icky. I wanted to find a better approach.
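For example, with a node-specific integer field like "Sort Space Used" (a sketch of the problem; the helper names are mine, not from the post):

-- With a sentinel default, a decoded 0 is indistinguishable from a missing field.
sortSpaceUsed : Decode.Decoder Int
sortSpaceUsed =
    Decode.succeed identity
        |> optional "Sort Space Used" Decode.int 0


-- Wrapping the field in Maybe is honest, but every consumer must then unwrap it,
-- even for node types where the field is always present.
maybeSortSpaceUsed : Decode.Decoder (Maybe Int)
maybeSortSpaceUsed =
    Decode.succeed identity
        |> optional "Sort Space Used" (Decode.map Just Decode.int) Nothing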
Since each node has a bunch of common attributes plus a few node-specific ones, this seemed like a good scenario for employing Elm’s extensible records:
type alias GenericNode a =
    { a | actualLoops : Int
    , actualRows : Int
    , actualStartupTime : Float
    , actualTotalTime : Float
    , localDirtiedBlocks : Int
    , localHitBlocks : Int
    , localReadBlocks : Int
    , localWrittenBlocks : Int
    , nodeType : String
    , output : List String
    , parallelAware : Bool
    , planRows : Int
    , plans : Plans
    , planWidth : Int
    , relationName : String
    , schema : String
    , sharedDirtiedBlocks : Int
    , sharedHitBlocks : Int
    , sharedReadBlocks : Int
    , sharedWrittenBlocks : Int
    , startupCost : Float
    , subplanName : String
    , tempReadBlocks : Int
    , tempWrittenBlocks : Int
    , totalCost : Float
    }


type alias SortNode =
    GenericNode
        { sortKey : List String
        , sortMethod : String
        , sortSpaceUsed : Int
        , sortSpaceType : String
        }


type alias ResultNode =
    GenericNode
        { parentRelationship : String
        }
However, after some experimentation and research I learned that extensible records have a fatal flaw: they don’t get constructors generated for them by the compiler, rendering them unusable in a decoder:
Decode.succeed GenericNode
    |> required "Actual Loops" Decode.int
    |> required "Actual Rows" Decode.int
    |> required "Actual Startup Time" Decode.float
    -- ... more decoding steps

-- Error: Cannot find variable `GenericNode`
The only workaround is to write such a constructor function yourself, but with this many attributes that wasn’t practical.
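To illustrate what that workaround involves (a sketch with hypothetical names, not from the post), here is a hand-written constructor for a tiny extensible record with just two of the fields; doing the same for twenty-odd fields is exactly the boilerplate I wanted to avoid:

type alias Costed a =
    { a | startupCost : Float, totalCost : Float }


-- The compiler generates no constructor for Costed, so it has to be written by hand.
costs : Float -> Float -> Costed {}
costs startupCost totalCost =
    { startupCost = startupCost, totalCost = totalCost }


decodeCosts : Decode.Decoder (Costed {})
decodeCosts =
    Decode.succeed costs
        |> required "Startup Cost" Decode.float
        |> required "Total Cost" Decode.float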
Evan Czaplicki has expressed a strong preference for nested fields as a solution in situations like this. So I had to find a way to structure my decoders to direct one group of attributes into a nested field while decoding the rest into top-level fields.
Once I worked it out, the solution turned out to be simple: use Json.Decode.Pipeline’s custom to run a separate decoder for the common attributes and feed its result into the nested field:
type alias GenericFields =
    { actualLoops : Int
    , actualRows : Int
    , actualStartupTime : Float
    , actualTotalTime : Float
    , localDirtiedBlocks : Int
    , localHitBlocks : Int
    , localReadBlocks : Int
    , localWrittenBlocks : Int
    , nodeType : String
    , output : List String
    , parallelAware : Bool
    , planRows : Int
    , plans : Plans
    , planWidth : Int
    , relationName : String
    , schema : String
    , sharedDirtiedBlocks : Int
    , sharedHitBlocks : Int
    , sharedReadBlocks : Int
    , sharedWrittenBlocks : Int
    , startupCost : Float
    , subplanName : String
    , tempReadBlocks : Int
    , tempWrittenBlocks : Int
    , totalCost : Float
    }


type alias ResultNode =
    { generic : GenericFields
    , parentRelationship : String
    }


type alias CteNode =
    { generic : GenericFields
    , alias_ : String
    , cteName : String
    }


type alias SortNode =
    { generic : GenericFields
    , sortKey : List String
    , sortMethod : String
    , sortSpaceUsed : Int
    , sortSpaceType : String
    }


type Plan
    = PCte CteNode
    | PResult ResultNode
    | PSort SortNode
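The Plans type and the decodePlans decoder referenced above aren’t shown in the post. Presumably Plans is a custom type wrapping List Plan (a plain type alias can’t refer to itself recursively), and decodePlans dispatches on the "Node Type" field to the node-specific decoders defined below; a minimal sketch, with the dispatch table and failure message being my own guesses:

-- Assumed, not shown in the post: a custom type breaks the Plan/Plans recursion.
type Plans
    = Plans (List Plan)


decodePlans : Decode.Decoder Plans
decodePlans =
    Decode.map Plans (Decode.list decodePlan)


-- Pick a node-specific decoder based on the "Node Type" field.
decodePlan : Decode.Decoder Plan
decodePlan =
    Decode.field "Node Type" Decode.string
        |> Decode.andThen
            (\nodeType ->
                case nodeType of
                    "CTE Scan" ->
                        decodeCteNode

                    "Sort" ->
                        decodeSortNode

                    other ->
                        Decode.fail ("No decoder for node type: " ++ other)
            )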
-- Decoder for common fields
decodeGenericFields : Decode.Decoder GenericFields
decodeGenericFields =
    Decode.succeed GenericFields
        |> required "Actual Loops" Decode.int
        |> required "Actual Rows" Decode.int
        |> required "Actual Startup Time" Decode.float
        |> required "Actual Total Time" Decode.float
        |> required "Local Dirtied Blocks" Decode.int
        |> required "Local Hit Blocks" Decode.int
        |> required "Local Read Blocks" Decode.int
        |> required "Local Written Blocks" Decode.int
        |> required "Node Type" Decode.string
        |> required "Output" (Decode.list Decode.string)
        |> required "Parallel Aware" Decode.bool
        |> required "Plan Rows" Decode.int
        |> optional "Plans" (Decode.lazy (\_ -> decodePlans)) (Plans [])
        |> required "Plan Width" Decode.int
        |> optional "Relation Name" Decode.string ""
        |> optional "Schema" Decode.string ""
        |> required "Shared Dirtied Blocks" Decode.int
        |> required "Shared Hit Blocks" Decode.int
        |> required "Shared Read Blocks" Decode.int
        |> required "Shared Written Blocks" Decode.int
        |> required "Startup Cost" Decode.float
        |> optional "Subplan Name" Decode.string ""
        |> required "Temp Read Blocks" Decode.int
        |> required "Temp Written Blocks" Decode.int
        |> required "Total Cost" Decode.float
-- Decoder for a specific node record with a nested field for common fields
decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    let
        innerDecoder =
            Decode.succeed SortNode
                |> custom decodeGenericFields
                |> required "Sort Key" (Decode.list Decode.string)
                |> required "Sort Method" Decode.string
                |> required "Sort Space Used" Decode.int
                |> required "Sort Space Type" Decode.string
    in
    Decode.map PSort innerDecoder
There is still some duplication between my decoders for specific node types:
decodeCteNode : Decode.Decoder Plan
decodeCteNode =
    let
        innerDecoder =
            Decode.succeed CteNode
                |> custom decodeGenericFields
                |> required "Alias" Decode.string
                |> required "CTE Name" Decode.string
    in
    Decode.map PCte innerDecoder


decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    let
        innerDecoder =
            Decode.succeed SortNode
                |> custom decodeGenericFields
                |> required "Sort Key" (Decode.list Decode.string)
                |> required "Sort Method" Decode.string
                |> required "Sort Space Used" Decode.int
                |> required "Sort Space Type" Decode.string
    in
    Decode.map PSort innerDecoder
It’s tempting to extract the common structure into a polymorphic function which takes the node-specific portion of the decoder as an argument, something like this:
decodeSomeNode nodeType planId decoderChain =
    let
        genericDecoder =
            custom decodeGenericFields (Decode.succeed nodeType)

        innerDecoder =
            decoderChain genericDecoder
    in
    Decode.map planId innerDecoder
However, because there is no way for me to tell the compiler that each of my node types has a field for the common attributes, I cannot express the relationship between nodeType and GenericFields, so this function cannot be made to compile.
This is a typical tradeoff in Elm: if some code duplication is required in the absence of a more advanced type system, then so be it; it’s better to keep the language conceptually simple. It remains to be seen whether I’m fully on board with this, but at least it’s a clearly expressed goal of the language.