This post has been updated to reflect more recent changes in Json.Decode.Pipeline.
I'm working on a tool that handles PostgreSQL EXPLAIN output in JSON format.
The data consists of a tree of nodes representing different parts of a query execution plan.
Each node has a lot of attributes (more than 10), and a significant portion of them are common to all nodes.
The large number of attributes led me to use the Json.Decode.Pipeline package because it makes them easier to handle.
First attempt: universal decoder
My first attempt was to have a single decoder that could handle any type of node. This decoder would have a huge number of optional fields, each of which is only present for a specific node type.
import Json.Decode as Decode
import Json.Decode.Pipeline exposing (..)

Decode.succeed PlanNode
    |> required "Actual Loops" Decode.int
    |> required "Actual Rows" Decode.int
    |> required "Actual Startup Time" Decode.float
    |> required "Actual Total Time" Decode.float
    |> optional "Alias" Decode.string ""
    |> optional "CTE Name" Decode.string ""
    |> required "Local Dirtied Blocks" Decode.int
    |> required "Local Hit Blocks" Decode.int
    |> required "Local Read Blocks" Decode.int
    |> required "Local Written Blocks" Decode.int
    |> required "Node Type" Decode.string
    |> required "Output" (Decode.list Decode.string)
    |> required "Parallel Aware" Decode.bool
    |> optional "Parent Relationship" Decode.string ""
    |> required "Plan Rows" Decode.int
    |> optional "Plans" (Decode.lazy (\_ -> decodePlans)) (Plans [])
    |> required "Plan Width" Decode.int
    |> optional "Relation Name" Decode.string ""
    |> optional "Schema" Decode.string ""
    |> required "Shared Dirtied Blocks" Decode.int
    |> required "Shared Hit Blocks" Decode.int
    |> required "Shared Read Blocks" Decode.int
    |> required "Shared Written Blocks" Decode.int
    |> required "Startup Cost" Decode.float
    |> optional "Subplan Name" Decode.string ""
    |> required "Temp Read Blocks" Decode.int
    |> required "Temp Written Blocks" Decode.int
    |> required "Total Cost" Decode.float
    -- There are still more fields which are not shown here
Of course, I'd be forgoing the benefits of types as a result, and I'd have to set missing fields to some default values. For string fields, an empty string is an acceptable default, but for other types, e.g. integers, things get decidedly icky because no value is obviously safe to use. I wanted to find a better approach.
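To make the problem with integer defaults concrete, here is a minimal sketch. The SortInfo record and decodeSortInfo are hypothetical names invented for this illustration, and the default of 0 is an arbitrary choice; only the optional function and the "Sort Space Used" key come from the code in this post.

```elm
module DefaultSketch exposing (..)

import Json.Decode as Decode
import Json.Decode.Pipeline exposing (optional)


-- Hypothetical single-field record, used only to illustrate the problem.
type alias SortInfo =
    { sortSpaceUsed : Int }


-- In a universal decoder, a field that only some node types carry has to
-- be optional, so some default must be chosen. Here it is 0 (an assumption).
decodeSortInfo : Decode.Decoder SortInfo
decodeSortInfo =
    Decode.succeed SortInfo
        |> optional "Sort Space Used" Decode.int 0
```

With this decoder, a node that has no "Sort Space Used" key and a node that reports a value of 0 both decode to the same record, so the distinction is lost once decoding is done.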
Second attempt: extensible records
Since each node would have a bunch of common attributes with the addition of a few node-specific attributes, this seemed like a good scenario for employing Elm's extensible records:
type alias GenericNode a =
    { a
        | actualLoops : Int
        , actualRows : Int
        , actualStartupTime : Float
        , actualTotalTime : Float
        , localDirtiedBlocks : Int
        , localHitBlocks : Int
        , localReadBlocks : Int
        , localWrittenBlocks : Int
        , nodeType : String
        , output : List String
        , parallelAware : Bool
        , planRows : Int
        , plans : Plans
        , planWidth : Int
        , relationName : String
        , schema : String
        , sharedDirtiedBlocks : Int
        , sharedHitBlocks : Int
        , sharedReadBlocks : Int
        , sharedWrittenBlocks : Int
        , startupCost : Float
        , subplanName : String
        , tempReadBlocks : Int
        , tempWrittenBlocks : Int
        , totalCost : Float
    }

type alias SortNode =
    GenericNode
        { sortKey : List String
        , sortMethod : String
        , sortSpaceUsed : Int
        , sortSpaceType : String
        }

type alias ResultNode =
    GenericNode
        { parentRelationship : String
        }
However, after some experimentation and research I learned that extensible records have a fatal flaw: they don't get constructors generated for them by the compiler, rendering them unusable in a decoder:
Decode.succeed GenericNode
    |> required "Actual Loops" Decode.int
    |> required "Actual Rows" Decode.int
    |> required "Actual Startup Time" Decode.float
    -- ... more decoding steps

-- Error: Cannot find variable `GenericNode`
The only workaround is to write a constructor function yourself, but due to the large number of attributes involved, this wasn't feasible.
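For illustration, this is roughly what the hand-written constructor looks like on a cut-down record. SmallGenericNode, SmallSortNode and smallSortNode are hypothetical names with only three fields in total; the real node types would need one argument and one record entry per attribute.

```elm
module ExtensibleSketch exposing (..)

import Json.Decode as Decode
import Json.Decode.Pipeline exposing (required)


-- Hypothetical, cut-down version of GenericNode with two common fields.
type alias SmallGenericNode a =
    { a
        | actualLoops : Int
        , actualRows : Int
    }


type alias SmallSortNode =
    SmallGenericNode { sortMethod : String }


-- No constructor is generated for SmallSortNode, so it is written by hand.
-- Every field appears twice: once as an argument, once in the record.
smallSortNode : Int -> Int -> String -> SmallSortNode
smallSortNode actualLoops actualRows sortMethod =
    { actualLoops = actualLoops
    , actualRows = actualRows
    , sortMethod = sortMethod
    }


decodeSmallSortNode : Decode.Decoder SmallSortNode
decodeSmallSortNode =
    Decode.succeed smallSortNode
        |> required "Actual Loops" Decode.int
        |> required "Actual Rows" Decode.int
        |> required "Sort Method" Decode.string
```

The argument order of smallSortNode has to match the order of the pipeline steps by hand, which is easy to get wrong once there are a couple of dozen Int fields in a row.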
Third attempt: nested fields
Evan Czaplicki has expressed a strong preference for nested fields as a solution in situations like this. So I had to find a way to structure my decoders to direct one group of attributes into a nested field, while decoding the rest of them into top level fields.
Once I worked it out, the solution turned out to be simple. It's a matter of using a custom decoder step (the custom function from Json.Decode.Pipeline) to populate the nested fields:
type alias GenericFields =
    { actualLoops : Int
    , actualRows : Int
    , actualStartupTime : Float
    , actualTotalTime : Float
    , localDirtiedBlocks : Int
    , localHitBlocks : Int
    , localReadBlocks : Int
    , localWrittenBlocks : Int
    , nodeType : String
    , output : List String
    , parallelAware : Bool
    , planRows : Int
    , plans : Plans
    , planWidth : Int
    , relationName : String
    , schema : String
    , sharedDirtiedBlocks : Int
    , sharedHitBlocks : Int
    , sharedReadBlocks : Int
    , sharedWrittenBlocks : Int
    , startupCost : Float
    , subplanName : String
    , tempReadBlocks : Int
    , tempWrittenBlocks : Int
    , totalCost : Float
    }

type alias ResultNode =
    { generic : GenericFields
    , parentRelationship : String
    }

type alias CteNode =
    { generic : GenericFields
    , alias_ : String
    , cteName : String
    }

type alias SortNode =
    { generic : GenericFields
    , sortKey : List String
    , sortMethod : String
    , sortSpaceUsed : Int
    , sortSpaceType : String
    }

type Plan
    = PCte CteNode
    | PResult ResultNode
    | PSort SortNode
-- Decoder for common fields
decodeGenericFields : Decode.Decoder GenericFields
decodeGenericFields =
    Decode.succeed GenericFields
        |> required "Actual Loops" Decode.int
        |> required "Actual Rows" Decode.int
        |> required "Actual Startup Time" Decode.float
        |> required "Actual Total Time" Decode.float
        |> required "Local Dirtied Blocks" Decode.int
        |> required "Local Hit Blocks" Decode.int
        |> required "Local Read Blocks" Decode.int
        |> required "Local Written Blocks" Decode.int
        |> required "Node Type" Decode.string
        |> required "Output" (Decode.list Decode.string)
        |> required "Parallel Aware" Decode.bool
        |> required "Plan Rows" Decode.int
        |> optional "Plans" (Decode.lazy (\_ -> decodePlans)) (Plans [])
        |> required "Plan Width" Decode.int
        |> optional "Relation Name" Decode.string ""
        |> optional "Schema" Decode.string ""
        |> required "Shared Dirtied Blocks" Decode.int
        |> required "Shared Hit Blocks" Decode.int
        |> required "Shared Read Blocks" Decode.int
        |> required "Shared Written Blocks" Decode.int
        |> required "Startup Cost" Decode.float
        |> optional "Subplan Name" Decode.string ""
        |> required "Temp Read Blocks" Decode.int
        |> required "Temp Written Blocks" Decode.int
        |> required "Total Cost" Decode.float
-- Decoder for a specific node record with a nested field for common fields
decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    let
        innerDecoder =
            Decode.succeed SortNode
                |> custom decodeGenericFields
                |> required "Sort Key" (Decode.list Decode.string)
                |> required "Sort Method" Decode.string
                |> required "Sort Space Used" Decode.int
                |> required "Sort Space Type" Decode.string
    in
    Decode.map PSort innerDecoder
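To show what the custom step does with a flat JSON object, here is a cut-down sketch of the same pattern. SmallGeneric, SmallSort and their decoders are hypothetical stand-ins with far fewer fields than GenericFields and SortNode; the pipeline functions are the same ones used above.

```elm
module NestedSketch exposing (..)

import Json.Decode as Decode
import Json.Decode.Pipeline exposing (custom, required)


-- Hypothetical, cut-down stand-ins for GenericFields and SortNode.
type alias SmallGeneric =
    { actualLoops : Int
    , actualRows : Int
    }


type alias SmallSort =
    { generic : SmallGeneric
    , sortMethod : String
    }


decodeSmallGeneric : Decode.Decoder SmallGeneric
decodeSmallGeneric =
    Decode.succeed SmallGeneric
        |> required "Actual Loops" Decode.int
        |> required "Actual Rows" Decode.int


-- `custom` runs decodeSmallGeneric against the same JSON object that the
-- `required` steps read from, and passes its result on as the first
-- constructor argument (the `generic` field).
decodeSmallSort : Decode.Decoder SmallSort
decodeSmallSort =
    Decode.succeed SmallSort
        |> custom decodeSmallGeneric
        |> required "Sort Method" Decode.string
```

Given the flat object {"Actual Loops": 1, "Actual Rows": 5, "Sort Method": "quicksort"}, this decoder produces a record whose generic field holds the two common values and whose sortMethod sits at the top level, even though the JSON itself has no nesting.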
There is still some duplication between my decoders for specific node types:
decodeCteNode : Decode.Decoder Plan
decodeCteNode =
    let
        innerDecoder =
            Decode.succeed CteNode
                |> custom decodeGenericFields
                |> required "Alias" Decode.string
                |> required "CTE Name" Decode.string
    in
    Decode.map PCte innerDecoder

decodeSortNode : Decode.Decoder Plan
decodeSortNode =
    let
        innerDecoder =
            Decode.succeed SortNode
                |> custom decodeGenericFields
                |> required "Sort Key" (Decode.list Decode.string)
                |> required "Sort Method" Decode.string
                |> required "Sort Space Used" Decode.int
                |> required "Sort Space Type" Decode.string
    in
    Decode.map PSort innerDecoder
Can this be generalised further?
It's tempting to extract the common structure into a polymorphic function which takes the node-specific portion of the decoder as an argument, something like this:
decodeSomeNode nodeType planId decoderChain =
    let
        genericDecoder =
            custom decodeGenericFields (Decode.succeed nodeType)

        innerDecoder =
            decoderChain genericDecoder
    in
    Decode.map planId innerDecoder
However, because there is no way for me to tell the compiler that each of my node types has a field for common attributes, I cannot express the relationship between nodeType and GenericFields, and so this function cannot compile.
This is a typical tradeoff in Elm: if some code duplication is required in the absence of a more advanced type system, then so be it - it's better to keep the language conceptually simple. It remains to be seen whether I'm fully on board with this but at least it's a clearly expressed goal of the language.