I’ve recently released json2elm, a tool which allows you to generate elm/json JSON decoders and encoders (along with relevant type definitions) from a sample JSON string.
A curiously self-referencing thing about json2elm is that it itself uses elm/json decoders to generate elm/json decoders. How does it work?
The thing about elm/json is that it allows you to parse JSON into arbitrary data structures. In my case, I need to parse arbitrary JSON strings into a data structure. This means that I need to come up with a data structure which can represent the structure of any JSON string, rather than anything specific like User or StockItem.
The JSON specification in RFC 8259 is a good place to look for a most generic description of a JSON value. In there, it’s defined as:
value = false / null / true / object / array / number / string
So there are four primitives (null, Boolean, Number, String) and two structures (Array and Object) which can in turn contain JSON values.
I can represent this as a custom Elm type:
type JsonValue
    = JString String
    | JFloat Float
    | JBool Bool
    | JNull
    | JList (List JsonValue)
    | JObj (List ( String, JsonValue ))
Note that this type is defined in terms of itself, which is fine for a custom type.
Correspondingly, I can define an elm/json decoder to produce a JsonValue:
import Json.Decode as Decode


jsonDecoder : Decode.Decoder JsonValue
jsonDecoder =
    Decode.oneOf
        [ Decode.map JString Decode.string
        , Decode.map JFloat Decode.float
        , Decode.map JBool Decode.bool
        , Decode.null JNull
        , Decode.map JList <| Decode.list <| Decode.lazy (\_ -> jsonDecoder)
        , Decode.map JObj <| Decode.keyValuePairs <| Decode.lazy (\_ -> jsonDecoder)
        ]
Decode.oneOf will try a list of decoders given to it one by one until it finds one which succeeds. So here, instead of decoding a specific thing like a user name, I’m giving oneOf a list of decoders which address all possible JSON values (and match the JsonValue type).
Decode.map is used to apply a function to the decoding result, which I’m using to turn these results into JsonValue values with the help of appropriate constructors.
In the last two decoders for lists and objects, I have to use jsonDecoder recursively because a list or an object can in turn contain any kind of JSON value. Since a value isn’t allowed to be defined directly in terms of itself, I have to wrap jsonDecoder with Decode.lazy (\_ -> jsonDecoder).
That’s all I need to start parsing JSON strings:
import Html exposing (div, text)


sample =
    """
[ 1, "str", { "id": 1, "name": "Pete" } ]
"""


main =
    case Decode.decodeString jsonDecoder sample of
        Err err ->
            -- Show the decoding error as text
            div [] [ text <| Decode.errorToString err ]

        Ok tree ->
            -- Show the parsed tree
            div []
                [ text <| Debug.toString tree
                ]

-- Output: JList [JFloat 1,JString "str",JObj [("id",JFloat 1),("name",JString "Pete")]]
The next step is to turn these kinds of values into bits of Elm code which describe decoders. For the sample JSON in the above example, assuming we call the top-level value MyList, the decoders should look something like this:
decodeMyList : Json.Decode.Decoder (List MyList)
decodeMyList =
    Json.Decode.list decodeMyListMember


decodeMyListMember : Json.Decode.Decoder MyList
decodeMyListMember =
    Json.Decode.oneOf
        [ Json.Decode.map MyList0 <| decodeEntity
        , Json.Decode.map MyList1 <| Json.Decode.int
        , Json.Decode.map MyList2 <| Json.Decode.string
        ]


decodeEntity : Json.Decode.Decoder Entity
decodeEntity =
    Json.Decode.map2 Entity
        (Json.Decode.field "id" Json.Decode.int)
        (Json.Decode.field "name" Json.Decode.string)
As the JsonValue type is recursive, it represents a tree. If you look closely at the decoders above, they mirror the tree structure in the decoded JSON:
- decodeMyList is produced from the root node
- decodeMyListMember deals with the first level of child nodes, one of which happens to be an object
- decodeEntity decodes this object (this is a second-level child node)
So essentially, by traversing the tree recursively and outputting specific strings for each kind of node, I can produce the decoders.
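To make the traversal idea concrete, here is a minimal sketch of what such a recursive function could look like. It is not the actual json2elm code: decoderExpr, mapName and the placeholder constructor name Record are made up for illustration, whole numbers are assumed to become ints (as in the sample output above), and arrays are assumed to be homogeneous.

```elm
module DecoderSketch exposing (JsonValue(..), decoderExpr)


type JsonValue
    = JString String
    | JFloat Float
    | JBool Bool
    | JNull
    | JList (List JsonValue)
    | JObj (List ( String, JsonValue ))


{-| Name of the mapN function for a given number of fields.
elm/json only provides map and map2..map8, so this sketch is
valid for objects with 1 to 8 fields.
-}
mapName : Int -> String
mapName fieldCount =
    if fieldCount == 1 then
        "Json.Decode.map"

    else
        "Json.Decode.map" ++ String.fromInt fieldCount


{-| Produce the text of a decoder expression for one node,
recursing into child nodes of lists and objects.
-}
decoderExpr : JsonValue -> String
decoderExpr value =
    case value of
        JString _ ->
            "Json.Decode.string"

        JFloat number ->
            -- Assumption: a number without a fractional part is an int.
            if toFloat (round number) == number then
                "Json.Decode.int"

            else
                "Json.Decode.float"

        JBool _ ->
            "Json.Decode.bool"

        JNull ->
            "Json.Decode.null ()"

        JList items ->
            case items of
                first :: _ ->
                    -- Simplification: assume every item looks like the first one.
                    "Json.Decode.list (" ++ decoderExpr first ++ ")"

                [] ->
                    -- An empty array carries no type information.
                    "Json.Decode.list Json.Decode.value"

        JObj fields ->
            let
                fieldExpr ( name, child ) =
                    "(Json.Decode.field \"" ++ name ++ "\" (" ++ decoderExpr child ++ "))"
            in
            -- "Record" is a placeholder for a generated constructor name.
            String.join " "
                ((mapName (List.length fields) ++ " Record")
                    :: List.map fieldExpr fields
                )
```

With this sketch, decoderExpr (JList [ JFloat 1 ]) evaluates to "Json.Decode.list (Json.Decode.int)". Unlike the real output, it inlines everything into one expression instead of producing separate named decoders.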
The variants in JsonValue map easily to the elm/json decoder names. What remains is dealing with heterogeneous arrays (by decoding them to custom types) and a lot of name management: in the Elm code above, there are type names, constructor names, decoder value names and argument names.
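As a rough illustration of the heterogeneous array problem, one way to detect such an array is to collect the distinct kinds of values it contains. This is only a sketch under my own assumptions: kind, distinctKinds and isHeterogeneous are hypothetical names, and all objects are treated as the same kind even when their fields differ.

```elm
module KindSketch exposing (JsonValue(..), distinctKinds, isHeterogeneous)

import Set


type JsonValue
    = JString String
    | JFloat Float
    | JBool Bool
    | JNull
    | JList (List JsonValue)
    | JObj (List ( String, JsonValue ))


{-| A coarse label for the variant of a value. -}
kind : JsonValue -> String
kind value =
    case value of
        JString _ ->
            "string"

        JFloat _ ->
            "number"

        JBool _ ->
            "bool"

        JNull ->
            "null"

        JList _ ->
            "array"

        JObj _ ->
            "object"


{-| The distinct kinds found among the items, sorted alphabetically. -}
distinctKinds : List JsonValue -> List String
distinctKinds items =
    items
        |> List.map kind
        |> Set.fromList
        |> Set.toList


{-| An array with more than one kind of item needs a custom type,
with one constructor per kind.
-}
isHeterogeneous : List JsonValue -> Bool
isHeterogeneous items =
    List.length (distinctKinds items) > 1
```

For the sample array [ 1, "str", { ... } ] this finds three kinds, which corresponds to the three constructors MyList0, MyList1 and MyList2 in the decoders above.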
The input JSON provides limited information for naming: aside from field names in objects, we can get index values for items in arrays, but that’s it. Consider how we could name the decoder for the last address in this JSON example:
{
  "account": {
    "id": 1,
    "users": [
      {
        "name": "abc",
        "alias": "def",
        "addresses": [
          {
            "num": 1,
            "street": "High St",
            "postcode": 5024
          },
          {
            "num": 15,
            "street": "Low St",
            "postcode": 2346
          }
        ]
      }
    ]
  }
}
Using field names and array indexes, we can construct the following chain to refer to the last address:
["account", "users", 0, "addresses", 1]
This can then be turned into a decoder name decodeAccountUsersObjectAddressesItem, where Object and Item are obtained via Array.get index indexNoun from an array of nouns like “item”, “instance” and so on. The reason for this somewhat unwieldy naming convention is to avoid name clashes. For example, multiple objects can have a “users” field, so using simply decodeUsers is going to be problematic.
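The naming scheme can be sketched roughly as follows. The Segment type, the helper names and the contents of the indexNouns array are assumptions chosen so that the example path produces the name shown above. The fallback for indexes beyond the end of the array is also my own guess, and field names containing characters that aren’t valid in Elm identifiers are not handled.

```elm
module NameSketch exposing (Segment(..), decoderName)

import Array exposing (Array)


{-| One step in the path from the root to a node. -}
type Segment
    = Field String
    | Index Int


{-| Hypothetical noun list: index 0 maps to "object", 1 to "item", and so on. -}
indexNouns : Array String
indexNouns =
    Array.fromList [ "object", "item", "instance" ]


{-| Upper-case the first character of a word. -}
capitalize : String -> String
capitalize word =
    case String.uncons word of
        Just ( first, rest ) ->
            String.cons (Char.toUpper first) rest

        Nothing ->
            ""


{-| Turn one path segment into a capitalized name fragment. -}
segmentName : Segment -> String
segmentName segment =
    case segment of
        Field name ->
            capitalize name

        Index index ->
            -- Assumed fallback when the noun list is too short.
            Array.get index indexNouns
                |> Maybe.withDefault ("item" ++ String.fromInt index)
                |> capitalize


{-| Build a decoder name from the full path to a node. -}
decoderName : List Segment -> String
decoderName path =
    "decode" ++ String.concat (List.map segmentName path)
```

Calling decoderName [ Field "account", Field "users", Index 0, Field "addresses", Index 1 ] gives "decodeAccountUsersObjectAddressesItem".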
The rest of the implementation boils down to a lot of string manipulation to produce nicely formatted Elm code, which I’m not going to go into in this post as it isn’t very interesting.