Analyze JSON library

JSON files shouldn’t contain a lot of data, but sometimes they do. Recently I received a huge JSON file. It consisted of more than 100k lines and had more than 300 unique keys (attributes, if you will). In memoQ you can create a filter for JSON files and define which keys should be translated. You add your JSON to the filter, populate the list of keys, and then edit that list either in flat or structural view. In flat mode you simply set which keys should be translated and which not. You get a list of unique keys, so with 300 keys you need to make 300 decisions. Probably even fewer, as the default is to translate a given key, so you only need to mark what shouldn’t be translated. In structural mode you’re dealing with the structure as present in the given JSON, so if a key has been used 10 times, you’ll have to make 10 decisions. Multiply this by 300 keys and… it’s a nightmare.

The filter also works like memoQ’s XML filter (no matter the key editing mode): if you mark as non-translatable a key which contains another key that should be translated, the value of the inner key won’t be imported. Which is expected and correct. You can circumvent it with the strongly translatable option, but I wouldn’t go this way. You can lose track pretty quickly.

I figured that if I could determine all the leaves, the nodes which don’t contain other nodes, then I could just decide which of them should be translated, and then in flat view mark the rest as non-translatable. But how to do that?

Let’s start with listing all unique keys. I found out that my favourite JSON tool, jq, has a solution for it: jq keys your.json. The problem is that for JSON like the one below:

{
  "something": {
    "inner": "text"
  },
  "anotherSomething": "text"
}

It’ll give you:

[
 "anotherSomething",
 "something"
]

So, not all keys. OK, I thought, I’ll make my own library for that. It’ll be a good exercise and I’d definitely use it a lot, so it’s worth it. Even if it weren’t, I would’ve still done it, just for the opportunity to code it ;) Recently on IRC somebody mentioned that they would like to save go.mod files as JSON. I have no idea why anybody would need that, but I quickly made a parser and hooked it up to a serialiser, just for fun. I know people for whom programming is no longer fun, but for me it still is, and I hope it’ll stay this way.

I thought it would be easier than it was, but once I figured out how to start, it wasn’t so hard. I discarded System.Text.Json pretty quickly: it may be faster than Newtonsoft’s Json.NET, but it’s still not very user-friendly or flexible. At first I thought I’d be able to deserialise everything into a Dictionary and then iterate over it. No such luck. Even though it’s my usual way of handling JSON, it didn’t work for me this time. The JSON was too complex.
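To show what I mean (this is just a sketch of the usual dictionary approach, not code from the library), only the top level comes back as a dictionary; everything nested is still a JObject or JArray that needs its own handling:

using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// A sketch of the first attempt, not code from the library.
var json = File.ReadAllText("your.json");
var flat = JsonConvert.DeserializeObject<Dictionary<string, object>>(json);

// Only the top level becomes a dictionary; nested objects and arrays come back
// as JObject/JArray values that still need separate handling, which is where
// this falls apart for a deeply nested, 100k-line file.
foreach (var pair in flat)
{
    Console.WriteLine($"{pair.Key}: {pair.Value?.GetType().Name}");
}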

Anyway, I’ve focused on JObject, and to be specific, on JToken. From this point it was quite easy. I’ve created a method which accepts an Action<JToken> and then traverses the given JSON file, executing the given action on each node. This way one can easily add new functionality; all that has to be done is to create a new action which does what we want.
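The actual implementation lives in the repository; as a minimal sketch of the idea (the class and method names below are my own, not the library’s), such a traversal can be built on Newtonsoft’s JToken.Parse and Children():

using System;
using Newtonsoft.Json.Linq;

public static class JsonTraverser
{
    // Parse the JSON text and invoke the action on every token in the tree.
    public static void Traverse(string json, Action<JToken> action)
    {
        Walk(JToken.Parse(json), action);
    }

    private static void Walk(JToken token, Action<JToken> action)
    {
        action(token);
        foreach (var child in token.Children())
        {
            Walk(child, action);
        }
    }
}

Adding new functionality is then just a matter of writing another action, for example printing every token’s path and type (File comes from System.IO):

JsonTraverser.Traverse(File.ReadAllText("your.json"),
    token => Console.WriteLine($"{token.Path}: {token.Type}"));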

That was my next step. I created an AnalyzeStructure class which does exactly what I wanted to achieve. It returns a collection of nodes and leaves (I know that a leaf is also a node) with some additional information. I’ve also created a Statistics class which focuses on tokens of type string and counts the words in their values. Pretty handy for localisation.
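Neither class is reproduced here; as rough sketches under my own names (the real classes return more information than this), a leaf-collecting action and a word-counting action that plug into the traversal above could look like this:

using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public class LeafKeyCollector
{
    public HashSet<string> LeafKeys { get; } = new();

    public void Inspect(JToken token)
    {
        // A property whose value has no children of its own is a leaf.
        if (token is JProperty property && !property.Value.HasValues)
            LeafKeys.Add(property.Name);
    }
}

public class WordCounter
{
    public int Words { get; private set; }

    public void Inspect(JToken token)
    {
        // Count whitespace-separated words in every string value.
        if (token is JValue value && value.Type == JTokenType.String)
            Words += value.ToString().Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;
    }
}

Both are just actions, so they can be passed straight to the traversal method, e.g. JsonTraverser.Traverse(json, collector.Inspect).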

I’ve also had a chance to play with records a bit and must admit they’re pretty handy. Actually, I’ve started to use them a lot, and maybe even to abuse them a bit, in all my new projects.
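For example (a hypothetical record, not one taken from the library), a single line gives you an immutable type with value equality and a readable ToString:

public record NodeInfo(string Path, string Key, bool IsLeaf, int Occurrences);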

It was a lot of fun creating it, and hopefully it will be as useful to somebody else as it is to me. The code can be found here.