#864 Serialization: Why tree based?

DanielFath Fri 11 Dec 2009

I hope the question is obvious from the title, but a more explicit version:

What are the pros and cons of tree-based versus graph-based serialization?
Was this type of serialization chosen for easier "JSON-like" serialization?
Wouldn't graph-based serialization make more sense if one wanted to serialize information directly into a database?

casperbang Fri 11 Dec 2009

Graph-based serialization suffers from the problem of cycles, as anyone who has tried serializing Java beans into XML or JSON will have experienced. It works reasonably well as a BLOB, but that has all kinds of other problems related to versioning and transparency.
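
To make the cycle problem concrete, here is a minimal sketch in Python (not Fantom, and not Java beans) using the standard json module. The parent/child data and the try_dump helper are made up for illustration.

```python
import json

# Two nodes that point at each other: a graph, not a tree.
parent = {"name": "parent", "children": []}
child = {"name": "child", "parent": parent}
parent["children"].append(child)

def try_dump(obj):
    """Return the JSON text, or the error message if serialization fails."""
    try:
        return json.dumps(obj)
    except ValueError as err:
        # json.dumps refuses to follow a circular reference.
        return f"error: {err}"

cyclic_result = try_dump(parent)            # fails: the structure has a cycle
tree_result = try_dump({"name": "leaf"})    # succeeds: a plain tree
```

A tree-shaped value goes through untouched, while the cyclic one is rejected before any output is produced.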

So I for one like Fantom's approach to this. If your hierarchy contains cycles (or costly redundancies), you can use pre- and post-processing to handle the wiring. I find Fantom's once modifier comes in handy here.

But are you really asking why there's no way to use tree node references (e.g. XPath's relative references or XStream's ID references)?

qualidafial Sat 12 Dec 2009

I think the number one reason serialization is tree-based is because currently the serialization format is a proper subset of the Fantom language. So serialized Fan objects are valid Fan code. That aspect comes in extremely handy.

In order to support graph-based serialization, there must be:

Some way to identify which objects in the graph are referenced more than once -- which will probably make the serialization pipeline two-pass instead of single-pass. Probably no big deal.
A whiteboard mechanism whereby the object is serialized fully and stored to the whiteboard the first time it is encountered, and serialized as a back reference to the whiteboard each successive time it is encountered.
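
The two requirements above can be sketched roughly as follows. This is Python, not Fantom, and it is only an illustration under stated assumptions: dicts stand in for objects, lists are treated as plain unshared containers, and the "$id" / "$ref" keys and all function names are invented here rather than taken from any Fantom API.

```python
def count_refs(obj, counts):
    """Pass 1: count how many times each dict (our stand-in object) is reached."""
    if isinstance(obj, dict):
        counts[id(obj)] = counts.get(id(obj), 0) + 1
        if counts[id(obj)] > 1:
            return  # already visited: stop here so cycles terminate
        for value in obj.values():
            count_refs(value, counts)
    elif isinstance(obj, list):
        for item in obj:
            count_refs(item, counts)

def encode(obj, counts, board):
    """Pass 2: write shared objects once with an id, then as back references."""
    if isinstance(obj, list):
        return [encode(item, counts, board) for item in obj]
    if not isinstance(obj, dict):
        return obj
    if id(obj) in board:
        return {"$ref": board[id(obj)]}  # seen before: emit a back reference
    out = {}
    if counts[id(obj)] > 1:
        # Register on the whiteboard before descending, so a cycle back
        # to this object resolves to a reference instead of recursing.
        board[id(obj)] = len(board) + 1
        out["$id"] = board[id(obj)]
    for key, value in obj.items():
        out[key] = encode(value, counts, board)
    return out

def to_tree(root):
    """Turn an object graph into a tree with id/ref markers."""
    counts = {}
    count_refs(root, counts)
    return encode(root, counts, {})

shared = {"city": "Springfield"}
homer = {"name": "Homer", "address": shared}
marge = {"name": "Marge", "address": shared}
tree = to_tree([homer, marge])
```

The shared address is written in full under the first person and as a reference under the second. Note that the output is still a tree, so the markers have to mean something to whatever reads it back, which is the sticking point described next.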

Therein lies the sticking point--the whiteboard mechanism must be supported by the serialization engine, either as:

A new language syntax which must necessarily be valid anywhere in Fan code in order to remain a proper subset of the language--"My beautiful Fantom language, what have they done to you?!"--or,
A core API which the serialization engine must be taught to recognize. However, then we have a class lying around in the core API purely to support graph-based serialization, unless we can formulate this API into something generically useful.

Choose your poison. Either way, you are likely to be eaten by a grue.

I haven't got my mind wrapped around it, but it might be possible to achieve graph-based serialization by adding a new option to OutStream.writeObj that acts as a strategy for marshalling / unmarshalling object references. This way you might be able to define your own id/idref syntax at least, but this is less ideal than having it supported directly by the language.

brian Sat 12 Dec 2009

Tree serialization is simpler, and also much more flexible.

I've taken the approach with Fantom that once you annotate your classes and slots as serializable, any format can be used: Fantom's syntax, JSON, XML, etc. The days of CORBA/RMI are gone, and things are definitely moving in the direction of text formats such as JSON and XML, which are tree based.

That is not to say that you can't cross reference objects. But referencing objects to construct a true graph requires a naming system, and I don't want to dictate any particular naming system - that tends to be an application level function. For example if you are stuffing an object in a database, graphs are constructed using foreign keys. In XML graphs are typically built with document identifiers internally and URIs externally.
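
As a rough illustration of an application-level naming system, here is a sketch in Python (not Fantom). The "emp:1" style keys, the manager field, and the resolve helper are all hypothetical; the point is only that each record stays a tree and the cross references are plain keys, much like foreign keys, that get wired up after loading.

```python
import json

# Each record is a plain tree; cross references are stored as keys, not objects.
records = {
    "emp:1": {"name": "Homer", "manager": "emp:2"},
    "emp:2": {"name": "Burns", "manager": None},
}

text = json.dumps(records)  # serializes fine: no cycles, just strings

def resolve(table, ref_fields):
    """Post-processing step: replace key strings with the objects they name."""
    for record in table.values():
        for field in ref_fields:
            key = record.get(field)
            if key is not None:
                record[field] = table[key]
    return table

loaded = resolve(json.loads(text), ref_fields=["manager"])
```

After resolve runs, the manager field of "emp:1" is the very same object as the "emp:2" record, so the graph exists in memory even though only a tree was ever written out.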

DanielFath Sat 12 Dec 2009

Thanks, that's what I needed.