#766 Problem with deserialization?

ivan Thu 1 Oct 2009

Hello, it looks like there is a problem with sys::InStream.readObj. Here is a dummy example:

out := `/home/ivan/output`.toFile.out
3.times { out.writeObj(`/home/ivan/`) }
out.flush.close

in := `/home/ivan/output`.toFile.in
3.times { echo(in.readObj()) }

This example successfully reads the first object, but then fails with IOErr: Unexpected symbol: / (0x2f)

However, everything works fine when there is an EOS after the first object.

Fandoc for sys::InStream.readObj says:

This method may consume bytes/chars past the end of the serialized object (we may want to add a "full stop" token at some point to support compound object streams).

Is there any news regarding adding such token?

ivan Thu 1 Oct 2009

Found a temporary workaround:

out := `/home/ivan/output`.toFile.out
3.times 
{ 
  buf := StrBuf()
  buf.out.writeObj(`/home/ivan/`) 
  out.writeUtf(buf.toStr)
}
out.flush.close

in := `/home/ivan/output`.toFile.in
3.times { echo(in.readUtf().in.readObj) }

brian Thu 1 Oct 2009

Yeah, it is difficult to parse text off a stream without some sort of lookahead (which tends to be on a token basis, not necessarily a char basis).

The way I typically handle it is to create some record separator myself in the stream, then use that to chunk the stream into serialized objects.

Although I am open to trying to improve the current design with some "stop token".

SlimerDude Tue 17 Jun 2014

Related to the above, it still seems that sys::InStream.readObj can only read one Obj from a stream.

I ran into this when writing the Binary object in BSON. Wanting to serialise it, I thought "Easy! Just provide a toStr() and a fromStr() to write / read the values and mark it as @Serializable {simple=true}"

Essentially all I had was an Int and Str, so I tried this:

override Str toStr() {
  Buf().writeObj(myInt).writeObj(myStr).flip.readAllStr
}

static new fromStr(Str str) {
  buf   := str.toBuf
  myInt := buf.readObj
  myStr := (Str) buf.readObj
  return Binary(myInt, myStr)
}

But then I got an EOS Err when reading myStr, presumably due to readObj():

This method may consume bytes/chars past the end of the serialized object

My workaround was to seek() to the end of the first Obj, and continue reading. It seems to work fine:

static new fromStr(Str str) {
  buf   := str.toBuf
  myInt := buf.readObj

  // this next line is horrible, but works!
  buf.seek(Buf().writeObj(myInt).pos)

  myStr := (Str) buf.readObj
  return Binary(myInt, myStr)
}

I was just wondering if this method of seeking to end of an Object could be utilised by InStream.readObj() so it can read multiple objects from the same stream. (For it would be really useful!)

brian Tue 17 Jun 2014

It's really just a text tokenizing thing: you are typically looking ahead at a few tokens. So that code was all designed to suck in the entire stream, or else use some other breaking mechanism to combine multiple objects together. The seek trick only works off a random access file (it wouldn't work off a socket stream, say).

SlimerDude Tue 2 Sep 2014

I've been re-(looking at / thinking about) this.

I can see that all the code is in the Java fanx.serial package and that the tokenising you talk about is in the aptly named Tokenizer class. I was trying to understand why, when reading an Obj, you would need to read beyond the end of the Obj.

Complex objects seem easy enough - they end with a } - so just don't read beyond the last }!

I guess the problem is with (simple!) literals, especially numerical ones. For example, is 42 one number or is it two numbers, a 4 followed by a 2?
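To make the literal problem concrete, here is a minimal sketch of reading an Int off a char stream. It is not the fanx.serial Tokenizer; LookaheadSketch and readInt are hypothetical names made up for illustration. The point is only that the reader cannot know the digits have ended until it has already consumed the char after them.

```fantom
class LookaheadSketch
{
  ** Hypothetical helper, not part of sys: read one Int literal by
  ** consuming chars until the first non-digit shows up.
  static Int? readInt(InStream in)
  {
    // skip leading whitespace
    c := in.readChar
    while (c != null && c.isSpace) c = in.readChar
    if (c == null) return null

    // collect digits; the loop only ends once a non-digit has been read
    buf := StrBuf()
    while (c != null && c.isDigit)
    {
      buf.addChar(c)
      c = in.readChar
    }

    // c is now one char past the literal; unless it is pushed back,
    // it is lost to whoever reads the stream next
    if (c != null) in.unreadChar(c)

    // unchecked parse: returns null if no digits were collected
    return buf.toStr.toInt(10, false)
  }

  static Void main()
  {
    in := "42 7".in
    echo(readInt(in))
    echo(readInt(in))
  }
}
```

Pushing back a single char is enough in this toy case. A tokenizer that looks ahead by whole tokens, as described earlier in the thread, may have consumed more than one char by the time it knows the object has ended.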

The idea of a stop token seems a bit kludgy to me, but with the above literal problem I don't see a way around it. Unless ObjEncoder.java, when encoding top level Objs, wrote out literals long hand, say 42 became sys::Int("42") or similar.

Going back to the stop token, would a ; char work? It's readable and understandable. It already represents end of statement, so for it to further represent end of object isn't such a big leap.
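As a rough sketch of how a ; stop token could be tried out today in user code, without any change to readObj: write a ; after each object, then chunk the stream on every ; that sits outside a Str or Uri literal and hand each chunk to readObj. StopTokenSketch and readChunk are hypothetical names, and the quoting rules are an assumption (only "..." and backtick literals with backslash escapes are handled).

```fantom
class StopTokenSketch
{
  ** Hypothetical helper, not part of sys: return the text up to the next
  ** ';' found outside a Str or Uri literal, or null at end of stream.
  static Str? readChunk(InStream in)
  {
    buf := StrBuf()
    Int? quote := null   // quote char of the literal we are inside, if any
    while (true)
    {
      c := in.readChar
      if (c == null) break
      if (quote == null)
      {
        if (c == ';') return buf.toStr
        if (c == '\"' || c == '\`') quote = c
      }
      else if (c == '\\')
      {
        // keep the backslash and the escaped char together
        buf.addChar(c)
        c = in.readChar
        if (c == null) break
      }
      else if (c == quote)
      {
        quote = null
      }
      buf.addChar(c)
    }

    // end of stream: return any trailing text, ignoring pure whitespace
    rest := buf.toStr.trim
    return rest.isEmpty ? null : rest
  }

  static Void main()
  {
    // writer side: each object is followed by the assumed stop token
    sb  := StrBuf()
    out := sb.out
    [42, "a;b", `/home/ivan/`].each |obj| { out.writeObj(obj).writeChar(';') }

    // reader side: chunk first, then let readObj consume a whole chunk
    in := sb.toStr.in
    while (true)
    {
      chunk := readChunk(in)
      if (chunk == null) break
      echo(chunk.in.readObj)
    }
  }
}
```

Since this reads one char at a time and never needs to seek, it should behave the same on a socket stream as on a file, at the cost of scanning every object twice.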

rasa Wed 3 Sep 2014

@brian

It's really just a text tokenizing thing: you are typically looking ahead at a few tokens. So that code was all designed to suck in the entire stream, or else use some other breaking mechanism to combine multiple objects together. The seek trick only works off a random access file (it wouldn't work off a socket stream, say).

You made so many pros & cons compromises during the Fantom design that I don't understand why you bother with things like file deserialization through sockets. I find it so rare that the solution might be to copy the stream to a temp file and then use random access to deserialize objects from it. If someone wants deserialization over a stream, then they can switch to Java serialization capabilities. Besides, what's the benefit of having text serialized files at the remote place?
