#873 InStream / OutStream endianness

liamstask Wed 16 Dec 2009

It doesn't seem like there are currently any options for specifying a little endian stream. Not having to do a byte swap on each writeI4() etc would be pretty handy :) Please let me know if I've missed a better way to do this.
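
For context, the per-call swap being described looks roughly like this when sketched in Java (chosen here only because the thread compares against java.io). The class and method names are hypothetical and not part of any Fantom or Java API; the sketch only shows that a big-endian-only writer forces the caller to reverse the bytes on every call.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only.
class SwapOnWrite {
    // Emits v in little-endian order through DataOutputStream, which always
    // writes big endian, by reversing the byte order of the value first.
    static byte[] writeI4LittleEndian(int v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(Integer.reverseBytes(v)); // manual swap on every call
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

A stream-level endian setting would move that swap out of user code.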

brian Wed 16 Dec 2009

I have not added any conveniences for little endian just because I don't think java.io has any support for that. Although I must admit I've needed it in the past so I don't mind adding it.

There are a couple ways to support this functionality:

  1. add a flag to InStream/OutStream which is implicitly used for I2, I4, I8, F4, F8
  2. add Bool param to those methods
  3. add new methods

The first option is nicest to use, but adds a field load and branch for every call, which is non-trivial overhead for serious binary I/O. But you could make the argument that we already take that hit for text with the Charset overhead (which is probably much more heavily used).
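
A minimal Java sketch of what option 1 might look like, assuming a hypothetical `FlaggedOut` class with a `littleEndian` field (neither is a real Fantom or Java type). It is only meant to make the "field load and branch for every call" cost concrete.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical output stream with an endian flag (option 1).
class FlaggedOut {
    // Assumed default for this sketch: big endian.
    boolean littleEndian = false;
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    FlaggedOut writeI4(int v) {
        // The field load and branch paid on every call.
        if (littleEndian) v = Integer.reverseBytes(v);
        // write(int) stores only the low 8 bits of its argument.
        sink.write(v >>> 24);
        sink.write(v >>> 16);
        sink.write(v >>> 8);
        sink.write(v);
        return this;
    }

    byte[] toByteArray() { return sink.toByteArray(); }
}
```

Options 2 and 3 would avoid the field load, at the price of a wider API.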

liamstask Wed 16 Dec 2009

From an API perspective, I think option 1 is nicest, although I could most definitely live with any of those options that proved most efficient.

Is it not possible/desirable to use java.nio for some reason? That seems to have support for ByteOrder.
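
For reference, this is roughly how java.nio exposes byte order. `ByteBuffer.order` and `ByteOrder.LITTLE_ENDIAN` are standard JDK API; the wrapper class `NioOrder` and its method names are made up for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical wrapper around the standard java.nio calls.
class NioOrder {
    // Encodes a 32-bit value using the requested byte order.
    static byte[] encodeI4(int v, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putInt(v);
        return buf.array();
    }

    // Decodes the first four bytes using the requested byte order.
    static int decodeI4(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }
}
```

Note that the order is a property of the buffer, not of each call, which is close in spirit to the flag approach.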

tactics Wed 16 Dec 2009

I also ran into this problem last week.

I'd opt for adding new methods. I think the file format usually specifies what endianness you need.

We could also do both 1 and 3, where we have I2, I2BigEndian, and I2LittleEndian (but with better names ;-) Then, you use I2 when you need to be able to change the endianness on the fly or you want to let the system specify it.

alexlamsl Wed 16 Dec 2009

How about having a separate class which does Little Endian? That way we can choose at construction time (which is handy in the case when endianness should be transparent to consumers) and it would not incur any runtime overhead for every I/O call.
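
One way to picture the separate-class idea, as a Java sketch with hypothetical names (`BinOut`, `BigEndianOut`, `LittleEndianOut`). The byte order is fixed when the object is constructed, so `writeI4` contains no flag check; consumers only see the base type.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical base type: consumers depend on this and never see the byte order.
abstract class BinOut {
    protected final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    abstract BinOut writeI4(int v);
    byte[] toByteArray() { return sink.toByteArray(); }
}

// Most significant byte first.
class BigEndianOut extends BinOut {
    @Override BinOut writeI4(int v) {
        sink.write(v >>> 24);
        sink.write(v >>> 16);
        sink.write(v >>> 8);
        sink.write(v);
        return this;
    }
}

// Least significant byte first.
class LittleEndianOut extends BinOut {
    @Override BinOut writeI4(int v) {
        sink.write(v);
        sink.write(v >>> 8);
        sink.write(v >>> 16);
        sink.write(v >>> 24);
        return this;
    }
}
```

The trade-off in this sketch is that the byte order cannot be changed on an existing stream, and the call is still dispatched virtually.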

brian Thu 17 Dec 2009

@liamstask - just curious, what are you doing that requires little endian?

The problem with adding new methods is that it's 7 methods to classes which are fairly big already. So I'm thinking the API will be most usable by adding an endian flag.

My proposal is 3 new fields on Buf, InStream, OutStream:

Str endian := endianBig
static const Str endianBig    := "big"
static const Str endianLittle := "little"

Or we could actually declare a full Endian enum - but I want to avoid polluting the sys namespace with an enum for such a niche API.

Buf mode will work like charset - it implicitly sets both in/out streams.

Then all the following methods will use the mode: I2/U2/S2, I4/U4/S4, I8/S8, F4, F8

How does that work for everybody?

ivan Thu 17 Dec 2009

I dislike using strs as endian type. Probably there should be a flag like bigEndian and method littleEndian() { !bigEndian }? I doubt there may appear one more type of endianness :-)

tactics Thu 17 Dec 2009

just curious, what are you doing that requires little endian?

I don't know about liamstask, but I was using a homemade readU4littleEndian to inspect .dex files created for the Dalvik VM/Android platform.
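
A homemade reader of this kind might look like the following Java sketch. It is an assumption about the general shape, not the poster's actual code; the class name and the array-plus-offset signature are invented for illustration.

```java
// Hypothetical helper for illustration only.
class LittleEndianReads {
    // Reads an unsigned 32-bit little-endian value starting at offset.
    // The result is a long because Java's int cannot hold the full unsigned range.
    static long readU4LittleEndian(byte[] data, int offset) {
        return  (data[offset]     & 0xFFL)          // least significant byte first
             | ((data[offset + 1] & 0xFFL) << 8)
             | ((data[offset + 2] & 0xFFL) << 16)
             | ((data[offset + 3] & 0xFFL) << 24);  // most significant byte last
    }
}
```

Masking each byte with `0xFFL` keeps negative byte values from sign-extending into the result.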

liamstask Thu 17 Dec 2009

I've been working on a pure Fan MongoDB driver. I've got most of the serialization working, decent scaffolding of the basic DB operations, and some OK test coverage, but I'm not actually talking to the DB yet since it (for some strange reason) speaks little endian on the wire.

I like the idea of a flag too as opposed to string values - nice and simple.

brian Thu 17 Dec 2009

Promoted to ticket #873 and assigned to brian

brian Thu 17 Dec 2009

I've been working on a pure Fan MongoDB driver

I find it utterly unbelievable that MongoDB's BSON format uses little endian - that is just plain wrong in this day and age.

The reason I like using a Str for the flag is that it makes it easy to dump. There is no difference b/w using an Int vs a Str for a flag other than readability. Using two different fields like littleEndian and bigEndian seems a bit un-DRY; I don't like two fields which basically do one thing.

liamstask Fri 18 Dec 2009

Yeah - I couldn't quite figure that out. Maybe it's an attempt at a slight optimization, thinking that most platforms these days are little endian, and not having to actually swap the bytes might save a bit of work? I also noticed Protocol Buffers seem to be little endian. Interesting.

Str works fine for me.

brian Fri 18 Dec 2009

Ticket resolved in 1.0.48

I decided the cleanest solution was to just create a sys::Endian enum. New fields:

InStream.endian
OutStream.endian
Buf.endian

The binary I/O methods will now use the configured endian.

liamstask Mon 21 Dec 2009

Cool - thanks again for this.

One issue I found was that TcpSocket seems to use its own In/OutStreams internally, meaning they typically won't function correctly when they're set to little endian. To test, set a TcpSocket's streams to little endian before connecting and then try to connect - I always get an IoErr.

Unfortunately, this makes it difficult to make use of the nice endianness controls for the user data flowing through the streams. I can work around this a bit in my current use case, but this seems problematic. Any thoughts?

brian Mon 21 Dec 2009

You can't get the IO streams until after you connect. But once you are connected you should be able to configure charset/endian. Here is a quick script that shows endian working:

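// server side: accept one connection and read two 4-byte values
Actor(ActorPool()) |->|
{
  s := TcpListener().bind(null, 12345).accept
  // still big endian here, so the little endian bytes read back swapped
  echo(s.in.readU4.toHex)
  s.in.endian = Endian.little
  // now matches the writer
  echo(s.in.readU4.toHex)
  Sys.exit(0)
}.send("start")

// client side: configure endian after connecting, then write the value twice
c := TcpSocket().connect(IpAddress("localhost"), 12345)
c.out.endian = Endian.little
c.out.writeI4(0xaabbccdd).writeI4(0xaabbccdd).flush
Actor.sleep(10sec)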

liamstask Mon 21 Dec 2009

Hm - I'm having no problem reading little endian data from the InStream configured as you've shown above, but I'm seeing some bogus length values on the other end of a connection with a TcpSocket configured with a little endian OutStream. When I first write the data to a little endian Buf and then send it through via writeBuf, it works as expected. Maybe there's something specific to Fantom sockets here?

brian Mon 21 Dec 2009

Pretty much everything including the inet code boils down to a single OutStream and InStream class in the fan.sys package. So unless something is messed up reading individual bytes/chars then everything should work exactly the same b/w IO streams.

liamstask Mon 21 Dec 2009

Sure enough - was an error on my side. Sorry for the extra noise.
