InStream / OutStream endianness #873
brian
16 Dec 2009
I have not added any conveniences for little endian just because I don't think java.io has any support for that. Although I must admit I've needed it in the past so I don't mind adding it.
There are a couple ways to support this functionality:
- add a flag to InStream/OutStream which is implicitly used for I2, I4, I8, F4, F8
- add Bool param to those methods
- add new methods
The first option is nicest to use, but adds a field load and branch for every call, which is non-trivial overhead for serious binary I/O. But you could make the argument that we already take that hit for text with the Charset overhead (which is probably much more heavily used).
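To make the "field load and branch" cost concrete, here is a rough sketch of what the first option amounts to. It assumes a made-up wrapper class FlagOut with a made-up little flag; neither exists in sys, and it uses the Int methods and, shiftr rather than operator symbols.

```fantom
// Hypothetical wrapper illustrating option 1; none of these names exist in sys
class FlagOut
{
  new make(OutStream out) { this.out = out }

  // the flag, read on every call (the "field load")
  Bool little := false

  This writeI2(Int v)
  {
    hi := v.shiftr(8).and(0xff)
    lo := v.and(0xff)
    // the per-call branch on the flag
    if (little) { out.write(lo).write(hi) }
    else { out.write(hi).write(lo) }
    return this
  }

  private OutStream out

  static Void main()
  {
    buf := Buf()
    w := FlagOut(buf.out)
    w.writeI2(0x1234)      // big endian byte order
    w.little = true
    w.writeI2(0x1234)      // little endian byte order
    echo(buf.toHex)
  }
}
```

The other two options would move that decision to the call site, either as an extra argument or as a differently named method.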
liamstask
16 Dec 2009
From an API perspective, I think option 1 is nicest, although I could most definitely live with any of those options that proved most efficient.
Is it not possible/desirable to use java.nio for some reason? That seems to have support for ByteOrder.
tactics
16 Dec 2009
I also ran into this problem last week.
I'd opt for adding new methods. I think the file format usually specifies what endianness you need.
We could also do both 1 and 3, where we have I2, I2BigEndian, and I2LittleEndian (but with better names ;-) Then, you use I2 when you need to be able to change the endianness on the fly or you want to let the system specify it.
alexlamsl
16 Dec 2009
How about having a separate class which does Little Endian? That way we can choose at construction time (which is handy in the case when endianness should be transparent to consumers) and it would not incur any runtime overhead for every I/O call.
brian
17 Dec 2009
@liamstask - just curious, what are you doing that requires little endian?
The problem with adding new methods is that it's 7 methods added to classes which are fairly big already. So I'm thinking the API will be most usable by adding an endian flag.
My proposal is 3 new fields on Buf, InStream, OutStream:
Str endian := endianBig
static const Str endianBig := "big"
static const Str endianLittle := "little"
Or we could actually declare a full Endian enum - but I want to avoid polluting the sys namespace with an enum for such a niche API.
Buf mode will work like charset - it implicitly sets both in/out streams.
Then all the following methods will use the mode: I2/U2/S2, I4/U4/S4, I8/S8, F4, F8
How does that work for everybody?
ivan
17 Dec 2009
I dislike using Strs as the endian type. Probably there should be a flag like bigEndian and a method littleEndian() { !bigEndian }? I doubt one more type of endianness will ever appear :-)
tactics
17 Dec 2009
just curious, what are you doing that requires little endian?
I don't know about liamstask, but I was using a homemade readU4littleEndian to inspect .dex files created for the Dalvik VM/Android platform.
liamstask
17 Dec 2009
I've been working on a pure Fan MongoDB driver. I've got most of the serialization working, decent scaffolding of the basic DB operations, and some OK test coverage, but I'm not actually talking to the DB yet since it (for some strange reason) speaks little endian on the wire.
I like the idea of a flag too as opposed to string values - nice and simple.
brian
17 Dec 2009
Promoted to ticket #873 and assigned to brian
brian
17 Dec 2009
I've been working on a pure Fan MongoDB driver
I find it utterly unbelievable that MongoDB's BSON format uses little endian - that is just plain wrong in this day and age.
The reason I like using a Str for the flag is that it makes it easy to dump. There is no difference b/w using an Int vs a Str for a flag other than readability. Using two different fields like littleEndian and bigEndian seems a bit un-DRY; I don't like two fields which basically do one thing.
liamstask
18 Dec 2009
Yeah - I couldn't quite figure that out. Maybe it's an attempt at a slight optimization, thinking that most platforms these days are little endian, and not having to actually swap the bytes might save a bit of work? I also noticed Protocol Buffers seem to be little endian. Interesting.
Str works fine for me.
brian
18 Dec 2009
Ticket resolved in 1.0.48
I decided the cleanest solution was to just create a sys::Endian enum. New fields:
InStream.endian
OutStream.endian
Buf.endian
The binary I/O methods will now use the configured endian.
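For anyone landing on this ticket later, a minimal sketch of how the new field might be used on a Buf. It assumes the enum values are Endian.big and Endian.little (as in the script further down the thread) and that big is the default; the class name EndianDemo is made up for illustration.

```fantom
class EndianDemo
{
  static Void main()
  {
    buf := Buf()

    // assumed default of big endian: most significant byte written first
    buf.writeI4(0xaabbccdd)

    // switch to little endian: least significant byte written first
    buf.endian = Endian.little
    buf.writeI4(0xaabbccdd)

    // dump the eight bytes written so far as hex
    echo(buf.toHex)
  }
}
```

The hex dump should show the same value laid out in both byte orders, which is a quick way to check the setting without a socket.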
liamstask
21 Dec 2009
Cool - thanks again for this.
One issue I found was that TcpSocket seems to use its own In/OutStreams internally, meaning they typically won't function correctly when they're set to little endian. To test, set a TcpSocket's streams to little endian before connecting and then try to connect - I always get an IoErr.
Unfortunately, this makes it difficult to make use of the nice endianness controls for the user data flowing through the streams. I can work around this a bit in my current use case, but this seems problematic. Any thoughts?
brian
21 Dec 2009
You can't get the IO streams until after you connect. But once you are connected you should be able to configure charset/endian. Here is a quick script that shows endian working:
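Actor(ActorPool()) |->|
{
  // server side: accept one connection, then read two 4-byte values
  s := TcpListener().bind(null, 12345).accept
  echo(s.in.readU4.toHex)      // read with the default (big) endian
  s.in.endian = Endian.little
  echo(s.in.readU4.toHex)      // read the next value as little endian
  Sys.exit(0)
}.send("start")

// client side: the streams are only available after connect
c := TcpSocket().connect(IpAddress("localhost"), 12345)
c.out.endian = Endian.little
c.out.writeI4(0xaabbccdd).writeI4(0xaabbccdd).flush
Actor.sleep(10sec)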
liamstask
21 Dec 2009
Hm - I'm having no problem reading little endian data from the InStream configured as you've shown above, but I'm seeing some bogus length values on the other end of a connection with a TcpSocket configured with a little endian OutStream. When I first write the data to a little endian Buf and then send it through via writeBuf, it works as expected. Maybe there's something specific to Fantom sockets here?
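The Buf-then-writeBuf approach described above might look roughly like this. It is only a sketch: an in-memory Buf stands in for the socket's OutStream, and the class name and values are made up for illustration.

```fantom
class BufFirstDemo
{
  static Void main()
  {
    // build the message in a little endian Buf first
    msg := Buf()
    msg.endian = Endian.little
    msg.writeI4(0x10).writeI4(0xaabbccdd)

    // stand-in for a socket's OutStream; here just another in-memory Buf
    sink := Buf()

    // flip rewinds msg for reading; writeBuf then copies its bytes as-is
    sink.out.writeBuf(msg.flip).flush

    echo(sink.toHex)
  }
}
```

Since writeBuf copies raw bytes, the endian setting of the receiving stream should not matter on this path; only the Buf's own setting decides the byte order.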
brian
21 Dec 2009
Pretty much everything including the inet code boils down to a single OutStream and InStream class in the fan.sys package. So unless something is messed up reading individual bytes/chars then everything should work exactly the same b/w IO streams.
liamstask
21 Dec 2009
Sure enough - was an error on my side. Sorry for the extra noise.
liamstask
16 Dec 2009
It doesn't seem like there are currently any options for specifying a little endian stream. Not having to do a byte swap on each writeI4() etc would be pretty handy :) Please let me know if I've missed a better way to do this.
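For context, the kind of manual byte swap meant here could be sketched as follows. swap4 is a hypothetical helper, not part of sys, and the sketch assumes a Fantom release where the Int bitwise operations are the methods and, or, shiftl and shiftr.

```fantom
class SwapDemo
{
  // Reverse the low four bytes of v (hypothetical helper, not part of sys)
  static Int swap4(Int v)
  {
    b0 := v.and(0xff)
    b1 := v.shiftr(8).and(0xff)
    b2 := v.shiftr(16).and(0xff)
    b3 := v.shiftr(24).and(0xff)
    return b0.shiftl(24).or(b1.shiftl(16)).or(b2.shiftl(8)).or(b3)
  }

  static Void main()
  {
    buf := Buf()
    // the Buf stays big endian; swapping first yields little endian byte order
    buf.writeI4(swap4(0xaabbccdd))
    echo(buf.toHex)
  }
}
```

Doing this by hand around every multi-byte read and write is the overhead the request is trying to avoid.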