liamstask Wed 16 Dec 2009
It doesn't seem like there are currently any options for specifying a little endian stream. Not having to do a byte swap on each writeI4() etc. would be pretty handy :) Please let me know if I've missed a better way to do this.
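For context, the per-call byte swap described above looks roughly like the following on the JVM. This is only a sketch in Java rather than Fantom, assuming a big-endian-only stream such as java.io.DataOutputStream; the class and method names (ByteSwapDemo, writeLittleEndianI4) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: emulates a little endian write on a big endian stream.
public class ByteSwapDemo {
    public static byte[] writeLittleEndianI4(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // DataOutputStream always writes big endian, so reverse the
            // byte order of the value before handing it over.
            out.writeInt(Integer.reverseBytes(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : writeLittleEndianI4(0x11223344)) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

The swap has to be repeated at every call site, which is the inconvenience being raised here.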
brian Wed 16 Dec 2009
I have not added any conveniences for little endian just because I don't think java.io has any support for that. Although I must admit I've needed it in the past, so I don't mind adding it.
There are a few ways to support this functionality:
1. add a flag to InStream/OutStream which is implicitly used for I2, I4, I8, F4, F8
2. add a Bool param to those methods
3. add new methods
The first option is nicest to use, but adds a field load and branch for every call, which is non-trivial overhead for serious binary I/O. But you could make the argument that we already take that hit for text with the Charset overhead (which is probably much more heavily used).
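To make the "field load and branch" cost of the first option concrete, here is a rough sketch in Java. It is not the actual fan.sys implementation; EndianFlagOut and its littleEndian field are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stream wrapper illustrating a per-stream endian flag.
public class EndianFlagOut {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    // Assumed flag: false means big endian (network order), the default.
    public boolean littleEndian = false;

    public void writeI4(int v) {
        // This field load and branch happen on every call.
        if (littleEndian) {
            sink.write(v);        // least significant byte first
            sink.write(v >>> 8);
            sink.write(v >>> 16);
            sink.write(v >>> 24);
        } else {
            sink.write(v >>> 24); // most significant byte first
            sink.write(v >>> 16);
            sink.write(v >>> 8);
            sink.write(v);
        }
    }

    public byte[] toByteArray() {
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        EndianFlagOut out = new EndianFlagOut();
        out.writeI4(0x11223344);  // written big endian
        out.littleEndian = true;
        out.writeI4(0x11223344);  // written little endian
        for (byte b : out.toByteArray()) System.out.printf("%02x ", b);
        System.out.println();
    }
}
```

Callers keep using the same writeI4 call and only set the flag once, which is why this option is the nicest to use despite the extra branch.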
liamstask Wed 16 Dec 2009
From an API perspective, I think option 1 is nicest, although I could most definitely live with any of those options that proved most efficient.
Is it not possible/desirable to use java.nio for some reason? That seems to have support for ByteOrder.
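For reference, the java.nio support mentioned here is the ByteOrder setting on ByteBuffer. A minimal Java sketch follows; the class and method names (NioEndianDemo, encodeI4, decodeI4) are hypothetical, and this says nothing about how Fantom's streams are implemented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo of java.nio's byte order support.
public class NioEndianDemo {
    // Encode a 32-bit int using the requested byte order.
    public static byte[] encodeI4(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putInt(value);
        return buf.array();
    }

    // Decode a 32-bit int from 4 bytes using the requested byte order.
    public static int decodeI4(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] le = encodeI4(0x11223344, ByteOrder.LITTLE_ENDIAN);
        for (byte b : le) System.out.printf("%02x ", b);
        System.out.println();
        System.out.printf("%08x%n", decodeI4(le, ByteOrder.LITTLE_ENDIAN));
    }
}
```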
tactics Wed 16 Dec 2009
I also ran into this problem last week.
I'd opt for adding new methods. I think the file format usually specifies what endianness you need.
We could also do both 1 and 3, where we have I2, I2BigEndian, and I2LittleEndian (but with better names ;-) Then, you use I2 when you need to be able to change the endianness on the fly or you want to let the system specify it.
alexlamsl Wed 16 Dec 2009
How about having a separate class which does Little Endian? That way we can choose at construction time (which is handy in the case when endianness should be transparent to consumers) and it would not incur any runtime overhead for every I/O call.
brian Thu 17 Dec 2009
@liamstask - just curious, what are you doing that requires little endian?
The problem with adding new methods is that it's 7 methods added to classes which are fairly big already. So I'm thinking the API will be most usable by adding an endian flag.
My proposal is 3 new fields on Buf, InStream, OutStream:
Or we could actually declare a full Endian enum - but I want to avoid polluting the sys namespace with an enum for such a niche API.
Buf mode will work like charset - it implicitly sets both in/out streams.
Then all the following methods will use the mode: I2/U2/S2, I4/U4/S4, I8/S8, F4, F8
How does that work for everybody?
ivan Thu 17 Dec 2009
I dislike using Strs as the endian type. Probably there should be a flag like bigEndian and a method littleEndian() { !bigEndian }? I doubt there will ever be one more type of endianness :-)
tactics Thu 17 Dec 2009
just curious, what are you doing that requires little endian?
I don't know about liamstask, but I was using a homemade readU4littleEndian to inspect .dex files created for the Dalvik VM/Android platform.
liamstask Thu 17 Dec 2009
I've been working on a pure Fan MongoDB driver. I've got most of the serialization working, decent scaffolding of the basic DB operations, and some OK test coverage, but I'm not actually talking to the DB yet since it (for some strange reason) speaks little endian on the wire.
I like the idea of a flag too as opposed to string values - nice and simple.
brian Thu 17 Dec 2009
Promoted to ticket #873 and assigned to brian
brian Thu 17 Dec 2009
I've been working on a pure Fan MongoDB driver
I find it utterly unbelievable that MongoDB's BSON format uses little endian - that is just plain wrong in this day and age.
The reason I like using a Str for the flag is that it makes it easy to dump. There is no difference b/w using an Int vs a Str for a flag other than readability. Using two different fields like littleEndian and bigEndian seems a bit un-DRY; I don't like two fields which basically do one thing.
liamstask Fri 18 Dec 2009
Yeah - I couldn't quite figure that out. Maybe it's an attempt at a slight optimization, thinking that most platforms these days are little endian, and not having to actually swap the bytes might save a bit of work? I also noticed Protocol Buffers seem to be little endian. Interesting.
Str works fine for me.
brian Fri 18 Dec 2009
Ticket resolved in 1.0.48
I decided the cleanest solution was to just create a sys::Endian enum. New fields:
InStream.endian
OutStream.endian
Buf.endian
The binary I/O methods will now use the configured endian.
liamstask Mon 21 Dec 2009
Cool - thanks again for this.
One issue I found was that TcpSocket seems to use its own In/OutStreams internally, meaning they typically won't function correctly when they're set to little endian. To test, set a TcpSocket's streams to little endian before connecting and then try to connect - I always get an IoErr.
Unfortunately, this makes it difficult to make use of the nice endianness controls for the user data flowing through the streams. I can work around this a bit in my current use case, but this seems problematic. Any thoughts?
brian Mon 21 Dec 2009
You can't get the IO streams until after you connect. But once you are connected you should be able to configure charset/endian. Here is a quick script that shows endian working:
liamstask Mon 21 Dec 2009
Hm - I'm having no problem reading little endian data from the InStream configured as you've shown above, but I'm seeing some bogus length values on the other end of a connection with a TcpSocket configured with a little endian OutStream. When I first write the data to a little endian Buf and then send it through via writeBuf, it works as expected. Maybe there's something specific to Fantom sockets here?
brian Mon 21 Dec 2009
Pretty much everything including the inet code boils down to a single OutStream and InStream class in the fan.sys package. So unless something is messed up reading individual bytes/chars then everything should work exactly the same b/w IO streams.
liamstask Mon 21 Dec 2009
Sure enough - it was an error on my side. Sorry for the extra noise.