Akcelisto Wed 24 Nov 2010
using util

class TestCsv : Test
{
  Void test()
  {
    verifyEq(CsvInStream("Русское слово".in).readAllRows[0][0], "Русское слово")
  }
}

TEST FAILED
sys::TestErr: Test failed: "CAA:>5 A;>2>" [sys::Str] != "Русское слово" [sys::Str]
katox Wed 24 Nov 2010
What kind of system do you use? Are you sure the encoding is UTF-8?
Akcelisto Wed 24 Nov 2010
XP, encoding of TestCsv.fan is UTF-8.
ivan Wed 24 Nov 2010
yeah, I can see this issue too, looks like something wrong with CsvInStream:
fansh> using util
Add using: using util
fansh> str := "привет"
привет
fansh> str.in.readLine
привет
fansh> CsvInStream(str.in).readAllRows[0][0]
?@825B
fansh>
Tomorrow I'll try to debug it thoroughly
ivan Thu 25 Nov 2010
Huh, the problem is in Str.in, because using CsvInStream on top of File.in produces the correct result. The real issue (and AFAIR I've already seen this either on the forum or somewhere in the docs) is that when StrInStream (the Java implementation of InStream created by Str.in) reads a single byte, it consumes a whole char from the Str (which takes 2 bytes in your case) and returns it as a single byte (by masking it with 0xFF). So when CsvInStream reads bytes from the underlying stream, it can't get the whole picture and therefore produces wrong results.
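To make the masking concrete, here is a small sketch that applies the same `& 0xFF` arithmetic to each char of a Str. It is only a simulation of the effect described above, not the actual StrInStream code; the class `MaskDemo` and the helper `maskLow8` are hypothetical names introduced for illustration.

```fantom
class MaskDemo
{
  ** Hypothetical helper: keep only the low 8 bits of every char,
  ** mimicking the masking described in the post (not the real StrInStream code).
  static Str maskLow8(Str s)
  {
    buf := StrBuf()
    s.each |Int c| { buf.addChar(c.and(0xff)) }
    return buf.toStr
  }

  static Void main()
  {
    // 'п' is U+043F, and 0x043F & 0xFF == 0x3F, which is '?'
    echo(maskLow8("привет"))   // prints ?@825B, the same garbage as in the fansh session
  }
}
```

The same arithmetic turns "Русское слово" into " CAA:>5 A;>2>" (with a leading space, since 'Р' is U+0420), which lines up with the failed test output once the leading space is trimmed.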
ivan Thu 25 Nov 2010
There's a fairly easy workaround if you want to use CsvInStream over a Str - use str.toBuf.in instead of str.in:
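A minimal sketch of that workaround, assuming the default UTF-8 charset of `Str.toBuf`. The class `CsvFromStr` and the helper `firstCell` are hypothetical names used only for this example.

```fantom
using util

class CsvFromStr
{
  ** Hypothetical helper: return the first cell of CSV text held in a Str.
  static Str firstCell(Str csv)
  {
    // toBuf encodes the Str into bytes (UTF-8 by default), so the
    // stream handed to CsvInStream serves real bytes, not masked chars
    return CsvInStream(csv.toBuf.in).readAllRows[0][0]
  }

  static Void main()
  {
    echo(firstCell("Русское слово"))
  }
}
```

The only change from the failing test is `.toBuf.in` in place of `.in`.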
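Akcelisto Thu 25 Nov 2010
Thanks. You helped me.
brian Fri 26 Nov 2010
Promoted to ticket #1328 and assigned to brian
I thought this was all pretty well covered in the test suite, but I guess not. It looks like the real problem is that Str.in is not supporting Unicode correctly, which is strange because I am using Unicode strings all over the place in the SkySpark test suite.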
brian Fri 26 Nov 2010
Renamed from **[bug] CsvInStream dont read properly from Str with russian letters** to **Str.in not working properly with Unicode**
katox Fri 26 Nov 2010
As @ivan noted on IRC, the problem actually lies in Str.in's inability to supply bytes correctly. The readChar function is OK, but if reading goes through a decoder then the read method is used, and for Str.in this one truncates the values in the stream using an & 0xff bitmask.
brian Mon 3 Jan 2011
Ticket resolved in 1.0.57
I made two changes:
1. fixed Str.in to work correctly when wrapped by another InStream
2. changed Str.in to disallow binary reads
The second change is a breaking change, but I think it is much safer behavior, and it is consistent with how StrBuf works when attempting binary writes.
If you are using Str.in to read binary data, then the fix is to convert into a binary buffer first using the UTF-8 encoding:
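The original snippet for this step did not survive, so the following is only a sketch of what such a conversion can look like, passing `Charset.utf8` explicitly to `Str.toBuf`. The class `BinaryReadDemo` and the helper `firstBytes` are hypothetical names.

```fantom
class BinaryReadDemo
{
  ** Hypothetical helper: return the first n UTF-8 bytes of a Str.
  static Int[] firstBytes(Str s, Int n)
  {
    // Encode the Str into a binary buffer first, then do binary reads on it
    in := s.toBuf(Charset.utf8).in
    bytes := Int[,]
    n.times |Int i| { bytes.add(in.read) }
    return bytes
  }

  static Void main()
  {
    // 'п' (U+043F) is encoded in UTF-8 as the two bytes 0xD0 0xBF
    echo(firstBytes("привет", 2))
  }
}
```

Binary reads then see the actual encoded bytes instead of chars cut down to 8 bits.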