Yuri Strot Wed 19 Jun 2013
I have found an interesting character that can't be parsed by Fantom streams.
Here is a simple Java program:
import java.io.IOException;
import java.util.Arrays;

public class JP {
    private static final String MAGIC = "<magic>";

    public static void main(String[] args) throws IOException {
        System.out.println(MAGIC);
        // i indexes UTF-16 chars, so a character outside the basic plane is visited twice
        for (int i = 0; i < MAGIC.length(); i++) {
            System.out.println(i + ": " + MAGIC.codePointAt(i));
        }
        // getBytes() without arguments uses the platform default charset
        System.out.println(Arrays.toString(MAGIC.getBytes()));
        System.out.println(Arrays.toString(MAGIC.getBytes("UTF-8")));
    }
}

Which works correctly and prints this:
Now if I try to use this character in Fantom, I get this:
bld:~ ystrot$ fansh
Fantom Shell v1.0.64 ('?' for help)
fansh> echo("<magic>")
sys::IOErr: Invalid UTF-8 encoding
  fan.sys.Charset$Utf8Decoder.decode (Charset.java:142)
  fan.sys.InStream.rChar (InStream.java:78)
  fan.sys.InStream.readLine (InStream.java:436)
  fan.sys.InStream.readLine (InStream.java:399)
  fansh::Shell.run (Shell.fan:33)
  fansh::Main.main (Shell.fan:225)
  java.lang.reflect.Method.invoke (Method.java:601)
  fan.sys.Method.invoke (Method.java:559)
  fan.sys.Method$MethodFunc.callList (Method.java:198)
  fan.sys.Method.callList (Method.java:138)
  fanx.tools.Fan.callMain (Fan.java:173)
  fanx.tools.Fan.executeType (Fan.java:140)
  fanx.tools.Fan.execute (Fan.java:41)
  fanx.tools.Fan.run (Fan.java:298)
  fanx.tools.Fan.main (Fan.java:336)

The same problem with this symbol in a file.
P.S. Obviously I couldn't use this character in the post, so I replaced it with a <magic> :-)
KevinKelley Wed 19 Jun 2013
That 4-byte sequence [-16, -97, -116, -128] is 0xF09F8C80, which looks like (from the UTF-8 Wikipedia article) 21 bits of data; UniSearcher calls it "cyclone" in Miscellaneous Symbols and Pictographs; I guess it's outside of the basic plane anyway.
I guess that's why Java's treating it as 2 chars (a surrogate pair).
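For what it's worth, the byte arithmetic can be checked with a few lines of Java. This is only a minimal sketch: the class and method names (CycloneBytes, decodeFourByte) are made up for illustration, and it assumes the four bytes are a well-formed 4-byte UTF-8 sequence rather than validating them.

```java
final class CycloneBytes {
    // Decodes one 4-byte UTF-8 sequence: lead byte 11110xxx plus three 10xxxxxx bytes.
    // No validation is done here; the caller is assumed to pass a well-formed sequence.
    static int decodeFourByte(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0x07) << 18)
             | ((b1 & 0x3F) << 12)
             | ((b2 & 0x3F) << 6)
             |  (b3 & 0x3F);
    }

    public static void main(String[] args) {
        byte[] bytes = {-16, -97, -116, -128};

        // Java bytes are signed; mask with 0xFF to see the unsigned hex values.
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());            // F0 9F 8C 80

        int cp = decodeFourByte(bytes[0], bytes[1], bytes[2], bytes[3]);
        System.out.println(String.format("U+%X", cp));        // U+1F300

        // The code point is above U+FFFF, so UTF-16 stores it as a surrogate pair.
        char[] units = Character.toChars(cp);
        String s = new String(units);
        System.out.println(s.length());                       // 2 chars
        System.out.println(s.codePointCount(0, s.length()));  // 1 code point
        System.out.println(String.format("%04X %04X", (int) units[0], (int) units[1])); // D83C DF00
    }
}
```

So the string has a length of 2 but holds a single code point, which would explain the two lines printed by the loop in the Java program.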
Quick look in Fantom source, src/sys/java/fan/sys/InStream.java at the readUtf method, appears only to recognize up to 3-byte encodings, and reports that error for anything else.
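To illustrate what recognizing the 4-byte form would involve, here is a rough sketch of a decoder with a branch for the 11110xxx lead byte. This is not Fantom's actual code; Utf8Sketch and readCodePoint are hypothetical names, and the sketch skips checks a real decoder would want (overlong forms, surrogate values, code points above U+10FFFF).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class Utf8Sketch {
    // Reads one code point from the stream, or returns -1 at end of stream.
    // Hypothetical helper: it does not reject overlong or out-of-range values.
    static int readCodePoint(InputStream in) throws IOException {
        int b0 = in.read();
        if (b0 < 0) return -1;
        if (b0 < 0x80) return b0;                                  // 0xxxxxxx: 1 byte

        int extra; // number of continuation bytes still to read
        int cp;    // payload bits collected so far
        if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; } // 110xxxxx: 2 bytes
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; } // 1110xxxx: 3 bytes
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; } // 11110xxx: 4 bytes
        else throw new IOException("Invalid UTF-8 encoding");

        for (int i = 0; i < extra; i++) {
            int b = in.read();
            // Every continuation byte must look like 10xxxxxx.
            if (b < 0 || (b & 0xC0) != 0x80) {
                throw new IOException("Invalid UTF-8 encoding");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        return cp;
    }

    public static void main(String[] args) throws IOException {
        // 'A' followed by the 4-byte sequence from this thread.
        byte[] data = {0x41, -16, -97, -116, -128};
        InputStream in = new ByteArrayInputStream(data);
        int cp;
        while ((cp = readCodePoint(in)) >= 0) {
            // A caller that works in 16-bit chars would still need
            // Character.toChars(cp) to get the surrogate pair.
            System.out.println(String.format("U+%04X", cp));
        }
    }
}
```

Run against those five bytes, this prints U+0041 and then U+1F300.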