#2156 Invalid UTF-8 encoding

Yuri Strot Wed 19 Jun 2013

I have found an interesting character that Fantom streams can't parse.

There is a simple Java program:

import java.io.IOException;
import java.util.Arrays;

public class JP {

	private static final String MAGIC = "<magic>";

	public static void main(String[] args) throws IOException {
		System.out.println(MAGIC);
		for (int i = 0; i < MAGIC.length(); i++) {
			System.out.println(i + ": " + MAGIC.codePointAt(i));
		}
		System.out.println(Arrays.toString(MAGIC.getBytes()));
		System.out.println(Arrays.toString(MAGIC.getBytes("UTF-8")));
	}

}

It works correctly and prints this:

<magic>
0: 127744
1: 57088
[-16, -97, -116, -128]
[-16, -97, -116, -128]

Now if I try to use this character in Fantom, I get this:

bld:~ ystrot$ fansh
Fantom Shell v1.0.64 ('?' for help)
fansh> echo("<magic>")
sys::IOErr: Invalid UTF-8 encoding
  fan.sys.Charset$Utf8Decoder.decode (Charset.java:142)
  fan.sys.InStream.rChar (InStream.java:78)
  fan.sys.InStream.readLine (InStream.java:436)
  fan.sys.InStream.readLine (InStream.java:399)
  fansh::Shell.run (Shell.fan:33)
  fansh::Main.main (Shell.fan:225)
  java.lang.reflect.Method.invoke (Method.java:601)
  fan.sys.Method.invoke (Method.java:559)
  fan.sys.Method$MethodFunc.callList (Method.java:198)
  fan.sys.Method.callList (Method.java:138)
  fanx.tools.Fan.callMain (Fan.java:173)
  fanx.tools.Fan.executeType (Fan.java:140)
  fanx.tools.Fan.execute (Fan.java:41)
  fanx.tools.Fan.run (Fan.java:298)
  fanx.tools.Fan.main (Fan.java:336)

The same problem occurs with this symbol in a file.

P.S. Obviously I couldn't use this character in the post, so I replaced it with a <magic> :-)

KevinKelley Wed 19 Jun 2013

That 4-byte sequence [-16, -97, -116, -128] is 0xF09F8C80, which (going by the UTF-8 Wikipedia article) carries 21 bits of data, i.e. code point U+1F300 (127744); UniSearcher calls it "cyclone" in Miscellaneous Symbols and Pictographs, so it's outside of the basic plane.
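
The byte arithmetic can be double-checked with a few lines of Java. This is only a minimal sketch: the class name CycloneBytes and its helper methods are made up for illustration, and it relies on the JDK's own UTF-8 decoder rather than anything from Fantom.

```java
import java.nio.charset.StandardCharsets;

class CycloneBytes {
    // The four signed bytes printed by the Java program in the original post.
    static final byte[] BYTES = { -16, -97, -116, -128 };

    // Render signed bytes as unsigned, space-separated hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Let the JDK decode the bytes as UTF-8 and return the first code point.
    static int decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(toHex(BYTES));                          // F0 9F 8C 80
        System.out.println(String.format("U+%X", decode(BYTES)));  // U+1F300
    }
}
```

The lead byte F0 announces a 4-byte sequence, and the three continuation bytes each contribute 6 bits, which is where the 21 bits come from (3 + 6 + 6 + 6).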

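I guess that's why Java's showing 2 values for it: a character outside the basic plane is stored as a UTF-16 surrogate pair (2 chars), so codePointAt(0) returns the full code point 127744, while codePointAt(1) returns just the low surrogate 57088.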
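
To make the chars-versus-code-points distinction concrete, here is a small Java sketch. The class SurrogateDemo and its methods are hypothetical names for illustration; the string is built from the code point 127744 (0x1F300) reported by the program above, so no literal character is needed.

```java
class SurrogateDemo {
    // Count code points rather than UTF-16 chars.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Collect code points, advancing by Character.charCount so that a
    // surrogate pair is visited once instead of twice.
    static int[] codePoints(String s) {
        int[] out = new int[codePointCount(s)];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out[n++] = cp;
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        String magic = new String(Character.toChars(0x1F300));
        System.out.println(magic.length());          // 2 (UTF-16 chars)
        System.out.println(codePointCount(magic));   // 1 (code point)
        System.out.println((int) magic.charAt(0));   // 55356 (high surrogate, 0xD83C)
        System.out.println((int) magic.charAt(1));   // 57088 (low surrogate, 0xDF00)
        System.out.println(codePoints(magic)[0]);    // 127744
    }
}
```

The loop in the original program steps one char at a time, which is why it lands on the low surrogate at index 1 and prints it as if it were a second code point.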

From a quick look in the Fantom source, the readUtf method in src/sys/java/fan/sys/InStream.java appears to recognize only encodings of up to 3 bytes, and reports that error for anything else.
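
For comparison, a decoder that also accepts 4-byte sequences could look roughly like the following. This is not Fantom's code: Utf8Sketch and readCodePoint are invented names, and the sketch is deliberately minimal (it checks continuation bytes but does not reject overlong encodings, encoded surrogates, or values above U+10FFFF).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class Utf8Sketch {
    // Read one code point from the stream, or return -1 at end of stream.
    // Hypothetical helper, not part of Fantom or the JDK.
    static int readCodePoint(InputStream in) throws IOException {
        int b0 = in.read();
        if (b0 < 0) return -1;

        int extra; // number of continuation bytes expected
        int cp;    // payload bits taken from the lead byte
        if ((b0 & 0x80) == 0x00)      { return b0; }                  // 0xxxxxxx
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }  // 110xxxxx
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }  // 1110xxxx
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }  // 11110xxx
        else throw new IOException("Invalid UTF-8 encoding");

        for (int i = 0; i < extra; i++) {
            int b = in.read();
            // Every continuation byte must look like 10xxxxxx.
            if (b < 0 || (b & 0xC0) != 0x80) {
                throw new IOException("Invalid UTF-8 encoding");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        return cp;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = { -16, -97, -116, -128 };
        int cp = readCodePoint(new ByteArrayInputStream(bytes));
        System.out.println(cp);                            // 127744
        System.out.println(Character.toChars(cp).length);  // 2
    }
}
```

A decoder that hands results back as 16-bit chars would additionally have to split such a code point into a surrogate pair, for example with Character.toChars as in the main method.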
