#1074 FFI pod syntax

brian Mon 12 Apr 2010

The last big breaking change we have planned for Fantom is the syntax for pod names.

There is an open ticket #934 with some proposals and a long discussion of using URIs. Original proposal was #720.

Since that time, tactics raised some excellent ideas in #1014, which argues against putting location information into a pod name and for delegating that to the Env instead. I think this principle is exactly what we should strive for.

However, using a simple identifier to name a Fantom pod is not flexible enough. At the very least we need a syntax to identify FFI types. This syntax should clearly delineate the FFI "platform" such as Java or .NET. It is also desirable to have a standard escape mechanism for encoding arbitrary chars which might be used in conjunction with the Env loader.

The current syntax of [ffi]foo.bar was an extremely poor choice, since it is highly ambiguous with the [key:val] map syntax and requires complicated lookahead.

There are a lot of pending tickets which hinge on formalizing our pod name syntax, so I'd like to get this nailed down. Please share any ideas.

tactics Mon 12 Apr 2010

Do the dots raise the difficulty in parsing much?

To throw a couple options out there for the FFI prefix:

java^javax.swing::JFrame
java|javax.swing::JFrame
java~javax.swing::JFrame

By escaping, do you mean being able to include FFI types which include characters which are illegal in Fantom's identifier syntax? (Such as a Java Main$Foo type?)

tcolar Mon 12 Apr 2010

Of those 3 I like java~javax.swing::JFrame the best.

Can't the compiler take care of the escaping? It seems like a pain to have to escape the import.

If necessary, I think the good old backslash makes sense.

brian Mon 12 Apr 2010

Just to set (or reset) the discussion. There are a couple places where a pod name may appear.

In source code pod names may appear in two places:

in a using statement
in a fully qualified type such as pod::Type

In a using statement we have a lot of flexibility. However in order to allow a fully qualified type name where the pod name isn't a simple identifier things are a lot tougher (in fact I don't even allow this today for Java FFI). Ideally the syntax could be expressed as a token which would make things like token::Type easy to parse.

The other place where we end up parsing this stuff is in reflection and fcode. Both the runtime and compiler have to parse qnames of pods, types, and slots. In general the only real issue here is that a pod name can't have ::. Although ideally it would be best to avoid ambiguity with Type.slot.

Because of the qnames in source code, the ideal format would be a "wrapper" more akin to Uri or String literals that would be easy to tokenize and parse. Having a special first char would also make other parsing cases easy.
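To make the reflection-side parsing concrete, here is a minimal Python sketch (not Fantom runtime code; the helper name split_qname is hypothetical). It assumes a pod name never contains :: and that a type name contains no dots, so the first :: ends the pod name and the first dot after it starts the slot name.

```python
def split_qname(qname):
    """Split 'pod::Type' or 'pod::Type.slot' into (pod, type, slot or None).

    Assumption: the pod name never contains '::', so the first '::' is
    always the pod/type boundary, even when the pod name contains dots.
    """
    pod, sep, rest = qname.partition("::")
    if not sep or not pod or not rest:
        raise ValueError(f"not a qualified name: {qname!r}")
    # Everything after the first dot is treated as the slot name.
    type_name, dot, slot = rest.partition(".")
    return pod, type_name, (slot if dot else None)
```

Because the split on :: happens first, dots inside an FFI pod name such as [java]javax.swing do not collide with the Type.slot dot in this sketch.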

tcolar Mon 12 Apr 2010

Can't it just be simplified to say that FFI (non-Fantom) types always have to be imported (i.e. not used as qnames in the code), since import has the as option that can be used to cover the odd "duplicated name" case?

I don't know how often anybody would use a non-imported FFI qname in code, but that seems like a very unusual need (the Fantom source code doesn't have any), not really worth the extra parsing complexity.

brian Mon 12 Apr 2010

Can't it just be simplified to say that FFI (non-Fantom) types always have to be imported (i.e. not used as qnames in the code)

Well that is what I have today and it's always seemed sort of lame to me. But I don't have a strong opinion. If no one has a problem with that restriction it would certainly simplify life tremendously.

tcolar Mon 12 Apr 2010

Well lame is a strong opinion :)

Anyway, let's see if anybody else thinks it's useful. Personally, most of the Fantom code I've done so far has involved Java FFI, and it's been a non-issue.

Maybe it would be more useful for some other future FFI support (and maybe JScript)

katox Tue 13 Apr 2010

Brian mentioned that using statements control the namespace of a compilation unit (a single source file) which are defined (for assembly) in the pod definition. In this regard there is a single pod-wide implicit namespace containing all available types (with possibly conflicting names). There should be no need to specify anything unless one needs to resolve a conflict or shorten a name - using FQN would be enough to identify a type. However this requires FQNs to be valid type identifiers in Fantom code for all FFI types.

Having to use using statements in order to be able to actually write some code with FFI types feels wrong. I'd prefer if Fantom could do the necessary name mangling for me.

I'd also prefer if I could alter the global namespace in the pod definition. Every type conflict would be resolved there (conflicts of types which are actually used of course) and using statements would be totally optional in pod source files.

For instance, I could say that [java]javax.swing::JFrame (or whatever alternate syntax) maps to jswing::JFrame in the pod definition and then use it as such anywhere in the pod. Or I could alter the namespace more at source file level by saying that jswing::JFrame would have also an alias JFrame.

andy Tue 13 Apr 2010

Having to use using statements in order to be able to actually write some code with FFI types feels wrong. I'd prefer if Fantom could do the necessary name mangling for me.

I'm on the fence with requiring using at both pod and source level for Fantom pods - but I don't think that is viable for FFI code. Java's package-mess would make that pretty painful to use. Either way, I think that is a separate discussion, since as you noted it will always be at least optional.

Can't it just be simplified to say that FFI (non-Fantom) types always have to be imported (i.e. not used as qnames in the code), since import has the as option that can be used to cover the odd "duplicated name" case?

I always come back to this as the most sane approach here. Every time we go down another road it gets really ugly really fast. I would also consider (re katox) overloading the as keyword to map the package name to a pseudo-pod name:

using [java] javax.swing as jswing
jswing::JFrame

Though not sure how valuable that is in practice (over normal as for type names).

katox Tue 13 Apr 2010

requiring using at both pod and source level for Fantom pods

Actually both statement blocks could be optional. If you don't wish to alter the default namespace you don't have to. The only difference of altering namespaces would be scoping (for the whole pod or for a single source file).

Basically you can think of it as a big map of namespace names (what types are available is determined by assembly pod dependencies), and using statements would only create aliases, either (a) for pods and for objects, <podId>::<objectId>, or (b) simply as a Str -> Str map.

The bright side of this approach would be the same behaviour and syntax for native and FFI classes (even for introspection stuff). Done properly, it would also allow replacing a native (Java or whatever) implementation with Fantom code later with no changes in the code that is using it.

jodastephen Tue 13 Apr 2010

I would like to see a single syntax usable in all parts of the source file. How about a modification to the qname syntax:

sys::Str   // as is

ffi::java::javax.swing::JFrame

As I see it, the parser should be able to parse this OK based on the single common prefix.

An alternative is:

java::javax.swing.JFrame

Where java would be a real Pod that could be inspected at compile time. The compiler would obtain meta-data that describes that the Pod is actually FFI and what the syntax is for parsing the type name.

KevinKelley Tue 13 Apr 2010

@katox, interesting point: that since a pod that uses Java (for example) has to load the packages, resolve the names to objects, that ought to be a pod-level mechanism, defined in the build somehow, and within the pod, Fantom source code would only refer to the resolved names.

Some kind of random thoughts:

It goes deeper than syntax; we haven't worked out the semantics of modules, yet. Is a name referring to the same type if it's coming back from a browser tomorrow, and meanwhile I've restarted the server? We've got some neat possibilities with the interop between java, fantom, and js, but without a solid understanding of what it means for an application to be spread across machines and time...

I've got lots of questions, I've been all over the map lately thinking about execution models, from on-the-fly recompilation to distributed webapps. Lots of questions, not so much with the answers.

Seems like we need to try hard to keep the Fantom model simple and clean... the more complicated and special-cased the model gets, the harder it is to do things with it. A Fantom strength is its clean yet powerful object model.

Names carry meaning. That's just as true for FFI as for philosophy. When we name a thing that's ours, the name's a shorthand for what we already know: "Bert" instead of "third son of my second wife, who is dirty and twelve". When we name something from outside, we need a way to bring along the implied semantics of the thing we're naming.

The more things you can name, the more complicated your world gets. Interfaces are useful, but they don't encode semantics: an interface says how you can poke it, not what happens when you do. Eclipse, for an example, is built of nothing but pluggable interfaces, with the plug points specified in various xml, and you can download and run a whole bunch of different configurations of the eclipse platform, various combinations of the universe of components for various purposes, and that's great. But, they're still limited by the lack of what OSGI's trying to address: runtime plugging. If you write a Hello World eclipse plugin, to run it you start a separate, fresh instance of Eclipse.

Servlet containers do it with a fixed, static interface and secured, sandboxed classloading; there's a stable, long-running server that can be given a jar and will load, start, and stop it on command.

I'm looking at LLVM lately; thinking about a Fantom-style object model running on JIT'd bitcode, with a C FFI. Then the runtime would manage the loading/unloading of pods; on load you'd build a Pod/Type/Slot map of the pod's namespace, and on demand you'd translate slot fcode to bitcode to JIT, with some monitoring to control whether a method gets interpreted or should be compiled, when to run the optimizer passes and so on. There's apparently an HLVM spinoff project; all I know about that so far is that the idea is to keep a higher-level (thus the name) representation of the AST around, which supposedly greatly increases the possibilities for optimization.

Anyway, back to the point, if there is one.

I think, regarding external names (FFI include syntax) -- that since they have no meaning until what they represent is "imported" (marshal method arguments; coerce primitives, whatever); and, since we may at some point want to import arbitrary sorts of names... there's no benefit to trying to set up a fancy foreign-identifier syntax right now. External names should be imported in one place, and probably from some form of string name to a Fantom alias. Currently that's a using statement; but I don't know if I like it there. It might make more sense to build it into the pod-def like katox said; or maybe we should have a generic Foreign type, which could be constructed from a string and loaded by the runtime...

Adding a using [java]... is an automatic fail for portability, but that's fine: we should be able to do non-portable things when we want to. The mechanism for it ought to be general, though; and I think that means the external names can be pretty much arbitrary.

tcolar Tue 13 Apr 2010

I feel strongly about import statements being in the source file .. not for the compiler, but for the human who reads the code.

It's a huge visual help at what's used / necessary to run that code, so I really don't like the idea to bring it out of it.

Being able to "map" the import outside of the source file might be flexible, but to me that also opens the door to all kinds of confusion (not unlike Java classpath stuff).

As far as having two types with the same name that you have to use in the same source file ... I work with a huge Java code base with multiple duplicated type names, but it's extremely rare that you need to use two of those in the same source file.

And when that happens, using the fully qualified name like my.package.Vector myVector = new my.package.Vector() is ugly. I'd really like to have the as feature in Java and then have MyVector vector = new MyVector()

As far as the import syntax goes, I like what scala did. import java.util.{Collection => JavaCollection, Vector}

katox Tue 13 Apr 2010

It's a huge visual help at what's used / necessary to run that code, so I really don't like the idea to bring it out of it.

Looking at using statements may give you a (vague) idea what is needed, true. But you can't rely on it. There can be FQN stuff, introspection stuff and other generally "source-code-ungrepable" dependencies. If you are thinking in terms of what packages/pods are needed to actually run the code, you have to go to the pod definition anyway.

@Kevin Hotplugging is tough. I find some ideas about that and module dependencies introduced in Newspeak rather interesting...

alex_panchenko Wed 14 Apr 2010

We import names from the language which is different from Fantom. Keywords are different, punctuation is different too. It means that the safest way is to specify imported names as strings without any syntax sugar. Any special syntax to distinguish packages/class import should be delegated to the target language, it shouldn't be part of Fantom.

If import statements are specified using almost arbitrary strings, it looks natural for foreign types to be allowed only in import statements; this approach also simplifies reading the code.

Talking about the proposed syntax I suggest something like this:

using @java "java.util.*"
using @java "java.util.List" as JList

Serialization to the pod files should also use just strings - language and class name, so #965 could be resolved too.

PS. Today we've tried to import java packages containing internal, which is a reserved word in Fantom. That's why I suggest strings. Strings can be used to specify any import without having to patch the compiler.

brian Wed 14 Apr 2010

I think using multiple levels of :: might actually be a really good idea. That actually looks like the cleanest syntax proposal so far.

The idea of managing the namespace at the pod versus the source level is a good discussion. The major problem with moving everything into the pod level (build file) is that it doesn't work for scripts, and I suspect might be a little uglier for IDEs. Also as Andy said, I'm not sure how that would work for Java packages.

But no matter how much indirection you might do at compile time, in the end the real question is what is the pod name for a Java FFI class:

javaType := Type.find("[java]java.lang.Class")
javaType.pod
javaType.name

tactics Wed 14 Apr 2010

It's probably worthwhile to keep pod names unique, even in the (unlikely) presence of an FFI pod with the same name. So in Brian's example:

javaType := Type.find("[java]java.lang.Class")
javaType.pod  // => "[java]java.lang"
javaType.name // => "Class"

I think that the discussion of removing the imports from source files probably deserves its own thread. It's an orthogonal issue (and one that will bring out the strong religious viewpoints of all it concerns ;)

andy Wed 14 Apr 2010

If you remove the ability to specify the FFI name as a valid token (outside of using) seems like we can just use the string value everywhere:

javaType := Type.find("[java]java.lang.Class")
javaType.pod  => "[java]java.lang"
javaType.name => "Class"

EDIT: (tactics beat me :) - but also consider this is not a valid pod name under normal cases. For example, I could not use that value in Pod.find. Is it ok to have these restrictions to keep everything simple? Not sure yet.

KevinKelley Wed 14 Apr 2010

But no matter how much indirection you might do at compile time, in the end the real question is what is the pod name for a Java FFI class:

How about if there's a pod named ffi that reflects the available system classes?

javaFfi := Pod.find("ffi")
javaClass := javaFfi.find("java.lang.Class") 
javaMethod := javaClass.slots.find |slot| {slot.name == "toString"}

That pushes the problem to, how to refer to a java type in fan source? Maybe:

using ffi::"java.lang.Object" as JavaObject

echo(JavaObject().toString)

There are problems no matter which way we do it; the mapping isn't direct between the runtime and Fantom; and the different runtimes will cause different problems.

tactics Wed 14 Apr 2010

How about if there's a pod named ffi that reflects the available system classes?

The tradeoff with this is you have to cherry pick every FFI class you want to use.

using ffi::"javax.swing.JFrame" as JFrame
using ffi::"javax.swing.BoxLayout" as BoxLayout
using ffi::"javax.swing.JDialog" as JDialog
using ffi::"javax.swing.JButton" as JButton
using ffi::"javax.swing.JLabel" as JLabel
// ... and many more ...

Versus:

using ffi::javax.swing // imports everything you might need in one line

qualidafial Wed 14 Apr 2010

Why did we abandon uri schemes for identifying ffi namespaces? I can't remember what the argument against that was.

using `java:javax.swing.JFrame` as JFrame

That's just beautiful to look at, and with some work it could even be enhanced with new schemes via compiler plugins.

brian Wed 14 Apr 2010

Why did we abandon uri schemes for identifying ffi namespaces?

As far as I'm concerned, I think using a simple Uri literal instead of a string identifier is still the best solution. Although the Uri is for the Pod, not the type qname:

it works in using pod stmt:
```
using `java:javax.swing`
```
it works in using pod::type stmt:
```
using `java:javax.swing`::JFrame
```
it works cleanly in grammar anytime a FQN is required:
```
`java:javax.swing`::JFrame
```

The issue with URIs is how reflection and stuff works:

Type.find("`java:javax.swing`::JFrame")  // this
Type.find("java:javax.swing::JFrame")    // or this?

Do the ticks show up in the reflection strings? That is where the URI model gets a little ugly.

qualidafial Thu 15 Apr 2010

I would say use Uri for Type.find too. Is the :: delimiter somehow incompatible with proper URI syntax?

brian Thu 15 Apr 2010

The problem with using Uris for everything was the rabbit hole we went down originally. Using :: in URIs that also have a scheme: is technically ok, but really sort of shady. More importantly something simple like sys::Int.toStr isn't a good URI, since virtually everyone would parse it into a sys: scheme.
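The scheme problem is easy to reproduce with a generic URI parser. The sketch below uses Python's standard urllib.parse.urlsplit purely as an illustration; it is not Fantom's Uri class, and scheme_and_rest is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def scheme_and_rest(text):
    """Return (scheme, path) as a generic URI parser sees the text."""
    parts = urlsplit(text)
    return parts.scheme, parts.path

# A plain Fantom qname is read as a URI whose scheme is "sys".
print(scheme_and_rest("sys::Int.toStr"))
# An FFI qname written with a scheme keeps "::" inside the path.
print(scheme_and_rest("java:javax.swing::JFrame"))
```

In both cases the text before the first colon is taken as the scheme, so a generic parser cannot tell a pod named sys from a URI scheme named sys.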

andy Thu 15 Apr 2010

As Brian noted, Uri has its own set of issues, and we have such a dead-simple namespace for types using strings, I lean heavily towards mapping scripts and FFI into that (simple string names), vs complicating everything else.

KevinKelley Thu 15 Apr 2010

Imported names don't really fit the URI abstraction; the pieces don't fit, the allowed characters don't match, the extraction methods aren't useful. The only thing that made it convenient was being able to wrap an imported name in an unambiguous quoting form.

Another thing that bothers me is, limited or partial interop. I agree that the common cases need to be easy and convenient, but we need to also make sure that there's a way to do the unusual stuff, too. Using an inner class, for instance, or overloaded methods, to name a couple.

We can do the easy and obvious stuff, but I'm worried about the more unusual stuff that nobody's needed to try yet...

brian Thu 6 May 2010

I'm back from a month of travelling, and my top priority is wrapping up the planned breaking changes for Fantom. Besides the storage operator (from * to &), this issue is our biggest breaking change. Andy and I brainstormed on this issue for several hours this morning, and here is the plan...

Major issues:

we need a flexible syntax for pod names because as the top of the namespace it needs to encompass Fan pods, FFI Java packages, FFI .NET namespaces, files, etc.
using a leading [ for FFI is bad because it is difficult to parse type signatures since map types also begin with [
we need a syntax which clearly scopes a namespace such as Java FFI, .NET FFI, or maybe file system
we need to define the legal identifiers for those pod names which can be used for type signatures and qnames

Design proposal:

leave pod names as simple identifiers
leave all APIs as strictly Str based (no public API changes)
change FFI syntax from [java]javax.swing to java:javax.swing

formally define the grammar of a pod name as the following:

<podName> := <identifier> | <ffi>
<ffi>     := <identifier> ":" <ffiName>
<ffiName> := any string of Unicode chars except "::"
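As a rough illustration of how small a parser for this grammar could be, here is a Python sketch. It is not the compiler's implementation: parse_pod_name is a hypothetical name, identifiers are assumed to be ASCII letters, digits and underscores, and the FFI name is assumed to be non-empty (the grammar above does not say either way).

```python
import re

# Assumption: an identifier is an ASCII letter or underscore followed by
# letters, digits, or underscores. Fantom's real rules may differ.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def parse_pod_name(name):
    """Return (platform, ffi_name) for an FFI pod, or (None, name) for a plain pod."""
    if "::" in name:
        raise ValueError("a pod name may not contain '::'")
    if _IDENT.fullmatch(name):
        return None, name
    # <ffi> := <identifier> ":" <ffiName>
    platform, sep, ffi_name = name.partition(":")
    if sep and _IDENT.fullmatch(platform) and ffi_name:
        return platform, ffi_name
    raise ValueError(f"not a valid pod name: {name!r}")
```

With these assumptions, parse_pod_name("java:javax.swing") yields the platform java and the FFI name javax.swing, while any name containing :: is rejected before the split.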

tactics Thu 6 May 2010

I like it.

tcolar Thu 6 May 2010

I guess I like it, except that in my opinion : and :: are possibly even more difficult for the parser to deal with than [, mostly because of simple maps such as Str::Int and the use of : in ternary expressions.

So if you want to allow qualified syntax anywhere, not just in using statements, it might get nasty.

Example java:String? kinda looks like a ternary and fantom:Str::Int might be a bit tricky too.

I mean it can be dealt with but not sure it's any better to have : rather than [ since both are used in maps.

brian Thu 6 May 2010

So if you want to allow qualified syntax anywhere, not just in using statements, it might get nasty.

Sorry forgot to mention that. My assumption is that fully qualified types only work with simple identifiers. Otherwise you have to use using as. I think everyone pretty much agreed that was an acceptable compromise to keep the language grammar simple. So the only place this comes into play is in the reflection APIs and in fcode.

For using statements I would suggest we do something similar to today for common cases with an easy escape mechanism:

<podSpec>  :=  <id> | <ffi> | <string literal>
<ffi>      :=  <id> ":" (<id> | ".")*

So these would be valid:

using java:javax.swing
using "java:javax.swing"
using "foo:some crazy pod#@!~&* name"
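The three forms above can be recognized with a single regular expression. This Python sketch is only an approximation of the proposed grammar: pod_from_using is a hypothetical helper, identifiers are assumed to be ASCII, string escapes are ignored, and only the pod part of a using statement is handled (no ::Type and no as clause).

```python
import re

# Assumption: ASCII-only identifiers; Fantom's real rules may differ.
_ID = r"[A-Za-z_][A-Za-z0-9_]*"

# <podSpec> := <id> | <ffi> | <string literal>
# <ffi>     := <id> ":" (<id> | ".")*
_POD_SPEC = re.compile(
    rf'"(?P<quoted>[^"]*)"|(?P<bare>{_ID}(?::(?:{_ID}|\.)*)?)'
)

def pod_from_using(line):
    """Extract the pod name from a 'using <podSpec>' line."""
    keyword, _, spec = line.strip().partition(" ")
    if keyword != "using":
        raise ValueError(f"not a using statement: {line!r}")
    match = _POD_SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"cannot parse pod spec: {spec!r}")
    quoted = match.group("quoted")
    # The quoted form is the escape hatch: its contents are taken verbatim.
    return quoted if quoted is not None else match.group("bare")
```

The bare and quoted spellings of java:javax.swing produce the same pod name here, which is the property the string-literal escape is meant to provide.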

brian Fri 7 May 2010

Any last comments....

I'm starting on this change right now

brian Fri 7 May 2010

Actually now that I'm in the code I'm wondering if we should just leave the syntax alone. There are two issues that make using [java] nice:

I use [java]::int to denote primitives, and don't really like java:::int
It is sort of nice to just check that the pod name begins with [ and know it is an FFI (although there is still the tricky issue with parsing type signatures, since [ could be a map or an FFI pod, but that is solvable)

BTW, here is original discussion about the FFI syntax.

alex_panchenko Thu 13 May 2010

The actual syntax issue is the use of keywords in package names.

The one we experienced here is internal, as I mentioned before.

So an escaping syntax should be introduced, something like:

using [java] "package.name.with.internal"

brian Thu 13 May 2010

Right we still need the escape mechanism to deal with any arbitrary pod name such as a Java package which uses Fantom keywords. I changed the grammar slightly to allow a string literal to be used as the pod name:

using "[java]package.name.with.internal"

Note the "[java]" part is actually part of the pod name and must be included inside the string literal and there can be no spaces in the pod.

alex_panchenko Sat 15 May 2010

If we keep [java] outside of the double quotes then the using syntax can be treated as independent of the internal ffi pod name representation, so, potentially, internal representation can be changed without any impact on the syntax.

brian Sun 16 May 2010

If we keep [java] outside of the double quotes then the using syntax can be treated as independent of the internal ffi pod name representation, so, potentially, internal representation can be changed without any impact on the syntax.

Not sure I follow that. In the end we are just defining a grammar for tokens which eventually become a string literal for the pod name. For example these are the same:

using [java] foo . bar
using "[java]foo.bar"

I was just trying to come up with something simple for the grammar that had the flexibility to allow new pod names without any restrictions.
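A small Python sketch of that equivalence, under the assumption that the token form is turned into a pod name simply by dropping the whitespace between tokens (normalize_using is a hypothetical helper, not compiler code):

```python
def normalize_using(line):
    """Reduce a 'using ...' line to the pod-name string it denotes."""
    keyword, _, spec = line.strip().partition(" ")
    if keyword != "using":
        raise ValueError(f"not a using statement: {line!r}")
    spec = spec.strip()
    # String-literal form: the pod name is the literal's contents, verbatim.
    if len(spec) >= 2 and spec.startswith('"') and spec.endswith('"'):
        return spec[1:-1]
    # Token form: whitespace between tokens is not part of the pod name.
    return "".join(spec.split())

print(normalize_using("using [java] foo . bar"))
print(normalize_using('using "[java]foo.bar"'))
```

Both calls produce the same string, which matches the claim that the two spellings name the same pod.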