#438 DSL Proposal

brian Tue 20 Jan 2009

This is a proposal based on the discussion which resulted in a desire to embed arbitrary DSLs within Fan code.

DSL Blocks

DSLs are embedded using the syntax AnchorType <|...|>. We use the tokens <| and |> in a manner similar to the CDATA section of XML. These tokens unambiguously represent a block of code which may contain any character with the exception of |>. This rule allows tools such as an IDE to skip these sections safely.
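As a rough illustration of why the |> rule makes life easy for tools, here is a minimal sketch in Python (not Fan, and not part of the proposal) of how an IDE might locate DSL blocks without understanding their contents. The function name find_dsl_blocks is hypothetical, and the sketch assumes <| never appears inside a Fan string literal or comment.

```python
def find_dsl_blocks(source):
    """Return (start, end, body) for every <|...|> block in source.

    start is the offset of '<|' and end is the offset just past '|>'.
    Simplifying assumption: '<|' never occurs inside a string literal
    or comment of the host code.
    """
    blocks = []
    pos = 0
    while True:
        start = source.find("<|", pos)
        if start == -1:
            break
        # The body may contain anything except '|>', so the first '|>'
        # after the opener always closes the block.
        close = source.find("|>", start + 2)
        if close == -1:
            raise ValueError("unterminated DSL block at offset %d" % start)
        blocks.append((start, close + 2, source[start + 2:close]))
        pos = close + 2
    return blocks
```

Because the body cannot contain |>, a single forward search is enough. No nesting or escape handling is needed.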

DSL blocks are treated as an expression with regard to the existing Fan grammar. This means that a DSL block can be used anywhere an expression is expected. The type of the expression is determined at compile time by the DSL plugin (discussed later).

Anchor Types

Every DSL block must somehow be identified such that we know how to interpret the text between the <|...|>. There are several alternatives, but the one which seems simplest to understand is that the identifier is a type name, which we will call the anchor type.

Using a type name to identify a DSL block has several advantages:

  1. provides a clean, existing mechanism to import dependencies on the type's pod
  2. provides a clean, existing set of mechanisms to import into a source file's namespace via the using statement
  3. resolves type name conflicts via the using as statement
  4. provides an obvious mechanism for both humans and tools to map back to documentation for understanding the DSL

DSL Plugins

In order to compile the DSL, we map the anchor type to the type name of a compiler::DslPlugin subclass. This mapping is done via a facet on the plugin type:

@dslPluginOn=Regex#
class RegexDslPlugin : DslPlugin { ... }

@dslPluginOn=ruby::Ruby#
class RubyDslPlugin : DslPlugin { ... }

The type database is used to lazily resolve the plugin for a given anchor type. If no plugin type, or more than one plugin type, is installed for the anchor type, it is a compiler error (similar to how the FFI bridge works).
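The lookup rule can be sketched as follows. This is a toy model in Python, not the real type database: the registry is just a list of (plugin type, anchor type) pairs standing in for the dslPluginOn facet index, and the names resolve_plugin and DslResolveError are made up for illustration.

```python
class DslResolveError(Exception):
    """Stands in for the compiler error described above (hypothetical name)."""


def resolve_plugin(registry, anchor_type):
    """Return the single plugin registered for anchor_type.

    registry is a list of (plugin_type, anchor_type) pairs, a stand-in
    for whatever the type database indexes from the dslPluginOn facet.
    """
    matches = [plugin for plugin, anchor in registry if anchor == anchor_type]
    if not matches:
        raise DslResolveError("no DSL plugin installed for %s" % anchor_type)
    if len(matches) > 1:
        raise DslResolveError(
            "multiple DSL plugins installed for %s: %s"
            % (anchor_type, ", ".join(matches))
        )
    return matches[0]
```

With the two plugins declared above, resolve_plugin(registry, "Regex") would return "RegexDslPlugin", while an anchor with no plugin or with two competing plugins raises the error.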

By convention we name the plugin {Anchor}DslPlugin. The plugin type is responsible for compiling the DSL into a Fan expression.

It is important to note that by design we decouple:

  1. anchor type
  2. plugin type
  3. expression type of the DSL block itself

In many cases such as Regex, the anchor type and the resulting type of the DSL expression are the same:

Regex <|ab*|>   =>  results expr typed as Regex

But there may be cases where the DSL resolves to an expression different from the anchor type:

x := Java <| Runtime.getRuntime() |>  => expr typed as Runtime
obj := Ruby <| File.new("file.txt", "r") |>  => some JRuby interface?

Note the DSL block might evaluate to Void, in which case it could never be used on the RHS of an assignment or passed as an argument.

So the anchor type is really just an identifier to tell humans and tools how to interpret the DSL block. The compiler itself uses the anchor type to find the appropriate compiler plugin to use.

The anchor type and plugin type may exist in the same pod; however, a DSL block may never be used in the same pod that declares its DSL plugin.

Compiler Pipeline

I'm still not quite sure what the DslPlugin API looks like yet. I think at the basic level it will input the plain text between the <| |> and will output a compiler::Expr. The plugin may also generate other AST nodes such as helper types and slots.

Because the DslPlugin must resolve the DSL block into a typed expression, it will probably have to follow the normal compiler pipeline. This means that it probably won't have full access to the resolved AST, but will have access to its enclosing lexical scope, similar to other expressions in the ResolveExpr step.

This one I'll just have to dig into and see how it turns out.

andy Tue 20 Jan 2009

All sounds good to me.

JohnDG Wed 21 Jan 2009

So the anchor type is really just an identifier to tell humans and tools how to interpret the DSL block. The compiler itself uses the anchor type to find the appropriate compiler plugin to use.

This does imply only one plug-in will be able to respond to the DSL identifier. I think that's probably an OK limitation -- although an approach involving using might yield more flexibility, it's not clear that's needed, and it would be more work using the DSL in each class in which it was referred to.

The plugin may also generate other AST nodes such as helper types and slots.

This is nice. It means, for example, that a Parser plug-in could generate objects for the AST, look up tables, and lots of other stuff -- which is not possible if the only modification permitted by the plug-in is to return a single Expr/source snippet.

This means that it probably won't have full access to the resolved AST,

I know this is much simpler to implement, but it makes writing plug-ins much more work. I would prefer a two-step pass, where in the first step, an Obj hole is substituted for the DSL block, and everything is typed and resolved. Then the plug-ins are invoked. In the second step, the Obj holes are stuffed with the return value of the plug-in, and everything is again resolved and typed (including any new stuff inserted by the plug-ins).
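To make the two-step idea concrete, here is a toy model in Python. It is only a sketch under heavy assumptions: a "scope" is a dict from names to type names, a plugin is a function from (DSL text, resolved scope) to a result type, and the names two_pass and the node tuples are invented for illustration. The real compiler AST would be far richer.

```python
def two_pass(nodes, plugins):
    """Resolve types in two steps, with Obj holes for DSL blocks.

    nodes is a list of either ("local", name, type_name) or
    ("dsl", name, anchor, text). plugins maps an anchor name to a
    function (text, scope) -> type_name.
    """
    # Step 1: type every DSL block as an Obj hole so that everything
    # else in the scope can be resolved without running any plugin.
    scope = {}
    for node in nodes:
        if node[0] == "dsl":
            scope[node[1]] = "Obj"
        else:
            scope[node[1]] = node[2]

    # Plugins run against a snapshot of the fully resolved scope.
    snapshot = dict(scope)
    results = {}
    for node in nodes:
        if node[0] == "dsl":
            _, name, anchor, text = node
            results[name] = plugins[anchor](text, snapshot)

    # Step 2: fill the holes with what the plugins returned.
    scope.update(results)
    return scope
```

The point of the snapshot is that a plugin can see declarations that appear after its own DSL block, which a single pass in declaration order would not give it.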

jodastephen Wed 21 Jan 2009

This is already sounding like a killer feature ;-)

This means that it probably won't have full access to the resolved AST

I think that working through the feature will figure out what is possible.

As another requirement, I'd like to be able to evaluate and call a Fan snippet (expression) from within the DSL snippet.

alexlamsl Wed 21 Jan 2009

One question - how do we pass objects into these <|...|> blocks?

brian Thu 22 Jan 2009

would prefer a two-step pass, where in the first step, an Obj hole is substituted for the DSL block, and everything is typed and resolved.

I don't think it would ever really matter. I'm thinking DSL resolution happens during the ResolveExpr step. At that point you would have your enclosing type's slots and lexical scope fully resolved. The only thing you wouldn't have resolved is parts of the AST outside your lexical space - and I can't imagine you'd ever need that.

One question - how do we pass objects into these <|...|> blocks?

DSLs have access to their lexical enclosures, so they could work just like closures to close over local variables or slots on the enclosing type:

name := "Brian"
Sql <|select * from Users where name=${name}|>.execute.each |Row r| {...}

As another requirement, I'd like to be able to evaluate and call a Fan snippet (expression) from within the DSL snippet.

Exactly. As the example above suggests, you would often want to pass something like the expression inside the ${...} back to Fan to parse and resolve.
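One way a plugin might find those embedded expressions is sketched below in Python. This is an assumption about how a plugin could work, not a description of any actual API: the function name split_interpolation is hypothetical, and the sketch does not handle braces nested inside the ${...} expression.

```python
def split_interpolation(text):
    """Split DSL text into ("text", s) and ("expr", s) segments.

    Each ${...} becomes an "expr" segment holding the source that would
    be handed back to the host compiler; everything else is "text".
    Simplifying assumption: the expression contains no '}' of its own.
    """
    segments = []
    pos = 0
    while pos < len(text):
        start = text.find("${", pos)
        if start == -1:
            segments.append(("text", text[pos:]))
            break
        end = text.find("}", start + 2)
        if end == -1:
            raise ValueError("unterminated ${ at offset %d" % start)
        if start > pos:
            segments.append(("text", text[pos:start]))
        segments.append(("expr", text[start + 2:end]))
        pos = end + 1
    return segments
```

For the Sql example above, this yields one text segment for the query prefix and one expr segment containing name, which is the piece that would go back to Fan to be parsed and resolved against the enclosing scope.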

f00biebletch Thu 22 Jan 2009

Kudos to JohnDG, Brian, etc, this is indeed slick, and, to my original question about list comprehensions, I think it works pretty well all things considered, although part of me would like to see list comps as part of the core syntax. But DSLs clearly shine for biggies like regex and sql.

JohnDG also mentioned pattern matching via the DSL plugin, specifically wrt matching incoming messages. I would still like to see pattern matching as part of the core language, so I can do things like the canonical

factorial(0) -> 1;
factorial(1) -> 1;
factorial(N) -> N*factorial(N-1).

I realize this is probably some work to implement and that it is nothing more than an inverted case/if.

JohnDG Thu 22 Jan 2009

JohnDG also mentioned pattern matching via the DSL plugin, specifically wrt matching incoming messages. I would still like to see pattern matching as part of the core language, so I can do things like the canonical

Once we have plug-ins, we can isolate experiments to the plug-in sandbox. If some experiment proves a resounding success, perhaps we can graduate it to core.

Pattern matching is not so simple to add onto an existing language. It may be there is no Fan-esque way to integrate general functional pattern matching into core.

Yet for particular subdomains, such as messages with headers, a pattern-matching DSL can make perfect sense.

For your example, you can imagine a plug-in that would translate,

factorial = FunctionPattern <|
    factorial(0) -> 1;
    factorial(1) -> 1;
    factorial(N) -> N*factorial(N-1).
|>

into

Int factorial(Int N) {
    switch(N) {
        case(0): return 1
        case(1): return 1
        default: return N * factorial(N-1)
    }
}

But generalizing such a plug-in to handle arbitrary object patterns probably wouldn't be easy (nor all that efficient, at least in a naive implementation). Nor am I sure if bolting on distinctly functional features onto a predominantly OOP language like Fan is a good idea.
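For the simple single-argument case, the translation itself is mostly string shuffling. Here is a rough Python sketch that turns the clauses into the switch-based source shown above. The names translate and CLAUSE are hypothetical, the Int parameter and return types are assumed, and only integer literal patterns followed by one final variable clause are handled.

```python
import re

# One clause per line: name(pattern) -> body, ending in ';' or '.'
CLAUSE = re.compile(r"^\s*(\w+)\((\w+)\)\s*->\s*(.+?)\s*[;.]\s*$")


def translate(dsl_text):
    """Translate integer pattern clauses into switch-based source text."""
    clauses = []
    for line in dsl_text.splitlines():
        if not line.strip():
            continue
        match = CLAUSE.match(line)
        if match is None:
            raise ValueError("cannot parse clause: %r" % line)
        clauses.append(match.groups())
    if not clauses:
        raise ValueError("no clauses found")

    # The last clause must bind a variable; it supplies the parameter
    # name and becomes the default branch.
    literals, (func, param, body) = clauses[:-1], clauses[-1]
    if param.isdigit():
        raise ValueError("last clause must bind a variable")

    out = ["Int %s(Int %s) {" % (func, param), "    switch(%s) {" % param]
    for _, literal, literal_body in literals:
        if not literal.isdigit():
            raise ValueError("only integer literal patterns are supported")
        out.append("        case(%s): return %s" % (literal, literal_body))
    out.append("        default: return %s" % body)
    out += ["    }", "}"]
    return "\n".join(out)
```

A real plug-in would build AST nodes rather than source text, and would need to infer the types instead of assuming Int.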

That said, how about you volunteer to write a FunctionPattern plug-in? :-)

helium Thu 22 Jan 2009

I don't see much use for pattern matching in Fan unless you add something like F#'s active patterns or Scala's extractors at the same time.

tompalmer Fri 23 Jan 2009

I don't see much use for pattern matching in Fan unless you add something like F#'s active patterns or Scala's extractors at the same time.

Note that haXe has very interesting use of everyday-seeming enum types and switch blocks to support this feature. Not sure if Fan could pull off a similar stunt. In any case, I don't see it as a direct relation to the embedded language feature described here.

For the embedded-language feature here (not sure that "DSL" is the best term), I'm not sure I like it, but I can definitely see the use cases.

helium Fri 23 Jan 2009

Enums in haXe are full algebraic datatypes. Of course pattern matching makes sense in combination with those. You could add ADTs to Fan, but I don't think that's the way to go.

pdoubleya Sat 24 Jan 2009

I like the sound of this feature, but I'm wondering how you will be able to ensure that the text in the DSL will continue to be parsable and compilable in the future. If the Fan dev team maintains the DSL plugins, then I guess it's settled, but if you have dependencies on external compilers (e.g. Ruby), my code may stop compiling if the DSL plugin or Ruby parser/compiler is updated at some time in the future. I'm not sure what the correct mechanism would be, but I'd think that if I'd got some embedded DSL code working at some point, I'd like to be able to "freeze" the version of the DSL plugin and related parser/compiler I'm using to avoid it breaking in the future.

JohnDG Sat 24 Jan 2009

I like the sound of this feature, but I'm wondering how you will be able to ensure that the text in the DSL will continue to be parsable and compilable in the future.

If you use a compiler plug-in built into Fan, then it may undergo change as per the evolution of Fan. On the other hand, if you use a third-party plug-in, you just bundle it into your application and it will never change -- unless you specifically upgrade it.

In the case of a Ruby plug-in, it would likely ship with some version of JRuby, which it would use to do the evaluation.

Plug-ins will be pretty much like libraries, in the sense that they won't upgrade themselves. You'll have to do something if you want a newer version.

brian Sun 25 Jan 2009

I like the sound of this feature, but I'm wondering how you will be able to ensure that the text in the DSL will continue to be parsable and compilable in the future.

I think you can draw a parallel between DSLs and third party libraries. The only difference is compile time versus runtime. If a third party DSL is broken then you can't compile. If a third party library is broken then you can't run. Although in reality I expect the plugin compiler and runtime will always go hand in hand. In the end I think it is all a module management issue.

brian Fri 8 May 2009

Promoted to ticket #438 and assigned to brian

tactics Mon 11 May 2009

Any thoughts on the design of the API? How much access will these plugins have to the guts of the compiler?

brian Mon 11 May 2009

Any thoughts on the design of the API? How much access will these plugins have to the guts of the compiler?

They will have full access to the compiler's AST - I am not sure how else you could really do all the things people will need to do.

Although I don't think I am going to commit to the compiler's 1.0 API for all time - it has too much surface area to lock down.

brian Fri 15 May 2009

Ticket resolved in 1.0.43

The first version of DSLs is ready to go!

If you are going to work on a DSL, you might want to talk to me first. What I have right now is a very basic interface with a hook during the ResolveExpr step of the pipeline. This works well if you are generating a resolved AST tree. It won't work well if you need to generate an unresolved AST which needs to get re-run thru the pipeline.

JohnDG Sat 16 May 2009

Awesome! When I get a chance, I want to do a grammar DSL, which can be used to easily build other DSLs. Something like:

parser := Parser <|
    Value   <- [0-9.]+ | '(' Expr ')'
    Product <- Expr (('*' | '/') Expr)*
    Sum     <- Expr (('+' | '-') Expr)*
    Expr    <- Product | Sum | Value

    Start   <- Expr
|>

tree := parser.parse("2*3+1")

...
