All Topics

#497 Flux & FWT: 1 bugfix, 2 changes and a request or two

KevinKelley Sat 28 Mar 2009

Request first: I'm working on a folding editor for flux, now -- using compiler::Parser to build an AST, then using that to build a tree widget so as to fold/unfold source code. RichText widget is just fine for displaying source, and it comes with automatic native scrollbars, so I'd like to keep using it, but there's no scroll event available from it. To make my tree widget scroll along with the richtext widget, I need an onScroll event that tells me the new scrollPos -- meaning, the upper-left-most visible point.

Or, I could turn off scrolling in the richtext component, put it and my tree both into a Scrollpane, and manually scroll them both together. But looking at the docs for ScrollPane I get the impression it might not be working, so I thought I'd ask here first.

Okay, another request: In compiler::Parser, the only api available outside the package is the parse method, to parse a compilation unit. I think I'm going to want access to the the lower-level methods too, in particular the ones that create and return a node of the AST, like FieldDef, MethodDef, Block, etc. They're currently marked private. I'd like to be able to re-parse individual elements of the source on the fly, as characters are typed, instead of having to re-parse the entire compilation unit in the background.

Next thing is a little bugfix in Flux: actually flux/fan/Mark.fan. If you have fan in c:/fan/... and you open a file in another directory c:/fansrc/..., then the Console hotlinks are broken, because the Mark parsing gets the fan directory first when it should go to fansrc. There's already code to deal with similar situation in subdirectories by using rsort to look at longest names first, but I guess it got missed when looking at root dirs.

Anyway, here's the diff:

diff -r f963172cc5db src/flux/flux/fan/Mark.fan
--- a/src/flux/flux/fan/Mark.fan	Thu Mar 26 19:45:06 2009 -0400
+++ b/src/flux/flux/fan/Mark.fan	Sat Mar 28 11:25:18 2009 -0500
@@ -128,7 +128,8 @@
     // attempt to match one of the root indices
     Str? root := null
     Int? s := null
--    rootDirs.each |Str rootDir|
+    roots := rootDirs.sortr |Str a, Str b->Int| { return a.size <=> b.size }
+    roots.each |Str rootDir|
     {
       x := text.index(rootDir)
       if (x == null) return

{maybe a bug here on the list, too: that minus sign in the diff breaks the preformatted display of code, when it's the first character on the line. Doubling up the minus keeps it from trying to display as list item}

Last is a couple changes I'd like to have added to system classes;

compiler::Location I think should support comparison operators, so I implemented the following:

=== compiler/fan/Location.fan ===
@@ -75,6 +75,27 @@
     return s.toStr
   }

+  override Bool equals(Obj? o)
+  {
+    if (o isnot Location) return false
+    that := o as Location
+    return file == that.file && line == that.line && col == that.col
+  }
+
+  override Int compare(Obj o)
+  {
+    // throw exception to indicate can't compare with o
+    that := o as Location
+    if (file != that.file) return file <=> that.file
+    if (line != that.line) return line <=> that.line
+    return col <=> that.col
+  }
+
+  override Int hash()
+  {
+    return file?.hash + line?.hash + col?.hash
+  }
+
   Str? file
   Int? line
   Int? col

sys::Point I think should support the basic math operators, so I put in add and subtract from another point, and mult/div by a scalar, like these:

diff -r f963172cc5db src/fwt/fan/Geom.fan
--- a/src/fwt/fan/Geom.fan	Thu Mar 26 19:45:06 2009 -0400
+++ b/src/fwt/fan/Geom.fan	Sat Mar 28 11:40:15 2009 -0500
@@ -39,6 +39,22 @@
   ** Return 'x+tx, y+ty'
   Point translate(Point t) { return make(x+t.x, y+t.y) }

+  ** Return 'x+tx, y+ty'
+  Point plus(Point t) { return make(x+t.x, y+t.y) }
+
+  ** Return 'x-tx, y-ty'
+  Point minus(Point t) { return make(x-t.x, y-t.y) }
+
+  ** Return 'x*n, y*n'
+  Point mult(Num n) { return make(x*n, y*n) }
+
+  ** Return 'x/n, y/n'
+  Point div(Num n) { return make(x/n, y/n) }
+
+  ** Return '-x, -y'
+  Point negate() { return make(-x, -y) }
+
+
   ** Return hash of x and y.
   override Int hash() { return x.hash ^ (y.hash << 16) }

Still having fun...

brian Sat 28 Mar 2009

Hi Kevin,

Thanks for your feedback and patches...

To make my tree widget scroll along with the richtext widget, I need an onScroll event

We definitely need this. I did a brief investigation into the SWT and didn't see the obvious hooks I need. If anyone familiar with SWT can suggest the easiest way to capture scrolling events in Tree, Table, and StyledText that would help greatly (we need to add the event to all these widgets consistently). If I can get some help, I can knock this out in the next week, otherwise I probably won't dig into it until after all the language changes I am working on.

he only api available outside the package is the parse method, to parse a compilation unit. I think I'm going to want access to the the lower-level methods too

Have you tried marking them public in your local repo to see that just changing them public works ok? I suspect there might be some other issues. But if making them public works, I will push that change.

little bugfix in Flux: actually flux/fan/Mark.fan

Thanks for your patch - I pushed this change. Note I did the sort in rootDirs method, instead of doParse, so let me know if I overlooked something.

I think should support comparison operators, so I implemented the following

Good idea. I pushed this change along with a test for it

I think should support the basic math operators, so I put in add and subtract from another point

It seems a little confusing to implement math operations like that on Point - seems a little bit like operator overload abuse. Can you provide some use cases where you think that makes sense?

JohnDG Sat 28 Mar 2009

It seems a little confusing to implement math operations like that on Point - seems a little bit like operator overload abuse.

It makes perfect sense on a Vector class of which Point is a subclass (all points being trivially representable by vectors). But then lots of stuff should go there, like dot, cross, etc., and seeing stuff like that on Point might be confusing for the non-mathematically inclined.

tompalmer Sun 29 Mar 2009

Kevin, awesome working on Flux. By the way, if you manage to put reference hierarchies, implementor lists (go to definition), and basic refactorings (rename, etc.) in Flux, while keeping it fast, you'll be my hero.

qualidafial Sun 29 Mar 2009

If anyone familiar with SWT can suggest the easiest way to capture scrolling events in Tree, Table, and StyledText that would help greatly (we need to add the event to all these widgets consistently).

Have you tried widget.getVerticalBar().addSelectionListener()?

KevinKelley Sun 29 Mar 2009

re onScroll: I haven't looked into the java side yet, not sure how to fix this. Frankly I tend to get confused when I try to figure out how java's mess of APIs expects me to do something. :-) But if nobody else has a suggestion, I'll dig into it sometime in the next few days.

re operator overload on Point: I think it's a natural place to use it; technically there could be a separate Point and Vector, with vector doing the math and Point being the location, but in practice (especially in 3D graphics apps) there's usually one class that represents a point in space (screen space or world space, whatever) along with operations on it (translate into different coordinate system, relate it to another location with dot or cross product, etc). As for use cases, I've already used the Point.plus and .minus in a translateToParent method, it seems clear to me that to translate a point into widget coordinates, you subtract out the widget's position in its parent, and that seems best expressed with a -. Not a big deal, though.

My feeling about Overload Abuse, I guess, would be that the + and - overloads on Point make sense, because those operators are the ones used in the math books for those operations; but that overloading a * or ^ operator to do dot or cross products would be Abuse, since those operations aren't obviously and consistently associated with those operators.

re Flux: I've got three things I want to do to it, then rethink/get feedback for next directions.

Loaded Pods viewer/browser -- working now, with treeview of pod contents and tooltips showing doccomments and various info. Still tweaking for pretty.
Folding code editor -- in progress; using the Parser to get structure info naturally leads to things like instant feedback on syntax errors, etc, and also leads to some sort of incremental or background compile.
code completion in editor -- instead of a simple text lookup, should be able to use parse info and loaded pods to lookup legal tokens and offer autocomplete. I haven't started this yet, but am keeping it in mind as I go.

Tooltips in code editor to show definition of a token, and various other usability things, should fall out pretty easily from what we've got. Refactoring support is a big interest for me, too.

I'm in between jobs at the moment, so it's a good time to get into this. I'll be staying pretty busy with it at least for a while, hopefully get something useful done and out there for feedback.

brian Sun 29 Mar 2009

Have you tried widget.getVerticalBar().addSelectionListener()?

Duh - don't know how I missed that - although one is often overwhelmed by surface area of Java APIs (and SWT is actually really light weight for a Java API).

I pushed a change with scroll bar support. Now RichText, Table, and Tree all sport a hbar and vbar field you can use to access the scroll bar:

RichText
{
  vbar.onModify.add(&onScroll) 
}

Kevin - can't wait to see what you have cooking (especially since I use Flux as my primary editor)

KevinKelley Sun 29 Mar 2009

Excellent, perfect timing, I'm just where I either needed this or had to find a workaround. Thanks!

I'm staying pretty heads-down with this for now, I want to get done the things I know how to do. Soon as I can I'll be wanting to put something up for review and suggestions; I'm really liking the system (Fan+Flux), and I can see that it's got a lot of possibilities. Making it into a top-of-the-line development environment is seeming like it would be almost ridiculously easy. :-)

JohnDG Sun 29 Mar 2009

I'd like to be able to re-parse individual elements of the source on the fly, as characters are typed, instead of having to re-parse the entire compilation unit in the background.

You will not be able to do this with the existing Fan parser. What's needed is an incremental parser with substring parsing.

KevinKelley Sun 29 Mar 2009

What exactly do you mean when you say incremental parser? Obviously the token stream has to be adjusted. Current parser loads entire tokenstream into a private array and walks through it maintaining a curt (current token) and peek (next token). Parser is recursive descent, so there's methods like SlotDef slotDef(...) that creates and returns an AST node describing the slot; it presumes curt is the first token of a slot definition.

So, seems like a quick hack at this would be to make tokens public, so I can swap in a different tokenstream, and make the appropriate methods public, so I can call f.ex. slotDef from outside the package and get a SlotDef AST node for the new tokens.

Of course that's just a first quick hack to make sure it works. There's other state being maintained by Parser, that might affect the parse. I think ideally the parser would be stateless; the various parse methods would take parameters such as the tokens to parse, and the current namespace to lookup typenames, etc.

I need to (and will be) experiment with this a little to find exactly what's needed to get what I want. From what I'm seeing so far, though, it doesn't look like it's too far away from what we're talking about; so far I don't think it'll be hard to make it work.

JohnDG Sun 29 Mar 2009

An incremental parser accepts an CST and an edit to the text file from which it was generated (e.g. insert or delete), and produces from this pair a new CST, from which is derived the new AST (you can also use an AST as input if you maintain metadata).

It's very unlikely you'll be able to use the existing Fan parser to accomplish this in a robust way. Making various units public will do little to further this goal.

tompalmer Mon 30 Mar 2009

Recompiling even whole files even shouldn't be too bad most of the time (speedy CPUs), if all the knowledge about other files/types remains in memory. And if people don't make their files too huge. I doubt things need vastly reworked to get basic compile-as-you-go running.

JohnDG Mon 30 Mar 2009

Compile-as-you-go is not compile-as-you-type, and the larger issue here is not incremental parsing but substring parsing, which enables you to ignore a malformed expression and do a "best fit" to produce a valid AST, which lets you do code completion, refactoring, syntax highlighting, and other operations even while the code is still not perfectly formed.

brian Tue 31 Mar 2009

I'd like to throw out some empirical measurements to frame the discussion. These times are based on Fan compiler performance (after warmed up) on testSys. Currently testSys is our largest pod, so it represents the worse case we have on hand:

Tokenize  456ms total;  8ms per file
Parse     528ms total;  9ms per file
Analysis  477ms total;  8ms per file

Assuming for now you could divide the entire pipeline up per file, you could brute force a recompile each time a file changed in 26ms. But you can easily re-tokenize on a per line basis as you type, so that gets you down to 17ms for the just the parse and analysis pipeline.

I don't know what kind of reaction time something like Eclipse gets, but I would think that sub 50ms would be pretty good performance.

However, when I look at those numbers the big problem isn't tokenizing or parsing, but rather analysis. Tokenizing and parsing is trivial to divide up on a per file basis, so even brute force should be 10ms to 20ms. If anything I hope those numbers illustrate that incremental parsing is probably not effort well spend - dropping the parse step from 9ms to 1ms isn't going to make a big difference.

But the analysis steps of the compiler pipeline don't lend themselves to incremental compiling very easily. However since Fan doesn't support type inference except with in a method body, I think there is likely good opportunity to rework some of those steps to support a more incremental model.

and the larger issue here is not incremental parsing but substring parsing

Great point, I think this is the root issue. In order to get auto-complete as you type functionality, we need to be able to deal with partially formed statements, and potentially unmatched braces. The fact that Fan doesn't require semicolons makes this a little trickier than Java.

If we can solve this problem well (potentially with an incremental tokenizer and fixer-up pre-parser), then I do think we could feed it back into the standard Fan compiler and we would have really good solution.

KevinKelley Tue 31 Mar 2009

As I'm working on code-folding in the editor now, I'm getting a feel for what's easy and what's hard, given Fan and its current compiler pod. Sub-50ms response is a good upper limit for the hardest test cases; that's fast enough to be indistinguishable from instantaneous. It seems that the rough approach of reparsing the full compilation unit is still at least that fast. Handing malformed code is the bigger problem, since in many or most cases the current parser just stops on an error, leaving me with effectively a partial tree and some unparsed tokens.

So I'm looking at a multi-step process, with what I'm thinking of as an overlay of stylings and abstract info.

Raw source - no tokens, no parse, gets default (grayed or italic) styling
Tokenized source - stylings applied according to lexical token type; tokenstream should nearly always successfully complete, this gives a view similar to current flux
Usings and Types - like current parser, preparse tokenstream into maximal sections and collect typenames introduced by this compilation unit. Assign ranges of tokens to associated TypeDef.
Slots - for each type, break its tokenrange into SlotDefs (FieldDef or MethodDef). Attempt with Parser; if it fails then use a naive brace-matching to get as many good SlotDef nodes as possible.

I'm using what's in Compiler and in the build pod as a guide in maintaining state during the parse. It seems so far (disregarding closures for the moment) that the above steps are far enough. At this point I can build a partial pod that has info that can be queried for available types and slot info. Now on an edit I locate the AST leaf in which it occurred, climb the tree until I get to its slotdef, and reparse just that range of tokens. (If the slotdef parse completes and there's still tokens, it means the coder entered an rbrace and kept typing, for example, so go up another level in the tree and start a new slotdef with the remaining tokens.)

The unmatched braces problem I think can best best be solved with an outside-in pre-parse: that is, an initial step that breaks up the tokenstream into maximally- sized blocks. Then, when user types an lbrace, the invalid code is just the lowest-level AST node that doesn't successfully parse.

...

Rereading this post, I guess I sound a little disjointed. I think it's coming clear, and I believe it's going to work out pretty good, but I'm still in process, and it's not all clear yet.

Brian, I agree that with the current parser as fast as it is, changing it to gain a few milliseconds is a waste of time. The issue as you say is robustly handling bad or partial code. What I described above, pushing errored code as far down as possible in the parse tree, is my best attempt at that. More discussion will be helpful, I'm sure.

cheers...

brian Tue 31 Mar 2009

Kevin, not sure I quite followed all of those details, but I think you are on the same page as me regarding how to approach the problem.

One technique which might help deal with malformed syntax is to use the white-space indention to try and match braces.

KevinKelley Tue 31 Mar 2009

Here's a related question; what's the feeling on automatic reformatting of source code?

It seems like this partial parsing thing might be easier if the editor was allowed to put (at least some types of) tokens where it thinks they belong. For example, closing braces. When you type a close-brace, editor puts it at the same indent level as the brace it matches, starting a new line if necessary. So, editor can then attempt to parse contained code at that syntactic level, and the programmer can see what the code he types is doing.

Some people are pretty picky about not letting the editor mess with their code, though, so this kind of reformatting might turn out to be a Love it or Hate it kind of thing. Which would mean, that there'd need to be an option to turn it off; which in turn means that we have to be able to do it the hard way anyway, when reformatting is turned off and we still need to parse non-canonical layout of the source.

That's how I'm reading the situation, anyway. Probably need a working example to play around with, to get a feel for what feels right. Hopefully we'll have something to talk about soon.

brian Tue 31 Mar 2009

Some people are pretty picky about not letting the editor mess with their code

I am solidly in that camp. I think our solution needs to avoid require having the editor muck with actual source text (although certainly it can do so behind the scenes before submitting to the compiler).

KevinKelley Tue 31 Mar 2009

I mostly agree; I've spent some time in Smalltalk and most smalltalks aren't shy about reformatting code, moving comments around, or whatever. When it's done exactly right I like it, but even with Smalltalk's simple syntax there's always corner cases where the autoformatting is messed up. So, I agree with you.

Having "Reformat" as an undo-able menu command would be nice, tho.

Fantom

#497 Flux & FWT: 1 bugfix, 2 changes and a request or two

KevinKelley Sat 28 Mar 2009

brian Sat 28 Mar 2009

JohnDG Sat 28 Mar 2009

tompalmer Sun 29 Mar 2009

qualidafial Sun 29 Mar 2009

KevinKelley Sun 29 Mar 2009

brian Sun 29 Mar 2009

KevinKelley Sun 29 Mar 2009

JohnDG Sun 29 Mar 2009

KevinKelley Sun 29 Mar 2009

JohnDG Sun 29 Mar 2009

tompalmer Mon 30 Mar 2009

JohnDG Mon 30 Mar 2009

brian Tue 31 Mar 2009

KevinKelley Tue 31 Mar 2009

brian Tue 31 Mar 2009

KevinKelley Tue 31 Mar 2009

brian Tue 31 Mar 2009

KevinKelley Tue 31 Mar 2009