#805 Str and Uri interpolation is weird

KevinKelley Thu 29 Oct 2009

Great feature, wouldn't want to live without it, but the implementation is a bit of a hack.

Interpolated strings (and uris) are handled in compiler::Tokenizer by replacing the single Token.str token you'd get from a non-interpolated string with a sequence of tokens representing the segments concatenated together with +.

So for plain strings, you get this:

verifyTokens(Str<|"a b c"|>, [str("a b c", 1,1)])  // tok.val, tok.line, tok.col

verifyTokens(Str<|"""a b c"""|>, [str("a b c", 1,1)])

and for interpolated strings, you get (simplest case):

verifyTokens(Str<|"$a"|>, [lparen(1,2), str("",1,3), plus(1,3), ID("a",1,3), rparen(1,5)])

or for bracketed interp:


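To make the desugaring concrete, here is a rough sketch in Python (not Fan, and not the actual compiler code - `Tok` and `desugar` are made-up names) of how a `$name` interpolation expands into a parenthesized concatenation, matching the shape of the token streams shown above (positions omitted for brevity):

```python
import re
from dataclasses import dataclass

@dataclass
class Tok:
    kind: str        # "lparen", "str", "plus", "id", "rparen"
    val: str = ""

def desugar(src: str) -> list[Tok]:
    """Expand the body of an interpolated string like 'x=$a' into the
    token stream for the expression ("x=" + a)."""
    parts: list[Tok] = []
    pos = 0
    for m in re.finditer(r"\$(\w+)", src):
        parts.append(Tok("str", src[pos:m.start()]))  # literal segment (may be "")
        parts.append(Tok("id", m.group(1)))           # interpolated identifier
        pos = m.end()
    if pos < len(src) or not parts:
        parts.append(Tok("str", src[pos:]))           # trailing literal segment
    toks = [Tok("lparen")]
    for i, p in enumerate(parts):
        if i:
            toks.append(Tok("plus"))                  # segments joined with +
        toks.append(p)
    toks.append(Tok("rparen"))
    return toks
```

So `desugar("$a")` yields the lparen/str/plus/id/rparen sequence from the example above.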
There are several things to note about this. From a design standpoint, I'd wonder whether interpolated strings ought to be handled the way Dsls are - with a plugin - so that the token could encompass the whole string and you wouldn't have the whole "virtual" token thing going on.

Or, if we've got "virtual" tokens, at least let them be identifiable. How about a TokenVal subclass VirtualToken or similar, so you don't need a parser just to find your string boundaries?
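The TokenVal.synthetic idea (which Qualidafial suggests, see below) would make that trivial on the editor side. A rough Python sketch of the concept - `TokenVal` here mirrors the compiler class in shape only, and `real_tokens` is a made-up editor-side helper:

```python
from dataclasses import dataclass

@dataclass
class TokenVal:
    kind: str
    val: str
    line: int
    col: int
    synthetic: bool = False  # True for tokens injected by interpolation desugaring

def real_tokens(tokens):
    """Editor-side helper: drop compiler-injected tokens so what remains
    maps 1:1 onto characters actually present in the source text."""
    return [t for t in tokens if not t.synthetic]
```

An editor could then recover the literal string boundaries by scanning only the non-synthetic tokens, without re-parsing.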

The other thing is: notice how plain str tokens (and every other token kind) have a line,col location that points to the first character of the token, but in interpolated strings the location is deep inside the string. This I would call broken.

So. I can program around all this, or I can tweak compiler pod with the VirtualToken idea, or I can do something else. My guess is that what we've got is a decent quick implementation that works, not necessarily the ideal design we want to keep. Thoughts?

brian Sat 31 Oct 2009

Hi Kevin,

I guess I am not sure I understand why an internal detail in the compiler matters that much to you? What is your use case?

Interpolated strings are by definition sugar for the way the tokenizer spits them out. It sort of has to work that way so that the Parser can process the expressions. For example, if you are going to do error reporting or code completion, the tokens in the interpolated expressions matter.

It sounds like you're saying the entire string should be treated as one token? If that is your suggestion, then I don't think I agree with it.

Although I should note that #731 is an open issue which was an unintended side effect of my implementation.

KevinKelley Sat 31 Oct 2009

This is for a code editor; it's useful to use the Fan Tokenizer (and Parser too) to interpret the raw source text.

I agree that the interpolated tokens need to come through; my problem is with the line/col info being weird, and there not being a way to differentiate between "real" tokens that represent source, and "virtual" tokens that are added by the tokenizer.

Everywhere else, even in non-interpolated strings, it's an invariant that the TokenVal.line and .col point to the first character of the token in the source text. With interpolated strings, the line/col is all over the place.
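That invariant is easy to state as a check. Here's a Python sketch (hypothetical token shape, not the compiler's; 1-based line/col as in the examples above; only meaningful for tokens whose val appears verbatim in the source):

```python
from dataclasses import dataclass

@dataclass
class Tok:
    kind: str
    val: str
    line: int  # 1-based
    col: int   # 1-based

def check_positions(src_lines, tokens):
    """Return the tokens whose (line, col) does not point at the first
    character of their own text in the source."""
    bad = []
    for t in tokens:
        text = src_lines[t.line - 1]
        if not text[t.col - 1:].startswith(t.val):
            bad.append(t)
    return bad
```

Every token kind except those injected for interpolation passes a check like this.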

Qualidafial's suggestion for TokenVal.synthetic would be fine, and I'd like to see the .line and .col be more consistent.

brian Sun 1 Nov 2009

ok - if the line/col are messed up, I'll definitely take a look at that when I fix #731

KevinKelley Sun 1 Nov 2009

Sounds fine, thanks.

brian Thu 5 Nov 2009


Try this changeset and see if the line/col numbers of interpolated tokens work better for you.


KevinKelley Thu 5 Nov 2009

Brian, thanks, that looks like it ought to be right. One little nitpick: should this line

line = this.line; col = this.col - 1;  // before quote

instead be

line = this.line; col = this.col - (q.isTriple? 3: 1)

to account for triple-quoted strings?


Thanks for taking the time to look at it. I'm pulling the changeset now.

KevinKelley Fri 6 Nov 2009


Sorry, never mind the nitpick comment - I misinterpreted something. I've done some brief testing, and it looks like this is giving me what I wanted. Sorry for the whining. :)
