KevinKelley Thu 29 Oct 2009
Great feature, wouldn't want to live without it, but the implementation is a bit of a hack.
Interpolated strings (and URIs) get handled in compiler::Tokenizer by replacing the single Token.str token you get from a non-interpolated string with a sequence of tokens representing the segments concatenated together with +.
So for plain strings, you get this:
verifyTokens(Str<|"a b c"|>, [str("a b c", 1,1)]) // tok.val, tok.line, tok.col
verifyTokens(Str<|"""a b c"""|>, [str("a b c", 1,1)])
and for interpolated strings, you get (simplest case):
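roughly something like this - the plus/id/dot helpers are just shorthand here, and I'm leaving out the line/col values and any grouping tokens the tokenizer might also insert, so don't treat these as exact dumps:
verifyTokens(Str<|"a $x b"|>, [str("a "), plus, id("x"), plus, str(" b")])
or for bracketed interp:
verifyTokens(Str<|"a ${x.y} b"|>, [str("a "), plus, id("x"), dot, id("y"), plus, str(" b")])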
There are several things to say about this. From a design standpoint, I'd wonder whether interpolated strings ought to be handled the way DSLs are - with a plugin - so that the token could encompass the whole string and you don't have the whole "virtual" token thing going on.
Or, if we've got "virtual" tokens, at least let them be identifiable. How about a TokenVal subclass VirtualToken or similar, so you don't need a parser just to find your string boundaries?
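Just to make that concrete, I'm picturing nothing fancier than this (a hypothetical sketch, not real compiler code; I'm assuming TokenVal's constructor takes the kind plus an optional value):
class VirtualToken : TokenVal
{
  // marks a token the tokenizer manufactured (the inserted +'s and the split string segments)
  // assumption: TokenVal's ctor is (Token kind, Obj? val := null)
  new make(Token kind, Obj? val := null) : super(kind, val) {}
}
Then an editor only needs tok isnot VirtualToken to find the tokens that actually correspond to source text.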
The other thing is, notice how plain str tokens (and every other token kind) have a line,col location that points to the first character of the token. But in interpolated strings the location is deep inside the string. This I would call broken.
So: I can program around all this, or I can tweak the compiler pod with the VirtualToken idea, or I can do something else. My guess is that what we've got is a decent quick implementation that works, not necessarily the ideal design we want to keep. Thoughts?
brian Sat 31 Oct 2009
Hi Kevin,
I guess I'm not sure I understand why an internal detail of the compiler matters that much to you. What is your use case?
Interpolated strings are by definition sugar, and the way the tokenizer spits them out reflects that. It sort of has to work that way so that the Parser can process the expressions. For example, if you are going to do error reporting or code completion, the tokens in the interpolated expressions matter.
It sounds like you're saying the entire string should be treated as one token? If that is your suggestion, then I don't think I agree with it.
Although I should note, #731 is an open issue that was an unintended side effect of my implementation.
KevinKelley Sat 31 Oct 2009
This is for a code editor; it's useful to use the Fan Tokenizer (and Parser too) to interpret the raw source text.
I agree that the interpolated tokens need to come through; my problem is with the line/col info being weird, and there not being a way to differentiate between "real" tokens that represent source, and "virtual" tokens that are added by the tokenizer.
Everywhere else, even in non-interpolated strings, it's an invariant that the TokenVal.line and .col point to the first character of the token in the source text. With interpolated strings, the line/col is all over the place.
Qualidafial's suggestion for TokenVal.synthetic would be fine, and I'd like to see the .line and .col be more consistent.
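For what it's worth, here's roughly all the editor side would need to do (highlightSpan and the tokens list are hypothetical stand-ins, and this assumes synthetic is a Bool that defaults to false):
tokens.each |TokenVal tok|
{
  if (tok.synthetic) return              // tokenizer-manufactured; no real source position
  highlightSpan(tok.line, tok.col, tok)  // invariant: first character of the token in source
}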
brian Sun 1 Nov 2009
ok - if the line/col are messed up, I'll definitely take a look at that when I fix #731
KevinKelley Sun 1 Nov 2009
Sounds fine, thanks.
brian Thu 5 Nov 2009
Kevin,
Try this changeset and see if the line/col numbers of interpolated tokens work better for you.
Brian
KevinKelley Thu 5 Nov 2009
Brian, thanks, that looks like it ought to be right. One little nitpick: should this line
line = this.line; col = this.col - 1; // before quote
be
line = this.line; col = this.col - (q.isTriple ? 3 : 1) // step back over the opening quote(s): 3 for triple-quoted
?
Thanks for taking the time to look at it. I'm pulling the changeset now.
KevinKelley Fri 6 Nov 2009
Sorry, never mind the nitpick comment; I misinterpreted something. I've done some brief testing, and it looks like this is giving me what I wanted. Sorry for the whining. :)