tompalmer Sun 20 Jul 2008
Unless I've missed it, Fan doesn't have multiline string literals. Especially with the declarative notions, I think this is vital. Personally, I prefer this syntax to triple-quoting, because it allows indentation without adding to string content:
adj := "second" // Just for use in the next string, for fun.
text :=
  "This is the first line,
  "And this the $adj.
  // Blank line follows, and note that I can sneak in comments if I want.
  "
  "This is the third.
  "" // And this is how you'd make an empty last line if you want
// This next example shows data without newlines inserted.
data :=
  "283748923759879834534283748923759879834534"
  "439058093458094380935439058093458094380935"
And those (trailing " or not) could be mixed and matched, also with raw strings.
And there'd have to be a standardized interpretation of newline, independent of platform (either always \n or always \r\n, not sure which at the moment).
tompalmer Sun 20 Jul 2008
Also, all trailing whitespace would be ignored in strings with newlines at the end. No point in making something (normally) invisible have meaning.
brian Sun 20 Jul 2008
All string literals can be multiline today. You can prefix a string with r such as r"..." to not evaluate the \ as an escape character. See docs.
They are used pretty extensively during compiler and serialization tests, and of course in sidewalk for generating html.
tompalmer Sun 20 Jul 2008
That's what I get for skimming too much. Thanks for the answer.
Sounds like indentation could be an issue, still. Scala includes a string method that does something like this (Fan-ized, and I don't remember their method name):
nonIndented := "
  |First line
  |Second line
  |"
  .stripIndent
Or something like that. Might be worth considering something along those lines.
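For reference, the Scala method being half-remembered here is `stripMargin`, which drops leading whitespace up to and including a `|` on each line. A minimal Python sketch of that idea follows; the function name `strip_margin` is made up for this illustration and is not a Fan or Python API.

```python
def strip_margin(text, margin="|"):
    """Drop leading blanks up to and including the margin char on each line."""
    out = []
    for line in text.split("\n"):
        stripped = line.lstrip(" \t")
        if stripped.startswith(margin):
            # Keep only what follows the margin marker.
            out.append(stripped[len(margin):])
        else:
            # Lines without a marker are left untouched.
            out.append(line)
    return "\n".join(out)


source = """
    |First line
    |Second line
    |"""
result = strip_margin(source)
print(repr(result))
```

The marker makes the start of each line explicit, so the surrounding code can be indented freely, at the cost of one extra character per line.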
brian Sun 20 Jul 2008
Yeah, the indentation thing is annoying; I'd welcome a nice solution to that.
What do other languages do?
tompalmer Mon 21 Jul 2008
Scala is the only one that has made an obvious plan that I know of. Most people seem just to disregard the indentation question. But maybe I haven't dug deep enough.
If you want pure serialized objects to allow nice indentation, I recommend my first proposal (prefix each line with "). And I really think it's not so bad:
// We live with comments prefixing our text
// and seem to be okay with it.
**
** Even API docs seem to be fine with things at the
** beginning of each line, and we can type and read
** it.
**
str :=
"I think strings could do the same without
"too much syntax weight."
However, if just full code is going to support it, then some variation of Scala might work. But perhaps instead of needing to start each line with some marker char, you could just have a dedent method that strips all matching whitespace from the beginning of each line:
nonIndented := "
    First line
    Second line
  "
  .dedent
  .indent(4) // If you then want to indent things again?
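As a point of comparison, Python's standard library already has this pair of operations in `textwrap.dedent` and `textwrap.indent`. The sketch below only shows the behaviour a `dedent`/`indent` pair would give; the Fan methods in the post above are hypothetical.

```python
import textwrap

# The literal is indented to match the surrounding code.
source = """\
    First line
      Second line
"""

# dedent removes only the whitespace common to every line,
# so the extra two spaces on the second line survive.
dedented = textwrap.dedent(source)

# indent adds a prefix back to each non-blank line.
reindented = textwrap.indent(dedented, "    ")

print(repr(dedented))
print(repr(reindented))
```

Note that this is a runtime fix-up, not a compile-time one, so the stripping is paid for on every call.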
helium Mon 21 Jul 2008
My old toy language with Python-like indentation did it like this (>>> means a string starts at the next line):
str: >>>
Some text
that spans multiple lines
and has different
levels
of
indentation.
This would result in
"Some text"
"that spans multiple lines"
" and has different"
" levels"
" of"
"indentation."
The indentation was taken from the first non-whitespace character in the first line, and that many whitespace characters were stripped from all following lines of the string. But I think this only works in indentation-aware languages.
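A rough Python sketch of the rule described here: the first line's indentation sets the width, and at most that many leading spaces are removed from every line. The function name and the exact indentation amounts in the sample are assumptions made for the illustration.

```python
def strip_by_first_line(lines):
    """Strip the first line's indentation width from every line."""
    first = lines[0]
    width = len(first) - len(first.lstrip(" "))
    stripped = []
    for line in lines:
        lead = len(line) - len(line.lstrip(" "))
        # Never remove more than the line actually has.
        stripped.append(line[min(width, lead):])
    return stripped


block = [
    "    Some text",
    "    that spans multiple lines",
    "      and has different",
    "        levels",
    "       of",
    "    indentation.",
]
for line in strip_by_first_line(block):
    print(repr(line))
```

Unlike a common-prefix dedent, the first line alone decides how much is stripped, which is why it fits a language where the block's indentation is already significant.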
brian Mon 21 Jul 2008
In my own code I've used two approaches - one is to strip the whitespace programmatically (which is useless overhead). And another is to not use indentation with the string literals (which is ugly).
I'm tempted to just say that the compiler automatically strips the indentation. That means you can't have multiline strings with leading whitespace. But having the indentation stripped seems like what you want 99% of the time.
Comments?
jodastephen Mon 21 Jul 2008
I really like Tom's original suggestion - it's clear as to where each line of the multi-line starts.
brian Mon 21 Jul 2008
I really like Tom's original suggestion
I don't like the original proposal because it forces you to stick " on the front of every line which is often a pain if just pasting in some text.
cbeust Tue 22 Jul 2008
I'm tempted to just say that the compiler automatically strips the indentation.
No, this will make it impossible to embed text that has significant spaces (such as Python or XML files you want to indent nicely).
I don't like the original proposal because it forces you to stick " on the front of every line which is often a pain if just pasting in some text.
Please reconsider this, it's really the only way to have our cake and eat it too: freedom to indent in any way the programmer wants without compiler interference.
If you don't like ", use | or another character, it's really not that ugly.
-- Cedric
brian Tue 22 Jul 2008
Please reconsider this, it's really the only way to have our cake and eat it too
I guess I'm lost then. The whole point of multiline strings is to avoid stuff like this:
s :=
"line 1" +
"line 2" +
"line 3" +
"line 4"
So that you can just do this:
s :=
"line 1
line 2
line 3
line 4"
I can already do both of these today. What exactly is the proposal that lets me create multiline strings with the right indentation? And remember it has to be unambiguous.
tompalmer Tue 22 Jul 2008
With regex search and replace, you can add the '"' in automatically (replace ([^ \t]) with "\1 in TextPad). A Fan editor could add the '"' prefixes in automatically on paste.
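The same search-and-replace can be sketched outside an editor. The Python version below anchors the match at the start of each line so that only the first non-whitespace character gets the prefix; the function name `prefix_quotes` is made up for this example.

```python
import re


def prefix_quotes(text):
    """Insert a double quote before the first non-blank char of each line."""
    # Group 1 keeps the existing indentation, group 2 is the first
    # non-whitespace character; blank lines do not match and stay as-is.
    return re.sub(r'^([ \t]*)(\S)', r'\1"\2', text, flags=re.MULTILINE)


pasted = "  line 1\n    line 2\n\n  line 3"
print(prefix_quotes(pasted))
```

Blank lines are left without a prefix here, which under the prefix proposal would need a separate decision about whether they count as part of the string.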
cbeust Tue 22 Jul 2008
The point is to be able to specify smaller indents in the string than the code is at. For example (imagine that the line for s is already indented quite a bit):
s :=
"line 1
| line 2
|line 3"
would put the string "line 1\n line 2\nline 3" in s.
Does the current compiler allow me to do that?
-- Cedric
tompalmer Tue 22 Jul 2008
I think Brian's proposal (auto-dedent, not current Fan behavior (?)) allows for that like so:
s := "
line 1
line 2
line 3"
I'm more concerned with these issues:
Rarely, you might actually want all the lines to be indented, especially in serialized objects. The syntax doesn't need to be ideal for this. It just needs to be possible.
I think the behavior would be less obvious than what the leading " implies.
People might accidentally mix tabs and spaces (though they shouldn't) leading to subtle bugs. Copy-and-paste might especially cause this kind of problem.
cbeust Tue 22 Jul 2008
Maybe I misunderstood the dedent proposal, then. What happens if the whole block gets indented, does the value of s get indented as well?
tompalmer Tue 22 Jul 2008
I was referring to this comment from Brian:
I'm tempted to just say that the compiler automatically strips the indentation. That means you can't have multiline strings with leading whitespace. But having the indentation stripped seems like what you want 99% of the time.
cbeust Tue 22 Jul 2008
I'm tempted to just say that the compiler automatically strips the indentation. That means you can't have multiline strings with leading whitespace. But having the indentation stripped seems like what you want 99% of the time.
In my experience, it's quite the opposite: as soon as I need a multiline string that will span over more than two lines, I will most of the time want some of these lines to be indented.
-- Cedric
tompalmer Tue 22 Jul 2008
Maybe I did misunderstand that. I had thought he meant to strip just the common indentation (leaving any excess as per line indentation). But rereading it, I'm not sure.
And for the record, I still prefer my original proposal of just including " at the beginning of each line to indicate where the string content starts.
brian Tue 22 Jul 2008
My proposal is to eliminate the common whitespace, as measured by the line which has the least indentation.
So given a string literal:
s :=
" line 1
line 2
line 3"
Would be equivalent to:
s :=
" line 1" +
" line 2" +
"line 3"
I think that solves the 80% case of what you really want. If you really need leading whitespace in all lines, then you can just use the second style which you can already do today.
cbeust Tue 22 Jul 2008
I thought we were trying to eliminate the need for the latter.
There are two reasons why it's not optimal:
Can't easily copy/paste a chunk of text into a Fan source while preserving its formatting
It doesn't automatically include \n
One question about your first example: if the column where s starts is 10, does s still contain line1 indented by one character or by 11?
-- Cedric
andy Tue 22 Jul 2008
This has always bugged me too - and I think Brian's proposal is a good solution and feels very natural.
brian Tue 22 Jul 2008
With regex search and replace, you can add the " in automatically (replace ([^ \t]) with "\1 in TextPad). A Fan editor could add the " prefixes in automatically on paste.
Just to be clear on my position, if you are relying on a tool then you can just paste both a start and an end quote for each line. I'm not sure, but I think Tom and Cedric, you are proposing some weird middle ground where each line has a starting mark, but not an ending mark. That doesn't compute for me. To me the whole point of a multiline string is no internal special chars for each line.
One question about your first example: if the column where s starts is 10, does s still contain line1 indented by one character or by 11?
I guess that depends on what conventions we want to support:
s := "
line1
line2
...
Or...
s :=
"line1
line2
...
I can make the compiler figure it out, we just have to define the formal rules. We could also switch to use triple quotes if that made sense.
cbeust Tue 22 Jul 2008
The latter would give me what I'm looking for. In this case, the double quote defines where the string starts each line, regardless of the indentation in the Fan source:
s :=
  "<package>
    <class/>
   </package>"
would therefore make s equal to
"<package>\n <class/>\n</package>"
-- Cedric
brian Tue 22 Jul 2008
The latter would give me what I'm looking for. In this case, the double quote defines where the string starts each line, regardless of the indentation in the Fan source
Actually this rule seems to be quite simple and universally applicable even when you want leading whitespace. I like it:
s :=
" line1
line 2"
=> " line1\n line 2"
Although it can get confusing if you do this:
s :=
     "line1
  line 2"
s := "line1
  line 2"
In both of those cases we have text to the left of the starting quote. What's the rule for that?
cbeust Tue 22 Jul 2008
Indeed, these are a bit problematic. Two options:
Compiler error: "The string must start after the double quote" (seems a bit extreme)
Compiler warning: "The string is not indented, truncating it"
"Truncating" is not the right term here: it's just a nice way to let the user know that the indentation algorithm is being turned off, which means that line2 will be indented by the same amount as in the source file.
Also, we should clarify whether putting the opening double quote on the same line as := or on a line of its own makes any difference.
In short, we need to clarify the following four cases:
s := "abc
def"
s := "abc
def"
s :=
"abc
def"
s:=
"abc
def"
-- Cedric
tompalmer Tue 22 Jul 2008
If you go this route, I think any line with insufficient matching indentation should have no indentation stripped. So these are equivalent:
a := "
 A
      B
 C"
b := "\n A\nB\n C"
And I still think that all trailing whitespace followed by no closing " should be stripped by the compiler.
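One way to read this lenient variant: a line is stripped only if it has enough leading spaces to reach the column just after the opening quote, and any other line is kept exactly as written. The Python sketch below assumes spaces only (no tabs), and the name `strip_if_indented` is invented for the illustration.

```python
def strip_if_indented(lines, column):
    """Strip `column` leading spaces where present; leave other lines alone."""
    pad = " " * column
    return "\n".join(
        line[column:] if line.startswith(pad) else line
        for line in lines
    )


# The opening quote of `a := "` sits at column 5, so content starts at 6.
# The first entry is the (empty) remainder of the opening line.
lines = ["", "  A", "      B", "  C"]
print(repr(strip_if_indented(lines, 6)))
```

The downside argued elsewhere in the thread applies here: a line that is one space short silently keeps all of its indentation, which is easy to miss.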
tactics Tue 22 Jul 2008
I think the current system is just fine. If you have a moderately long literal, split it up onto separate lines concatenated with +. A good compiler will optimize it away, so the auto-concat syntax isn't necessary. I really don't like the proposals for automatic space and indentation removal. It solves one problem at the cost of creating another (what if my program wanted those spaces there? Or my line to begin with a pipe?) You can come up with solutions, but they always come out feeling hacky.
If your strings are cluttering up your code, you're better off putting them into an external file anyway. There, you don't need escaping at all, and your indentation is just fine.
alexlamsl Tue 22 Jul 2008
IMHO Perl-style (i.e. no magical indentation removal) is as far as I can grasp in terms of multi-line string literals. It does make the code a bit less readable, but it does have its benefits when it is needed.
Adding more rules than that will just add more surprises for new and experienced programmers from time to time.
tompalmer Wed 23 Jul 2008
I agree with tactics and alexlamsl that compiler auto dedent is too magical. It's hard to explain, so it will be hard to learn.
My own preference is to keep current Fan behavior unless the prepended " is accepted:
text :=
"I really think that this is clear and easy.
" It allows obvious indentation.
"And it has the same feel as comments.
text2 := StringUtil.dedentAndTrimLeading("
The above is better than this alternative in Fan today.
And it's better for serialization, too.
I just invented 'StringUtil' for this example, by the way.
")
text3 :=
"And either is much better than this alternative.\n" +
" Much less readable and writable than comments.\n" +
"If you know what I mean.\n"
But still, I agree that Fan today is better than overly complicated auto-dedent rules. Just my own opinion.
brian Wed 23 Jul 2008
I agree with tactics and alexlamsl that compiler auto dedent is too magical.
I'm all for simple, but in this case the current design does the wrong thing for me pretty much 100% of the time unless I align my string literals to the left. For example if generating HTML, I'm adding half a dozen useless spaces to the start of every line.
A method to fix it up is a good idea, but is an expensive fix.
My inclination is to just say the string indentation starts at the opening quote. Any lines which begin to the left of the quote are a compiler error. Yeah it is extreme, but it is a simple rule and you can't mess it up.
helium Wed 23 Jul 2008
I really like that.
tactics Wed 23 Jul 2008
My inclination is to just say the string indentation starts at the opening quote. Any lines which begin to the left of the quote are a compiler error. Yeah it is extreme, but it is a simple rule and you can't mess it up.
Not a bad solution, but it does risk the possibility of infringing on the users' constitutional Right to Tab. If I am using tabs in my source, and I have the code
class Foo
{
    Str myBar := "I'm a multiline
__________________string"
}
What do I replace the underscores with? 18 spaces? 4 tabs and two spaces (assuming tab=4 spaces)? Most Fan source is done with tab=2 spaces, so maybe it's 9 tabs.
Once the question of tab=x spaces? comes up, you risk people hating your guts =-P
tompalmer Wed 23 Jul 2008
You'd have to tab to the beginning of the previous line and then use spaces, as in:
class Foo
{
<tab>Str myBar := "I'm a multiline
<tab><spaces >string"
}
That's generally good form anyway, if you want things to line up.
And I'm a tabber, so I know where you are coming from.
brian Wed 23 Jul 2008
Well, personally I think tabs are evil, but I don't want to interfere with anyone else's religion. I think what Tom proposed is correct though - you can use a consistent number of tabs on each line, but after that you need to use spaces to line up with the quote. And if you aren't doing that, your code is semantically ambiguous anyway. Although in general you're probably safer writing multiline strings as follows (especially if you are a tabber):
class Foo
{
<tab>Str myBar :=
<tab>"I'm a multiline
<tab> string"
}
brian Sun 27 Jul 2008
I haven't gotten any more feedback on this proposal, so I'll consider it decided.
The proposed change is that multiline strings are aligned one character to the right of the opening quote. Any lines that start to the left of that column are considered compiler errors.
If the line with the opening quote has tabs, then subsequent lines must have exactly the same number of tabs, followed by space padding to align to the quote.
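To make the decided rule concrete, here is a small Python sketch of one possible reading of it: string content begins in the column just after the opening quote, continuation lines must repeat the opening line's leading tabs and then pad with spaces up to that column, and anything short of that is an error. This is not the Fan compiler's code; `parse_multiline` is a hypothetical name, escapes are ignored, and blank lines inside the literal are not handled.

```python
def parse_multiline(source_lines):
    """Return the string value of a literal spread over `source_lines`.

    source_lines[0] holds the opening quote; the last line ends with
    the closing quote.
    """
    first = source_lines[0]
    quote_col = first.index('"')
    # Leading tabs on the opening line must be repeated exactly.
    tabs = len(first) - len(first.lstrip("\t"))
    expected = "\t" * tabs + " " * (quote_col + 1 - tabs)

    parts = [first[quote_col + 1:]]
    for number, line in enumerate(source_lines[1:], start=2):
        if not line.startswith(expected):
            raise SyntaxError(
                f"line {number}: text starts left of the opening quote")
        parts.append(line[len(expected):])

    text = "\n".join(parts)
    if not text.endswith('"'):
        raise SyntaxError("unterminated string literal")
    return text[:-1]


xml = parse_multiline([
    's := "<package>',
    " " * 6 + "  <class/>",
    " " * 6 + '</package>"',
])
tabbed = parse_multiline([
    '\tStr myBar := "I\'m a multiline',
    "\t" + " " * 14 + 'string"',
])
print(repr(xml))
print(repr(tabbed))
```

Because tabs are matched character for character rather than expanded, the result does not depend on anyone's tab width, which is the point of the tabs-then-spaces convention discussed above.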
andy Sun 27 Jul 2008
I agree with all that. This will be a welcome change.
brian Fri 22 Aug 2008
This feature is implemented for next build.
Getting the code base fixed was hell because the compiler error tests all had to have their column numbers changed.
alexlamsl Fri 22 Aug 2008
Nice work - now we just have to get our IDE to manage the rest ;-)