#429 Triple Quoted Strings and DSL Str/Regex

freddy33 Tue 13 Jan 2009

Going over the code and reading the last discussion on multi-line strings, something is bothering me. I'm using Groovy, and I saw some Python in my youth :), and the FAN string literals seemed to me like they took the best of all worlds, until I saw this piece of code in FanToHtmlTest.fan:

Void testComplex()
{
  verifyHtml(
   "class Foo
    {
      // does this work?
      Int str := \"cool & \\\"foo\\\" > 'rock' < weee!\"
      Int x := 5  // andy rules!
    }
    ",

   "<span class='k'>class</span> Foo
    <span class='b'>{</span>
      <span class='y'>// does this work?</span>
      Int str := <span class='s'>\"cool &amp; \\\"foo\\\" &gt; 'rock' &lt; weee!\"</span>
      Int x := 5  <span class='y'>// andy rules!</span>
    <span class='b'>}</span>
    ")
}

The issue I have is the need to escape " in multi-line strings. Maybe I got used to Groovy's triple-double-quote, but the line:

Int str := \"cool & \\\"foo\\\" > 'rock' < weee!\"

gets very hard to read. The r"" can also be multi-line and so avoids the \\, but here it does not help for ".

More than 95% of escaped characters in a string are \r\n\"\t, and with """ (with the good indentation you decided on) none of them needs to be escaped. You are left with \\ and \$, which frankly are used less than 5% of the time in complex multi-line strings. In my experience multi-line strings are very useful for pasting a bunch of code (SQL, DSL, ...), so the need to escape " seems annoying to me. Since I know you like Groovy, is the reason for not having """ to avoid too many notations when only one does the job?

Am I the only one feeling confused?

brian Tue 13 Jan 2009

I guess besides the test code itself, I haven't run into it because XML, HTML, and JavaScript let you use the single quote for string literals/attributes. But I wouldn't be opposed to adding triple quotes. However if we add triple quotes I'd be inclined to get rid of the r" literals to keep things simple.

brian Thu 15 Jan 2009

Here is my proposal:

  1. add support for triple quoted strings
  2. remove support for r" strings

The triple quoted strings would work just like normal strings with the single exception that the quote char would not need to be escaped. All other escape sequences such as \n \t \uXXXX \$ \\ would apply. Both single and triple quotes would continue to use existing multi-line and interpolation rules.
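As a rough analogue only (this is Python, not Fan, and says nothing about how Fan would implement it), Python's triple quoted strings follow much the same rule as the proposal: the quote char needs no escaping, while escapes such as \t are still processed. The variable names are purely illustrative.

```python
# Python analogue of the proposal, not Fan code.
# The same text written twice: once with escaped quotes, once triple quoted.
escaped = "Int str := \"cool & 'rock'\"\tdone"
triple = """Int str := "cool & 'rock'"\tdone"""

# The quote char needs no escaping inside triple quotes...
same_text = escaped == triple

# ...but other escape sequences are still interpreted (\t is one tab char,
# not a backslash followed by the letter t).
has_real_tab = "\t" in triple and "\\t" not in triple

print(same_text, has_real_tab)
```

The second literal is the readable one, which is the whole point of the request.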

andy Thu 15 Jan 2009

My only reservation for removing r" strings is Regex - that's where I found them most useful.

freddy33 Thu 15 Jan 2009

Thanks for the great feedback :) I'll be very happy to see """ in FAN !

On a side note, I agree with Andy about Regex, but:

  • I personally never did a multi line Regex
  • In a """ string only $ and \ need to be escaped for Regex

Escaping \ may actually be the most annoying part.

jodastephen Sat 17 Jan 2009

Fan definitely needs to retain a string literal that doesn't have escaping of regex. Having to escape anything in a regex is very annoying, and gets negative points in language comparisons.

brian Sat 17 Jan 2009

I agree on the regex issue - although the r" literal wasn't all that great either because you often match quotes which forced you back to normal string literals. We need to solve regex somehow either with a literal representation or maybe by the new discussion on syntax plugins.

JohnDG Sat 17 Jan 2009

A compiler plug-in will solve the issue nicely.

emailPattern := Regexp <:\w+@\.\w{2,4}:>

This does compile-time validation of the regular expression pattern (if there's an error you'll see exactly where when you compile the code) and precompiles the pattern using the Fan regular expression library (but only if possible; maybe you can reference Fan variables inside such a block, using syntax like, ${foo}, and in such cases, validation can be done, but perhaps not precompilation).

With the above plug-in, it's only necessary to escape the symbol :> because this is used as the closing symbol (it could be escaped, for example, as [:]>). Unless you're detecting smileys, however, I don't think you'd ever see that particular combination of symbols used in a regular expression.
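To make the validation idea concrete, here is a minimal sketch in Python, not Fan. It checks the pattern eagerly and reports where the error is, which is the kind of check a plug-in could run during compilation. It runs at runtime, so it only approximates the compile-time behaviour described above, and the helper name check_pattern is hypothetical.

```python
import re
from typing import Optional, Tuple


def check_pattern(source: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return None if `source` compiles, else (message, position)."""
    try:
        re.compile(source)
    except re.error as err:
        # err.pos is the index in the pattern where compilation failed
        # (it may be None for some errors).
        return (err.msg, err.pos)
    return None


# The pattern from the post above compiles cleanly.
ok = check_pattern(r"\w+@\.\w{2,4}")

# A deliberately broken variant with an unbalanced parenthesis.
bad = check_pattern(r"\w+@(\.\w{2,4}")

print(ok)
print(bad)
```

A real plug-in would presumably map the reported position back to a source line and column; that part is left out here.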

JohnDG Sat 17 Jan 2009

Actually, seeing this regexp example makes me like <| |> more, because it's less visual noise.

emailPattern := Regexp <|\w+@\.\w{2,4}|>

The vertical bar is a stronger delimiter than the colon.

brian Sun 18 Jan 2009

After some more thought, I'm not sure Regex belongs in the plugin discussion, especially as a core sys API.

I'm a bit in the Python camp where Regex doesn't deserve its own regex literal. But then we are back to something like the r" string literals which seem like a bit of a compromise since they exist primarily for Regex, but don't let you use the quote char.

So I think we either:

  1. keep r" strings and are left with ", """, and r" string literals
  2. add a new Regex literal and only keep " and """ strings

If we add Regex literal, what is the right syntax? While I like the Ruby syntax of /.../, I find it impractical since you have to escape the / char. I was thinking of using:

re := '|....|'
re := '|\w+@\.\w{2,4}|'

freddy33 Sun 18 Jan 2009

+1 Looks really good. So quoted literals are:

String literals multi lines with ${} = ".." """..."""
Regex no multi lines String literals = '|...|'
Int literals = '.'
Uri literals = `...`

I'm for it.

jodastephen Sun 18 Jan 2009

I can't say I'm excited about Fan inventing a new symbol for string literals. I'd much prefer to reuse one of the symbols from Groovy, Scala, Python etc.

I'd say that r"" and "" are probably all that are really needed. You could choose to allow r with single quotes, however, which allows the double quote at the expense of the single quote (this appears to be what Python does).
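For reference, a small Python sketch of that trade-off. The sample strings and variable names are made up for illustration; only the standard re module is used.

```python
import re

# Raw string in single quotes: the double quote needs no escaping,
# and neither does the backslash.
quoted_word = re.compile(r'"(\w+)"')
found = quoted_word.findall('say "foo" and "bar"')

# The trade-off: a single quote cannot appear unescaped in r'...',
# so this pattern switches to double-quote delimiters instead.
apostrophe_word = re.compile(r"(\w+)'s")
owner = apostrophe_word.findall("andy's test")

print(found, owner)
```

A pattern that needs both quote chars at once is where this scheme starts to hurt.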

JohnDG Sun 18 Jan 2009

After some more thought, I'm not sure Regex belongs in the plugin discussion, especially as a core sys API.

Consider that with a plug-in, you get compile time pattern checking and compilation. There's no way r" can compete with that. A literal could compete, but in my opinion it's better to keep the compiler lightweight. There's no significant difference in typing or comprehension between:

re := '|\w+@\.\w{2,4}|'

and,

re := Regexp <|\w+@\.\w{2,4}|>

And both have the same safety and performance profile (much better than r" with simple strings).

On an unrelated note, I'm curious if the Regexp class yields identical results across JVM and CLR?

andy Sun 18 Jan 2009

On an unrelated note, I'm curious if the Regexp class yields identical results across JVM and CLR?

Since we currently just wrap the native Java/.NET Regex implementations, the results will not be identical. If I remember correctly, there are a few subtle differences in the grammar, though I don't recall the exact details.

brian Mon 19 Jan 2009

Well having 3 different string literals seems a bit crufty to me.

I think we are all in agreement that triple quotes should be added (I didn't hear any nays).

Regarding r" versus Regex literal versus Regex compiler plugin, let's leave that as an open issue until we get the compiler plugin prototyped. I agree with you John that might be an acceptable solution.

alexlamsl Sun 8 Feb 2009

Just came across this article, which is well worth a read:

http://www.codinghorror.com/blog/archives/001223.html

In Fan's case, the proposed DSL approach might even do some good here...

brian Fri 8 May 2009

Renamed from The different String literals to Triple Quoted Strings and DSL Str/Regex

brian Fri 8 May 2009

This is to track string literal changes associated with DSLs. Once we have DSLs, the plan is to:

  • Add triple quoted strings (same as current strings except " doesn't need to be escaped)
  • Remove r"" support for raw strings
  • Allow Str <|...|> literals to replace raw strings
  • Allow Regex <|...|> literals for Regex patterns

brian Fri 8 May 2009

Promoted to ticket #429 and assigned to brian

brian Fri 15 May 2009

Ticket resolved in 1.0.43

This work has been completed as part of the DSL feature:

Fan now supports triple quoted literals with the same semantics as normal string literals except you don't need to escape a normal quote:

echo("""Do you know "What lies beneath the shadow of the statue"?""")

Raw strings using the format r"..." are no longer supported. Instead use a Str DSL:

echo(Str <|no \ or $ escapes needed, and
           multi-line works too|>)

These work like an XML CDATA section or a here-document in Ruby or Python.

There is also a Regex DSL plugin for constructing a sys::Regex instance:

Regex <|foo|foo/(\d*)|>

alexlamsl Sat 16 May 2009

so triple quoted literals are basically Str <||> without multi-line support?

brian Sat 16 May 2009

so triple quoted literals are basically Str <||> without multi-line support?

No, every Fan string literal supports multi-line, including Str<||>. The difference is whether \ and $ are escape/interpolation chars: they are in normal strings, but not in Str<||>.
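The same split can be sketched in Python, as an analogue only (not Fan code). Both literals below span two lines and differ only in whether the backslash is an escape char. Python's plain strings have no $ interpolation, so only the backslash half of the distinction is shown.

```python
# Python analogue, not Fan: a normal and a raw triple quoted literal.
normal = """a\tb
second"""
raw = r"""a\tb
second"""

# Both literals are multi-line.
normal_lines = normal.split("\n")
raw_lines = raw.split("\n")

# In the normal literal \t became one tab char; in the raw literal
# the backslash and the letter t are kept as two separate chars.
print(len(normal_lines[0]), len(raw_lines[0]))
```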
