#2302 How about... Regex.quote(Str str)

SlimerDude Sun 22 Jun 2014

Another nice addition might be Regex.quote(Str str); a static method on Regex that returns a Regex that matches the given literal string. Literal as in, no characters or escape sequences in the string have any special meaning - it would be matched as is.

It is similar to Pattern.quote() in Java and has two main uses.

  1. The first is to escape (or quote) a string that has a lot of dodgy characters that you're unsure about:
    regex := Regex.quote("<div>i := arg ?: -1</div>")
  2. The other is to construct larger regex patterns from Strs that you have no control over and that may (or may not) contain dodgy chars. Example:
    Bool containsCaseInsensitive(Str contains) {
        quoted := Regex.quote(contains)
        return "(?i)${quoted}".toRegex.matcher(this.text).find
    }
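
For comparison, here is a rough Java sketch of use case 2 built on the real Pattern.quote(). Note that in Java the method returns a String (the input wrapped in \Q...\E), not a compiled Pattern. The class name QuoteDemo and the two-argument helper are made up for illustration and are not part of any proposal in this thread.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper, loosely mirroring containsCaseInsensitive() above.
    static boolean containsCaseInsensitive(String text, String needle) {
        // Pattern.quote() returns a String in which needle has no special meaning
        String quoted = Pattern.quote(needle);
        // The quoted part can be embedded in a larger expression, here behind (?i)
        return Pattern.compile("(?i)" + quoted).matcher(text).find();
    }

    public static void main(String[] args) {
        String html = "<p><DIV>i := arg ?: -1</DIV></p>";
        // '?' and ':' in the needle are matched as is, ignoring case
        System.out.println(containsCaseInsensitive(html, "<div>i := arg ?: -1</div>"));
        // '.' is not a wildcard once quoted, so "a.c" is not found in "abc"
        System.out.println(containsCaseInsensitive("abc", "a.c"));
    }
}
```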

Taking it further, it may also be nice to add a convenience method to Str:

Str.toRegexQuoted() or
Str.toRegexLiteral()

For those who need it now, my current implementation is an altered form of Regex.glob() and looks like:

static Regex quote(Str str) {
    quoted := StrBuf()
    str.each |c| {
        // escape every char that isn't an ASCII letter or digit
        if (!c.isAlphaNum)
            quoted.addChar('\\')
        quoted.addChar(c)
    }
    return quoted.toStr.toRegex
}
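
The same escaping idea can be sketched in Java to see what the generated pattern looks like. This is only an illustration under stated assumptions: EscapeDemo and escapeNonAlphaNum are hypothetical names, and unlike the Fantom code above this variant leaves non-ASCII characters unescaped, since they carry no special meaning in a Java regex.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: put a backslash in front of every ASCII char
    // that is not a letter or digit; everything else is copied unchanged.
    static String escapeNonAlphaNum(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean asciiAlphaNum = (c >= 'a' && c <= 'z')
                                 || (c >= 'A' && c <= 'Z')
                                 || (c >= '0' && c <= '9');
            if (c < 128 && !asciiAlphaNum) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "i := arg ?: -1";
        String escaped = escapeNonAlphaNum(input);
        System.out.println(escaped);
        // The escaped pattern should match the original text in full
        System.out.println(Pattern.compile(escaped).matcher(input).matches());
    }
}
```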

brian Sun 22 Jun 2014

By convention we have a toCode method. A good example is the sys::Str.toCode method, which provides different options for quoting/escaping strings. I think we'd want to do the same thing for Regex (I agree it's currently a gap). I am not sure I fully understand your different use cases. Could you propose a single Regex.toCode method with options that might solve the different problems?

SlimerDude Tue 24 Jun 2014

I don't think toCode() is a good fit, because the idea of toCode() is that it represents the same object. The idea of quoted() (which may also be a bad name) is that it creates a different Regex.

A better, and more explicit, example would be trying to match the char sequence .* in a string.

// won't work, 'cos it matches everything!
regex := Regex.fromStr(".*")

// will work
regex := Regex.quoted(".*")  // --> Regex.fromStr("\\.\\*")

As for the use cases, I was just saying that you can't always escape the Str yourself, as sometimes the Str is passed to you and it could contain anything.

One cannot just use Str.index() because you usually want to embed the output of quoted() in a larger expression (see use case 2. for a lame example).

To be honest, for most usages, the result doesn't need to be a Regex but a Str. For it's the Str that gets used in a Regex expression:

pattern := ".*".toRegexLiteral   // --> "\\.\\*"
regex   := "find ${pattern} me".toRegex
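
The difference between the unquoted and the quoted pattern can be checked in Java, where Pattern.quote() already hands back a String for embedding. This is a sketch only; EmbedDemo and findMe are names invented for the example.

```java
import java.util.regex.Pattern;

class EmbedDemo {
    // Hypothetical helper: build "find <literal> me" with the middle part matched as is.
    static Pattern findMe(String literal) {
        return Pattern.compile("find " + Pattern.quote(literal) + " me");
    }

    public static void main(String[] args) {
        Pattern unquoted = Pattern.compile("find .* me");
        Pattern quoted   = findMe(".*");

        // Unquoted: ".*" is a wildcard, so any text between the words matches
        System.out.println(unquoted.matcher("find anything me").matches());
        // Quoted: only the two literal characters ".*" are accepted
        System.out.println(quoted.matcher("find anything me").matches());
        System.out.println(quoted.matcher("find .* me").matches());
    }
}
```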

So going back to toCode(), the problem I see with it is that any Regex generated from the quoted version will be different to the original.

brian Tue 24 Jun 2014

Okay, I see now. I'm fine with Regex.quote, which performs just like Java's Pattern.quote.

SlimerDude Tue 24 Jun 2014

But now you mention it, Regex is lacking a toCode() method! ;)

And Regex.fromStr() is (still?) a plain static method, static Regex fromStr(), and not a ctor. Is that correct? I ask because I get a compiler warning with:

// warn: Using static method 'sys::Regex.fromStr' as constructor
reg := Regex("wotever")

Other sys objects like Date define their fromStr() as static new fromStr().

brian Tue 24 Jun 2014

"Other sys objects like Date define their fromStr() as static new fromStr()."

If you are working on a patch, then let's fix that. It should be a static new.

SlimerDude Wed 25 Jun 2014

Cool, I'll add it to the patch.
