elyashiv Wed 26 Feb 2014
I tried out the Regex API, running the following code:
I expected the result to be like the result of the following command:
~ echo "aaaa" | sed s/a*/b/g
b
Any idea?
tcolar Wed 26 Feb 2014
That seems correct: "a*" means a then anything, so it will match "aa" twice.
I think you probably wanted Regex("a+") if you want to replace every run of a with b, or maybe Regex("a.*") if you want to replace a followed by anything.
In sed s/a*/b/g you used /g, which means greedy, so the * will behave like .*
SlimerDude Wed 26 Feb 2014
Yep, Tcolar is right:
Regex.fromStr("a*").matcher("aaaa").replaceAll("b") // --> bb
Regex.fromStr("a+").matcher("aaaa").replaceAll("b") // --> b
Or to force a match on the entire Str, you can use the start ^ and end $ anchors:
Regex.fromStr("^a*\$").matcher("aaaa").replaceAll("b") // --> b
elyashiv Wed 26 Feb 2014
tcolar: According to this site, the g doesn't mean greedy, but global. Matching aa is illogical: if the matching is greedy I would expect the match to be aaaa, and if it is not greedy I would expect the match to be a or epsilon (an empty string).
What I think happened is that the matcher matched aaaa and then matched the empty string at the end. This behavior is incorrect. A little testing proves me right:
SlimerDude Wed 26 Feb 2014
Underneath, Fantom is just using Java's Matcher.replaceAll() so, other than writing a new regex implementation, I don't think a lot can be done about it.
With regard to your example, you're right - Fantex shows the same results:
I also found this question on StackOverflow, posted 2 years ago - String.replaceAll() anomaly with greedy quantifiers in regex. The answer explains why the result is valid, and why it is different in sed. (In essence .* matches an empty string, which is replaced with b.)
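For anyone who wants to see the zero-length match directly, here is a minimal Java sketch (Java because, as mentioned above, Fantom hands the work to java.util.regex). The class name ZeroLengthMatchDemo and the helper matchSpans are made up for illustration; only Pattern, Matcher and String.replaceAll are real JDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Fantom or the JDK.
final class ZeroLengthMatchDemo {

    // Lists every match that find() reports, formatted as "[start,end) 'text'".
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add("[" + m.start() + "," + m.end() + ") '" + m.group() + "'");
        }
        return spans;
    }

    public static void main(String[] args) {
        // a* first consumes all four a's, then matches the empty string at the end.
        System.out.println(matchSpans("a*", "aaaa"));        // [[0,4) 'aaaa', [4,4) '']

        // replaceAll substitutes every reported match, hence two b's.
        System.out.println("aaaa".replaceAll("a*", "b"));    // bb

        // a+ cannot match the empty string, so there is only one match.
        System.out.println("aaaa".replaceAll("a+", "b"));    // b

        // With anchors, the empty match at index 4 fails because ^ needs index 0.
        System.out.println("aaaa".replaceAll("^a*$", "b"));  // b
    }
}
```

So the greedy match is aaaa after all; the second b comes from the empty match at index 4, not from a second aa.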
SlimerDude Wed 26 Feb 2014
A bit more reading suggests it's all about zero-width matches. This article tells us it's not consistent, even between browsers! - Watch Out for Zero-Length Matches
It really does seem to be a case of:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
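If the sed result is what you are after, one possible workaround is to drive the Matcher by hand and drop the unwanted empty match. This is only a sketch under an assumption: as far as I understand it, sed does not accept an empty match that starts exactly where the previous match ended, and the code below imitates just that one rule. SedStyleReplace and replaceAllSedStyle are hypothetical names, not Fantom or JDK API, and I have not checked it against sed beyond simple patterns like the ones in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a sketch of sed-like handling of empty matches.
final class SedStyleReplace {

    // Like Matcher.replaceAll, but skips an empty match that begins
    // exactly where the previously accepted match ended.
    static String replaceAllSedStyle(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        int prevEnd = -1; // end index of the last accepted match, -1 if none yet

        while (m.find()) {
            boolean empty = m.start() == m.end();
            if (empty && m.start() == prevEnd) {
                continue; // assumed sed rule: no empty match right after a match
            }
            // quoteReplacement keeps $ and \ in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            prevEnd = m.end();
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAllSedStyle("a*", "aaaa", "b"));  // b
        System.out.println(replaceAllSedStyle("a*", "baaac", "x")); // xbxcx
    }
}
```

For comparison, plain "baaac".replaceAll("a*", "x") in Java gives xbxxcx, because the empty match right after aaa is also replaced.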