
While learning JavaScript regular expressions from JavaScript: The Definitive Guide, I was confused by this passage:

But /a+?/ matches one or more occurrences of the letter a, matching as few characters as necessary. When applied to the same string, this pattern matches only the first letter a.

Now let’s use the nongreedy version: /a+?b/. This should match the letter b preceded by the fewest number of a’s possible. When applied to the same string “aaab”, you might expect it to match only one a and the last letter b. In fact, however, this pattern matches the entire string, just like the greedy version of the pattern.
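
You can confirm the behaviour the passage describes in any JavaScript console. This is a minimal check, assuming a standard engine such as Node.js or a modern browser. The variable names are only for illustration:

```javascript
// Reproduce the behaviour described in the quoted passage.
const s = "aaab";

const lazyAlone = s.match(/a+?/)[0];   // lazy quantifier with nothing after it
const lazyThenB = s.match(/a+?b/)[0];  // lazy quantifier followed by a literal b
const greedyThenB = s.match(/a+b/)[0]; // greedy version, for comparison

console.log(lazyAlone);   // a
console.log(lazyThenB);   // aaab
console.log(greedyThenB); // aaab
```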

Why is this so?

This is the explanation from the book:

This is because regular-expression pattern matching is done by finding the first position in the string at which a match is possible. Since a match is possible starting at the first character of the string, shorter matches starting at subsequent characters are never even considered.
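
One way to see what the book means is to check each start position separately and ask what /a+?b/ would match if the attempt had to begin exactly there. The sketch below uses the sticky y flag, which makes exec() try only at lastIndex. matchAt is a hypothetical helper and is not part of any API. Shorter candidates do exist at later positions, but an ordinary search stops at position 0:

```javascript
// Hypothetical helper: what would `source` match if the attempt had to start exactly at `start`?
// The "y" (sticky) flag makes exec() try only at lastIndex instead of scanning forward.
function matchAt(source, str, start) {
  const re = new RegExp(source, "y");
  re.lastIndex = start;
  const m = re.exec(str);
  return m ? m[0] : null;
}

const str = "aaab";

// Candidate matches, one per start position.
const candidates = [];
for (let i = 0; i < str.length; i++) {
  candidates.push({ start: i, match: matchAt("a+?b", str, i) });
}
console.log(candidates);
// start 0: "aaab", start 1: "aab", start 2: "ab", start 3: null

// An ordinary search stops at the first start position where a match succeeds.
const first = /a+?b/.exec(str);
console.log(first.index, first[0]); // 0 aaab
```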

I don't understand. Can anyone give me a more detailed explanation?


Asked Jul 23, 2014 at 0:33 by kursk.ye; edited Jul 23, 2014 at 6:33 by zx81.
  • Read up on FSMs, en.wikipedia/wiki/Finite-state_machine – elclanrs Commented Jul 23, 2014 at 0:37
  • or they are both lazy. – bumbumpaw Commented Jul 23, 2014 at 0:38
  • "matching as few characters as necessary" - compare with a+(?!=a) – user2864740 Commented Jul 23, 2014 at 0:42
4 Answers

Okay, so you have your search space, "aaabc", and your pattern, /a+?b/.

Does /a+?b/ match "a"? No.

Does /a+?b/ match "aa"? No.

Does /a+?b/ match "aaa"? No.

Does /a+?b/ match "aaab"? Yes.

Since you're matching literal characters and not any sort of wildcard, the regular expression a+?b is effectively the same as a+b anyway. The only type of sequence either one will match is a string of one or more a characters followed by a single b character. The non-greedy modifier makes no difference here, as the only thing an a can possibly match is an a.
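
Here is a quick empirical check of that equivalence. It runs both patterns on a handful of inputs chosen for illustration, so it is not a proof. For each sample, both patterns report the same match position and text. describe is a hypothetical helper:

```javascript
// Hypothetical helper: summarize an exec() result as "index:text", or "no match".
function describe(m) {
  return m ? `${m.index}:${m[0]}` : "no match";
}

const samples = ["aaab", "ab", "b", "aaa", "xaab", "aabaab", "baab"];

const results = samples.map((input) => ({
  input,
  lazy: describe(/a+?b/.exec(input)),  // non-greedy version
  greedy: describe(/a+b/.exec(input)), // greedy version
}));

for (const r of results) {
  console.log(r.input, r.lazy, r.greedy, r.lazy === r.greedy ? "same" : "DIFFERENT");
}
```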

The non-greedy qualifier becomes interesting when it's applied to something that can take on lots of different values, like the . wildcard. (Edit: or in cases where there's interesting stuff to the left of something like a+?.)
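
For contrast, here is a sketch where laziness does change the result. The . wildcard can match the closing quote character as well as everything else, so the lazy and greedy versions stop in different places. The sample string is made up for illustration:

```javascript
const text = 'say "one" and "two"';

// Greedy: .* runs to the end of the string, then backtracks to the LAST quote.
const greedyQuote = text.match(/".*"/)[0];
// Lazy: .*? grows one character at a time and stops at the FIRST closing quote.
const lazyQuote = text.match(/".*?"/)[0];

console.log(greedyQuote); // "one" and "two"
console.log(lazyQuote);   // "one"
```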

Edit: if you're expecting a+?b to match just the last a before the b in aaab, well, that's not how it works. Searching for a pattern in a string implicitly means searching for the earliest occurrence of the pattern. Thus, although starting from the last a does give a substring that matches the pattern, it's not the first substring that matches.

The Engine Attempts a Match at the Beginning of the String

Can anyone give me a more detailed explanation?

Yes.

In short: .+? does not look for a shortest match globally, at the level of the entire string, but locally, from the position in the string where the engine is currently positioned.

How the Engine Works

When you try a regex against the string aaab, the engine first tries to find a match starting at the very first position in the string, which is the position before the first a. If the engine cannot find a match at the first position, it moves on and tries again starting from the second position (between the first and second a), and so on.
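
You can sketch this outer "try here, then move on" loop by hand, assuming the sticky y flag is available (ES2015 or later). searchLikeExec is a hypothetical helper. Real engines add optimizations, but for this pattern the observable result should agree with exec():

```javascript
// Hypothetical helper (not a built-in): emulate the outer loop of a regex search.
function searchLikeExec(source, str) {
  const re = new RegExp(source, "y"); // sticky: each exec() attempt is anchored at lastIndex
  for (let start = 0; start <= str.length; start++) {
    re.lastIndex = start; // move the starting position forward one step
    const m = re.exec(str);
    if (m) return { index: m.index, text: m[0] }; // first success wins; later starts are never tried
  }
  return null;
}

console.log(searchLikeExec("a+?b", "aaab")); // index 0, text "aaab"
console.log(searchLikeExec("a+?b", "xxab")); // index 2, text "ab"
console.log(searchLikeExec("a+?b", "aaa"));  // null
```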

So can a match be found by the regex a+?b at the first position? Yes.

  • a matches the first a
  • The +? quantifier tells the engine to match the fewest a characters necessary. Since we are looking to return a match, "necessary" means that the tokens that follow (here, just b) must still be able to match. In this case, the fewest a characters that allow the b to match is all of the remaining a characters.
  • b matches

In detail, the second point is a bit more complex (the engine tries to match b against the second a, fails, backtracks...), but you don't need to worry about that.
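
If you are curious, a hand-written matcher for just the pattern a+?b, anchored at a single start position, makes those lazy expand-and-retry steps visible. This is an illustrative sketch under that narrow assumption, and lazyAPlusBAt is a hypothetical name. It is not how a real engine is implemented internally:

```javascript
// Hypothetical hand-written matcher for the single pattern a+?b, anchored at `start`.
// Lazy expansion: take the minimum (one 'a'), try 'b'; if that fails, take one more 'a' and retry.
function lazyAPlusBAt(str, start) {
  const trace = [];
  let end = start; // index just past the a's consumed so far

  // a+? must consume at least one 'a'.
  if (str[end] !== "a") return { match: null, trace };
  end++;

  while (true) {
    trace.push(`try b at index ${end}`);
    if (str[end] === "b") return { match: str.slice(start, end + 1), trace };
    // b failed: backtrack into the lazy quantifier and let it take one more 'a', if there is one.
    if (str[end] !== "a") return { match: null, trace };
    trace.push(`expand a+? to include index ${end}`);
    end++;
  }
}

const r = lazyAPlusBAt("aaab", 0);
console.log(r.match); // aaab
console.log(r.trace.join("\n"));
```

Starting at position 0, the trace shows b being tried at indexes 1 and 2 (both fail) before it succeeds at index 3. That is why the "fewest a's necessary" ends up being all three.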

A ? after a+ means: match the minimum number of characters that satisfies the expression. /a+/ means one 'a', or as many as you can find before some other character. To satisfy /a+?/ (since it's non-greedy), a single 'a' is enough.

To satisfy /a+?b/, on the other hand, the 'b' at the end means the engine has to match one or more 'a' characters and then hit that 'b'. /a+/ doesn't have to hit a 'b' because the regex doesn't ask for one; /a+?b/ has to.

Just think about it: what other meaning could /a+?b/ have?

Hope this helps
