Invert match with regexp [duplicate]

Okay, I have refined my regular expression based on the solution you came up with (which erroneously matches strings that start with ‘test’):

    ^((?!foo).)*$

This regular expression will match only strings that do not contain foo. The first lookahead will deny strings beginning with ‘foo’, and the second will make sure that foo isn’t found … Read more
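As a quick check of the ^((?!foo).)*$ pattern, here is a minimal Python sketch. Python's re module is used only for illustration; the helper name lacks_foo and the sample strings are assumptions, not part of the original answer.

```python
import re

# Each character is consumed only if "foo" does not start at that position,
# so the whole string can be matched only when "foo" never appears in it.
NO_FOO = re.compile(r"^((?!foo).)*$")


def lacks_foo(text: str) -> bool:
    """Return True when `text` (a single line) does not contain "foo"."""
    return NO_FOO.match(text) is not None


# Hypothetical sample inputs covering start, end, and absence of "foo".
samples = ["bar", "foobar", "barfoo", "fo", ""]
results = {s: lacks_foo(s) for s in samples}
```

Because . does not match a newline by default, this sketch is only meaningful for single-line input.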

Unicode Regex; Invalid XML characters

I know this isn’t exactly an answer to your question, but it’s helpful to have it here. Regular expression to match valid XML characters:

    [\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]

So to remove invalid chars from XML, you’d do something like:

    // filters control characters but allows only properly-formed surrogate sequences
    private static Regex _invalidXMLChars = new Regex(
        @"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\uFEFF\uFFFE\uFFFF]",
        RegexOptions.Compiled);

… Read more
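The same idea can be sketched in Python, under the assumption that only the XML 1.0 Char production matters. Python 3 strings are sequences of code points rather than UTF-16 units, so the surrogate-pair lookarounds are not needed. The function name strip_invalid_xml_chars is hypothetical, and this is not a line-for-line port: unlike the C# pattern, it keeps \x7F-\x9F and \uFEFF, which XML 1.0 permits.

```python
import re

# Negated class: anything outside the XML 1.0 Char production is invalid.
# Supplementary characters (U+10000..U+10FFFF) are listed explicitly because
# Python matches whole code points, not UTF-16 surrogate pairs.
_INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove every code point that XML 1.0 does not allow."""
    return _INVALID_XML_CHARS.sub("", text)


# Hypothetical input: NUL, vertical tab, and U+FFFE are removed.
cleaned = strip_invalid_xml_chars("a\x00b\x0bc\ufffed")
```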

RegEx BackReference to Match Different Values

Note that \g{N} is equivalent to \N, that is, a backreference that matches the same value, not the pattern, that the corresponding capturing group matched. This syntax is a bit more flexible though, since you can refer to capture groups relative to the current position by putting - before the number (i.e. \g{-2}, … Read more
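The value-versus-pattern distinction can be illustrated with a small Python sketch. Python's re module does not support the \g{N} or \g{-2} syntax inside patterns, so the plain \1 form is used here as a stand-in; the pattern names and the helper matches are assumptions made for the example.

```python
import re

# \1 must repeat the exact text captured by group 1.
same_value = re.compile(r"(\d)-\1")

# Writing the sub-pattern a second time allows a different value.
same_pattern = re.compile(r"(\d)-(\d)")


def matches(regex: re.Pattern, text: str) -> bool:
    """Return True when `regex` matches the whole of `text`."""
    return regex.fullmatch(text) is not None


value_ok = matches(same_value, "7-7")         # same digit twice
value_differs = matches(same_value, "7-8")    # backreference rejects a new digit
pattern_differs = matches(same_pattern, "7-8")
```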

What’s the technical reason for “lookbehind assertion MUST be fixed length” in regex?

Lookahead and lookbehind aren’t nearly as similar as their names imply. The lookahead expression works exactly the same as it would if it were a standalone regex, except it’s anchored at the current match position and it doesn’t consume what it matches. Lookbehind is a whole different story. Starting at the current match position, it … Read more
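One engine's behavior can be observed with a short Python sketch. Python's re module is among the engines that reject variable-width lookbehind at compile time, while accepting the equivalent lookahead; other engines differ, so this is an illustration rather than a general rule. The helper name compiles is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True when Python's re module accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


fixed_lookbehind = compiles(r"(?<=ab)c")     # width is always 2
variable_lookbehind = compiles(r"(?<=a+)c")  # width depends on the input
variable_lookahead = compiles(r"c(?=a+)")    # lookahead has no such limit
```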

Is it possible to define a pattern and reuse it to capture multiple groups?

To reuse a pattern, you could use (?n), where n is the number of the group to repeat. For example, your actual pattern:

    (PAT),(PAT), … ,(PAT)

can be replaced by:

    (PAT),(?1), … ,(?1)

(?1) is the same pattern as (PAT), whatever PAT is. You may have multiple patterns:

    (PAT1),(PAT2),(PAT1),(PAT2),(PAT1),(PAT2),(PAT1),(PAT2)

may be reduced to:

    (PAT1),(PAT2),(?1),(?2),(?1),(?2),(?1),(?2)

or: … Read more
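In engines without (?n), a similar effect can be approximated by building the regex from a shared string. The Python sketch below shows that substitute technique, not the (?1) syntax itself, which Python's re module does not support. PAT is given a hypothetical value of \d+ and the field count of three is an assumption.

```python
import re

PAT = r"\d+"  # hypothetical stand-in for the PAT placeholder

# Repeat the sub-pattern text instead of calling group 1 with (?1).
# The resulting pattern is (\d+),(\d+),(\d+)
three_fields = re.compile(",".join(f"({PAT})" for _ in range(3)))

match = three_fields.fullmatch("1,22,333")
fields = match.groups() if match else None
```

Unlike (?1), every repetition here is its own capture group, so each field is available separately.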

PHP Regex: How to match \r and \n without using [\r\n]?

PCRE and newlines: PCRE has a superfluity of newline-related escape sequences and alternatives. Well, a nifty escape sequence that you can use here is \R. By default \R will match Unicode newline sequences, but it can be configured using different alternatives. To match any Unicode newline sequence that is in the ASCII range: preg_match('~\R~', … Read more
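For engines that lack \R, the default Unicode behavior can be approximated with an explicit alternation. The Python sketch below is such an approximation (Python's re module has no \R); the names NEWLINE and split_lines are assumptions, and the alternation is not wrapped in an atomic group as PCRE's \R is.

```python
import re

# CRLF is tried first so that "\r\n" counts as one newline, followed by
# LF, VT, FF, CR, NEL (U+0085), LS (U+2028), and PS (U+2029).
NEWLINE = re.compile(r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")


def split_lines(text: str) -> list[str]:
    """Split `text` on any of the newline sequences listed above."""
    return NEWLINE.split(text)


# Hypothetical input mixing four different newline conventions.
parts = split_lines("a\r\nb\nc\u2028d\x85e")
```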

Match a^n b^n c^n (e.g. “aaabbbccc”) using regular expressions (PCRE)

Inspired by NullUserException’s answer (which he already deleted as it failed for one case) I think I have found a solution myself:

    $regex = '~^ (?=(a(?-1)?b)c) a+(b(?-1)?c) $~x';
    var_dump(preg_match($regex, 'aabbcc'));    // 1
    var_dump(preg_match($regex, 'aaabbbccc')); // 1
    var_dump(preg_match($regex, 'aaabbbcc'));  // 0
    var_dump(preg_match($regex, 'aaaccc'));    // 0
    var_dump(preg_match($regex, 'aabcc'));     // 0
    var_dump(preg_match($regex, 'abbcc'));     // 0

Try it yourself: … Read more