This pattern works:
$pattern = '~ \( (?: [^()]+ | (?R) )*+ \) ~x';
The content between the parentheses is simply described as:
“anything that is not a parenthesis OR a recursion (= another pair of parentheses)”, repeated 0 or more times
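As a quick check of what this pattern returns, here is a minimal sketch. The sample string in $subject is my own assumption, not part of the question:

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$subject = 'a (b (c) d) e (f)';

// The recursive pattern from above (x flag: whitespace in the pattern is ignored).
$pattern = '~ \( (?: [^()]+ | (?R) )*+ \) ~x';

preg_match_all($pattern, $subject, $matches);

// Matches cannot overlap, so only the outermost groups are reported:
// "(b (c) d)" and "(f)". The nested "(c)" is not returned on its own.
print_r($matches[0]);
```

The nested "(c)" is consumed as part of the first match, which is why it never shows up as a separate result.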
If you want to catch every parenthesized substring, including the nested ones, you must put this pattern inside a lookahead to obtain all overlapping results:
$pattern = '~(?= ( \( (?: [^()]+ | (?1) )*+ \) ) )~x';
preg_match_all($pattern, $subject, $matches);
print_r($matches[1]);
Note that I have added a capturing group and replaced (?R) with (?1):
(?R) -> refers to the whole pattern (You can write (?0) too)
(?1) -> refers to the first capturing group
What is this lookahead trick?
A subpattern inside a lookahead (or a lookbehind) doesn’t consume any characters; it’s only an assertion (a test). Thus, it allows the same substring to be checked several times.
If you display the whole pattern results (print_r($matches[0]);), you will see that all results are empty strings. The only way to obtain the substrings found by the subpattern inside the lookahead is to enclose the subpattern in a capturing group.
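To make this concrete, here is a small sketch that prints both arrays. The sample string is again an assumption of mine:

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$subject = 'a (b (c) d) e (f)';

// Lookahead version: the assertion consumes nothing, group 1 holds the text.
$pattern = '~(?= ( \( (?: [^()]+ | (?1) )*+ \) ) )~x';

$count = preg_match_all($pattern, $subject, $matches);

// Whole-pattern results: one empty string per successful assertion.
var_dump($matches[0]);

// Group 1: the substrings themselves, nested ones included:
// "(b (c) d)", "(c)" and "(f)".
print_r($matches[1]);
```

Each match has zero length, so the regex engine moves on one character at a time and the lookahead gets a chance to succeed at every opening parenthesis.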
Note: the recursive subpattern can be improved by unrolling the alternation (use (?1) instead of (?R) in the lookahead version):
\( [^()]*+ (?: (?R) [^()]* )*+ \)
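Here is a sketch of how that variant could be dropped into the lookahead version. It is my own adaptation: (?R) becomes (?1) because the recursion must target the capturing group, and the sample string is an assumption:

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$subject = 'a (b (c) d) e (f)';

// Original lookahead pattern, kept for comparison.
$original = '~(?= ( \( (?: [^()]+ | (?1) )*+ \) ) )~x';

// Unrolled variant: plain characters first, then (nested group + plain characters)*.
$unrolled = '~(?= ( \( [^()]*+ (?: (?1) [^()]* )*+ \) ) )~x';

preg_match_all($original, $subject, $before);
preg_match_all($unrolled, $subject, $after);

// Both patterns should capture the same substrings.
var_dump($before[1] === $after[1]);
print_r($after[1]);
```

The idea is that the engine no longer has to try an alternation at every step; it runs through the non-parenthesis characters in one go and only recurses when it meets an opening parenthesis.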