Character class subtraction, converting from Java syntax to RegexBuddy

Like most regex flavors, java.util.regex.Pattern has some features whose syntax is not fully compatible with other flavors; these include character class union, intersection and subtraction:

  • [a-d[m-p]] : a through d, or m through p: [a-dm-p] (union)
  • [a-z&&[def]] : d, e, or f (intersection)
  • [a-z&&[^bc]] : a through z, except for b and c: [ad-z] (subtraction)
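
The three forms in the list above can be checked with a small sketch. The class name CharClassOps and the helper matchesClass are made up for this illustration; only Pattern.matches comes from java.util.regex.

```java
import java.util.regex.Pattern;

public class CharClassOps {
    // Hypothetical helper: true if the entire input is matched by the given class.
    static boolean matchesClass(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p
        System.out.println(matchesClass("[a-d[m-p]]", "n"));   // true
        System.out.println(matchesClass("[a-d[m-p]]", "e"));   // false

        // Intersection: only d, e, or f
        System.out.println(matchesClass("[a-z&&[def]]", "e")); // true
        System.out.println(matchesClass("[a-z&&[def]]", "a")); // false

        // Subtraction: a through z, except b and c
        System.out.println(matchesClass("[a-z&&[^bc]]", "d")); // true
        System.out.println(matchesClass("[a-z&&[^bc]]", "b")); // false
    }
}
```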

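The most important caveat of Java regex is that the matches method (String.matches, Pattern.matches and Matcher.matches) attempts to match the pattern against the whole string, not just part of it. Many other engines search for a match anywhere in the input by default, so this can be a source of confusion. To look for a match inside a longer string, use Matcher.find instead.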
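
A minimal sketch of the whole-string behavior, contrasted with Matcher.find. The class and method names (MatchesVsFind, wholeMatch, partialMatch) are invented for this example.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // matches(): the pattern must cover the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): the pattern may match any substring of the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("[aeiou]", "hello"));   // false: "hello" is not a single vowel
        System.out.println(partialMatch("[aeiou]", "hello")); // true: "e" is found inside "hello"
        System.out.println("e".matches("[aeiou]"));           // true: the whole string is one vowel
    }
}
```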

See also

  • regular-expressions.info/Flavor Comparison and Java Flavor Notes

On character class subtraction

Subtraction allows you to define for example “all consonants” in Java as [a-z&&[^aeiou]].
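
As a rough illustration of the consonant class in use (the class Consonants and its two helpers are hypothetical names; note that this pattern treats y as a consonant and ignores uppercase letters):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Consonants {
    // "All consonants": lowercase letters minus the five vowels.
    private static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    // Counts how many characters of the input are lowercase consonants.
    static int countConsonants(String input) {
        Matcher m = CONSONANT.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Removes every lowercase consonant, leaving vowels and other characters.
    static String stripConsonants(String input) {
        return CONSONANT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(countConsonants("hello world")); // 7
        System.out.println(stripConsonants("hello world")); // eo o
    }
}
```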

This syntax is specific to Java. In XML Schema, .NET, JGSoft and RegexBuddy, it’s [a-z-[aeiou]]. Other flavors may not support this feature at all.
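
For the simplest case, the conversion between the two notations is mechanical and can be sketched in a few lines. This is an assumption-laden illustration, not a general converter: the class SubtractionConverter and the method toDashBracketSyntax are made-up names, and the pattern only recognizes the flat form [BASE&&[^EXCLUDED]] with no nested classes or escaped brackets. Anything else is left unchanged.

```java
import java.util.regex.Pattern;

public class SubtractionConverter {
    // Recognizes only the flat form [BASE&&[^EXCLUDED]]:
    //   group 1 = BASE (no '&' or brackets), group 2 = EXCLUDED (no brackets).
    private static final Pattern JAVA_SUBTRACTION =
            Pattern.compile("\\[([^&\\[\\]]+)&&\\[\\^([^\\[\\]]+)\\]\\]");

    // Rewrites each recognized Java-style subtraction as [BASE-[EXCLUDED]].
    static String toDashBracketSyntax(String javaRegex) {
        return JAVA_SUBTRACTION.matcher(javaRegex).replaceAll("[$1-[$2]]");
    }

    public static void main(String[] args) {
        System.out.println(toDashBracketSyntax("[a-z&&[^aeiou]]")); // [a-z-[aeiou]]
        System.out.println(toDashBracketSyntax("[a-z&&[def]]"));    // unchanged: not a subtraction
    }
}
```

Patterns that mix subtraction with unions, nested classes or escaped brackets would need a real parser rather than a single substitution.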

References

  • regular-expressions.info/Character Classes in XML Regular Expressions
  • MSDN – Regular Expression Character Classes – Subtraction

Related questions

  • What is the point behind character class intersections in Java’s Regex?
