Try the following regular expression:
^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$
In PHP this translates to:
if (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0)
{
// valid
}
You should read it like this:
^ # start of subject
(?: # match this:
[ # match a:
\p{L} # Unicode letter, or
\p{Mn} # Unicode accents, or
\p{Pd} # Unicode hyphens, or
\' # single quote, or
\x{2019} # single quote (alternative)
]+ # one or more times
\s # any kind of space
[ #match a:
\p{L} # Unicode letter, or
\p{Mn} # Unicode accents, or
\p{Pd} # Unicode hyphens, or
\' # single quote, or
\x{2019} # single quote (alternative)
]+ # one or more times
\s? # any kind of space (0 or more times)
)+ # one or more times
$ # end of subject
I honestly don’t know how to port this to Javascript, I’m not even sure Javascript supports Unicode properties but in PHP PCRE this seems to work flawlessly @ IDEOne.com:
$names = array
(
'Alix',
'André Svenson',
'H4nn3 Andersen',
'Hans',
'John Elkjærd',
'Kristoffer la Cour',
'Marco d\'Almeida',
'Martin Henriksen!',
);
foreach ($names as $name)
{
echo sprintf('%s is %s' . "\n", $name, (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0) ? 'valid' : 'invalid');
}
I’m sorry I can’t help you regarding the Javascript part but probably someone here will.
Validates:
- John Elkjærd
- André Svenson
- Marco d’Almeida
- Kristoffer la Cour
Invalidates:
- Hans
- H4nn3 Andersen
- Martin Henriksen!
To replace invalid characters, though I’m not sure why you need this, you just need to change it slightly:
$name = preg_replace('~[^\p{L}\p{Mn}\p{Pd}\'\x{2019}\s]~u', '$1', $name);
Examples:
- H4nn3 Andersen -> Hnn Andersen
- Martin Henriksen! -> Martin Henriksen
Note that you always need to use the u modifier.