Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars
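I did this recently in Java:

    import java.text.Normalizer;
    import java.util.regex.Pattern;

    // Combining marks, plus modifier letters (Lm) and modifier symbols (Sk)
    public static final Pattern DIACRITICS_AND_FRIENDS =
        Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    private static String stripDiacritics(String str) {
        // NFD splits each accented character into a base letter plus combining marks
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Remove the marks and keep the base letters
        str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
        return str;
    }

This does what you specified: stripDiacritics("Björn") returns "Bjorn". It fails on, for example, Białystok, because ł is not a letter with a diacritic: it has no canonical decomposition, so NFD leaves it unchanged. If …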
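To make the limitation concrete, here is a minimal, self-contained sketch. The class name DiacriticsDemo and the method stripWithFallback are hypothetical names introduced only for this illustration, and the explicit ł/Ł replacement is an assumption about one possible workaround, not necessarily what the original answer goes on to suggest.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class DiacriticsDemo {

    // Same character class as in the answer: combining marks, modifier letters, modifier symbols
    static final Pattern DIACRITICS_AND_FRIENDS =
        Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    static String stripDiacritics(String str) {
        // Decompose, then drop everything the pattern matches
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        return DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
    }

    // Hypothetical helper (an assumption, not from the answer): letters such as
    // U+0142 (l with stroke) have no canonical decomposition, so they are mapped by hand.
    static String stripWithFallback(String str) {
        return stripDiacritics(str)
            .replace('\u0142', 'l')   // U+0142 LATIN SMALL LETTER L WITH STROKE
            .replace('\u0141', 'L');  // U+0141 LATIN CAPITAL LETTER L WITH STROKE
    }

    public static void main(String[] args) {
        String bjorn = "Bj\u00f6rn";          // "Bjorn" with o-diaeresis (U+00F6)
        String bialystok = "Bia\u0142ystok";  // "Bialystok" with l-with-stroke (U+0142)

        System.out.println(stripDiacritics(bjorn));         // o-diaeresis becomes o
        System.out.println(stripDiacritics(bialystok));     // l-with-stroke survives
        System.out.println(stripWithFallback(bialystok));   // l-with-stroke becomes l
    }
}
```

A hand-written mapping like this only covers the characters you list. Other letters without a canonical decomposition (for example ø or đ) would each need their own entry.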