Character Translation using Python (like the tr command)

See string.translate. In Python 2:

    import string
    "abc".translate(string.maketrans("abc", "def"))  # => "def"

Note the documentation's comments about subtleties in the translation of unicode strings.

In Python 3, maketrans is a static method of str, so no import is needed:

    "abc".translate(str.maketrans("abc", "def"))  # => "def"

Edit: Since tr is a bit more advanced (it also supports character ranges, deletion, and squeezing of repeated characters), also consider using re.sub.
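As a rough illustration of how the tr-style operations could be covered in Python 3, here is a minimal sketch. The helper names tr, tr_delete, and squeeze are hypothetical, chosen only for this example; they are built on str.translate, str.maketrans, and re.sub from the standard library, and squeeze assumes a non-empty character set.

```python
import re


def tr(text, src, dst):
    """Map each character in src to the character at the same position in dst."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    return text.translate(str.maketrans(src, dst))


def tr_delete(text, chars):
    """Delete every character listed in chars (similar to `tr -d`)."""
    # The third argument of str.maketrans lists characters to remove.
    return text.translate(str.maketrans("", "", chars))


def squeeze(text, chars):
    """Collapse runs of any character in chars into one (similar to `tr -s`)."""
    # re.escape keeps characters such as "-" or "]" literal inside the class.
    pattern = "([" + re.escape(chars) + r"])\1+"
    return re.sub(pattern, r"\1", text)
```

For example, tr("abc", "abc", "def") gives "def", and tr_delete("hello world", "lo") gives "he wrd". The translation table handles one-to-one mapping and deletion; squeezing needs re.sub because it depends on neighbouring characters.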

PHP Transliteration

You can use iconv, which has a special transliteration mode. Quoting the libiconv documentation for iconv_open (http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html):

"When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character."

See here …

Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars

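I have done this recently in Java:

    import java.text.Normalizer;
    import java.util.regex.Pattern;

    // Combining diacritical marks, modifier letters (Lm) and modifier symbols (Sk)
    public static final Pattern DIACRITICS_AND_FRIENDS =
        Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    private static String stripDiacritics(String str) {
        // NFD splits each accented character into a base letter plus combining marks
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
        return str;
    }

This will do as you specified:

    stripDiacritics("Björn") = Bjorn

but it will fail on, for example, Białystok, because ł is a distinct letter with no canonical decomposition rather than a base letter plus a diacritic, so NFD leaves it unchanged. If …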
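For comparison, the same normalize-then-filter idea can be sketched in Python with the standard unicodedata module. This is not a port of the Java pattern: instead of matching the InCombiningDiacriticalMarks block and the Lm and Sk categories, it drops every character with a non-zero canonical combining class, which is an assumption made for this example. The function name strip_diacritics is hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    # NFD turns "ö" into "o" followed by U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this sketch, strip_diacritics("Björn") returns "Bjorn", while strip_diacritics("Białystok") returns the input unchanged, since ł has no decomposition for NFD to split.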