MySQL C# Text Encoding Problems

There are two things you need to do to support UTF-8 in the ADO.NET Entity Framework (or, more generally, when using the MySQL .NET Connector):

1. Ensure that the collation of your database or table is a UTF-8 collation (e.g. utf8_general_ci or one of its relations).
2. Add Charset=utf8; to your connection string:

    "Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;"

I’m not … Read more

latin-1 to ascii

So here are three approaches, more or less as given or suggested in other answers:

    # -*- coding: utf-8 -*-
    import codecs
    import unicodedata

    x = u"Wikipédia, le projet d’encyclopédie"

    xtd = {ord(u'’'): u"'", ord(u'é'): u'e', }

    def asciify(error):
        return xtd[ord(error.object[error.start])], error.end

    codecs.register_error('asciify', asciify)

    def ae():
        return x.encode('ascii', 'asciify')

    def ud():
        return unicodedata.normalize('NFKD', x).encode('ASCII', 'ignore')

    … Read more
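For readers on Python 3, the two approaches visible in the excerpt can be sketched as follows. This is a hedged adaptation rather than the original answer's code: the names `REPLACEMENTS`, `with_error_handler`, and `with_normalize` are made up for illustration, and the fallback to `?` for characters missing from the table is an assumption added here.

```python
import codecs
import unicodedata

# "Wikipédia, le projet d’encyclopédie", written with escapes so the
# exact code points (U+00E9 and U+2019) are unambiguous.
TEXT = "Wikip\u00e9dia, le projet d\u2019encyclop\u00e9die"

# Hypothetical replacement table: code point -> ASCII stand-in.
REPLACEMENTS = {ord("\u2019"): "'", ord("\u00e9"): "e"}


def asciify(error):
    """Encoding error handler: swap each offending character for its table entry."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    bad = error.object[error.start:error.end]
    # Assumption: characters missing from the table become '?'.
    replacement = "".join(REPLACEMENTS.get(ord(ch), "?") for ch in bad)
    return replacement, error.end


codecs.register_error("asciify", asciify)


def with_error_handler(text):
    # Uses the custom handler registered above.
    return text.encode("ascii", "asciify")


def with_normalize(text):
    # NFKD splits U+00E9 into 'e' plus a combining accent; 'ignore' then drops
    # the accent and any character with no ASCII decomposition (such as U+2019).
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore")


print(with_error_handler(TEXT))
print(with_normalize(TEXT))
```

The difference between the two shows up at the apostrophe: the error handler maps it to a plain `'`, while the normalize-and-ignore route silently drops it.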

Convert a unicode String In C++ To Upper Case

If your system locale already uses UTF-8, you can use std::use_facet and write:

    #include <iostream>
    #include <locale>   // std::locale, std::use_facet, std::ctype
    #include <string>

    int main() {
        std::locale::global(std::locale("")); // (*)
        std::wcout.imbue(std::locale());
        auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
        std::wstring str = L"Zoë Saldaña played in La maldición del padre Cardona.";
        f.toupper(&str[0], &str[0] + str.size());
        std::wcout << str << std::endl;
        return 0;
    }

And you get … Read more

Writing unicode strings via sys.stdout in Python

export PYTHONIOENCODING=utf-8 will do the job, but you can’t set it from within Python itself … What we can do is check whether it is set and, if it isn’t, tell the user to set it before calling the script:

    if __name__ == '__main__':
        if (sys.stdout.encoding is None):
            print >> sys.stderr, "please set python env PYTHONIOENCODING=UTF-8, example: export PYTHONIOENCODING=UTF-8, when … Read more
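The excerpt uses Python 2 syntax (`print >> sys.stderr`). A rough Python 3 sketch of the same check might look like the following; `stdout_can_encode` and `main` are hypothetical names, and testing whether the text is actually encodable (rather than only whether the encoding is `None`) is an assumption layered on top of the original idea.

```python
import sys


def stdout_can_encode(text, stream=None):
    """Return True if `stream` (default: sys.stdout) can encode `text`."""
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None)
    if encoding is None:
        # No encoding information at all, as in the Python 2 case above.
        return False
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def main():
    message = "Zo\u00eb"
    if not stdout_can_encode(message):
        # The hint itself is pure ASCII, so it is safe to print anywhere.
        print("please set PYTHONIOENCODING=UTF-8, for example: "
              "export PYTHONIOENCODING=UTF-8", file=sys.stderr)
        return 1
    print(message)
    return 0


if __name__ == "__main__":
    main()
```

On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` is another option when changing the environment is not practical.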

requests.get returns 403 while the same url works in browser

That’s because the default User-Agent of requests is python-requests/2.13.0, and in your case the website doesn’t like traffic from “non-browsers”, so it tries to block such traffic.

    >>> import requests
    >>> session = requests.Session()
    >>> session.headers
    {'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0'}

All you need to do is to make the request … Read more
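The excerpt is cut off here, but overriding the User-Agent header is the usual response to this kind of block. The sketch below illustrates the idea with the standard library's `urllib.request` instead of requests, purely so that it runs without third-party packages and without touching the network. The `BROWSER_UA` string and the `build_request` helper are hypothetical. With requests itself, the equivalent would be passing `headers={"User-Agent": ...}` to `requests.get`.

```python
import urllib.request

# Hypothetical browser-like value; any realistic browser string could be used.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build (but do not send) a GET request carrying a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("https://example.com/")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Whether a given site accepts the request still depends on its own rules, so a changed header is not guaranteed to lift a 403.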

Does \w match all alphanumeric characters defined in the Unicode standard?

perldoc perlunicode says: “Character classes in regular expressions match characters instead of bytes and match against the character properties specified in the Unicode properties database. \w can be used to match a Japanese ideograph, for instance.” So it looks like the answer to your question is “yes”. However, you might want to use the \p{} … Read more
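The quoted passage is about Perl, but the same behaviour is easy to observe in Python's `re` module, which is used here only as an analogue. For `str` patterns, `\w` matches Unicode word characters by default, and the `re.ASCII` flag restricts it to `[a-zA-Z0-9_]`. The sample string is made up for illustration.

```python
import re

# Japanese ideographs, an accented Latin word, an identifier, and digits.
SAMPLE = "日本語 caf\u00e9 foo_bar 42"

# Default for str patterns: \w follows Unicode character properties.
unicode_words = re.findall(r"\w+", SAMPLE)

# With re.ASCII, \w only matches ASCII letters, digits, and underscore.
ascii_words = re.findall(r"\w+", SAMPLE, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```

Note that Python's built-in `re` does not support `\p{...}` property classes, so this sketch only covers the `\w` half of the answer.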

C++ & Boost: encode/decode UTF-8

Thanks everyone, but ultimately I resorted to http://utfcpp.sourceforge.net/. It’s a header-only library that’s very lightweight and easy to use. I’m sharing some demo code here, should anyone find it useful:

    // Note: utf8to32/utf32to8 assume a 32-bit wchar_t (as on Linux);
    // on Windows wchar_t is 16 bits wide.
    inline void decode_utf8(const std::string& bytes, std::wstring& wstr) {
        utf8::utf8to32(bytes.begin(), bytes.end(), std::back_inserter(wstr));
    }

    inline void encode_utf8(const std::wstring& wstr, std::string& bytes) {
        utf8::utf32to8(wstr.begin(), wstr.end(), std::back_inserter(bytes));
    }

… Read more