Why does ENcoding a string result in a DEcoding error (UnicodeDecodeError)?

In Python 2, "你好".encode('utf-8') fails in a confusing way. encode converts a unicode object to a (byte) string object, but here it is invoked on a string object, because the literal has no u prefix. So Python first has to convert the string to a unicode object, and it does the equivalent of "你好".decode().encode('utf-8'). The implicit decode() uses the default ASCII codec, and it fails because the bytes of "你好" are not valid ASCII. …
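Python 3 no longer performs that implicit decode (bytes objects have no encode method), but the failure can be reproduced explicitly. The sketch below assumes the bytes come from a UTF-8 source file; py2_style_encode is a hypothetical helper that mimics the Python 2 decode-then-encode sequence.

```python
# Python 3 sketch of what Python 2 did implicitly for "你好".encode('utf-8').
raw = "你好".encode("utf-8")  # the bytes a Python 2 str literal held in a UTF-8 file


def py2_style_encode(data: bytes, target: str = "utf-8") -> bytes:
    """Hypothetical helper: decode with ASCII (Python 2's default), then encode."""
    return data.decode("ascii").encode(target)


try:
    py2_style_encode(raw)
except UnicodeDecodeError as exc:
    # The *decode* step fails, hence a UnicodeDecodeError from an "encode" call.
    print(exc)  # 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

# Decoding with the codec the bytes are really in avoids the error.
text = raw.decode("utf-8")
print(text.encode("utf-8") == raw)  # True
```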

“SyntaxError: Non-ASCII character …” or “SyntaxError: Non-UTF-8 code starting with …” trying to use non-ASCII text in a Python script

I'd recommend reading the PEP that the error message points you to (PEP 263). The problem is that Python 2 decodes your source file as ASCII by default, but the pound symbol (£) is not an ASCII character. Declare UTF-8 instead by putting # -*- coding: utf-8 -*- on the first or second line of your .py file, and make sure the file is actually saved as UTF-8. To get more …
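To see the declaration at work, the Python 3 sketch below builds the bytes of a small script saved as Latin-1 (an assumption chosen so the file is deliberately not UTF-8), lets the standard tokenize module read its coding line, and then compiles it. Python 3 assumes UTF-8 when no declaration is present, which is why a non-UTF-8 file without one produces the "Non-UTF-8 code" variant of the error.

```python
import io
import tokenize

# A two-line script containing a pound sign, saved as Latin-1 rather than UTF-8.
latin1_source = "# -*- coding: latin-1 -*-\nprice = '£5'\n".encode("latin-1")

# detect_encoding reads the coding declaration the way the interpreter does.
encoding, _ = tokenize.detect_encoding(io.BytesIO(latin1_source).readline)
print(encoding)  # iso-8859-1 (the normalized name for latin-1)

# compile() honours the declaration when given bytes, so '£' decodes correctly.
namespace = {}
exec(compile(latin1_source, "<latin1-demo>", "exec"), namespace)
print(namespace["price"])  # £5

# The same script with a UTF-8 declaration, saved as UTF-8.
utf8_source = "# -*- coding: utf-8 -*-\nprice = '£5'\n".encode("utf-8")
print(tokenize.detect_encoding(io.BytesIO(utf8_source).readline)[0])  # utf-8
```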

Removing unicode \u2026 like characters in a string in python2.7 [duplicate]

Python 2.x:

    >>> s
    'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
    >>> print(s.decode('unicode_escape').encode('ascii', 'ignore'))
    This is some  text that has to be cleaned! it's annoying!

Python 3.x:

    >>> s = "This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!"
    >>> s.encode('ascii', 'ignore')
    b"This is some  text that has to be …
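On Python 3, when the text arrives with literal \uXXXX escapes (as in the Python 2 example), the escapes have to be decoded before the non-ASCII characters can be stripped. A minimal sketch, assuming the escaped input is otherwise plain ASCII; unescape_then_strip and strip_non_ascii are hypothetical helper names.

```python
def strip_non_ascii(text: str) -> str:
    """Hypothetical helper: drop every character outside the ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


def unescape_then_strip(escaped: str) -> str:
    """Hypothetical helper: turn literal \\uXXXX escapes into characters, then strip non-ASCII.

    Assumes `escaped` is plain ASCII apart from those escape sequences.
    """
    decoded = escaped.encode("ascii").decode("unicode_escape")
    return strip_non_ascii(decoded)


# Same text as the Python 2 example: the backslashes are literal characters here.
raw = "This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!"
print(unescape_then_strip(raw))  # This is some  text that has to be cleaned! it's annoying!
```

Note that removing a character leaves its surrounding spaces in place, hence the double space where π used to be.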

Python – ‘ascii’ codec can’t decode byte

Same answer as for "Why does ENcoding a string result in a DEcoding error (UnicodeDecodeError)?" above.

How to print Unicode character in Python?

To include Unicode characters in your Python source code, you can use Unicode escape sequences of the form \u0123 in your string. In Python 2.x, you also need to prefix the string literal with u. Here's an example running in the Python 2.x interactive console:

    >>> print u'\u0420\u043e\u0441\u0441\u0438\u044f'
    Россия

In Python 2, prefixing a string …
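In Python 3 every str is Unicode, so the u prefix is no longer needed (it is still accepted). The sketch below prints the same word and shows two other standard ways to spell a single character: a \N{...} named escape and chr() with the code point.

```python
import unicodedata

# Python 3: str literals are Unicode, so \u escapes work without a u prefix.
russia = "\u0420\u043e\u0441\u0441\u0438\u044f"
print(russia)  # Россия

# Three spellings of U+0420 that produce the same character.
by_escape = "\u0420"
by_name = "\N{CYRILLIC CAPITAL LETTER ER}"
by_code_point = chr(0x0420)
print(by_escape == by_name == by_code_point)  # True
print(unicodedata.name(russia[0]))  # CYRILLIC CAPITAL LETTER ER
```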

UnicodeDecodeError: ‘utf8’ codec can’t decode byte 0x9c

See http://docs.python.org/howto/unicode.html#the-unicode-type. In Python 2 you can decode with an error handler:

    s = unicode(s, errors="replace")

or

    s = unicode(s, errors="ignore")

Note: errors="ignore" strips out the characters in question and returns the string without them, while errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER for them. For me ignoring is the ideal case, since I'm using it as protection against non-ASCII input, which my application does not allow. Alternatively: use the open method from the codecs …
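The Python 3 counterparts are the errors argument of bytes.decode() and of the built-in open(), which in Python 3 covers what codecs.open was used for. A sketch with made-up sample bytes (a valid UTF-8 "é" followed by a stray 0x9c byte, the byte named in the error above); ascii() is used for printing so the output stays readable on any terminal.

```python
import os
import tempfile

# Sample input: valid UTF-8 for "café", then a lone 0x9c byte, which is not valid UTF-8.
data = b"caf\xc3\xa9 \x9c ok"

replaced = data.decode("utf-8", errors="replace")   # bad byte becomes U+FFFD
ignored = data.decode("utf-8", errors="ignore")     # bad byte is dropped
ascii_only = data.decode("ascii", errors="ignore")  # every non-ASCII byte is dropped

print(ascii(replaced))    # 'caf\xe9 \ufffd ok'
print(ascii(ignored))     # 'caf\xe9  ok'
print(ascii(ascii_only))  # 'caf  ok'

# Reading a file: the built-in open() accepts the same errors argument.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.txt")
    with open(path, "wb") as fh:
        fh.write(data)
    with open(path, encoding="utf-8", errors="replace") as fh:
        from_file = fh.read()

print(from_file == replaced)  # True
```

Decoding as ASCII with errors="ignore" matches the "reject non-ASCII input" use case described above; decoding as UTF-8 keeps legitimate accented characters and drops only the malformed bytes.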