UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c

Question

http://docs.python.org/howto/unicode.html#the-unicode-type

str = unicode(str, errors="replace")

or

str = unicode(str, errors="ignore")

Note: errors="ignore" strips out the offending characters and returns the string without them, while errors="replace" substitutes the replacement character U+FFFD for each of them.

For me this is the ideal case, since I’m using it as protection against non-ASCII input, which my application does not allow.
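The unicode() calls above are Python 2. As a rough sketch of the same idea in Python 3, where unicode() no longer exists, the same error handlers can be passed to bytes.decode. The sample byte string and the variable names here are made up for illustration:

```python
# Hypothetical sample input: 0x9c is not a valid UTF-8 start byte.
raw = b"caf\x9c ok"

# The default handler ("strict") raises the error from the title.
try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = str(exc)

# "replace" substitutes U+FFFD for each undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# "ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

print(strict_error)
print(repr(replaced))  # 'caf\ufffd ok'
print(repr(ignored))   # 'caf ok'
```

Which handler fits depends on whether you need a visible marker where data was lost ("replace") or just want the bad bytes gone ("ignore").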

Alternatively, use the open function from the codecs module to read the file:

import codecs
with codecs.open(file_name, 'r', encoding='utf-8',
                 errors="ignore") as fdata:
    data = fdata.read()  # undecodable bytes are silently dropped
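In Python 3 the built-in open accepts the same encoding and errors arguments, so the codecs import is not strictly needed there. A minimal sketch, assuming a helper name (read_text_lenient) and a throwaway sample file that are invented purely for this example:

```python
import os
import tempfile


def read_text_lenient(path, encoding="utf-8", errors="ignore"):
    """Read a text file, applying the given error handler to bad bytes."""
    with open(path, "r", encoding=encoding, errors=errors) as fdata:
        return fdata.read()


# Create a temporary file containing one invalid UTF-8 byte (0x9c).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"good \x9c bytes\n")
    sample_path = tmp.name

try:
    text = read_text_lenient(sample_path)
finally:
    os.remove(sample_path)  # clean up the sample file

print(repr(text))  # 'good  bytes\n' (the bad byte is gone)
```

Passing errors="replace" instead would keep a U+FFFD marker where the bad byte was, which can make data loss easier to spot later.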