http://docs.python.org/howto/unicode.html#the-unicode-type
str = unicode(str, errors="replace")
or
str = unicode(str, errors="ignore")
Note: errors="ignore" strips out the characters in question and returns the string without them, while errors="replace" substitutes the Unicode replacement character (U+FFFD) for each one.
For me, ignoring is the ideal case, since I’m using it as protection against non-ASCII input, which is not allowed by my application.
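To make the difference between the two error handlers concrete, here is a minimal sketch. It assumes Python 3, where `bytes.decode` plays the role of Python 2's `unicode(...)`; the sample bytes and variable names are illustrative and not part of the original answer.

```python
# Python 3 sketch: bytes.decode(...) stands in for Python 2's unicode(...).
# Sample input (an assumption for illustration): UTF-8 bytes for "café ok".
raw = b"caf\xc3\xa9 ok"

# Each byte that is not valid ASCII becomes U+FFFD (the replacement character).
replaced = raw.decode("ascii", errors="replace")

# Each byte that is not valid ASCII is dropped entirely.
ignored = raw.decode("ascii", errors="ignore")

print(repr(replaced))
print(repr(ignored))
```

With "ignore" the offending bytes vanish without a trace, so it suits cases like input sanitizing where losing them is acceptable; "replace" keeps a visible marker where data was lost.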
Alternatively, use the open function from the codecs module to read in the file:
import codecs
with codecs.open(file_name, 'r', encoding='utf-8',
                 errors="ignore") as fdata:
    # Bytes that are not valid UTF-8 are silently dropped while reading.
    text = fdata.read()
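In Python 3 the built-in open accepts the same encoding and errors arguments, so the file-reading approach might look like the sketch below. The helper name `read_text_lossy` and the throwaway demo file are hypothetical, introduced only for illustration.

```python
import os
import tempfile


def read_text_lossy(path, encoding="utf-8"):
    """Read a text file, silently dropping bytes that are invalid in `encoding`.

    Hypothetical helper, not part of the original answer.
    """
    with open(path, "r", encoding=encoding, errors="ignore") as fdata:
        return fdata.read()


# Demo: a temporary file containing one byte (0xff) that is never valid UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc\xffdef")

text = read_text_lossy(tmp.name)
os.remove(tmp.name)  # clean up the demo file
print(repr(text))
```

Because the invalid bytes are discarded without warning, this is best reserved for input where that loss is acceptable.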