Comparing two .txt files using difflib in Python

For starters, you need to pass strings to difflib.SequenceMatcher, not files: # Like so difflib.SequenceMatcher(None, str1, str2) # Or just read the files in difflib.SequenceMatcher(None, file1.read(), file2.read()) That’ll fix your error. To get the first non-matching string, see the difflib documentation.

High performance fuzzy string comparison in Python, use Levenshtein or difflib [closed]

In case you’re interested in a quick visual comparison of Levenshtein and Difflib similarity, I calculated both for ~2.3 million book titles: import codecs, difflib, Levenshtein, distance with codecs.open(“titles.tsv”,”r”,”utf-8″) as f: title_list = f.read().split(“\n”)[:-1] for row in title_list: sr = row.lower().split(“\t”) diffl = difflib.SequenceMatcher(None, sr[3], sr[4]).ratio() lev = Levenshtein.ratio(sr[3], sr[4]) sor = 1 – distance.sorensen(sr[3], … Read more