What algorithm does Readability use for extracting text from URLs?
Readability mainly consists of heuristics that “just somehow work well” in many cases. I have written some research papers on this topic, and I would like to explain the background: why it is easy to come up with a solution that works well, and when it gets hard to get close to 100% accuracy. …