You can think of it as the Binary Search Algorithm.

In each iteration, we ask a question, which should eliminate roughly half of the possible word choices. If there are total of N words, then we can expect to get an answer after log2(N) questions.

With 20 question, we should optimally be able to find a word among 2^20 = 1 million words.

One easy way to eliminate outliers (wrong answers) would be to probably use something like RANSAC. This would mean, instead of taking into account all questions which have been answered, you randomly pick a smaller subset, which is enough to give you a single answer. Now you repeat that a few times with different random subset of questions, till you see that most of the time, you are getting the same result. you then know you have the right answer.

Of course this is just one way of many ways of solving this problem.