Hangman Strategy

Problem: Devise strategy for a computer to play hangman.

Details: The computer needs to guess a word using as few guesses as possible, and make no more than maxWrongGuesses incorrect guesses.
At end of the game, calculate the score for a word to be:

# letter guesses + # number of incorrect word guesses if the computer guessed the word right before making maxWrongGuesses incorrect guesses

or

25 if the computer lost the game before guessing the word correctly.

Solution: The computer starts with a set P of candidate words that initially contains all words from dictionary. Then at any point of time, the computer will guess letter x that will result in discarding maximum # of words from P. The computer finds the answer when only one word is left in P.
Let n = # of words in P
Let G = the computer’s guess so far. To take an example, lets say the word to be guessed is DREAM. When game starts G = —– and P will contain all five letter words in the English dictionary
To calculate the expected cardinality of P, conditional on guessing x, and assuming each word in the dictionary is equally likely to be the secret word that the computer has been asked to guess

foreach (var x in C)
{
   int sum = 0;
   foreach (var word in P)
   {
      // calculate cardinality of P that will result if
      // secret word to be guessed is word, and x is guessed
      // by the computer, and G is the guess so far.
      sum += Cardinality(x, word, G);
   }
   e[x] = sum;
}

The computer guesses x that minimizes e over the range of possible characters C to guess from.
When computer has guessed x in a turn, it will never consider guessing x again in subsequent turns (whether it was wrong or right)
i.e., after guessing x, remove x from C, and also revise P appropriately.
If x is a wrong guess, discard all words from P that have x in them;
else if x is a correct guess, update G and filter out words in P that don’t match G.

Implementation: (full code is not shown below)

using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace Hangman
{
    public class SidsStrategy : GuessingStrategy
    {
        private HashSet<char> letters = new HashSet<char>(new char[] { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z' });
        private readonly HashSet<string> words = new HashSet<string>();
        private char lastGuess;
        private string lastSofar;

        public SidsStrategy(IEnumerable<string> dictionary)
        {
            foreach (var word in dictionary)
            {
                words.Add(word.ToUpper());
            }
        }

        public Guess nextGuess(HangmanGame game)
        {
            var sofar = game.getGuessedSoFar();
            Filter(game);
            int N = words.Count;
            if (N == 1)
            {
                return new GuessWord(words.First());
            }
            else
            {
                Dictionary<char, double> dict = new Dictionary<char, double>();
                foreach (var ch in letters)
                {
                    int sum = 0;
                    foreach (var word in words)
                    {
                        // calculate cardinality of P assuming secret word to be guessed is word, and we guess ch
                        sum += Cardinality(ch, word, sofar);
                    }
                    dict[ch] = sum;
                }
                lastGuess = ArgMin(dict);
                letters.Remove(lastGuess);
                lastSofar = sofar;
                return new GuessLetter(lastGuess);
            }
        }

        private int Cardinality(char ch, string secretWord, string sofar)
        {
            Debug.Assert(secretWord.Length == sofar.Length);
            char[] array = new char[sofar.Length];
            bool flag = false;
            for (int i = 0; i < sofar.Length; i++)
            {
                if (sofar[i] == HangmanGame.MYSTERY_LETTER &&
                    secretWord[i] == ch)
                {
                    array[i] = ch;
                    flag = true;    // ch will be a correct guess
                }
                else
                {
                    array[i] = sofar[i];
                }
            }
            string postGuess = new string(array);   // if we guess ch, postGuess is what we will end up with
            if (flag)
            {
                int answer = 0;
                foreach(var word in words)
                {
                    if (isEligible(word, postGuess))
                    {
                        answer++;
                    }
                }
                return answer;
            }
            else
            {
                // ch is not present in the secret word
                int answer = 0;
                foreach (var word in words)
                {
                    if (!word.Contains(ch))
                    {
                        answer++;
                    }
                }
                return answer;
            }
        }

        private char ArgMin(Dictionary<char, double> dict)
        {
            double min = double.MaxValue;
            char answer = '0';
            foreach (var kvp in dict)
            {
                if (kvp.Value < min)
                {
                    min = kvp.Value;
                    answer = kvp.Key;
                }
            }
            return answer;
        }

        private void Filter(HangmanGame game)
        {
            var sofar = game.getGuessedSoFar();
            HashSet<string> discard = new HashSet<string>();
            if (sofar == lastSofar)
            {
                // we guessed incorrectly, discard all words that have lastGuess in them
                foreach (var word in words)
                {
                    if (word.Contains(lastGuess))
                    {
                        discard.Add(word);
                    }
                }
            }
            else
            {
                foreach (var word in words)
                {
                    if (!isEligible(word, sofar))
                    {
                        discard.Add(word);
                    }
                }
            }
            foreach (var word in discard)
            {
                words.Remove(word);
            }
        }

        private bool isEligible(string candidate, string sofar)
        {
            if (candidate.Length != sofar.Length)
            {
                return false;
            }
            for (int i = 0; i < candidate.Length; i++)
            {
                var ch1 = candidate[i];
                var ch2 = sofar[i];
                if (ch1 != ch2 && ch2 != HangmanGame.MYSTERY_LETTER)
                {
                    return false;
                }
            }
            return true;
        }
    }
}

Performance: This algorithm turns out to be woefully slow, and cannot be used in any practical scenario.

Revision: Instead of computing expected cardinality conditional on guessing x, we just consider two cases:

Case1: x is a right guess

Case2: x is a wrong guess

f(x) = p1*n1 + p2*n2

p1 = probability of case 1 = n1 / (n1 + n2)

p2 = probability of case 2 = n2 / (n1 + n2)

n1 = # of words in P that contain x

n2 = # of words in P that do not contain x

Strategy: Choose x that minimizes f. This algorithm is practical and scores for sample test words are shown below (maxWrongGuesses is set to 5):

COMAKER = 9

CUMULATE = 9

ERUPTIVE = 25

FACTUAL = 7

MONADISM = 6

MUS = 25

NAGGING = 25

OSES = 4

REMEMBERED = 25

SPODUMENES = 7

STEREOISOMERS = 25

TOXICS = 25

TRICHROMATS = 5

TRIOSE = 6

UNIFORMED = 25

MODEL = 8

CAR = 7

RAILWAY = 5

ROAD = 5

TELEPHONE = 4

EXCHANGE = 6

AUTHORITY = 5

COMPLAINTS = 7

FURNITURE = 5

HOUSE = 7

NOTEBOOK = 5

LONG = 6

SHORT = 8

TALL = 25

THIN = 25

FAT = 6

PLAZA = 6

RELOCATIONS = 6

ACCOUNTS = 7

ADDRESS = 7

WEBSITE = 8

AVERAGE SCORE = 11

The problem with this algorithm is that there is no penalty for making a wrong guess. Consider guesses x and y.
x is correct guess, and y is wrong guess.
If both reduce the search space P by equal amounts, both are weighed equally. We would like to choose x over
y in such a case. To factor this in, we make following empirical modification:

f(x) = p1*n1 + (1 + α/maxWrongGuesses)*p2*n2

The new scores are shown below (α set to 5):

COMAKER = 25

CUMULATE = 10

ERUPTIVE = 6

FACTUAL = 9

MONADISM = 6

MUS = 25

NAGGING = 5

OSES = 4

REMEMBERED = 6

SPODUMENES = 6

STEREOISOMERS = 3

TOXICS = 9

TRICHROMATS = 5

TRIOSE = 10

UNIFORMED = 6

MODEL = 8

CAR = 7

RAILWAY = 7

ROAD = 5

TELEPHONE = 6

EXCHANGE = 6

AUTHORITY = 7

COMPLAINTS = 6

FURNITURE = 6

HOUSE = 7

NOTEBOOK = 6

LONG = 6

SHORT = 8

TALL = 25

THIN = 25

FAT = 6

PLAZA = 6

RELOCATIONS = 9

ACCOUNTS = 6

ADDRESS = 5

WEBSITE = 7

AVERAGE SCORE = 8.58333333333333

An alternative to try if you are one of those who don’t like arbitrary constants like α in the algorithm above, is to set α=0 and if the computer guessed wrong, do not update P. This was actually a bug in my program. I was not updating P in case of a wrong guess, and was using f(x) = p1*n1 + p2*n2. When I realized the bug, and updated P in case of wrong guess, I found much to my surprise that my scores actually increased by quite a bit instead of decreasing! After much debugging, I realized the need of an α. Anyway the results with α=0, and not updating P in case of a wrong guess are as follows:

COMAKER = 8
CUMULATE = 10
ERUPTIVE = 10
FACTUAL = 8
MONADISM = 8
MUS = 25
NAGGING = 5
OSES = 4
REMEMBERED = 25
SPODUMENES = 9
STEREOISOMERS = 5
TOXICS = 11
TRICHROMATS = 9
TRIOSE = 7
UNIFORMED = 8
MODEL = 10
CAR = 6
RAILWAY = 6
ROAD = 5
TELEPHONE = 6
EXCHANGE = 8
AUTHORITY = 6
COMPLAINTS = 7
FURNITURE = 8
HOUSE = 7
NOTEBOOK = 6
LONG = 8
SHORT = 8
TALL = 6
THIN = 25
FAT = 6
PLAZA = 9
RELOCATIONS = 6
ACCOUNTS = 10
ADDRESS = 8
WEBSITE = 7
AVERAGE SCORE = 8.88888888888889

Download code from https://skydrive.live.com/self.aspx/.Public/Hangman.zip?cid=c1da670a9c473d39&sc=documents

This entry was posted in Software. Bookmark the permalink.

Leave a comment