
Optimal Guesses in Wordle Using Entropy

There have been a lot of strategies for finding optimal guesses in Wordle, some involving letter frequencies or linguistics. Here we’re going to examine the fundamental principles of how information is conveyed and use entropy to solve Wordle.

All code is available on my GitHub.

Introduction to Entropy

Imagine a coin flip. Two options: heads and tails. Equally likely. 50/50 odds. Guessing the outcome will never be any better than random chance. In the world of information science, we can assign a 1 to be heads and a 0 to be tails. Flipping the coin is akin to reading a bit of information.

Now say we have a trick coin: both sides are heads.


In this situation, we don’t even need to flip the coin to know the outcome. No information is conveyed, because there’s no uncertainty. In other words, 0 bits of information.

But what if the coin is weighted? There’s still a heads and a tails, but the odds are now 60/40. In this scenario, a betting man would always guess heads; they’d be wrong plenty of the time, but right more often than not. Clearly this situation has some uncertainty, so it conveys more than 0 bits of information, but it’s not as uncertain as a 50/50 chance, so it’s less than 1 bit. To reconcile this, we introduce the entropy equation:

H(X) = -Σ P(xᵢ) · log P(xᵢ)   (summing over every possible outcome xᵢ)

Given an event with a set of possible outcomes (maybe 2, maybe more) of varying probabilities, this formula calculates how much “information” is gained from observing the result of the event. An event with 100 equally likely outcomes is very unpredictable, and therefore has high entropy. An event with a guaranteed outcome has an entropy of zero. And biased events (like a weighted coin flip) have less entropy than pure 50/50 chance, but still more than zero.

If we imagine the probability of all outcomes like a histogram, the entropy is a measure of how “level” that histogram is.

Entropy of coin flips of different weights
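
To make this concrete, here’s a quick sketch (separate from the Wordle script later on) that plugs a few coin weightings into the formula, using log base 2 so the result comes out in bits:

import math

def coin_entropy(p_heads):
    # Entropy of a coin that lands heads with probability p_heads
    entropy = 0.0
    for p in (p_heads, 1 - p_heads):
        if p > 0:                  # the 0 * log(0) term is taken to be 0
            entropy -= p * math.log2(p)
    return entropy

print(coin_entropy(0.5))   # fair coin: 1.0 bit
print(coin_entropy(1.0))   # two-headed coin: 0.0 bits
print(coin_entropy(0.6))   # weighted coin: about 0.97 bits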

Applying Entropy to Wordle

Imagine you have to guess from among the following list of 5-letter words: naval, banal, dandy, pagan, manga, lanky, mangy, manly, valid, lands

If you arbitrarily guess naval, then you’ll get one of the following sets of clues depending on the actual answer.

[Table: the clue patterns produced by guessing “naval” against each possible answer. The answer naval comes back all green; banal, pagan, and manga each produce a unique pattern; dandy, lanky, mangy, and balls all share one pattern; and valid and lands share another.]

If the answer is “naval”, obviously the game ends here. If the answer is banal, pagan, or manga, then we’ll know enough to get it right on the next guess. But if we get the pattern shared by dandy, lanky, mangy, and balls, the answer could still be any of 4 different words. The trick is to perform this check for every possible guess and see which one narrows down the list the most. It turns out this optimal word is “dingy”, which isn’t even on the list.

[Table: the clue patterns produced by guessing “dingy”. Every one of the ten words produces a different pattern, so whatever clue comes back identifies the answer uniquely.]

True, we surrender the possibility of getting lucky on the first guess, but that was only a 1-in-10 shot anyway, and now we’re guaranteed to get it on guess #2.
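
To see this bucketing idea in code, here’s a rough sketch (separate from the real script below) that tallies how many of the candidate words share each clue pattern for a given guess. It uses a simplified clue function that ignores the duplicate-letter rules handled properly later, so the exact buckets can differ slightly from the tables above.

import collections

def simple_clue(guess, answer):
    # 2 = right letter, right spot; 1 = letter elsewhere in the word; 0 = absent
    return tuple(2 if g == a else (1 if g in answer else 0)
                 for g, a in zip(guess, answer))

words = ["naval", "banal", "dandy", "pagan", "manga",
         "lanky", "mangy", "manly", "valid", "lands"]

for guess in words + ["dingy"]:
    buckets = collections.Counter(simple_clue(guess, w) for w in words)
    # Smaller buckets mean the clue pins the answer down further
    print(guess, sorted(buckets.values(), reverse=True))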

But we can’t always narrow it down to individual words. There are 12,964 legal guesses, 2,315 possible answers, and only 243 possible patterns of colored squares (3 colors across 5 letters, 3^5 = 243). The best bet is to find the guess that narrows down the list the most on average: the guess with the most entropy.

On guess #1, this word is “raise”.

Putting it in Code

Wordle uses two word lists: one with every 5-letter word in the English language, and another, smaller list of words people actually use. It’s the latter that’s used to generate the word of the day; if a word isn’t on that second list, it will never be the answer. Here is that list.

Basic Structure

We’re going to need two functions. get_mask is a utility function that will return an encoding of the colored squares given by a hypothetical guess and answer. make_guess is the meat of the script. Given a list of legal guesses and potential answers, it returns the optimal guess.

import math

def get_mask(guess, answer):
    # TODO:
    return 0

def make_guess(guessing_list, answer_list):
    # TODO
    return guessing_list[0]

guessing_list = []

with open("wordle_dictionary.txt", 'r') as fin:
    for line in fin:
        guessing_list.append(line[:5])   # Remove newline

answer_list = set(guessing_list.copy())  # Make a set, so lookups are efficient

while len(answer_list) > 1:
    best_guess = make_guess(guessing_list, answer_list)
    print("Try", best_guess)

    mask = int(input("What was the result? (in 0,1,2s): "))
    answer_list = [a for a in answer_list if get_mask(best_guess, a) == mask]

if len(answer_list) == 0:
    print("One of us messed up somewhere cause there's no more possible words")
elif len(answer_list) == 1:
    print("Answer is", answer_list[0])

As an initialization step, we populate the guessing and answer lists. Then we enter a loop where a guess is made and the answer list is narrowed down. These two steps repeat until the answer list contains a single word.
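
One thing worth spelling out is the input format the loop expects: one digit per letter, read left to right, where 0 is a gray square, 1 is yellow, and 2 is green. For example (my own illustration, with “raise” as the guess):

# Guess:   r a i s e
# Clue:    gray, green, gray, gray, yellow  (e.g. if the answer were "camel")
# Type in: 02001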

Generate Mask

For the get_mask utility function, we want to return a 5-digit number: 0 for absent letters, 1 for letters in the wrong position, and 2 for a perfect match. Note the potential for duplicate letters to mess this up. To avoid that, we set some ground rules: exact matches take precedence, there’s only one clue per letter in our guess, and only one clue per letter in our answer. E.g. guessing “troll” when the answer is “smile” should return one exact match for an L, but the other L should be a no-match.

def get_mask(guess, answer):
    # Mask defaults to all wrong and convert strings to lists
    guess = list(guess)
    answer = list(answer)
    mask = [0,0,0,0,0]

    # If exact match
    for i in range(5):
        if guess[i] == answer[i]:
            mask[i] = 2
            guess[i] = 'X'     # Make sure it doesn't get double counted
            answer[i] = 'Y'

    # Find anything in wrong spot
    for i,l in enumerate(guess):
        try:
            j = answer.index(l)
            mask[i] = 1
            answer[j] = 'Y'
        except ValueError as err:
            pass
    
    # Convert to a single number whose digits read left to right, same as the guess
    mask.reverse()
    return sum([(10**i)*v for i,v in enumerate(mask)])

We find the exact matches first, blotting out the matched letters in both guess and answer. Then we iterate through the remaining letters in guess and see if they’re present in answer. (If a letter isn’t, index() raises a ValueError, which we simply ignore.)
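
A couple of quick checks against the examples above (my own spot-checks; the leading zero drops off because the mask is stored as a plain integer):

print(get_mask("troll", "smile"))   # 20, i.e. 00020: only the first L is an exact match
print(get_mask("naval", "naval"))   # 22222: all green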

Make Guess

def make_guess(guessing_list, answer_list):
    max_entropy = -1
    best_guess = "idk"

    for guess in guessing_list:
        bins = {}
        for a in answer_list:
            mask = get_mask(guess, a)
            if not mask in bins.keys():
                bins[mask] = 1
            else:
                bins[mask] += 1

        # Replace best guess if there's a word with better entropy
        new_entropy = compute_entropy(bins.values())
        if new_entropy > max_entropy:
            max_entropy = new_entropy
            best_guess = guess

    return best_guess

We assume every word in answer_list is also in guessing_list. The entropy of a wordle guess is found by testing all possible answers, and tallying up how many answers correspond to a specific clue. If the clue is all gray squares, the list of remaining answers will be relatively large. If the clue is all green, then you have narrowed it down to one word, the answer.

Once that histogram is tallied up, just take the number of answers for each clue, as a list of integers, and compute the entropy. That’s the entropy of that particular guess. Repeat for all guesses, and return the one with the highest score.

Compute Entropy

As a reminder, the formula for entropy is:

H(X) = -Σ P(xᵢ) · log P(xᵢ)

P(xᵢ) is supposed to be a probability, meaning a value between 0 and 1. Since we have a list of positive integers, we have to normalize it first: sum them up to find the total number of possible words, then divide each count by that total.

def compute_entropy(int_list):
    total = sum(int_list)
    entropy = 0.0

    for i in int_list:
        p = i / total
        entropy -= math.log(p) * p

    return entropy

For each integer, convert it to a probability by dividing by the total. Then compute the p*log(p) term and subtract it from the running total (log(p) is negative for probabilities below 1, which is what makes the entropy come out positive). Return the resulting sum. Note that math.log is the natural logarithm, so this measures entropy in nats rather than bits; since every guess is scored with the same log base, the ranking of guesses is unchanged.
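
As a quick worked example (my own numbers, using the bucket sizes from the ten-word example earlier): guessing naval split that list into buckets of sizes 4, 2, 1, 1, 1, 1, while dingy split it into ten buckets of size 1.

print(compute_entropy([4, 2, 1, 1, 1, 1]))   # about 1.61 nats: the "naval" split
print(compute_entropy([1] * 10))             # about 2.30 nats: the "dingy" split, the best possible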

Fix those Pesky Quirks

If you run the code as is, you’ll notice several things.

First off, when you’re down to 2-3 possible answers, the script may still guess a word that has already been ruled out. A ruled-out word can carry slightly more entropy, but near the end of the game it’s often better to guess a word that could actually be the answer, so long as it’s sufficiently entropic.

Not only can we compute the entropy of a single guess, we can compute the entropy of the remaining word list. A guess’s entropy can never exceed that value, and if it reaches it, the guess is guaranteed to narrow our choices down to 1. So we can check for that and modify make_guess accordingly.

def make_guess(guessing_list, answer_list):
    max_entropy = -1
    best_guess = "idk"

    # remaining_entropy = compute_entropy([1 for a in answer_list])
    remaining_entropy = math.log(len(answer_list))  # These are equivalent

    for guess in guessing_list:
        bins = {}
        for a in answer_list:
            mask = get_mask(guess, a)
            if not mask in bins.keys():
                bins[mask] = 1
            else:
                bins[mask] += 1

        # Replace best guess if there's a word with better entropy, but defer
        # to valid answers if they have sufficient entropy
        new_entropy = compute_entropy(bins.values())
        if new_entropy > max_entropy or (guess in answer_list
                and best_guess not in answer_list
                and new_entropy >= remaining_entropy):
            max_entropy = new_entropy
            best_guess = guess

    return best_guess
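
That remaining_entropy shortcut deserves a quick sanity check of its own (my check, reusing compute_entropy from above): a list of N equally weighted words has entropy log(N), which is the ceiling no guess can exceed.

import math

for n in (2, 7, 2315):
    assert math.isclose(compute_entropy([1] * n), math.log(n))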

Secondly, you’ll notice that the first guess takes a long time to compute, even though it’s always “raise”. You can modify the script to hardcode this first guess before the while loop (a quick sketch below), but we can take things a step further.
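
That hardcoding change can look something like this, placed just before the existing while loop:

best_guess = "raise"    # the precomputed opener, so turn one skips the expensive search
print("Try", best_guess)

mask = int(input("What was the result? (in 0,1,2s): "))
answer_list = [a for a in answer_list if get_mask(best_guess, a) == mask]

# ...then fall through to the while loop for guesses 2 onward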

Precompute Everything

All these decisions are deterministic: for any given answer, this script will always make the same sequence of guesses. Since we’ve already precomputed the first guess, can we precompute the optimal 2nd or 3rd guess too? Yes!

By simulating every possible answer and tracking what the script guesses, we can create a decision tree. For those not familiar with decision trees, imagine a choose-your-own-adventure story: if this happens, turn to page 46. If you receive this clue, guess this word. And so on.
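
Here’s a rough sketch of how such a tree could be built, reusing get_mask and make_guess from above (the nested-dict structure is my own stand-in for whatever you’d actually render):

def build_tree(guessing_list, answer_list):
    # Each node records what to guess now, plus a branch for every clue we might see back
    guess = make_guess(guessing_list, answer_list)
    node = {"guess": guess, "branches": {}}

    for mask in {get_mask(guess, a) for a in answer_list}:
        if mask == 22222:
            continue    # that clue means this guess was the answer
        remaining = [a for a in answer_list if get_mask(guess, a) == mask]
        node["branches"][mask] = build_tree(guessing_list, remaining)

    return node

This recomputes make_guess at every node, so it’s slow, but it only ever has to run once.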

You’ve probably already seen this chart on Reddit (that may be what brought you here), but here it is again: the Wordle entropy decision tree.

Wordle entropy decision tree

Make it more Human

If this feels like cheating, it is. A friend pointed out to me that, automation aside, the script knows the exact answer list, giving it an unfair advantage over even the most talented human. How is an ordinary player supposed to know that “dwarf” might be an answer but “caret” never will be? After all, Wordle lets you guess both!

But expanding answer_list to include every legal 5-letter word absolutely nerfs performance, because the script keeps guessing words like “aahdl”. So where’s the nice middle ground? Occurrence lists.

If I incorporate the rates at which words occur in everyday writing into the probability calculations, I can bias the script toward more common words, which are more likely to be valid answers.

The code is more or less the same, but now the answer list holds tuples of a word and its occurrence rate. When tallying up the size of a bin, add the occurrence rate instead of 1. The main changes to make_guess are below.

    remaining_entropy = compute_entropy([p for a, p in answer_list])    # Here

    for guess in guessing_list:
        bins = {}
        for a, p in answer_list:   # Here
            mask = get_mask(guess, a)
            if not mask in bins.keys():
                bins[mask] = p     # Here
            else:
                bins[mask] += p    # Here

There are also some modifications to the word list files, but I trust you to figure those out.
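
If you’d rather not start from scratch, one possible shape for that change (the file name and word-count-per-line format here are my assumptions, not the format used in the actual repo):

answer_list = []

with open("word_frequencies.txt", 'r') as fin:
    for line in fin:
        word, count = line.split()
        answer_list.append((word, float(count)))    # (word, occurrence rate) tuples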

Is it Truly Optimal?

Short answer: no. For two words, broker and otter, it takes all 6 guesses. That’s because this particular approach only considers the entropy of an individual turn rather than the game as a whole. For a more in-depth dive into this global, whole-game entropy approach to Wordle, see the video by 3Blue1Brown, which dropped three days before this blog post despite the fact that I’d been working on this post for weeks.