Have you ever found yourself reading a problem in an exam and never forget it again because it gives you chills as well as makes you want to solve it every time you come across it? Well, the geek in me found one such problem in the final exam I took with my advisor Jared in the course CS561 (Data Structures and Algorithms) at UNM. I wanted to share this awesome problem with you, with the hope that you will find it as challenging and intriguing as I did. So here is what it says.
You and an opponent are playing a game using a row of n coins of values where is even. Players take turns selecting either the first or last coin from the row, removing it from the row, and receiving the value of the coin. Assume you play first. Following are some examples, assuming optimal play for both players:
- 2, 4, 8, 10 : You can collect maximum value 14 (10 + 4).
- 8, 20, 3, 2 : You can collect maximum value 22 (2 + 20).
Assume that your opponent does not always play optimally. In particular, if coins remain, then they choose the optimal move with probability (for example may decrease as grows). Describe an algorithm to optimize your expected winnings.
Sounds challenging, right! This was one of the bonus problems in the final exam, which still gives me chills every time I attack it. However, with a very clever use of dynamic programming, this problem can be attacked in a nice way. I will first focus on the easy case first, where . This means the opponent plays optimally always. For this easy case, before jumping to the dynamic programming solution, let us first try to convince ourselves why a greedy approach won’t work (this is standard practice with dynamic programming problems).
Greedy (non optimal) solution for
To see why a greedy solution won’t work for this problem, we have to find an example in which the greedy solution fails to provide an optimal solution. I will formalize my proof once we understand this example.
So, the problem at hand is to devise a sequence of coin values on which the greedy approach fails. Recall that the greedy approach picks the coin of highest value from either end of the sequence, alternating between the players. Assuming I go first, consider the sequence , the same sequence as in the example above. The greedy approach makes me pick coins with values 8 and 3, making my total value 11. However, as pointed above, the optimal value is 22, which is significantly higher. KMN!
So why does the greedy approach not work? To see this, we use this trick of constructing a sequence , which given a value , gives me a total value of if I use the greedy approach, but a value in the optimal play. This would prove that we will need to dig a bit deeper to solve this problem. Here is how this construction works.
Let be given as the value. (Note : The lower bound on is just a technicality. The solution can be generalized to all easily, although I will spare that discussion here. Also, I am assuming that coin values can be arbitrary rational numbers and not necessarily integers. However, I again claim that extending the solution to integer coin values is fairly trivial and hence, I leave that discussion from this post.)
Consider this sequence : . For example, given , the sequence we are constructing is . Using the greedy solution, the total value I obtain is 21 + 19 = 40, as required. However, an optimal strategy will yield me a value of 19 + 30 = 49. Generalizing this, the greedy strategy yields a total value of , while the optimal play yields , which is more than when . So, what on Earth is this optimal strategy!!
[Enter dynamic programming.]
Dynamic programming solution for
In cases of optimization like these, if greedy doesn’t work, dynamic has to. Ofcourse, this comes at the cost of efficiency and the requirement of precomputing the entire solution in the beginning, but hey! correctness is way more important than efficiency here. So, let’s take a deep breath and dive in.
Let and be a sequence of coin values. Assuming I play first, define to be the maximum value of the coins I can obtain using an optimal play in the sequence only. Clearly, if , then and when , then . Another easy case is . We are interested in the case when . Let me first state the solution and then justify it.
High five if you just rolled your eyes! Understanding this solution isn’t hard if you keep in mind that both players are playing optimally. This means both of them are trying to maximize their total values. Hence, when I am presented with a choice of two coins, I need to look ahead into what possible strategies my opponent can play next, and then choose a move that maximizes my gain over all those strategies. The max in front of the solution justifies this fact. I can either choose coin to gain value or I can choose coin to gain value . This choice will depend on what my opponent may do next. Let us discuss these two cases separately.
Case I : If I choose coin , my opponent has to choose a coin from values . If he chooses , I will then be making my choice from values , in which case I gain , by defn. Else, if my opponent chooses , I will have to make my choice from the values, in which case I gain . Since the play is optimal for both of us, I will assume my worst case for his choice and minimize over these two cases. Hence, the equation on the top.
Case II : Works similarly.
Clever, right! The beauty of this solution lies in the fact that even without knowing what my opponent will choose, just knowing his strategy is able to let me exactly compute what he will go for. Note that using techniques like memoization, I can precompute my optimal value in just time, which is linear in the number of coins. Once I have this precomputed solution, all I need to do is make my evil laugh! Bwahahaha!!!
Not so optimal opponent
All good so far, but beware! Common sense is not so common in this world. Finding such an intelligent player may be rarer than we want. So, why not give the poor opponent some slack and allow him to deviate from optimality with some probability. More formally, let be the probability that when coins remain, the opponent chooses to play optimally, and not otherwise.
An important assumption to compute the expected value I gain here is that I know ahead of time for all . If not, I may be in trouble. We can argue about how practical is it to assume this, but we can always engineer the game to make this assumption hold. I will simply let my opponent flip a coin in which the probability of heads is when he has to choose a coin from coins. Refer to my previous post to ensure that I can always have such a coin with me.
Now, assuming that I have found a friend who is ready to bow to my rules of this game, how can I compute my expected value at the end of the game? Well, not so tricky this time! The min in the equation will just be replaced by max when the opponent is not playing optimally. Voila! Here we go. Thus, out final solution is that , when , it is when and if . Remember that I am still playing optimally. For , we will now have the following. (Note, here stands for the expected value of .)
Double high five if you rolled your eyes again! One more crazy step? Why assume that I play optimally? I have to be realistic right? Similar to what I am assuming for my opponent, let me assume that with probability , I play optimally when given coins, and not optimally otherwise. I will then use a similar trick, where I will choose the max in the front of the equation above with probability , and I will choose min otherwise.
We’re done! So much for choosing coins! As you saw, dynamic programming has the potential to make us rich. Of course, at the cost of going through the math. An interesting question before I close this post. What if I do not know and/or ? As stated earlier, I can engineer my opponent to play such that I know the probability of optimal play every time he makes a choice, but to be more considerate of his feelings, I need to make myself more adaptive of his moves. Every time I realize that he is not picking the coin which the optimal strategy suggests, I will have to revise my knowledge of his probability of erring on the side of non-optimality. We have to agree that such a strategy will only work when the number of coins is large, to begin with, in which case I can hope to come close to guessing his exact probability of erring. In any case, realism is not a friend of mathematics, especially when it comes to modeling human behavior.
Thanks for reading this post. Stay tuned for my next post, which is likely to be on some cool application of linear algebra in a puzzle type setting. Until then, ciao!