3/17/2006

03-18-06 - 3


I started playing with Poker AI again. It started when I was contacted by the author of GNU Holdem, who's integrating some of my work. Then I found "mabots", a free implementation of the TTH (Turbo Texas Hold'em) AI. That suddenly opened up an exciting possibility - I could run a brute force simulation of poker! The key is that the TTH bots are so simple and fast that you can actually run them about a million times a second. That allows you to explore the full branching tree of possibilities in the hand, and simulate what your opponent would do in each case. I've written about this type of poker play before, but the idea of running a full-tree brute force simulation just seemed ridiculous. Even with how fast the TTH bots are, you can only run the full tree heads up; in a multiway pot you either have to do Monte Carlo sampling of the simulation, or prune the tree to condense branches with similar outcomes.
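The core trick can be sketched in miniature. This is a toy one-street, three-card game with invented rules (not the TTH interface or real hold'em) - the point is just the shape of the search: enumerate the opponent's possible hands, recurse over the betting branches, and at the opponent's decision points literally call its policy function to see what it would do.

```python
# Toy sketch of the "run the opponent's brain" brute-force idea on a
# made-up one-street game: cards 0..2, one unit already in the pot per
# player, one-unit bets, no raises. Names and rules are illustrative.

def opp_policy(card, history):
    # A predictable scripted opponent: calls a bet only with the top
    # card, always stabs when checked to, folds everything else.
    if history == "b":                  # facing our bet
        return "call" if card == 2 else "fold"
    if history == "k":                  # we checked; it always stabs
        return "bet"
    return "check"

def best_response_value(my_card):
    """EV (in bet units, simplified bookkeeping) of our best line,
    averaged over the opponent's possible cards."""
    opp_cards = [c for c in range(3) if c != my_card]

    def showdown(opp, stake):
        return stake if my_card > opp else -stake

    best = None
    for my_action in ("k", "b"):        # check or bet
        ev = 0.0
        for opp in opp_cards:
            a = opp_policy(opp, my_action)   # run its brain directly
            if my_action == "b":
                ev += 1.0 if a == "fold" else showdown(opp, 2.0)
            else:                            # we checked
                if a == "bet":
                    # we may call or fold; take the better option
                    ev += max(-1.0, showdown(opp, 2.0))
                else:
                    ev += showdown(opp, 1.0)
        ev /= len(opp_cards)
        best = ev if best is None else max(best, ev)
    return best
```

Even this tiny version reproduces the trapping behavior: with the best card, checking to induce the opponent's automatic stab is worth more than betting and having it fold.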

So, I got this TTH simulator working, and the good news is it completely destroys the TTH bots. At a table full of random TTH opponents, the simulator beats them for 12 PTBB/100. That's an insane, ridiculous win rate at limit hold'em - the best players in the world win at around 3 PTBB/100. The very best TTH bots at the same table win at around 5 PTBB/100, and even that is high only because many of the other TTH bots are so weak. Heads up against the very best TTH bots, my simulator bot wins at a rate of 30 PTBB/100.

The TTH bots are pretty bad, and my simulator quickly finds their flaws. For example, if the hand checks to the river, the TTH bots will often stab at the river with no hand. The simulator will then raise them with any two cards, because the TTH bots will fold to the raise. The TTH bots generally make very predictable stabs at the pot and also fold easily to aggression, so you can trap them and bluff them. I saw another cute example :

Board is [9 T Q] and my simulator has Q7 and TTH has A4. TTH checks to me and I check behind (!!). Yes, it looks drawy, but I get more value because : Turn is a [3] and TTH bets out with air. The simulator raises (!!) and TTH calls. The river blanks and it goes check-check. The simulator estimates that it gains +1.5 big bets from this line rather than just betting the flop, because TTH will just fold the flop to most bets, but here we trapped in 2 more big bets. The rare cases where we get beat are outweighed by all the times he has junk and we win those extra bets.

Basically the simulator is able to play 100% perfectly against the TTH bots. Of course it can't see their cards, and the TTH bots do have a tiny bit of randomization, so my simulator can still make "mistakes" in the silly Sklansky Theory of Poker sense of a "mistake" (eg. if I could actually see its cards, I might do something else). However, the simulator plays 100% perfect poker in the sense that it's the best you can possibly play against them - I put them on the best possible hand range, and in each case I can predict what they would do in response to my actions, since I just run their brain to find out.

Unfortunately, this Simulator is not very useful as a general purpose AI. The weaknesses of TTH are so predictable and unrealistic that they make the simulator play strangely. That is, the Simulator is perfect against TTH but it's very very far off a game theory optimal AI - it's at the opposite end of the spectrum. It has tuned itself to be perfect against TTH, which makes it highly exploitable by other play styles. For example, the Simulator is a losing player against a Poki AI.

So, this turned out to be a purely theoretical exercise. Perhaps I need to read some more game theory papers. I suspect there may be some way to prevent the Simulator from becoming so exploitable by other strategies. For example, maybe I could add some chance that instead of doing the TTH action, the opponent does something completely random at each decision point? That would certainly keep me from zeroing in too much on the TTH opponent, but I'm not sure if it's good or not.
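One cheap way to implement that smoothing inside the tree search, rather than by sampling random actions, is to treat the modeled opponent as a mixture: weight its chosen branch by (1 - eps) and spread eps uniformly over all legal actions, then take the expectation over branches. A hypothetical sketch (this function is mine, not anything in TTH or mabots):

```python
def mixed_action_distribution(bot_action, legal_actions, eps=0.1):
    """Probability of each legal action when the modeled bot's choice
    is blended with uniform random noise. Hypothetical smoothing of
    the opponent model; eps=0 recovers the pure best-response search."""
    n = len(legal_actions)
    return {a: (1.0 - eps) * (a == bot_action) + eps / n
            for a in legal_actions}
```

The simulator would then recurse into every branch and weight the child EVs by these probabilities, so it can never put literally zero probability on a line the real opponent might take.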

In other words, if you have an AI A1, and you construct the optimal strategy that beats it, A2 = Sim(A1), then you can construct the optimal strategy that beats that, A3 = Sim(A2), etc. In theory this is a sequence that converges to a fixed strategy AN which is the game theoretic optimal strategy, that is AN+1 = AN (as N -> infinity) - the best strategy against AN is just AN. I'm not even sure this simulator approach will ever converge, or if it will just oscillate; certainly I've seen that for the first few steps in the sequence it can swing wildly. Presumably if I made a simulator that ran my simulator, it too would have very strange exploitable flaws.
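The oscillation worry is real: on rock-paper-scissors, the exact same sequence of pure best responses cycles forever and never reaches the (mixed) game-theoretic optimum. This is just the textbook best-response dynamic, illustrated in a few lines, not the poker simulator itself:

```python
# A_{n+1} = Sim(A_n) on rock-paper-scissors: the best response to a
# pure strategy is always another pure strategy, so the sequence
# cycles and never hits the mixed equilibrium (1/3, 1/3, 1/3).

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def best_response(strategy):
    return BEATS[strategy]

seq = ["rock"]
for _ in range(6):
    seq.append(best_response(seq[-1]))
# seq cycles: rock -> paper -> scissors -> rock -> ...
```

The standard fix for this in the game theory literature is fictitious play: best-respond to the *average* of all the opponent's past strategies rather than to the latest one, which in zero-sum games does converge to the optimum in the average.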
