12-08-07 - 3

Trivial way to do iterative game theory solutions : player A goes first. Try all possible moves for player A. Choose the move that optimizes his EV. Assume that player A knows that player B will be playing the best possible way for himself. For each move of player A's in the EV computation, simulate all possible moves, eg. go to player B's move and try all possible moves, assume player B knows player A's strategy but not his actual hand, and let player B choose the strategy that is best for him. (the branch is not actually on the # of moves, it's on the # of strategies, which is much larger).

Note that this does not necessarilly give you the correct game theory solution in all cases since it's a greedy search, but in simple games it will (games with a piecewise linear EV shape). It's also exponentially branching, but it's not actually that bad. The reason is you're not actually trying to find a full solution into the future, I just want player A's next move, then after he moves I'll do this again from scratch to find player B's next move. Player A's best move has a decreasing dependence on the future moves (in simple games anyway, this is crucial, I don't want moves N into the future to suddenly be more important than earlier moves). That is, A's best move is highly dependent on player B's next move, less so dependent on the next move and even less dependent on the next move. What that means is you only need to search a few moves ahead, and then you can just use some heuristic EV evaluation of the situation and terminate the branching. You also don't need to simulate branches that are obviously horrible for the person making the choice (actually even good branches which you can determine are definitely worse than some other branch can also be dropped).

In poker in particular you can usually stop the sim when the current round ends and just do a heuristic EV for the next round. eg. simulate all the possible player actions in the current round, but whenever somebody caps the betting or just calls, you simulate drawing a future card and just evaluate a heuristic EV based on the probabilities of improving. The heuristic EV still needs to be reasonably complex, it should include factors for position and being the leader, the fact that people on draws will put more money in if they hit but just check-fold if they miss, etc. but it doesn't need to simulate every possible action on all possible future cards.

No comments:

old rants