03-08-06 [poker] - 2

03-08-06 [poker]

Poker is not some mysterious problem that's hard to solve. The perfect way to play poker can be formulated exactly : the ideal way to play hold'em . That, of course, is assuming you have a perfect model for how your opponent will act with given cards in a given situation. That is, if I have OpponentModel(board,hole,history), then I can run IdealPlayer[OpponentModel] and it will play as well as is possible by any means given those inputs. No human could ever possibly beat this IdealPlayer, assuming that I had an OpponentModel() which simulated exactly how they played. Note that IdealPlayer[] doesn't know their hole cards and doesn't know what cards will come in the future, it just knows what they would do in concrete situations.

As an aside, if you wanted the "game theoretical" optimal solution, you simply assume that the opponent plays the same style as you, and set OpponentModel = IdealPlayer[OpponentModel], and you have a recursive equation which you can solve. Of course you can't actually solve it, or maybe you can but it's a hell of a bitch and no one has done it yet. (this could also have multiple solutions, some of which are false solutions)

In practice, you actually can run this IdealPlayer[] on modern machines with a simulator sort of like IBM's Deep Blue. The problem is finding a good OpponentModel which is accurate and fast. OpponentModel gets called around 1 million times per decision, and ideally you could call it even more. That's not bad, but if you try to use another IdealPlayer for OpponentModel, then you get 1M*1M and it's out of control. So, you have to use some sort of cheaper heuristic CrappyOpponentModel, and that leads to problems. It means that IdealPlayer plays perfectly against CrappyOpponentModel, but that may not actually be very good play.

For example, CrappyOpponentModel might fold too much if you just bet like crazy. That will give you an IdealPlayer that's just a maniac, betting all the time. In order to train an IdealPlayer to really play great poker, you need an OpponentModel that's on a pretty high level. In terms of the standard levels of thought thing, if OpponentModel is on level N thinking, then IdealPlayer is on level N+1. The problem is that to take each step of levels you need to run the previous level about 1M times. So to do level N thinking you're O( (1million)^N ).

No comments:

old rants