Computing Robust Counter-Strategies


Michael Johanson, Martin Zinkevich and Michael Bowling. Computing Robust Counter-Strategies. In Advances in Neural Information Processing Systems 20 (NIPS 2007), pages 721-728, 2008.


Abstract

Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counter-strategy to the inferred posterior of the other agents’ behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents’ expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold’em. We show that the computed poker strategies are substantially more robust than best response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm showing a dramatic improvement in performance over using simple best responses.


Notes

This paper introduces Restricted Nash Response. Traditionally, there are two kinds of strategy to use against an opponent. You can use a Nash equilibrium, which minimizes its own exploitability, but makes no attempt to exploit an opponent's mistakes and so doesn't maximize its expected value when the opponent is weak. Alternatively, if you know the opponent's strategy, you can compute and use a best response counter-strategy, which maximizes its winnings against that opponent. However, best response counter-strategies are brittle: they are highly exploitable, and they can fail badly even against other weak opponents that are not even trying to exploit them.


Restricted Nash Response (RNR) is an algorithm that gives us a compromise between these goals. By setting a single parameter p between 0 and 1, we can produce counter-strategies that trade off between minimizing our own exploitability and maximizing our exploitation of a known opponent. Concretely, the counter-strategy is found by solving a modified game in which the opponent is forced to play their known strategy with probability p and is free to play arbitrarily with probability (1 - p). Further, Restricted Nash Response provides the best possible set of tradeoffs: it is not possible for a different counter-strategy to be both less exploitable and more exploitive.
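
To make the tradeoff concrete, here is a minimal sketch of that restricted game on a toy matrix game (rock-paper-scissors), rather than the large extensive-form poker games and equilibrium solvers used in the paper. The function name rnr_matrix_game, the biased opponent model y_fix, and the use of a linear program are all illustrative assumptions for this small setting.

import numpy as np
from scipy.optimize import linprog

def rnr_matrix_game(A, y_fix, p):
    """Row player's restricted Nash response in a zero-sum matrix game.

    A     : payoff matrix for the row player (row i vs column j).
    y_fix : modelled (fixed) strategy of the column player.
    p     : probability the opponent is forced to follow y_fix.
    """
    n_rows, n_cols = A.shape
    # Variables: the row strategy x_1..x_n and v, the value achieved against
    # the opponent's free (unrestricted) portion of play.
    # Objective: maximize  p * x^T A y_fix  +  (1 - p) * v.
    c = np.concatenate([-p * (A @ y_fix), [-(1.0 - p)]])  # linprog minimizes
    # Worst case over the free portion: v <= (x^T A)_j for every column j.
    A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    # x must be a probability distribution.
    A_eq = np.concatenate([np.ones(n_rows), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_rows]

# Rock-paper-scissors, against an opponent model that plays rock far too often.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])
y_fix = np.array([0.8, 0.1, 0.1])
for p in (0.0, 0.5, 1.0):
    print(p, np.round(rnr_matrix_game(A, y_fix, p), 3))

With p = 0 this recovers the Nash equilibrium (uniform and unexploitable), with p = 1 it recovers the best response (always paper, maximally exploitive but also maximally exploitable), and intermediate values of p produce strategies in between.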


While Restricted Nash Response provides this best possible set of counter-strategies when the opponent's strategy is known exactly, it is less effective when we only have partial information about their strategy - for example, observations of the opponent playing the game. In this paper we present a frequentist approach that builds an opponent model from such observations and then computes an RNR counter-strategy to that model, but we later found that this approach may not be effective in practice. For a more practical opponent modelling approach that can be applied when you do not know the opponent's entire strategy, read our Data Biased Response paper.
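
The frequentist modelling step can be sketched as follows: estimate the opponent's action distribution at each information set from observed frequencies, then compute an RNR counter-strategy to the estimate (for example with a solver like rnr_matrix_game above). This is only an illustration of the general idea, not the implementation from the paper; the function name frequentist_model and the pseudo-count prior are invented for the example.

from collections import Counter, defaultdict

def frequentist_model(observations, actions, prior=1.0):
    """Estimate an opponent strategy from observed play.

    observations : iterable of (infoset_key, action) pairs from watched games.
    actions      : the legal actions (assumed to be the same everywhere here).
    prior        : uniform pseudo-count so rarely seen infosets do not get
                   degenerate zero/one probabilities.
    Returns a dict mapping each observed infoset to an action distribution.
    """
    counts = defaultdict(Counter)
    for infoset, action in observations:
        counts[infoset][action] += 1
    model = {}
    for infoset, counter in counts.items():
        total = sum(counter.values()) + prior * len(actions)
        model[infoset] = {a: (counter[a] + prior) / total for a in actions}
    return model

# Example: three observed decisions at the same (hypothetical) infoset.
obs = [("preflop:Ks", "raise"), ("preflop:Ks", "raise"), ("preflop:Ks", "call")]
print(frequentist_model(obs, actions=["fold", "call", "raise"]))

As noted above, a model built this way can be badly wrong at infosets we have rarely or never observed, which is exactly the weakness that Data Biased Response addresses.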


Links

  • ACPC Forums thread where I can answer questions about this paper.
  • Data Biased Response, a followup paper that presents a practical algorithm that can be used when we only have observations of the opponent's actions, and not their complete strategy.
  • My MSc Thesis, which summarizes the CFR and RNR papers.

BibTeX

@incollection{2007-nips-rnash,
 title = {Computing Robust Counter-Strategies},
 author = {Michael Johanson and Martin Zinkevich and Michael Bowling},
 booktitle = {Advances in Neural Information Processing Systems 20},
 editor = {J.C. Platt and D. Koller and Y. Singer and S. Roweis},
 publisher = {MIT Press},
 address = {Cambridge, MA},
 pages = {721--728},
 year = {2008}
}