Selection Dynamics and Adaptive Behavior Without Much Information

John B. Van Huyck, Raymond C. Battalio, Frederick W. Rankin

October 2001

[ Download | Introduction | Conclusion | References | John's Web ]

Abstract: This paper investigates whether behavior in a coordination game changes when subjects are limited to the information used by reinforcement learning algorithms. In the experiment, subjects converge to an absorbing state at rates that are orders of magnitude faster than reinforcement learning algorithms, but slower than under complete information. Usually, this state is very close to a mutual best response outcome. All of the subjects are within a dime of giving a best response and 82.5 percent of the subjects gave a best response to the behavior of the other subjects in their cohorts without any information about their own best response function. The stability conditions derived from the best response dynamic are too conservative both under complete information and reinforcement information.

Key Words: Stability, Equilibrium Selection, Information, Reinforcement Learning, Adaptive Behavior.

JEL classification: C72, C92.

Acknowledgments: Rajiv Sarin found an error in our reinforcement learning simulations. Eric Battalio programmed the graphical user interface. The National Science Foundation and Texas Advanced Research Program provided financial support. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Texas Advanced Research Program.


Introduction

Economists usually assume that decision makers know the consequences of their actions, form rational expectations, and possess internally consistent preferences. We do this in order to predict how people will behave in novel situations. When the situation is strategic, the knowledge assumptions needed to deduce a prediction are even stronger. Yet, people often have to make decisions without much information and at least as often ignore information that is readily available.

For example, it is difficult to communicate a common knowledge description of a game to undergraduate students. Moreover, we sometimes doubt that subjects in an experiment are actually using this information to deduce a mutually consistent way to behave. As Vernon Smith (1990, p. 12) wrote, "Many years of experimental research have made it plain that real people do not solve decision problems by thinking about them in the way we do as economic theorists. Only academics learn primarily by reading and thinking. Those who run the world, and support us financially, tend to learn by watching, listening, and doing. ... When experiments approximate the predictions of theory, it is because subjects experience the choices of others and then choose based on what they have learned to expect."

Van Huyck, Cook, and Battalio (1994) report an experiment rejecting the stability conditions derived from the myopic best response dynamic and find that stability conditions based on relaxation algorithms with inertia make more accurate predictions. They were careful to communicate the best response function to their subjects, since this information seemed central to the theory being tested. Smith’s observation suggests that the subjects’ knowledge of the best response function doesn’t explain why the theory made accurate predictions. Our research hypothesis is that taking away information only used in a deductive analysis of the situation, like the best response function, will not influence behavior since subjects don’t use it anyway.
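
The difference between the two stability notions can be illustrated with a small numerical sketch. The best response function below, b(M) = T*M*(1 - M), the value T = 3.2, and the inertia weight 0.5 are assumptions chosen only for illustration; they are not taken from the experimental design. The myopic best response dynamic sets next period's median to b(M), while the relaxation dynamic with inertia moves only part of the way toward b(M).

```python
def best_response(m, t=3.2):
    """Hypothetical best response to the median m (illustrative logistic form)."""
    return t * m * (1.0 - m)


def iterate(m0, steps, alpha, t=3.2):
    """Iterate m <- (1 - alpha) * m + alpha * b(m).

    alpha = 1.0 gives the myopic best response dynamic;
    alpha < 1.0 gives a relaxation dynamic with inertia.
    """
    m = m0
    for _ in range(steps):
        m = (1.0 - alpha) * m + alpha * best_response(m, t)
    return m


T = 3.2
interior = 1.0 - 1.0 / T  # interior fixed point of b(M) = T*M*(1 - M)

myopic = iterate(0.5, 200, alpha=1.0)   # slope of b at the fixed point is 2 - T = -1.2
inertia = iterate(0.5, 200, alpha=0.5)  # slope of the relaxed map is 0.5 + 0.5 * (-1.2) = -0.1

print("interior fixed point:", interior)
print("myopic best response after 200 steps:", myopic)
print("relaxation with inertia after 200 steps:", inertia)
```

In this assumed example the slope of the best response function at the interior fixed point exceeds one in absolute value, so the myopic dynamic does not settle there, while the relaxed map has a slope well inside the unit interval and converges. This is the sense in which a stability condition derived from the myopic best response dynamic can be more conservative than one derived from a dynamic with inertia.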

Reinforcement learning algorithms only require that players know their feasible actions and respond to the consequences of their actual choices. These algorithms can also be used as selection theories, although convergence to a mutually consistent outcome is not guaranteed in general. This paper reports an experiment in which subjects are limited to the information used by reinforcement learning algorithms and contrasts the results with Van Huyck, Cook, and Battalio (1994).
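
As a rough illustration of how little information such an algorithm uses, the following sketch implements a Cross-style linear reinforcement update for a single decision maker. It assumes payoffs normalized to lie between 0 and 1, and the three-action payoff vector is hypothetical; the function names are ours and the sketch is not the simulation code used in the paper. The player never sees a payoff function, only the payoff of the action actually chosen.

```python
import random


def cross_update(probs, chosen, payoff):
    """Cross-style update; payoff is assumed to be normalized to [0, 1].

    The chosen action's probability moves toward 1 in proportion to the
    payoff received; every other probability shrinks by the same factor.
    """
    return [
        p + payoff * (1.0 - p) if action == chosen else p * (1.0 - payoff)
        for action, p in enumerate(probs)
    ]


def simulate(payoffs, periods, seed=0):
    """Repeatedly sample an action, observe its payoff, and update."""
    rng = random.Random(seed)
    probs = [1.0 / len(payoffs)] * len(payoffs)
    for _ in range(periods):
        chosen = rng.choices(range(len(payoffs)), weights=probs)[0]
        probs = cross_update(probs, chosen, payoffs[chosen])
    return probs


# Hypothetical fixed payoffs for three actions (illustration only).
one_step = cross_update([0.25, 0.25, 0.5], chosen=0, payoff=0.4)
final_probs = simulate([0.2, 0.5, 0.9], periods=50)
print(one_step)
print(final_probs)
```

The update uses only the feasible action set and the realized payoff, which matches the information restriction described above. In a game the payoff to each action would also depend on the other players' choices, which this single-agent sketch leaves out.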

The reinforcement learning algorithms considered below do not accurately predict observed behavior under reinforcement information conditions. Humans converge to the interior equilibrium at speeds that are orders of magnitude faster than the models. Observed behavior under the complete information and reinforcement information treatments does differ in the length of time it takes to converge to a mutual best response outcome. However, all cohorts converged to the market statistic predicted by the interior equilibrium regardless of the information conditions or the stability conditions derived from the myopic best response dynamic. Average subject behavior did increase in exactly the way predicted by a conventional comparative static analysis.

The paper is organized as follows: Section II reviews Van Huyck, Cook, and Battalio’s (1994) analytical framework and the Cross Dynamic studied in Borgers, Morales, and Sarin (2001); Section III reports the experimental design; Section IV reports the experimental results; Section V estimates an empirical model of satisfaction; Section VI compares the complete and reinforcement information treatments; Section VII concludes with a discussion of low cognitive game theory and the problem of modeling imagination in light of our results.


Download

Surface mail request (comments, suggestions, references, etc.): john.vanhuyck@tamu.edu


Conclusion

In the experiment, subjects converge to an absorbing state at rates that are orders of magnitude faster than reinforcement learning algorithms. These states are always very close to a mutual best response outcome and the terminal median is always exactly equal to the interior equilibrium median. The interior equilibrium is behaviorally stable. Given a theory that selects the interior equilibrium, standard comparative static arguments accurately predict that increasing the tuning parameter, T, increases individual effort, e_i, and median effort, M.

All of this is true under both the complete and reinforcement information treatments. In that sense, the experiment does not contradict the hypothesis that taking away information used only in a deductive analysis of the situation will not influence behavior, since subjects don’t use it anyway. However, the information treatment does influence behavior in a subtle but statistically significant way. Under the reinforcement information treatment, it takes longer for the median to converge to the interior equilibrium median, the process is noisier, and more subjects fail to give a best response to the median in the terminal period.

A careful analysis of the data reveals that random search models of reinforcement learning, like Erev and Roth (1995) or the closely related Cross Dynamic, do not accurately describe behavior even when subjects are restricted to reinforcement information. Specifically, our subjects are able to search the action space much more efficiently than the random-search-reinforcement-learning analysis allows. Our subjects do better even under information conditions that favor the reinforcement learning algorithm. It appears to us that human cognition is not well described by either the choice-theoretic analysis or the random-search-reinforcement-learning analysis.

Sarin and Vahid (2001) propose a payoff assessment model that takes account of the similarity amongst strategies to explain the data reported in this paper. Their payoff assessment model fits remarkably well and is more successful than the original Cross model and versions that include similarity or declining step size. Chen and Khoroshilov (2001) use the data reported in this paper to compare the payoff assessment model to a version of the experience-weighted attraction learning model and a version of the relative payoff sum model. They also find that the payoff assessment model fits the data best.


References

Aumann, Robert and Adam Brandenburger, "Epistemic Conditions for Nash Equilibrium," Econometrica 63(5), September 1995, 1161-80.

Baumol, William J. and Jess Benhabib. "Chaos: Significance, Mechanism, and Economic Applications." Journal of Economic Perspectives 3(1) Winter 1989, 77-105.

Boldrin, Michele and Michael Woodford. "Equilibrium Models Displaying Endogenous Fluctuations and Chaos: A survey." Journal of Monetary Economics 25 1990, 189-222.

Borgers, Tilman, Antonio J. Morales, and Rajiv Sarin, "Expedient and Monotone Learning Rules," laser-script, July 2001.

Bray, M. "Learning, Estimation, and the Stability of Rational Expectations." Journal of Economic Theory 26, 1982, 318-39.

Bush, Robert and Frederick Mosteller, Stochastic Models for Learning (New York, NY: Wiley, 1955).

Conover, W.J. Practical Nonparametric Statistics. 2nd ed. (New York, NY: John Wiley & Sons, 1980).

Chen, Yan, and Yuri Khoroshilov, "Learning Under Limited Information," laser-script, October 2001.

Cross, J.G., "A Stochastic Learning Model of Economic Behavior," Quarterly Journal of Economics, 87, 1973, 239-66.

Eckalbar, John C. "Economic Dynamics." In Economic and Financial Modeling with Mathematica, edited by H. Varian. (New York, Springer-Verlag, 1993).

Erev, Ido and Alvin E. Roth, "On the need for low rationality, cognitive game theory: Reinforcement learning in experimental games with unique, mixed strategy equilibria", laser-script August 1995.

Lucas, Robert E., Jr. "Adaptive Behavior and Economic Behavior." In Rational Choice: the contrast between economics and psychology, edited by R. Hogarth and M. Reder. (Chicago, University of Chicago Press, 1987).

McAllister, Patrick H., "Adaptive Approaches to Stochastic Programming," Annals of Operations Research, 30, 1991, 45-62.

Milgrom, Paul and John Roberts. "Adaptive and Sophisticated Learning in Normal Form Games." Games and Economic Behavior 3(1) February 1991, 82-100.

Possajennikov, Alexandre, "An Analysis of a Simple Reinforcing Dynamics: Learning to Play an ‘Egalitarian’ Equilibrium," laser-script, January 1997.

Rassenti, Stephen, Stanley S. Reynolds, Vernon L. Smith, and Ferenc Szidarovszky, "Learning and Adaptive Behavior in Repeated Experimental Cournot Games," laser-script, October 1993.

Roth, Alvin E., and Ido Erev, "Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term," Games and Economic Behavior 8(1), January 1995, 164-212.

Sargent, Thomas J. Bounded Rationality in Macroeconomics. (Oxford, Clarendon Press, 1993).

Sarin, Rajiv, "Learning Through Reinforcement: The Cross Model," laser-script March 1995.

Sarin, Rajiv, and Farshid Vahid, "Strategic Similarity and Coordination," laser-script July 2001.

Schlag, K., "A Note on Efficient Linear Learning Rules," laser-script 1994.

Selten, Reinhard, "The Chain Store Paradox," Theory and Decision 1978; reprinted in Models of Strategic Rationality (Dordrecht, Kluwer Academic Publishers, 1988).

Smith, Vernon L., "Experimental Economics: Behavioral Lessons for Microeconomic Theory and Policy," 1990 Nancy L. Schwartz memorial lecture, Kellogg Graduate School of Management, Northwestern University.

Van Huyck, John B., Raymond C. Battalio, and Richard O. Beil, "Tacit Coordination Games, Strategic Uncertainty, and Coordination Failure," American Economic Review, 80, 1990, 234-48.

Van Huyck, John B., Raymond C. Battalio, and Richard O. Beil. "Strategic Uncertainty, Equilibrium Selection, and Coordination Failure in Average Opinion Games." The Quarterly Journal of Economics Vol. CVI, No. 426, August 1991: 885-910.

Van Huyck, John B., Joseph P. Cook and Raymond C. Battalio. "Adaptive Behavior and Coordination Failure," Journal of Economic Behavior and Organization, 32, 1997, 483-503.

Van Huyck, John B., Joseph P. Cook and Raymond C. Battalio, "Selection Dynamics and Adaptive Behavior," Journal of Political Economy 102(5), 1994, 975-1005.
