Defined variously as the science of strategy [4], and as the study of conflict and cooperation between rational decision-makers [8], the theory of strategic games essentially embodies an analytic method for understanding and codifying both the structures of conflict and the dynamic interactions shaping behaviours. Although concepts of conflict and cooperative interaction extend back to ancient Babylon—as preserved by the Talmud [1]—the mathematical study of game interactions is a relatively recent undertaking, emerging in the first part of the 20th century. Since the 1950s, the study of games has extended its applications first across the field of economics, then to dynamics of warfare and political science, to biology, and finally, to the interdisciplinary field of decision science. This article provides a succinct survey of both the history and the concepts of game theory relevant to the field of decision science. Particularly significant to this field are the contemporary branches of evolutionary game theory and stochastic game theory, and their emerging applications in supply-chain and innovation management, and in multi-agent responsive learning.

### Introduction

Ernst Zermelo’s (1871-1953) article, Uber eine Anwendung der Mengenlehre auf die Theorie des Schachspiels (1913), is generally recognized to be the first formal theorem in the theory of games [10]. In his existence theorem, Zermelo applied Luitzen Brouwer’s newly-established fixed-point theorem of topology (1909) to the game of chess and developed a proof for the existence of a winning strategy in a non-cooperative game of perfect information. In selecting the game of chess, Zermelo restricted the parameters of his inquiry to that of the finite game—where there exists a finite number of moves, and a finite number of choices available at each move—of infinite sequences (i.e. the assumption of no stopping rules allows for an infinite number of games). Within the non-cooperative, two-person, zero-sum game of chess, players meet with strictly opposing interests for which cooperation never benefits either player and for which there are no chance moves. Like a game of perfect information, chess players move in an alternating sequence and maintain awareness of all of the actions made until that point, and all of the future actions available to either player.

In framing this proof, Zermelo sought to answer the following two questions: (1) What does it mean, mathematically, to be in a winning position; and (2) if in a winning position, can the number of moves forcing a win to be determined? To this first question, Zermelo defined the following two sets: The first set contained all possible sequences of moves such that player A would win independently of how player B played; the second set contains all possible sequences of moves such that a player could postpone a loss for an infinite number of moves [11]. If the first set was empty, the player’s best possible outcome would be a draw. If the second set was empty, the player could not draw, but would only avoid a loss for a finite number of moves (assuming the opponent “forced a win”). Because there remained a possibility that both sets were empty, the “first mover” thus could not guarantee that they would not lose. To the second question, Zermelo proved by contradiction that the number of moves needed to force a win for a player in a winning position will never take more moves than there are positions in the game; therefore, an optimal strategy exists at each stage of the game (2001).

Similarly, between 1921 and 1938, Émile Borel (1871-1956) published several works in which he formulates mathematical theorems investigating the existence of optimal stratagem in non-cooperative games. In particular, Borel explores the existence of optimal “méthods de jeu” in non-cooperative games with imperfect information—e.g. bluffing. In 1921, he proved a minimax theorem for two-person zero-sum games for asymmetric payoff matrix—to this, however, he incorrectly assumes a restriction on the number of “best strategies” for which there is a minimax equilibrium. Perhaps his most significant contribution to the awakening field, Borel’s Algebre et calcul des probabilities (1927) presents a premier insight into the concept of mixed strategies, a probability distribution of “right” moves for each player. Although Borel extends his 1938 inquiry into game strategy beyond the two-person zero-sum model to one of the non-cooperative games in general, Borel’s models of poker present somewhat unrealistic situations, particularly in the assumptions that a player must fold if they do not bet—realistically, a player may “call”—and that a player bluffs with the best of his non-bet hands—realistically, a player bluffs with their worst hands.

Simultaneously, yet independently from Borel, mathematician John von Neumann set out to prove the existence of optimal mixed strategies for any finite extensive game (that is, non-cooperative games of imperfect information, in which players move sequentially). In his 1928 Theorie der Gesellschaftsspiele—“theory of games of strategy”—von Neumann sought to answer the question of how player interactions shape player behaviours. To do this, von Neumann mathematically first defined the normal-form general concept of the “strategic game”, and then presented a primary version of his now-infamous Minimax theorem, applied to a narrower scope of two-person zero-sum games. This minimax theorem, which he based on Brouwer’s fixed-point theorem and proved by “backward induction” states simply that, for any normalized two-person, non-cooperative game, there exists an optimal expected value, and for each player, an optimal mixed strategy, from their own rational choosing, for maximizing the payoff to themselves and simultaneously minimizing that of their competition [6]. Thus, the outcome of the game is determined by the independent strategies employed by its players (as in Zermelo’s chess game).

In 1944, von Neumann co-authored with economist Oscar Morgenstern on the compendious Theory of Games and Economic Behavior, which presented a simplified proof of the Minimax theorem proving the existence of an optimal mixed strategy for each player of a normal-form game. Based on Jean Ville’s 1938 proof connecting the existence of the minimax to the solvability of systems of linear equalities, the 1944 Minimax theorem thrust beyond the sphere of two-person, non-cooperative games to consider cooperative games involving multiple players. Here, von Neumann assumed player interdependence based on some, non-monetary value, where players coordinate strategies via coalitions [8]. In this work, Von Neumann and Morgenstern provided an axiomatic theory of expected utility, which corrected his prior assumptions that players play games in order to maximize expected monetary payoff by introducing the concept of an expected, non-monetary “utility” (e.g. player satisfaction). With the concept of expected utility, von Neumann and Morgenstern expanded the definition of rational decision-making to include situations containing elements of change (e.g. lotteries). In their model of poker, though similar in form to that of Borel, von Neumann and Morgenstern more correctly assume that a player may either bet, fold or call at each stage and that a player bluffs with their worst hands.

Five years later, recognizing von Neumann’s incongruent assumptions of payer independence and interdependence, John Nash Jr. (1928 – 2015) utterly dismantled the whole of game theory in his PhD thesis, Non-cooperative Games. Between 1948 and 1949, Nash proposed first, a revolutionary two-person bargaining theory uniquely assuming non-transferable utility, and then a formal proof demonstrating the existence of equilibria in any extensive game [8]. Then, in 1951, he published an existence proof, which succeeded in generalizing the existence of an optimal equilibrium across all games, both non-cooperative and cooperative. In removing the prior assumptions of transferable utility and zero-sum restriction, Nash transformed von Neumann’s game theory into an analytical structure for all situations of conflict and cooperation. Together, von Neumann’s normal form for extensive games and Nash’s general equilibrium theorem continues to frame both the problem and solution concepts of modern game theory, respectively.

#### Contemporary Applications

Since the 1950s, succeeding applications of game theory find relevance across an ever-widening range of fields, giving rise to distinct applications, such as stochastic game theory and evolutionary game theory. Particularly significant within the field of decision analytics, stochastic game theory finds diverse applications within the fields of economics, computer science, and machine learning. Special cases of the stochastic game model find particular relevance to the topic of learning in the field of artificial intelligence for both single-agent and multi-agent decision-support systems. From biology to medicine, to social science and into operations research, evolutionary game theory increasingly extends its utility to understanding non-rational behaviours. New applications of evolutionary game theory to the management of strategic innovation and to supply-chain sustainment provide frameworks for strategic decisions affected by non-rational influences.

#### Stochastic Game Theory

Originally defined by Lloyd Shapely in 1953, stochastic game theory considers both non-cooperative and cooperative games over time. Uniquely, however, stochastic game theory considers the game as proceeding along different stages, allowing for games of both fixed and un-fixed determinacy. Supporting this expanded generality is the definition that, for each stage, there exists a finite number of states, for each of which there are a finite number of available strategies. At a given state, the players select their strategies simultaneously, rather than sequentially, implying the existence of imperfect information. Both the state and the selected strategy together determine the payoff for each player, probabilistic ranking the likelihood of moving to subsequent states. These probabilities transition along with each succeeding state and the total payoff to the player is the average of the payoffs received at each stage. Special cases of the stochastic game include the single-state game—which reduces to a repeated game—and the single-player game—which reduces to a Markov decision process.

In Coordinating Many Agents in Stochastic Games (2012), Ana Bazzan investigated the application of stochastic game theory to machine learning as a method to understanding the problem of multi-agent coordination across multiple states. Exponentially more complex than single-agent reinforcement learning, multiple-agent, joint-action learning poses a particular challenge, even among stochastic game approaches: viz., the “combinatorial explosion” in the space of joint states and actions may imply multiple equilibria, possibly in mixed strategies [3]. While previous research manages this complexity through the definition of specific assumptions—such as assuming perfect monitoring at the agent-level, or assuming an a priori coordination of agent learning processes—they do so as special single-state or single-state cases of the stochastic game. Thus, these assumptions fail to scale up as an increased number of agents are considered, and are often inefficient in higher dimensions. Bazzan approached this multi-agent problem of (potential) non-optimality through the introduction of a hierarchical structure of joint learning under supervision. This structure seeks to exploit the utility of joint-agent learning, which does not assume an agent’s prior knowledge, yet maintains an omnipotent awareness of payoffs across multiple agents in multiple states. By grouping multiple low-level “joint” agents under the supervisory control of a reduced number of high-level “individual” agents, Bazzan balances the need for the stochastic game to maintain the assumption of perfect monitoring with the need for efficiency for a high number of agents.

In A Novel Multiagent Reinforcement Learning Algorithm Combination with Quantum Computing (2016), Xiangping Meng considered the application of quantum computing to the efficiency problem of the multi-agent, multi-state stochastic game. As Meng noted, the problem of multi-agent learning behaviour is crucial to developing and adapting multi-agent systems. Historically, reinforcement learning succeeds in finding optimal control policies for single-agents in a stable environment; the application of stochastic game theory seeks to expand this benefit to multi-agent systems in multi-state environments [7]. However, as previously indicated, the combinatorial complexity of additional agent interactions introduces significant computational inefficiency. Because quantum computing is capable of processing incredible numbers of quantum states in parallel, it theoretically lends itself as a powerful action-selection method in multi-agent reinforcement learning. To test this theory, Meng developed a multi-agent quantum reinforcement learning algorithm, which was applied to a two-player cooperative game. Their results indicated a meaningful computational advantage in the combination of reinforcement learning and quantum computing to enhancing the speed of system learning, warranting further study.

Finally, and perhaps equally fascinating to the reader, one recent study approaches the problem of stochastic game equilibria of multi-agent learning through particle swarm optimization (PSO). With applications in neural network training, data clustering, mobile robot path-finding and optical communication systems, PSO considers the behaviour of communities displaying both individual and social behaviours [13]. In One Swarm per Queen: A Particle Swarm Learning for Stochastic Games, Tcheukam and Tembine propose and assess a model-less strategic learning scheme based on particle swarm collaboration, in the context of a stochastic environment. Although they successfully demonstrated the utility of PSO for the purpose of quickly approximating optimal equilibria of stochastic games, Tcheukam and Tembine do so within the parameters of continuous action space and identically distributed states, and with the familiar restriction to two-agent interaction.

#### Evolutionary Game Theory

Evolutionary game theory removes from prior game models the assumption of rational decision-making and instead considers the interaction of genetically determined forms of behaviour within population groups. The evolutionary model provides an analytical tool for examining the fitness (i.e. payoff) of an organism, whose genes (i.e. players) genetically determine their behaviours (i.e. strategies). In a fixed environment, evolution by natural selection increases organism fitness; however, natural selection in variable environments introduces the element of mutation, which causes organism fitness to decrease over time [5]. It is in the domain of the variable environment that evolutionary game theory shines like a beacon, illuminating the existence and nature of evolutionarily stable strategies (i.e. equilibria) between competing dynamics. As formally articulated by scientists John Maynard Smith and George Robert Price in 1973, an evolutionarily stable strategy (ESS) is a behavioural strategy such that, if an entire population adopts it, no alternative strategy can invade. Additionally, as developed by Peter Taylor and Leo Jonker in 1987, evolutionary game theory lends itself as an approach to understanding the properties of the evolving environment through the analysis of the frequency of changes to equilibria of competing dynamics, over a fixed range of time.

In Application of Evolutionary Game Theory to Strategic Innovation (2016), researchers Ozkan-Canbolat, Beraha, and Bas examined the existence of optimal equilibria for each of two competing innovation strategies: rational adoption, or irrational adaptation. It is well understood that strategic innovation affects completive advantage. However, most frameworks shaping the decision of whether to adopt an emerging innovation assume the interaction of rational factors, such as benefits vice costs. What is not well understood is how irrational strategies shape an organization’s decision of whether to adapt to industry changes. Analogous to selection pressures that shape an organism’s fitness, the “fad” theory of strategic innovation assumes the existence and influence of bandwagon as an irrational strategy influencing an organization’s fitness within its environment [10]. Bandwagon pressure may cause an organization to adopt an innovation that they would otherwise assess to be inefficient or to reject an emerging technology that they would otherwise assess to be efficient. Inversely, counter-bandwagon pressure may cause an organization to reject a widely distributed technology regardless of its measure of efficiency, and precisely as a result of the sheer number of adaptors. This application of game theory to the question of adoption strategies lends itself to the body of management research by questioning the bias inherent in the assumption that the adoption of new innovation is always good, and instead considers the organization’s decision of whether or not it is good to reject an emerging innovation.

Similarly, in their study, researchers Babu and Mohan approached the question of holistic supply-chain sustainment through the application of evolutionary game theory. Recognizing the gaps inherent in a body of research devoted to “green” supply-chain management and optimization, Babu and Mohen considered the equilibria between social, economic, cultural/governance, and environmental dimensions to model the sustainability of the supply chain [2]. Modelled as an evolutionary game, the Babu and Mohan definition of sustainment lends itself generally to all supply chains, regardless of the number of interacting dimensions. This framework not only supports the identification of sustainability for a particular supply chain but also depicts how that system moves into equilibrium over a long, finite period of time. In understanding these long-range patterns, this application of the evolutionary game reveals the cascading effects that even trivial choices affect on the equilibrium of the supply chain. The effects of future decisions on the sustainability of the supply chain may, therefore, be predicted.

#### Discussion

Within the emerging field of decision science and data analytics, “black-box” algorithms are of increasing concern. Unlike decision trees or regression analysis, the application of neural networks comprising the study of “deep” machine learning are not fully transparent. In considering the increased role of machine learning within decision-support systems, the need for transparency is essential to the development of trust between the human decision-maker and the “cognitive” agent. The application of game theory to machine learning and AI may help to resolve this “black-box” dilemma by shedding light on how the algorithm is reaching a certain output. Particularly with the evolution of reinforcement learning over time, the illuminated black box lends itself to human and/or agent intervention and correction.

Another problem within the application of machine learning is the need to simultaneously optimize multiple dynamics interacting on a system and to propose recommendations in near-real-time. Real-time analytics, though increasingly possible, place a high computational demand on an information system. The transformation of oceans of data into streams of relevant information requires increased processing efficiency as well as more effective data mining methods. To manage the problem of complexity, approaches to problems of optimization restrict themselves to one or two-directional axes. Here, too, the framework of game theory lends valuable insights into reducing the computational complexity a system, while also increasing the capacity of that system to consider multiple, competing dynamics.

**References**

[1] Aumann, R. J., & Maschler, M. (1985). Game-theoretic analysis of a bankruptcy problem from the Talmud. Journal of Economic Theory, 36(2), 195-213. doi:10.1016/0022-0531(85)90102-4

[2] Babu, S., & Mohan, U. (2017). An integrated approach to evaluating sustainability in supply chains using evolutionary game theory. Computers & Operations Research. doi:10.1016/j.cor.2017.01.008

[3] Bazzan, A. L. (2012). Coordinating many agents in stochastic games. 2012 International Joint Conference on Neural Networks (IJCNN), 1. Retrieved September 7, 2017, from __http://proxy1.ncu.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=edb&AN=86631948&site=eds-live__

[4] Dixit, A., & Nalebuff, B. (n.d.). Game theory. In The Concise Encyclopedia of Economics. Retrieved September 7, 2017, from __http://www.econlib.org/library/Enc/GameTheory.html__

[5] Easley, D. (2010). Evolutionary game theory. In J. Kleinberg (Author), Networks, crowds and markets: Reasoning about a highly connected world (pp. 209-227). Cambridge University Press. Retrieved July 14, 2017, from __http://www.cs.cornell.edu/home/kleinber/networks-book/__

[6] Kjeldsen, T. H. (2001). John von Neumann’s conception of the minimax theorem: A journey through different mathematical contexts. Archive for History of Exact Sciences, 56(1), 39-68. doi:10.1007/s004070100041

[7] Meng, X., Chen, Y., Pi, Y., & Yuan, Q. (2006). A novel multiagent reinforcement learning algorithm combination with quantum computation. 2006 6th World Congress on Intelligent Control and Automation. doi:10.1109/wcica.2006.1712835

[8] Myerson, R. B. (1991). Game Theory Analysis of Conflict. Cambridge, MA: Harvard University Press.

[9] Myerson, R. B. (1999). Nash equilibrium and the history of economic theory. Journal of Economic Literature, 37(3), 1067-1082. doi:10.1257/jel.37.3.1067

[10] Ozkan-Canbolat, E., Beraha, A., & Bas, A. (2016). Application of evolutionary game theory to strategic innovation. Procedia – Social and Behavioral Sciences, 235, 685-693. doi:10.1016/j.sbspro.2016.11.069

[11] Schwalbe, U., & Walker, P. (2001). Zermelo and the early history of game theory. Games and Economic Behavior, 34(1), 123-137. doi:10.1006/game.2000.0794

[12] Sundar, D. K., & Ravikumar, K. (2014). An actor-critic algorithm for multi-agent learning in queue-based stochastic games. Neurocomputing, 127, 258-265. doi:10.1016/j.neucom.2013.07.020

[13] Tcheukam, A., & Tembine, H. (2016). One swarm per queen: A particle swarm learning for stochastic games. In 2016 IEEE 10th International Conference on Self-Adaptive and Self-Organizing Systems (SASO) (p. 144). Ieee. doi:10.1109/SASO.2016.22