Game theory


 * The Essay topic: Game Theory
 * Class:4IT496 Simulation of Systems (WS 2014/2015)
 * Author: Kristýna Gubišová

The beginnings of mathematical game theory date to 1944, when the book Theory of Games and Economic Behavior by the mathematician John von Neumann and the economist Oskar Morgenstern was published by Princeton University Press. Game theory is a discipline of applied mathematics that analyzes the spectrum of decision situations that may occur wherever there is a conflict of interests. Game theory models try to analyze these conflict situations by building a mathematical model of the conflict and computing the best strategies for the specific parties to it.

Game theory is a branch of (originally applied) mathematics, used mostly in economics and political science and to some extent in biology, that gives us a mathematical taxonomy of social life: it predicts what people are likely to do, and believe others will do, in situations where everyone's actions affect everyone else. That covers a lot of things: competition, cooperation, bargaining, and games such as hide-and-seek or poker.

=Basic concepts=

The basis of most mathematical models in game theory is the assumption of rationality: every player acts so as to maximize his profit (his winnings).
 * Based on stable preferences, the player sets objectives and chooses the strategies most likely to achieve those objectives.
 * The player is confronted with a number of situations and is able to order them by preference, from the most advantageous to the least advantageous.

The assumption of rationality (of each player) is the factor that distinguishes game theory from decision theory.

This ordering must be complete, i.e. it must cover all situations, and transitive, i.e. if a player prefers situation A to situation B and situation B to situation C, then he must prefer situation A to situation C. From the preferences over the situations, the player's utility function is derived. The sole objective of the player is then to maximize the value of his utility function.

=Games=

All the games that we are considering in this chapter have certain things in common:

 * A finite set of players is assumed (the players may be people, groups of people holding the same opinion, or more abstract entities such as computer programs, “nature” or “the house”). The least possible number of players is 2.


 * The number of available strategies may be finite or infinite. With infinite strategy sets, the timing of moves plays a role. (Here we consider games with a finite number of strategies.)


 * By payoff type, games are divided into constant-sum and non-constant-sum games.
 * In constant-sum games, the sum of the payoff functions (winnings) of all players is the same for every choice of strategies. A special case are zero-sum games, in which what one player wins the other must lose (e.g. chess, tic-tac-toe, the popular Chinese game Go).
 * In non-zero-sum games, a win for one player does not necessarily mean a loss for another (e.g. the prisoner's dilemma, the battle of the sexes, politics).


 * The game ends after a finite number of moves.
 * Strategic games assume that the players make one move or decision at the same time (e.g. rock-paper-scissors, the prisoner's dilemma).
 * Sequential games are based on sequences of moves in which the players take turns (e.g. chess, tic-tac-toe).


 * According to the available information, there are:
 * Games with complete information, in which each player has complete knowledge of the rules of the game (e.g. chess).
 * Games with incomplete information, in which some information is hidden from the players (e.g. poker, Bayesian games).


 * By player cooperation:
 * In cooperative games, players can form coalitions among themselves or negotiate.
 * In non-cooperative games it is not possible to form coalitions or to negotiate.


 * At different points in the game, each player has a range of choices or moves. This set of choices is finite.


 * After the game ends, each player receives a numerical payoff. This number may be negative, in which case it is interpreted as a loss of the absolute value of the number. For example, in a game like chess the payoff for winning might be +1, for losing -1, and for a draw 0.

In addition, the following are properties which a game may or may not have:
 * There may be chance moves. In a card game, the dealing of the hands is such a chance move. In chess there are no chance moves.
 * In some games, each player knows, at every point in the game, the entire previous history of the game. This is true of tic-tac-toe and backgammon but not of bridge (because the cards dealt to the other players are hidden). A game with this property is said to be of perfect information. Note that a game of perfect information may have chance moves; backgammon is an example, because a die is rolled at points in the game.

==Normal form games==

A game in normal form is defined by the set $$\{ Q, X_1, ..., X_n, u_1 (x_1,..., x_n),..., u_n (x_1,..., x_n) \}$$

where $$Q = \{1,..., n\}$$ is the set of players, $$X_1,...,X_n$$ are the sets of strategies and $$u_i(x_1,..., x_n)$$ is the winning of player $$i$$ for the individual combinations of strategies. A normal form game is usually represented as a matrix.

A game for two players in normal form is shown below:

$$player\ 2 - \begin{matrix}t_1 & t_2 & \cdots & t_t \end{matrix}\\ player\ 1\ \begin{matrix}s_1\\s_2\\\vdots\\s_k\end{matrix} \begin{pmatrix}u_{11} & u_{12} & \cdots & u_{1t} \\ u_{21} & u_{22} & \cdots & u_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ u_{k1} & u_{k2} & \cdots & u_{kt} \end{pmatrix}$$

Here player number 1 chooses from strategies $$s_1,..., s_k$$ and player number 2 chooses from strategies $$ t_1,..., t_t$$. Under the assumption that this is a zero-sum game, the following applies:

$$u_1(s_i,t_j) = -u_2(s_i,t_j)$$

and therefore it is enough to write the value of the prize (utility function) for one player only.

Note: according to the definition of the state space, ''$$s_k$$ is the follower (successor)'' and ''$$s_i$$ is the forerunner (predecessor)''.

Example of the notation of an unspecified game:

$$player 2 - \begin{matrix}t_1 & t_2\end{matrix}\\\\ player 1 \begin{matrix}\mbox{s1}\\\mbox{s2}\end{matrix} \begin{pmatrix}2 & 3\\3 & 4\end{pmatrix}$$

Example of the rock-paper-scissors game:



$$player\ 2 - \begin{matrix}R & P & S \end{matrix}\\ player\ 1\ \begin{matrix}R\\P\\S\end{matrix} \begin{pmatrix}0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}$$
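A zero-sum game in normal form can be stored directly as its payoff matrix. The following Python sketch (variable names are our own) encodes the standard rock-paper-scissors outcomes for player 1 and checks the zero-sum property $$u_1(s,t) = -u_2(s,t)$$:

```python
# Payoff matrix of player 1 in rock-paper-scissors.
# Rows: player 1 plays R, P, S; columns: player 2 plays R, P, S.
# 0 = draw, 1 = player 1 wins, -1 = player 1 loses.
U = [[0, -1,  1],
     [1,  0, -1],
     [-1, 1,  0]]

# In a zero-sum game u_1(s, t) = -u_2(s, t), so player 2's payoff
# matrix is simply the negation of player 1's.
V = [[-U[i][j] for j in range(3)] for i in range(3)]

# Sanity checks: payoffs sum to zero for every strategy pair, and this
# symmetric zero-sum game also satisfies u(s, t) = -u(t, s).
assert all(U[i][j] + V[i][j] == 0 for i in range(3) for j in range(3))
assert all(U[i][j] == -U[j][i] for i in range(3) for j in range(3))
```

Because of the zero-sum property, only player 1's matrix needs to be stored, exactly as the text notes above.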

The normal form describes the possible strategies and winnings of each player and allows studying the question of optimal strategies. If we are interested in how the appropriate strategies unfold move by move, we need the notation of games in explicit form.

==Extensive form game==

The extensive, or explicit, form of the game is used to formalize games in which the order of moves plays a role. These games are represented as trees. Each node represents a place where one player chooses a move and each edge corresponds to a possible move.



The notation of the NIM game in explicit form is shown below. In this game for two players, there are two piles of two matches at the beginning. The players take turns and remove either one or two matches from one of the piles. The player who removes the last match loses the game.
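This game is small enough to be solved by exhaustively searching its game tree. The following Python sketch (the function name and state representation are our own) decides whether the player to move wins under optimal play; misère NIM with two piles of two matches turns out to be a loss for the first player:

```python
from functools import lru_cache

# Misère NIM as described above: a move removes one or two matches
# from a single pile, and the player who takes the last match LOSES.

@lru_cache(maxsize=None)
def first_player_wins(piles):
    """True if the player to move wins with optimal play."""
    if sum(piles) == 0:
        # The opponent just took the last match, so they lost.
        return True
    for i, p in enumerate(piles):
        for take in (1, 2):
            if take <= p:
                child = list(piles)
                child[i] -= take
                # If some move puts the opponent in a losing
                # position, the current player wins.
                if not first_player_wins(tuple(sorted(child))):
                    return True
    return False

print(first_player_wins((2, 2)))  # False: the first player loses
```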

Two statements hold: every game in explicit form can be transferred to exactly one game in normal form, while for every game in normal form notation several games in explicit form can be found.

=Optimal strategies=

==Antagonistic games==

The goal of every rational player is to win the game, in other words to maximize his winnings. Consider only antagonistic games between two players that are written in normal form:

$$player\ 2 - \begin{matrix}t_1 & t_2 & \cdots & t_l \end{matrix}\\ player\ 1\ \begin{matrix}s_1\\s_2\\\vdots\\s_k\end{matrix} \begin{pmatrix}u_{11} & u_{12} & \cdots & u_{1l} \\ u_{21} & u_{22} & \cdots & u_{2l} \\ \vdots & \vdots & \ddots & \vdots \\ u_{k1} & u_{k2} & \cdots & u_{kl} \end{pmatrix}$$

This matrix contains the values of player number 1's winnings. This player chooses among his strategies (rows $$i$$ of the matrix $$u(s_i, t_j)$$) so that his winnings are maximal. He also knows that his opponent, player number 2, will choose his strategy to minimize the winnings of player number 1. Player number 1 therefore chooses the strategy $$s^*$$ for which the minimum value of his winnings (in that row) is maximal over all rows:

$$s^* = \max_i \min_j u(s_i, t_j)$$

Player number 2 proceeds analogously. Among his strategies (columns $$j$$ of the matrix $$u(s_i, t_j)$$) he chooses the strategy $$t^*$$ for which the maximal value of his losses (in that column) is minimal over all columns.

It holds that: $$\max_i \min_j u(s_i, t_j) \leq \min_j \max_i u(s_i, t_j)$$

$$\max_i \min_j u(s_i, t_j)$$ is called the lower price of the game.

$$\min_j \max_i u(s_i, t_j)$$ is called the upper price of the game.

==Example with an equilibrium point==
$$player 2 - \begin{matrix}t_1 & t_2\end{matrix}\\\\ player 1, \begin{matrix}\mbox{s1}\\\mbox{s2}\end{matrix} \begin{pmatrix}2 & 3\\3 & 4\end{pmatrix}$$

For this normal form game, the equilibrium point is $$(s_2, t_1)$$. For player number 1, $$\min_j u(s_1, t_j) = 2$$ and $$\min_j u(s_2,t_j) = 3$$, so $$\max \min u(s,t) = u(s_2, t_1)$$, which corresponds to strategy $$s_2$$. For player number 2, $$\max_i u(s_i, t_1) = 3$$ and $$\max_i u(s_i, t_2) = 4$$, so $$\min \max u(s,t) = u(s_2,t_1)$$, which corresponds to strategy $$t_1$$.
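The lower and upper prices can be computed mechanically from the payoff matrix; when they coincide, the game has an equilibrium point. A minimal Python sketch (the helper name `game_prices` is our own), applied to the matrix above:

```python
def game_prices(U):
    """Return the (lower, upper) price of the matrix game U
    (payoffs of player 1)."""
    lower = max(min(row) for row in U)        # max_i min_j u(s_i, t_j)
    upper = min(max(col) for col in zip(*U))  # min_j max_i u(s_i, t_j)
    return lower, upper

U = [[2, 3],
     [3, 4]]                 # the example above

print(game_prices(U))        # (3, 3): equilibrium point with value 3
```

For the later 2x2 example with rows (11, 5) and (7, 9), the same function returns (7, 9): the lower price is strictly below the upper price, so no saddle point exists.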

==Definition of the saddle point==

If $$ u(s^*, t^*) = \max_i \min_j u(s_i, t_j) = \min_j \max_i u(s_i, t_j)$$, then $$u(s^*, t^*)$$ is called the saddle point of the matrix and represents the price of the game.

Game theory tries to find the equilibrium point in every game in which players choose strategies. Players choose strategies such that none of them has a reason to change his strategy, assuming that nobody else changes theirs. If there is an equilibrium point, the optimal strategies of both players are called pure strategies.

==Example without a saddle point==

Sometimes the equilibrium point, and hence the pure strategy, does not exist, as in the game given by the matrix below:

$$player 2 - \begin{matrix}t_1 & t_2\end{matrix}\\\\ player 1, \begin{matrix}\mbox{s1}\\\mbox{s2}\end{matrix} \begin{pmatrix}11 & 5\\7 & 9\end{pmatrix}$$

Player number 1 chooses strategy $$s_2$$, because the largest of the row minima $$\min_j u_{ij}$$ is $$u_{21} = 7$$. Player number 2 chooses strategy $$t_2$$, because the smallest of the column maxima $$\max_i u_{ij}$$ is $$u_{22} = 9$$. In this case the lower price of the game is less than the upper price of the game:

$$\max_i \min_j u(s_i, t_j) < \min_j \max_i u(s_i, t_j).$$

So this game has no equilibrium point, and therefore the players have no pure optimal strategies.

==Definition of the mixed strategy==

Consider a matrix game described by the following matrix:

$$player\ 2 - \begin{matrix}t_1 & t_2 & \cdots & t_l \end{matrix}\\ player\ 1\ \begin{matrix}s_1\\s_2\\\vdots\\s_k\end{matrix} \begin{pmatrix}u_{11} & u_{12} & \cdots & u_{1l} \\ u_{21} & u_{22} & \cdots & u_{2l} \\ \vdots & \vdots & \ddots & \vdots \\ u_{k1} & u_{k2} & \cdots & u_{kl} \end{pmatrix}$$

Let $$p_1 + p_2 + ... + p_k = 1$$ and $$p_i \geq 0$$,

$$q_1 + q_2 + ... + q_l = 1$$ and $$q_j \geq 0$$.

Then the game with the payoff function $$ \pi (p, q) = \sum_{i=1}^k \sum_{j=1}^l p_i\, u_{ij}\, q_j$$ is called the mixed extension of the original game. The mixed extension means that every participating player chooses among his strategies with certain probabilities. A mixed strategy is written as the vector of the relevant probabilities.

==Example with a mixed strategy==

Let's go back to this game:

$$player 2 - \begin{matrix}t_1 & t_2\end{matrix}\\\\ pl. 1, \begin{matrix}\mbox{s1}\\\mbox{s2}\end{matrix} \begin{pmatrix}11 & 5\\7 & 9\end{pmatrix}$$

The mixed strategies of the individual players can be found as the extremes of the payoff function, i.e. the values of p and q for which $$ \pi (p, q)$$ is maximal or minimal, respectively.

$$ \pi (p,q) = 11 p_1 q_1 + 5 p_1 (1-q_1) + 7 (1-p_1) q_1 + 9 (1-p_1)(1-q_1) = 8 p_1 q_1 - 4 p_1 - 2 q_1 + 9 $$ and the partial derivatives are:

$$\frac{\partial \pi (p, q)}{\partial p_1}= 8 q_1 - 4 = 0$$ and then $$q_1 = 1/2$$

$$\frac{\partial \pi (p, q)}{\partial q_1}= 8 p_1 - 2 = 0$$ and then $$p_1 = 1/4$$

Player number 1 chooses the strategy (1/4, 3/4) and player number 2 chooses the strategy (1/2, 1/2). For these strategies the price of the game is $$ \pi(p,q) = 8$$.
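The result can be verified numerically. The sketch below (names are our own) evaluates the payoff function $$\pi(p,q)$$ at the derived mixed strategies and checks that at the equilibrium each player is indifferent among his pure strategies:

```python
U = [[11, 5],
     [7,  9]]          # the payoff matrix of the game above
p = [1/4, 3/4]         # mixed strategy of player 1
q = [1/2, 1/2]         # mixed strategy of player 2

def pi(p, q, U):
    """Expected payoff: sum_i sum_j p_i * u_ij * q_j."""
    return sum(p[i] * U[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))

print(pi(p, q, U))     # 8.0, the price of the game

# Indifference at the equilibrium: switching to any pure strategy
# leaves the expected payoff unchanged.
assert pi([1, 0], q, U) == pi([0, 1], q, U) == 8.0
assert pi(p, [1, 0], U) == pi(p, [0, 1], U) == 8.0
```

The indifference check is exactly why these probabilities form an equilibrium: neither player can improve by deviating.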

==Non-antagonistic games==

In non-antagonistic games, the winning of one player is not necessarily at the expense of the other. The players either can (cooperative games) or cannot (non-cooperative games) negotiate about their strategies. The double matrix of winnings:

$$player\ 2 - \begin{matrix} t_1 & t_2 & \cdots & t_l \end{matrix}\\ player\ 1\ \begin{matrix}s_1\\s_2\\\vdots\\s_k\end{matrix} \begin{pmatrix} u_{11}, v_{11} & u_{12}, v_{12} & \cdots & u_{1l}, v_{1l} \\ u_{21}, v_{21} & u_{22}, v_{22} & \cdots & u_{2l}, v_{2l} \\ \vdots & \vdots & \ddots & \vdots \\ u_{k1}, v_{k1} & u_{k2}, v_{k2} & \cdots & u_{kl}, v_{kl}\end{pmatrix}$$

The value $$u_{ij}$$ is the payoff of player number 1 when the strategies $$s_i$$ (player number 1) and $$t_j$$ (player number 2) are played; the value $$v_{ij}$$ is the corresponding payoff of player number 2.

==Definition of the equilibrium point of a non-antagonistic game==

The pair of strategies $$(s^*, t^*)$$ is called an equilibrium point exactly when:

$$u(s, t^*) \leq u(s^*, t^*)$$ for every $$s$$

$$v(s^*, t) \leq v(s^*, t^*)$$ for every $$t$$

Let the equilibrium point correspond to the element $$(u_{ij}, v_{ij})$$. Then:

$$u_{ij} = max_k u_{kj}$$

$$v_{ij} = max_k v_{ik}$$

==Example: the prisoner's dilemma==

The best-known example of a non-antagonistic game is the non-cooperative game called the prisoner's dilemma. Two prisoners are accused of the same crime. If both confess, both are sentenced to 5 years in prison, because the confession is a mitigating circumstance. If both deny the crime, both are sentenced for a minor delict to 1 year in prison. If only one of them confesses, he is acquitted as the chief witness, but the other accused is sentenced to 10 years in prison. The notation of this game in normal form is shown below; the numbers in the matrix indicate the lengths of the sentences. The main goal of each player is to minimize the length of his sentence, i.e. to choose the strategy for which the maximal possible punishment (over the possible strategies of the co-accused) is minimal. Then:

For prisoner number 1: $$ s^* = \min_i \max_j u_{ij}$$

For prisoner number 2: $$ t^* = \min_j \max_i v_{ij}$$

$$ player 2 - \begin{matrix} D &  A \end{matrix}\\\\ pl. 1, \begin{matrix}\mbox{D}\\\mbox{A}\end{matrix} \begin{pmatrix} (1, 1) & (10, 0)\\ (0, 10) & (5, 5) \end{pmatrix}$$

Based on the equilibrium strategies, the best option for both of them is $$(s^*, t^*) = (admit, admit)$$. Looking at the double matrix, we find that the strategy $$(deny, deny)$$ would be better (in terms of the amount of punishment), because in that case the punishment of both prisoners would be lower. Hence the name of the game: a "dilemma". Neither of the accused knows whether his accomplice will succumb to the temptation to confess and go unpunished.
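The equilibrium can also be found by brute force: a pair of strategies is an equilibrium point exactly when neither prisoner can shorten his own sentence by deviating unilaterally. A minimal Python sketch (variable names are our own):

```python
# Brute-force search for equilibrium points of the prisoner's dilemma.
# Entries are years in prison, so each player MINIMIZES his own value.

U = [[1, 10],   # player 1's sentence (rows: Deny, Admit)
     [0,  5]]
V = [[1,  0],   # player 2's sentence (columns: Deny, Admit)
     [10, 5]]
labels = ["deny", "admit"]

equilibria = []
for i in range(2):
    for j in range(2):
        # (i, j) is an equilibrium if no unilateral deviation
        # shortens the deviating player's sentence.
        best_row = all(U[i][j] <= U[k][j] for k in range(2))
        best_col = all(V[i][j] <= V[i][l] for l in range(2))
        if best_row and best_col:
            equilibria.append((labels[i], labels[j]))

print(equilibria)  # [('admit', 'admit')]
```

Only (admit, admit) survives the check, even though (deny, deny) would give both players a shorter sentence.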

==The search for a suitable strategy==

The following chapters show two ways to search for a suitable strategy in antagonistic games: minimax and alpha-beta pruning. Both strategies are shown on the example of a game written in extensive form and represented by a tree.

These are essentially tree-search algorithms; the main difference from ordinary tree search is that the player does not control all transitions between the nodes of the tree, because some of the moves are made by the opponent.

More precisely: if we consider a sequential game of two players, then player A decides at the even-numbered nodes and player B at the odd-numbered ones. This method of searching, in which a player has no control over all of the moves, is called adversarial search.

==Minimax==

The minimax strategy for decision making under uncertainty was described above. It is based on the principle that, assuming the opponent is rational, we choose the move that makes the opponent's best subsequent move the least dangerous from our perspective.

The notation consists of a tree where each node is assigned a value based on the winnings (the value function $$v$$) in the subtree of that node. The nodes of the tree are divided into MAX levels (even nodes) and MIN levels (odd nodes). At every MAX level the first player selects the move which maximizes the value of the specified criterion, and at every MIN level the second player selects the move which minimizes it. The criterion is the MINIMAX value:

$$MINIMAX(n)=\left\{\begin{matrix} u(n), & \mbox{for } n \mbox{ a leaf node} \\ \max_s MINIMAX(s), & \mbox{for } n \mbox{ a MAX node} \\ \min_s MINIMAX(s), & \mbox{for } n \mbox{ a MIN node}\end{matrix}\right.$$

where $$s$$ are all the successors of node $$n$$.

Valuing the nodes runs from the bottom, from the leaves representing the final game situations, towards the root. In the MINIMAX picture, the player first takes the MIN of the first group of leaves from the left, which is 5; then the MIN of the second group, which is 2; and then the MIN of the third group, which is also 2. From these three results he chooses the MAX, which is 5. The result is 5.

Applying MINIMAX assumes that the whole solution tree is known. A real-life example is tic-tac-toe.

==Alpha-beta pruning==

Alpha-beta pruning is a modification of the MINIMAX strategy based on the branch-and-bound method, which allows cutting off unpromising parts of the tree. The algorithm introduces two new values:

$$\alpha$$ is the largest value known so far for a MAX node

$$\beta$$ is the smallest value known so far for a MIN node

At every MAX node, the MINIMAX values of the individual successors are compared one by one with the $$\beta$$ value, and if MINIMAX $$\geq \beta$$, the remaining successors are not searched. At every MIN node, the MINIMAX values of the individual successors are compared with the $$\alpha$$ value, and if MINIMAX $$\leq \alpha$$, the remaining successors are not searched. The time complexity of alpha-beta pruning is lower than that of the MINIMAX strategy.
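A sketch of the pruning algorithm on hypothetical leaf values (the same kind of two-level tree as before): the cut-off tests correspond to the comparisons with $$\beta$$ at MAX nodes and $$\alpha$$ at MIN nodes, and the result is identical to plain MINIMAX, only with fewer leaves visited.

```python
# Alpha-beta pruning: MINIMAX with branch cut-offs.

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):        # leaf: return its payoff
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if value >= beta:        # opponent will avoid this branch
                break                # -> prune remaining children
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if value <= alpha:       # we would avoid this branch
                break                # -> prune remaining children
        return value

tree = [[5, 7], [2, 8], [2, 9]]      # MIN groups evaluate to 5, 2, 2
print(alphabeta(tree, True))         # 5, the same result as MINIMAX
```

Here, once the first MIN node yields 5, the leaves 8 and 9 in the later groups are never examined, because their MIN values cannot exceed 5.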


--Xgubk00 19:46, 25 January 2015 (CET)