Abstract

Doudizhu is a highly popular traditional Chinese poker game that has grown into a national competition in China. As a typical example of the incomplete information game problem, it has received increasing attention from artificial intelligence researchers. This paper proposes a multirole modeling-based card-playing framework comprising three parts: role modeling, card carrying, and decision-making strategies. Role modeling learns the behaviors of the different roles with a convolutional neural network. Card carrying computes reasonable attachment choices, especially for the "triplet" pattern, with a valuation algorithm. Decision making implements different card-playing strategies for the different player roles. Experimental results show that this framework makes playing decisions similar to those of human players and can, to some extent, learn, collaborate, and reason when facing an incomplete information game. The framework won the runner-up prize in the 2018 China Computer Game Competition.

1. Introduction

As an important branch of artificial intelligence (AI), the computer game is a challenging problem in the broad and deep field of logical AI decision making. It has long served as a verification scenario for data mining and machine learning algorithms and is known as the "fruit fly" of AI [1].

The field of computer games is divided into two branches: complete information and incomplete information games. In a complete information game, every player can observe the entire game state, as in Go [2], chess [3], Chinese chess [4], and Tibetan chess [5]. In an incomplete information game, players cannot obtain all of the situation information, or cannot trust what they observe, during the game. The true state of the environment is often unknowable, and the information held by the participants is asymmetric and incomplete, which makes the study of incomplete information games more complicated and challenging; examples include poker games such as Texas Hold'em [6], mahjong [7], and Doudizhu [8]. Most applications in the real world are incomplete information games, such as business strategy negotiations, financial investments, bidding strategies, political activities, autonomous driving, medical planning, network security, and military applications.

Traditional computer game research mostly focuses on chess-like games with complete information. Initially, minimax search based on depth-first search was the general method for searching the game-state tree. Subsequently, the famous Alpha-Beta pruning [9] was proposed and widely adopted; minimax search with Alpha-Beta pruning is called Alpha-Beta search. From Alpha-Beta search, improved algorithms were derived, such as PVS [10], MTD(f) [11], and other methods [12, 13] that narrow the search window by exploiting the locality of the search space, as well as various heuristic and nonheuristic transposition-table optimizations that are sensitive to data quality. Without a full search, the practical strength of Alpha-Beta search depends heavily on its situation evaluation function. To avoid this dependence, especially on the evaluation process, the Monte Carlo Tree Search (MCTS) algorithm [14, 15] was developed: it estimates the objective winning rate of a position with a large number of random playouts and offers good versatility and controllability.

With the breakthrough development of deep learning, models such as the deep belief network (DBN) [16], the deep autoencoder (DAE) [17, 18], and the deep convolutional neural network (CNN) [19] have successfully solved many problems in the field of computer vision. In particular, CNN's superior performance in image pattern recognition, together with its relatively simple, purely supervised training process, quickly made it popular [20–23]. Deep learning is known for its powerful mapping and representation ability and handles various regression and classification tasks excellently. Both in the laboratory and in practical application scenarios, deep learning has the potential to be a core component for optimizing the quality and efficiency of computer game systems. The most famous deep learning computer game model is the AlphaGo series of Go systems from the DeepMind team. In 2015, AlphaGo defeated the European Go champion Fan Hui [24]; in 2016, its reinforced version AlphaGo Lee defeated the world-class Go master Lee Sedol; in 2017, AlphaGo Master defeated the world Go champion Ke Jie in the open; in the same year, the Go system AlphaGo Zero [25] and the general chess-playing system AlphaZero were trained entirely by unsupervised reinforcement learning [26], and it was announced that AlphaZero defeated the strongest existing computer game systems in Go, chess, and shogi. AlphaGo was the first integrated deep learning computer game system with remarkable success. It uses both a policy network and a value network: deep convolutional neural networks provide reference opinions for decision making and situation evaluation. These two CNN models were first trained with supervision on a large amount of professional game data and then refined with a reinforcement learning algorithm based on DQN [27].

In an incomplete information game, as opposed to a complete information game, players hold private information, and no party can obtain the full state of the current situation. It is therefore impossible to evaluate the situation reliably with manually extracted features, and it is difficult to determine the range of actions the opponent may take. In addition, the game tree of an incomplete information game is extremely large; although Monte Carlo methods can find good paths to a certain extent, classical complete-information game algorithms remain inapplicable.

At present, there are three main approaches to incomplete information games. The first is based on game theory: the game tree is shrunk and constructed by various methods [28, 29] and traversed with search methods similar to those for complete information games to find the best strategy at the equilibrium point [30–32]. The second is based on reinforcement learning and multiagent cooperation, learning game strategies through self-play [33–35]. The third is knowledge-based: the behavioral characteristics of a large number of professional human players are learned and combined with manually added rules to formulate the game strategy [36–38].

In this paper, the second, multiagent idea is combined with the third, knowledge-based method. Each role is regarded as an agent and modeled separately, so that different card-playing strategies are designed and implemented for different roles. Relying on large-scale historical data, deep learning is applied to the Doudizhu poker game.

In Section 2, we introduce the rules of Doudizhu and the overall framework of the Doudizhu game system based on multirole modeling. Sections 3, 4, and 5 explain each component in detail: role modeling, the carrying cards strategy, and decision making. Section 6 describes the experimental setup and the results of competition with human players. Finally, Section 7 gives a summary and directions for improvement.

2. Design of Card Game System Based on Multirole of Doudizhu

2.1. Rules and Common Terms in Doudizhu

Doudizhu is a simple and entertaining traditional Chinese poker game, usually played by three players. A standard game includes dealing, bidding, playing, and scoring. The three players use a 54-card deck (with two jokers); each player receives 17 cards, and three cards are left as hole cards. There are two sides in the game: the Dizhu and the Farmers. After dealing, players bid according to their hands. The player who bids the highest becomes the Dizhu (the attacker; a three-player game contains exactly one Dizhu) and takes the hole cards, while the other two players become Farmers (defenders, who are allies) and compete against the Dizhu. The players then take turns playing cards according to the rules governing the played cards. The side that gets rid of all its cards first wins, and the Dizhu scores more than the Farmers if he wins. Terms in this paper are defined as follows:

Game: the whole process including dealing, bidding, playing, and scoring.

Round: several games played by the same three players.

Hands: the number of plays needed to play out all cards under the rules when the other two players pass every time.

Suit pattern: the suit patterns, patterns for short, are combinations of cards that are legal to play in the game, such as pass, rockets, bombs, and standard patterns.

2.2. The Overall Framework of the Card Game System of Doudizhu

Each player needs to constantly revise his judgments and dynamically choose his own strategy based on his role, the relationship of the other participants to himself, and the actions he observes them take. The design of the playing system for the three roles in Doudizhu is shown in Figure 1. The framework is divided into three parts: role modeling, the carrying cards strategy, and decision making.

In Figure 1, the "history data," records of human poker players provided by a well-known website, are first divided into two datasets according to whether the Dizhu or the Farmers won; these data are used for subsequent model training and verification. "Role modeling" uses convolutional neural networks to model the Dizhu, Farmer 1, and Farmer 2 on their respective training data and learns the behaviors of the different roles. The "carrying strategy" mainly handles the "triplet with attachment" pattern, learning reasonable attachments with a valuation algorithm; the carrying rules are the same for all roles. "Decision making" assigns playing strategies of different strengths to the three roles of Dizhu, Farmer 1, and Farmer 2 to reflect a higher level of cooperative confrontation. The following sections introduce the design and implementation of role modeling, the carrying strategy, and decision making.

3. Modeling and Design of Doudizhu Game Based on Convolutional Neural Network

The multirole modeling in this paper includes two aspects: (1) separation of the training data and (2) different card-playing decision methods. The historical card-playing data of the platform are divided into two parts according to the role of the final winner: games won by the Dizhu and games won by the Farmers. The Dizhu-winning data are used to train the Dizhu model, and the Farmer-winning data are used to train the Farmer 1 and Farmer 2 models, respectively. (See Section 5 for the realization of card strength.) Multirole modeling is implemented with a deep convolutional neural network (CNN).
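A minimal sketch of this role-based split is shown below; the record fields (such as "winner") are hypothetical, since the paper does not specify the storage format.

```python
# Hedged sketch of the role-based data split. Each game record is assumed
# to carry the winning side; winners' games feed the corresponding models.
def split_by_winner(games):
    """Split game records into Dizhu-win and Farmer-win training sets."""
    dizhu_set, farmer_set = [], []
    for game in games:
        if game["winner"] == "dizhu":   # field name is an assumption
            dizhu_set.append(game)      # trains the Dizhu model
        else:
            farmer_set.append(game)     # trains the Farmer 1 / Farmer 2 models
    return dizhu_set, farmer_set
```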

3.1. CNN Model Input Format Design

In Doudizhu, the participants hold private information and cannot obtain the full state of the current situation. A player knows only part of the other participants' characteristics, strategy spaces, and payoff functions and is not aware of the opponents' cards. The state is thus only partially observable; as players act, the available information gradually increases, and estimates of the other players' hands become more accurate.

The information provided to the neural network model should be complete but not redundant. If suits are ignored, there are 15 kinds of card information, namely, A, 2–10, J, Q, K, the black joker, and the red joker. This article encodes a player's hand information in the order "A23456789TJQKXD", where "T" means "10", "X" means "black joker", and "D" means "red joker".
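The following sketch illustrates this rank-order encoding; representing the hand as a per-rank count vector is one natural reading of the text, not a detail the paper states.

```python
# Hedged sketch: encode a hand string into a length-15 vector using the
# paper's rank order "A23456789TJQKXD" (T = 10, X = black joker, D = red joker).
RANKS = "A23456789TJQKXD"
RANK_INDEX = {r: i for i, r in enumerate(RANKS)}

def encode_hand(hand):
    """Map e.g. '33345TT' to a 15-dim vector of per-rank counts."""
    vec = [0] * 15
    for card in hand:
        vec[RANK_INDEX[card]] += 1
    return vec

# encode_hand("33345TT") -> [0, 0, 3, 1, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0]
```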

In order to fully exploit the advantages of convolutional neural networks, the representation of the model input data must not only show the current game state but also include the historical sequence of operations and reflect the confrontation between players. To this end, each model input (game state) of a single role contains the following five aspects of information:

$$I = \left(C_{\text{dizhu}},\; C_{\text{self}},\; C_{\text{unknown}},\; C_{\text{played}},\; H_{1}, \ldots, H_{k}\right), \tag{1}$$

where $C_{\text{dizhu}}$ represents the Dizhu's total cards in the game; $C_{\text{self}}$ represents the remaining hand of the current player; $C_{\text{unknown}}$ represents the unknown cards (the sum of the other players' hands); $C_{\text{played}}$ represents all historically played cards; and $H_{i}$ represents the card data of the $i$-th round counting back from the current state. This article uses the most recent $k = 5$ rounds, for a total of 9 sets of data.

The confrontation and cooperation of the game are reflected in the input channels, which are arranged in the order Dizhu, Farmer 1, Farmer 2.

Therefore, the input to the CNN model is a [9 × 15 × 3] matrix, where "9" is the number of information sets listed above (4 state sets plus the 5 most recent rounds), "15" is the card information, and "3" corresponds to the three players Dizhu, Farmer 1, and Farmer 2.
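A sketch of assembling this input tensor is given below. The exact row layout (4 state rows followed by the 5 history rows, stacked per player channel) is our reading of the text rather than a stated specification.

```python
import numpy as np

# Hedged sketch of the [9 x 15 x 3] input: 4 state rows (Dizhu's full hand,
# own remaining hand, unknown cards, all played cards) plus the 5 most
# recent rounds, with one channel per player (Dizhu, Farmer 1, Farmer 2).
def build_input(state_rows, history_rows):
    """state_rows: (4, 15, 3) counts; history_rows: (5, 15, 3) counts."""
    x = np.concatenate([state_rows, history_rows], axis=0)  # -> (9, 15, 3)
    assert x.shape == (9, 15, 3)
    return x.astype(np.float32)
```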

3.2. CNN Model Output Format Design

The model's output is the way to play. This section considers 8 kinds of actions: "pass", "bomb", "single", "pair", "triplet", "single sequence", "double sequence", and "triplet sequence". The carrying card types are more complicated and are discussed separately in Section 4.

Corresponding to these 8 kinds of actions, this paper further divides the ways to play cards into 182 types, as shown in Table 1. Each play is represented by a single vector over the 15 card positions, with "1" at the positions of the cards involved and "0" elsewhere. The model outputs a probability distribution over the play types, and the type with the highest probability is chosen as the strategy for the round.
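The sketch below illustrates this output encoding; the contents of the class table are hypothetical, since Table 1's full enumeration is not reproduced here.

```python
import numpy as np

RANKS = "A23456789TJQKXD"

def play_mask(cards):
    """15-position mask with 1 at each rank used by the play.
    E.g. the single sequence '34567' sets positions 2..6 to 1."""
    mask = [0] * 15
    for c in cards:
        mask[RANKS.index(c)] = 1
    return mask

def choose_play(probs, play_table):
    """probs: 182 class probabilities from the model;
    play_table: class id -> card mask (hypothetical stand-in for Table 1)."""
    cls = int(np.argmax(probs))        # highest-probability play type
    return cls, play_table[cls]
```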

3.3. Role Model Design Based on CNN

The model is a convolutional neural network consisting of 9 convolutional layers, 2 fully connected layers, one batch normalization (BN) [39] layer, and an output layer. As shown in Figure 2, an input sample first passes through the 9 convolutional layers.

The numbers of convolution kernels in the first three layers are 64, 128, and 196, with kernel sizes 5 × 5, 3 × 3, and 3 × 3; the remaining 6 layers each have 256 kernels of size 3 × 3. All kernels move with horizontal and vertical stride 1, and the output of each convolution is zero-padded so that the 9 × 15 data size remains unchanged. The activation function after each convolution is ReLU (rectified linear unit) [40], and no downsampling is performed after the convolutions. After the 9 convolutional layers, the data enter 2 fully connected layers of 256 neurons each, again with ReLU activations. Finally, the data pass through a BN layer and enter the output layer, which contains 182 neurons and applies no nonlinear activation function. The Adam algorithm [41, 42], which is more stable than stochastic gradient descent, iteratively optimizes the convolution kernels of all convolutional layers and the connection weights of the fully connected layers according to the error between the network output and the expected value. The model output is normalized by the sigmoid function to fall in the interval [0, 1].
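To make the architecture concrete, the following is a minimal Keras sketch of the role model under the stated hyperparameters. The loss function and the use of the modern TF 2 API (rather than the paper's TensorFlow 1.0 code) are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_role_model():
    inputs = tf.keras.Input(shape=(9, 15, 3))
    x = inputs
    # 9 conv layers: 64@5x5, 128@3x3, 196@3x3, then six 256@3x3 layers;
    # stride 1 and "same" padding keep the 9 x 15 spatial size unchanged.
    specs = [(64, 5), (128, 3), (196, 3)] + [(256, 3)] * 6
    for filters, k in specs:
        x = layers.Conv2D(filters, k, strides=1, padding="same",
                          activation="relu")(x)
    x = layers.Flatten()(x)
    for _ in range(2):                        # two fully connected layers
        x = layers.Dense(256, activation="relu")(x)
    x = layers.BatchNormalization()(x)        # BN layer before the output
    outputs = layers.Dense(182, activation="sigmoid")(x)  # 182 play classes
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="binary_crossentropy",  # loss choice is an assumption
                  metrics=["accuracy"])
    return model
```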

4. Carry Cards Strategy Design

There are special carrying card types in Doudizhu: the "triplet with attachment" (a triplet carrying a single or a pair) and the "four with two" (four of a kind carrying two singles or two pairs). The specific explanation is shown in Table 2.

From the explanation in Table 2, it can be seen that the carrying card types are relatively complicated. They are based on the "triplet" and "triplet sequence" types, and the current hand information must be fully considered to decide which attachments are most appropriate.

First, the remaining hand is split according to the "rocket", "bomb", "sequence", "pair", "single", and other card types; the number of each card type is counted, and a valuation algorithm computes the estimated value of each split branch node. The branch with the maximum value is selected as the final carrying choice.

The valuation algorithm mainly considers the following points:

(1) Whether the hand can be finished after carrying the cards; if so, this operation is chosen directly.

(2) The degree of threat to opponents posed by different card types; "bomb", "sequence", and other types are assigned value weights from high to low.

(3) The number of "single cards" and "pairs" that triplet attachments can offset; the more that are offset, the better.

(4) It is stipulated that triplet attachments cannot offset "single cards" newly produced by the splitting itself.

(5) Special splits when valuing a "sequence"; for example, splitting "3455667789" into "34567" and "56789" is the best split.

(6) The value of high single cards, that is, "A", "2", "X", and "D".

The value of each card type in the hand is calculated as shown in formula (2): the square of the card-type coefficient $\alpha$ is multiplied by the number of cards of that type:

$$v_i = \alpha_i^2 \times n_i, \tag{2}$$

where $n_i$ represents the number of cards of type $i$ and $\alpha_i$ is the coefficient of type $i$. In this paper, the $\alpha$ value of the "bomb" is set to 8; the "double sequence" value is 6; the "single sequence" value is 5; the "triplet with attachment" value is 3; and a separate "pair", a "single card", and the high single cards have an $\alpha$ value of 1.

The face value of a hand is the sum of the values of the card types it contains, as shown in the following equation:

$$V = \sum_i v_i + \varepsilon, \tag{3}$$

where $\varepsilon$ indicates whether the hand is finished after the carry; its default value is 0, and if the hand can be played out completely, a large value such as 9999 is returned directly.
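The sketch below implements formulas (2) and (3) under one reading of the text; the branch representation, the type names, and the `finish` flag are hypothetical stand-ins for the splitting machinery.

```python
# Hedged sketch of formulas (2) and (3): each split branch of the hand is
# scored as sum over card types of alpha**2 * n, plus a dominant bonus when
# the branch lets the player finish the hand. Coefficients follow the text.
ALPHA = {"bomb": 8, "double_sequence": 6, "single_sequence": 5,
         "triplet": 3, "pair": 1, "single": 1}

def branch_value(type_counts, finishes_hand=False):
    """type_counts: card type -> number of that type in the split branch."""
    if finishes_hand:
        return 9999                      # finishing the hand dominates all
    return sum(ALPHA[t] ** 2 * n for t, n in type_counts.items())

def best_carry(branches):
    """Pick the split branch (candidate carry) with the highest value."""
    return max(branches,
               key=lambda b: branch_value(b["counts"], b["finish"]))
```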

For the different roles in the Doudizhu game, the carrying strategy is the same: the option with the highest valuation is selected.

5. Decision-Making Design Based on Multirole Modeling

The output of the CNN model given in Section 3 is a probability distribution over the players' card-playing strategies. Directly taking the maximum-probability play as the final strategy sometimes produces errors: for example, if the previous player played a "single sequence", the current player's maximum-probability play may be a "double sequence", which is against the rules. Therefore, this paper selects the 5 card types with the highest probabilities and chooses the first one that satisfies the rules of the game as the strategy, instead of considering only the maximum-probability type.
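A minimal sketch of this rule-checked selection follows; the `is_legal` predicate and the index of the "pass" class are hypothetical placeholders for the game's pattern-matching rules.

```python
import numpy as np

PASS_CLASS = 0  # hypothetical index of the "pass" class

def select_legal(probs, prev_play, is_legal, k=5):
    """Take the k most probable classes and keep the first legal one."""
    top_k = np.argsort(probs)[::-1][:k]      # classes by descending prob
    for cls in top_k:
        if is_legal(cls, prev_play):         # must answer the previous play
            return int(cls)
    return PASS_CLASS                        # fall back to "pass"
```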

On the basis of multirole modeling, this section further refines the strategy for each player. Combining role modeling and the carrying strategy, plus the confrontational and cooperative relationships between players, different levels of playing strength are used to generate the final playing decision for each role. The specific settings are as follows (a code sketch follows this list):

(1) For the Dizhu, the maximum-probability play is directly selected as the final strategy.

(2) For the Farmers, the learned strategy contains a large number of "pass" operations. Although this reflects the cooperative relationship to a certain extent, too many "pass" operations mask some correct plays, especially when a Farmer has few cards left or when the cards in play are weak, so that passing no longer effectively increases the teammate's chance to play. For example, when Farmer 2 has a single "4" left and the Dizhu plays a "5", the strategy probabilities of Farmer 1 may be [0.32, 0.25, 0.19, 0.11, 0.09] for ["pass", "2", "9", "Q", "6"]; the maximum probability is the "pass" operation, but the second probability is within 0.1 of it, so playing the "2" is chosen as the best strategy. The final strategy selection for the Farmers is therefore

$$s = \begin{cases} s(z), & \text{if } s(x) = \text{``pass'' and } x - z \leq \delta, \\ s(x), & \text{otherwise,} \end{cases} \tag{4}$$

where $x$ represents the maximum probability, $z$ represents the highest probability among non-"pass" strategies, $s(\cdot)$ denotes the strategy with the given probability, and $\delta$ represents the card strength ($\delta = 0.1$ in this paper). When the maximum-probability strategy is "pass" and the next probability is within $\delta$ of it, the strategy with the second probability is selected. The larger the $\delta$ value, the more the Farmers' strategy tends to avoid "pass" operations.
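The following sketch implements the Farmer rule in equation (4); the `pass_class` index is again a placeholder.

```python
# Hedged sketch of the Farmer rule: if "pass" tops the distribution but the
# best non-"pass" class is within delta of it, play that class instead.
def farmer_choice(probs, pass_class, delta=0.1):
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    x_cls = order[0]
    if x_cls != pass_class:
        return x_cls                             # top class is already a play
    z_cls = next(c for c in order[1:] if c != pass_class)
    if probs[x_cls] - probs[z_cls] <= delta:     # within card-strength delta
        return z_cls                             # prefer playing over passing
    return pass_class
```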

To sum up, this paper combines multirole modeling with the card-carrying strategy, considers the antagonism and cooperation between players, and uses different levels of card strength to generate the final strategy for each role.

6. Experimental Results and Analysis

The training environment is a server running Ubuntu 16.04.2 LTS with an NVIDIA GeForce GTX TITAN X graphics card (12 GB of video memory) and TensorFlow version 1.0.0. The data come from real-time game records of a live-play platform on a well-known website in China, including the initial hands of each game and the detailed playing process. Of the 5 million games selected, 3 million were won by the Dizhu and 2 million by the Farmers.

This paper conducted experiments on the multirole models, multirole card-playing performance, and carrying strategy performance, analyzed the results, and proposed ways to improve the remaining problems.

6.1. Implementation of Multirole Modeling

The multirole modeling experiment shows the training of the Dizhu, Farmer 1, and Farmer 2 role models. The models are trained on a high-performance graphics card with a batch size of 100 and a learning rate of 0.001. The training results are shown in Figures 3–5, which plot the accuracy of the Dizhu, Farmer 1, and Farmer 2 models as the amount of training data increases. The horizontal axis is the number of iterations, and the vertical axis is the similarity between the network's output strategy and the actual player's strategy, that is, the accuracy.
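As a usage illustration, a training call with the stated settings might look as follows; `x_train`/`y_train`, the epoch count, and the validation split are assumptions not given in the paper.

```python
# Hedged sketch of training with the stated settings (batch size 100,
# learning rate 0.001 set inside build_role_model from Section 3.3).
model = build_role_model()
model.fit(x_train, y_train, batch_size=100,
          epochs=10, validation_split=0.1)   # epochs/split are assumptions
```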

The experimental results show that the similarity between the output strategies of the three player models and the real players' strategies is around 85%, indicating that the models have extracted meaningful game-state features and select playing strategies similar, to some extent, to those of real players. In addition, statistics on the three roles' plays show that the Farmer strategies contain more "pass" operations than the Dizhu strategy, indicating that the two Farmers cooperate and often create playing opportunities for their teammates.

6.2. Multirole Card Performance Test

This experiment mainly tests whether the three card-playing models (hereinafter referred to as AI) can derive an appropriate playing strategy for the current situation, reflecting both confrontation and cooperation. To test the intelligence of the AI, the same game was played in three ways: (1) three AI roles in self-play; (2) an AI Dizhu against human Farmers; and (3) a human Dizhu against two AI Farmers. The similarity between the AI's and the humans' strategies is observed in a particular game. The game situation is shown in Table 3.

"Cards information" in Table 3 gives the initial situation of the game in the order "Dizhu's initial hand, Farmer 1's initial hand, Farmer 2's initial hand, bottom cards", where "0" indicates the Dizhu, "1" indicates Farmer 1, and "2" indicates Farmer 2. In the game processes, "0, 33" indicates that the Dizhu played "33"; if a player chooses the "pass" strategy, it is not recorded.

The game processes show that the AI exhibits cooperation, card splitting, and card combination:

(1) In the wavy-underlined part of the second record, when the player at position 2 plays "QQ", the AI at position 1 chooses "pass" even though it holds a larger pair, increasing position 2's chance to play.

(2) In the thick-underlined part of the second record, the human player and the AI face the same situation and play the same card type.

(3) In the wavy-underlined part of the third record, the player at position 2 splits the "JJJ" and gives priority to the "TJQKA" sequence.

These data show that the game program implemented with the method in this paper closely matches the playing habits of human players and can perform reasonable card combinations as well as cooperation between the two Farmers.

6.3. Carrying Card Strategy Performance Implementation and Testing

Based on the game processes listed in Table 3, this section further analyzes the behavior of the carrying strategy proposed in this article. We selected four cases from Table 3 (see the double-underlined positions), focusing on the AI's carrying strategy; they are summarized in Table 4. In Table 4, "situation" is the current player's hand when the carrying strategy is invoked; "type of card" lists the candidate card types; and "output strategy" is the card type finally recommended by the carrying strategy.

The analysis found the following:

(1) In ordinary situations, such as when some "single cards" or other weak cards exist alone, the carrying strategy finds such cards well; for example, when a "3" exists alone, it is carried first.

(2) In special situations, such as a hand combining a "sequence" and a "triplet", the carrying strategy still performs well, as shown in Table 4.

(3) In special situations, the carrying strategy gives priority to outputting scattered cards and does not destroy key combinations such as "sequence" and "triplet". For example, in the first record of Table 3, the AI faces the hand "456667899", which contains a "sequence"; the carrying strategy's output preserves this pattern and even "intentionally" plays other card types to keep or create "sequence" cards.

In a word, the experiments show that the carrying strategy makes reasonable choices when facing different situations.

7. Conclusion

From the perspective of incomplete information games, this paper proposes a complete game framework for Doudizhu that fully considers the confrontation and cooperation in the game, models each player role separately, and reflects the game information and rules in the CNN input representation. The paper elaborates the complete game method of "player modeling, carrying strategy, and decision making" for Doudizhu, supplemented by concrete examples. In the final decision-making section, it discusses several key factors that affect decisions and uses different levels of card strength for different players. The program won the runner-up prize in the 2018 China Computer Game Competition, which shows that the multirole modeling strategy proposed in this paper is feasible.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by key potential projects of promoting research level program at Beijing Information Science and Technology University (no. 5212010937), by Normal Projects of General Science and Technology Research Program (no. KM201911232002), and by Construction Project of Computer Technology Specialty (no. 5112011019).