Introduction
This is the README that you can find in my GitHub repository, which contains the project I developed for my Bachelor’s thesis in Economics and Finance at the University of Bologna. The purpose of my research is to detect the emergence of collusion in Prisoner Dilemma games played by two reinforcement learning algorithms using a Random Forest Classifier. If you find this topic particularly interesting, you can read the entire thesis here.
Project overview
The project is based on research from this paper which states that algorithmic pricing softwares powered by Reinforcement Learning can autonomously learn to collude. The main question addressed in this project is:
When and how is it possible to detect the beginning of tacit collusive behaviour between reinforcement algorithmic powered agents?
Problem definition
In the Prisoner’s Dilemma game, players make one move at a time, and with each move, a time series of scores is created, indicating the optimal next move. These scores vary around 0, and depending on whether they are positive or negative, the RL algorithm will undertake cooperation or defection. At some point, the scores take values that remain strictly positive or negative, indicating the beginning of collusion
To visualize this, an example is provided below:
Every line refers to a current state
- red (CC) both cooperate -> collusive state
- black (DD) both defect
- blue (DC) one defect and the other cooperates
- green (CD) opposite of the blue line And the score (dQ) indicates what the next action will be:
- positve -> next action defect
- negative -> next action cooperate
The pattern shown in this image is the same in all experiments, where the red line departs from zero sooner than all the others, then after some thousand iterations, the other 3 lines take strictly positive or negative values.
Collusion is defined as the moment when both agents cooperate and will continue to cooperate for the next moves. This moment is visually obvious, as the red line becomes negative forever. To give a “date” to this event, the exact iteration can be identified by zooming in on the image, as shown below:
For this experiment is iteration 159290
Featues engineering
After identifying the critical dates, when the previous iteration is non-negative and the following is negative, the switch date (the date on which the red CC line becomes strictly negative) is labeled for all training and testng data
Features are created by calculating moving averages on different periods for every critical date, for states CC, DD, and DC:
- 100 interations before and after ml100, mu100
- 500 interations before and after ml500, mu500
- ml1000, mu1000
- ml3000, mu3000
- ml5000, mu5000
- ml8000, mu8000
- ml30000, mu30000
- ml50000, mu50000
Random forest classifier
A Random Forest Classifier (RFC) was trained and tested on the data generated by the Prisoner’s Dilemma game. The classifier was trained on 2000 sessions (1000 per agent) and was tested on the same number of unknown observations. The most relevant features for the classifier were found to be the moving averages of the CC red line, as well as the longest averages of the DC blue line. The variable importance plot of this classifier is shown below:
The results of testing the RFC on the unseen data showed a high level of accuracy, with an Area Under the Curve (AUC) of 0.9985019 on the Receiver Operating Characteristic (ROC) curve, as shown below:
Conclusions
This project demonstrates that it is possible to detect the emergence of collusive behavior between Reinforcement Learning powered agents playing the repeated game of Prisoner’s Dilemma with a high level of accuracy. This was achieved using a Machine Learning algorithm, a Random Forest Classifier. As pointed out by Calvano et al. “Today, the prevalent approach to tacit collusion is relatively lenient, in part because tacit collusion among human decision-makers is difficult to detect." This project provides a method for detecting collusion in the context of algorithmic agents and has the potential to inform regulatory approaches to collusion in digital markets.
Future work
This project is a proof-of-concept and future research could expand on this work by testing the classifier on different types of games or in more realistic settings. Additionally, further research could explore different machine learning algorithms or feature engineering techniques to improve the detection of collusion.
Note
The code and data used in this project can be found in the src and data folders of this repository. If you find this topic particularly interesting, you can read all my thesis here.