Skip to main content

Project Synopsis

PROBLEM:

In a variety of games people have developed bots that can accurately mimic human behavior while playing a game. Our problem is can we develop a traditional algorithm or machine learning model that can accurately detect whether a player is a human or bot. In addition, can we develop framework that can read data, process it, and use it to make real time predictions for an end user?

APPROACH:

Our approach focuses on training machine learning models to accurately make predictions on whether a player is a bot or a real person. To streamline this we developed a modular framework that a client can add to in order to pair their own models with our framework in order to generalize the solution to various models and games. These plugins have standard implementations that we have created to create our own working implementation of the framework as well as examples of how additional plugins could be created and used.

There are several components that need to be developed:

  • Machine learning models for bot detection
  • Memory processing module to read in data
  • Data processing module to prepare data for a model
  • Prediction module that takes data in order to output a prediction

Each module of the framework must run as its own process to keep up with a running game to output predictions to the user quickly and so we use multithreading to accomplish this. In addition, we want the framework to be versatile and added a config file where various settings can be changed to help fit different possible scenarios an end user may wish to use the framework for.

This framework will run alongside a game where the input will be piped to the framework through the command line.

Once running it will output results to said terminal with a certainty percentage of its prediction (With 0 being a guess of bot and 1 being a guess of a human).

From there our framework continues to run and produce new predictions until the input of new data ends, after which all processes close.

The first machine learning model plugin that we added to the framework is random forest classifier (RFC). Random forests are an ensemble learning method, meaning it is an aggregate of many decision trees and the final output uses the outputs of all the trees or estimators specified. The RFC has good classification performance but its biggest benefit is its use in feature selection. The RFC will only utilize features that split the data in the decision trees so many features can be dropped by seeing the number of times they were used for splitting. We do not use this functionality but it is important to note this use.

Another approach to the problem that we investigate is the usage of a neural network known as the Long-Short Term Memory Autoencoder. Long-Short Term Memory (LSTM) is a recurrent neural network model for encoding long dependencies in sequences. LSTM Autoencoder is an unsupervised learning model that aims to classify known and unknown sequences. In LSTM Autoencoder, an encoder LSTM encodes and compress the sequence. The output of the encoder is passed to the LSTM decoder to regenerate the sequence as best as it can. As the model train on the target data, the model will learn to reproduce the sequences better. During the inference phase of the model, if the input data has some anomaly that was not in the target data, the model will fail to reproduce the sequence, thus incur large loss. A threshold for this loss is use to binary classify the target and anomalous sequences. For this model, the target data is the bot game play, and the anomalous data is the human game play.