Bdtlib by govg

Introduction

Multi-Armed Bandit originates from the study of slot machines and optimally playing them. The general scenario is where in an iterative process, an agent chooses from available actions, receives rewards corresponding to his choice, and attempts to maximize the reward he obtains. There are various assumptions that can be made on the actions (does the agent know any side information? can he see what reward he could have obtained with other choices?), which lead to different analysis.

This library implements popular algorithms that try to solve both the normal multi-armed bandits case (where the agent only choose an arm number, and each arm generates a reward distributed about a particular mean) and the contextual multi-armed bandit case (where the agent observes some side information, "context" associated with each arm before the choice is made).

Multi armed bandit without context

The currently implemented (and tested) algorithms are :

Upper Confidence Bound (UCB)
Epsilon Greedy
Random

This gives a nice overview of the various algorithms, including ones that have not been implemented as of yet.

Contextual multi-armed bandits

The currently implemented (and tested) algorithms are:

Thompson Sampling
LinUCB
Epsilon Greedy - Currently in progress
Random

This is a nice overview of different algorithms and their analysis with respect to the contextual case.

Two popular datasets (that I will try and work with) that can be used as reference are the following

Note that the (contextual) multi-armed bandit setting lends itself well to modelling online learning, especially for models where the final outcome (click/no click, expected rating) can be expressed as a linear combination of the input features (a'la linear regression). To this, it can be verified that these algorithms quickly converge to a decent linear regressor when used on standard regression datasets (the tests include one that uses the Boston House prices dataset).

Build and usage

The package has been tested only on Linux (Arch Linux 64bit), but it should work on any Linux distribution without any changes. I don't feel like testing on Windows. Usage is straightforward, drop the bdtlib folder into either your python path, or your local folder (in which case you will have to append that to the python path). Example code can be found in the tests folder, this includes testing all implemented algorithms as well as plotting the results on both the Boston prices dataset, as well as randomly generated linear model with uniform noise.

As this is part of a course project, I will be developing (making breaking changes to ) it in the coming month, and hopefully over the next few as well.

Contact

I can be found using the nick govg on Freenode/Quakenet/Snoonet.

For further details, please go to my homepage

Govind Gopakumar (@govg)

Department of Computer Science and Engineering,

Indian Institute of Technology, Kanpur