{ "cells": [ { "cell_type": "markdown", "id": "functioning-maine", "metadata": {}, "source": [ "![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)\n", "\n", "# Two-Armed Bandit\n", "\n", "This tutorial was inspired by and adapted from [Models of Learning](http://www.hannekedenouden.ruhosting.nl/RLtutorial/Instructions.html) and the [Neuromatch Academy tutorials](https://github.com/NeuromatchAcademy/course-content) [[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)]." ] }, { "cell_type": "markdown", "id": "received-roman", "metadata": {}, "source": [ "In this tutorial, we will complete a learning task where your goal will be to maximize the amount of points you can earn by sampling the reward distribution from one of two slot machines. This is also known as a two-armed bandit task." ] }, { "cell_type": "markdown", "id": "manual-fairy", "metadata": {}, "source": [ "## Getting started\n", "\n", "To start off, we will (0) open Terminal (Mac/Linux) or Anaconda Prompt (Windows). 
We will then (1) activate our course environment, (2) change the directory to `gu-psyc-347-master` (or whatever you named your course directory that we downloaded at the beginning of the semester; you can also [download it here](https://github.com/shawnrhoads/gu-psyc-347/archive/master.zip)), (3) update the directory contents using `git`, and finally (4) check that our course environment is up-to-date.\n", "\n", "Steps 1-4 can be accomplished using the following four commands in the Terminal / Prompt:\n", "\n", "```\n", "conda activate gu-psyc-347\n", "cd gu-psyc-347-master\n", "git pull origin master\n", "conda env update --file course-env.yml\n", "```" ] }, { "cell_type": "markdown", "id": "yellow-christopher", "metadata": {}, "source": [ "## Running the task\n", "\n", "The task we will run was built using [PsychoPy](https://www.psychopy.org/), which is a \"free cross-platform package allowing you to run a wide range of experiments in the behavioral sciences (neuroscience, psychology, psychophysics, linguistics).\" PsychoPy is a great tool for creating and running behavioral experiments because it is open-source and backed by a huge community of developers and users!\n", "\n", "To run the task, check that you now have a directory called `two-armed-bandit` in your course directory. 
If your course directory is called `gu-psyc-347`, the directory structure should look something like this:\n", "\n", "```\n", "gu-psyc-347\n", "├── course-env.yml\n", "├── LICENSE\n", "├── README.md\n", "├── requirements.txt\n", "└── docs\n", "    ├── static\n", "    ├── solutions\n", "    ├── tasks\n", "    └── two-armed-bandit\n", "        ├── two-armed-bandit.psyexp\n", "        ├── two-armed-bandit.py\n", "        ├── two-armed-bandit_lastrun.py\n", "        ├── data\n", "        ├── orders\n", "        └── stimuli\n", "```\n", "\n", "Once you are able to confirm that your directory looks like this, we can start the experiment!\n", "\n", "**Before you run the experiment, your instructor will give you a number from 0 to 13. This will be the participant ID number that you input at the beginning of the task. Please keep this number in mind.**\n", "\n", "You can run the experiment by navigating to the `two-armed-bandit` directory and using this command in Terminal (Mac/Linux) or Anaconda Prompt (Windows): `python two-armed-bandit.py`\n", "\n", "The task will take roughly 8 minutes to complete." ] }, { "cell_type": "markdown", "id": "asian-temple", "metadata": {}, "source": [ "## Task Debrief\n", "You just completed 72 trials! Two slot machines were presented on every trial, each associated with a certain probability of reward.\n", "\n", "Did you learn which slot machine had the greater payout probability? \n", "\n", "You were not told this, but the payout probabilities for the blue and orange machines were coupled. There was always one 'good' option and one 'bad' option.\n", "\n", "In breakout rooms, discuss with a partner the following questions and report back to the group:\n", "- What did you like about the task?\n", "- What didn't you like about the task?\n", "- What do you think the probabilities of the slot machines were? (e.g., 50/50, 25/75, 60/40, 80/20). Do you think you and your partner(s) had the same slot machine reward probabilities? Why or why not?\n", "- Why do you think the positions of the machines were randomized on each trial? 
In other words, why wasn't the orange slot machine always on the same side?\n", "- How many trials did it take you to learn which slot machine was better (if at all)? If there were fewer trials (<72) or more trials (>72), do you think you would be better or worse at learning?\n", "- Were there times when you expected to receive a rewarding outcome but didn't? How did that change your behavior on the next trial?\n", "- When you didn't get the outcome you expected, how often did you switch your choice on the next trial? What parameter in the Rescorla-Wagner model might correspond to this phenomenon? (Recall the equation from class: $ Q^k_{t+1} = Q^k_t + \\alpha (r_t - Q^k_t) $)\n", "- How \"explorative\" was your behavior when you were completing the task? In other words, even if you learned which slot machine had the greater payout probability, how often did you sample the other one just to see what would happen? Do you think we can model this as well?" ] }, { "cell_type": "markdown", "id": "spanish-geneva", "metadata": {}, "source": [ "