
Prosocial RL Exercises Solutions

We will now fit your behavioral data to the models developed during class (as well as a few others) and then compare the models to assess which best explains the data. This exercise is a bit different from previous exercises. I will only provide minimal code/text to guide you through it, but you should be able to complete it all on your own using the course resources: previous tutorials, exercises, papers, and lectures!

Once you complete this exercise, you will be well on your way to becoming a pro computational modeler!


You can download the data from GitHub or by using the following code.

# import relevant modules
from scipy.optimize import minimize # finding optimal params in models
from scipy import stats             # statistical tools
import os                           # operating system tools
import numpy as np                  # matrix/array functions
import pandas as pd                 # loading and manipulating data
import matplotlib.pyplot as plt     # plotting
%matplotlib inline

np.random.seed(2021)                # set seed for reproducibility
# this function will load the data into memory 
def load_subjects():
    '''
    input: n/a
    output: dictionary of DataFrames containing the data
    '''
    urls = [f'https://raw.githubusercontent.com/shawnrhoads/gu-psyc-347/master/docs/static/data/{x:02}_psyc-347-prosocial-learning.csv' for x in range(1,12)]
        
    subject_data = {}
    for index, file in enumerate(urls):
        df = pd.read_csv(file, index_col='subject')
        subject_data[index] = df[df['outcomeDescr'] != 'practice'][['block',
                                                                    'trial_num',
                                                                    'true_accuracy',
                                                                    'outcome',
                                                                    'outcomeDescr',
                                                                    'cumulativePts_self',
                                                                    'cumulativePts_social']]
        
    return subject_data
# load the data for all subjects into memory
subject_data = load_subjects()

Data

4 conditions/blocks

  • self, win

  • self, avoid loss

  • social, win

  • social, avoid loss

Each block contains 24 trials

  • 2 stimuli per trial (counterbalanced with respect to side)

  • the outcome associated with each stimulus is probabilistic (75%, 25%)

Data output key

  • block: one of four condition labels: self_win (play for self to win), self_avoidloss (play for self to avoid loss), social_win (play for next student to win), social_avoidloss (play for next student to avoid loss)

  • trial_num: order in which trials are displayed within a block (0-23)

  • true_accuracy: 1 if the participant selected the stimulus with the higher probability of winning (or of avoiding loss), 0 otherwise

  • outcome: outcome for trial (+1, 0, -1); blank if missed

  • outcomeDescr: text description of outcome (practice, social win, social avoid win, social loss, social avoid loss, self win, self avoid win, self loss, self avoid loss)

  • cumulativePts_self: running total of points for self

  • cumulativePts_social: running total of points for social

# view data from first subject
display(subject_data[0])
block trial_num true_accuracy outcome outcomeDescr cumulativePts_self cumulativePts_social
subject
1.0 social_win 0 0.0 0.0 social avoid win 1000 1000
1.0 social_win 1 1.0 1.0 social win 1000 1100
1.0 social_win 2 1.0 1.0 social win 1000 1200
1.0 social_win 3 1.0 1.0 social win 1000 1300
1.0 social_win 4 1.0 0.0 social avoid win 1000 1300
... ... ... ... ... ... ... ...
1.0 self_avoidloss 19 1.0 0.0 self avoid loss 2100 1300
1.0 self_avoidloss 20 1.0 0.0 self avoid loss 2100 1300
1.0 self_avoidloss 21 1.0 0.0 self avoid loss 2100 1300
1.0 self_avoidloss 22 1.0 -1.0 self lose 2000 1300
1.0 self_avoidloss 23 0.0 -1.0 self lose 1900 1300

96 rows × 7 columns
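Before modeling, it is worth confirming the block structure described above. Here is a quick, optional check on the first subject (not part of the original exercise):

# confirm 4 blocks x 24 trials = 96 non-practice trials for this subject
print(subject_data[0]['block'].value_counts())
print('total trials:', len(subject_data[0]))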

# we will need this function to grab indices of specific elements 
def get_index_positions(list_of_elems, element):
    ''' Returns the indices of all occurrences of the given
    element in the list list_of_elems '''
    index_pos_list = []
    index_pos = 0
    while True:
        try:
            # Search for item in list from indexPos to the end of list
            index_pos = list_of_elems.index(element, index_pos)
            # Add the index position in list
            index_pos_list.append(index_pos)
            index_pos += 1
        except ValueError as e:
            break
            
    return index_pos_list

Specifying plausible models

Below we will begin to specify some plausible models to characterize behavior during the task. Typically, each of these models would be written as a class object (see Python Classes) with methods (e.g., model.fit()), but we will use a more procedural approach for clarity during these exercises.
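For illustration only, here is a minimal sketch of what such a class wrapper might look like (the name RWModel and its attributes are hypothetical, and we will not use it below; it relies on scipy.optimize.minimize, imported above):

# hypothetical class-based wrapper around a negative log likelihood function
class RWModel:
    def __init__(self, negll_func, bounds):
        self.negll_func = negll_func  # e.g., Simple_1a1t (defined below)
        self.bounds = bounds          # e.g., ((0, 1), (0, 12))

    def fit(self, init_guess, choices, outcomes, block):
        # minimize the negative log likelihood for one subject
        result = minimize(self.negll_func, init_guess,
                          args=(choices, outcomes, block),
                          bounds=self.bounds, method='L-BFGS-B')
        self.params_, self.negll_ = result.x, result.fun
        return self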

This first model is a simple Rescorla-Wagner model that ignores information that is socially relevant (e.g., whether the decision is for self or for someone else) or outcome-relevant (e.g., whether the block involves winning points or avoiding losses).
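Written out (to match the code below), the chosen option's value is updated with a single learning rate \(\alpha\), and choices are generated from the Q values with a softmax and a single inverse temperature \(\theta\):

\[
Q_{t+1}(c_t) = Q_t(c_t) + \alpha \big(r_t - Q_t(c_t)\big), \qquad
p(c_t) = \frac{e^{\theta Q_t(c_t)}}{\sum_{k=1}^{2} e^{\theta Q_t(k)}}
\]

In the avoid-loss blocks, the outcomes (-1, 0) are recoded to (0, 1) so that all conditions share the same 0-1 outcome scale.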

# Completed Model 1
def Simple_1a1t(params, choices, outcomes, block, plot=False):
    '''
    Inputs:
        - params: list of 2 guesses (float) for each parameter (alpha, theta)
        - choices: list of 96 choices (int) on each trial (0, 1)
        - outcomes: list of 96 outcomes (int) on each trial (-1, 0, 1)
        - block: list of 96 conditions (string) on each trial (self_win, self_avoidloss, social_win, social_avoidloss)
    Outputs:
        - negLL: negative loglikelihood computed
                 from the choice probabilities (float)
    '''
    alpha, theta = params 
    
    if np.isnan(alpha) or np.isnan(theta): # check inputs
        return np.inf
    else:
        blocks = list(block)

        # extracts list of four strings corresponding to conditions
        unique_conditions = list(set(block))

        # init choice probs
        choiceProb = np.zeros((len(blocks)), dtype = float) 
        Q_out = {}

        count = 0
        for condition in unique_conditions:

            T_temp = blocks.count(condition)    
            Q = [0.5, 0.5] # Q at trial 0
            Q_stored = np.zeros((2, T_temp), dtype = float) 

            cur_indices = get_index_positions(blocks, condition)
            c = np.array(choices)[cur_indices]
            r = np.array(outcomes)[cur_indices]

            # check if self vs social
            if 'self' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r])                
            elif 'social' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r])
            
            # loop through trials within condition
            for t in range(T_temp):       

                if np.isnan(c[t]):
                    # don't update if nan
                    choiceProb[count] = np.nan
                    Q_stored[:,t] = Q

                else:
                    # compute choice probabilities for k=2
                    # use the softmax rule
                    ev = np.exp(theta*np.array(Q))
                    sum_ev = np.sum(ev)
                    p = ev / sum_ev

                    # compute choice probability for actual choice
                    choiceProb[count] = p[int(c[t])]

                    # update values
                    delta = r[t] - Q[int(c[t])]
                    Q[int(c[t])] = Q[int(c[t])] + alpha * delta

                    # store Q_t+1
                    Q_stored[:,t] = Q

                count += 1

            Q_out[condition] = Q_stored
            
        negLL = -np.nansum(np.log(choiceProb))
        
        if plot: #plot mean across 4 blocks
            Q0 = np.mean(np.stack((Q_out['self_win'][0], Q_out['social_win'][0], Q_out['self_avoidloss'][0], Q_out['social_avoidloss'][0]),axis=0),axis=0)
            Q1 = np.mean(np.stack((Q_out['self_win'][1], Q_out['social_win'][1], Q_out['self_avoidloss'][1], Q_out['social_avoidloss'][1]),axis=0),axis=0)

            plt.plot(range(T_temp),Q0)
            plt.plot(range(T_temp),Q1)
            plt.title('Mean Q across conditions')
            plt.xlabel('trial')
            plt.ylabel('Q')
            plt.show()
        
        return negLL

Let’s test this function using a guess for alpha and theta for one subject.

# use one subject (feel free to change)
behavior = subject_data[9] 

# guess for alpha and theta
params = [.12, 2.11] 

# specify subject data
choices = behavior.true_accuracy
outcomes = behavior.outcome
block = behavior.block 

# compute negative log likelihood given
# data and guessed parameters
subj_negll = Simple_1a1t(params, choices, outcomes, block, plot=True)
[Plot: mean Q values across conditions for this subject.]

Now, let’s put this into a for-loop to minimize the negative log likelihood across all subjects.

#initialize dataframe to store results
df1 = pd.DataFrame(index=range(11), columns=['alpha','theta','NLL','BIC'])

# initialize list of algorithms 
# (use one during class, but you can try a few others on your own)
algorithms = ['L-BFGS-B'] #['Powell','TNC','SLSQP','trust-constr']

# loop through subjects
for index, behavior in enumerate(subject_data.values()):
    c, o, block = behavior.true_accuracy, behavior.outcome, behavior.block
    bounds = ((0,1),(0,12))

    # gradient descent to minimize neg LL
    res_nll = np.inf # set initial neg LL to be inf

    # guess several different starting points for alpha
    for alpha_guess in np.linspace(0,1,3):
        for theta_guess in np.linspace(1,12,4):

            # guesses for alpha, theta will change on each loop
            init_guess = (alpha_guess, theta_guess)

            for algorithm in algorithms:

                # minimize neg LL
                result = minimize(Simple_1a1t,
                                  init_guess,
                                  (c, o, block),
                                  bounds=bounds,
                                  method=algorithm)

                # if current negLL is smaller than the last negLL,
                # then store current data
                if result.fun < res_nll and result.success:
                    res_nll = result.fun
                    param_fits = result.x

    # also, compute BIC
    BIC = 2 * res_nll + len(init_guess) * np.log(len(c))

    #store in dataframe
    df1.at[index, 'alpha'] = param_fits[0]
    df1.at[index, 'theta'] = param_fits[1]
    df1.at[index, 'NLL'] = res_nll
    df1.at[index, 'BIC'] = BIC

    # print/plot Q values for subject   
    print(fr'subject {index+1:02}: alpha={param_fits[0]:.2f}, theta={param_fits[1]:.2f}; negLL={res_nll:.2f}; BIC={BIC:.2f}')
    nll = Simple_1a1t(param_fits, c, o, block, plot=True)
subject 01: alpha=0.91, theta=3.13; negLL=41.24; BIC=91.60
subject 02: alpha=0.03, theta=12.00; negLL=51.42; BIC=111.97
subject 03: alpha=0.06, theta=12.00; negLL=41.13; BIC=91.39
subject 04: alpha=0.00, theta=1.00; negLL=66.54; BIC=142.21
subject 05: alpha=0.23, theta=7.15; negLL=35.25; BIC=79.63
subject 06: alpha=0.44, theta=5.81; negLL=28.37; BIC=65.87
subject 07: alpha=0.13, theta=12.00; negLL=19.16; BIC=47.44
subject 08: alpha=0.16, theta=5.38; negLL=48.77; BIC=106.68
subject 09: alpha=0.31, theta=7.58; negLL=28.84; BIC=66.80
subject 10: alpha=0.31, theta=11.00; negLL=18.06; BIC=45.26
subject 11: alpha=0.25, theta=12.00; negLL=12.57; BIC=34.26
[A plot of the mean Q values across conditions is displayed for each subject.]

A quick glance at the negative log likelihood and BIC values shows that this model characterizes some subjects' behavior far better than others'. For some subjects (e.g., subject 04), the fitted parameters sit at the bounds and the fit is essentially at chance, so the parameter estimates are not very meaningful. We can see this all at once in the DataFrame we created to store the outputs.

display(df1)
alpha theta NLL BIC
0 0.910384 3.125691 41.235016 91.598729
1 0.034217 12.0 51.419038 111.966772
2 0.063137 12.0 41.131208 91.391112
3 0.0 1.0 66.542129 142.212955
4 0.234915 7.149005 35.249369 79.627435
5 0.438106 5.812434 28.370059 65.868814
6 0.129681 12.0 19.155706 47.440108
7 0.163722 5.379728 48.774263 106.677222
8 0.30879 7.577299 28.83555 66.799797
9 0.305454 11.001908 18.063611 45.255917
10 0.248679 12.0 12.56593 34.260556
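For reference, a model that assigns a 50/50 choice probability on every trial yields a negative log likelihood of \(96 \times \ln(2) \approx 66.5\), which is essentially subject 04's fit; this is a quick way to spot subjects for whom the model does no better than chance:

# chance-level benchmark: 96 trials with p = 0.5 for the chosen option
chance_nll = -96 * np.log(0.5)
print(f'chance-level negLL: {chance_nll:.2f}')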

Let’s try a different model…

Our task is very similar to the task used by Lockwood et al. (2016). Think about how the authors modeled human social behavior in their task!

One variation included separate learning rates (alphas) indexing sensitivity to prediction errors for (1) the self (alpha_self) and (2) the other person (alpha_other). How about we try that with our data?
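In other words, the value update now uses a different learning rate depending on whose points are at stake, while a single inverse temperature \(\theta\) is shared across blocks (this matches the function below):

\[
\delta_t = r_t - Q_t(c_t), \qquad
Q_{t+1}(c_t) = Q_t(c_t) +
\begin{cases}
\alpha_{self} \, \delta_t & \text{in self blocks} \\
\alpha_{other} \, \delta_t & \text{in social blocks}
\end{cases}
\]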

def Social_2a1t(params, choices, outcomes, block, plot=False):
    # 1 alpha_self + 1 alpha_other + 1 theta
    '''
    Inputs:
        - params: list of 3 guesses (float) for each parameter (alpha_self, alpha_other, theta)
        - choices: list of 96 choices (int) on each trial (0, 1)
        - outcomes: list of 96 outcomes (int) on each trial (-1, 0, 1)
        - block: list of 96 conditions (string) on each trial (self_win, self_avoidloss, social_win, social_avoidloss)
    Outputs:
        - negLL: negative loglikelihood computed
                 from the choice probabilities (float)
    '''
    alpha_self, alpha_other, theta = params 
    
    # check inputs
    if np.isnan(alpha_self) or np.isnan(alpha_other) or np.isnan(theta): 
        return np.inf
    else:
        blocks = list(block)

        # extracts list of four strings corresponding to conditions
        unique_conditions = list(set(block))

        # init choice probs
        choiceProb = np.zeros((len(blocks)), dtype = float) 
        Q_out = {}

        count = 0
        for condition in unique_conditions:

            T_temp = blocks.count(condition)    
            Q = [0.5, 0.5] # Q at trial 0
            Q_stored = np.zeros((2, T_temp), dtype = float) 

            cur_indices = get_index_positions(blocks, condition)
            c = np.array(choices)[cur_indices]
            r = np.array(outcomes)[cur_indices]

            # check if self vs social
            #    Note: this bulky if-statement
            #    is intentional to make it clear
            #    what we are doing
            if 'self' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r]) 
                
            elif 'social' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r])
            
            # loop through trials within condition
            for t in range(T_temp):       

                if np.isnan(c[t]):
                    # don't update if nan 
                    choiceProb[count] = np.nan
                    Q_stored[:,t] = Q

                else:
                    # compute choice probabilities for k=2
                    # use the softmax rule
                    ev = np.exp(theta*np.array(Q))
                    sum_ev = np.sum(ev)
                    p = ev / sum_ev

                    # compute choice probability for actual choice
                    choiceProb[count] = p[int(c[t])]

                    # update values
                    delta = r[t] - Q[int(c[t])]
                    
                    if 'self' in condition:
                        Q[int(c[t])] = Q[int(c[t])] + alpha_self * delta
                    elif 'social' in condition:
                        Q[int(c[t])] = Q[int(c[t])] + alpha_other * delta

                    # store Q_t+1
                    Q_stored[:,t] = Q

                count += 1

            Q_out[condition] = Q_stored
            
        negLL = -np.nansum(np.log(choiceProb))
        
        if plot: #plot mean across 4 blocks
            Q0 = np.mean(np.stack((Q_out['self_win'][0], Q_out['social_win'][0], Q_out['self_avoidloss'][0], Q_out['social_avoidloss'][0]),axis=0),axis=0)
            Q1 = np.mean(np.stack((Q_out['self_win'][1], Q_out['social_win'][1], Q_out['self_avoidloss'][1], Q_out['social_avoidloss'][1]),axis=0),axis=0)

            plt.plot(range(T_temp),Q0)
            plt.plot(range(T_temp),Q1)
            plt.title('Mean Q across conditions')
            plt.xlabel('trial')
            plt.ylabel('Q')
            plt.show()
        
        return negLL

Now, let’s put this into a for-loop to minimize the negative log likelihood across all subjects.

#initialize dataframe to store results
df2 = pd.DataFrame(index=range(11), columns=['alpha_self','alpha_other','NLL','BIC'])

# initialize list of algorithms 
# (use one during class, but you can try a few others on your own)
algorithms = ['L-BFGS-B'] #['Powell','TNC','SLSQP','trust-constr']

# loop through subjects
for index, behavior in enumerate(subject_data.values()):
    c, o, block = behavior.true_accuracy, behavior.outcome, behavior.block
    bounds = ((0,1),(0,1),(0,12))

    # gradient descent to minimize neg LL
    res_nll = np.inf # set initial neg LL to be inf

    # guess several different starting points for alpha
    for alpha_self_guess in np.linspace(0,1,3):
        for alpha_other_guess in np.linspace(0,1,3):
            for theta_guess in np.linspace(1,12,4):

                # guesses for alpha, theta will change on each loop
                init_guess = (alpha_self_guess, alpha_other_guess, theta_guess)

                for algorithm in algorithms:

                    # minimize neg LL
                    result = minimize(Social_2a1t,
                                      init_guess,
                                      (c, o, block),
                                      bounds=bounds,
                                      method=algorithm)

                    # if current negLL is smaller than the last negLL,
                    # then store current data
                    if result.fun < res_nll and result.success:
                        res_nll = result.fun
                        param_fits = result.x

    # also, compute BIC
    BIC = 2 * res_nll + len(init_guess) * np.log(len(c))

    #store in dataframe
    df2.at[index, 'alpha_self'] = param_fits[0]
    df2.at[index, 'alpha_other'] = param_fits[1]
    df2.at[index, 'NLL'] = res_nll
    df2.at[index, 'BIC'] = BIC

    # print/plot Q values for subject   
    print(fr'subject {index+1:02}: alpha_self={param_fits[0]:.2f}, alpha_other={param_fits[1]:.2f}, theta={param_fits[2]:.2f}; negLL={res_nll:.2f}; BIC={BIC:.2f}')
    nll = Social_2a1t(param_fits, c, o, block, plot=True)
subject 01: alpha_self=0.76, alpha_other=1.00, theta=3.30; negLL=40.68; BIC=95.06
subject 02: alpha_self=0.08, alpha_other=0.02, theta=12.00; negLL=46.42; BIC=106.53
subject 03: alpha_self=0.11, alpha_other=0.04, theta=12.00; negLL=38.84; BIC=91.38
subject 04: alpha_self=0.00, alpha_other=0.00, theta=12.00; negLL=66.41; BIC=146.51
subject 05: alpha_self=0.26, alpha_other=0.18, theta=7.69; negLL=35.02; BIC=83.73
subject 06: alpha_self=0.30, alpha_other=0.65, theta=6.24; negLL=27.35; BIC=68.39
subject 07: alpha_self=0.14, alpha_other=0.12, theta=12.00; negLL=19.08; BIC=51.85
subject 08: alpha_self=0.16, alpha_other=0.69, theta=3.99; negLL=47.11; BIC=107.91
subject 09: alpha_self=0.42, alpha_other=0.24, theta=7.91; negLL=28.21; BIC=70.12
subject 10: alpha_self=0.29, alpha_other=0.80, theta=12.00; negLL=17.71; BIC=49.11
subject 11: alpha_self=0.33, alpha_other=0.20, theta=12.00; negLL=12.02; BIC=37.74
[A plot of the mean Q values across conditions is displayed for each subject.]
display(df2)
alpha_self alpha_other NLL BIC
0 0.760689 1.0 40.681883 95.056810
1 0.081382 0.015783 46.417372 106.527788
2 0.107811 0.037979 38.844495 91.382034
3 0.0 0.003832 66.406572 146.506188
4 0.260908 0.175293 35.018243 83.729531
5 0.296518 0.651074 27.348106 68.389257
6 0.139729 0.11758 19.077929 51.848903
7 0.160885 0.688449 47.106963 107.906970
8 0.424338 0.242731 28.212399 70.117842
9 0.285959 0.801353 17.710507 49.114059
10 0.326473 0.201148 12.022076 37.737196

What do we think? At a glance, do we think this model was better at characterizing the behavior?
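One quick, informal check (the formal model comparison comes at the end of this exercise) is to compare the summed negative log likelihoods and BICs of the two models across subjects; lower values indicate a better fit:

# informal comparison of Model 1 vs Model 2
print(f"Model 1: total NLL = {np.nansum(df1['NLL'].values):.2f}, total BIC = {np.nansum(df1['BIC'].values):.2f}")
print(f"Model 2: total NLL = {np.nansum(df2['NLL'].values):.2f}, total BIC = {np.nansum(df2['BIC'].values):.2f}")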


Fitting your own models!

Now, we will split into our groups (3-4 students per group) to brainstorm other plausible models and then collaboratively write code to fit the data to these new models!

Here are some hints/ideas:

  • Think Lockwood et al. (2016)! The authors had more than one learning rate. What else did they include in their best-fitting model? Why?

  • What else is different about this task? Do participants always “play to win” like they do in Lockwood et al. (2016)? What other conditions are there?

# Model 3 (12 out of 30 points)
def Valence_2a1t(params, choices, outcomes, block, plot=False):
    # 1 alpha_positive + 1 alpha_negative + 1 theta
    '''
    Inputs:
        - params: list of 3 guesses (float) for each parameter (alpha_positive, alpha_negative, theta)
        - choices: list of 96 choices (int) on each trial (0, 1)
        - outcomes: list of 96 outcomes (int) on each trial (-1, 0, 1)
        - block: list of 96 conditions (string) on each trial (self_win, self_avoidloss, social_win, social_avoidloss)
    Outputs:
        - negLL: negative loglikelihood computed
                 from the choice probabilities (float)
    '''
    alpha_positive, alpha_negative, theta = params 
    # check inputs
    if np.isnan(alpha_positive) or np.isnan(alpha_negative) or np.isnan(theta): 
        return np.inf
    else:
        blocks = list(block)

        # extracts list of four strings corresponding to conditions
        unique_conditions = list(set(block))

        # init choice probs
        choiceProb = np.zeros((len(blocks)), dtype = float) 
        Q_out = {}

        count = 0
        for condition in unique_conditions:

            T_temp = blocks.count(condition)    
            Q = [0.5, 0.5] # Q at trial 0
            Q_stored = np.zeros((2, T_temp), dtype = float) 

            cur_indices = get_index_positions(blocks, condition)
            c = np.array(choices)[cur_indices]
            r = np.array(outcomes)[cur_indices]

            # check if self vs social (to recode avoid-loss outcomes)
            #    Note: this bulky if-statement
            #    is intentional to make it clear
            #    what we are doing
            if 'self' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r]) 
                
            elif 'social' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r])
            
            # loop through trials within condition
            for t in range(T_temp):       

                if np.isnan(c[t]):
                    # don't update if nan 
                    choiceProb[count] = np.nan
                    Q_stored[:,t] = Q

                else:
                    # compute choice probabilities for k=2
                    # use the softmax rule
                    ev = np.exp(theta*np.array(Q))
                    sum_ev = np.sum(ev)
                    p = ev / sum_ev

                    # compute choice probability for actual choice
                    choiceProb[count] = p[int(c[t])]

                    # update values
                    delta = r[t] - Q[int(c[t])]
                    
                    if 'win' in condition:
                        Q[int(c[t])] = Q[int(c[t])] + alpha_positive * delta
                    elif 'avoidloss' in condition:
                        Q[int(c[t])] = Q[int(c[t])] + alpha_negative * delta

                    # store Q_t+1
                    Q_stored[:,t] = Q

                count += 1

            Q_out[condition] = Q_stored
            
        negLL = -np.nansum(np.log(choiceProb))
        
        if plot: #plot mean across 4 blocks
            Q0 = np.mean(np.stack((Q_out['self_win'][0], Q_out['social_win'][0], Q_out['self_avoidloss'][0], Q_out['social_avoidloss'][0]),axis=0),axis=0)
            Q1 = np.mean(np.stack((Q_out['self_win'][1], Q_out['social_win'][1], Q_out['self_avoidloss'][1], Q_out['social_avoidloss'][1]),axis=0),axis=0)

            plt.plot(range(T_temp),Q0)
            plt.plot(range(T_temp),Q1)
            plt.title('Mean Q across conditions')
            plt.xlabel('trial')
            plt.ylabel('Q')
            plt.show()
        
        return negLL
#initialize dataframe to store results
df3 = pd.DataFrame(index=range(11), columns=['alpha_pos','alpha_neg','theta','NLL','BIC'])

# initialize list of algorithms 
# (use one during class, but you can try a few others on your own)
algorithms = ['L-BFGS-B'] #['Powell','TNC','SLSQP','trust-constr']

# loop through subjects
for index, behavior in enumerate(subject_data.values()):
    c, o, block = behavior.true_accuracy, behavior.outcome, behavior.block
    bounds = ((0,1),(0,1),(0,12))

    # gradient descent to minimize neg LL
    res_nll = np.inf # set initial neg LL to be inf

    # guess several different starting points for alpha
    for alpha_pos_guess in np.linspace(0,1,3):
        for alpha_neg_guess in np.linspace(0,1,3):
            for theta_guess in np.linspace(1,12,4):

                # guesses for alpha, theta will change on each loop
                init_guess = (alpha_pos_guess, alpha_neg_guess, theta_guess)

                for algorithm in algorithms:

                    # minimize neg LL
                    result = minimize(Valence_2a1t,
                                      init_guess,
                                      (c, o, block),
                                      bounds=bounds,
                                      method=algorithm)

                    # if current negLL is smaller than the last negLL,
                    # then store current data
                    if result.fun < res_nll and result.success:
                        res_nll = result.fun
                        param_fits = result.x

    # also, compute BIC
    BIC = 2 * res_nll + len(init_guess) * np.log(len(c))

    #store in dataframe
    df3.at[index, 'alpha_pos'] = param_fits[0]
    df3.at[index, 'alpha_neg'] = param_fits[1]
    df3.at[index, 'theta'] = param_fits[2]
    df3.at[index, 'NLL'] = res_nll
    df3.at[index, 'BIC'] = BIC

    # print/plot Q values for subject   
    print(fr'subject {index+1:02}: alpha_pos={param_fits[0]:.2f}, alpha_neg={param_fits[1]:.2f}, theta={param_fits[2]:.2f}; negLL={res_nll:.2f}; BIC={BIC:.2f}')
    nll = Valence_2a1t(param_fits, c, o, block, plot=True)
subject 01: alpha_pos=0.65, alpha_neg=1.00, theta=3.63; negLL=40.13; BIC=93.96
subject 02: alpha_pos=0.04, alpha_neg=0.03, theta=12.00; negLL=51.40; BIC=116.50
subject 03: alpha_pos=0.14, alpha_neg=0.03, theta=12.00; negLL=36.22; BIC=86.14
subject 04: alpha_pos=0.00, alpha_neg=0.00, theta=1.00; negLL=66.54; BIC=146.78
subject 05: alpha_pos=0.21, alpha_neg=0.39, theta=6.86; negLL=34.70; BIC=83.10
subject 06: alpha_pos=0.25, alpha_neg=0.77, theta=7.09; negLL=26.24; BIC=66.17
subject 07: alpha_pos=0.14, alpha_neg=0.12, theta=12.00; negLL=19.10; BIC=51.89
subject 08: alpha_pos=1.00, alpha_neg=0.22, theta=3.18; negLL=46.58; BIC=106.85
subject 09: alpha_pos=0.19, alpha_neg=0.43, theta=8.11; negLL=27.76; BIC=69.20
subject 10: alpha_pos=0.24, alpha_neg=0.53, theta=11.22; negLL=17.33; BIC=48.35
subject 11: alpha_pos=0.26, alpha_neg=0.24, theta=12.00; negLL=12.55; BIC=38.79
[A plot of the mean Q values across conditions is displayed for each subject.]
display(df3)
alpha_pos alpha_neg theta NLL BIC
0 0.648875 1.0 3.633796 40.134944 93.962933
1 0.035713 0.032702 12.0 51.401781 116.496607
2 0.144034 0.032195 12.0 36.224567 86.142178
3 0.0 0.0 1.0 66.542129 146.777303
4 0.208246 0.391592 6.86147 34.704541 83.102127
5 0.2497 0.76605 7.090218 26.236052 66.165148
6 0.139313 0.119977 12.0 19.098325 51.889695
7 1.0 0.224598 3.181657 46.578693 106.85043
8 0.185241 0.430937 8.112291 27.755787 69.204618
9 0.240672 0.533452 11.220255 17.326881 48.346807
10 0.264419 0.240717 12.0 12.548372 38.789789
# Model 4 (12 out of 30 points)
def Social_2a2t(params, choices, outcomes, block, plot=False):
    # 1 alpha_self + 1 alpha_other + 1 theta_self + 1 theta_other
    '''
    Inputs:
        - params: list of 4 guesses (float) for each parameter 
                (alpha_self, alpha_other, theta_self, theta_other)
        - choices: list of 96 choices (int) on each trial (0, 1)
        - outcomes: list of 96 outcomes (int) on each trial (-1, 0, 1)
        - block: list of 96 conditions (string) on each trial (self_win, self_avoidloss, social_win, social_avoidloss)
    Outputs:
        - negLL: negative loglikelihood computed
                 from the choice probabilities (float)
    '''
    alpha_self, alpha_other, theta_self, theta_other = params 
    
    # check inputs
    if np.isnan(alpha_self) or np.isnan(alpha_other) or np.isnan(theta_self) or np.isnan(theta_other): 
        return np.inf
    else:
        blocks = list(block)

        # extracts list of four strings corresponding to conditions
        unique_conditions = list(set(block))

        # init choice probs
        choiceProb = np.zeros((len(blocks)), dtype = float) 
        Q_out = {}

        count = 0
        for condition in unique_conditions:

            T_temp = blocks.count(condition)    
            Q = [0.5, 0.5] # Q at trial 0
            Q_stored = np.zeros((2, T_temp), dtype = float) 

            cur_indices = get_index_positions(blocks, condition)
            c = np.array(choices)[cur_indices]
            r = np.array(outcomes)[cur_indices]

            # check if self vs social
            #    Note: this bulky if-statement
            #    is intentional to make it clear
            #    what we are doing
            if 'self' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r]) 
                
            elif 'social' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r])
            
            # loop through trials within condition
            for t in range(T_temp):       

                if np.isnan(c[t]):
                    # don't update if nan
                    choiceProb[count] = np.nan
                    Q_stored[:,t] = Q

                else:
                    # compute choice probabilities for k=2
                    # use the softmax rule
                    if 'self' in condition:
                        ev = np.exp(theta_self*np.array(Q))
                    elif 'social' in condition:
                        ev = np.exp(theta_other*np.array(Q))
                    
                    sum_ev = np.sum(ev)
                    p = ev / sum_ev

                    # compute choice probability for actual choice
                    choiceProb[count] = p[int(c[t])]

                    # update values
                    delta = r[t] - Q[int(c[t])]
                    
                    if 'self' in condition:
                        Q[int(c[t])] = Q[int(c[t])] + alpha_self * delta
                    elif 'social' in condition:
                        Q[int(c[t])] = Q[int(c[t])] + alpha_other * delta

                    # store Q_t+1
                    Q_stored[:,t] = Q

                count += 1

            Q_out[condition] = Q_stored
            
        negLL = -np.nansum(np.log(choiceProb))
        
        if plot: #plot mean across 4 blocks
            Q0 = np.mean(np.stack((Q_out['self_win'][0], Q_out['social_win'][0], Q_out['self_avoidloss'][0], Q_out['social_avoidloss'][0]),axis=0),axis=0)
            Q1 = np.mean(np.stack((Q_out['self_win'][1], Q_out['social_win'][1], Q_out['self_avoidloss'][1], Q_out['social_avoidloss'][1]),axis=0),axis=0)

            plt.plot(range(T_temp),Q0)
            plt.plot(range(T_temp),Q1)
            plt.title('Mean Q across conditions')
            plt.xlabel('trial')
            plt.ylabel('Q')
            plt.show()
        
        return negLL
#initialize dataframe to store results
df4 = pd.DataFrame(index=range(11), columns=['alpha_self','alpha_other','theta_self','theta_other','NLL','BIC'])

# initialize list of algorithms 
# (use one during class, but you can try a few others on your own)
algorithms = ['L-BFGS-B'] #['Powell','TNC','SLSQP','trust-constr']

# loop through subjects
for index, behavior in enumerate(subject_data.values()):
    c, o, block = behavior.true_accuracy, behavior.outcome, behavior.block
    bounds = ((0,1),(0,1),(0,12),(0,12))

    # gradient descent to minimize neg LL
    res_nll = np.inf # set initial neg LL to be inf

    # guess several different starting points for alpha
    for alpha_self_guess in np.linspace(0,1,3):
        for alpha_other_guess in np.linspace(0,1,3):
            for theta_self_guess in np.linspace(1,12,3):
                for theta_other_guess in np.linspace(1,12,3):

                    # guesses for alpha, theta will change on each loop
                    init_guess = (alpha_self_guess, alpha_other_guess, theta_self_guess, theta_other_guess)

                    for algorithm in algorithms:

                        # minimize neg LL
                        result = minimize(Social_2a2t,
                                          init_guess,
                                          (c, o, block),
                                          bounds=bounds,
                                          method=algorithm)

                        # if current negLL is smaller than the last negLL,
                        # then store current data
                        if result.fun < res_nll and result.success:
                            res_nll = result.fun
                            param_fits = result.x

    # also, compute BIC
    BIC = 2 * res_nll + len(init_guess) * np.log(len(c))

    #store in dataframe
    df4.at[index, 'alpha_self'] = param_fits[0]
    df4.at[index, 'alpha_other'] = param_fits[1]
    df4.at[index, 'theta_self'] = param_fits[2]
    df4.at[index, 'theta_other'] = param_fits[3]
    df4.at[index, 'NLL'] = res_nll
    df4.at[index, 'BIC'] = BIC

    # print/plot Q values for subject   
    print(fr'subject {index+1:02}: alpha_self={param_fits[0]:.2f}, alpha_other={param_fits[1]:.2f}, theta_self={param_fits[2]:.2f}, theta_other={param_fits[3]:.2f}; negLL={res_nll:.2f}; BIC={BIC:.2f}')
    nll = Social_2a2t(param_fits, c, o, block, plot=True)
subject 01: alpha_self=0.64, alpha_other=1.00, theta_self=4.72, theta_other=2.61; negLL=39.76; BIC=97.78
subject 02: alpha_self=0.08, alpha_other=0.02, theta_self=12.00, theta_other=12.00; negLL=46.42; BIC=111.09
subject 03: alpha_self=0.11, alpha_other=0.46, theta_self=12.00, theta_other=3.21; negLL=37.07; BIC=92.40
subject 04: alpha_self=0.00, alpha_other=0.00, theta_self=12.00, theta_other=12.00; negLL=66.41; BIC=151.07
subject 05: alpha_self=0.26, alpha_other=0.20, theta_self=8.13, theta_other=6.87; negLL=34.96; BIC=88.19
subject 06: alpha_self=0.31, alpha_other=0.65, theta_self=5.98, theta_other=6.47; negLL=27.33; BIC=72.93
subject 07: alpha_self=0.14, alpha_other=0.12, theta_self=12.00, theta_other=12.00; negLL=19.08; BIC=56.41
subject 08: alpha_self=0.04, alpha_other=0.78, theta_self=12.00, theta_other=3.13; negLL=45.80; BIC=109.86
subject 09: alpha_self=0.47, alpha_other=0.19, theta_self=5.21, theta_other=12.00; negLL=24.89; BIC=68.03
subject 10: alpha_self=0.29, alpha_other=0.56, theta_self=12.00, theta_other=8.34; negLL=17.49; BIC=53.24
subject 11: alpha_self=0.33, alpha_other=0.20, theta_self=12.00, theta_other=12.00; negLL=12.02; BIC=42.30
[A plot of the mean Q values across conditions is displayed for each subject.]
display(df4)
alpha_self alpha_other theta_self theta_other NLL BIC
0 0.64041 1.0 4.719063 2.606713 39.75991 97.777213
1 0.081382 0.015783 12.0 12.0 46.417372 111.092136
2 0.107811 0.462793 12.0 3.211915 37.072458 92.402308
3 0.0 0.003832 12.0 12.0 66.406572 151.070536
4 0.256449 0.204543 8.127819 6.870225 34.964606 88.186605
5 0.310434 0.651042 5.98393 6.468159 27.334393 72.926178
6 0.139729 0.11758 12.0 12.0 19.077929 56.413251
7 0.042555 0.779956 12.0 3.132528 45.800017 109.857427
8 0.474939 0.187772 5.210771 12.0 24.885691 68.028775
9 0.285959 0.559895 12.0 8.33617 17.489508 53.236408
10 0.326473 0.201148 12.0 12.0 12.022076 42.301544
# Bonus Model 5 (not required for exercise)
def SocialValence_4a1t(params, choices, outcomes, block, plot=False):
    # 1 alpha_self_pos + 1 alpha_self_neg +
    # 1 alpha_other_pos + 1 alpha_other_neg + 1 theta
    '''
    Inputs:
        - params: list of 5 guesses (float) for each parameter 
                    alpha_self_pos
                    alpha_self_neg
                    alpha_other_pos
                    alpha_other_neg
                    theta
        - choices: list of 96 choices (int) on each trial (0, 1)
        - outcomes: list of 96 outcomes (int) on each trial (-1, 0, 1)
        - block: list of 96 conditions (string) on each trial (self_win, self_avoidloss, social_win, social_avoidloss)
    Outputs:
        - negLL: negative loglikelihood computed
                 from the choice probabilities (float)
    '''
    alpha_self_pos, alpha_self_neg, alpha_other_pos, alpha_other_neg, theta = params 
    
    # check inputs
    if np.isnan(alpha_self_pos) or np.isnan(alpha_self_neg) or np.isnan(alpha_other_pos) or np.isnan(alpha_other_neg) or np.isnan(theta): 
        return np.inf
    else:
        blocks = list(block)

        # extracts list of four strings corresponding to conditions
        unique_conditions = list(set(block))

        # init choice probs
        choiceProb = np.zeros((len(blocks)), dtype = float) 
        Q_out = {}

        count = 0
        for condition in unique_conditions:

            T_temp = blocks.count(condition)    
            Q = [0.5, 0.5] # Q at trial 0
            Q_stored = np.zeros((2, T_temp), dtype = float) 

            cur_indices = get_index_positions(blocks, condition)
            c, r = np.array(choices)[cur_indices], np.array(outcomes)[cur_indices]

            # check if self vs social
            if 'self' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r]) 
                
                
            elif 'social' in condition:
                # check if win vs avoid loss
                if 'win' in condition:
                    r = r
                elif 'avoidloss' in condition:
                    r = np.array([x+1 for x in r])
            
            # loop through trials within condition
            for t in range(T_temp):       

                if np.isnan(c[t]):
                    # don't update if nan
                    choiceProb[count] = np.nan
                    Q_stored[:,t] = Q

                else:
                    # compute choice probabilities for k=2
                    # use the softmax rule
                    ev = np.exp(theta*np.array(Q))
                    sum_ev = np.sum(ev)
                    p = ev / sum_ev

                    # compute choice probability for actual choice
                    choiceProb[count] = p[int(c[t])]

                    # update values
                    delta = r[t] - Q[int(c[t])]

                    if 'self' in condition:
                        if 'win' in condition:
                            Q[int(c[t])] = Q[int(c[t])] + alpha_self_pos * delta
                        elif 'avoidloss' in condition:
                            Q[int(c[t])] = Q[int(c[t])] + alpha_self_neg * delta

                    elif 'social' in condition:
                        if 'win' in condition:
                            Q[int(c[t])] = Q[int(c[t])] + alpha_other_pos * delta
                        elif 'avoidloss' in condition:
                            Q[int(c[t])] = Q[int(c[t])] + alpha_other_neg * delta                    

                    # store Q_t+1
                    Q_stored[:,t] = Q

                count += 1

            Q_out[condition] = Q_stored
            
        negLL = -np.nansum(np.log(choiceProb))
        
        if plot: #plot mean across 4 blocks
            Q0 = np.mean(np.stack((Q_out['self_win'][0], Q_out['social_win'][0], Q_out['self_avoidloss'][0], Q_out['social_avoidloss'][0]),axis=0),axis=0)
            Q1 = np.mean(np.stack((Q_out['self_win'][1], Q_out['social_win'][1], Q_out['self_avoidloss'][1], Q_out['social_avoidloss'][1]),axis=0),axis=0)

            plt.plot(range(T_temp),Q0)
            plt.plot(range(T_temp),Q1)
            plt.title('Mean Q across conditions')
            plt.xlabel('trial')
            plt.ylabel('Q')
            plt.show()
        
        return negLL
#initialize dataframe to store results
df5 = pd.DataFrame(index=range(11), columns=['alpha_self_pos','alpha_self_neg','alpha_other_pos','alpha_other_neg','theta','NLL','BIC'])

# initialize list of algorithms 
# (use one during class, but you can try a few others on your own)
algorithms = ['L-BFGS-B'] #['Powell','TNC','SLSQP','trust-constr']

# loop through subjects
for index, behavior in enumerate(subject_data.values()):
    c, o, block = behavior.true_accuracy, behavior.outcome, behavior.block
    bounds = ((0,1),(0,1),(0,1),(0,1),(0,12))

    # gradient descent to minimize neg LL
    res_nll = np.inf # set initial neg LL to be inf

    # guess several different starting points for alpha, theta
    # (using fewer starting points here because the search space is so large!)
    for alpha_self_guess in np.linspace(0,1,3):
        for alpha_other_guess in np.linspace(0,1,3):
            for theta_guess in np.linspace(1,12,3):

                # guesses for alpha, theta will change on each loop
                init_guess = (alpha_self_guess, alpha_other_guess, alpha_self_guess, alpha_other_guess, theta_guess)

                for algorithm in algorithms:

                    # minimize neg LL
                    result = minimize(SocialValence_4a1t,
                                      init_guess,
                                      (c, o, block),
                                      bounds=bounds,
                                      method=algorithm)

                    # if current negLL is smaller than the last negLL,
                    # then store current data
                    if result.fun < res_nll and result.success:
                        res_nll = result.fun
                        param_fits = result.x

    # also, compute BIC
    BIC = 2 * res_nll + len(init_guess) * np.log(len(c))

    #store in dataframe
    df5.at[index, 'alpha_self_pos'] = param_fits[0]
    df5.at[index, 'alpha_self_neg'] = param_fits[1]
    df5.at[index, 'alpha_other_pos'] = param_fits[2]
    df5.at[index, 'alpha_other_neg'] = param_fits[3]
    df5.at[index, 'theta'] = param_fits[4]
    df5.at[index, 'NLL'] = res_nll
    df5.at[index, 'BIC'] = BIC

    # print/plot Q values for subject   
    print(fr'subject {index+1:02}: alpha_self_pos={param_fits[0]:.2f}, alpha_self_neg={param_fits[1]:.2f}, alpha_other_pos={param_fits[2]:.2f}, alpha_other_neg={param_fits[3]:.2f}, theta={param_fits[4]:.2f}; negLL={res_nll:.2f}; BIC={BIC:.2f}')
    nll = SocialValence_4a1t(param_fits, c, o, block, plot=True)
subject 01: alpha_self_pos=0.32, alpha_self_neg=0.98, alpha_other_pos=0.02, alpha_other_neg=1.00, theta=9.44; negLL=38.02; BIC=98.85
subject 02: alpha_self_pos=0.12, alpha_self_neg=0.07, alpha_other_pos=0.01, alpha_other_neg=0.02, theta=12.00; negLL=46.04; BIC=114.91
subject 03: alpha_self_pos=0.14, alpha_self_neg=0.08, alpha_other_pos=0.15, alpha_other_neg=0.01, theta=12.00; negLL=33.46; BIC=89.74
subject 04: alpha_self_pos=0.00, alpha_self_neg=0.00, alpha_other_pos=0.00, alpha_other_neg=0.01, theta=12.00; negLL=66.25; BIC=155.32
subject 05: alpha_self_pos=0.25, alpha_self_neg=0.32, alpha_other_pos=0.12, alpha_other_neg=0.34, theta=7.99; negLL=34.10; BIC=91.02
subject 06: alpha_self_pos=0.12, alpha_self_neg=0.63, alpha_other_pos=0.39, alpha_other_neg=0.85, theta=8.01; negLL=25.12; BIC=73.05
subject 07: alpha_self_pos=0.14, alpha_self_neg=0.14, alpha_other_pos=0.14, alpha_other_neg=0.10, theta=12.00; negLL=18.93; BIC=60.69
subject 08: alpha_self_pos=0.03, alpha_self_neg=0.14, alpha_other_pos=0.48, alpha_other_neg=0.04, theta=9.78; negLL=44.02; BIC=110.86
subject 09: alpha_self_pos=0.50, alpha_self_neg=0.38, alpha_other_pos=0.18, alpha_other_neg=0.62, theta=8.13; negLL=26.99; BIC=76.79
subject 10: alpha_self_pos=0.23, alpha_self_neg=0.56, alpha_other_pos=0.89, alpha_other_neg=0.29, theta=12.00; negLL=16.22; BIC=55.26
subject 11: alpha_self_pos=0.23, alpha_self_neg=0.40, alpha_other_pos=0.31, alpha_other_neg=0.16, theta=12.00; negLL=11.30; BIC=45.42
[A plot of the mean Q values across conditions is displayed for each subject.]
display(df5)
alpha_self_pos alpha_self_neg alpha_other_pos alpha_other_neg theta NLL BIC
0 0.320348 0.978904 0.023847 1.0 9.437487 38.016359 98.854458
1 0.11808 0.065096 0.014785 0.016731 12.0 46.042017 114.905776
2 0.139103 0.081481 0.152275 0.01182 12.0 33.457502 89.736745
3 0.0 0.0 0.0 0.007885 12.0 66.247088 155.315916
4 0.2519 0.318756 0.116209 0.342456 7.98573 34.10134 91.024422
5 0.116973 0.634377 0.385897 0.849267 8.011111 25.116216 73.054172
6 0.139103 0.140344 0.139586 0.098146 12.0 18.933282 60.688305
7 0.030877 0.139475 0.480044 0.038782 9.780295 44.018283 110.858306
8 0.502454 0.381941 0.184442 0.616257 8.130312 26.986115 76.793971
9 0.226649 0.556591 0.887385 0.294739 12.0 16.218402 55.258546
10 0.229854 0.403204 0.308511 0.163654 12.0 11.301063 45.423867

Now, we should have run 4-5 models that we can compare to assess which model might explain the data best. Remember, we stored the parameter fits in Pandas DataFrames: df1, df2, df3, df4, and maybe df5 (if you completed the bonus).

Let’s use this information to compute Bayesian Information Criterion (\(BIC\)) across subjects for each model. One way in which we can do so is by summing the negative log likelihoods across subjects for each model and using that value to compute a \(BIC\) score. Then, we can plot each \(BIC\) to assess model fits. Please do this below.

Hints

  • to compute the sum of the negative log likelihoods for the first model, you would use the command: np.nansum(df1['NLL'].values).

  • You can refer to this page to review the Bayesian Information Criterion (\(BIC\)), but note that that tutorial used \(MSE\) to compute the \(BIC\). Here, we will use the negative log likelihood (like in this tutorial), so we won't need to multiply by \(-1\)! Also, remember to factor in the number of free parameters in each of your models (the formula is written out after these hints)!

  • Remember we will have 4-5 \(BIC\) values by the end of this tutorial.
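For reference, the \(BIC\) for each model is computed here from the summed negative log likelihood (\(NLL\)), the number of free parameters \(k\), and the total number of trials \(n\) (here, \(96 \times 11\) trials across all subjects), just as in the per-subject computation above:

\[
BIC = 2 \, NLL + k \, \ln(n)
\]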

# Compute BIC for each Model and Plot (6 out of 30 points)
def calculate_bic(n, nll, num_params):
    bic = (2 * nll) + (num_params * np.log(n))
    return bic

def plot_bic(list_of_dfs, list_of_num_params, df_names):
    # 96 trials across 11 subjects
    all_num_trials = 96 * 11

    #init list to input bic values
    bic_list = []

    all_mods = list_of_dfs
    all_num_params = list_of_num_params

    # get integrated BIC for each model
    for mod, num_params in zip(all_mods, all_num_params):
        bic_temp = calculate_bic(n=all_num_trials, nll=np.nansum(mod['NLL'].values), num_params=num_params)
        bic_list.append(bic_temp)

    #plot these values:
    plt.bar(range(0,len(all_mods)), bic_list)
    plt.xlabel("Model")
    plt.ylabel("BIC Score")
    plt.title("BIC Scores for Different Models")

    plt.xticks(range(0,len(all_mods)), df_names)
    plt.show()
    
list_of_dfs = [df1, df2, df3, df4, df5]
df_names = ['df1', 'df2', 'df3', 'df4', 'df5']
list_of_num_params = [2, 3, 3, 4, 5]
plot_bic(list_of_dfs, list_of_num_params, df_names)
[Bar plot: BIC scores for the different models.]

Which of these models fits the data best? Think about whether there might be others that could do better!