
Solutions to Python Exercises

For our first exercise, we will apply some of the skills we learned in Python. Please use all of the resources available to you, including Google, StackOverflow, and the Canvas Discussion Board.

For these exercises, we will be using data from O’Connell, et al. (2021). Reduced social distancing during the COVID-19 pandemic is associated with antisocial behaviors in an online United States sample. PLoS ONE.

This study assessed whether social distancing behaviors early in the COVID-19 pandemic were associated with self-reported antisocial behavior. As one index of social distancing behavior, participants were presented with an image of an adult silhouette surrounded by a rectangular border and asked to click the point in the image that represented how far away they typically stood from other individuals.

Figure 1 is a heatmap showing how far participants reported standing from other individuals in the past week; darker maroon indicates a higher density of responses, estimated with kernel density estimation. The mean response coordinate (marked +) represents a distance of approximately 98 inches (8.2 feet; 2.5 m).

Figure 1
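The unit conversions quoted for the mean response can be checked with a few lines of arithmetic. This minimal sketch only uses the standard definitions of 12 inches per foot and exactly 0.0254 m per inch; the variable names are illustrative.

```python
# Convert the reported mean distance (~98 inches) to feet and meters
inches = 98
feet = inches / 12        # 12 inches per foot
meters = inches * 0.0254  # 1 inch is defined as exactly 0.0254 m

print(f'{inches} in = {feet:.1f} ft = {meters:.1f} m')  # 98 in = 8.2 ft = 2.5 m
```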

Table of Contents

  1. Basics (importing modules, basic syntax, types of variables)

  2. If statements, For loops

  3. Functions

Key

  • # [INSERT CODE BELOW]: indicates where you should insert your own code; feel free to replace it with a comment of your own

  • ...: indicates a location where you should insert your own code

  • raise NotImplementedError("Student exercise: *"): delete this line once you have added your code
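Why keep the `raise` line at all? Some placeholders, such as `length_of_dist_data = ...`, are valid Python because `...` is the built-in `Ellipsis` object, so an unfinished cell could otherwise run silently. (Others, like `import ... as ...` or an `if ...` with no colon, are syntax errors, and Python rejects the whole cell before running any of it.) A small sketch of both behaviors, with an illustrative message string:

```python
# `...` is Python's built-in Ellipsis object, so this placeholder runs without error
length_of_dist_data = ...
print(length_of_dist_data is Ellipsis)  # True

# The raise line stops an unfinished cell explicitly instead of letting it pass silently
try:
    raise NotImplementedError("Student exercise: fill in the code, then delete this line")
except NotImplementedError as err:
    print(f'cell stopped with: {err}')
```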

Basics

# We usually start a notebook with a brief overview 
# in the first cell using Markdown (see above)

# Then, it is common practice to load all the 
# packages/modules that we will use in our first 
# code cell. Please import pandas and numpy 
# below so we can load our data:

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: import pandas and numpy, then delete this line")

import requests
import ... as ...
import ... as ...
# solution:

import requests
import pandas as pd
import numpy as np
# Now, we will load our data into a DataFrame
# called `df` and view the first few rows:

# here, we are just going to read data from the web 
# as a Pandas DataFrame
url = 'https://raw.githubusercontent.com/shawnrhoads/gu-psyc-347/master/docs/static/data/OConnell_COVID_MTurk_noPII_post_peerreview.csv'
df = pd.read_csv(url)

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: display the contents of the DataFrame")

display(...)
# solution:

url = 'https://raw.githubusercontent.com/shawnrhoads/gu-psyc-347/master/docs/static/data/OConnell_COVID_MTurk_noPII_post_peerreview.csv'
df = pd.read_csv(url)

display(df.head())
subID mturk_randID suspect_itaysisso Country Region ISP loc_US loc_state loc_zipcode loc_County ... education_4yr STAB_total_centered STAB_total_min32 silhouette_dist_X_min81 silhouette_dist_X_inches violated_distancing STAB_rulebreak_rmECONOMIC STAB_total_rmECONOMIC STAB_total_rmECONOMIC_centered household_income_coded_centered
0 1001 8797 0 United States CT AS7015 Comcast Cable Communications, LLC Yes Connecticut 6511 New Haven County ... 0 -3.946565 19 441.0 110.332750 0 9 48 -2.076336 1.269231
1 1002 3756 0 United States IL AS7018 AT&T Services, Inc. Yes California 90280 Los Angeles County ... 0 39.053436 62 287.0 71.803856 1 24 88 37.923664 -3.730769
2 1003 3798 0 United States OH AS10796 Charter Communications Inc Yes Ohio 44883 Seneca County ... 0 40.053436 63 313.0 78.308731 0 23 85 34.923664 -2.730769
3 1004 2965 0 United States TX AS7018 AT&T Services, Inc. Yes Texas 77019 Harris County ... 1 -9.946565 13 452.0 113.084820 0 8 42 -8.076336 NaN
4 1005 5953 0 United States NC AS20115 Charter Communications Yes North Carolina 28334 Sampson County ... 0 -17.946566 5 297.0 74.305733 0 8 34 -16.076336 -2.730769

5 rows × 126 columns
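`df.head()` returns only the first five rows by default, which is why the output above reports "5 rows × 126 columns". A quick way to see the full dimensions is `df.shape`; judging from the outputs in this notebook (131 observations per column, 126 columns), it should report `(131, 126)` for the real data. The sketch below uses a small hypothetical stand-in, `toy_df`, built from values visible in the output above, so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical stand-in for `df`, using values shown in the head() output above
toy_df = pd.DataFrame({
    'subID': [1001, 1002, 1003, 1004, 1005],
    'silhouette_dist_X_min81': [441.0, 287.0, 313.0, 452.0, 297.0],
    'STAB_total_min32': [19, 62, 63, 13, 5],
})

print(toy_df.shape)    # (rows, columns) -> (5, 3)
print(toy_df.head(2))  # head(n) shows the first n rows; n defaults to 5

# Selecting only the two columns used in the exercises below
print(toy_df[['silhouette_dist_X_min81', 'STAB_total_min32']].head())
```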

# Great, now that we have our data, let's store 
# two variables of interest as lists:
    # - silhouette_dist_X_min81 : 
    #      distance from others in pixels (x-axis)
    # - STAB_total_min32 : 
    #      antisocial behavior measured using 
    #      the Subtypes of Antisocial 
    #      Behavior Questionnaire (STAB)

# No need to add any code here. Just execute this cell!

distance = list(df['silhouette_dist_X_min81'].values)
antisociality = list(df['STAB_total_min32'].values)
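`list(df[...].values)` produces a list whose elements are NumPy scalars (e.g., `numpy.float64`), while pandas' `Series.tolist()` produces plain Python numbers. Both are lists and both work with `np.isnan()` later on; the sketch below, using a small hypothetical Series, just shows the difference in element types.

```python
import numpy as np
import pandas as pd

scores = pd.Series([441.0, 287.0, np.nan])  # hypothetical example values

as_list_from_values = list(scores.values)  # approach used in the cell above
as_list_from_tolist = scores.tolist()      # alternative pandas method

print(type(as_list_from_values[0]))  # <class 'numpy.float64'>
print(type(as_list_from_tolist[0]))  # <class 'float'>

# Either version can be checked for nans the same way
print(np.isnan(as_list_from_values[2]), np.isnan(as_list_from_tolist[2]))  # True True
```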
# Let's verify that both of these variables are 
# indeed stored in memory as lists using the 
# `print()` and `type()` functions

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: print the type of each variable, then delete this line")

print(type(...))
print(type(...))
# solution:

print(type(distance))
print(type(antisociality))
<class 'list'>
<class 'list'>
# Let's also explore the data a bit more. 
# Remember, both of these lists should 
# contain the same number of observations. 
# Let's store the number of elements in each 
# list and print them out. 

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: store number of elements of each list, then delete this line")

length_of_dist_data = ...
length_of_stab_data = ...

print(f'list containing distance data contains {length_of_dist_data} observations')
print(f'list containing STAB data contains {length_of_stab_data} observations')
# solution:

length_of_dist_data = len(distance)
length_of_stab_data = len(antisociality)

print(f'list containing distance data contains {length_of_dist_data} observations')
print(f'list containing STAB data contains {length_of_stab_data} observations')
list containing distance data contains 131 observations
list containing STAB data contains 131 observations
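Note that `len()` counts every element of a list, including missing values, so "131 observations" means 131 entries, not necessarily 131 answered questions. A minimal sketch with hypothetical values, contrasting `len()` with pandas' `Series.count()`, which skips missing values:

```python
import numpy as np
import pandas as pd

values = [441.0, np.nan, 287.0, np.nan, 313.0]  # hypothetical example values

n_elements = len(values)                # counts every element, nan included
n_observed = pd.Series(values).count()  # Series.count() skips missing values

print(n_elements, n_observed)  # 5 3
```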

If statements, For loops

# Rather than printing out the lengths of each 
# list above and qualitatively assessing whether 
# they contain the same number of observations, 
# we could have just used an if-statement. 
# Let's do that now. If they are the same length, 
# then print one line with the number of observations; 
# if they are not, then print two lines with the 
# number of observations for each list.

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: use if-statement to check if lists contain the same number of elements, then delete this line")

length_of_dist_data = ...
length_of_stab_data = ...

if ...
    print(...)
else:
    print(...)
    print(...)
# solution:

length_of_dist_data = len(distance)
length_of_stab_data = len(antisociality)

if length_of_dist_data == length_of_stab_data:
    print(f'lists contain {length_of_dist_data} observations')  # lengths are equal in this branch
else:
    print(f'list containing distance data contains {length_of_dist_data} observations')
    print(f'list containing STAB data contains {length_of_stab_data} observations')
lists contain 131 observations
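Checking lengths matters because many tools that pair two lists do not complain when the lengths differ. For example, the built-in `zip()` silently stops at the shorter list. A small sketch with hypothetical mini-lists in which one STAB value is missing entirely:

```python
# Hypothetical mini-lists; antisociality_demo is one observation short
distance_demo = [441.0, 287.0, 313.0]
antisociality_demo = [19, 62]

pairs = list(zip(distance_demo, antisociality_demo))
print(len(pairs))  # 2: zip() stops at the shorter list without raising an error
print(pairs)       # [(441.0, 19), (287.0, 62)]
```

On Python 3.10 or newer, `zip(distance_demo, antisociality_demo, strict=True)` raises a `ValueError` instead of truncating.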
# We might be missing data for some of the 
# observations in these lists (e.g., a 
# participant did not complete this question, 
# so the element in the list is `nan`, 
# short for "not a number"). Let's write a for-loop 
# to loop through the observations in `distance` 
# and then check whether each observation is a nan. 
# If the observation is a nan, then print 
# out the location of that observation in the list

# Hint: this will require you to put an 
# if-statement within the for-loop

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: loop through elements in list and check if any are nans, then delete this line")

for index, ... in enumerate(...):
    if ...
        print(f'observation #{index} is nan')
# solution:

for index, i in enumerate(distance):
    if np.isnan(i):
        print(f'observation #{index} is nan')
observation #17 is nan
observation #22 is nan
observation #24 is nan
observation #25 is nan
observation #39 is nan
observation #51 is nan
observation #60 is nan
observation #67 is nan
observation #71 is nan
observation #94 is nan
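The solution uses `np.isnan()` rather than `value == np.nan` for a reason: nan is never equal to anything, including itself, so an equality test can never find it. A minimal sketch with hypothetical values, which also collects the nan positions into a list with a list comprehension and the standard library's `math.isnan()`:

```python
import math
import numpy as np

values = [441.0, np.nan, 287.0, float('nan')]  # hypothetical example values

# nan compares unequal to everything, even itself
print(np.nan == np.nan)  # False

# math.isnan() (like np.isnan()) tests for nan directly
nan_indices = [index for index, value in enumerate(values) if math.isnan(value)]
print(nan_indices)  # [1, 3]
```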
# Okay (spoiler alert), `distance` contains nans.
# Let's take the same for-loop code from above 
# and add a "counter" to count how 
# many nans we actually have

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: loop through elements in list, check if any are nans, and update counter for each nan, then delete this line")

counter = 0 #initialize counter with 0
for index, ... in enumerate(...):
    if ...
        counter = ... #update counter if nan
        print(f'observation #{index} is nan')

# Let's print out the number of nans. Note that this final line is outside of the for-loop
print(f'the list contains {counter} nans') 
# solution:

counter = 0 #initialize counter with 0
for index, i in enumerate(distance):
    if np.isnan(i):
        counter += 1 #update counter if nan
        print(f'observation #{index} is nan')

# Let's print out the number of nans. Note that this final line is outside of the for-loop
print(f'the list contains {counter} nans') 
observation #17 is nan
observation #22 is nan
observation #24 is nan
observation #25 is nan
observation #39 is nan
observation #51 is nan
observation #60 is nan
observation #67 is nan
observation #71 is nan
observation #94 is nan
the list contains 10 nans
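The for-loop counter makes each step explicit. For comparison, NumPy can do the same count without an explicit loop: `np.isnan()` applied to an array returns a boolean array, and summing it counts the `True` entries. A sketch with hypothetical values:

```python
import numpy as np

values = [441.0, np.nan, 287.0, np.nan, 313.0]  # hypothetical example values

is_nan = np.isnan(np.array(values))              # boolean array, True where nan
nan_count = int(is_nan.sum())                    # True counts as 1 when summed
nan_positions = np.flatnonzero(is_nan).tolist()  # indices where is_nan is True

print(nan_count)      # 2
print(nan_positions)  # [1, 3]
```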

Functions

# We can also make our code above "general-purpose"
# so we can apply it to any list. In this cell,
# write a function called `check_for_nans()`
# that takes two inputs (a list and a string
# with the list's name) and returns two outputs:
# a boolean indicating whether the list contains
# any nans (i.e., whether the counter is greater
# than 0) and the number of nans in the list
# (zero if there are none).

# Note that there are many ways to accomplish
#  this task, feel free to experiment 
# around with different approaches

# Fill out this function, then try to 
# execute the next cell to see if it works

def check_for_nans(list_input, list_name='list'):
    """Check whether a list contains any nans

    Args:
        list_input (list): a list that contains the observations
        list_name (string): a string containing the name of the variable
    
    Returns:
        boolean: True if the list contains nans, False if not
        int: number of nans found in list, zero if no nans
    """

    ############################
    # [INSERT CODE BELOW]
    raise NotImplementedError("Student exercise: check if any inputted list contains nans, then delete this line")
    ############################

    # loop through elements/observations in list
    counter = 0 #initialize counter with 0
    for index, ... in enumerate(...):
        if ...
            counter = ... #update counter if nan
    
    # check if list contains any nans
    contains_nans = ...
    
    # print if contains_nans==True
    if contains_nans:
        print(f'{list_name} contains {counter} nans')
    else:
        print(f'{list_name} contains no nans')

    return contains_nans, counter
# solution:

def check_for_nans(list_input, list_name='list'):
    """Check whether a list contains any nans

    Args:
        list_input (list): a list that contains the observations
        list_name (string): a string containing the name of the variable
    
    Returns:
        boolean: True if the list contains nans, False if not
        int: number of nans found in list, zero if no nans
    """

    # loop through elements/observations in list
    counter = 0 #initialize counter with 0
    for index, i in enumerate(list_input):
        if np.isnan(i):
            counter += 1 #update counter if nan
    
    # check if list contains any nans
    contains_nans = counter > 0 
    
    # print if contains_nans==True
    if contains_nans:
        print(f'{list_name} contains {counter} nans')
    else:
        print(f'{list_name} contains no nans')

    return contains_nans, counter
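One limitation of `np.isnan()` is that it only accepts numbers: `np.isnan(None)` raises a `TypeError`. If a list might contain `None` for missing entries, `pd.isna()` treats both `None` and nan as missing. The function below, `check_for_missing()`, is a hypothetical variant written for illustration, not part of the exercise.

```python
import numpy as np
import pandas as pd

def check_for_missing(list_input, list_name='list'):
    """Hypothetical variant of check_for_nans() that uses pd.isna().

    pd.isna() treats both nan and None as missing values.
    """
    counter = sum(bool(pd.isna(value)) for value in list_input)
    contains_missing = counter > 0
    if contains_missing:
        print(f'{list_name} contains {counter} missing values')
    else:
        print(f'{list_name} contains no missing values')
    return contains_missing, counter

# np.isnan() cannot handle None
try:
    np.isnan(None)
except TypeError:
    print('np.isnan(None) raised a TypeError')

mixed = [1.0, None, np.nan, 4.0]
print(check_for_missing(mixed, list_name='mixed'))  # (True, 2)
```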
# Run this cell to check your work. 
# This cell should output the line:
# "CONGRATS! LOOKS LIKE YOU DID IT!"
# No need to edit, just execute cell!

antisociality_contains_nans, antisociality_nan_count = check_for_nans(antisociality, 
                                                                      list_name='antisociality')
distance_contains_nans, distance_nan_count = check_for_nans(distance, 
                                                            list_name='distance')

# This is a check to see if it works; 
# bonus point if you can summarize what we do here!
##############
new_list = [[1, np.nan, 2, 3, np.nan, 4, 5, 6, np.nan,7, 8, 9, np.nan],           # 4 
            [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],                                      # 0 
            [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 10]] # 8

list_of_booleans = []
list_of_counts = []
for index, item in enumerate(new_list):
    nans, counts = check_for_nans(item, list_name=f'list{index}')
    list_of_booleans.append(nans)
    list_of_counts.append(counts)

if (list_of_booleans==[True,False,True]) and (list_of_counts==[4,0,8]):
    print("CONGRATS! LOOKS LIKE YOU DID IT!")
##############
antisociality contains no nans
distance contains 10 nans
list0 contains 4 nans
list1 contains no nans
list2 contains 8 nans
CONGRATS! LOOKS LIKE YOU DID IT!
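The check above only prints a message when everything passes; if something is wrong, it prints nothing and gives no hint about which list failed. A minimal sketch of the same check written with `assert` statements, which stop with a message naming the failing list. To keep the block self-contained, it restates the solution's `check_for_nans()` in a compact form.

```python
import numpy as np

def check_for_nans(list_input, list_name='list'):
    """Compact restatement of the solution above (counts nans with np.isnan)."""
    counter = sum(1 for value in list_input if np.isnan(value))
    contains_nans = counter > 0
    print(f'{list_name} contains {counter} nans' if contains_nans else f'{list_name} contains no nans')
    return contains_nans, counter

new_list = [[1, np.nan, 2, 3, np.nan, 4, 5, 6, np.nan, 7, 8, 9, np.nan],  # 4 nans
            [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],                               # 0 nans
            [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 10]]  # 8 nans
expected = [(True, 4), (False, 0), (True, 8)]

for index, (item, (expected_bool, expected_count)) in enumerate(zip(new_list, expected)):
    contains_nans, count = check_for_nans(item, list_name=f'list{index}')
    assert contains_nans == expected_bool, f'list{index}: wrong boolean'
    assert count == expected_count, f'list{index}: expected {expected_count} nans, got {count}'

print('all checks passed')
```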

Notebook Feedback

Please convert this cell to a Markdown cell.

Create a heading named “Notebook Feedback,” then provide 1-2 sentences about your experience with this Jupyter Notebook (e.g., Did you enjoy the exercises? Were they too easy/difficult? Would you have liked to see anything different? Were you able to apply some skills we learned during class? Is anything still confusing?). Finally, please rate your experience from (0) “did not enjoy at all” to (10) “enjoyed a great deal.” Only your instructor will see these responses.