Solutions to Python Exercises

For our first exercise, we will apply some of the skills we learned in Python. Please use all of the resources available to you, including Google, StackOverflow, and the Canvas Discussion Board.

For these exercises, we will be using data from O’Connell, et al. (2021). Reduced social distancing during the COVID-19 pandemic is associated with antisocial behaviors in an online United States sample. PLoS ONE.

This study assessed whether social distancing behaviors (early in the COVID-19 pandemic) were associated with self-reported antisocial behavior. To measure one index of social distancing behavior, participants were presented with an image of an adult silhouette surrounded by a rectangular border. They were asked to click a point in the image that represented how far away they typically stood from other individuals.

Here is a heatmap showing how far participants reported standing from other individuals in the past week, with dark maroon indicating a higher density of responses obtained from a kernel density estimation. The mean response coordinate, +, represents a distance of approximately 98 inches (8.2 feet; 2.5 m).
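The unit conversions quoted above can be checked with a few lines of arithmetic. This is only a minimal sketch: the variable names are illustrative, and the 98-inch figure is the only value taken from the text.

```python
# Convert the reported mean distance (about 98 inches) to feet and meters.
INCHES_PER_FOOT = 12
METERS_PER_INCH = 0.0254  # exact, by definition of the inch

mean_distance_inches = 98  # approximate value reported in the text above

mean_distance_feet = mean_distance_inches / INCHES_PER_FOOT
mean_distance_meters = mean_distance_inches * METERS_PER_INCH

print(f'{mean_distance_inches} in = {mean_distance_feet:.1f} ft = {mean_distance_meters:.1f} m')
```

Rounded to one decimal place, this reproduces the 8.2 feet and 2.5 m given above.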

Table of Contents

Basics (importing modules, basic syntax, types of variables)
If statements, For loops
Functions

Key

# [INSERT CODE BELOW]: indicates where you should insert your own code; feel free to replace it with a comment of your own
...: indicates a location where you should insert your own code
raise NotImplementedError("Student exercise: *"): delete this line once you have added your code

Basics

# We usually start a notebook with a brief overview 
# in the first cell using Markdown (see above)

# Then, it is common practice to load all the 
# packages/modules that we will use in our first 
# code cell. Please import pandas and numpy 
# below so we can load our data:

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: import pandas and numpy, then delete this line")

import requests
import ... as ...
import ... as ...

# solution:

import requests
import pandas as pd
import numpy as np

# Now, we will load our data into
# a variable called `df` and view the first few rows:

# here, we are just going to read data from the web 
# as a Pandas DataFrame
url = 'https://raw.githubusercontent.com/shawnrhoads/gu-psyc-347/master/docs/static/data/OConnell_COVID_MTurk_noPII_post_peerreview.csv'
df = pd.read_csv(url)

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: display the contents of the DataFrame")

display(...)

# solution:

url = 'https://raw.githubusercontent.com/shawnrhoads/gu-psyc-347/master/docs/static/data/OConnell_COVID_MTurk_noPII_post_peerreview.csv'
df = pd.read_csv(url)

display(df.head())

	subID	mturk_randID	Country	Region	ISP	loc_US	loc_state	loc_zipcode	loc_County	...	education_4yr	STAB_total_centered	STAB_total_min32	silhouette_dist_X_min81	silhouette_dist_X_inches	violated_distancing	STAB_rulebreak_rmECONOMIC	STAB_total_rmECONOMIC	STAB_total_rmECONOMIC_centered	household_income_coded_centered
0	1001	8797	United States	CT	AS7015 Comcast Cable Communications, LLC	Yes	Connecticut	6511	New Haven County	...	0	-3.946565	19	441.0	110.332750	0	9	48	-2.076336	1.269231
1	1002	3756	United States	IL	AS7018 AT&T Services, Inc.	Yes	California	90280	Los Angeles County	...	0	39.053436	62	287.0	71.803856	1	24	88	37.923664	-3.730769
2	1003	3798	United States	OH	AS10796 Charter Communications Inc	Yes	Ohio	44883	Seneca County	...	0	40.053436	63	313.0	78.308731	0	23	85	34.923664	-2.730769
3	1004	2965	United States	TX	AS7018 AT&T Services, Inc.	Yes	Texas	77019	Harris County	...	1	-9.946565	13	452.0	113.084820	0	8	42	-8.076336	NaN
4	1005	5953	United States	NC	AS20115 Charter Communications	Yes	North Carolina	28334	Sampson County	...	0	-17.946566	5	297.0	74.305733	0	8	34	-16.076336	-2.730769

5 rows × 126 columns

# Great, now that we have our data, let's store
# two variables of interest as lists:
    # - silhouette_dist_X_min81 : 
    #      distance from others in pixels (x-axis)
    # - STAB_total_min32 : 
    #      antisocial behavior measured using 
    #      the Subtypes of Antisocial 
    #      Behavior Questionnaire (STAB)

# No need to add any code here. Just execute this cell!

distance = list(df['silhouette_dist_X_min81'].values)
antisociality = list(df['STAB_total_min32'].values)
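A side note on the conversion above: wrapping a NumPy array in `list()` produces a Python list whose elements are still NumPy scalars, whereas the array's `.tolist()` method converts each element to a built-in Python type. Either form works with `np.isnan()`. The sketch below uses made-up values rather than the study data.

```python
import numpy as np

# Illustrative values only; not taken from the study data.
values = np.array([441.0, 287.0, np.nan])

as_list = list(values)       # a Python list whose elements stay NumPy scalars
as_tolist = values.tolist()  # a Python list of built-in Python floats

print(type(as_list), type(as_list[0]))
print(type(as_tolist), type(as_tolist[0]))
```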

# Let's verify that both of these variables are 
# indeed stored in memory as lists using the 
# `print()` and `type()` functions

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: print the type of each variable, then delete this line")

print(type(...))
print(type(...))

# solution:

print(type(distance))
print(type(antisociality))

<class 'list'>
<class 'list'>

# Let's also explore the data a bit more. 
# Remember, both of these lists should 
# contain the same number of observations. 
# Let's store the number of elements in each 
# list and print them out. 

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: store number of elements of each list, then delete this line")

length_of_dist_data = ...
length_of_stab_data = ...

print(f'list containing distance data contains {length_of_dist_data} observations')
print(f'list containing STAB data contains {length_of_stab_data} observations')

# solution:

length_of_dist_data = len(distance)
length_of_stab_data = len(antisociality)

print(f'list containing distance data contains {length_of_dist_data} observations')
print(f'list containing STAB data contains {length_of_stab_data} observations')

list containing distance data contains 131 observations
list containing STAB data contains 131 observations

If statements, For loops

# Rather than printing out the lengths of each 
# list above and qualitatively assessing whether 
# they contain the same number of observations, 
# we could have just used an if-statement. 
# Let's do that now. If they are the same length, 
# then print one line with the number of observations; 
# if they are not, then print two lines with the 
# number of observations for each list.

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: use if-statement to check if lists contain the same number of elements, then delete this line")

length_of_dist_data = ...
length_of_stab_data = ...

if ...:
    print(...)
else:
    print(...)
    print(...)

# solution:

length_of_dist_data = len(distance)
length_of_stab_data = len(antisociality)

if length_of_dist_data == length_of_stab_data:
    print(f'lists contain {length_of_dist_data} observations')
else:
    print(f'list containing distance data contains {length_of_dist_data} observations')
    print(f'list containing STAB data contains {length_of_stab_data} observations')

lists contain 131 observations

# We might be missing data for some of the 
# observations in these lists (e.g., a 
# participant did not complete this question, 
# so the element in the list is `nan`, 
# meaning "not a number"). Let's write a for-loop 
# to loop through the observations in `distance` 
# and check whether each observation is a nan. 
# If the observation is a nan, then print 
# out the position (index) of that observation in the list

# Hint: this will require you to put an 
# if-statement within the for-loop

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: loop through elements in list and check if any are nans, then delete this line")

for index, ... in enumerate(...):
    if ...:
        print(f'observation #{index} is nan')

# solution:

for index, value in enumerate(distance):
    if np.isnan(value):
        print(f'observation #{index} is nan')

observation #17 is nan
observation #22 is nan
observation #24 is nan
observation #25 is nan
observation #39 is nan
observation #51 is nan
observation #60 is nan
observation #67 is nan
observation #71 is nan
observation #94 is nan

# Okay (spoiler alert), `distance` contains nans.
# Let's take the same for-loop code from above 
# and add a "counter" to count how 
# many nans we actually have

# [INSERT CODE BELOW]
raise NotImplementedError("Student exercise: loop through elements in list, check if any are nans, and update counter for each nan, then delete this line")

counter = 0 #initialize counter with 0
for index, ... in enumerate(...):
    if ...:
        counter = ... #update counter if nan
        print(f'observation #{index} is nan')

# Let's print out the number of nans. Note that this final line is outside of the for-loop
print(f'the list contains {counter} nans') 

# solution:

counter = 0  # initialize counter with 0
for index, value in enumerate(distance):
    if np.isnan(value):
        counter += 1  # update counter if nan
        print(f'observation #{index} is nan')

# Let's print out the number of nans. Note that this final line is outside of the for-loop
print(f'the list contains {counter} nans') 

observation #17 is nan
observation #22 is nan
observation #24 is nan
observation #25 is nan
observation #39 is nan
observation #51 is nan
observation #60 is nan
observation #67 is nan
observation #71 is nan
observation #94 is nan
the list contains 10 nans
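The loop-and-counter pattern is the point of this exercise, but the same result can be had without an explicit loop. The sketch below shows one possible vectorized approach with NumPy; the `example` list holds made-up values, not the study data.

```python
import numpy as np

# Illustrative values only; not taken from the study data.
example = [441.0, np.nan, 313.0, np.nan, 297.0]

is_nan = np.isnan(np.array(example))    # boolean mask, one entry per observation
nan_positions = np.flatnonzero(is_nan)  # indices where the mask is True
nan_count = int(is_nan.sum())           # each True counts as 1

print(f'nan positions: {nan_positions.tolist()}')
print(f'the list contains {nan_count} nans')
```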

Functions

# We can also make our code above "general-purpose",
# so we can apply it to any list. In this cell,
# write a function called `check_for_nans()`
# that takes two inputs [a list and a string 
# ("the list name")] and returns two outputs [a boolean
# indicating whether the list contains any nans (i.e., 
# whether the counter is greater than 0) and the
# number of nans in the list (zero if no nans)]. 

# Note that there are many ways to accomplish
# this task, so feel free to experiment 
# with different approaches

# Fill out this function, then try to 
# execute the next cell to see if it works

def check_for_nans(list_input, list_name='list'):
    """Check whether a list contains any nans

    Args:
        list_input (list): a list that contains the observations
        list_name (string): a string containing the name of the variable
    
    Returns:
        boolean: True if the list contains nans, False if not
        int: number of nans found in list, zero if no nans
    """

    ############################
    # [INSERT CODE BELOW]
    raise NotImplementedError("Student exercise: check if any inputted list contains nans, then delete this line")
    ############################

    # loop through elements/observations in list
    counter = 0 #initialize counter with 0
    for index, ... in enumerate(...):
        if ...:
            counter = ... #update counter if nan
    
    # check if list contains any nans
    contains_nans = ...
    
    # report whether the list contains nans
    if contains_nans:
        print(f'{list_name} contains {counter} nans')
    else:
        print(f'{list_name} contains no nans')

    return contains_nans, counter

# solution:

def check_for_nans(list_input, list_name='list'):
    """Check whether a list contains any nans

    Args:
        list_input (list): a list that contains the observations
        list_name (string): a string containing the name of the variable
    
    Returns:
        boolean: True if the list contains nans, False if not
        int: number of nans found in list, zero if no nans
    """

    # loop through elements/observations in list
    counter = 0  # initialize counter with 0
    for index, value in enumerate(list_input):
        if np.isnan(value):
            counter += 1  # update counter if nan
    
    # check if list contains any nans
    contains_nans = counter > 0
    
    # report whether the list contains nans
    if contains_nans:
        print(f'{list_name} contains {counter} nans')
    else:
        print(f'{list_name} contains no nans')

    return contains_nans, counter
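As the exercise notes, there are many ways to accomplish this task. One possible alternative replaces the loop with a NumPy mask. The name `check_for_nans_vectorized` is hypothetical and used only for illustration; the sketch assumes the input contains only numeric values.

```python
import numpy as np

def check_for_nans_vectorized(list_input, list_name='list'):
    """Hypothetical alternative to check_for_nans() using a NumPy mask.

    Assumes list_input contains only numeric values.
    """
    # convert to a float array, mark the nans, and count them
    counter = int(np.isnan(np.asarray(list_input, dtype=float)).sum())
    contains_nans = counter > 0

    # report whether the list contains nans
    if contains_nans:
        print(f'{list_name} contains {counter} nans')
    else:
        print(f'{list_name} contains no nans')

    return contains_nans, counter

# Illustrative values only; not taken from the study data.
result = check_for_nans_vectorized([1, np.nan, 3], list_name='example')
```

The return values have the same form as in the loop-based solution, so the two functions are interchangeable for numeric lists.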

# Run this cell to check your work. 
# This cell should output the line:
# "CONGRATS! LOOKS LIKE YOU DID IT!"
# No need to edit, just execute cell!

antisociality_contains_nans, antisociality_nan_count = check_for_nans(antisociality, 
                                                                      list_name='antisociality')
distance_contains_nans, distance_nan_count = check_for_nans(distance, 
                                                            list_name='distance')

# This is a check to see if it works; 
# bonus point if you can summarize what we do here!
##############
new_list = [[1, np.nan, 2, 3, np.nan, 4, 5, 6, np.nan,7, 8, 9, np.nan],           # 4 
            [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],                                      # 0 
            [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 10]] # 8

list_of_booleans = []
list_of_counts = []
for index, item in enumerate(new_list):
    nans, counts = check_for_nans(item, list_name=f'list{index}')
    list_of_booleans.append(nans)
    list_of_counts.append(counts)

if (list_of_booleans==[True,False,True]) and (list_of_counts==[4,0,8]):
    print("CONGRATS! LOOKS LIKE YOU DID IT!")
##############

antisociality contains no nans
distance contains 10 nans
list0 contains 4 nans
list1 contains no nans
list2 contains 8 nans
CONGRATS! LOOKS LIKE YOU DID IT!
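Because the two lists originally came from a pandas DataFrame, missing values can also be counted on the columns directly with `isna()`. The sketch below uses a small stand-in DataFrame: the column names match the exercise, but the values are illustrative and not the study data.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame; column names match the exercise, values are illustrative.
example_df = pd.DataFrame({
    'silhouette_dist_X_min81': [441.0, np.nan, 313.0, np.nan],
    'STAB_total_min32': [19, 62, 63, 13],
})

nan_counts = example_df.isna().sum()  # number of missing values per column
has_nans = example_df.isna().any()    # True for columns with at least one nan

print(nan_counts)
print(has_nans)
```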
