Data

This section briefly describes our sample. We use the popular Titanic dataset bundled with seaborn, which contains information on 891 of the passengers aboard the Titanic rather than on everyone on board.

Table 1

Let’s display a table of summary statistics for passenger age, grouped by sex and class.

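import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display  # available by default in notebooks; imported so the code also runs as a script

# Load the Titanic sample that seaborn provides (downloaded on first use)
titanic = sns.load_dataset("titanic")

# Age summary per sex and class; observed=True keeps only the category combinations present in the data
display(titanic.groupby(["sex", "class"], observed=True)["age"].describe())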

sex     class   count       mean        std   min     25%   50%    75%   max
female  First    85.0  34.611765  13.612052  2.00  23.000  35.0  44.00  63.0
        Second   74.0  28.722973  12.872702  2.00  22.250  28.0  36.00  57.0
        Third   102.0  21.750000  12.729964  0.75  14.125  21.5  29.75  63.0
male    First   101.0  41.281386  15.139570  0.92  30.000  40.0  51.00  80.0
        Second   99.0  30.740707  14.793894  0.67  23.000  30.0  36.75  70.0
        Third   253.0  26.507589  12.159514  0.42  20.000  25.0  33.00  74.0
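The count column in Table 1 includes only passengers with a recorded age, because describe() skips missing values. The counts sum to 714, while the dataset has 891 rows. The sketch below is one way to see how the missing ages are distributed across groups. The helper name age_coverage is hypothetical, and it assumes a DataFrame with "sex", "class", and "age" columns like the one loaded above.

```python
import pandas as pd


def age_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Count rows and recorded ages per (sex, class) group.

    Hypothetical helper; assumes `df` has "sex", "class", and "age" columns.
    """
    grouped = df.groupby(["sex", "class"], observed=True)["age"]
    out = pd.DataFrame({
        "rows": grouped.size(),           # every passenger in the group
        "age_recorded": grouped.count(),  # passengers with a non-missing age
    })
    out["age_missing"] = out["rows"] - out["age_recorded"]
    return out


# Usage, assuming `titanic` was loaded as in the cell above:
# display(age_coverage(titanic))
```

If age is missing much more often in some groups than in others, the age statistics for those groups rest on a smaller and possibly less representative share of their passengers.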

We can also plot a summary of the outcome: the survival rate by sex and class.

Supplemental Figure 1

# Bar height is the mean of `survived` (the survival rate) for each sex and class
sns.catplot(data=titanic, x="sex", y="survived", hue="class", kind="bar")
plt.show()
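With seaborn's default settings, each bar in the figure is the group mean of survived, so the same numbers can be tabulated directly. The following is a minimal sketch. The helper name survival_rate is hypothetical, and it assumes survived is coded as 0/1, as it is in the seaborn dataset.

```python
import pandas as pd


def survival_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Share of survivors per sex (rows) and class (columns).

    Hypothetical helper; assumes `survived` is coded 0/1.
    """
    rates = df.groupby(["sex", "class"], observed=True)["survived"].mean()
    # Pivot the class level into columns for a compact two-way table
    return rates.unstack("class")


# Usage, assuming `titanic` was loaded as in the first cell:
# display(survival_rate(titanic).round(2))
```

The table gives the exact bar heights, while the plot adds error bars (by default, bootstrapped 95% confidence intervals) that the table omits.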