Introduction

This notebook contains all of the code from the corresponding post on the One Codex Blog. These snippets are exactly what are in the blog post, and let you perfectly reproduce those figures.

This is meant to be a starting off point for you to get started analyzing your own samples. You can copy this notebook straight into your account using the button in the header. To “run” or execute a cell, just hit Shift + Enter. A few other resources you may find useful include: notes on getting started with our One Codex library; the full documentation on our API (more technical); a cheat sheet on getting started with Pandas, a Python library for data manipulation; and reading a few of our blog posts (where we plan to have nice demos with these notebooks). As always, also feel free to send us questions or suggestions by clicking the chat icon in the bottom right!

Now we’re going to dive right in and start crunching some numbers!

Fetching data

To get started, we create an instance of our API, grab the DIABIMMUNE project, and download 500 samples from the cohort.

[1]:
from onecodex import Api

ocx = Api()

project = ocx.Projects.get("d53ad03b010542e3")  # get DIABIMMUNE project by ID
samples = ocx.Samples.where(project=project.id, public=True, limit=50)

samples.metadata[[
    "gender",
    "host_age",
    "geo_loc_name",
    "totalige",
    "eggs",
    "vegetables",
    "milk",
    "wheat",
    "rice",
]]
[1]:
gender host_age geo_loc_name totalige eggs vegetables milk wheat rice
classification_id
001b3ea2093b426d Male 1093 Finland:Espoo 62.90 True True False True True
850ba22531cd4cde Female 686 Russia:Petrozavodsk 91.50 True True True False True
db177a540a1c43b0 Female 673 Estonia:Tartu 112.00 True True True True True
d281a52b08f54b6a Female 173 Russia:Petrozavodsk NaN False False False False False
a0cc5e58e2074fab Male 493 Russia:Petrozavodsk 42.30 False True False False False
4b3aa0d6eabb48dc Male 229 Estonia:Tartu 36.50 False False False False False
d0238007374a4dab Male 502 Estonia:Tartu 7.39 True True True True True
8abf53a2cf4341fe Male 390 Estonia:Tartu 698.00 True True True True True
8599190018b045c8 Male 427 Estonia:Tartu 88.10 True True True True True
11ded36641bf4be2 Female 587 Estonia:Tartu 25.20 False True True True True
ffb265bc656c4afc Female 598 Estonia:Tartu 131.00 True True True True True
3bf2f958380647b5 Female 594 Estonia:Tartu 30.80 True True True True True
77cad5fb325f45fa Male 410 Estonia:Tartu 7.39 True True True True True
b84abe31b6bb4caf Male 500 Estonia:Tartu 86.60 True True True True True
081424d940bd4cd9 Female 406 Estonia:Tartu 13.70 True True True True True
3aa4f451de8149a2 Female 278 Russia:Petrozavodsk 12.10 True True True False True
f3cd80e4ec4f4169 Male 483 Estonia:Tartu 698.00 True True True True True
4661a4510b124ee5 Female 500 Estonia:Tartu 23.90 True True True False False
61cd95290ed84876 Male 675 Estonia:Tartu 25.00 True True True True True
efafd7a62a6e434f Male 479 Estonia:Tartu 88.10 True True True True True
12327f55ae0d4fb8 Male 400 Finland:Espoo 36.00 True True True True True
d4517cfd2a98419c Male 649 Finland:Espoo 36.00 True True True True True
e81430008e1347e4 Male 588 Finland:Espoo 36.00 True True True True True
ddddd16149da436c Male 212 Finland:Espoo 41.50 False True True False True
aa5ed3675cb54eb3 Male 304 Finland:Espoo 41.50 False True True True True
4ac0bfdeefc04893 Female 431 Finland:Espoo 4.16 True True True True True
7afcf62f9ea74a80 Male 703 Russia:Petrozavodsk NaN True True True True True
d9ce42afc60240c3 Male 1075 Finland:Espoo 36.00 True True True True True
f7283fc3621a4c8e Female 593 Estonia:Tartu 193.00 True True True True True
344a8bcb2c73426b Female 217 Estonia:Tartu 24.50 False True False False False
54986602b6fe4e0b Female 495 Estonia:Tartu 5.43 True True True True True
df91a248bca34f8d Male 210 Estonia:Tartu 24.00 True True True True True
2b8464bc86b448fa Male 392 Finland:Espoo 127.00 True True True True True
8f0aaff9b67944e1 Female 669 Finland:Espoo 15.30 True True False True True
ce65d5efd36d4c14 Female 676 Finland:Espoo 92.20 True True True True True
ee361483f00549a7 Male 670 Finland:Espoo 24.30 True True True True True
c000446b505e4d1d Male 556 Finland:Espoo 24.30 True True True True True
139c880885a544da Male 539 Russia:Petrozavodsk NaN True True True True True
2d287a0836964fef Female 541 Russia:Petrozavodsk NaN True True True True True
8dbdbdd3e7e0438e Male 743 Russia:Petrozavodsk 58.00 False False False False False
c19c28ff39c54cbc Male 535 Russia:Petrozavodsk 19.40 True True True True True
1693d10e542d4d21 Male 217 Russia:Petrozavodsk 13.00 True True False False True
f47fd5aa29a5434f Female 434 Russia:Petrozavodsk 10.30 False False False False False
8aa0a16b36cd4f3f Male 400 Russia:Petrozavodsk 30.20 False False False False False
c053a1cc63fe4752 Male 498 Russia:Petrozavodsk 30.20 False False False False False
b1e800f58204406b Female 286 Russia:Petrozavodsk 2.00 False True True False True
20b6217e78d643a4 Female 367 Russia:Petrozavodsk 10.30 False False False False False
0e6afb347fa14281 Female 248 Russia:Petrozavodsk 12.10 False True True False True
1e8467e5f1784fcf Male 213 Russia:Petrozavodsk 10.60 False False True False True
0466bf5d5a4145c5 Female 519 Russia:Petrozavodsk 8.91 False False False False False

Question #1: How does alpha diversity vary by sample group?

Here, we display observed taxa, Simpson’s Index, and Shannon Entropy side-by-side, grouped by the region of birth. Each group includes samples taken across the entire three-year longitudinal study.

[2]:
observed_taxa = samples.plot_metadata(vaxis="observed_taxa", haxis="geo_loc_name", return_chart=True)
simpson = samples.plot_metadata(vaxis="simpson", haxis="geo_loc_name", return_chart=True)
shannon = samples.plot_metadata(vaxis="shannon", haxis="geo_loc_name", return_chart=True)

observed_taxa | simpson | shannon
2024-08-17 00:27:09,188 WARNING: SampleCollection contains multiple analysis types: One Codex Database (2017), One Codex Database (2020)
2024-08-17 00:27:16,425 WARNING: observed_otus is deprecated as of 0.6.0.
2024-08-17 00:27:16,635 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:16,636 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:16,648 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:16,649 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:16,655 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:16,656 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
[2]:
[3]:
from onecodex.notebooks.report import *

ref_text = 'Roo, et al. "How to Python." Nature, 2019.'
legend(f"Alpha diversity by location of birth{reference(text=ref_text, label='roo1')}")
[3]:
Figure 1. Alpha diversity by location of birth1

Question #2: How does the microbiome change over time?

The plot_metadata function can search through all taxa in your samples and pull out read counts or relative abundances.

[4]:
samples.plot_metadata(haxis="host_age", vaxis="Bacteroides", plot_type="scatter")
2024-08-17 00:27:26,210 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:26,212 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:26,214 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.

Question #3: How does an individual subject’s gut change over time?

Here, we’re going to drop into a dataframe, slice it to fetch all the data points from a single subject of the study, and generate a stacked bar plot. It’s nice to see the expected high abundance of Bifidobacterium early in life, giving way to Bacteroides near age three!

[5]:
# generate a dataframe containing relative abundances
df_rel = samples.to_df(rank="genus")

# fetch all samples for subject P014839
subject_metadata = samples.metadata.loc[samples.metadata["host_subject_id"] == "P014839"]
subject_df = df_rel.loc[subject_metadata.index]

# put them in order of sample date
subject_df = subject_df.loc[subject_metadata["host_age"].sort_values().index]

# you can access our library using the ocx accessor on pandas dataframes!
subject_df.ocx.plot_bargraph(
    rank="genus",
    label=lambda metadata: str(metadata["host_age"]),
    title="Subject P014839 Over Time",
    xlabel="Host Age at Sampling Time (days)",
    ylabel="Relative Abundance",
    legend="Genus",
)
2024-08-17 00:27:28,127 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:28,128 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:28,130 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:28,134 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:28,136 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.

Question #4: Heatmaps?!

[6]:
df_rel[:30].ocx.plot_heatmap(legend="Relative Abundance", tooltip="geo_loc_name")
2024-08-17 00:27:30,415 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:30,417 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:30,420 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:30,422 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:30,424 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:30,425 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.

Question #5: How do samples cluster?

First up, we’ll plot a heatmap of weighted UniFrac distance between the first 30 samples in the dataset. This requires unnormalized read counts, so we’ll generate a new, unnormalized dataframe.

[7]:
# generate a dataframe containing read counts
df_abs = samples.to_df()

df_abs[:30].ocx.plot_distance(metric="weighted_unifrac")
2024-08-17 00:27:32,491 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:32,493 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:32,496 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:32,497 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.

Question #6: Can I do PCA?

[8]:
samples.plot_pca(color="geo_loc_name", size="Bifidobacterium", title="My PCoA Plot")
2024-08-17 00:27:32,994 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:32,997 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:32,998 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:32,999 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.

Question #6: Can I do something better than PCA?

[9]:
samples.plot_mds(
    metric="weighted_unifrac", method="pcoa", color="geo_loc_name", title="My PCoA Plot"
)
2024-08-17 00:27:33,834 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:33,837 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:33,838 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-08-17 00:27:33,840 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
[10]:
page_break()
[10]:
[11]:
bibliography()
[11]:

References

1
Roo, et al. "How to Python." Nature, 2019.
[ ]: