Introduction
This notebook contains all of the code from the corresponding post on the One Codex Blog. These snippets are exactly what are in the blog post, and let you perfectly reproduce those figures.
This is meant to be a starting off point for you to get started analyzing your own samples. You can copy this notebook straight into your account using the button in the header. To “run” or execute a cell, just hit Shift + Enter
. A few other resources you may find useful include: notes on getting started with our One Codex library; the full documentation on our API (more technical); a cheat sheet on getting started
with Pandas, a Python library for data manipulation; and reading a few of our blog posts (where we plan to have nice demos with these notebooks). As always, also feel free to send us questions or suggestions by clicking the chat icon in the bottom right!
Now we’re going to dive right in and start crunching some numbers!
Fetching data
To get started, we create an instance of our API, grab the DIABIMMUNE project, and download 500 samples from the cohort.
[1]:
from onecodex import Api
ocx = Api()
project = ocx.Projects.get("d53ad03b010542e3") # get DIABIMMUNE project by ID
samples = ocx.Samples.where(project=project.id, public=True, limit=50)
samples.metadata[[
"gender",
"host_age",
"geo_loc_name",
"totalige",
"eggs",
"vegetables",
"milk",
"wheat",
"rice",
]]
[1]:
gender | host_age | geo_loc_name | totalige | eggs | vegetables | milk | wheat | rice | |
---|---|---|---|---|---|---|---|---|---|
classification_id | |||||||||
001b3ea2093b426d | Male | 1093 | Finland:Espoo | 62.90 | True | True | False | True | True |
850ba22531cd4cde | Female | 686 | Russia:Petrozavodsk | 91.50 | True | True | True | False | True |
db177a540a1c43b0 | Female | 673 | Estonia:Tartu | 112.00 | True | True | True | True | True |
d281a52b08f54b6a | Female | 173 | Russia:Petrozavodsk | NaN | False | False | False | False | False |
a0cc5e58e2074fab | Male | 493 | Russia:Petrozavodsk | 42.30 | False | True | False | False | False |
4b3aa0d6eabb48dc | Male | 229 | Estonia:Tartu | 36.50 | False | False | False | False | False |
d0238007374a4dab | Male | 502 | Estonia:Tartu | 7.39 | True | True | True | True | True |
8abf53a2cf4341fe | Male | 390 | Estonia:Tartu | 698.00 | True | True | True | True | True |
8599190018b045c8 | Male | 427 | Estonia:Tartu | 88.10 | True | True | True | True | True |
11ded36641bf4be2 | Female | 587 | Estonia:Tartu | 25.20 | False | True | True | True | True |
ffb265bc656c4afc | Female | 598 | Estonia:Tartu | 131.00 | True | True | True | True | True |
3bf2f958380647b5 | Female | 594 | Estonia:Tartu | 30.80 | True | True | True | True | True |
77cad5fb325f45fa | Male | 410 | Estonia:Tartu | 7.39 | True | True | True | True | True |
b84abe31b6bb4caf | Male | 500 | Estonia:Tartu | 86.60 | True | True | True | True | True |
081424d940bd4cd9 | Female | 406 | Estonia:Tartu | 13.70 | True | True | True | True | True |
3aa4f451de8149a2 | Female | 278 | Russia:Petrozavodsk | 12.10 | True | True | True | False | True |
f3cd80e4ec4f4169 | Male | 483 | Estonia:Tartu | 698.00 | True | True | True | True | True |
4661a4510b124ee5 | Female | 500 | Estonia:Tartu | 23.90 | True | True | True | False | False |
61cd95290ed84876 | Male | 675 | Estonia:Tartu | 25.00 | True | True | True | True | True |
efafd7a62a6e434f | Male | 479 | Estonia:Tartu | 88.10 | True | True | True | True | True |
12327f55ae0d4fb8 | Male | 400 | Finland:Espoo | 36.00 | True | True | True | True | True |
d4517cfd2a98419c | Male | 649 | Finland:Espoo | 36.00 | True | True | True | True | True |
e81430008e1347e4 | Male | 588 | Finland:Espoo | 36.00 | True | True | True | True | True |
ddddd16149da436c | Male | 212 | Finland:Espoo | 41.50 | False | True | True | False | True |
aa5ed3675cb54eb3 | Male | 304 | Finland:Espoo | 41.50 | False | True | True | True | True |
4ac0bfdeefc04893 | Female | 431 | Finland:Espoo | 4.16 | True | True | True | True | True |
7afcf62f9ea74a80 | Male | 703 | Russia:Petrozavodsk | NaN | True | True | True | True | True |
d9ce42afc60240c3 | Male | 1075 | Finland:Espoo | 36.00 | True | True | True | True | True |
f7283fc3621a4c8e | Female | 593 | Estonia:Tartu | 193.00 | True | True | True | True | True |
344a8bcb2c73426b | Female | 217 | Estonia:Tartu | 24.50 | False | True | False | False | False |
54986602b6fe4e0b | Female | 495 | Estonia:Tartu | 5.43 | True | True | True | True | True |
df91a248bca34f8d | Male | 210 | Estonia:Tartu | 24.00 | True | True | True | True | True |
2b8464bc86b448fa | Male | 392 | Finland:Espoo | 127.00 | True | True | True | True | True |
8f0aaff9b67944e1 | Female | 669 | Finland:Espoo | 15.30 | True | True | False | True | True |
ce65d5efd36d4c14 | Female | 676 | Finland:Espoo | 92.20 | True | True | True | True | True |
ee361483f00549a7 | Male | 670 | Finland:Espoo | 24.30 | True | True | True | True | True |
c000446b505e4d1d | Male | 556 | Finland:Espoo | 24.30 | True | True | True | True | True |
139c880885a544da | Male | 539 | Russia:Petrozavodsk | NaN | True | True | True | True | True |
2d287a0836964fef | Female | 541 | Russia:Petrozavodsk | NaN | True | True | True | True | True |
8dbdbdd3e7e0438e | Male | 743 | Russia:Petrozavodsk | 58.00 | False | False | False | False | False |
c19c28ff39c54cbc | Male | 535 | Russia:Petrozavodsk | 19.40 | True | True | True | True | True |
1693d10e542d4d21 | Male | 217 | Russia:Petrozavodsk | 13.00 | True | True | False | False | True |
f47fd5aa29a5434f | Female | 434 | Russia:Petrozavodsk | 10.30 | False | False | False | False | False |
8aa0a16b36cd4f3f | Male | 400 | Russia:Petrozavodsk | 30.20 | False | False | False | False | False |
c053a1cc63fe4752 | Male | 498 | Russia:Petrozavodsk | 30.20 | False | False | False | False | False |
b1e800f58204406b | Female | 286 | Russia:Petrozavodsk | 2.00 | False | True | True | False | True |
20b6217e78d643a4 | Female | 367 | Russia:Petrozavodsk | 10.30 | False | False | False | False | False |
0e6afb347fa14281 | Female | 248 | Russia:Petrozavodsk | 12.10 | False | True | True | False | True |
1e8467e5f1784fcf | Male | 213 | Russia:Petrozavodsk | 10.60 | False | False | True | False | True |
0466bf5d5a4145c5 | Female | 519 | Russia:Petrozavodsk | 8.91 | False | False | False | False | False |
Question #1: How does alpha diversity vary by sample group?
Here, we display observed taxa, Simpson’s Index, and Shannon Entropy side-by-side, grouped by the region of birth. Each group includes samples taken across the entire three-year longitudinal study.
[2]:
observed_taxa = samples.plot_metadata(vaxis="observed_taxa", haxis="geo_loc_name", return_chart=True)
simpson = samples.plot_metadata(vaxis="simpson", haxis="geo_loc_name", return_chart=True)
shannon = samples.plot_metadata(vaxis="shannon", haxis="geo_loc_name", return_chart=True)
observed_taxa | simpson | shannon
2024-11-12 16:17:13,375 WARNING: SampleCollection contains multiple analysis types: One Codex Database (2020), One Codex Database (2017)
2024-11-12 16:17:19,369 WARNING: observed_otus is deprecated as of 0.6.0.
2024-11-12 16:17:19,571 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:19,573 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:19,584 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:19,585 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:19,590 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:19,591 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
WARNING:tornado.access:404 GET /favicon.ico (::1) 0.35ms
[2]:
[3]:
from onecodex.notebooks.report import *
ref_text = 'Roo, et al. "How to Python." Nature, 2019.'
legend(f"Alpha diversity by location of birth{reference(text=ref_text, label='roo1')}")
[3]:
Question #2: How does the microbiome change over time?
The plot_metadata
function can search through all taxa in your samples and pull out read counts or relative abundances.
[4]:
samples.plot_metadata(haxis="host_age", vaxis="Bacteroides", plot_type="scatter")
2024-11-12 16:17:37,744 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:37,746 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:37,747 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
Question #3: How does an individual subject’s gut change over time?
Here, we’re going to drop into a dataframe, slice it to fetch all the data points from a single subject of the study, and generate a stacked bar plot. It’s nice to see the expected high abundance of Bifidobacterium early in life, giving way to Bacteroides near age three!
[5]:
# generate a dataframe containing relative abundances
df_rel = samples.to_df(rank="genus")
# fetch all samples for subject P014839
subject_metadata = samples.metadata.loc[samples.metadata["host_subject_id"] == "P014839"]
subject_df = df_rel.loc[subject_metadata.index]
# put them in order of sample date
subject_df = subject_df.loc[subject_metadata["host_age"].sort_values().index]
# you can access our library using the ocx accessor on pandas dataframes!
subject_df.ocx.plot_bargraph(
rank="genus",
label=lambda metadata: str(metadata["host_age"]),
title="Subject P014839 Over Time",
xlabel="Host Age at Sampling Time (days)",
ylabel="Relative Abundance",
legend="Genus",
)
2024-11-12 16:17:39,817 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:39,819 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:39,822 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:39,825 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:39,826 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
Question #4: Heatmaps?!
[6]:
df_rel[:30].ocx.plot_heatmap(legend="Relative Abundance", tooltip="geo_loc_name")
2024-11-12 16:17:42,260 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:42,262 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:42,265 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:42,266 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:42,268 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:42,270 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
Question #5: How do samples cluster?
First up, we’ll plot a heatmap of weighted UniFrac distance between the first 30 samples in the dataset. This requires unnormalized read counts, so we’ll generate a new, unnormalized dataframe.
[7]:
# generate a dataframe containing read counts
df_abs = samples.to_df()
df_abs[:30].ocx.plot_distance(metric="weighted_unifrac")
2024-11-12 16:17:44,450 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:44,452 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:44,454 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:44,456 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
Question #6: Can I do PCA?
[8]:
samples.plot_pca(color="geo_loc_name", size="Bifidobacterium", title="My PCoA Plot")
2024-11-12 16:17:45,114 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:45,118 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:45,119 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:45,121 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
Question #6: Can I do something better than PCA?
[9]:
samples.plot_mds(
metric="weighted_unifrac", method="pcoa", color="geo_loc_name", title="My PCoA Plot"
)
2024-11-12 16:17:46,091 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:46,094 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:46,095 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
2024-11-12 16:17:46,096 WARNING: the convert_dtype parameter is deprecated and will be removed in a future version. Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
[10]:
page_break()
[10]:
[11]:
bibliography()
[11]:
References
- 1
- Roo, et al. "How to Python." Nature, 2019.
[ ]: