Visualization

`plot_bargraph`

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_bargraph(return_chart=True)

SampleCollection.plot_bargraph(rank=Rank.Auto, normalize='auto', top_n='auto', threshold='auto', title=None, xlabel=None, ylabel=None, tooltip=None, return_chart=False, haxis=None, legend='auto', label=None, sort_x=None, include_taxa_missing_rank=None, include_other=True, width=None, height=None, group_by=None, link=Link.Ocx)

Plot a bargraph of relative abundance of taxa for multiple samples.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional: Analysis will be restricted to abundances of taxa at the specified level.
normalize: ‘auto’ or bool, optional: Convert read counts to relative abundances such that each sample sums to 1.0. Setting ‘auto’ will choose automatically based on the data.
return_chart: bool, optional: When True, return an altair.Chart object instead of displaying the resulting plot in the current notebook.
top_n: int, optional: Display the top N most abundant taxa in the entire cohort of samples.
threshold: float, optional: Display only taxa that are more abundant than this threshold in one or more samples.
title: string, optional: Text label at the top of the plot.
xlabel: string, optional: Text label along the horizontal axis.
ylabel: string, optional: Text label along the vertical axis.
tooltip: string or list, optional: A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
haxis: string, optional: The metadata field (or tuple containing multiple categorical fields) used to facet samples.
legend: string or altair.Legend, optional: If a string is provided, it will be used as the legend title. Defaults to the metric used to generate the plot, e.g. readcount_w_children or abundance. Alternatively, an altair.Legend instance may be provided for legend customization.
label: string or callable, optional: A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
sort_x: list or callable, optional: Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.
include_taxa_missing_rank: bool, optional: Whether or not a row should be plotted for taxa that do not have a designated parent at the specified rank.
group_by: string, optional: The metadata field used to group samples together. Readcounts or abundances will be averaged within each group.
link: {‘ocx’, ‘ncbi’}, optional: If link is ‘ocx’, clicking a sample will open its classification results in the One Codex app. If link is ‘ncbi’, clicking a taxon will open the NCBI taxonomy browser.
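
How normalize, threshold, and top_n relate to one another can be pictured with a small sketch in plain Python. It is an illustration only: the sample names, taxa, and counts are invented, the helper functions are hypothetical and not part of the onecodex library, and ranking taxa by their summed abundance across the cohort is an assumption about how "most abundant" is decided.

```python
# Hypothetical read counts per sample; not real data.
counts = {
    "sample_a": {"Bacteroides": 60, "Prevotella": 30, "Escherichia": 10},
    "sample_b": {"Bacteroides": 5, "Prevotella": 90, "Escherichia": 5},
}


def to_relative_abundance(sample_counts):
    """Scale one sample's counts so that they sum to 1.0."""
    total = sum(sample_counts.values())
    return {taxon: n / total for taxon, n in sample_counts.items()}


def taxa_above_threshold(abundances, threshold):
    """Keep taxa that exceed the threshold in at least one sample."""
    return {
        taxon
        for sample in abundances.values()
        for taxon, value in sample.items()
        if value > threshold
    }


def top_n_taxa(abundances, n):
    """Rank taxa by summed abundance across the cohort (an assumption) and keep n."""
    totals = {}
    for sample in abundances.values():
        for taxon, value in sample.items():
            totals[taxon] = totals.get(taxon, 0.0) + value
    return sorted(totals, key=totals.get, reverse=True)[:n]


abundances = {name: to_relative_abundance(c) for name, c in counts.items()}
kept = taxa_above_threshold(abundances, threshold=0.2)
top_two = top_n_taxa(abundances, n=2)
print(sorted(kept))
print(top_two)
```

In this toy cohort Escherichia never exceeds 0.2 in any sample, so it is the only taxon dropped by the threshold.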

Examples

Plot a bargraph of the top 10 most abundant genera

>>> samples.plot_bargraph(rank='genus', top_n=10)
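
The label and sort_x parameters also accept callables. The sketch below shows the shape such functions could take, based on the parameter descriptions above: label receives one metadata dict and returns a string, and sort_x receives the list of x-axis labels and returns the same labels reordered. The metadata keys "name" and "country" are assumptions made for this example, and both function names are hypothetical.

```python
def label_from_metadata(metadata):
    """Build a display label from one analysis's metadata dict."""
    # "name" and "country" are assumed field names for this sketch.
    name = metadata.get("name", "unknown")
    country = metadata.get("country", "n/a")
    return f"{name} ({country})"


def sort_labels(labels):
    """Return the same labels, ordered case-insensitively."""
    return sorted(labels, key=str.lower)


example_metadata = {"name": "S01", "country": "Kenya"}
print(label_from_metadata(example_metadata))
print(sort_labels(["s10 (Peru)", "S01 (Kenya)", "S02 (Fiji)"]))

# With a real SampleCollection these would be passed as:
# samples.plot_bargraph(label=label_from_metadata, sort_x=sort_labels)
```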

`plot_distance`

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_distance(return_chart=True)

SampleCollection.plot_distance(rank=Rank.Auto, metric=BetaDiversityMetric.BrayCurtis, title=None, xlabel=None, ylabel=None, tooltip=None, return_chart=False, linkage=Linkage.Average, label=None, width=None, height=None)

Plot beta diversity distance matrix as a heatmap and dendrogram.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional: Analysis will be restricted to abundances of taxa at the specified level.
metric: {‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘jaccard’, ‘unifrac’, ‘unweighted_unifrac’, ‘aitchison’}, optional: Function to use when calculating the distance between two samples. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.
linkage: {‘average’, ‘single’, ‘complete’, ‘weighted’, ‘centroid’, ‘median’}, optional: The type of linkage to use when clustering axes.
title: string, optional: Text label at the top of the plot.
xlabel: string, optional: Text label along the horizontal axis.
ylabel: string, optional: Text label along the vertical axis.
tooltip: string or list, optional: A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
label: string or callable, optional: A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

Examples

Plot the weighted UniFrac distance between all our samples, using counts at the genus level.

>>> samples.plot_distance(rank='genus', metric='unifrac')
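
For readers unfamiliar with the default metric, the following sketch computes a Bray-Curtis distance matrix by hand using the standard textbook formula. The abundance vectors are invented, and the code does not reproduce how the library builds its matrix internally; it only shows what each cell of the heatmap represents.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def cityblock(u, v):
    """Cityblock (Manhattan) distance: sum|u_i - v_i|."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical genus-level relative abundances; every vector uses the same taxon order.
profiles = {
    "sample_a": [0.6, 0.3, 0.1],
    "sample_b": [0.1, 0.3, 0.6],
    "sample_c": [0.6, 0.3, 0.1],
}

names = sorted(profiles)
matrix = [[bray_curtis(profiles[r], profiles[c]) for c in names] for r in names]
for name, row in zip(names, matrix):
    print(name, [round(x, 3) for x in row])
```

When every profile sums to 1.0, the Bray-Curtis denominator is 2, so the value equals half of the cityblock distance between the same two samples.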

`plot_functional_heatmap`

SampleCollection.plot_functional_heatmap(top_n=10, annotation=FunctionalAnnotations.Go, metric=None, sort_x=None, label=None, function_label=FunctionalLabel.Name, haxis=None, return_chart=False, xlabel='Sample', ylabel='Function', title=None, width=None, height=None)

Plot heatmap of functional profile data.

Parameters

top_n: int, optional

Display the top N most abundant or covered functions in the entire cohort of samples.

annotation: FunctionalAnnotations or str, optional

{‘go’, ‘eggnog’, ‘ko’, ‘ec’, ‘pfam’, ‘pathways’} Annotation sub-database used to group gene families.

metric: FunctionalAnnotationsMetric or str, optional

{‘cpm’, ‘rpk’, ‘abundance’, ‘coverage’} Normalization or value to display.

If annotation is one of ‘go’, ‘eggnog’, ‘ko’, ‘ec’ or ‘pfam’, the available metrics are:

‘rpk’: read counts normalized by kilobase of gene length
‘cpm’: relative copy of gene depth, normalized to a million RPK total

If annotation is ‘pathways’, the available metrics are:

‘abundance’: summed copy numbers of the reactions’ constituent enzymes
‘coverage’: probabilistic measure of a complete metabolic pathway, where 1 is high confidence that a complete pathway is covered and 0 is low confidence

sort_x: list or callable, optional

Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.

label: str or callable, optional

A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

function_label: str, optional

{‘name’, ‘id’} Used to label functions. Defaults to ‘name’.

haxis: string, optional

The metadata field (or tuple containing multiple categorical fields) used to facet samples.

return_chart: bool, optional

When True, return an altair.Chart object instead of displaying the resulting plot in the current notebook.

xlabel: str, optional

Text label along the horizontal axis.

ylabel: str, optional

Text label along the vertical axis.

title: str, optional

Text label at the top of the plot.

width: float or str or dict, optional

Set altair.Chart.width.

height: float or str or dict, optional

Set altair.Chart.height.
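
The ‘rpk’ and ‘cpm’ metrics described for the metric parameter can be worked through on a toy example. This is a sketch based only on the definitions given above (reads per kilobase of gene length, then rescaling so the RPK values total one million); the gene family names, read counts, and lengths are invented, and the library's own calculation may include steps not shown here.

```python
# Hypothetical gene families: (read count, gene length in base pairs).
gene_families = {
    "family_1": (200, 1000),
    "family_2": (300, 3000),
    "family_3": (50, 500),
}


def reads_per_kilobase(reads, length_bp):
    """Normalize a read count by gene length expressed in kilobases."""
    return reads / (length_bp / 1000)


rpk = {
    name: reads_per_kilobase(reads, length_bp)
    for name, (reads, length_bp) in gene_families.items()
}

# Rescale so that the RPK values of this sample sum to one million.
rpk_total = sum(rpk.values())
cpm = {name: value / rpk_total * 1_000_000 for name, value in rpk.items()}

for name in gene_families:
    print(name, rpk[name], round(cpm[name], 1))
```

Although family_2 has the most reads, its longer gene length gives it the same RPK as family_3, which is the point of the length normalization.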

`plot_heatmap`

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_heatmap(return_chart=True)

SampleCollection.plot_heatmap(rank=Rank.Auto, normalize='auto', top_n='auto', threshold='auto', title=None, xlabel=None, ylabel=None, tooltip=None, return_chart=False, linkage=Linkage.Average, haxis=None, metric='euclidean', legend='auto', label=None, sort_x=None, sort_y=None, width=None, height=None, link=Link.Ocx)

Plot heatmap of taxa abundance/count data for several samples.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional: Analysis will be restricted to abundances of taxa at the specified level.
normalize: ‘auto’ or bool, optional: Convert read counts to relative abundances such that each sample sums to 1.0. Setting ‘auto’ will choose automatically based on the data.
return_chart: bool, optional: When True, return an altair.Chart object instead of displaying the resulting plot in the current notebook.
haxis: string, optional: The metadata field (or tuple containing multiple categorical fields) used to group samples together. Each group of samples will be clustered independently.
metric: {‘euclidean’, ‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘jaccard’, ‘unifrac’, ‘unweighted_unifrac’, ‘aitchison’}, optional: Function to use when calculating the distance between two samples. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.
linkage: {‘average’, ‘single’, ‘complete’, ‘weighted’, ‘centroid’, ‘median’}, optional: The type of linkage to use when clustering axes.
top_n: int, optional: Display the top N most abundant taxa in the entire cohort of samples.
threshold: float, optional: Display only taxa that are more abundant than this threshold in one or more samples.
title: string, optional: Text label at the top of the plot.
xlabel: string, optional: Text label along the horizontal axis.
ylabel: string, optional: Text label along the vertical axis.
tooltip: string or list, optional: A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
legend: string, optional: Title for color scale. Defaults to the field used to generate the plot, e.g. readcount_w_children or abundance.
label: string or callable, optional: A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
sort_x: list or callable, optional: Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.
sort_y: list or callable, optional: Either a list of sorted labels or a function that will be called with a list of y-axis labels as the only argument, and must return the same list in a user-specified order.
link: {‘ocx’, ‘ncbi’}, optional: If link is ‘ocx’, clicking a sample will open its classification results in the One Codex app. If link is ‘ncbi’, clicking a taxon will open the NCBI taxonomy browser.

Examples

Plot a heatmap of the relative abundances of the top 10 most abundant families.

>>> samples.plot_heatmap(rank='family', top_n=10)

`plot_mds`

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_mds(return_chart=True, color="country")

SampleCollection.plot_mds(rank=Rank.Auto, metric=BetaDiversityMetric.BrayCurtis, method=OrdinationMethod.Pcoa, title=None, xlabel=None, ylabel=None, color=None, size=None, tooltip=None, return_chart=False, label=None, mark_size=100, width=None, height=None)

Plot beta diversity distance matrix using multidimensional scaling (MDS).

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional: Analysis will be restricted to abundances of taxa at the specified level.
metric: {‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘jaccard’, ‘unifrac’, ‘unweighted_unifrac’, ‘aitchison’}, optional: Function to use when calculating the distance between two samples. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.
method: {‘pcoa’, ‘smacof’}, optional: Algorithm to use for ordination. PCoA uses eigenvalue decomposition and is not well suited to non-Euclidean distance functions. SMACOF is an iterative optimization strategy that can be used as an alternative.
title: string, optional: Text label at the top of the plot.
xlabel: string, optional: Text label along the horizontal axis.
ylabel: string, optional: Text label along the vertical axis.
size: string or tuple, optional: A string or a tuple containing strings representing metadata fields. The size of points in the resulting plot will change based on the metadata associated with each sample.
color: string or tuple, optional: A string or a tuple containing strings representing metadata fields. The color of points in the resulting plot will change based on the metadata associated with each sample.
tooltip: string or list, optional: A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
label: string or callable, optional: A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

Examples

Scatter plot of weighted UniFrac distance between all our samples, using counts at the genus level.

>>> samples.plot_mds(rank='genus', metric='unifrac')

Notes

For `smacof`: The values reported on the axis labels are Pearson’s correlations between the distances between points on each axis alone, and the corresponding distances in the distance matrix calculated using the user-specified metric. These values are related to the effectiveness of the MDS algorithm in placing points on the scatter plot in such a way that they truly represent the calculated distances. They do not reflect how well the distance metric captures similarities between the underlying data (in this case, an OTU table).
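
The per-axis correlation described in this note can be reproduced on a toy example. The sketch below assumes four samples with invented 2-D coordinates and an invented original distance matrix; it is a plain-Python reading of the description above, not the library's own code.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical 2-D coordinates produced by an ordination of four samples.
coords = [(0.0, 0.1), (1.0, 0.0), (2.0, 0.3), (3.0, 0.1)]

# Hypothetical original distances between the same samples (upper triangle only).
original = {(0, 1): 1.0, (0, 2): 2.1, (0, 3): 3.0, (1, 2): 1.1, (1, 3): 2.0, (2, 3): 1.0}

pairs = list(combinations(range(len(coords)), 2))
target = [original[p] for p in pairs]

correlations = {}
for axis in (0, 1):
    # Distance between each pair of points measured along this axis alone.
    along_axis = [abs(coords[i][axis] - coords[j][axis]) for i, j in pairs]
    correlations[axis] = pearson(along_axis, target)
    print(f"axis {axis + 1}: r = {correlations[axis]:.3f}")
```

Here the first axis carries almost all of the structure in the original distances, so its correlation is close to 1, while the second axis contributes little.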

`plot_metadata`

A general plotting tool which can be used to plot boxplots and scatter plots of individual abundances or alpha-diversity metrics.

Alpha Diversity

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=20)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_metadata(return_chart=True, haxis="country")

2D Abundance Scatterplot

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=20)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_metadata(return_chart=True, haxis="Bacteroides", vaxis="Firmicutes")

Boxplot

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=20)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_metadata(return_chart=True, vaxis="Bacteroides", haxis="country")

SampleCollection.plot_metadata(rank=Rank.Auto, haxis='Label', vaxis=AlphaDiversityMetric.Shannon, title=None, xlabel=None, ylabel=None, return_chart=False, plot_type=PlotType.Auto, label=None, sort_x=None, width: int | Literal['container'] | None = 200, height: int | Literal['container'] | None = 400, facet_by=None, coerce_haxis_dates=True, secondary_haxis=None)

Plot an arbitrary metadata field versus an arbitrary quantity as a boxplot or scatter plot.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

haxis: string, optional

The metadata field (or tuple containing multiple categorical fields) to be plotted on the horizontal axis.

vaxis: string, optional

Data to be plotted on the vertical axis. Can be any one of the following:

A metadata field: the name of a metadata field containing numerical data
{‘simpson’, ‘observed_taxa’, ‘shannon’}: an alpha diversity statistic to calculate for each sample
A taxon name: the name of a taxon in the analysis
A taxon ID: the ID of a taxon in the analysis

title: string, optional

Text label at the top of the plot.

xlabel: string, optional

Text label along the horizontal axis.

ylabel: string, optional

Text label along the vertical axis.

plot_type: {‘auto’, ‘boxplot’, ‘scatter’}, optional

By default, will determine plot type automatically based on the data. Otherwise, specify one of ‘boxplot’ or ‘scatter’ to set the type of plot manually.

label: string or callable, optional

A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

sort_x: list or callable, optional

Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.

width: int or str, optional

Sets altair.Chart.width. If “container”, chart width will respond to the width of the HTML container it is rendered in.

height: int or str, optional

Sets altair.Chart.height. If “container”, chart height will respond to the height of the HTML container it is rendered in.

facet_by: string, optional

The metadata field used to facet samples by (i.e. to create a separate subplot for each group of samples).

coerce_haxis_dates: bool, optional

If True, haxis field name(s) containing the word “date” (after splitting on underscores) will be coerced to datetime dtype. For example, the field “date_collected” will be coerced if coerce_haxis_dates is True.

secondary_haxis: str or tuple of str, optional

The secondary metadata field (or tuple containing multiple categorical fields) to be plotted on the horizontal axis.
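
The field-name rule used by coerce_haxis_dates can be expressed in a few lines. The helper below is hypothetical and written only from the description above; in particular, treating the match as case-sensitive is an assumption, and the library may behave differently.

```python
def looks_like_date_field(field_name):
    """Return True if 'date' is one of the underscore-separated words."""
    # Case-sensitive matching is an assumption of this sketch.
    return "date" in field_name.split("_")


for field in ("date_collected", "collection_date", "updated_by", "candidate"):
    print(field, looks_like_date_field(field))
```

Splitting on underscores is what keeps names such as "updated_by" or "candidate" from matching, even though they contain the letters "date".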

Examples

Generate a boxplot of the abundance of Bacteroides (genus) of samples grouped by whether the individuals are allergic to dogs, cats, both, or neither.

>>> samples.plot_metadata(haxis=('allergy_dogs', 'allergy_cats'), vaxis='Bacteroides')
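
The three alpha diversity statistics accepted by vaxis can be illustrated with their common textbook definitions. This sketch makes two assumptions that the text above does not confirm: Shannon diversity uses the natural logarithm, and Simpson is reported as one minus the sum of squared proportions. The counts are invented and the function bodies are not taken from the library.

```python
from math import log


def shannon(counts):
    """Shannon diversity, -sum(p * ln p), over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson's index of diversity, 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def observed_taxa(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


# Hypothetical read counts for four taxa in one sample.
sample_counts = [50, 25, 25, 0]
print("shannon:", round(shannon(sample_counts), 4))
print("simpson:", simpson(sample_counts))
print("observed_taxa:", observed_taxa(sample_counts))
```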

`plot_pca`

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_pca(return_chart=True, color="country")

SampleCollection.plot_pca(rank=Rank.Auto, normalize='auto', org_vectors=0, org_vectors_scale=None, title=None, xlabel=None, ylabel=None, color=None, size=None, tooltip=None, return_chart=False, label=None, mark_size=100, width=None, height=None)

Perform principal component analysis and plot first two axes.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional: Analysis will be restricted to abundances of taxa at the specified level.
normalize: ‘auto’ or bool, optional: Convert read counts to relative abundances such that each sample sums to 1.0. Setting ‘auto’ will choose automatically based on the data.
org_vectors: int, optional: Plot this many of the top-contributing eigenvectors from the PCA results.
org_vectors_scale: float, optional: Multiply the length of the lines representing the eigenvectors by this constant.
title: string, optional: Text label at the top of the plot.
xlabel: string, optional: Text label along the horizontal axis.
ylabel: string, optional: Text label along the vertical axis.
size: string or tuple, optional: A string or a tuple containing strings representing metadata fields. The size of points in the resulting plot will change based on the metadata associated with each sample.
color: string or tuple, optional: A string or a tuple containing strings representing metadata fields. The color of points in the resulting plot will change based on the metadata associated with each sample.
tooltip: string or list, optional: A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
label: string or callable, optional: A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
mark_size: int, optional: The size of the points in the scatter plot.

Examples

Perform PCA on relative abundances at the species-level and color the resulting points by ‘geo_loc_name’, a metadata field representing the geographical origin of each sample.

>>> samples.plot_pca(rank='species', normalize=True, color='geo_loc_name')

Change the size of each point in the plot based on the abundance of Bacteroides.

>>> samples.plot_pca(size='Bacteroides')

Display the abundances of Bacteroides, Prevotella, and Bifidobacterium in each sample when hovering over points in the plot.

>>> samples.plot_pca(tooltip=['Bacteroides', 'Prevotella', 'Bifidobacterium'])
