Visualization

See Also

Visualization functions are implemented as part of SampleCollection. For more information, see SampleCollection.

plot_bargraph

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_bargraph(return_chart=True)

SampleCollection.plot_bargraph(rank=Rank.Auto, normalize='auto', top_n='auto', threshold='auto', title=None, xlabel=None, ylabel=None, tooltip=None, return_chart=False, haxis=None, legend='auto', label=None, sort_x=None, include_taxa_missing_rank=None, include_other=True, width=None, height=None, group_by=None, link=Link.Ocx)

Plot a bargraph of relative abundance of taxa for multiple samples.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

normalize: ‘auto’ or bool, optional

Convert read counts to relative abundances such that each sample sums to 1.0. Setting ‘auto’ will choose automatically based on the data.

return_chart: bool, optional

When True, return an altair.Chart object instead of displaying the resulting plot in the current notebook.

top_n: int, optional

Display the top N most abundant taxa in the entire cohort of samples.

threshold: float, optional

Display only taxa that are more abundant than this threshold in one or more samples.

title: string, optional

Text label at the top of the plot.

xlabel: string, optional

Text label along the horizontal axis.

ylabel: string, optional

Text label along the vertical axis.

tooltip: string or list, optional

A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.

haxis: string, optional

The metadata field (or tuple containing multiple categorical fields) used to facet samples.

legend: string or altair.Legend, optional

If a string is provided, it will be used as the legend title. Defaults to the metric used to generate the plot, e.g. readcount_w_children or abundance. Alternatively, an altair.Legend instance may be provided for legend customization.

label: string or callable, optional

A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

sort_x: list or callable, optional

Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.

include_taxa_missing_rank: bool, optional

Whether or not a row should be plotted for taxa that do not have a designated parent at rank.

group_by: string, optional

The metadata field used to group samples together. Readcounts or abundances will be averaged within each group.

link: {‘ocx’, ‘ncbi’}, optional

If link is ‘ocx’, clicking a sample will open its classification results in the One Codex app. If link is ‘ncbi’, clicking a taxon will open the NCBI taxonomy browser.
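The label and sort_x parameters described above both accept callables. The following is a minimal sketch of what such functions might look like. The metadata keys "name" and "country" and both function names are hypothetical, chosen only for illustration.

```python
def label_by_country(metadata):
    """Build a label from one analysis's metadata dict; must return a string."""
    # "name" and "country" are hypothetical metadata keys used for illustration.
    name = metadata.get("name", "unknown")
    country = metadata.get("country", "n/a")
    return f"{name} ({country})"


def sort_labels_reversed(labels):
    """Return the same x-axis labels, in reverse alphabetical order."""
    return sorted(labels, reverse=True)


print(label_by_country({"name": "S1", "country": "Kenya"}))
print(sort_labels_reversed(["B", "A", "C"]))
```

Functions like these could then be passed as samples.plot_bargraph(label=label_by_country, sort_x=sort_labels_reversed).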

Examples

Plot a bargraph of the top 10 most abundant genera.

>>> samples.plot_bargraph(rank='genus', top_n=10)
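To make the normalize and top_n parameters more concrete, here is a small standalone sketch of the underlying idea: read counts are converted to relative abundances that sum to 1.0 per sample, and taxa are then ranked across the whole cohort. How the library ranks taxa internally is an assumption here (summed abundance across samples), and the function names and example counts are hypothetical.

```python
def normalize(counts):
    """Convert one sample's read counts to relative abundances summing to 1.0."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def top_n_taxa(samples, n):
    """Rank taxa by summed abundance across all samples and keep the top n."""
    totals = {}
    for abundances in samples:
        for taxon, value in abundances.items():
            totals[taxon] = totals.get(taxon, 0.0) + value
    # Sort by descending total; break ties alphabetically for a stable result.
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    return ranked[:n]


# Hypothetical read counts for two samples.
raw = [
    {"Bacteroides": 60, "Prevotella": 30, "Bifidobacterium": 10},
    {"Bacteroides": 20, "Prevotella": 70, "Bifidobacterium": 10},
]
relative = [normalize(sample) for sample in raw]
print(top_n_taxa(relative, 2))
```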

plot_distance

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_distance(return_chart=True)

SampleCollection.plot_distance(rank=Rank.Auto, metric=BetaDiversityMetric.BrayCurtis, title=None, xlabel=None, ylabel=None, tooltip=None, return_chart=False, linkage=Linkage.Average, label=None, width=None, height=None)

Plot beta diversity distance matrix as a heatmap and dendrogram.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

metric: {‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘jaccard’, ‘unifrac’, ‘unweighted_unifrac’, ‘aitchison’}, optional

Function to use when calculating the distance between two samples. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.

linkage: {‘average’, ‘single’, ‘complete’, ‘weighted’, ‘centroid’, ‘median’}, optional

The type of linkage to use when clustering axes.

title: string, optional

Text label at the top of the plot.

xlabel: string, optional

Text label along the horizontal axis.

ylabel: string, optional

Text label along the vertical axis.

tooltip: string or list, optional

A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.

label: string or callable, optional

A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

Examples

Plot the weighted UniFrac distance between all our samples, using counts at the genus level.

>>> samples.plot_distance(rank='genus', metric='unifrac')
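As a rough illustration of what two of the metric options compute, the sketch below implements the textbook definitions of city block (Manhattan) and Bray-Curtis distance in plain Python. It is not the library's implementation, and the abundance vectors are hypothetical.

```python
def cityblock(u, v):
    """Sum of absolute differences between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# 'manhattan' is another name for the same metric.
manhattan = cityblock


def braycurtis(u, v):
    """Bray-Curtis dissimilarity: sum|u - v| / sum|u + v|."""
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    if denominator == 0:
        return 0.0
    return cityblock(u, v) / denominator


# Hypothetical counts for three taxa in two samples.
sample_a = [10, 0, 5]
sample_b = [5, 5, 5]
print(cityblock(sample_a, sample_b))
print(braycurtis(sample_a, sample_b))
```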

plot_functional_heatmap

SampleCollection.plot_functional_heatmap(top_n=10, annotation=FunctionalAnnotations.Go, metric=None, sort_x=None, label=None, function_label=FunctionalLabel.Name, haxis=None, return_chart=False, xlabel='Sample', ylabel='Function', title=None, width=None, height=None)

Plot heatmap of functional profile data.

Parameters

top_n: int, optional

Display the top N most abundant or covered functions in the entire cohort of samples.

annotation: FunctionalAnnotations or str, optional

Annotation sub-database used to group gene families. One of {‘go’, ‘eggnog’, ‘ko’, ‘ec’, ‘pfam’, ‘pathways’}.

metric: FunctionalAnnotationsMetric or str, optional

Normalization or value to display. One of {‘cpm’, ‘rpk’, ‘abundance’, ‘coverage’}.

If annotation is one of ‘go’, ‘eggnog’, ‘ko’, ‘ec’ or ‘pfam’, the available metrics are:

  • ‘rpk’: read counts normalized by kilobase of gene length

  • ‘cpm’: relative copy of gene depth, normalized to a million RPK total

If annotation is ‘pathways’, the available metrics are:

  • ‘abundance’: summed copy numbers of reactions’ constituent enzymes

  • ‘coverage’: probabilistic measure of a complete metabolic pathway, where 1 is high confidence that a complete pathway is covered, and 0 is low confidence

sort_x: list or callable, optional

Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.

label: str or callable, optional

A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

function_label: str, optional

Used to label functions. One of {‘name’, ‘id’}. Defaults to ‘name’.

haxis: string, optional

The metadata field (or tuple containing multiple categorical fields) used to facet samples.

return_chart: bool, optional

When True, return an altair.Chart object instead of displaying the resulting plot in the current notebook.

xlabel: str, optional

Text label along the horizontal axis.

ylabel: str, optional

Text label along the vertical axis.

title: str, optional

Text label at the top of the plot.

width: float or str or dict, optional

Set altair.Chart.width.

height: float or str or dict, optional

Set altair.Chart.height.
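The ‘rpk’ and ‘cpm’ metrics described under the metric parameter can be illustrated with a short calculation. This sketch follows the wording of the parameter description (reads per kilobase of gene length, then rescaled so that all RPK values total one million). The gene names, read counts, and lengths are hypothetical, and the upstream pipeline may differ in detail.

```python
def rpk(read_count, gene_length_bp):
    """Read count normalized by gene length in kilobases."""
    return read_count / (gene_length_bp / 1000)


def cpm(rpk_values):
    """Rescale RPK values so that they sum to one million."""
    total = sum(rpk_values.values())
    if total == 0:
        return {gene: 0.0 for gene in rpk_values}
    return {gene: value / total * 1_000_000 for gene, value in rpk_values.items()}


# Hypothetical (read count, gene length in bp) pairs.
genes = {"geneA": (200, 2000), "geneB": (300, 1000)}
rpk_values = {gene: rpk(count, length) for gene, (count, length) in genes.items()}
cpm_values = cpm(rpk_values)
print(rpk_values)
print(cpm_values)
```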

plot_heatmap

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_heatmap(return_chart=True)

SampleCollection.plot_heatmap(rank=Rank.Auto, normalize='auto', top_n='auto', threshold='auto', title=None, xlabel=None, ylabel=None, tooltip=None, return_chart=False, linkage=Linkage.Average, haxis=None, metric='euclidean', legend='auto', label=None, sort_x=None, sort_y=None, width=None, height=None, link=Link.Ocx)

Plot heatmap of taxa abundance/count data for several samples.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

normalize: ‘auto’ or bool, optional

Convert read counts to relative abundances such that each sample sums to 1.0. Setting ‘auto’ will choose automatically based on the data.

return_chart: bool, optional

When True, return an altair.Chart object instead of displaying the resulting plot in the current notebook.

haxis: string, optional

The metadata field (or tuple containing multiple categorical fields) used to group samples together. Each group of samples will be clustered independently.

metric: {‘euclidean’, ‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘jaccard’, ‘unifrac’, ‘unweighted_unifrac’, ‘aitchison’}, optional

Function to use when calculating the distance between two samples. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.

linkage: {‘average’, ‘single’, ‘complete’, ‘weighted’, ‘centroid’, ‘median’}, optional

The type of linkage to use when clustering axes.

top_n: int, optional

Display the top N most abundant taxa in the entire cohort of samples.

threshold: float, optional

Display only taxa that are more abundant than this threshold in one or more samples.

title: string, optional

Text label at the top of the plot.

xlabel: string, optional

Text label along the horizontal axis.

ylabel: string, optional

Text label along the vertical axis.

tooltip: string or list, optional

A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.

legend: string, optional

Title for color scale. Defaults to the field used to generate the plot, e.g. readcount_w_children or abundance.

label: string or callable, optional

A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

sort_x: list or callable, optional

Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.

sort_y: list or callable, optional

Either a list of sorted labels or a function that will be called with a list of y-axis labels as the only argument, and must return the same list in a user-specified order.

link: {‘ocx’, ‘ncbi’}, optional

If link is ‘ocx’, clicking a sample will open its classification results in the One Codex app. If link is ‘ncbi’, clicking a taxon will open the NCBI taxonomy browser.

Examples

Plot a heatmap of the relative abundances of the top 10 most abundant families.

>>> samples.plot_heatmap(rank='family', top_n=10)
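The threshold parameter keeps a taxon if it exceeds the threshold in at least one sample. A minimal sketch of that rule follows, using hypothetical relative abundances and a hypothetical helper name. It illustrates the filtering logic only, not the library's internals.

```python
def taxa_above_threshold(samples, threshold):
    """Return taxa whose abundance exceeds the threshold in one or more samples."""
    kept = set()
    for abundances in samples:
        for taxon, value in abundances.items():
            if value > threshold:
                kept.add(taxon)
    return sorted(kept)


# Hypothetical relative abundances for two samples.
samples_abundance = [
    {"A": 0.90, "B": 0.10, "C": 0.00},
    {"A": 0.50, "B": 0.45, "C": 0.05},
]
print(taxa_above_threshold(samples_abundance, 0.2))
```

Taxon B is kept because it exceeds 0.2 in the second sample even though it falls below the threshold in the first.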

plot_mds

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_mds(return_chart=True, color="country")

SampleCollection.plot_mds(rank=Rank.Auto, metric=BetaDiversityMetric.BrayCurtis, method=OrdinationMethod.Pcoa, title=None, xlabel=None, ylabel=None, color=None, size=None, tooltip=None, return_chart=False, label=None, mark_size=100, width=None, height=None)

Plot beta diversity distance matrix using multidimensional scaling (MDS).

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

metric: {‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘jaccard’, ‘unifrac’, ‘unweighted_unifrac’, ‘aitchison’}, optional

Function to use when calculating the distance between two samples. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.

method: {‘pcoa’, ‘smacof’}, optional

Algorithm to use for ordination. PCoA uses eigenvalue decomposition and is not well suited to non-euclidean distance functions. SMACOF is an iterative optimization strategy that can be used as an alternative.

title: string, optional

Text label at the top of the plot.

xlabel: string, optional

Text label along the horizontal axis.

ylabel: string, optional

Text label along the vertical axis.

size: string or tuple, optional

A string or a tuple containing strings representing metadata fields. The size of points in the resulting plot will change based on the metadata associated with each sample.

color: string or tuple, optional

A string or a tuple containing strings representing metadata fields. The color of points in the resulting plot will change based on the metadata associated with each sample.

tooltip: string or list, optional

A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.

label: string or callable, optional

A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

Examples

Scatter plot of weighted UniFrac distance between all our samples, using counts at the genus level.

>>> samples.plot_mds(rank='genus', metric='unifrac')

Notes

For `smacof`: The values reported on the axis labels are Pearson’s correlations between the distances between points on each axis alone, and the corresponding distances in the distance matrix calculated using the user-specified metric. These values are related to the effectiveness of the MDS algorithm in placing points on the scatter plot in such a way that they truly represent the calculated distances. They do not reflect how well the distance metric captures similarities between the underlying data (in this case, an OTU table).
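The axis-label statistic described in this note can be sketched in plain Python: take the pairwise distances between points along a single axis and correlate them with the corresponding entries of the original distance matrix. The helper names and the toy data are hypothetical, and the sketch assumes neither set of distances is constant (otherwise the correlation is undefined).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def axis_correlation(coords, distances):
    """Correlate distances along one axis with the full distance matrix."""
    pairs = list(combinations(range(len(coords)), 2))
    axis_distances = [abs(coords[i] - coords[j]) for i, j in pairs]
    matrix_distances = [distances[i][j] for i, j in pairs]
    return pearson(axis_distances, matrix_distances)


# Toy case: one axis reproduces the distance matrix exactly.
coords = [0.0, 1.0, 3.0]
distances = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
print(axis_correlation(coords, distances))
```

In this toy case a single axis captures the distances perfectly, so the correlation is at its maximum; real ordinations will generally score lower on each axis.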

plot_metadata

A general plotting tool which can be used to plot boxplots and scatter plots of individual abundances or alpha-diversity metrics.

Alpha Diversity

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=20)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_metadata(return_chart=True, haxis="country")

2D Abundance Scatterplot

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=20)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_metadata(return_chart=True, haxis="Bacteroides", vaxis="Firmicutes")

Boxplot

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=20)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_metadata(return_chart=True, vaxis="Bacteroides", haxis="country")

SampleCollection.plot_metadata(rank=Rank.Auto, haxis='Label', vaxis=AlphaDiversityMetric.Shannon, title=None, xlabel=None, ylabel=None, return_chart=False, plot_type=PlotType.Auto, label=None, sort_x=None, width=200, height=400, facet_by=None, coerce_haxis_dates=True, secondary_haxis=None)

Plot an arbitrary metadata field versus an arbitrary quantity as a boxplot or scatter plot.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

haxis: string, optional

The metadata field (or tuple containing multiple categorical fields) to be plotted on the horizontal axis.

vaxis: string, optional

Data to be plotted on the vertical axis. Can be any one of the following:

  • A metadata field: the name of a metadata field containing numerical data

  • {‘simpson’, ‘observed_taxa’, ‘shannon’}: an alpha diversity statistic to calculate for each sample

  • A taxon name: the name of a taxon in the analysis

  • A taxon ID: the ID of a taxon in the analysis

title: string, optional

Text label at the top of the plot.

xlabel: string, optional

Text label along the horizontal axis.

ylabel: string, optional

Text label along the vertical axis.

plot_type: {‘auto’, ‘boxplot’, ‘scatter’}, optional

By default, will determine plot type automatically based on the data. Otherwise, specify one of ‘boxplot’ or ‘scatter’ to set the type of plot manually.

label: string or callable, optional

A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

sort_x: list or callable, optional

Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.

facet_by: string, optional

The metadata field used to facet samples by (i.e. to create a separate subplot for each group of samples).

coerce_haxis_dates: bool, optional

If True, haxis field name(s) containing the word “date” (after splitting on underscores) will be coerced to datetime dtype. For example, the field “date_collected” will be coerced if coerce_haxis_dates is True.

secondary_haxis: str or tuple of str, optional

The secondary metadata field (or tuple containing multiple categorical fields) to be plotted on the horizontal axis.
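The field-name rule described for coerce_haxis_dates (the word “date” must appear after splitting on underscores) can be sketched as a simple check. The helper name is hypothetical, and whether the library's comparison is case-sensitive is not stated in this reference, so this sketch compares the tokens exactly as written.

```python
def looks_like_date_field(field_name):
    """Return True if 'date' is one of the underscore-separated tokens."""
    return "date" in field_name.split("_")


for name in ["date_collected", "collection_date", "update_time", "validated"]:
    print(name, looks_like_date_field(name))
```

Under this rule, “update_time” and “validated” are not treated as date fields, because “date” appears only as part of a longer token.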

Examples

Generate a boxplot of the abundance of Bacteroides (genus) of samples grouped by whether the individuals are allergic to dogs, cats, both, or neither.

>>> samples.plot_metadata(haxis=('allergy_dogs', 'allergy_cats'), vaxis='Bacteroides')
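For readers unfamiliar with the alpha diversity statistics accepted by vaxis (‘observed_taxa’, ‘shannon’, ‘simpson’), the sketch below computes common textbook forms of each from a list of read counts. The exact conventions used by the library are assumptions here: this version uses the natural logarithm for Shannon and the 1 − Σp² form for Simpson, and the function names are illustrative only.

```python
from math import e, log


def observed_taxa(counts):
    """Number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=e):
    """Shannon index: -sum(p * log(p)) over non-zero proportions."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * log(p, base) for p in proportions)


def simpson(counts):
    """Simpson index in the 1 - sum(p^2) form (an assumed convention)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


# Hypothetical read counts for one sample with two equally abundant taxa.
counts = [50, 50, 0]
print(observed_taxa(counts), shannon(counts), simpson(counts))
```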

plot_pca

import onecodex

ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)

# note: return_chart is not needed if using a Jupyter notebook
samples.plot_pca(return_chart=True, color="country")

SampleCollection.plot_pca(rank=Rank.Auto, normalize='auto', org_vectors=0, org_vectors_scale=None, title=None, xlabel=None, ylabel=None, color=None, size=None, tooltip=None, return_chart=False, label=None, mark_size=100, width=None, height=None)

Perform principal component analysis and plot first two axes.

Parameters

rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

normalize: ‘auto’ or bool, optional

Convert read counts to relative abundances such that each sample sums to 1.0. Setting ‘auto’ will choose automatically based on the data.

org_vectors: int, optional

Plot this many of the top-contributing eigenvectors from the PCA results.

org_vectors_scale: float, optional

Multiply the length of the lines representing the eigenvectors by this constant.

title: string, optional

Text label at the top of the plot.

xlabel: string, optional

Text label along the horizontal axis.

ylabel: string, optional

Text label along the vertical axis.

size: string or tuple, optional

A string or a tuple containing strings representing metadata fields. The size of points in the resulting plot will change based on the metadata associated with each sample.

color: string or tuple, optional

A string or a tuple containing strings representing metadata fields. The color of points in the resulting plot will change based on the metadata associated with each sample.

tooltip: string or list, optional

A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.

label: string or callable, optional

A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.

mark_size: int, optional

The size of the points in the scatter plot.

Examples

Perform PCA on relative abundances at the species level and color the resulting points by ‘geo_loc_name’, a metadata field representing the geographical origin of each sample.

>>> samples.plot_pca(rank='species', normalize=True, color='geo_loc_name')

Change the size of each point in the plot based on the abundance of Bacteroides.

>>> samples.plot_pca(size='Bacteroides')

Display the abundances of Bacteroides, Prevotella, and Bifidobacterium in each sample when hovering over points in the plot.

>>> samples.plot_pca(tooltip=['Bacteroides', 'Prevotella', 'Bifidobacterium'])