Visualization
See Also
Visualization functions are implemented as part of SampleCollection. For more information, see SampleCollection.
plot_bargraph
import onecodex
ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)
# note: return_chart is not needed if using a Jupyter notebook
samples.plot_bargraph(return_chart=True)
- SampleCollection.plot_bargraph(rank=Rank.Auto, normalize='auto', top_n='auto', threshold='auto', title=None, xlabel=None, ylabel=None, tooltip=None, return_chart=False, haxis=None, legend='auto', label=None, sort_x=None, include_taxa_missing_rank=None, include_other=True, width=None, height=None, group_by=None, link=Link.Ocx)
Plot a bargraph of relative abundance of taxa for multiple samples.
Parameters
- rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
- normalize: ‘auto’ or bool, optional
Convert read counts to relative abundances such that each sample sums to 1.0. Setting ‘auto’ will choose automatically based on the data.
- return_chart: bool, optional
When True, return an altair.Chart object instead of displaying the resulting plot in the current notebook.
- top_n: int, optional
Display the top N most abundant taxa in the entire cohort of samples.
- threshold: float
Display only taxa that are more abundant than this threshold in one or more samples.
- title: string, optional
Text label at the top of the plot.
- xlabel: string, optional
Text label along the horizontal axis.
- ylabel: string, optional
Text label along the vertical axis.
- tooltip: string or list, optional
A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
- haxis: string, optional
The metadata field (or tuple containing multiple categorical fields) used to facet samples.
- legend: string or altair.Legend, optional
If a string is provided, it will be used as the legend title. Defaults to the metric used to generate the plot, e.g. readcount_w_children or abundance. Alternatively, an altair.Legend instance may be provided for legend customization.
- label: string or callable, optional
A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
- sort_x: list or callable, optional
Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.
- include_taxa_missing_rank: bool, optional
Whether or not a row should be plotted for taxa that do not have a designated parent at rank.
- group_by: string, optional
The metadata field used to group samples together. Readcounts or abundances will be averaged within each group.
- link: {‘ocx’, ‘ncbi’}, optional
If link is ‘ocx’, clicking a sample will open its classification results in the One Codex app. If link is ‘ncbi’, clicking a taxon will open the NCBI taxonomy browser.
Examples
Plot a bargraph of the top 10 most abundant genera
>>> samples.plot_bargraph(rank='genus', top_n=10)
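The label and sort_x callables described in the parameter list are ordinary Python functions, so they can be written and checked without calling the API. The sketch below is illustrative only: the metadata keys "name" and "country" are hypothetical, and the label strings that sort_x receives depend on your own data.

```python
def label_from_metadata(metadata):
    """Build a display label from one analysis's metadata dict.

    The keys "name" and "country" are hypothetical; substitute fields
    that exist in your own metadata.
    """
    name = metadata.get("name") or "unnamed"
    country = metadata.get("country") or "unknown"
    return f"{name} ({country})"


def sort_labels(labels):
    """Return the same labels in case-insensitive alphabetical order."""
    return sorted(labels, key=str.lower)


# Stand-alone check with made-up metadata; no API access is needed.
print(label_from_metadata({"name": "Sample A", "country": "Kenya"}))
print(sort_labels(["b2", "A1", "c3"]))

# With a real collection these would be passed as:
# samples.plot_bargraph(label=label_from_metadata, sort_x=sort_labels)
```

Keeping the callables free of side effects makes them easy to reuse across the other plotting methods that accept label or sort_x.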
plot_distance
import onecodex
ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)
# note: return_chart is not needed if using a Jupyter notebook
samples.plot_distance(return_chart=True)
- SampleCollection.plot_distance(rank=Rank.Auto, metric=BetaDiversityMetric.BrayCurtis, title=None, xlabel=None, ylabel=None, tooltip=None, return_chart=False, linkage=Linkage.Average, label=None, width=None, height=None)
Plot beta diversity distance matrix as a heatmap and dendrogram.
Parameters
- rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
- metric: {‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘jaccard’, ‘unifrac’, ‘unweighted_unifrac’, ‘aitchison’}, optional
Function to use when calculating the distance between two samples. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.
- linkage: {‘average’, ‘single’, ‘complete’, ‘weighted’, ‘centroid’, ‘median’}
The type of linkage to use when clustering axes.
- title: string, optional
Text label at the top of the plot.
- xlabel: string, optional
Text label along the horizontal axis.
- ylabel: string, optional
Text label along the vertical axis.
- tooltip: string or list, optional
A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
- label: string or callable, optional
A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
Examples
Plot the weighted UniFrac distance between all our samples, using counts at the genus level.
>>> samples.plot_distance(rank='genus', metric='unifrac')
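To make two of the metric names more concrete, here is a minimal pure-Python sketch of the standard Bray-Curtis and city block (Manhattan) formulas applied to read-count vectors. It is not the library's implementation; the helper names and the counts are made up for illustration.

```python
def cityblock(u, v):
    """City block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def braycurtis(u, v):
    """Bray-Curtis dissimilarity for non-negative count vectors."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return cityblock(u, v) / total


def distance_matrix(samples, metric):
    """Square matrix of pairwise distances between rows of `samples`."""
    return [[metric(a, b) for b in samples] for a in samples]


# Made-up read counts for three taxa in three samples.
counts = [
    [10, 0, 5],
    [5, 5, 5],
    [0, 20, 0],
]
for row in distance_matrix(counts, braycurtis):
    print([round(d, 3) for d in row])
```

Bray-Curtis is the city block distance divided by the combined total of both samples, which is why it stays between 0 and 1 for count data.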
plot_functional_heatmap
- SampleCollection.plot_functional_heatmap(top_n=10, annotation=FunctionalAnnotations.Go, metric=None, sort_x=None, label=None, function_label=FunctionalLabel.Name, haxis=None, return_chart=False, xlabel='Sample', ylabel='Function', title=None, width=None, height=None)
Plot heatmap of functional profile data.
Parameters
- top_n: int, optional
Display the top N most abundant or covered functions in the entire cohort of samples.
- annotation: FunctionalAnnotations or str, optional
Annotation sub-database used to group gene families. One of {‘go’, ‘eggnog’, ‘ko’, ‘ec’, ‘pfam’, ‘pathways’}.
- metric: FunctionalAnnotationsMetric or str, optional
Normalization or value to display. One of {‘cpm’, ‘rpk’, ‘abundance’, ‘coverage’}.
If annotation is one of ‘go’, ‘eggnog’, ‘ko’, ‘ec’ or ‘pfam’, the available metrics are ‘rpk’ (read counts normalized by kilobase of gene length) and ‘cpm’ (relative copy of gene depth, normalized to a million RPK total).
If annotation is ‘pathways’, the available metrics are ‘abundance’ (summed copy numbers of reactions’ constituent enzymes) and ‘coverage’ (probabilistic measure of a complete metabolic pathway, where 1 is high confidence that a complete pathway is covered, and 0 is low confidence).
- sort_x: list or callable, optional
Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.
- label: str or callable, optional
A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
- function_label: str, optional
How to label functions. One of {‘name’, ‘id’}. Defaults to ‘name’.
- haxis: string, optional
The metadata field (or tuple containing multiple categorical fields) used to facet samples.
- return_chart: bool, optional
When True, return an altair.Chart object instead of displaying the resulting plot in the current notebook.
- xlabel: str, optional
Text label along the horizontal axis.
- ylabel: str, optional
Text label along the vertical axis.
- title: str, optional
Text label at the top of the plot.
- width: float or str or dict, optional
Set altair.Chart.width.
- height: float or str or dict, optional
Set altair.Chart.height.
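The relationship between the ‘rpk’ and ‘cpm’ metrics can be sketched in a few lines. This assumes that "normalized to a million RPK total" means each function's RPK is divided by the sample's RPK sum and multiplied by one million; the function names and values below are invented, and the library's own calculation may differ in detail.

```python
def rpk_to_cpm(rpk_by_function):
    """Rescale per-function RPK values so that they sum to one million."""
    total = sum(rpk_by_function.values())
    if total == 0:
        return {name: 0.0 for name in rpk_by_function}
    return {
        name: value / total * 1_000_000
        for name, value in rpk_by_function.items()
    }


# Invented RPK values for a single sample.
sample_rpk = {"function_a": 30.0, "function_b": 10.0}
print(rpk_to_cpm(sample_rpk))
```

Because CPM values are relative to each sample's own total, they are comparable across samples with different sequencing depths, whereas raw RPK values are not.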
plot_heatmap
import onecodex
ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)
# note: return_chart is not needed if using a Jupyter notebook
samples.plot_heatmap(return_chart=True)
- SampleCollection.plot_heatmap(rank=Rank.Auto, normalize='auto', top_n='auto', threshold='auto', title=None, xlabel=None, ylabel=None, tooltip=None, return_chart=False, linkage=Linkage.Average, haxis=None, metric='euclidean', legend='auto', label=None, sort_x=None, sort_y=None, width=None, height=None, link=Link.Ocx)
Plot heatmap of taxa abundance/count data for several samples.
Parameters
- rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
- normalize: ‘auto’ or bool, optional
Convert read counts to relative abundances such that each sample sums to 1.0. Setting ‘auto’ will choose automatically based on the data.
- return_chart: bool, optional
When True, return an altair.Chart object instead of displaying the resulting plot in the current notebook.
- haxis: string, optional
The metadata field (or tuple containing multiple categorical fields) used to group samples together. Each group of samples will be clustered independently.
- metric: {‘euclidean’, ‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘jaccard’, ‘unifrac’, ‘unweighted_unifrac’, ‘aitchison’}, optional
Function to use when calculating the distance between two samples. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.
- linkage: {‘average’, ‘single’, ‘complete’, ‘weighted’, ‘centroid’, ‘median’}
The type of linkage to use when clustering axes.
- top_n: int, optional
Display the top N most abundant taxa in the entire cohort of samples.
- threshold: float
Display only taxa that are more abundant than this threshold in one or more samples.
- title: string, optional
Text label at the top of the plot.
- xlabel: string, optional
Text label along the horizontal axis.
- ylabel: string, optional
Text label along the vertical axis.
- tooltip: string or list, optional
A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
- legend: string, optional
Title for color scale. Defaults to the field used to generate the plot, e.g. readcount_w_children or abundance.
- label: string or callable, optional
A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
- sort_x: list or callable, optional
Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.
- sort_y: list or callable, optional
Either a list of sorted labels or a function that will be called with a list of y-axis labels as the only argument, and must return the same list in a user-specified order.
- link: {‘ocx’, ‘ncbi’}, optional
If link is ‘ocx’, clicking a sample will open its classification results in the One Codex app. If link is ‘ncbi’, clicking a taxon will open the NCBI taxonomy browser.
Examples
Plot a heatmap of the relative abundances of the top 10 most abundant families.
>>> samples.plot_heatmap(rank='family', top_n=10)
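The normalize, top_n, and threshold parameters can be pictured with a small pure-Python sketch. It assumes that the "top N" taxa are ranked by their summed relative abundance across the cohort, which the text above does not specify; the library may rank them differently, and the counts here are made up.

```python
def normalize(counts):
    """Convert {sample: {taxon: reads}} to relative abundances per sample."""
    result = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        result[sample] = {t: (c / total if total else 0.0) for t, c in taxa.items()}
    return result


def top_n_taxa(abundances, n):
    """Names of the n taxa with the largest summed abundance (an assumption)."""
    totals = {}
    for taxa in abundances.values():
        for taxon, value in taxa.items():
            totals[taxon] = totals.get(taxon, 0.0) + value
    # Sort by descending total, breaking ties alphabetically.
    return sorted(totals, key=lambda t: (-totals[t], t))[:n]


def taxa_above_threshold(abundances, threshold):
    """Taxa whose abundance exceeds the threshold in at least one sample."""
    keep = set()
    for taxa in abundances.values():
        keep.update(t for t, value in taxa.items() if value > threshold)
    return sorted(keep)


# Made-up read counts for two samples.
counts = {
    "s1": {"Bacteroides": 60, "Prevotella": 30, "Blautia": 10},
    "s2": {"Bacteroides": 20, "Prevotella": 70, "Blautia": 10},
}
rel = normalize(counts)
print(top_n_taxa(rel, 2))
print(taxa_above_threshold(rel, 0.5))
```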
plot_mds
import onecodex
ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)
# note: return_chart is not needed if using a Jupyter notebook
samples.plot_mds(return_chart=True, color="country")
- SampleCollection.plot_mds(rank=Rank.Auto, metric=BetaDiversityMetric.BrayCurtis, method=OrdinationMethod.Pcoa, title=None, xlabel=None, ylabel=None, color=None, size=None, tooltip=None, return_chart=False, label=None, mark_size=100, width=None, height=None)
Plot beta diversity distance matrix using multidimensional scaling (MDS).
Parameters
- rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
- metric: {‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘jaccard’, ‘unifrac’, ‘unweighted_unifrac’, ‘aitchison’}, optional
Function to use when calculating the distance between two samples. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.
- method: {‘pcoa’, ‘smacof’}
Algorithm to use for ordination. PCoA uses eigenvalue decomposition and is not well suited to non-euclidean distance functions. SMACOF is an iterative optimization strategy that can be used as an alternative.
- title: string, optional
Text label at the top of the plot.
- xlabel: string, optional
Text label along the horizontal axis.
- ylabel: string, optional
Text label along the vertical axis.
- size: string or tuple, optional
A string or a tuple containing strings representing metadata fields. The size of points in the resulting plot will change based on the metadata associated with each sample.
- color: string or tuple, optional
A string or a tuple containing strings representing metadata fields. The color of points in the resulting plot will change based on the metadata associated with each sample.
- tooltip: string or list, optional
A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
- label: string or callable, optional
A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
Examples
Scatter plot of weighted UniFrac distance between all our samples, using counts at the genus level.
>>> samples.plot_mds(rank='genus', metric='unifrac')
Notes
For `smacof`: The values reported on the axis labels are Pearson’s correlations between the distances between points on each axis alone, and the corresponding distances in the distance matrix calculated using the user-specified metric. These values are related to the effectiveness of the MDS algorithm in placing points on the scatter plot in such a way that they truly represent the calculated distances. They do not reflect how well the distance metric captures similarities between the underlying data (in this case, an OTU table).
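The per-axis correlation described in this note can be sketched as follows. This is an illustration under assumptions, not the library's code: it takes each unordered pair of samples once, and the coordinates and distance matrix are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def axis_correlations(coords, dist):
    """Correlate distances along each plot axis with the original distances."""
    pairs = list(combinations(range(len(coords)), 2))
    target = [dist[i][j] for i, j in pairs]
    result = []
    for axis in range(len(coords[0])):
        on_axis = [abs(coords[i][axis] - coords[j][axis]) for i, j in pairs]
        result.append(pearson(on_axis, target))
    return result


# Invented 2D positions and a distance matrix that matches the x axis exactly.
coords = [(0.0, 0.0), (1.0, 0.5), (3.0, 0.0)]
dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
print(axis_correlations(coords, dist))
```

In this toy case the first axis reproduces the input distances perfectly, so its correlation is 1, while the second axis carries no useful distance information.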
plot_metadata
A general plotting tool which can be used to plot boxplots and scatter plots of individual abundances or alpha-diversity metrics.
Alpha Diversity
import onecodex
ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=20)
# note: return_chart is not needed if using a Jupyter notebook
samples.plot_metadata(return_chart=True, haxis="country")
2D Abundance Scatterplot
import onecodex
ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=20)
# note: return_chart is not needed if using a Jupyter notebook
samples.plot_metadata(return_chart=True, haxis="Bacteroides", vaxis="Firmicutes")
Boxplot
import onecodex
ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=20)
# note: return_chart is not needed if using a Jupyter notebook
samples.plot_metadata(return_chart=True, vaxis="Bacteroides", haxis="country")
- SampleCollection.plot_metadata(rank=Rank.Auto, haxis='Label', vaxis=AlphaDiversityMetric.Shannon, title=None, xlabel=None, ylabel=None, return_chart=False, plot_type=PlotType.Auto, label=None, sort_x=None, width: int | Literal['container'] | None = 200, height: int | Literal['container'] | None = 400, facet_by=None, coerce_haxis_dates=True, secondary_haxis=None)
Plot an arbitrary metadata field versus an arbitrary quantity as a boxplot or scatter plot.
Parameters
- rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
- haxis: string, optional
The metadata field (or tuple containing multiple categorical fields) to be plotted on the horizontal axis.
- vaxis: string, optional
Data to be plotted on the vertical axis. Can be any one of the following:
A metadata field: the name of a metadata field containing numerical data
{‘simpson’, ‘observed_taxa’, ‘shannon’}: an alpha diversity statistic to calculate for each sample
A taxon name: the name of a taxon in the analysis
A taxon ID: the ID of a taxon in the analysis
- title: string, optional
Text label at the top of the plot.
- xlabel: string, optional
Text label along the horizontal axis.
- ylabel: string, optional
Text label along the vertical axis.
- plot_type: {‘auto’, ‘boxplot’, ‘scatter’}
By default, will determine plot type automatically based on the data. Otherwise, specify one of ‘boxplot’ or ‘scatter’ to set the type of plot manually.
- label: string or callable, optional
A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
- sort_x: list or callable, optional
Either a list of sorted labels or a function that will be called with a list of x-axis labels as the only argument, and must return the same list in a user-specified order.
- width: int or str, optional
Sets altair.Chart.width. If “container”, chart width will respond to the width of the HTML container it is rendered in.
- height: int or str, optional
Sets altair.Chart.height. If “container”, chart height will respond to the height of the HTML container it is rendered in.
- facet_by: string, optional
The metadata field used to facet samples by (i.e. to create a separate subplot for each group of samples).
- coerce_haxis_dates: bool, optional
If True, haxis field name(s) containing the word “date” (after splitting on underscores) will be coerced to datetime dtype. For example, the field “date_collected” will be coerced if coerce_haxis_dates is True.
- secondary_haxis: str or tuple of str, optional
The secondary metadata field (or tuple containing multiple categorical fields) to be plotted on the horizontal axis.
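The field-name rule behind coerce_haxis_dates can be illustrated with a short sketch. It assumes a case-sensitive match on the underscore-separated words and uses datetime.fromisoformat as a stand-in parser; the library's actual matching and parsing may differ, and the helper names are hypothetical.

```python
from datetime import datetime


def is_date_field(field_name):
    """True if "date" is one of the underscore-separated words in the name."""
    return "date" in field_name.split("_")


def coerce_dates(field_name, values):
    """Parse ISO-format strings to datetime when the field name qualifies."""
    if not is_date_field(field_name):
        return list(values)
    return [datetime.fromisoformat(v) for v in values]


for name in ["date_collected", "collection_date", "update_time", "candidate"]:
    print(name, is_date_field(name))
```

Splitting on underscores is what keeps names such as "update_time" from matching merely because they contain the letters "date".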
Examples
Generate a boxplot of the abundance of Bacteroides (genus) of samples grouped by whether the individuals are allergic to dogs, cats, both, or neither.
>>> samples.plot_metadata(haxis=('allergy_dogs', 'allergy_cats'), vaxis='Bacteroides')
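For readers unfamiliar with the alpha diversity statistics accepted by vaxis, the sketch below computes textbook versions of them from a list of read counts. It is not the library's implementation: the base-2 logarithm for Shannon and the 1 - sum(p^2) form of Simpson are assumptions, and the counts are made up.

```python
from math import log


def proportions(counts):
    """Relative abundances of the taxa that are actually present."""
    total = sum(counts)
    if total == 0:
        return []
    return [c / total for c in counts if c > 0]


def observed_taxa(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2):
    """Shannon entropy of the abundance distribution (base is an assumption)."""
    return -sum(p * log(p, base) for p in proportions(counts))


def simpson(counts):
    """Simpson's index in the 1 - sum(p^2) form (an assumption)."""
    return 1 - sum(p * p for p in proportions(counts))


# Made-up read counts: one perfectly even sample and one dominated sample.
even = [25, 25, 25, 25]
dominated = [97, 1, 1, 1]
for name, counts in [("even", even), ("dominated", dominated)]:
    print(name, observed_taxa(counts), round(shannon(counts), 3), round(simpson(counts), 3))
```

Both samples contain four observed taxa, but the even sample scores higher on Shannon and Simpson because its reads are spread equally across them.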
plot_pca
import onecodex
ocx = onecodex.Api()
project = ocx.Projects.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project, public=True, limit=10)
# note: return_chart is not needed if using a Jupyter notebook
samples.plot_pca(return_chart=True, color="country")
- SampleCollection.plot_pca(rank=Rank.Auto, normalize='auto', org_vectors=0, org_vectors_scale=None, title=None, xlabel=None, ylabel=None, color=None, size=None, tooltip=None, return_chart=False, label=None, mark_size=100, width=None, height=None)
Perform principal component analysis and plot first two axes.
Parameters
- rank: {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
- normalize: ‘auto’ or bool, optional
Convert read counts to relative abundances such that each sample sums to 1.0. Setting ‘auto’ will choose automatically based on the data.
- org_vectors: int, optional
Plot this many of the top-contributing eigenvectors from the PCA results.
- org_vectors_scale: float, optional
Multiply the length of the lines representing the eigenvectors by this constant.
- title: string, optional
Text label at the top of the plot.
- xlabel: string, optional
Text label along the horizontal axis.
- ylabel: string, optional
Text label along the vertical axis.
- size: string or tuple, optional
A string or a tuple containing strings representing metadata fields. The size of points in the resulting plot will change based on the metadata associated with each sample.
- color: string or tuple, optional
A string or a tuple containing strings representing metadata fields. The color of points in the resulting plot will change based on the metadata associated with each sample.
- tooltip: string or list, optional
A string or list containing strings representing metadata fields. When a point in the plot is hovered over, the value of the metadata associated with that sample will be displayed in a modal.
- label: string or callable, optional
A metadata field (or function) used to label each analysis. If passing a function, a dict containing the metadata for each analysis is passed as the first and only positional argument. The callable function must return a string.
- mark_size: int, optional
The size of the points in the scatter plot.
Examples
Perform PCA on relative abundances at the species-level and color the resulting points by ‘geo_loc_name’, a metadata field representing the geographical origin of each sample.
>>> samples.plot_pca(rank='species', normalize=True, color='geo_loc_name')
Change the size of each point in the plot based on the abundance of Bacteroides.
>>> samples.plot_pca(size='Bacteroides')
Display the abundances of Bacteroides, Prevotella, and Bifidobacterium in each sample when hovering over points in the plot.
>>> samples.plot_pca(tooltip=['Bacteroides', 'Prevotella', 'Bifidobacterium'])