SampleCollection

Data export, analysis and visualization functions are all contained within the SampleCollection class.

Any model query against the One Codex API that returns multiple Samples returns them as a SampleCollection.

Usage

SampleCollection provides tools for data export, analysis, visualization and statistics, documented in the sections below. Querying the samples in a project, for example, returns a SampleCollection:

import onecodex

ocx = onecodex.Api()

project = ocx.Project.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project)

type(samples) # SampleCollection

A SampleCollection can also be created manually from a list of samples:

from onecodex.models.collection import SampleCollection

sample_list = [
    ocx.Samples.get("cee3b512605a43c6"),
    ocx.Samples.get("01f703ac505e4a30")
]

samples = SampleCollection(sample_list)

# convert classification results to a Pandas DataFrame
samples.to_df()

filter

SampleCollection.filter(filter_func)

Return a new SampleCollection containing only samples meeting the filter criteria.

Any kwargs (e.g., metric or skip_missing) used when instantiating the current collection are passed on to the new SampleCollection that is returned.

Parameters

filter_func : callable

A function that will be evaluated on every object in the collection. The function must return a bool. If True, the object will be kept. If False, it will be removed from the SampleCollection that is returned.

Returns

onecodex.models.SampleCollection containing only the objects for which filter_func returned True.

Examples

Generate a new collection of Samples that have a specific filename extension:

>>> new_collection = samples.filter(lambda s: s.filename.endswith('.fastq.gz'))
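The filtering behavior described above can be sketched without API access. This is a minimal stand-in, not the library's implementation: FakeSample and MiniCollection are hypothetical names used only for illustration, and it assumes the constructor kwargs are simply stored and forwarded to the new collection.

```python
from dataclasses import dataclass


@dataclass
class FakeSample:
    """Hypothetical stand-in for a One Codex Sample (illustration only)."""

    filename: str


class MiniCollection:
    """Hypothetical, simplified stand-in for SampleCollection."""

    def __init__(self, samples, **kwargs):
        self._samples = list(samples)
        self._kwargs = kwargs  # e.g. metric or skip_missing

    def filter(self, filter_func):
        # Keep only the objects for which filter_func returns True, and
        # forward the original kwargs to the new collection.
        kept = [s for s in self._samples if filter_func(s)]
        return MiniCollection(kept, **self._kwargs)

    def __len__(self):
        return len(self._samples)


collection = MiniCollection(
    [FakeSample("a.fastq.gz"), FakeSample("b.fasta"), FakeSample("c.fastq.gz")],
    skip_missing=True,
)
fastq_only = collection.filter(lambda s: s.filename.endswith(".fastq.gz"))
```

The original collection is left untouched; filter always returns a new object.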

to_otu

SampleCollection.to_otu(biom_id=None, include_ranks=('superkingdom', 'kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species'))

Generate a BIOM-formatted data structure.

Parameters

biom_id : string, optional

Optionally specify an id field for the generated v1 BIOM file.

include_ranks : list

A list of ranks to include in the taxonomy/OTU table. Uses onecodex.models.collection.CANONICAL_RANKS by default.

Returns

otu_table : OrderedDict

A BIOM OTU table, returned as a Python OrderedDict (can be dumped to JSON)
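Because the return value is an ordinary OrderedDict, it can be serialized with the standard json module. The sketch below uses a small hand-built dictionary as a stand-in for the result of to_otu(). Its keys follow the BIOM 1.0 layout as an assumption about the output, and the IDs and counts are invented for illustration. With real data, otu_table would come from samples.to_otu() instead.

```python
import json
from collections import OrderedDict

# Hypothetical stand-in for `samples.to_otu()`; all values are invented.
otu_table = OrderedDict(
    [
        ("id", "example_biom"),
        ("format", "Biological Observation Matrix 1.0.0"),
        ("type", "OTU table"),
        ("matrix_type", "sparse"),
        ("matrix_element_type", "int"),
        ("rows", [{"id": "taxon_1", "metadata": None}, {"id": "taxon_2", "metadata": None}]),
        ("columns", [{"id": "sample_A", "metadata": None}]),
        ("shape", [2, 1]),
        # Sparse entries are [row_index, column_index, value].
        ("data", [[0, 0, 120], [1, 0, 30]]),
    ]
)

# Dump to a JSON string, then read it back while preserving key order.
biom_json = json.dumps(otu_table, indent=2)
round_tripped = json.loads(biom_json, object_pairs_hook=OrderedDict)
```

To write a file rather than a string, json.dump(otu_table, file_handle) works the same way.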

to_df

SampleCollection.to_df(analysis_type=AnalysisType.Classification, **kwargs)

Transform Analyses of samples in a SampleCollection into tabular format.

Parameters

analysis_type : {'classification', 'functional'}, optional

The analysis_type to aggregate, corresponding to AnalysisJob.analysis_type

kwargs : dict, optional

Keyword arguments specific to the analysis_type; see _to_classification_df and _to_functional_df below.
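The way analysis-specific keyword arguments are forwarded can be pictured with a small dispatch sketch. This is an assumption about the general pattern, not the library's code: to_table, classification_table and functional_table are hypothetical stand-ins that return plain dictionaries instead of data frames.

```python
def classification_table(rank="auto", top_n=None):
    """Hypothetical stand-in for the classification-specific method."""
    return {"kind": "classification", "rank": rank, "top_n": top_n}


def functional_table(annotation="pathways", metric="coverage"):
    """Hypothetical stand-in for the functional-specific method."""
    return {"kind": "functional", "annotation": annotation, "metric": metric}


HANDLERS = {
    "classification": classification_table,
    "functional": functional_table,
}


def to_table(analysis_type="classification", **kwargs):
    # Select the handler for the analysis type and forward kwargs unchanged.
    try:
        handler = HANDLERS[analysis_type]
    except KeyError:
        raise ValueError(f"unsupported analysis_type: {analysis_type!r}") from None
    return handler(**kwargs)


genus_table = to_table("classification", rank="genus", top_n=10)
pathway_table = to_table("functional", metric="abundance")
```

In this sketch, a keyword that belongs to the other analysis type (for example metric with 'classification') fails with a TypeError, which is why the kwargs must match the chosen analysis_type.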

_to_classification_df

SampleCollection._to_classification_df(rank=Rank.Auto, top_n=None, threshold=None, remove_zeros=True, normalize='auto', table_format='wide', include_taxa_missing_rank=False, fill_missing=True, filler=0)

Generate a ClassificationsDataFrame, performing any specified transformations.

Takes the ClassificationsDataFrame associated with these samples (or with this SampleCollection), applies the requested filtering, and returns the result as a copy.

Parameters

rank : {'auto', 'superkingdom', 'kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species'}, optional

Analysis will be restricted to abundances of taxa at the specified level.

top_n : integer, optional

Return only the top N most abundant taxa.

threshold : float, optional

Return only taxa more abundant than this threshold in one or more samples.

remove_zeros : bool, optional

Do not return taxa that have zero abundance in every sample.

normalize : {'auto', True, False}

Convert read counts to relative abundances (each sample sums to 1.0).

table_format : {'long', 'wide'}

If wide, rows are classifications, columns are taxa, and elements are counts. If long, each row is a single observation with three columns: classification_id, tax_id, and count.

include_taxa_missing_rank : bool, optional

Whether or not to include taxa that do not have a designated parent at rank (will be grouped into a “No <rank>” column).

fill_missing : bool, optional

Fill np.nan values

filler : float, optional

Value with which to fill np.nans

Returns

ClassificationsDataFrame
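Several of the transformations listed above (remove_zeros, normalize, threshold, and the wide and long table formats) can be illustrated on a toy table in plain Python. This is a sketch under stated assumptions, not the library's implementation: the helper functions are hypothetical, the IDs and counts are invented, and the order in which the real method applies these steps is not stated here.

```python
# Hypothetical read counts in "wide" form: one row per classification,
# one column per taxon. IDs and counts are invented for illustration.
wide_counts = {
    "classification_1": {"tax_a": 80, "tax_b": 15, "tax_c": 5, "tax_d": 0},
    "classification_2": {"tax_a": 30, "tax_b": 60, "tax_c": 10, "tax_d": 0},
}


def remove_zeros(wide):
    """Drop taxa whose value is zero in every row."""
    keep = {t for row in wide.values() for t, v in row.items() if v != 0}
    return {rid: {t: v for t, v in row.items() if t in keep} for rid, row in wide.items()}


def normalize(wide):
    """Convert counts to relative abundances so each row sums to 1.0."""
    result = {}
    for rid, row in wide.items():
        total = sum(row.values())
        result[rid] = {t: (v / total if total else 0.0) for t, v in row.items()}
    return result


def apply_threshold(wide, threshold):
    """Keep taxa whose value exceeds threshold in at least one row."""
    keep = {t for row in wide.values() for t, v in row.items() if v > threshold}
    return {rid: {t: v for t, v in row.items() if t in keep} for rid, row in wide.items()}


def to_long(wide):
    """Reshape to one observation per row: (classification_id, tax_id, count)."""
    return [(rid, t, v) for rid, row in wide.items() for t, v in row.items()]


relative = normalize(remove_zeros(wide_counts))
abundant = apply_threshold(relative, 0.5)
long_rows = to_long(abundant)
```

Here tax_d is dropped because it is zero everywhere, and tax_c is dropped by the threshold because it never exceeds 0.5 in any row; tax_a and tax_b each exceed it in one row and are kept for both.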

_to_functional_df

SampleCollection._to_functional_df(annotation=FunctionalAnnotations.Pathways, taxa_stratified=True, metric=FunctionalAnnotationsMetric.Coverage, fill_missing=False, filler=0)

Generate a FunctionalDataFrame associated with functional analysis results.

Parameters

annotation : {onecodex.lib.enum.FunctionalAnnotations, str}, optional

Annotation data to return, defaults to pathways

taxa_stratified : bool, optional

Return taxonomically stratified data, defaults to True

metric : {onecodex.lib.enum.FunctionalAnnotationsMetric, str}, optional

Metric values to return: {'coverage', 'abundance'} when annotation is FunctionalAnnotations.Pathways, or {'rpk', 'cpm'} for other annotations. Defaults to coverage.

fill_missing : bool, optional

Fill np.nan values

filler : float, optional

Value with which to fill np.nans
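The pairing between annotation and metric described above can be checked before making a call. The helper below is a hypothetical convenience, not part of the library. It assumes that the string form of FunctionalAnnotations.Pathways is 'pathways', and it uses 'some_other_annotation' purely as a placeholder for any non-pathway annotation.

```python
# Valid metrics per annotation family, as listed in the parameter description.
PATHWAY_METRICS = {"coverage", "abundance"}
OTHER_METRICS = {"rpk", "cpm"}


def metric_is_valid(annotation, metric):
    """Return True if metric is listed as valid for the given annotation.

    Assumes annotation and metric are given as lowercase strings and that
    'pathways' is the string form of the pathways annotation.
    """
    allowed = PATHWAY_METRICS if annotation == "pathways" else OTHER_METRICS
    return metric in allowed


pathways_ok = metric_is_valid("pathways", "coverage")
pathways_bad = metric_is_valid("pathways", "cpm")
other_ok = metric_is_valid("some_other_annotation", "rpk")
```

How the real method reacts to an unsupported combination is not documented here, so a check like this is only a guard on the caller's side.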