SampleCollection
Data export, analysis and visualization functions are all contained within the SampleCollection class.
A SampleCollection is returned whenever a query made through a One Codex API model returns multiple Samples.
Usage
SampleCollection contains useful tools for data export, analysis, visualization and statistics; the sections below describe them. For example, a query for the samples in a project returns a SampleCollection:
import onecodex
ocx = onecodex.Api()
project = ocx.Project.get("d53ad03b010542e3")
samples = ocx.Samples.where(project=project)
type(samples) # SampleCollection
A SampleCollection can also be created manually from a list of samples:
from onecodex.models.collection import SampleCollection
sample_list = [
    ocx.Samples.get("cee3b512605a43c6"),
    ocx.Samples.get("01f703ac505e4a30"),
]
samples = SampleCollection(sample_list)
# convert classification results to a Pandas DataFrame
samples.to_df()
filter
- SampleCollection.filter(filter_func)
Return a new SampleCollection containing only samples meeting the filter criteria.
Any kwargs (e.g., metric or skip_missing) used when instantiating the current collection are passed on to the new SampleCollection that is returned.
Parameters
- filter_func : callable
A function that will be evaluated on every object in the collection. The function must return a bool. If True, the object will be kept. If False, it will be removed from the SampleCollection that is returned.
Returns
A onecodex.models.SampleCollection containing only the objects for which filter_func returned True.
Examples
Generate a new collection of Samples that have a specific filename extension:
>>> new_collection = samples.filter(lambda s: s.filename.endswith('.fastq.gz'))
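The filtering behaviour described above can be pictured with a small stand-in class. This is only a sketch: FakeSample and TinyCollection are hypothetical names that are not part of the onecodex package, and raising an error for a non-bool return value is an assumption based on the requirement that filter_func must return a bool.

```python
from collections import namedtuple

# FakeSample and TinyCollection are hypothetical stand-ins for illustration;
# they are not part of the onecodex package.
FakeSample = namedtuple("FakeSample", ["filename"])


class TinyCollection:
    def __init__(self, items, **kwargs):
        self._items = list(items)
        self._kwargs = kwargs  # e.g. metric or skip_missing

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def filter(self, filter_func):
        kept = []
        for obj in self:
            verdict = filter_func(obj)
            if verdict is True:
                kept.append(obj)
            elif verdict is not False:
                # Assumption: a non-bool return value is treated as an error.
                raise TypeError("filter_func must return a bool")
        # Pass the original kwargs on so the new collection is configured alike.
        return TinyCollection(kept, **self._kwargs)


samples_demo = TinyCollection(
    [FakeSample("a.fastq.gz"), FakeSample("b.fasta")], metric="readcount"
)
fastq_only = samples_demo.filter(lambda s: s.filename.endswith(".fastq.gz"))
```

The original collection is left untouched; fastq_only is a new object carrying the same kwargs.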
to_otu
- SampleCollection.to_otu(biom_id=None, include_ranks=('superkingdom', 'kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species'))
Generate a BIOM-formatted data structure.
Parameters
- biom_id : string, optional
Optionally specify an id field for the generated v1 BIOM file.
- include_ranks : list
A list of ranks to include in the taxonomy/OTU table. Uses onecodex.models.collection.CANONICAL_RANKS by default.
Returns
- otu_table : OrderedDict
A BIOM OTU table, returned as a Python OrderedDict (can be dumped to JSON).
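Because the return value is an OrderedDict, it can be serialized with the standard json module. The sketch below uses a hand-built placeholder in place of a real to_otu() result; its keys and values are illustrative only, and otu_to_json is a hypothetical helper name.

```python
import json
from collections import OrderedDict

# Placeholder standing in for the value returned by samples.to_otu(); the
# keys and values below are illustrative, not real output.
otu_table = OrderedDict(
    [("id", "example_biom_id"), ("rows", []), ("columns", []), ("data", [])]
)


def otu_to_json(table):
    """Serialize an OTU table mapping to a JSON string, keeping key order."""
    return json.dumps(table, indent=2)


biom_json = otu_to_json(otu_table)
```

To write a file instead, json.dump(table, handle) can be used with an open file handle.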
to_df
- SampleCollection.to_df(analysis_type=AnalysisType.Classification, **kwargs)
Transform Analyses of samples in a SampleCollection into tabular format.
Parameters
- analysis_type : {‘classification’, ‘functional’}, optional
The analysis_type to aggregate, corresponding to AnalysisJob.analysis_type.
- kwargs : dict, optional
Keyword arguments specific to the analysis_type; see each individual function definition.
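One way to picture how analysis_type and kwargs interact is a simple dispatch: the analysis type selects a handler and the remaining keyword arguments are forwarded to it unchanged. The sketch below is not the library's implementation; the handler functions are hypothetical stand-ins that merely echo what they receive.

```python
def to_classification_df(**kwargs):
    # Hypothetical stand-in: echoes its arguments instead of building a table.
    return ("classification", kwargs)


def to_functional_df(**kwargs):
    # Hypothetical stand-in: echoes its arguments instead of building a table.
    return ("functional", kwargs)


_HANDLERS = {
    "classification": to_classification_df,
    "functional": to_functional_df,
}


def to_df(analysis_type="classification", **kwargs):
    """Forward kwargs to the handler registered for analysis_type."""
    try:
        handler = _HANDLERS[analysis_type]
    except KeyError:
        raise ValueError(f"unsupported analysis_type: {analysis_type!r}") from None
    return handler(**kwargs)


default_call = to_df(rank="genus")
functional_call = to_df("functional", metric="coverage")
```

Under this picture, a keyword such as rank only makes sense for the classification handler, which is why the accepted kwargs depend on analysis_type.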
_to_classification_df
- SampleCollection._to_classification_df(rank=Rank.Auto, top_n=None, threshold=None, remove_zeros=True, normalize='auto', table_format='wide', include_taxa_missing_rank=False, fill_missing=True, filler=0)
Generate a ClassificationsDataFrame, performing any specified transformations.
Takes the ClassificationsDataFrame associated with these samples (or this SampleCollection), applies the requested filtering, and returns the result as a ClassificationsDataFrame copy.
Parameters
- rank : {‘auto’, ‘superkingdom’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
- top_n : integer, optional
Return only the top N most abundant taxa.
- threshold : float, optional
Return only taxa more abundant than this threshold in one or more samples.
- remove_zeros : bool, optional
Do not return taxa that have zero abundance in every sample.
- normalize : {‘auto’, True, False}
Convert read counts to relative abundances (each sample sums to 1.0).
- table_format : {‘long’, ‘wide’}
If wide, rows are classifications, cols are taxa, elements are counts. If long, rows are observations with three cols each: classification_id, tax_id, and count.
- include_taxa_missing_rank : bool, optional
Whether or not to include taxa that do not have a designated parent at rank (will be grouped into a “No <rank>” column).
- fill_missing : bool, optional
Fill np.nan values.
- filler : float, optional
Value with which to fill np.nans.
Returns
ClassificationsDataFrame
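The remove_zeros, normalize and table_format options describe simple table transformations. The following sketch reproduces them on plain dictionaries rather than a ClassificationsDataFrame; the function names, classification IDs and tax IDs are made up for illustration, and the library's own implementation may differ.

```python
def remove_zeros(wide):
    """Drop taxa whose value is zero in every row."""
    nonzero = {t for row in wide.values() for t, v in row.items() if v != 0}
    return {
        cid: {t: v for t, v in row.items() if t in nonzero}
        for cid, row in wide.items()
    }


def normalize(wide):
    """Scale each row so its values sum to 1.0 (all-zero rows stay at 0.0)."""
    out = {}
    for cid, row in wide.items():
        total = sum(row.values())
        out[cid] = {t: (v / total if total else 0.0) for t, v in row.items()}
    return out


def to_long(wide):
    """Flatten a wide table into (classification_id, tax_id, value) rows."""
    return [(cid, t, v) for cid, row in wide.items() for t, v in row.items()]


# Wide format: one row per classification, one column per taxon (made-up data).
wide_counts = {
    "c1": {"562": 3, "1280": 1, "9606": 0},
    "c2": {"562": 0, "1280": 4, "9606": 0},
}
relative = normalize(remove_zeros(wide_counts))
long_rows = to_long(relative)
```

Here taxon "9606" is dropped because it is zero in both rows, and each remaining row sums to 1.0 after normalization.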
_to_functional_df
- SampleCollection._to_functional_df(annotation=FunctionalAnnotations.Pathways, taxa_stratified=True, metric=FunctionalAnnotationsMetric.Coverage, fill_missing=False, filler=0)
Generate a FunctionalDataFrame associated with functional analysis results.
Parameters
- annotation : {onecodex.lib.enum.FunctionalAnnotations, str}, optional
Annotation data to return, defaults to pathways.
- taxa_stratified : bool, optional
Return taxonomically stratified data, defaults to True.
- metric : {onecodex.lib.enum.FunctionalAnnotationsMetric, str}, optional
Metric values to return: {‘coverage’, ‘abundance’} when annotation == FunctionalAnnotations.Pathways, or {‘rpk’, ‘cpm’} for other annotations. Defaults to coverage.
- fill_missing : bool, optional
Fill np.nan values.
- filler : float, optional
Value with which to fill np.nans.
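The effect of fill_missing and filler can be sketched on a plain nested list: every NaN entry is replaced by the filler value and all other entries are left alone. The fill_missing function below is a hypothetical illustration, not the library's code.

```python
import math


def fill_missing(rows, filler=0):
    """Return a copy of rows with every NaN replaced by filler."""
    return [
        [filler if isinstance(v, float) and math.isnan(v) else v for v in row]
        for row in rows
    ]


table = [[1.5, float("nan")], [float("nan"), 2.0]]
filled = fill_missing(table)
```

With the default filler of 0, missing values become zeros; passing a different filler substitutes that value instead.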