Statistics

See Also

Statistics functions are implemented as part of SampleCollection. For more information, see SampleCollection.

aitchison_distance

SampleCollection.aitchison_distance(rank=Rank.Auto)

Calculate the Aitchison distance between samples.

Aitchison distance is the Euclidean distance between centered log-ratio (CLR) transformed samples (abundances). Because the log-transform is undefined at zero, zeros in the data must first be ‘estimated’, i.e. replaced with small positive values while each sample's abundances still sum to 1.
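The computation described above can be sketched in pure Python. This is a minimal illustration, not the library's implementation: the zero-replacement strategy (a simple multiplicative replacement with a fixed, hypothetical `delta`) and all function names are assumptions.

```python
import math


def replace_zeros(abundances, delta=1e-6):
    """Multiplicative zero replacement (assumed strategy): zeros become
    `delta`, and non-zero parts shrink so the composition still sums to 1."""
    total = sum(abundances)
    parts = [a / total for a in abundances]  # close the composition to sum 1
    n_zeros = sum(1 for p in parts if p == 0)
    return [delta if p == 0 else p * (1 - n_zeros * delta) for p in parts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean log."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(a, b, delta=1e-6):
    """Euclidean distance between the CLR transforms of two samples."""
    ca = clr(replace_zeros(a, delta))
    cb = clr(replace_zeros(b, delta))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))
```

Because the samples are closed to a sum of 1 before the CLR, the distance ignores sequencing depth: `[1, 2, 3]` and `[2, 4, 6]` are at distance 0.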

Parameters

rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

Returns

skbio.stats.distance.DistanceMatrix, a distance matrix.

alpha_diversity

SampleCollection.alpha_diversity(metric=AlphaDiversityMetric.Shannon, rank=Rank.Auto)

Calculate the alpha diversity (the diversity within a community) of each sample.

Parameters

metric : {‘simpson’, ‘observed_taxa’, ‘shannon’}, optional

The diversity metric to calculate.

rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

Returns

pandas.DataFrame, the alpha diversity value of each sample.
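The three metrics can be sketched for a single sample's counts as follows. The log base and the Simpson variant used by the library are not stated here, so this sketch assumes the natural log for Shannon (configurable through a hypothetical `base` argument) and the Gini-Simpson form 1 - sum(p_i^2); the function names are illustrative only.

```python
import math


def shannon(counts, base=math.e):
    """Shannon entropy H = -sum(p_i * log(p_i)) over taxa with p_i > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index (assumed variant): 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def observed_taxa(counts):
    """Number of taxa with non-zero abundance."""
    return sum(1 for c in counts if c > 0)
```

For example, four equally abundant taxa give a Shannon value of log(4), or exactly 2 bits with `base=2`.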

alpha_diversity_stats

SampleCollection.alpha_diversity_stats(*, group_by: str | tuple[str, ...] | list[str], paired_by: str | tuple[str, ...] | list[str] | None = None, test: AlphaDiversityStatsTest = AlphaDiversityStatsTest.Auto, metric: AlphaDiversityMetric = AlphaDiversityMetric.Shannon, rank: Rank = Rank.Auto, alpha: float = 0.05) → AlphaDiversityStatsResults

Perform a test for significant differences between groups of alpha diversity values.

The following tests are supported:

  • Wilcoxon (2 groups, paired data)
  • Mann-Whitney U (2 groups, unpaired data)
  • Kruskal-Wallis with optional posthoc Dunn test (>=2 groups, unpaired data)

Parameters

group_by : str or tuple of str or list of str

Metadata variable to group samples by. At least two groups are required. If group_by is a tuple or list, field values are joined with an underscore character (“_”).

paired_by : str or tuple of str or list of str, optional

Metadata variable to pair samples in each group. May only be used with test="wilcoxon". If paired_by is a tuple or list, field values are joined with an underscore character (“_”).

test : {‘auto’, ‘wilcoxon’, ‘mannwhitneyu’, ‘kruskal’}, optional

Statistical test to perform. If ‘auto’, ‘mannwhitneyu’ is chosen for two groups of unpaired data, ‘wilcoxon’ is chosen for two groups when paired_by is specified, and ‘kruskal’ is chosen for more than two groups.

metric : {‘shannon’, ‘simpson’, ‘observed_taxa’}, optional

The alpha diversity metric to calculate.

rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

alpha : float, optional

Threshold for statistical significance (i.e. p < alpha) when test="kruskal". Must be between 0 and 1 (exclusive). If the Kruskal-Wallis p-value is significant and there are more than two groups, a posthoc Dunn test is performed.
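The selection rules for test=‘auto’ and the posthoc condition described above can be summarized as plain logic. The helpers `choose_test` and `run_posthoc` are hypothetical and only encode the documented rules; they are not the library's code, and other validation the library may perform is not modeled.

```python
def choose_test(n_groups, paired, test="auto"):
    """Pick a test following the documented rules for test='auto'."""
    if n_groups < 2:
        raise ValueError("At least two groups are required")
    if paired:
        # paired_by is only valid with the (two-group) Wilcoxon test.
        if n_groups != 2 or test not in ("auto", "wilcoxon"):
            raise ValueError("paired_by may only be used with test='wilcoxon'")
        return "wilcoxon"
    if test != "auto":
        return test  # an explicit choice is passed through unchanged
    return "mannwhitneyu" if n_groups == 2 else "kruskal"


def run_posthoc(test, n_groups, pvalue, alpha=0.05):
    """A Dunn test follows a significant Kruskal-Wallis result with >2 groups."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1 (exclusive)")
    return test == "kruskal" and n_groups > 2 and pvalue < alpha
```

For example, `choose_test(3, paired=False)` returns "kruskal", and a Kruskal-Wallis p-value of 0.01 across three groups triggers the posthoc test at the default alpha of 0.05.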

Returns

AlphaDiversityStatsResults

A dataclass with these attributes:

  • test: the stats test that was performed
  • statistic: the computed test statistic (e.g. U statistic if test="mannwhitneyu")
  • pvalue: the computed p-value
  • group_by_variable: the name of the variable used to group samples by
  • groups: the names of the groups defined by group_by_variable
  • paired_by_variable: the name of the variable used to pair samples by (if the data were paired)
  • posthoc_df: pd.DataFrame containing Dunn test p-values (if a Dunn test was performed). The p-values are adjusted for false discovery rate using Benjamini-Hochberg. The index and columns are sorted group names.
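The Benjamini-Hochberg adjustment mentioned for posthoc_df works on p-values sorted in ascending order: the adjusted value at rank k is the smallest p_(j) * m / j over all ranks j >= k, capped at 1. The sketch below illustrates only that adjustment; how the library applies it to the Dunn test results is not shown, and `benjamini_hochberg` is a hypothetical name.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the raw p-values [0.01, 0.04, 0.03, 0.005], this yields approximately [0.02, 0.04, 0.04, 0.02].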

See Also

scipy.stats.wilcoxon, scipy.stats.mannwhitneyu, scipy.stats.kruskal, scikit_posthocs.posthoc_dunn

beta_diversity

SampleCollection.beta_diversity(metric=BetaDiversityMetric.BrayCurtis, rank=Rank.Auto)

Calculate the diversity between each pair of samples (communities).

Parameters

metric : {‘jaccard’, ‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘aitchison’}, optional

The distance metric to calculate. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.

rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

Returns

skbio.stats.distance.DistanceMatrix, a distance matrix.
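Several of these metrics reduce to short formulas on two abundance vectors, as sketched below. The helpers are hypothetical; the Jaccard variant shown is the presence/absence form, which is an assumption about how abundances are treated; and the library returns an skbio DistanceMatrix rather than the nested list built here.

```python
def braycurtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def cityblock(u, v):
    """City block (Manhattan) distance: sum|u_i - v_i|."""
    return sum(abs(a - b) for a, b in zip(u, v))


def jaccard(u, v):
    """Presence/absence Jaccard distance (assumed): 1 - |A & B| / |A | B|."""
    a = {i for i, x in enumerate(u) if x > 0}
    b = {i for i, x in enumerate(v) if x > 0}
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def distance_matrix(samples, metric):
    """Symmetric matrix of pairwise distances between all samples."""
    n = len(samples)
    return [[metric(samples[i], samples[j]) for j in range(n)] for i in range(n)]
```

The city block formula is the same quantity that ‘manhattan’ names, which is why the two parameter values are equivalent.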

unifrac

SampleCollection.unifrac(weighted=True, rank=Rank.Auto)

Calculate the UniFrac beta diversity metric.

UniFrac takes into account the phylogenetic relatedness of community members. Weighted UniFrac considers taxon abundances, while unweighted UniFrac considers only presence or absence.
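The difference between the two variants can be sketched on a toy tree, where each branch is stored with its length and the set of taxa that descend from it. The tree, the sample format (a dict of taxon to count), and the function names are hypothetical. The weighted form shown is the unnormalized one; whether the library normalizes the weighted distance is not stated here.

```python
# Toy tree (hypothetical): root -> clade (A, B) [0.5] -> A [0.2], B [0.3];
# root -> C [1.0]. Each entry is (branch length, taxa below that branch).
BRANCHES = [
    (0.2, {"A"}),
    (0.3, {"B"}),
    (1.0, {"C"}),
    (0.5, {"A", "B"}),
]


def unweighted_unifrac(s1, s2, branches=BRANCHES):
    """Branch length unique to one sample / branch length seen in either."""
    unique = observed = 0.0
    for length, taxa in branches:
        in1 = any(s1.get(t, 0) > 0 for t in taxa)
        in2 = any(s2.get(t, 0) > 0 for t in taxa)
        if in1 or in2:
            observed += length
            if in1 != in2:
                unique += length
    return unique / observed


def weighted_unifrac(s1, s2, branches=BRANCHES):
    """Unnormalized weighted UniFrac: sum of b_i * |p1_i - p2_i|, where p is
    the fraction of each sample's counts descending from branch i."""
    t1, t2 = sum(s1.values()), sum(s2.values())
    total = 0.0
    for length, taxa in branches:
        p1 = sum(s1.get(t, 0) for t in taxa) / t1
        p2 = sum(s2.get(t, 0) for t in taxa) / t2
        total += length * abs(p1 - p2)
    return total
```

Samples with no taxa in common on any branch, such as {"A": 1} and {"C": 1}, reach the maximum unweighted distance of 1.0, while the weighted variant also reflects how the counts are distributed along the tree.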

Parameters

weighted : bool, optional

Calculate the weighted (True) or unweighted (False) distance metric.

rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional

Analysis will be restricted to abundances of taxa at the specified level.

Returns

skbio.stats.distance.DistanceMatrix, a distance matrix.