Statistics

`aitchison_distance`

SampleCollection.aitchison_distance(rank=Rank.Auto)

Calculate the Aitchison distance between samples.

Aitchison distance is the Euclidean distance between samples (abundances) after a centered log-ratio (CLR) transform. Because the CLR transform takes logarithms, zeros in the data must first be ‘estimated’, i.e. replaced with small positive values while each sample's abundances continue to sum to 1.
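As a rough illustration of these steps, the sketch below closes each sample to proportions, replaces zeros multiplicatively with a small constant `delta`, applies the CLR transform, and takes the Euclidean distance. The multiplicative zero-replacement strategy and the value of `delta` are assumptions for illustration; the library's own choice is not stated here. The function names are hypothetical.

```python
import math


def replace_zeros(abundances, delta=1e-6):
    """Multiplicative zero replacement (assumed strategy, not necessarily the library's).

    Zeros become `delta`; non-zero proportions are shrunk so the sample still sums to 1.
    """
    total = sum(abundances)
    parts = [a / total for a in abundances]  # close the composition so it sums to 1
    num_zeros = sum(1 for p in parts if p == 0)
    return [delta if p == 0 else p * (1 - num_zeros * delta) for p in parts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison(a, b):
    """Euclidean distance between the CLR-transformed, zero-replaced samples."""
    ca, cb = clr(replace_zeros(a)), clr(replace_zeros(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


print(aitchison([10, 0, 5], [2, 3, 5]))
```

Because each sample is closed to proportions first, samples that differ only by a constant factor (e.g. `[1, 2, 3]` and `[2, 4, 6]`) are at distance 0.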

Parameters

rank ({‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional): Analysis will be restricted to abundances of taxa at the specified level.

Returns

skbio.stats.distance.DistanceMatrix, a distance matrix.

`alpha_diversity`

SampleCollection.alpha_diversity(metric=AlphaDiversityMetric.Shannon, rank=Rank.Auto)

Calculate the alpha diversity (the diversity within a community) of each sample.

Parameters

metric ({‘simpson’, ‘observed_taxa’, ‘shannon’}, optional): The diversity metric to calculate. Note that Shannon diversity is calculated using log base e (natural log).
rank ({‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional): Analysis will be restricted to abundances of taxa at the specified level.

Returns

pandas.DataFrame containing one alpha diversity value per sample.
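For reference, the three metrics can be computed from a single sample's abundances roughly as follows. This is a sketch, not the library's implementation: in particular, ‘simpson’ is assumed here to be the Gini-Simpson form 1 - sum(p^2), as in scikit-bio's `simpson`, and the function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p), skipping taxa with zero abundance."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p^2) (assumed definition)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def observed_taxa(counts):
    """Number of taxa with non-zero abundance."""
    return sum(1 for c in counts if c > 0)


sample = [40, 30, 20, 10, 0]
print(shannon(sample), simpson(sample), observed_taxa(sample))
```

Using the natural log means a sample with N equally abundant taxa has Shannon diversity ln(N).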

`alpha_diversity_stats`

SampleCollection.alpha_diversity_stats(*, group_by: str | tuple[str, ...] | list[str], paired_by: str | tuple[str, ...] | list[str] | None = None, test: AlphaDiversityStatsTest = AlphaDiversityStatsTest.Auto, metric: AlphaDiversityMetric = AlphaDiversityMetric.Shannon, rank: Rank = Rank.Auto, alpha: float = 0.05) → AlphaDiversityStatsResults

Perform a test for significant differences between groups of alpha diversity values.

The following tests are supported:

Wilcoxon (2 groups, paired data)
Mann-Whitney U (2 groups, unpaired data)
Kruskal-Wallis with optional posthoc Dunn test (>=2 groups, unpaired data)
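The choice between these tests when test=‘auto’ (described under the test parameter), and the condition for running the posthoc Dunn test, can be summarized as a small decision function. This is an illustrative sketch of the documented rules only; `choose_test` and `should_run_dunn` are hypothetical helpers, and raising an error for paired data with more than two groups is an assumption.

```python
def choose_test(num_groups, paired=False):
    """Pick a test following the documented 'auto' rules."""
    if num_groups < 2:
        raise ValueError("At least two groups are required")
    if paired:
        # Assumption: paired data with more than two groups is treated as an error.
        if num_groups != 2:
            raise ValueError("paired_by may only be used with the Wilcoxon test (two groups)")
        return "wilcoxon"
    return "mannwhitneyu" if num_groups == 2 else "kruskal"


def should_run_dunn(test, pvalue, num_groups, alpha=0.05):
    """Posthoc Dunn test: only after a significant Kruskal-Wallis test with more than two groups."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1 (exclusive)")
    return test == "kruskal" and num_groups > 2 and pvalue < alpha


print(choose_test(3), should_run_dunn("kruskal", 0.01, 3))
```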

Parameters

group_by (str or tuple of str or list of str): Metadata variable to group samples by. At least two groups are required. If group_by is a tuple or list, field values are joined with an underscore character (“_”).
paired_by (str or tuple of str or list of str, optional): Metadata variable to pair samples in each group. May only be used with test="wilcoxon". If paired_by is a tuple or list, field values are joined with an underscore character (“_”).
test ({‘auto’, ‘wilcoxon’, ‘mannwhitneyu’, ‘kruskal’}, optional): Statistical test to perform. If ‘auto’, ‘mannwhitneyu’ is chosen for two groups of unpaired data, ‘wilcoxon’ is chosen for two groups when paired_by is specified, and ‘kruskal’ is chosen for more than two groups.
metric ({‘shannon’, ‘simpson’, ‘observed_taxa’}, optional): The alpha diversity metric to calculate. Note that Shannon diversity is calculated using log base e (natural log).
rank ({‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional): Analysis will be restricted to abundances of taxa at the specified level.
alpha (float, optional): Threshold for statistical significance when test="kruskal" (i.e. a result is significant if p < alpha). Must be between 0 and 1 (exclusive). If the Kruskal-Wallis p-value is significant and there are more than two groups, a posthoc Dunn test is performed.

Returns

AlphaDiversityStatsResults

A dataclass with these attributes:

- test: statistical test that was performed
- statistic: computed test statistic (e.g. the U statistic if test="mannwhitneyu")
- pvalue: computed p-value
- sample_size: number of samples used in the test after filtering
- group_by_variable: name of the variable used to group samples by
- groups: names of the groups defined by group_by_variable
- paired_by_variable: name of the variable used to pair samples by (if the data were paired)
- posthoc: a dataclass with this attribute (if posthoc results were computed):
  - adjusted_pvalues: pd.DataFrame containing Dunn test adjusted p-values. p-values are adjusted for false discovery rate using Benjamini-Hochberg. The index and columns are sorted group names.
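The Benjamini-Hochberg adjustment can be sketched in pure Python as below: each p-value is scaled by m / rank, then a running minimum taken from the largest p-value downward keeps the adjusted values monotone and at most 1. This is a generic illustration of the procedure; how the library arranges the pairwise p-values before adjusting them is not stated here, and `benjamini_hochberg` is a hypothetical name.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03]))
```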

`beta_diversity`

SampleCollection.beta_diversity(metric=BetaDiversityMetric.BrayCurtis, rank=Rank.Auto)

Calculate the diversity (distance) between every pair of samples.

Parameters

metric ({‘jaccard’, ‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘aitchison’}, optional): The distance metric to calculate. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.
rank ({‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional): Analysis will be restricted to abundances of taxa at the specified level.

Returns

skbio.stats.distance.DistanceMatrix, a distance matrix.
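The non-compositional metrics can be illustrated on plain abundance vectors (one entry per taxon, in the same taxon order for every sample). This is a sketch under assumptions: ‘jaccard’ is computed here on presence/absence, as SciPy's `jaccard` does when it treats non-zero values as present, and the result is a nested list rather than a `DistanceMatrix`. Function names are hypothetical.

```python
def braycurtis(u, v):
    """Bray-Curtis: sum |u_i - v_i| / sum (u_i + v_i)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def cityblock(u, v):
    """City block (Manhattan) distance: sum |u_i - v_i|."""
    return sum(abs(a - b) for a, b in zip(u, v))


def jaccard(u, v):
    """Jaccard distance on presence/absence: 1 - shared taxa / taxa present in either."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    return 1 - len(present_u & present_v) / len(union) if union else 0.0


def distance_matrix(samples, metric):
    """Square matrix of metric(samples[i], samples[j]) for every pair of samples."""
    return [[metric(s, t) for t in samples] for s in samples]


samples = [[10, 0, 5], [2, 3, 5], [0, 8, 1]]
for row in distance_matrix(samples, braycurtis):
    print(row)
```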

`beta_diversity_stats`

SampleCollection.beta_diversity_stats(*, group_by: str | tuple[str, ...] | list[str], metric: BetaDiversityMetric = BetaDiversityMetric.BrayCurtis, rank: Rank = Rank.Auto, alpha: float = 0.05, num_permutations: int = 999) → BetaDiversityStatsResults

Test for significant differences between groups of samples based on their distances.

Beta diversity distances between samples are computed and a PERMANOVA test is performed to assess whether there are significant differences between groups of samples. Posthoc pairwise PERMANOVA tests are performed if the global test is found to be statistically significant and there are more than two groups.
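The PERMANOVA step can be sketched as follows: the pseudo-F compares between-group and within-group sums of squared distances, and the p-value is the fraction of label permutations (plus the observed labelling) whose pseudo-F is at least as large as the observed one. The (hits + 1) / (num_permutations + 1) convention matches scikit-bio's `permanova` but is assumed here rather than confirmed for this library; `pseudo_f` and `permanova` are hypothetical names.

```python
import random


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a square distance matrix and one group label per sample."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:])
        ss_within += pairs / len(members)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, num_permutations=999, seed=None):
    """Return (pseudo-F, permutation p-value) by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # Count the observed labelling itself, so the p-value is never exactly 0.
    return observed, (hits + 1) / (num_permutations + 1)


# Four samples on a line: two tight pairs far apart.
positions = [0, 1, 10, 11]
dist = [[abs(x - y) for y in positions] for x in positions]
print(permanova(dist, ["A", "A", "B", "B"], num_permutations=999, seed=0))
```

With only four samples there are just three distinct two-by-two groupings, so even a perfect separation cannot reach a very small p-value; larger groups are needed for a permutation test to have power.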

Parameters

group_by (str or tuple of str or list of str): Metadata variable to group samples by. At least two groups are required. If group_by is a tuple or list, field values are joined with an underscore character (“_”).
metric ({‘braycurtis’, ‘jaccard’, ‘cityblock’, ‘manhattan’, ‘aitchison’, ‘unweighted_unifrac’, ‘weighted_unifrac’}, optional): The beta diversity distance metric to calculate. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.
rank ({‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional): Analysis will be restricted to abundances of taxa at the specified level.
alpha (float, optional): Threshold for statistical significance (i.e. a result is significant if p < alpha). Must be between 0 and 1 (exclusive). If the p-value is significant and there are more than two groups, posthoc pairwise PERMANOVA tests are performed.
num_permutations (int, optional): Number of permutations to use when computing the p-value.

Returns

BetaDiversityStatsResults

A dataclass with these attributes:

- test: statistical test that was performed
- statistic: PERMANOVA pseudo-F test statistic
- pvalue: p-value based on num_permutations
- num_permutations: number of permutations used to compute pvalue
- sample_size: number of samples used in the test after filtering
- group_by_variable: name of the variable used to group samples by
- groups: names of the groups defined by group_by_variable
- posthoc: a dataclass with these attributes (if posthoc results were computed):
  - statistics: pd.DataFrame containing pairwise PERMANOVA pseudo-F statistics. The index and columns are sorted group names.
  - pvalues: pd.DataFrame containing pairwise PERMANOVA unadjusted p-values. The index and columns are sorted group names.
  - adjusted_pvalues: pd.DataFrame containing pairwise PERMANOVA adjusted p-values. p-values are adjusted for false discovery rate using Benjamini-Hochberg. The index and columns are sorted group names.

`unifrac`

SampleCollection.unifrac(weighted=True, rank=Rank.Auto)

Calculate the UniFrac beta diversity metric.

UniFrac takes into account the relatedness of community members. Weighted UniFrac considers abundances; unweighted UniFrac considers only presence/absence.
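To make the weighted/unweighted distinction concrete, the sketch below represents a tree as a hypothetical list of `(branch_length, set_of_taxa_below)` pairs. Unweighted UniFrac is the branch length observed in only one sample divided by the branch length observed in either; weighted UniFrac sums each branch length times the difference in the fraction of each sample's abundance below it. The unnormalized weighted form is an assumption: whether the library normalizes weighted UniFrac, and which tree it uses, is not stated here.

```python
def unifrac(branches, a, b, weighted=True):
    """UniFrac between samples `a` and `b` (dicts mapping taxon -> abundance).

    `branches` is a list of (branch_length, set_of_taxa_below_that_branch).
    """
    if weighted:
        # Unnormalized weighted UniFrac: branch length times the difference in the
        # fraction of each sample's abundance that descends from the branch.
        total_a, total_b = sum(a.values()), sum(b.values())
        return sum(
            length * abs(sum(a.get(t, 0) for t in below) / total_a
                         - sum(b.get(t, 0) for t in below) / total_b)
            for length, below in branches
        )
    # Unweighted UniFrac: unique branch length / branch length observed in either sample.
    observed = unique = 0.0
    for length, below in branches:
        in_a = any(a.get(t, 0) > 0 for t in below)
        in_b = any(b.get(t, 0) > 0 for t in below)
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    return unique / observed


# Tree ((x:1, y:1):1, z:2): one entry per branch, listing the taxa beneath it.
tree = [(1.0, {"x"}), (1.0, {"y"}), (1.0, {"x", "y"}), (2.0, {"z"})]
print(unifrac(tree, {"x": 3}, {"y": 5}, weighted=False))
print(unifrac(tree, {"x": 3}, {"y": 5}, weighted=True))
```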

Parameters

weighted (bool, optional): Calculate the weighted (True) or unweighted (False) distance metric.
rank ({‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional): Analysis will be restricted to abundances of taxa at the specified level.

Returns

skbio.stats.distance.DistanceMatrix, a distance matrix.
