Statistics
See Also
Statistics functions are implemented as part of SampleCollection. For more information, see SampleCollection.
aitchison_distance
- SampleCollection.aitchison_distance(rank=Rank.Auto)
Calculate the Aitchison distance between samples.
Aitchison distance is the Euclidean distance between centred log-ratio (CLR) transformed samples (abundances). Because this requires log-transforms, zeros in the data must first be ‘estimated’, i.e. replaced with small, positive values while each sample's abundances are kept summing to 1.
Parameters
- rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
Returns
skbio.stats.distance.DistanceMatrix, a distance matrix.
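The calculation described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the zero-replacement scheme (a simple multiplicative replacement) and the `delta` value are assumptions, since the documentation does not say how zeros are estimated, and the function names are hypothetical.

```python
import math


def replace_zeros(abundances, delta=1e-6):
    """Replace zeros with `delta` and rescale non-zeros so the sum stays 1.

    Multiplicative replacement with a fixed `delta` is an assumption made
    for this sketch only.
    """
    total = sum(abundances)
    props = [a / total for a in abundances]
    n_zeros = sum(1 for p in props if p == 0)
    return [delta if p == 0 else p * (1 - n_zeros * delta) for p in props]


def clr(composition):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(sample_a, sample_b, delta=1e-6):
    """Euclidean distance between the CLR-transformed samples."""
    clr_a = clr(replace_zeros(sample_a, delta))
    clr_b = clr(replace_zeros(sample_b, delta))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))
```

Because the CLR transform works on ratios, multiplying every abundance in a sample by a constant leaves the distance unchanged.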
alpha_diversity
- SampleCollection.alpha_diversity(metric=AlphaDiversityMetric.Shannon, rank=Rank.Auto)
Calculate the diversity within a community.
Parameters
- metric : {‘simpson’, ‘observed_taxa’, ‘shannon’}
The diversity metric to calculate.
- rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
Returns
pandas.DataFrame, containing the alpha diversity value for each sample.
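For orientation, the three metrics can be sketched on a single sample's taxon counts. This is an illustrative sketch rather than the library's code: the natural logarithm for Shannon and the "1 minus dominance" form of Simpson are assumptions, and the library may use a different log base or variant.

```python
import math


def shannon(counts):
    """Shannon entropy of the taxon proportions (natural log assumed)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def simpson(counts):
    """Simpson's index as 1 minus the sum of squared proportions (assumed form)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def observed_taxa(counts):
    """Number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)
```

A perfectly even community of four taxa gives a Shannon value of ln(4) and a Simpson value of 0.75 under these definitions.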
alpha_diversity_stats
- SampleCollection.alpha_diversity_stats(*, group_by: str | tuple[str, ...] | list[str], paired_by: str | tuple[str, ...] | list[str] | None = None, test: AlphaDiversityStatsTest = AlphaDiversityStatsTest.Auto, metric: AlphaDiversityMetric = AlphaDiversityMetric.Shannon, rank: Rank = Rank.Auto, alpha: float = 0.05) -> AlphaDiversityStatsResults
Perform a test for significant differences between groups of alpha diversity values.
The following tests are supported:
- Wilcoxon (2 groups, paired data)
- Mann-Whitney U (2 groups, unpaired data)
- Kruskal-Wallis with optional posthoc Dunn test (>=2 groups, unpaired data)
Parameters
- group_by : str or tuple of str or list of str
Metadata variable to group samples by. At least two groups are required. If group_by is a tuple or list, field values are joined with an underscore character (“_”).
- paired_by : str or tuple of str or list of str, optional
Metadata variable to pair samples in each group. May only be used with test="wilcoxon". If paired_by is a tuple or list, field values are joined with an underscore character (“_”).
- test : {‘auto’, ‘wilcoxon’, ‘mannwhitneyu’, ‘kruskal’}, optional
Statistical test to perform. If ‘auto’, the test is chosen as follows: ‘mannwhitneyu’ if there are two groups of unpaired data; ‘wilcoxon’ if there are two groups and paired_by is specified; ‘kruskal’ if there are more than two groups.
- metric : {‘shannon’, ‘simpson’, ‘observed_taxa’}, optional
The alpha diversity metric to calculate.
- rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
- alpha : float, optional
Threshold to determine statistical significance when test="kruskal" (i.e. the result is significant if p < alpha). Must be between 0 and 1 (exclusive). If the Kruskal-Wallis p-value is significant and there are more than two groups, a posthoc Dunn test is performed.
Returns
- AlphaDiversityStatsResults
A dataclass with these attributes:
- test: the stats test that was performed
- statistic: the computed test statistic (e.g. U statistic if test="mannwhitneyu")
- pvalue: the computed p-value
- group_by_variable: the name of the variable used to group samples by
- groups: the names of the groups defined by group_by_variable
- paired_by_variable: the name of the variable used to pair samples by (if the data were paired)
- posthoc_df: pd.DataFrame containing Dunn test p-values (if a Dunn test was performed). p-values are adjusted for false discovery rate using Benjamini-Hochberg. The index and columns are sorted group names.
See Also
scipy.stats.wilcoxon, scipy.stats.mannwhitneyu, scipy.stats.kruskal, scikit_posthocs.posthoc_dunn
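Two pieces of the behaviour described for alpha_diversity_stats can be sketched in plain Python: how test="auto" resolves to a concrete test, and what a Benjamini-Hochberg adjustment does to a list of p-values. Both functions are hypothetical helpers written for illustration, not part of the library. Raising an error for paired data with more than two groups is an assumption based on the note that paired_by may only be used with the Wilcoxon test.

```python
def choose_test(n_groups, paired=False):
    """Resolve test='auto' following the rules in the parameter description."""
    if n_groups < 2:
        raise ValueError("at least two groups are required")
    if n_groups == 2:
        return "wilcoxon" if paired else "mannwhitneyu"
    if paired:
        # Assumption: pairing is only supported by the Wilcoxon test.
        raise ValueError("paired data is only supported for two groups")
    return "kruskal"


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for offset, idx in enumerate(reversed(order)):
        rank = m - offset  # 1-based rank of this p-value
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, `choose_test(2, paired=True)` resolves to "wilcoxon", and `choose_test(4)` resolves to "kruskal".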
beta_diversity
- SampleCollection.beta_diversity(metric=BetaDiversityMetric.BrayCurtis, rank=Rank.Auto)
Calculate the diversity between two communities.
Parameters
- metric : {‘jaccard’, ‘braycurtis’, ‘cityblock’, ‘manhattan’, ‘aitchison’}
The distance metric to calculate. Note that ‘cityblock’ and ‘manhattan’ are equivalent metrics.
- rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
Returns
skbio.stats.distance.DistanceMatrix, a distance matrix.
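To make the metric names concrete, the following sketch computes three of them for a pair of samples given as equal-length abundance vectors. It is an illustration under assumptions, not the library's implementation: whether abundances are normalized before the distance is computed is not stated in this documentation, and the function names are hypothetical.

```python
def cityblock(a, b):
    """Sum of absolute differences between the two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# 'manhattan' is another name for the same metric.
manhattan = cityblock


def braycurtis(a, b):
    """Bray-Curtis dissimilarity: cityblock distance over total abundance."""
    total = sum(x + y for x, y in zip(a, b))
    return cityblock(a, b) / total


def jaccard(a, b):
    """Jaccard distance on presence/absence of each taxon."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)
```

With samples `[1, 0, 3]` and `[1, 2, 1]`, the cityblock distance is 4 and the Bray-Curtis dissimilarity is 0.5.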
unifrac
- SampleCollection.unifrac(weighted=True, rank=Rank.Auto)
Calculate the UniFrac beta diversity metric.
UniFrac takes into account the relatedness of community members. Weighted UniFrac considers abundances; unweighted UniFrac considers only presence or absence.
Parameters
- weighted : bool
Calculate the weighted (True) or unweighted (False) distance metric.
- rank : {‘auto’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’, ‘species’}, optional
Analysis will be restricted to abundances of taxa at the specified level.
Returns
skbio.stats.distance.DistanceMatrix, a distance matrix.
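The difference between the weighted and unweighted variants can be illustrated on a toy tree. The sketch below is not the library's implementation: the tree representation (a list of branches, each a length plus the set of taxa beneath it), the sample dictionaries, and the function names are all assumptions made for illustration, and the weighted variant shown is the unnormalized form.

```python
def unweighted_unifrac(branches, sample_a, sample_b):
    """Fraction of observed branch length that leads to only one sample."""
    unique = 0.0
    observed = 0.0
    for length, taxa in branches:
        in_a = any(sample_a.get(t, 0) > 0 for t in taxa)
        in_b = any(sample_b.get(t, 0) > 0 for t in taxa)
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    return unique / observed


def weighted_unifrac(branches, sample_a, sample_b):
    """Branch lengths weighted by the difference in relative abundance."""
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    distance = 0.0
    for length, taxa in branches:
        frac_a = sum(sample_a.get(t, 0) for t in taxa) / total_a
        frac_b = sum(sample_b.get(t, 0) for t in taxa) / total_b
        distance += length * abs(frac_a - frac_b)
    return distance


# Toy tree: root -> X (length 1) -> A (1), B (1); root -> C (length 2).
toy_branches = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (1.0, {"A", "B"}),
    (2.0, {"C"}),
]
toy_a = {"A": 1, "B": 1}
toy_b = {"B": 1, "C": 1}
```

On this toy data, three of the five observed units of branch length lead to taxa found in only one sample, so the unweighted value is 0.6; the weighted value also depends on how abundance is split across the branches.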