Non-Parametric Tutorial

NOTE: This notebook is not meant to teach statistics; it only demonstrates how to run the py50 functions. Plenty of resources for learning statistics are available online. I found the introductory tutorials by DATAtab particularly helpful.

This page shows examples of non-parametric tests using the Stats() and Plots() classes of py50. py50 offers many plot features, and not all of them are demoed here. To see all available plots, take a look at the 006_statistics_quickstart tutorial.

The tests here will follow a path similar to the guidelines by Pingouin. However, different post-hoc tests can be used based on the given dataset. Use your best judgement for the analysis.

py50 adapts the Wilcoxon and Mann-Whitney U tests for pairwise analysis across groups; both are demoed below.

[1]:

from py50 import Stats, Plots
import seaborn as sns
from matplotlib import pyplot as plt

Pairwise Wilcoxon

The pairwise Wilcoxon test is a non-parametric test to determine statistically significant differences between paired samples. The test evaluates whether the median of the differences between paired observations is significantly different from zero.

Performing a pairwise Wilcoxon test requires the groups to be the same length. This is because each group observation is paired with a corresponding observation in the other group. Thus, a one-to-one pairing will need to be established.
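
To make the pairing requirement concrete, here is a minimal pure-Python sketch of the signed-rank statistic. It is not py50's implementation. It assumes the common conventions that zero differences are dropped and that the reported statistic is the smaller of the two signed rank sums. The function names and the sample values are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_w(x, y):
    """Return W = min(W+, W-) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    # Zero differences carry no sign and are dropped (assumed convention).
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


# Made-up paired measurements, for illustration only.
before = [12, 15, 11, 18, 14, 16]
after = [10, 16, 8, 14, 14, 11]
print(wilcoxon_w(before, after))
```

Because every observation in one group is subtracted from its partner in the other, unequal group lengths leave some observations without a partner, which is why the sketch raises an error in that case.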

For this example, the Wilcoxon test will be performed using the Iris dataset.

[2]:

iris = sns.load_dataset("iris")

# Initialize class
iris_stats = Stats(iris)
iris_plot = Plots(iris)

iris_stats.show()

[2]:

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa
2	4.7	3.2	1.3	0.2	setosa
3	4.6	3.1	1.5	0.2	setosa
4	5.0	3.6	1.4	0.2	setosa

The pairwise Wilcoxon test is called the same way as every other test: through a “get_X” method, where X is the name of the test.

[3]:

iris_stats.get_wilcoxon(group_col='species', value_col='sepal_width')

[3]:

	A	B	W-val	p-val	significance	RBC	CLES
0	setosa	versicolor	34.0	3.190027e-08	****	0.937095	0.9248
1	setosa	virginica	99.5	8.767532e-07	****	0.823582	0.8344
2	versicolor	virginica	247.0	6.368052e-03	**	-0.477801	0.3364
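
The significance column abbreviates each p-value with asterisks. The sketch below uses the conventional thresholds, which agree with the rows shown in this tutorial. The exact cutoffs are an assumption and are not taken from the py50 source, and significance_stars is a hypothetical helper name.

```python
def significance_stars(p_value):
    """Map a p-value to an annotation string using assumed thresholds."""
    thresholds = [(1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")]
    for cutoff, label in thresholds:
        if p_value <= cutoff:
            return label
    return "n.s."


# p-values copied from the get_wilcoxon() output above.
for p in (3.190027e-08, 8.767532e-07, 6.368052e-03):
    print(p, significance_stars(p))
```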

Annotated plots are generated the same way as in the 006_statistics_quickstart and 007_parametric_tutorial tutorials.

Because the plot functions recalculate the results with the given test, they can also return a DataFrame (return_df=True) for comparison with the test results above.

[4]:

title = "Swarm Plot with Pairwise Wilcoxon Test"

iris_plot.swarmplot(test='wilcoxon', group_col='species', value_col='sepal_width', title=title, fontsize=30,
                    return_df=True)

[4]:

(            A           B  W-val         p-val significance       RBC    CLES
 0      setosa  versicolor   34.0  3.190027e-08         ****  0.937095  0.9248
 1      setosa   virginica   99.5  8.767532e-07         ****  0.823582  0.8344
 2  versicolor   virginica  247.0  6.368052e-03           ** -0.477801  0.3364,
 <statannotations.Annotator.Annotator at 0x131c8a140>)

../_images/tutorials_008_non-parametric_example_7_1.png

Pairwise Mann-Whitney U

The Mann-Whitney U test is a non-parametric test used to determine whether statistically significant differences occur between two independent groups. Unlike the pairwise Wilcoxon test (above), the measurements for the Mann-Whitney U test are not paired, so there are fewer restrictions on the input data format; for example, the groups do not need to be the same length.
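
As a rough illustration of why no pairing is needed, the U statistic can be computed by comparing every observation in one group with every observation in the other. The sketch below follows the textbook definition and is not py50's code. The function name and the sample values are made up, and the common-language effect size (CLES) is taken here as U divided by the number of comparisons.

```python
def mann_whitney_u(x, y):
    """Return (U, CLES) for independent samples x and y.

    U counts the (x_i, y_j) comparisons with x_i > y_j; ties count as half.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    cles = u / (len(x) * len(y))  # share of comparisons "won" by x
    return u, cles


# Made-up samples of different lengths, for illustration only.
lunch = [12.5, 9.0, 15.2, 11.1]
dinner = [18.3, 14.0, 22.7, 16.4, 11.1, 20.5]

u, cles = mann_whitney_u(lunch, dinner)
print(u, cles)
```

Swapping the two samples gives the complementary statistic, so the two U values always add up to the total number of comparisons.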

Both the pairwise Mann-Whitney U and pairwise Wilcoxon tests can be run on groups and on subgroups within each group. Subgroup usage is demoed with the Seaborn tips dataset.

[5]:

tips = sns.load_dataset("tips")

# Initialize class
tips_stats = Stats(tips)
tips_plot = Plots(tips)

tips_stats.show(100)

[5]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4
...	...	...	...	...	...	...	...
95	40.17	4.73	Male	Yes	Fri	Dinner	4
96	27.28	4.00	Male	Yes	Fri	Dinner	2
97	12.03	1.50	Male	Yes	Fri	Dinner	2
98	21.01	3.00	Male	Yes	Fri	Dinner	2
99	12.46	1.50	Male	No	Fri	Dinner	2

100 rows × 7 columns

Pairwise Mann-Whitney U tests are again called using the “get_X” format, where X is the name of a specific test. Additional parameters include the “subgroup_col”.

[6]:

tips_stats.get_mannu(value_col='total_bill', group_col='time', subgroup_col='sex')

[6]:

	A	B	U-val	p-val	significance	RBC	CLES
0	(Dinner, Female)	(Dinner, Male)	2849.5	0.225232	n.s.	-0.116160	0.441920
1	(Dinner, Female)	(Lunch, Male)	943.0	0.446076	n.s.	0.099068	0.549534
2	(Dinner, Female)	(Lunch, Female)	1166.5	0.026695	*	0.281868	0.640934
3	(Dinner, Male)	(Lunch, Male)	2483.5	0.059743	n.s.	0.213832	0.606916
4	(Dinner, Male)	(Lunch, Female)	2994.5	0.000614	***	0.379954	0.689977
5	(Lunch, Male)	(Lunch, Female)	679.5	0.212925	n.s.	0.176623	0.588312

Including a subgroup changes the final plot: the subgroups are color coded and a figure legend is added. The legend is drawn in a default location. If it interferes with the plot, it can be moved using plt.legend().

For readability, pairs with no significant difference are not annotated; only the significant pairs are passed to the pairs argument.
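
The list of pairs to annotate does not have to be typed by hand. As a small sketch, the rows of the get_mannu() output above can be filtered by p-value. The rows are copied into plain tuples here so the example runs on its own, and the 0.05 cutoff is an assumed significance level.

```python
# (A, B, p-val) rows copied from the get_mannu() output above.
results = [
    (("Dinner", "Female"), ("Dinner", "Male"), 0.225232),
    (("Dinner", "Female"), ("Lunch", "Male"), 0.446076),
    (("Dinner", "Female"), ("Lunch", "Female"), 0.026695),
    (("Dinner", "Male"), ("Lunch", "Male"), 0.059743),
    (("Dinner", "Male"), ("Lunch", "Female"), 0.000614),
    (("Lunch", "Male"), ("Lunch", "Female"), 0.212925),
]

ALPHA = 0.05  # assumed significance level

# Keep only the pairs whose p-value falls below the cutoff.
pairs = [(a, b) for a, b, p in results if p < ALPHA]
print(pairs)
```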

[7]:

title = "Pairwise Mann-Whitney U Test"

pairs = [
    (('Dinner', 'Female'), ('Lunch', 'Female')),
    (('Dinner', 'Male'), ('Lunch', 'Female')),
]

tips_plot.barplot(test='mannu', value_col='total_bill', group_col='time', subgroup_col='sex', pairs=pairs,
                  title=title, fontsize=30)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))

[7]:

<matplotlib.legend.Legend at 0x131c83a00>

../_images/tutorials_008_non-parametric_example_13_1.png

As a reminder, py50 plots offer many options for customization. In the figure above, Dinner is listed first, which is odd since Lunch usually comes before Dinner. This can be fixed quickly with the group_order parameter.

Changing the group order also changes the layout of the significance annotations. The annotations themselves can be changed if users dislike the asterisks and prefer a more verbose description. This is done with the pvalue_order parameter, whose labels follow the order of appearance in the statistics DataFrame.
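
One way to read the ordering rule is that each label is matched to an annotated pair by position, following the row order of the statistics table. The sketch below only illustrates that reading with the two significant rows from the Mann-Whitney output; it is not py50's implementation, and the variable names are made up.

```python
# Significant rows, in the order they appear in the get_mannu() output.
significant_rows = [
    ((("Dinner", "Female"), ("Lunch", "Female")), 0.026695),
    ((("Dinner", "Male"), ("Lunch", "Female")), 0.000614),
]
pvalue_order = ["p ≤ 0.05", "p ≤ 0.001"]

# Match each label to the row at the same position.
labels = {pair: label for (pair, _), label in zip(significant_rows, pvalue_order)}
for pair, label in labels.items():
    print(pair, "->", label)
```

Under this reading, listing the labels in a different order would attach them to the wrong pairs, so it is worth checking the statistics table before writing the list.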

[8]:

title = "Pairwise Mann-Whitney U Test with Group Order and Verbose Pvalue"

meal_order = ['Lunch', 'Dinner']
pvalue_order = ['p ≤ 0.05', 'p ≤ 0.001']

tips_plot.barplot(test='mannu', value_col='total_bill', group_col='time', subgroup_col='sex', pairs=pairs,
                  group_order=meal_order, pvalue_order=pvalue_order, title=title)
plt.legend(loc='upper right', bbox_to_anchor=(1.22, 1))

[8]:

<matplotlib.legend.Legend at 0x132a2f340>

../_images/tutorials_008_non-parametric_example_15_1.png

Friedman Test

The Friedman test is a non-parametric test for repeated measures. The dataset must be in long format, i.e. each row represents a measurement. This requires both the group_col and the subgroup_col.

For ease of use, the tips dataset will continue to be used.

[9]:

tips_stats.get_friedman(value_col='total_bill', group_col='sex', subgroup_col='day')

[9]:

	Source	W	ddof1	Q	p_unc	significance
Friedman	sex	1.0	1	4.0	0.0455	*
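
The values W = 1.0 and Q = 4.0 are what the Friedman formula gives for four blocks and two conditions when one condition ranks higher in every block. The sketch below reproduces that arithmetic in plain Python. It assumes the subgroup column (day, with four levels) acts as the block and that the values are averaged per block and condition; this is an assumption about how the data is prepared, not something taken from the py50 source. The numbers in blocks are made up.

```python
import math


def friedman_q(blocks):
    """Return Friedman's Q and Kendall's W for rows of per-block values.

    Each row holds one value per condition; ties are not handled here.
    """
    n = len(blocks)     # number of blocks
    k = len(blocks[0])  # number of conditions
    rank_sums = [0] * k
    for row in blocks:
        # Rank the conditions within this block: 1 = smallest value.
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    w = q / (n * (k - 1))
    return q, w


# Made-up block means: 4 rows (blocks) x 2 columns (conditions).
blocks = [
    [16.0, 19.5],
    [14.1, 20.0],
    [19.7, 20.8],
    [19.9, 21.9],
]

q, w = friedman_q(blocks)
# With one degree of freedom, the chi-square tail probability is erfc(sqrt(q / 2)).
p = math.erfc(math.sqrt(q / 2))
print(q, w, round(p, 4))
```

With only two conditions and four blocks, Q cannot exceed 4.0, so the p-value of 0.0455 is the smallest this design can produce.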

The Friedman test suggests there is a difference between the levels of the sex column. Post-hoc tests can be performed using get_pairwise_tests(). As the Friedman test is non-parametric, the parametric argument must be set to False. The results can be plotted as shown in previous examples.

[10]:

tips_stats.get_pairwise_tests(value_col='total_bill', group_col='sex', parametric=False)

[10]:

	Contrast	A	B	Paired	Parametric	U_val	alternative	p_unc	hedges	significance
0	sex	Female	Male	False	False	5613.5	two-sided	0.02135	-0.303494	*

[11]:

title = "Non-Parametric Pairwise Tests"
tips_plot.boxplot(value_col='total_bill', group_col='sex', test='pairwise-nonparametric', palette='Greens',
                  title=title, fontsize=30, orient='h')

[11]:

<statannotations.Annotator.Annotator at 0x132b58550>

../_images/tutorials_008_non-parametric_example_20_1.png

Kruskal-Wallis H Test

The Kruskal-Wallis test is a non-parametric method used to determine whether there are significant differences between two or more independent groups. Post-hoc tests can be performed to obtain pairwise differences. Again, pairwise-nonparametric is used here for the post-hoc step.

For ease, the tips dataset will continue to be used.
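
For readers curious about what the H statistic measures, here is a minimal sketch of the textbook formula: all observations are ranked together, and H grows as the rank sums of the groups drift apart. This is not py50's implementation. The function name and the sample groups are made up, and the usual tie correction is included.

```python
def kruskal_h(*groups):
    """Return the Kruskal-Wallis H statistic with the usual tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of = {}   # average rank of each distinct value in the pooled sample
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        t = j - i + 1  # size of this tie group
        rank_of[pooled[i]] = (i + j) / 2 + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_part = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_part - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    return h / correction


# Made-up, fully separated groups, for illustration only.
low, mid, high = [1, 2, 3], [4, 5, 6], [7, 8, 9]
h = kruskal_h(low, mid, high)
print(h)
```

H is compared against a chi-square distribution with one fewer degree of freedom than the number of groups, which matches the ddof1 column in the output of get_kruskal().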

[12]:

tips_stats.get_kruskal(value_col='tip', group_col='day')

[12]:

	Source	ddof1	H	p_unc	significance
Kruskal	day	3	8.565588	0.035661	*

[13]:

tips_stats.get_pairwise_tests(value_col='tip', group_col='day', parametric=False)

[13]:

	Contrast	A	B	Paired	Parametric	U_val	alternative	p_unc	hedges	significance
0	day	Thur	Fri	False	False	561.0	two-sided	0.758381	0.030468	n.s.
1	day	Thur	Sat	False	False	2486.5	two-sided	0.417663	-0.148857	n.s.
2	day	Thur	Sun	False	False	1755.5	two-sided	0.010006	-0.388762	*
3	day	Fri	Sat	False	False	808.0	two-sided	0.881937	-0.166274	n.s.
4	day	Fri	Sun	False	False	533.5	two-sided	0.079741	-0.431509	n.s.
5	day	Sat	Sun	False	False	2652.0	two-sided	0.029497	-0.178644	*

For readability, groups with no significance were not annotated.

[14]:

# Plot annotations
title = "Non-Parametric Pairwise Tests with New Group Order"
pairs = [('Sun', 'Sat'), ('Thur', 'Sun')]
group_order = ['Sun', 'Thur', 'Fri', 'Sat']
palette = ['red', 'orange', 'green', 'blue']

tips_plot.boxplot(value_col='tip', group_col='day', test='pairwise-nonparametric', group_order=group_order,
                  pairs=pairs, palette=palette, title=title)

[14]:

<statannotations.Annotator.Annotator at 0x132a8bfa0>

../_images/tutorials_008_non-parametric_example_25_1.png