Types of T-Test
The T-Test has two major flavors: paired
and unpaired.
Sometimes your datasets are paired (also called dependent). For
example, you may be measuring the performance of the same set of
students on an exam before and after teaching them the course
content. To use a paired T-Test, you have to be able to measure an
item twice, usually before and after some treatment. This is the
ideal condition: by having before and after measurements of a
treatment, you control for other potential differences in the items
you measured, like differences in performance between students.
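For instance, here's a minimal sketch of a paired T-Test using
scipy's ttest_rel. The exam scores are made-up numbers, just for
illustration:

```python
from scipy import stats

# Hypothetical exam scores for the same five students,
# measured before and after taking the course.
before = [67, 72, 58, 81, 75]
after = [74, 79, 60, 88, 80]

# ttest_rel runs a paired (dependent) T-Test.
t_statistic, p_value = stats.ttest_rel(before, after)
print(t_statistic, p_value)
```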
Other times, you are measuring the difference between two sets of
measured data, but the individual measurements in each dataset are
unpaired (sometimes called independent). This was the case in our
tests: different people contributed to each campaign, and different
people live in towns 1 and 2. With unpaired datasets, we lose the
ability to control for differences between individuals, so we'll
likely need more data to achieve statistical significance.
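As a sketch, scipy's ttest_ind runs the standard unpaired T-Test (by
default it assumes the two samples have equal variances; more on that
in a moment). The donation amounts below are invented:

```python
from scipy import stats

# Hypothetical donation amounts from two unrelated groups of people.
campaign_1 = [5, 10, 7, 12, 8, 6, 11]
campaign_2 = [9, 14, 10, 13, 12, 15, 11, 10]

# ttest_ind runs an unpaired (independent) T-Test.  By default it
# assumes both samples come from distributions with equal variance.
t_statistic, p_value = stats.ttest_ind(campaign_1, campaign_2)
print(t_statistic, p_value)
```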
Unpaired datasets come in all flavors. Depending on whether the
sizes of the sets are equal or unequal, and depending on whether the
variances of both sets are equal, you will run different versions of
an unpaired T-Test. In our case, we made no assumptions about the
sizes of our datasets, and no assumptions about their variances,
either. So we went with an unpaired, unequal size, unequal variance
test. That's Welch's T-Test.
As with all life decisions, if you want more details, check out the
Wikipedia article on T-Tests. There are implementations of both
paired and unpaired T-Tests in scipy. The unequal variance case is
not available in scipy, which is why we included welchsttest.py.
Enjoy it!
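To give a sense of what welchsttest.py has to do (we're not
reproducing it here, and the function name below is our own), here's
a minimal sketch of Welch's T-Test: compute the t statistic from the
sample means and variances, approximate the degrees of freedom with
the Welch-Satterthwaite equation, and get a two-sided p-value from
the t distribution. Newer scipy releases can also do this via
ttest_ind(a, b, equal_var=False).

```python
import numpy as np
from scipy import stats

def welch_t_test(a, b):
    """Unpaired, unequal size, unequal variance T-Test (a sketch)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    var_a, var_b = a.var(ddof=1), b.var(ddof=1)  # sample variances

    # Welch's t statistic.
    t = (a.mean() - b.mean()) / np.sqrt(var_a / n_a + var_b / n_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (var_a / n_a + var_b / n_b) ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )

    # Two-sided p-value from the t distribution's survival function.
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

# Example call with made-up samples of unequal size.
print(welch_t_test([5, 10, 7, 12, 8], [9, 14, 10, 13, 12, 15, 11]))
```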
T-Test Assumptions We Broke :(
We've managed to sound like smartypantses who do all the right
things until now, but we have to admit we broke a few rules. The
math behind T-Tests makes assumptions about the datasets that, when
true, make it easier to achieve statistical significance. The big
assumption is that the data we used came from a normal distribution.
The first thing we should have done is check whether our data is
actually normal. Luckily, the fine scipy folks have implemented the
Shapiro-Wilk test for normality. This test calculates a p-value
that, if low enough (usually < 0.05), tells us there is a low chance
the distribution is normal.
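Here's a sketch of how you might call it; the data below is randomly
generated, standing in for our real measurements:

```python
import numpy as np
from scipy import stats

# Stand-in sample; in practice this would be our campaign or town data.
data = np.random.normal(loc=0.0, scale=1.0, size=100)

# shapiro returns the W statistic and a p-value.  A small p-value
# (usually < 0.05) suggests the data is unlikely to be normal.
w_statistic, p_value = stats.shapiro(data)
print(w_statistic, p_value)
```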