I am steering my career towards Data Science and, after some time refreshing my statistics, I realized that statistics is one of the fields whose fundamentals people most often misunderstand, even teachers and specialists.
For that reason I am now a bit skeptical about some widely used techniques. For a project, I need to compare the means of two different populations. The standard deviations of both populations are unknown, and the sample distributions are not strictly normal (they are bell-shaped, but their skewness and kurtosis differ from the normal values).
Under these circumstances, the Internet suggests different approaches:
1) Assume the distributions are normal (even though they are not) and apply Welch's t-test to determine whether the means differ (Welch's test is like Student's t-test, but it is used when the two samples cannot be assumed to share the same standard deviation). People who suggest it argue that Welch's test is fairly robust to some degree of non-normality.
2) Apply the Central Limit Theorem to both samples and then apply Welch's test to the distributions of the sample means.
3) Apply a non-parametric test to compare the two means.
4) Transform both sample distributions into normal distributions and then apply a test. (For some irrational reason, and considering that my data are not far from being normally distributed, I don't like this approach very much.)
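To make the options concrete, here is a minimal Python sketch of how approaches 1 and 3 could be run side by side. It assumes NumPy and SciPy are available. The function name `compare_means`, its parameters, and the choice of tests for approach 3 (Mann-Whitney U and a hand-written permutation test on the difference in means) are my own assumptions for illustration, not something the sources I found prescribe. Note that Mann-Whitney U compares the two distributions as a whole rather than the means specifically, which is why the permutation test is included as well.

```python
import numpy as np
from scipy import stats


def compare_means(a, b, n_resamples=2000, seed=0):
    """Run three two-sided tests on two independent samples.

    Hypothetical helper for illustration only; returns a dict of p-values.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Approach 1: Welch's t-test (equal_var=False drops the equal-variance assumption).
    welch = stats.ttest_ind(a, b, equal_var=False)

    # Approach 3a: Mann-Whitney U, a rank-based non-parametric test.
    mwu = stats.mannwhitneyu(a, b, alternative="two-sided")

    # Approach 3b: permutation test on the difference in means.
    # Group labels are shuffled and the mean difference is recomputed each time.
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    perm_p = (extreme + 1) / (n_resamples + 1)

    return {
        "welch_p": float(welch.pvalue),
        "mannwhitney_p": float(mwu.pvalue),
        "permutation_p": float(perm_p),
    }
```

Calling `compare_means(sample_a, sample_b)` on the two samples (the names `sample_a` and `sample_b` are placeholders) gives three p-values to compare; if they broadly agree, the choice between the approaches probably matters little for that data set.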
Which do you think is the best approach? I have been googling it the whole day, but I have not found a solid answer. Maybe the question is a bit silly, but the Internet is full of bad answers.
Thank you very much.