
ANOVA in Python with 2018 FIFA Data


Data Science Day 21:

Last time we showed an example of using the independent t-test to compare the mean age of players at Real Madrid and Barcelona. What statistical method should we use if we want to compare the mean age among the players at Barcelona, Real Madrid, and Juventus?

[Images credited to kappilrinesh / Pixabay and RonnyK / Pixabay]

Answer:

We will use the ANOVA (analysis of variance) test, a special case of the general linear model (GLM), to compare the means of more than two groups.

Null Hypothesis: Mean(A) = Mean(B) = Mean(C)

ANOVA Assumptions:

  • Normality of the dependent variable within each group

  • Homogeneity of variance across groups

  • Independence of observations
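The first two assumptions can be checked numerically before running ANOVA. The following is a minimal sketch, assuming SciPy is available and using small synthetic samples (the group names and values are hypothetical, not the FIFA data). It uses the Shapiro-Wilk test for normality and Levene's test for equal variances; independence is a property of how the data was collected and is not tested here.

```python
import numpy as np
from scipy import stats

# Hypothetical, synthetic age samples for three groups (not the FIFA data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=25, scale=4, size=30)
group_b = rng.normal(loc=26, scale=4, size=30)
group_c = rng.normal(loc=27, scale=4, size=30)

# Normality: Shapiro-Wilk test per group (H0: the sample comes from a normal distribution).
shapiro_pvalues = []
for g in (group_a, group_b, group_c):
    _, p = stats.shapiro(g)
    shapiro_pvalues.append(p)

# Homogeneity of variance: Levene's test (H0: all groups have equal variance).
levene_stat, levene_p = stats.levene(group_a, group_b, group_c)

print("Shapiro-Wilk p-values:", shapiro_pvalues)
print("Levene p-value:", levene_p)
```

A small p-value in either test suggests the corresponding assumption may not hold for that sample.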

Example: Kaggle FIFA 2018 dataset

Null Hypothesis: There is no significant difference in the mean age of players among Real Madrid, Barcelona, and Juventus.

H0:  Age.mean(Real Madrid) = Age.mean(Barcelona) = Age.mean(Juventus)

1. Dataset

We select the variables Age and Club, keeping only players from Real Madrid, Barcelona, and Juventus.

data2 = data1.loc[data1["club"].isin(["Real Madrid CF", "FC Barcelona", "Juventus"])]
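The later steps use data3, data4, and data5 without showing how they were created. One plausible way to build them is sketched below, assuming data3 is Real Madrid, data4 is Barcelona, and data5 is Juventus (this mapping is inferred from the "mfc", "barcelona", and "juventus" labels in the KDE step). The tiny data1 frame here is a hypothetical stand-in for the Kaggle file.

```python
import pandas as pd

# Hypothetical stand-in for data1; the real dataset has many more rows and columns.
data1 = pd.DataFrame({
    "club": ["Real Madrid CF", "FC Barcelona", "Juventus", "Real Madrid CF", "Arsenal"],
    "age": [32, 30, 39, 27, 28],
})

clubs = ["Real Madrid CF", "FC Barcelona", "Juventus"]
data2 = data1.loc[data1["club"].isin(clubs)]

# Assumed mapping of the per-club subsets used in the later steps.
data3 = data2.loc[data2["club"] == "Real Madrid CF"]
data4 = data2.loc[data2["club"] == "FC Barcelona"]
data5 = data2.loc[data2["club"] == "Juventus"]

print(len(data3), len(data4), len(data5))
```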

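2. Histogram Plot

import matplotlib.pyplot as plt

# Overlaid age histograms: data3 = MFC (Real Madrid), data4 = Barcelona, data5 = Juventus
plt.hist(data3.age, bins="auto", color="c", edgecolor="k", alpha=0.5, label="MFC")
plt.hist(data4.age, bins="auto", color="r", edgecolor="k", alpha=0.5, label="Barcelona")
plt.hist(data5.age, bins="auto", color="y", edgecolor="k", alpha=0.5, label="Juventus")
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Dist in Barcelona vs MFC vs Juventus')
plt.legend()
plt.show()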

3. KDE Density Plot

import pandas as pd
import matplotlib.pyplot as plt

# KDE: one column per club
df = pd.DataFrame({"mfc": data3.age, "barcelona": data4.age, "juventus": data5.age})
ax = df.plot.kde()
plt.title("Density Plot for Players' Age in Barcelona vs MFC vs Juventus")
plt.show()

4. ANOVA Test

from scipy import stats

# One-way ANOVA on the ages of the three clubs
stats.f_oneway(data3.age, data4.age, data5.age)
# Output: F_onewayResult(statistic=4.8827728579356524, pvalue=0.010152460067260918)

Outcome:

The F-statistic is 4.88 and the p-value is 0.01, which is below the usual 0.05 threshold, so we reject the null hypothesis: the mean age of players differs significantly among MFC, Barcelona, and Juventus. Both the histogram and the density plot are consistent with this outcome. However, ANOVA does not tell us which groups differ from each other. For that, we can follow up with pairwise comparisons using the Bonferroni method.
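As a rough illustration of the Bonferroni idea, the sketch below runs a t-test for every pair of groups and compares each p-value against alpha divided by the number of comparisons. The samples and club labels are hypothetical values made up for the example, not the FIFA data, and this is only one simple way to apply the correction.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical age samples (not the FIFA data).
samples = {
    "club_a": np.array([22.0, 24.0, 25.0, 23.0, 26.0, 24.0]),
    "club_b": np.array([25.0, 27.0, 26.0, 28.0, 24.0, 27.0]),
    "club_c": np.array([29.0, 31.0, 30.0, 28.0, 32.0, 30.0]),
}

alpha = 0.05
pairs = list(combinations(samples, 2))
# Bonferroni: divide the significance level by the number of pairwise tests.
adjusted_alpha = alpha / len(pairs)

results = {}
for a, b in pairs:
    t_stat, p_value = stats.ttest_ind(samples[a], samples[b])
    results[(a, b)] = (p_value, p_value < adjusted_alpha)
    print(a, "vs", b, "p =", p_value, "significant:", p_value < adjusted_alpha)
```

With three groups there are three pairs, so each pairwise test is judged at 0.05 / 3 instead of 0.05.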

Bonus:

I remember Song asked me: it is good to know what ANOVA is used for, but do you know which test generates the p-value of ANOVA?

I thought that, since ANOVA has a similar application to the t-test, the t-test generates the p-value.
However, the truth is that the F-test generates ANOVA's p-value.
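To make this concrete, here is a small sketch that computes the one-way ANOVA F-statistic by hand (between-group mean square divided by within-group mean square) and reads the p-value from the upper tail of the F distribution. The three samples are hypothetical values, and the result is compared with stats.f_oneway as a sanity check.

```python
import numpy as np
from scipy import stats

# Hypothetical samples (not the FIFA data).
groups = [
    np.array([24.0, 27.0, 30.0, 22.0, 26.0]),
    np.array([25.0, 29.0, 31.0, 28.0, 27.0]),
    np.array([30.0, 33.0, 29.0, 35.0, 31.0]),
]

k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Sum of squares between groups and within groups.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within.
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
# p-value: upper-tail probability of the F distribution with (k - 1, n_total - k) degrees of freedom.
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print("manual:", f_manual, p_manual)
print("scipy: ", f_scipy, p_scipy)
```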

Later, little rain mentioned that the t-test and the F-test are convertible: when only two groups are compared, T^{2} = F.
We will go over the relationship between the t-test and the F-test next time!
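As a quick preview, the relation can be checked numerically. The sketch below uses two hypothetical samples and compares the squared statistic of the independent t-test (with equal variances assumed, which is the SciPy default) against the one-way ANOVA F-statistic on the same two groups.

```python
import numpy as np
from scipy import stats

# Two hypothetical samples (not the FIFA data).
x = np.array([24.0, 27.0, 30.0, 22.0, 26.0, 28.0])
y = np.array([29.0, 31.0, 27.0, 33.0, 30.0, 32.0])

# Independent two-sample t-test, equal variances assumed (SciPy default).
t_stat, t_p = stats.ttest_ind(x, y)
# One-way ANOVA on the same two groups.
f_stat, f_p = stats.f_oneway(x, y)

print("t^2 =", t_stat ** 2, "F =", f_stat)
print("t-test p =", t_p, "ANOVA p =", f_p)
```

Both the squared t-statistic and the p-values agree with the ANOVA results up to floating-point error.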

Happy Studying and Soccer game watching!



Author: 乌然娅措
Link: https://www.jianshu.com/p/74aa145e1f02

