
Odds Ratio

Tags: Python, Data Analysis & Mining, Mathematics

Data Science Day 12: Odds Ratio

Learning Objective:

Probability vs. Odds vs. Odds Ratio

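1. Probability = (number of outcomes in the event) / (number of outcomes in the sample space), assuming equally likely outcomes
2. Odds = Prob(Event) / Prob(Non-Event) = p / (1 - p)
3. Odds Ratio = Odds(Group 1) / Odds(Group 2)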
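A minimal sketch of these three definitions in plain Python. It assumes equally likely outcomes for the probability step; the function names and the fair-die example are illustrative and not from the original post. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def probability(event_outcomes: int, sample_space_size: int) -> Fraction:
    """Probability = event outcomes / sample space size (equally likely outcomes)."""
    return Fraction(event_outcomes, sample_space_size)


def odds(p: Fraction) -> Fraction:
    """Odds = P(event) / P(non-event) = p / (1 - p); undefined for p = 1."""
    return p / (1 - p)


def odds_ratio(odds_group1: Fraction, odds_group2: Fraction) -> Fraction:
    """Odds ratio = Odds(Group 1) / Odds(Group 2)."""
    return odds_group1 / odds_group2


# Rolling a 6 with a fair die: 1 favourable outcome out of 6
p_six = probability(1, 6)
print(p_six, odds(p_six))  # 1/6 1/5

# Rolling an even number: 3 favourable outcomes out of 6
p_even = probability(3, 6)
print(odds(p_even))  # 1

# Odds of "even" relative to odds of "six"
print(odds_ratio(odds(p_even), odds(p_six)))  # 5
```

Note how a probability of 1/6 becomes odds of 1/5: odds compare the event with the non-event, not with the whole sample space.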

Interpretation

The Odds Ratio is a measure of association between exposure and outcome.

OR = Odds(Group 1)/Odds(Group 2) > 1 indicates that the event has higher odds of occurring in Group 1 than in Group 2.

OR = Odds(Group 1)/Odds(Group 2) < 1 indicates that the event has lower odds of occurring in Group 1 than in Group 2; OR = 1 indicates no association.

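The 95% confidence interval gives a range of plausible values for the true odds ratio, and the p-value indicates whether the association is statistically significant. With Wald-based intervals like the ones reported below, p < 0.05 corresponds to a 95% CI that excludes 1.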
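To connect this interpretation to the regression output in the example below, here is a sketch assuming a Wald-type interval on the log-odds scale (the kind statsmodels reports for `Logit`): exponentiating a coefficient gives the odds ratio, exponentiating coef ± 1.96*SE gives its 95% CI, and the two-sided p-value comes from z = coef/SE. The helper names are hypothetical; the normal CDF is computed with `math.erf` so only the standard library is needed.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def odds_ratio_summary(coef: float, se: float, z_crit: float = 1.96):
    """Turn a log-odds coefficient and its standard error into
    (odds ratio, CI lower, CI upper, two-sided Wald p-value)."""
    z = coef / se
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return (
        math.exp(coef),
        math.exp(coef - z_crit * se),
        math.exp(coef + z_crit * se),
        p_value,
    )


# dummy_rank coefficient and (rounded) standard error from the regression output
or_, lo, hi, p = odds_ratio_summary(-1.1395, 0.222)
print(f"OR={or_:.3f}  95% CI=({lo:.3f}, {hi:.3f})  p={p:.1e}")
```

With these inputs it should print an odds ratio of about 0.320 and a CI of roughly (0.207, 0.494), matching the exponentiated statsmodels interval up to rounding of the standard error. The interval lies entirely below 1, which is consistent with the very small p-value.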


Example: UCLA Graduate School Admission dataset

We calculate both the theoretical odds ratio (from a logistic regression model) and the true odds ratio (directly from the admission counts), and interpret what the odds ratio means.
Code (GitHub gist): https://gist.github.com/fangya18/2cdf0ae21856edbaca0c1d3d0aefd501

```
   admit  gre   gpa  prestige
0      0  380  3.61         3
1      1  660  3.67         3
2      1  800  4.00         1
3      1  640  3.19         4
4      0  520  2.93         4
```

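```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# assumes df holds the admissions data loaded by the gist above
# prestige 1 is the most prestigious school
# dummy_rank groups prestige 1, 2 as 1 and prestige 3, 4 as 2
df["dummy_rank"] = np.where(df["prestige"] < 3, 1, 2)
df.hist()
plt.show()
# dummy_rank = pd.get_dummies(df["prestige"], prefix="prestige")
print(df.head())
# frequency table: admit vs. dummy_rank
print(pd.crosstab(df["admit"], df["dummy_rank"]))
```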

```
   admit  gre   gpa  prestige  dummy_rank
0      0  380  3.61         3           2
1      1  660  3.67         3           2
2      1  800  4.00         1           1
3      1  640  3.19         4           2
4      0  520  2.93         4           2
dummy_rank    1    2
admit               
0           125  148
1            87   40
```

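```python
import statsmodels.api as sm

# Apply logistic regression (no intercept: sm.Logit only uses the columns in X;
# sm.add_constant(X) would add one, but the output below was fitted without it)
X = df[["gre", "gpa", "dummy_rank"]]
logit = sm.Logit(df["admit"], X)
result = logit.fit()
print(result.summary())
print(result.conf_int())
```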

```
Optimization terminated successfully.
         Current function value: 0.593637
         Iterations 5
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                  admit   No. Observations:                  400
Model:                          Logit   Df Residuals:                      397
Method:                           MLE   Df Model:                            2
Date:                Fri, 19 Oct 2018   Pseudo R-squ.:                 0.05014
Time:                        17:44:14   Log-Likelihood:                -237.45
converged:                       True   LL-Null:                       -249.99
                                        LLR p-value:                 3.604e-06
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
gre            0.0014      0.001      1.318      0.188      -0.001       0.003
gpa            0.0247      0.204      0.121      0.904      -0.375       0.425
dummy_rank    -1.1395      0.222     -5.144      0.000      -1.574      -0.705
==============================================================================
                   0         1
gre        -0.000660  0.003368
gpa        -0.375392  0.424737
dummy_rank -1.573685 -0.705355
```
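The model above has no intercept and adjusts for gre and gpa, so its dummy_rank odds ratio is not expected to equal the odds ratio computed directly from the crosstab. As a check, here is a minimal sketch assuming a different model: an intercept plus a single 0/1 group indicator, fitted by maximum likelihood with Newton-Raphson on the grouped counts from the crosstab (pure Python, no statsmodels). In that model exp(b1) reproduces the crosstab odds ratio exactly. `fit_binary_logit` is a hypothetical helper.

```python
import math


def fit_binary_logit(n0, y0, n1, y1, iters=25):
    """Fit logit(P(admit)) = b0 + b1 * group by Newton-Raphson on grouped counts.

    n0, y0: students and admits in group 0 (reference group)
    n1, y1: students and admits in group 1
    Returns the maximum-likelihood estimates (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        p0 = 1.0 / (1.0 + math.exp(-b0))         # P(admit | group 0)
        p1 = 1.0 / (1.0 + math.exp(-(b0 + b1)))  # P(admit | group 1)
        # score vector: gradient of the log-likelihood
        u0 = (y0 - n0 * p0) + (y1 - n1 * p1)
        u1 = y1 - n1 * p1
        # Fisher information [[i00, i01], [i01, i11]]
        w0 = n0 * p0 * (1.0 - p0)
        w1 = n1 * p1 * (1.0 - p1)
        i00, i01, i11 = w0 + w1, w1, w1
        det = i00 * i11 - i01 * i01
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


# Grouped counts from the crosstab above:
# group 0 = dummy_rank 1 (prestige 1, 2), group 1 = dummy_rank 2 (prestige 3, 4)
b0, b1 = fit_binary_logit(n0=87 + 125, y0=87, n1=40 + 148, y1=40)
print(math.exp(b1))             # odds ratio from the intercept + group model
print((40 / 148) / (87 / 125))  # odds ratio straight from the counts
```

Both prints should agree (about 0.3883) up to floating-point error. In statsmodels, the analogous model would be `sm.Logit(df["admit"], sm.add_constant((df["dummy_rank"] == 2).astype(int)))`; that call is a suggestion, not code from the post.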

```python
# Theoretical odds ratios: exponentiate the log-odds coefficients
print(np.exp(result.params))
params = result.params
conf = result.conf_int()
conf["OR"] = params
conf.columns = ["2.5%", "97.5%", "OR"]
print(np.exp(conf))
```

```
gre           1.001355
gpa           1.024980
dummy_rank    0.319973
dtype: float64
               2.5%     97.5%        OR
gre         0.99934  1.003374  1.001355
gpa         0.68702  1.529189  1.024980
dummy_rank  0.20728  0.493933  0.319973
```

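```python
# Calculate probability vs. odds vs. odds ratio from the frequency table
prob_rank1_accept = 87 / (125 + 87)  # P(admit | dummy_rank 1)
print(prob_rank1_accept)
prob_rank2_accept = 40 / (148 + 40)  # P(admit | dummy_rank 2)
print(prob_rank2_accept)
odds_rank1 = 87 / 125                # admitted / rejected, dummy_rank 1
odds_rank2 = 40 / 148                # admitted / rejected, dummy_rank 2
print(odds_rank1, odds_rank2)
# odds ratio of dummy_rank 2 (prestige 3, 4) relative to dummy_rank 1 (prestige 1, 2)
odds_ratio = odds_rank2 / odds_rank1
print(odds_ratio)
```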

```
0.41037735849056606
0.2127659574468085
0.696 0.2702702702702703
0.38831935383659527
```
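The manual calculation gives only a point estimate. Below is a sketch of a 95% CI for this empirical odds ratio, assuming the Woolf (log-scale Wald) method. That is a standard choice for a 2x2 table but not something the original post uses, and `empirical_or_ci` is a hypothetical helper.

```python
import math


def empirical_or_ci(a: int, b: int, c: int, d: int, z_crit: float = 1.96):
    """Odds ratio and Woolf (log-scale Wald) CI for a 2x2 table.

    a, b: events / non-events in the comparison group
    c, d: events / non-events in the reference group
    """
    or_ = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z_crit * se_log_or)
    hi = math.exp(math.log(or_) + z_crit * se_log_or)
    return or_, lo, hi


# comparison group: prestige 3, 4 (40 admitted, 148 rejected)
# reference group:  prestige 1, 2 (87 admitted, 125 rejected)
or_, lo, hi = empirical_or_ci(40, 148, 87, 125)
print(f"{or_:.3f} ({lo:.3f}, {hi:.3f})")
```

For these counts it should print about 0.388 (0.249, 0.605). This interval is for the unadjusted odds ratio, so it is not directly comparable with the regression interval, which also adjusts for gre and gpa.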

```python
# Visualization
# %matplotlib inline  # IPython/Jupyter magic; uncomment when running in a notebook
pd.crosstab(df.admit, df.dummy_rank).plot(kind="bar")
plt.title("Admit vs Prestige")
plt.xlabel("Admit")
plt.ylabel("Student Frequency Count")
plt.show()
```

Summary

Our theoretical odds ratio is 0.32 with a 95% CI of (0.21, 0.49), which is close to the true odds ratio of 0.388 (the true value also falls inside that interval). This indicates that for students from prestige 3 or 4 undergraduate schools, the odds of getting into graduate school are about 39% of the odds for students from prestige 1 or 2 schools. In other words, the odds of admission are about 2.6 times (1/0.388 ≈ 2.58) higher for students from prestige 1 or 2 schools than for students from prestige 3 or 4 schools; in terms of admission probability (41% vs. 21%), they are about 1.9 times as likely to be admitted. Our graph supports the result!
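A quick arithmetic check of the statements above, using only the counts from the frequency table. The risk ratio (the ratio of admission probabilities) is added here for contrast with the odds ratio; the original post does not compute it.

```python
# Admission counts from the frequency table
admitted_top, total_top = 87, 87 + 125  # prestige 1 or 2
admitted_low, total_low = 40, 40 + 148  # prestige 3 or 4

# Odds ratio: odds of admission (admitted / rejected), low vs. top group
odds_ratio = (admitted_low / (total_low - admitted_low)) / (
    admitted_top / (total_top - admitted_top)
)
# Risk ratio: probability of admission, low vs. top group
risk_ratio = (admitted_low / total_low) / (admitted_top / total_top)

print(round(odds_ratio, 3), round(1 / odds_ratio, 2))  # 0.388 2.58
print(round(risk_ratio, 3), round(1 / risk_ratio, 2))  # 0.518 1.93
```

The two ratios differ because admission is not a rare outcome here (about 41% and 21%), so odds and probabilities diverge noticeably.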

Inspired by http://blog.yhat.com/posts/logistic-regression-and-python.html

Happy Studying!


Author: 乌然娅措 (Student)
