前言
衔接上一篇数据比对后的结果,使用R包DSS进行处理。
Input文件准备
我们先来复习一下上一节课得到的数据结果:*.bismark.cov.gz
文件
$head test_data_bismark_bt2.deduplicated.bismark.cov 1 975476 975476 100 1 0 1 975488 975488 100 1 0 1 975490 975490 100 1 0 1 2224487 2224487 100 1 0 1 2224489 2224489 100 1 0 1 2224514 2224514 100 1 0 1 2224520 2224520 100 1 0 1 2313220 2313220 0 0 1 1 9313902 9313902 100 1 0 1 9313914 9313914 100 1 0# 第一列代表chromosome# 第二,三列代表location# 第四列代表甲基化百分比# 第五列代表甲基化数目# 第六列代表未甲基化数目
这里我们使用的R包为DSS,使用Bioconductor进行安装。
这个包可以对甲基化数据做两件事:
找出差异甲基化位点
根据甲基化位点找出差异化甲基化区域
DSS包对差异甲基化的检测基于β负二项分布的严格沃尔德检验。
DSS包可以对无重复数据进行处理
输入数据的格式如下:
第一列为染色体
第二列为位置
第三列为total reads
第四列为甲基化的reads
chr pos N X chr18 3014904 26 2chr18 3031032 33 12chr18 3031044 33 13chr18 3031065 48 24# 根据这个特性,我们可以将上面的输出文件通过Linux或者R进行简单的处理得到input文件。# 非常简单,所以这里请自行转化,凡事都需要自己摸索。
Call DML or DMR
DML是甲基化差异位点,DMR为甲基化差异区域。
使用DSS包自带的数据进行演示
1. 载入两个包DSS和bsseq(需要该包构建obj对象)
library(DSS)require(bsseq) path <- file.path(system.file(package="DSS"), "extdata") dat1.1 <- read.table(file.path(path, "cond1_1.txt"), header=TRUE) dat1.2 <- read.table(file.path(path, "cond1_2.txt"), header=TRUE) dat2.1 <- read.table(file.path(path, "cond2_1.txt"), header=TRUE) dat2.2 <- read.table(file.path(path, "cond2_2.txt"), header=TRUE) BSobj <- makeBSseqData( list(dat1.1, dat1.2, dat2.1, dat2.2), c("C1","C2", "N1", "N2") )[1:1000,]# 查看BSobj的类型BSobj An object of type 'BSseq' with 1000 methylation loci 4 samples has not been smoothed All assays are in-memory、
2. 利用DMLtest函数call DML,分为一下几个步骤:
预测所有CpG位点的平均甲基化水平
对每个CpG位点估算其甲基化水平的dispersions
进行沃尔德检验
在第一步过程中,smoothing可以更好帮助估算甲基化(针对whole-genome BS-seq)。
而RRBS不需要smoothing。
根据甲基化水平进行loci的差异分析
dmlTest <- DMLtest(BSobj, group1=c("C1", "C2"), group2=c("N1", "N2")) head(dmlTest) chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr1 chr18 3014904 0.3817233 0.4624549 -0.08073162 0.24997034 -0.3229648 0.300542998 0.01706260 0.74672190 0.99850942 chr18 3031032 0.3380579 0.1417008 0.19635711 0.11086362 1.7711592 0.008911745 0.04783892 0.07653423 0.67921273 chr18 3031044 0.3432172 0.3298853 0.01333190 0.12203116 0.1092500 0.010409029 0.01994821 0.91300423 0.99850944 chr18 3031065 0.4369377 0.3649218 0.07201587 0.10099395 0.7130711 0.010320888 0.01603200 0.47580174 0.99850945 chr18 3031069 0.2933572 0.5387464 -0.24538920 0.13178800 -1.8619996 0.012537553 0.02320887 0.06260315 0.61587976 chr18 3031082 0.3526311 0.3905718 -0.03794068 0.07847999 -0.4834440 0.007665696 0.01145531 0.62878051 0.9985094dmlTest.sm <- DMLtest(BSobj, group1=c("C1", "C2"), group2=c("N1", "N2"), smoothing=TRUE) head(dmlTest.sm) chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr1 chr18 3014904 0.3693669 0.4566563 -0.08728939 0.29967322 -0.2912819 0.30054300 0.01706260 0.7708357 0.96565152 chr18 3031032 0.3433882 0.3679732 -0.02458503 0.03970109 -0.6192533 0.03177894 0.28323422 0.5357495 0.86390363 chr18 3031044 0.3412867 0.3678807 -0.02659404 0.04032823 -0.6594397 0.02536938 0.02080295 0.5096134 0.85965224 chr18 3031065 0.3358830 0.3511983 -0.01531533 0.04799161 -0.3191252 0.01123412 0.01621926 0.7496316 0.96524175 chr18 3031069 0.3358830 0.3511983 -0.01531533 0.03205500 -0.4777830 0.02832889 0.05857316 0.6328047 0.89680296 chr18 3031082 0.3358830 0.3511983 -0.01531533 0.05846593 -0.2619531 0.01682981 0.01368466 0.7933576 0.9745116
3. 根据dmlTest来call DML
dmls <- callDML(dmlTest.sm, p.threshold=0.001) dmls chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr postprob.overThreshold447 chr18 3973699 0.8530694 0.3432547 0.5098147 0.05726793 8.902272 0.03662109 0.98947516 5.471481e-19 4.962634e-16 1.0000000709 chr18 4564190 0.7773858 0.1977036 0.5796822 0.10294267 5.631116 0.21725415 0.02952656 1.790467e-08 5.413180e-06 0.9999984710 chr18 4564237 0.7773858 0.1977036 0.5796822 0.15431085 3.756587 0.02931516 0.26238558 1.722462e-04 7.439396e-03 0.9990652dmls <- callDML(dmlTest, p.threshold=0.001) dmls chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr postprob.overThreshold450 chr18 3976129 0.01027497 0.939033927 -0.9287590 0.06544340 -14.191789 0.052591567 0.02428826 1.029974e-45 2.499403e-43 1.0000000451 chr18 3976138 0.01027497 0.939033927 -0.9287590 0.06544340 -14.191789 0.052591567 0.02428826 1.029974e-45 2.499403e-43 1.0000000638 chr18 4431501 0.01331553 0.943056638 -0.9297411 0.09273779 -10.025483 0.053172411 0.07746835 1.177826e-23 1.429096e-21 1.0000000639 chr18 4431511 0.01327049 0.943056638 -0.9297862 0.09270080 -10.029969 0.053121697 0.07746835 1.125518e-23 1.429096e-21 1.0000000710 chr18 4564237 0.91454619 0.011930005 0.9026162 0.05260037 17.159883 0.009528898 0.04942849 5.302004e-66 3.859859e-63 1.0000000782 chr18 4657576 0.98257334 0.067835497 0.9147378 0.06815000 13.422418 0.010424723 0.06755651 4.468885e-41 8.133371e-39 1.0000000582 chr18 4340682 0.95398081 0.030390730 0.9235901 0.10935874 8.445508 0.085494283 0.04540643 3.027264e-17 2.754810e-15 1.0000000583 chr18 4340709 0.95398081 0.030390730 0.9235901 0.10935874 8.445508 0.085494283 0.04540643 3.027264e-17 2.754810e-15 1.0000000340 chr18 3542732 0.95023554 0.034383112 0.9158524 0.11937407 7.672122 0.089137013 0.04474741 1.691739e-14 1.368429e-12 1.0000000395 chr18 3723448 0.06570765 0.751990744 -0.6862831 0.09825286 -6.984866 0.011958092 0.01646418 2.851278e-12 2.075730e-10 1.0000000188 chr18 3370113 0.01488553 0.787980174 -0.7730946 0.11380172 -6.793347 0.054190769 0.02752024 1.095611e-11 7.250956e-10 1.0000000400 chr18 3785543 0.79337804 0.118353679 0.6750244 0.12183251 5.540593 0.017720841 0.02442007 3.014493e-08 1.688116e-06 0.9999988683 chr18 4494490 0.77615275 0.104235359 0.6719174 0.12407577 5.415380 0.009219023 0.08210742 6.115879e-08 3.180257e-06 0.9999980783 chr18 4657592 0.96858371 0.266894590 0.7016891 0.13077255 5.365722 0.064228633 0.07721117 8.062618e-08 3.668491e-06 0.9999979642 chr18 4431618 0.43034706 0.965539700 -0.5351926 0.09578178 -5.587625 0.012064922 0.07497569 2.301966e-08 1.396526e-06 0.9999972189 chr18 3370141 0.01488553 0.676293642 -0.6614081 0.12554112 -5.268458 0.054190769 0.02138845 1.375746e-07 5.891428e-06 0.9999961738 chr18 4635185 0.44671015 0.007342128 0.4393680 0.08148745 5.391849 0.010069744 0.05154034 6.973627e-08 3.384534e-06 0.9999844330 chr18 3542403 0.02875655 0.714168320 -0.6854118 0.15294560 -4.481409 0.041892077 0.03368033 7.415192e-06 2.841190e-04 0.9999354185 chr18 3347936 0.49560648 0.017291187 0.4783153 0.10184167 4.696656 0.012173893 0.04533502 2.644552e-06 1.069574e-04 0.999898392 chr18 3217241 0.03876653 0.767643956 -0.7288774 0.17547605 -4.153714 0.038095316 0.03753383 3.271214e-05 9.602621e-04 0.9998319396 chr18 3723468 0.23043341 0.951779714 -0.7213463 0.18220738 -3.958930 0.173291593 0.07965750 7.528626e-05 1.838513e-03 0.999678640 chr18 3047980 0.07206183 0.658535570 -0.5864737 0.14396467 -4.073734 0.012517547 0.02615351 4.626536e-05 1.247451e-03 0.9996373614 chr18 4353399 0.01355584 0.547298573 -0.5337427 0.12855458 -4.151876 0.053422013 0.02055735 3.297603e-05 9.602621e-04 0.999630099 chr18 3226251 0.97443783 0.217343279 0.7570945 0.19807591 3.822244 0.012911947 0.27481269 1.322425e-04 3.105566e-03 0.9995532613 chr18 4353379 0.01355584 0.529341634 -0.5157858 0.13033377 -3.957422 0.053422013 0.02080043 7.576289e-05 1.838513e-03 0.9992902894 chr18 4921874 0.35815864 0.009172558 0.3489861 0.07955244 4.386868 0.009774839 0.05010151 1.149944e-05 3.986472e-04 0.9991255895 chr18 4921881 0.35815864 0.009172558 0.3489861 0.07955244 4.386868 0.009774839 0.05010151 1.149944e-05 3.986472e-04 0.9991255
注意,这里使用smoothing进行操作,事实上使用的示例数据我也不清楚是RRBS还是WGBS,但是使用dmlTest call DML会有更多的结果。
当然,用户也可以指定差异的阈值,只有差异大于阈值的才会被call出来。
dmls2 <- callDML(dmlTest, delta=0.1, p.threshold=0.001) head(dmls2) chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr postprob.overThreshold450 chr18 3976129 0.01027497 0.9390339 -0.9287590 0.06544340 -14.19179 0.052591567 0.02428826 1.029974e-45 2.499403e-43 1451 chr18 3976138 0.01027497 0.9390339 -0.9287590 0.06544340 -14.19179 0.052591567 0.02428826 1.029974e-45 2.499403e-43 1638 chr18 4431501 0.01331553 0.9430566 -0.9297411 0.09273779 -10.02548 0.053172411 0.07746835 1.177826e-23 1.429096e-21 1639 chr18 4431511 0.01327049 0.9430566 -0.9297862 0.09270080 -10.02997 0.053121697 0.07746835 1.125518e-23 1.429096e-21 1710 chr18 4564237 0.91454619 0.0119300 0.9026162 0.05260037 17.15988 0.009528898 0.04942849 5.302004e-66 3.859859e-63 1782 chr18 4657576 0.98257334 0.0678355 0.9147378 0.06815000 13.42242 0.010424723 0.06755651 4.468885e-41 8.133371e-39 1
4. 根据dmlTest Call DMR
dmrs <- callDMR(dmlTest.sm, p.threshold=0.01) dmrs chr start end length nCG meanMethy1 meanMethy2 diff.Methy areaStat23 chr18 4921637 4922059 423 27 0.11002940 0.01809674 0.09193266 86.295887 chr18 3507919 3508022 104 10 0.07524915 0.03294316 0.04230600 30.5594315 chr18 4340682 4340753 72 4 0.89237955 0.35052968 0.54184987 10.83526dmrs <- callDMR(dmlTest, p.threshold=0.01) dmrs chr start end length nCG meanMethy1 meanMethy2 diff.Methy areaStat27 chr18 4657576 4657639 64 4 0.506453 0.318348 0.188105 14.34236
我们可以发现,smoothing前后得到的结果差异还是很大,所以针对不同的实验类型我们需要注意是否使用smoothing。
同理,也可以使用的delta参数以及调整p.threshold得到合适的结果。
dmrs2 <- callDMR(dmlTest, delta=0.1, p.threshold=0.05) dmrs2 chr start end length nCG meanMethy1 meanMethy2 diff.Methy areaStat31 chr18 4657576 4657639 64 4 0.5064530 0.3183480 0.188105 14.3423619 chr18 4222533 4222608 76 4 0.7880276 0.3614195 0.426608 12.91667
5. 可视化
DSS包提供了一个不是很美观的可视化函数,用户其实可以使用coverage结果在R里面作图。
showOneDMR(dmrs[1,], BSobj)
结语
分析到此就告一段落了,随后就是介绍对差异甲基化区域的注释以及可视化文献作图。下一篇教程随后发。
作者:ChevyXu
链接:https://www.jianshu.com/p/971ff2ee533a
共同学习,写下你的评论
评论加载中...
作者其他优质文章