Summarizing the contents of a text file

Python

jeck猫 2021-08-11 21:41:15

I have a text file like this example:

chrX 7970000 8670000 3 2 7 7 RPS6KA6 4
chrX 7970000 8670000 3 2 7 7 SATL1 3
chrX 7970000 8670000 3 2 7 7 SH3BGRL 4
chrX 7970000 8670000 3 2 7 7 VCX2 1
chrX 86580000 86980000 1 1 1 5 KLHL4 2
chrX 87370000 88620000 4 4 11 11 CPXCR1 2
chrX 87370000 88620000 4 4 11 11 FAM9A 2
chrX 89050000 91020000 11 6 10 13 FAM9B 3
chrX 89050000 91020000 11 6 10 13 PABPC5 2

I want to count how many times each line is repeated, comparing only the 1st, 2nd and 3rd columns. The output will have 5 columns. The first 3 columns stay the same (each combination appears only once), while the 4th column holds several strings on the same line (these strings come from the 8th column of the original file). The 5th column is the number of times the first 3 columns are repeated in the original file.

In short: columns 4, 5, 6, 7 and 9 of the input file are useless for the output file. We should count the lines whose first 3 columns are the same, so in the output file the first 3 columns are the same as in the input file (but listed only once). The 5th column is the number of times the line is repeated. The 4th column of the output is all the strings from the 8th column of those repeated lines.

In the expected output, this line is repeated 4 times: chrX 7970000 8670000. So the 5th column is 4 and the 4th column is: RPS6KA6,SATL1,SH3BGRL,VCX2. As you can see, the entries in the 4th column are comma separated.

This is the expected output:

chrX 7970000 8670000 RPS6KA6,SATL1,SH3BGRL,VCX2 4
chrX 86580000 86980000 KLHL4 1
chrX 87370000 88620000 CPXCR1,FAM9A 2
chrX 89050000 91020000 FAM9B,PABPC5 2

I tried to do this in Python and wrote the following code:

file = open("myfile.txt", 'rb')
infile = []
for line in file:
    infile.append(line)
count = 0
final = []
for i in range(len(infile)):
    count += 1
    if infile[i-1] == infile[i]
        final.append(infile[0,1,2,7, count])

This code does not return what I want. Do you know how to fix it?


3 answers

喵喔喔


This should do what you want:

from collections import defaultdict  # 1

# Read the file, strip trailing newlines and split each line on whitespace
with open('file.txt') as f:
    lines = [line.rstrip().split() for line in f]  # 2

counter = defaultdict(list)  # 3

for line in lines:
    # Key: columns 1-3; value: the column 8 entries seen for that key
    counter[(line[0], line[1], line[2])].append(line[7])  # 4

for key, value in counter.items():  # 5
    print('{} {} {}'.format(' '.join(key), ','.join(value), len(value)))  # 6

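Explanation (the numbers match the comments in the code):

1. We use a handy library class that gives us a dictionary with default values.
2. Read the whole input file, strip the trailing newline from each line and split it into fields (on whitespace).
3. Create a dictionary whose default value for any key is an empty list.
4. Iterate over the lines and fill the dictionary:
   - Columns 1-3 form the key.
   - Each string from column 8 is appended to the list for that key (without a defaultdict of list, this would be more complicated).
5. Iterate over the key-value pairs of the dictionary.
6. Print the output, joining the data structures into the required format.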
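To make the steps above easier to try without a file on disk, here is a minimal sketch that wraps the same grouping logic in a function and runs it on the sample rows from the question. The function name `summarize` and the choice to skip lines with fewer than 8 fields are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict


def summarize(rows):
    """Group whitespace-separated rows by columns 1-3 and collect column 8."""
    groups = defaultdict(list)
    for row in rows:
        fields = row.split()
        if len(fields) < 8:
            # Assumption: blank or short lines are silently skipped
            continue
        groups[tuple(fields[:3])].append(fields[7])
    # dicts preserve insertion order in Python 3.7+, so the groups come
    # out in the order their key first appeared in the input
    return ['{} {} {}'.format(' '.join(key), ','.join(names), len(names))
            for key, names in groups.items()]


sample = [
    "chrX 7970000 8670000 3 2 7 7 RPS6KA6 4",
    "chrX 7970000 8670000 3 2 7 7 SATL1 3",
    "chrX 7970000 8670000 3 2 7 7 SH3BGRL 4",
    "chrX 7970000 8670000 3 2 7 7 VCX2 1",
    "chrX 86580000 86980000 1 1 1 5 KLHL4 2",
    "chrX 87370000 88620000 4 4 11 11 CPXCR1 2",
    "chrX 87370000 88620000 4 4 11 11 FAM9A 2",
    "chrX 89050000 91020000 11 6 10 13 FAM9B 3",
    "chrX 89050000 91020000 11 6 10 13 PABPC5 2",
]

for line in summarize(sample):
    print(line)
```

Because the function accepts any iterable of strings, an open file object can be passed directly, for example `summarize(open('file.txt'))`. With a plain dict instead of `defaultdict`, the append in the loop could be written as `groups.setdefault(key, []).append(fields[7])`.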

Hope this helps 🙂.

2021-08-11
