3 answers
Contributor with 1810 experience points, received more than 4 upvotes
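Piping head into tail can be slow for a huge file. I would suggest sed instead:
sed 'NUMq;d' file
where NUM is the number of the line you want to print. For example, to print the 10th line of file:
sed '10q;d' file
Explanation: NUMq quits as soon as the line number is NUM. d deletes each line instead of printing it. On line NUM, d is never reached, because q prints the current line and skips the rest of the script when it quits.
If NUM is held in a shell variable, use double quotes instead of single quotes so that the variable is expanded:
sed "${NUM}q;d" file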
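The sed one-liner above can be wrapped in a small shell function so that the line number is validated before it is spliced into the sed script. This is only a sketch: the function name nth_line and its exit status 2 for bad input are made up for this example and are not part of the original answer.

```shell
# nth_line NUM FILE: print line NUM of FILE (hypothetical helper name).
nth_line() {
  # Reject anything that is not a positive integer without leading zeros,
  # so that only digits ever end up inside the sed script.
  case $1 in
    ''|*[!0-9]*|0*)
      echo "nth_line: NUM must be a positive integer" >&2
      return 2
      ;;
  esac
  # Quit at line NUM (printing it); delete every earlier line.
  sed "${1}q;d" "$2"
}
```

For example, nth_line 10 file prints the 10th line of file. If file has fewer than NUM lines, sed deletes every line and nothing is printed.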
Contributor with 1878 experience points, received more than 4 upvotes
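Setup
I only need to extract a subset of the lines to do anything useful with the data. Reading every line up to the one I care about is going to take a long time. If a solution reads past the line I care about and keeps reading the rest of the file, it wastes time reading nearly 3 billion irrelevant lines and takes 6 times longer than necessary.
I use time to benchmark each command.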
Baseline
First, the head | tail solution:
$ time head -50000000 myfile.ascii | tail -1
pgm_icnt = 0

real    1m15.321s
cut
$ time cut -f50000000 -d$'\n' myfile.ascii
pgm_icnt = 0

real    5m12.156s
awk
With exit, so that awk stops after printing the target line:
$ time awk 'NR == 50000000 {print; exit}' myfile.ascii
pgm_icnt = 0

real    1m16.583s
perl
$ time perl -wnl -e '$.== 50000000 && print && exit;' myfile.ascii
pgm_icnt = 0

real    1m13.146s
sed
$ time sed "50000000q;d" myfile.ascii
pgm_icnt = 0

real    1m12.705s
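Before timing commands like these on a huge file, it can help to confirm on a small generated file that they return the same line. The following is only a sketch: the temporary file, its contents, and the line number 50000 are made up for illustration, and cut is left out because its handling of a newline delimiter may differ between implementations.

```shell
# Hypothetical test data: line N of the file reads "line N".
f=$(mktemp)
seq 1 100000 | sed 's/^/line /' > "$f"
n=50000

# The same three approaches as in the benchmark, with the line number in a variable.
r_head=$(head -n "$n" "$f" | tail -n 1)
r_awk=$(awk -v n="$n" 'NR == n {print; exit}' "$f")
r_sed=$(sed "${n}q;d" "$f")

# Report whether the three approaches agree.
if [ "$r_head" = "$r_awk" ] && [ "$r_head" = "$r_sed" ]; then
  echo "all agree: $r_head"
else
  echo "mismatch: head|tail='$r_head' awk='$r_awk' sed='$r_sed'" >&2
fi
rm -f "$f"
```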
mapfile
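No command or timing for this entry survives on this page. As a hedged illustration only, here is one way the bash builtin mapfile (bash 4 or later) can fetch a single line. The function name nth_line_mapfile is made up for this example, and it is not necessarily the command the author measured.

```shell
# nth_line_mapfile NUM FILE: print line NUM of FILE with the bash builtin
# mapfile (hypothetical helper name; requires bash 4 or later).
nth_line_mapfile() {
  local num=$1 file=$2 lines
  # -s skips the first NUM-1 lines, -n 1 reads a single line,
  # -t strips its trailing newline.
  mapfile -t -s "$((num - 1))" -n 1 lines < "$file"
  # If the file has fewer than NUM lines the array is empty:
  # print nothing and return a non-zero status.
  [ "${#lines[@]}" -gt 0 ] && printf '%s\n' "${lines[0]}"
}
```

For example, nth_line_mapfile 50000000 myfile.ascii would correspond to the lookups timed above.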
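Conclusion
For the most part, it is hard to improve on the head | tail solution. At best, sed is about 3% faster (percentages are calculated as % = (runtime/baseline - 1) * 100).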
Line 50,000,000
00:01:12.705 (-00:00:02.616 = -3.47%)    sed
00:01:13.146 (-00:00:02.175 = -2.89%)    perl
00:01:15.321 (+00:00:00.000 = +0.00%)    head|tail
00:01:16.583 (+00:00:01.262 = +1.68%)    awk
00:05:12.156 (+00:03:56.835 = +314.43%)  cut

Line 500,000,000
00:12:07.050 (-00:00:26.160)  sed
00:12:11.460 (-00:00:21.750)  perl
00:12:33.210 (+00:00:00.000)  head|tail
00:12:45.830 (+00:00:12.620)  awk
00:52:01.560 (+00:40:31.650)  cut

Line 3,338,559,320
01:20:54.599 (-00:03:05.327)  sed
01:21:24.045 (-00:02:25.227)  perl
01:23:49.273 (+00:00:00.000)  head|tail
01:25:13.548 (+00:02:35.735)  awk
05:47:23.026 (+04:24:26.246)  cut
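
The percentages listed for line 50,000,000 follow from the formula % = (runtime/baseline - 1) * 100. A minimal sketch of that arithmetic, with the timings converted to seconds by hand. The function name pct_vs_baseline is made up for this example.

```shell
# pct_vs_baseline RUNTIME BASELINE: both arguments in seconds
# (hypothetical helper name). Prints the signed percentage difference.
pct_vs_baseline() {
  awk -v r="$1" -v b="$2" 'BEGIN { printf "%+.2f%%\n", (r / b - 1) * 100 }'
}

# Baseline is head|tail at 1m15.321s = 75.321 s.
pct_vs_baseline 72.705 75.321    # sed, 1m12.705s  -> -3.47%
pct_vs_baseline 312.156 75.321   # cut, 5m12.156s  -> +314.43%
```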