I have a data.table dt. It is sorted first by the column date (my grouping variable) and then by the column age:

library(data.table)
setkeyv(dt, c("date", "age")) # Sorts table first by column "date" then by "age"
> dt
         date age     name
1: 2000-01-01   3   Andrew
2: 2000-01-01   4      Ben
3: 2000-01-01   5  Charlie
4: 2000-01-02   6     Adam
5: 2000-01-02   7      Bob
6: 2000-01-02   8 Campbell

My question: is it possible to extract the first two rows for each unique date? Or, more generally: how do I extract the first n rows within each group? In this example the result dt.f would be:

> dt.f = ???????? # function of dt to extract the first 2 rows per unique date
> dt.f
         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob

P.S. Here is the code that creates the data.table above:

install.packages("data.table")
library(data.table)
date <- c("2000-01-01","2000-01-01","2000-01-01",
          "2000-01-02","2000-01-02","2000-01-02")
age <- c(3,4,5,6,7,8)
name <- c("Andrew","Ben","Charlie","Adam","Bob","Campbell")
dt <- data.table(date, age, name)
setkeyv(dt,c("date","age")) # Sorts table first by column "date" then by "age"
2 Answers
Yes, just use .SD and index it as needed (DT here is the question's dt):
DT[, .SD[1:2], by=date]
         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob
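For the general "first n rows per group" case, a minimal sketch is given here, rebuilding the table from the question. The variable n is a name introduced only for illustration, and head() is swapped in for the 1:2 index because .SD[1:n] pads a group with NA rows when it has fewer than n rows, whereas head() returns at most n.

```r
library(data.table)

# Rebuild the table from the question.
dt <- data.table(
  date = c("2000-01-01", "2000-01-01", "2000-01-01",
           "2000-01-02", "2000-01-02", "2000-01-02"),
  age  = c(3, 4, 5, 6, 7, 8),
  name = c("Andrew", "Ben", "Charlie", "Adam", "Bob", "Campbell")
)
setkeyv(dt, c("date", "age"))

n <- 2  # illustrative parameter: number of rows to keep per group

# head() returns at most n rows, so short groups are not padded with NA.
dt.f <- dt[, head(.SD, n), by = date]
print(dt.f)
```

Because the table is keyed on date and age, "first" here means the n smallest ages within each date.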
Edit, following @eddi's suggestion.
@eddi's suggestion was:
Use this instead for speed:
DT[DT[, .I[1:2], by = date]$V1]
# using a slightly larger data set
> microbenchmark(SDstyle=DT[, .SD[1:2], by=date], IStyle=DT[DT[, .I[1:2], by = date]$V1], times=200L)
Unit: milliseconds
    expr       min        lq    median        uq      max neval
 SDstyle 13.567070 16.224797 22.170302 24.239881 88.26719   200
  IStyle  1.675185  2.018773  2.168818  2.269292 11.31072   200
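To show what the .I version does step by step, here is a small sketch. The data set is made up for illustration (it is not the benchmark data), and n and idx are hypothetical names. head() is again used in place of 1:2 so that a group with fewer than n rows does not produce NA row numbers.

```r
library(data.table)

# Made-up data with uneven group sizes (not from the original post).
DT <- data.table(
  date = c("2000-01-01", "2000-01-02", "2000-01-02", "2000-01-02"),
  age  = c(3, 6, 7, 8)
)
setkeyv(DT, c("date", "age"))

n <- 2  # illustrative parameter: number of rows to keep per group

# .I holds each group's row numbers within DT; head() keeps at most n of them.
# The unnamed result column is called V1.
idx <- DT[, head(.I, n), by = date]$V1

# Subset the original table once, using the collected row numbers.
res <- DT[idx]
print(res)
```

The inner call only gathers row numbers per group and the table is subset a single time afterwards, which is a plausible reason for the speed difference in the timings above.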