3 Answers
Contributor with 1783 experience points, 4+ likes
Use:
m1 = df['A'] == 'X'
g = m1.cumsum()
m = (df['A'] == '') | m1
df = df[~m.groupby(g).transform('all')]
print(df)
      A    B    C
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2
Details:
m1 = df['A'] == 'X'
g = m1.cumsum()
m = (df['A'] == '') | m1
print(pd.concat([df,
                 df['A'] == 'X',
                 m1.cumsum(),
                 (df['A'] == ''),
                 m,
                 m.groupby(g).transform('all'),
                 ~m.groupby(g).transform('all')], axis=1,
                keys=['orig', '==X', 'g', '==space', 'm', 'all', 'inverted all']))
   orig              ==X  g  ==space      m    all  inverted all
      A    B    C      A  A        A      A      A             A
0     X  Val  Val   True  1    False   True   True         False
1          1    2  False  1     True   True   True         False
2          3    4  False  1     True   True   True         False
3     X  Val  Val   True  2    False   True  False          True
4   Foo    1    2  False  2    False  False  False          True
5          3    4  False  2     True   True  False          True
6     X  Val  Val   True  3    False   True  False          True
7   Fou    1    2  False  3    False  False  False          True
8          3    4  False  3     True   True  False          True
9     X  Val  Val   True  4    False   True  False          True
10  Bar    1    2  False  4    False  False  False          True
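Explanation:
1. Compare column A with X and take the cumulative sum, so that each group starts at an X row; store the group labels in g.
2. Chain two boolean masks, the comparison with X and the comparison with the empty string, into m.
3. Use groupby with transform and DataFrameGroupBy.all to return True for every row of a group that contains only True values.
4. Finally, invert the mask and filter by boolean indexing.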
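The four steps above can be sketched as one self-contained script. The helper name `drop_header_only_groups` and its parameters are illustrative assumptions rather than part of the answer; the sample data is the one defined in the third answer on this page.

```python
import pandas as pd

# Sample data as defined in the third answer on this page.
df = pd.DataFrame({
    'A': ['X', '', '', 'X', 'Foo', '', 'X', 'Fou', '', 'X', 'Bar'],
    'B': ['Val', 1, 3, 'Val', 1, 3, 'Val', 1, 3, 'Val', 1],
    'C': ['Val', 2, 4, 'Val', 2, 4, 'Val', 2, 4, 'Val', 2],
})


def drop_header_only_groups(frame, col='A', header='X', empty=''):
    """Drop every group that consists only of a header row and empty rows.

    Hypothetical helper: the name and parameters are chosen for illustration.
    """
    is_header = frame[col] == header              # step 1: mark the X rows
    group_id = is_header.cumsum()                 # step 1: a new group starts at each X
    header_or_empty = is_header | (frame[col] == empty)  # step 2: combined mask
    # Step 3: True for all rows of a group whose rows are all header/empty.
    only_header_or_empty = header_or_empty.groupby(group_id).transform('all')
    # Step 4: invert the mask and filter by boolean indexing.
    return frame[~only_header_or_empty]


result = drop_header_only_groups(df)
print(result)
```

With this data the first group (rows 0 to 2) is removed as a whole, including its X row, which matches the output shown at the top of this answer.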
Contributor with 1804 experience points, 7+ likes
Here is a solution:
(df['A'] == 'X').shift()
0 NaN
1 True
2 False
3 False
4 True
5 False
6 False
7 True
8 False
9 False
10 True
Name: A, dtype: object
In [15]:
(df['A'] == '')
Out[15]:
0 False
1 True
2 True
3 False
4 False
5 True
6 False
7 False
8 True
9 False
10 False
Name: A, dtype: bool
In [14]:
((df['A'] == '') & (df['A'] == 'X').shift())
Out[14]:
0 False
1 True
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
Name: A, dtype: bool
The result is:
df[~((df['A'] == '') & (df['A'] == 'X').shift())]
Out[16]:
      A    B    C
0     X  Val  Val
2          3    4
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2
Edit: if needed, you can repeat this in a while loop until no more rows are removed.
old_size_df = df.size
new_size_df = 0
while old_size_df != new_size_df:
    old_size_df = df.size
    df = df[~((df['A'] == '') & (df['A'] == 'X').shift())]
    new_size_df = df.size
      A    B    C
0     X  Val  Val
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2
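A possible variant of the loop above, as a minimal sketch: passing `fill_value=False` to `shift` keeps the shifted mask as plain booleans instead of an object column that starts with NaN. The function name `drop_empty_after_header` and its parameters are hypothetical, and the sample data is the one defined in the third answer on this page.

```python
import pandas as pd

# Sample data as defined in the third answer on this page.
df = pd.DataFrame({
    'A': ['X', '', '', 'X', 'Foo', '', 'X', 'Fou', '', 'X', 'Bar'],
    'B': ['Val', 1, 3, 'Val', 1, 3, 'Val', 1, 3, 'Val', 1],
    'C': ['Val', 2, 4, 'Val', 2, 4, 'Val', 2, 4, 'Val', 2],
})


def drop_empty_after_header(frame, col='A', header='X', empty=''):
    """Repeatedly drop empty rows that directly follow a header row.

    Hypothetical helper: the name and parameters are chosen for illustration.
    """
    while True:
        # True where the previous remaining row is a header; the first row gets False.
        prev_is_header = (frame[col] == header).shift(fill_value=False)
        to_drop = (frame[col] == empty) & prev_is_header
        if not to_drop.any():
            return frame          # nothing left to remove
        frame = frame[~to_drop]   # remove one layer of rows, then check again


result = drop_empty_after_header(df)
print(result)
```

Unlike the groupby approach in the first answer, this keeps the leading X row at index 0 and removes only the empty rows that follow it, as in the output above.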
Contributor with 1796 experience points, 4+ likes
Here is a solution with a custom apply function:
import pandas as pd

d = ({
    'A' : ['X','','','X','Foo','','X','Fou','','X','Bar'],
    'B' : ['Val',1,3,'Val',1,3,'Val',1,3,'Val',1],
    'C' : ['Val',2,4,'Val',2,4,'Val',2,4,'Val',2],
})
df = pd.DataFrame(data=d)

# Remembers whether the last non-removed row was an X row.
is_x = False

def fill_empty_a(row):
    global is_x
    if row['A'] == '' and is_x:
        # Empty row right after an X row: mark it for dropna().
        row['A'] = None
    else:
        is_x = row['A'] == 'X'
    return row

(df.apply(fill_empty_a, axis=1)
   .dropna()
   .reset_index(drop=True))
#      A    B    C
# 0    X  Val  Val
# 1    X  Val  Val
# 2  Foo    1    2
# 3         3    4
# 4    X  Val  Val
# 5  Fou    1    2
# 6         3    4
# 7    X  Val  Val
# 8  Bar    1    2
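The same row-by-row rule can also be sketched without a global variable, by building a keep mask in a plain loop over column A. This is an alternative formulation rather than the answer's own code, and the function name `keep_mask_after_header` and its parameters are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['X', '', '', 'X', 'Foo', '', 'X', 'Fou', '', 'X', 'Bar'],
    'B': ['Val', 1, 3, 'Val', 1, 3, 'Val', 1, 3, 'Val', 1],
    'C': ['Val', 2, 4, 'Val', 2, 4, 'Val', 2, 4, 'Val', 2],
})


def keep_mask_after_header(values, header='X', empty=''):
    """Return a boolean Series: False for empty values that follow a header.

    Hypothetical helper: the name and parameters are chosen for illustration.
    """
    keep = []
    after_header = False  # local state instead of a module-level global
    for value in values:
        if value == empty and after_header:
            keep.append(False)  # empty row in a run that started at a header
        else:
            after_header = value == header
            keep.append(True)
    return pd.Series(keep, index=values.index)


result = df[keep_mask_after_header(df['A'])].reset_index(drop=True)
print(result)
```

Because the state lives inside the function, calling it a second time starts from a clean state, and rows with missing values in other columns are not dropped as a side effect of `dropna()`.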