4 Answers
Contributed 1877 experience points, received more than 6 likes
Use pd.eval together with Series.str.replace.
df['number_of_hrs'] = pd.eval(df['number_of_hrs'].str.replace('DAY','*24'))
print(df)
# number_of_hrs number_of_pts
#0 65 1
#1 7 1
#2 31 1
#3 144 1
#4 23 1
#5 21 1
#6 5 1
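Or evaluate each expression separately:
# ast.literal_eval only parses literals and raises ValueError on '6 *24',
# so pd.eval is applied to each element instead.
df['number_of_hrs'] = df['number_of_hrs'].str.replace('DAY','*24').apply(pd.eval)
# Alternative
#df['number_of_hrs'] = [pd.eval(s) for s in df['number_of_hrs'].str.replace('DAY','*24')]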
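If you would rather not evaluate strings as expressions at all, the same conversion can be done with explicit parsing. This is a minimal sketch, assuming every non-numeric cell has the form 'N DAY'; `to_hours` and `DAY_PATTERN` are hypothetical names used for illustration, not part of pandas.

```python
import re

# Hypothetical helper pattern: a number followed by the word DAY, in any case.
DAY_PATTERN = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*DAY\s*$', re.IGNORECASE)

def to_hours(value):
    """Return the value in hours; strings like '6 DAY' become 6 * 24."""
    if isinstance(value, str):
        match = DAY_PATTERN.match(value)
        if match:
            return float(match.group(1)) * 24
        # Assumption: any other string is a plain number such as '65'.
        return float(value)
    # Numeric cells are already in hours and pass through unchanged.
    return value

values = [65, 7, 31, '6 DAY', 23, 21, 5.0]
hours = [to_hours(v) for v in values]
```

With a DataFrame, the helper could be applied as df['number_of_hrs'] = df['number_of_hrs'].apply(to_hours).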
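Contributed 1809 experience points, received more than 8 likes
Use .loc and str.extract
A regular expression pattern gives you more flexibility, but ansev's pd.eval solution is neater.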
# Rows that mention days, case-insensitively; na=False skips non-string cells
mask = df['number_of_hrs'].str.contains('day', case=False, na=False)
# Raw string keeps \d and \s intact; (?i) matches DAY in any case
pat = r'(?i)(\d+)\s+DAY'
hrs = df.loc[mask, 'number_of_hrs'].str.extract(pat)[0].astype(int) * 24
df.loc[mask, 'number_of_hrs'] = hrs
print(df)
number_of_hrs number_of_pts
0 65 1
1 7 1
2 31 1
3 144 1
4 23 1
5 21 1
6 5.0 1
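The extract approach can also be written without a mask by letting the rows that do not match fall back to a plain numeric parse. This is a sketch only, and it assumes the column holds strings throughout; the sample data is made up to mirror the output above.

```python
import pandas as pd

# Assumed sample data: every cell is a string, one of them in days.
df = pd.DataFrame({
    'number_of_hrs': ['65', '7', '31', '6 DAY', '23', '21', '5'],
    'number_of_pts': [1, 1, 1, 1, 1, 1, 1],
})

# Capture the digits before "DAY" (any case); rows without a match give NaN.
days = df['number_of_hrs'].str.extract(r'(?i)(\d+)\s*DAY', expand=False)

# Matched rows become days * 24; unmatched rows are parsed as plain numbers.
from_days = pd.to_numeric(days) * 24
plain = pd.to_numeric(df['number_of_hrs'], errors='coerce')
df['number_of_hrs'] = from_days.fillna(plain).astype(int)
```

The result is an integer column, so later arithmetic does not have to deal with mixed types.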
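Contributed 1757 experience points, received more than 8 likes
My guess is that this is a timedelta, so you can get the number of seconds and convert it to hours, as follows: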
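mask = df['number_of_hrs'].str.contains('day', case=False, na=False)  # rows such as '6 DAY'
df.loc[mask, 'number_of_hrs'] = pd.to_timedelta(df.loc[mask, 'number_of_hrs'].str.lower()).dt.total_seconds() // 3600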
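One detail to watch if you take the timedelta route: the seconds attribute holds only the seconds left over after whole days are removed, whereas total_seconds() returns the full duration. A small sketch with a hard-coded six-day value, chosen here purely for illustration:

```python
import pandas as pd

delta = pd.Timedelta('6 days')

# Seconds component only: whole days are stored separately in delta.days.
component = delta.seconds

# Full duration in seconds, which is what the hour conversion needs.
total = delta.total_seconds()
hours = int(total // 3600)
```

Here component is 0 while hours is 144, so dividing the seconds attribute by 3600 would not give the intended result for values expressed in whole days.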
Contributed 1886 experience points, received more than 2 likes
Another solution:
import pandas as pd
import re
Data:
df = pd.DataFrame({'number_of_hrs':[65,7,31,'6 DAY', 23,21,5.0], 'number of pts':[1,1,1,1,1,1,1]})
Code:
df['number_of_hrs'] = pd.eval(df['number_of_hrs'].apply(lambda x: re.sub(r' DAY', '*24', str(x))))