4 Answers
Contributed 1966 experience points · Received 4+ likes
You can add this code after splitting the Game column:
df['Away'] = df['Away'].astype(str).str[:-4]
df['Home'] = df['Home'].astype(str).str[:-4]
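As a rough illustration of what the fixed-width slice does, here is a minimal sketch on made-up values. The " (n)" suffix layout is an assumption, not taken from the real table. Because the slice always drops exactly four characters, a longer suffix such as " (10)" leaves a trailing space behind.

```python
import pandas as pd

# Hypothetical values; the " (n)" suffix layout is an assumption.
teams = pd.Series(["BOS (1)", "MIA (2)", "LAC (10)"])

# Drop exactly the last four characters of every value.
sliced = teams.astype(str).str[:-4]

# The first two values are cleaned; the third keeps a trailing space
# because its suffix is five characters long.
print(sliced.tolist())
```

If the suffix length can vary, a pattern-based cleanup is likely the safer choice.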
Contributed 1805 experience points · Received 9+ likes
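# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df['Away'] = df['Away'].str.replace(r'\(\d*\)', '', regex=True).str.strip()
df['Home'] = df['Home'].str.replace(r'\(\d*\)', '', regex=True).str.strip()
print(df.head())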
         Date      Time Away Home Network
0  12/25/2011     12 PM  BOS   NY     TNT
1  12/25/2011   2:30 PM  MIA  DAL     ABC
2  12/25/2011      5 PM  CHI  LAL     ABC
3  12/25/2011      8 PM  ORL  OKC    ESPN
4  12/25/2011  10:30 PM  LAC   GS    ESPN
Contributed 1830 experience points · Received 3+ likes
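Don't split the Game column on 'at', and don't specify a separator at all. Called without a separator, .split() splits on every run of whitespace, and then you only need the values at index 0 and index 3. So you really only need to change one line of code:
From df[['Away','Home']] = df.Game.str.split('at',expand=True)
to df[['Away','Home']] = df.Game.str.split(expand=True)[[0,3]]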
import pandas as pd
import numpy as np
df = pd.read_html("https://www.sportsmediawatch.com/2011/12/revised-2011-12-nba-national-tv-schedule/", header=0)[0]
revisedCols = ['Date'] + [ col for col in df.columns if 'Revised' in col ]
df = df[revisedCols]
df.columns = df.iloc[0,:]
df = df.iloc[1:,:].reset_index(drop=True)
# Format Date to m/d/y
df['Date'] = np.where(df.Date.str.startswith(('10/', '11/', '12/')), df.Date + ' 11', df.Date + ' 12')
df['Date']=pd.to_datetime(df['Date'])
df['Date']=df['Date'].dt.strftime('%m/%d/%Y')
# Split the Game column
df[['Away','Home']] = df.Game.str.split(expand=True)[[0,3]]
# Final dataframe with desired columns
df = df[['Date','Time','Away','Home','Net']]
df.columns = ['Date', 'Time', 'Away', 'Home', 'Network']
print(df)
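To see the whitespace split in isolation, without fetching the page, here is a minimal sketch on made-up rows. The "TEAM (n) at TEAM (n)" layout is an assumption inferred from the index 0 and index 3 positions mentioned above. The real Game column may differ.

```python
import pandas as pd

# Made-up rows; the "TEAM (n) at TEAM (n)" layout is an assumption.
games = pd.DataFrame({"Game": ["BOS (1) at NY (1)", "MIA (1) at DAL (1)"]})

# With no separator given, str.split splits on runs of whitespace.
parts = games["Game"].str.split(expand=True)

# Token 0 is the away team and token 3 is the home team.
games["Away"] = parts[0]
games["Home"] = parts[3]

print(games[["Away", "Home"]])
```

Assigning the two columns one at a time keeps the mapping from token position to column name explicit.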
Contributed 1827 experience points · Received 8+ likes
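You can use str.replace to remove the parentheses and digits, followed by str.strip, since there also seem to be some spaces at the beginning or end: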
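# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
df['Away'] = df['Away'].str.replace(r'\(\d*\)', '', regex=True).str.strip()
df['Home'] = df['Home'].str.replace(r'\(\d*\)', '', regex=True).str.strip()
print(df.head())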
         Date      Time Away Home Network
0  12/25/2011     12 PM  BOS   NY     TNT
1  12/25/2011   2:30 PM  MIA  DAL     ABC
2  12/25/2011      5 PM  CHI  LAL     ABC
3  12/25/2011      8 PM  ORL  OKC    ESPN
4  12/25/2011  10:30 PM  LAC   GS    ESPN
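The same cleanup can be tried on a small standalone Series. This is only a sketch: the sample values, including the stray spaces, are invented to resemble what the columns might hold after the split.

```python
import pandas as pd

# Invented values with stray whitespace; assumed to resemble the split columns.
teams = pd.Series(["BOS (1) ", " NY (12)", "GS"])

# Remove a parenthesised run of digits, then trim leading/trailing spaces.
# regex=True is passed explicitly so the pattern is not treated as a literal.
cleaned = teams.str.replace(r"\(\d*\)", "", regex=True).str.strip()

print(cleaned.tolist())
```

Unlike a fixed-width slice, the pattern handles any number of digits and leaves values without a suffix untouched.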