Given the description, I suggest using pd.concat or merge. Here is a test example:
import pandas as pd
#generating test data
index1 = pd.date_range('1/1/2000', periods=9, freq='D')
index2 = pd.date_range('1/4/2000', periods=9, freq='D')
series = range(9)
df1 = pd.DataFrame([index1,series]).T
df2 = pd.DataFrame([index2,series]).T
df1.columns = ['Time','Data']
df2.columns = ['Time','Data']
df1:
Time Data
0 2000-01-01 00:00:00 0
1 2000-01-02 00:00:00 1
2 2000-01-03 00:00:00 2
3 2000-01-04 00:00:00 3
4 2000-01-05 00:00:00 4
5 2000-01-06 00:00:00 5
6 2000-01-07 00:00:00 6
7 2000-01-08 00:00:00 7
8 2000-01-09 00:00:00 8
df2:
Time Data
0 2000-01-04 00:00:00 0
1 2000-01-05 00:00:00 1
2 2000-01-06 00:00:00 2
3 2000-01-07 00:00:00 3
4 2000-01-08 00:00:00 4
5 2000-01-09 00:00:00 5
6 2000-01-10 00:00:00 6
7 2000-01-11 00:00:00 7
8 2000-01-12 00:00:00 8
Note that the two DataFrames hold data for different dates.
#convert Time to pandas datetime format
#df1['Time'] = pd.to_datetime(df1['Time']) # <- uncomment this for your case
#df2['Time'] = pd.to_datetime(df2['Time']) # <- uncomment this for your case
#making the time the index of the dataframes
df1.set_index(['Time'],inplace=True)
df2.set_index(['Time'],inplace=True)
#concatenating the dataframe column wise (axis=1)
df3 = pd.concat([df1,df2],axis=1)
print(df3)
Output:
Data Data
Time
2000-01-01 0 NaN
2000-01-02 1 NaN
2000-01-03 2 NaN
2000-01-04 3 0
2000-01-05 4 1
2000-01-06 5 2
2000-01-07 6 3
2000-01-08 7 4
2000-01-09 8 5
2000-01-10 NaN 6
2000-01-11 NaN 7
2000-01-12 NaN 8
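For the merge alternative mentioned at the top, here is a minimal sketch. It assumes Time is kept as a regular column rather than the index, and the suffixes '_1' and '_2' are names picked only for illustration. The test data is built from a dict here, which is an assumption of this sketch and not the construction used above.

```python
import pandas as pd

# Same dates and values as the test data above, built from a dict
df1 = pd.DataFrame({'Time': pd.date_range('1/1/2000', periods=9, freq='D'),
                    'Data': range(9)})
df2 = pd.DataFrame({'Time': pd.date_range('1/4/2000', periods=9, freq='D'),
                    'Data': range(9)})

# An outer join on Time keeps every date found in either frame;
# the suffixes (illustrative names) tell the two Data columns apart
merged = pd.merge(df1, df2, on='Time', how='outer', suffixes=('_1', '_2'))

# Sort explicitly so the rows come out in date order
merged = merged.sort_values('Time').reset_index(drop=True)
print(merged)
```

As with pd.concat, dates present in only one frame end up with NaN in the other frame's column. Using how='inner' instead would keep only the dates common to both.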
Handling missing values:
pd.concat aligns the two DataFrames on their index. NaN marks the values that are missing after combining. These can be handled mainly with fillna (putting something in place of NaN) or dropna (dropping the rows that contain NaN). Here is an example of fillna (dropna is used the same way, but without the fill value):
#filling 0 in place of NaN. You can also use bfill(), ffill() or interpolate()
#note: do not assign the result of fillna(..., inplace=True), it returns None
df3 = df3.fillna(0)
#df3 = df3.bfill() # <- uncomment if you want to backward-fill instead
#df3 = df3.ffill() # <- uncomment if you want to forward-fill instead
Output:
Data Data
Time
2000-01-01 0 0
2000-01-02 1 0
2000-01-03 2 0
2000-01-04 3 0
2000-01-05 4 1
2000-01-06 5 2
2000-01-07 6 3
2000-01-08 7 4
2000-01-09 8 5
2000-01-10 0 6
2000-01-11 0 7
2000-01-12 0 8
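The other options mentioned above (dropna, forward fill, backward fill) can be compared side by side. This is only a sketch: the column names Data1 and Data2 are made up here so the two columns can be told apart, and the frames are rebuilt with the dates as the index directly.

```python
import pandas as pd

# Rebuild the two frames with the dates as index (same dates and values as above)
df1 = pd.DataFrame({'Data1': range(9)},
                   index=pd.date_range('1/1/2000', periods=9, freq='D'))
df2 = pd.DataFrame({'Data2': range(9)},
                   index=pd.date_range('1/4/2000', periods=9, freq='D'))
df3 = pd.concat([df1, df2], axis=1)

# dropna keeps only the dates that have a value in both columns
dropped = df3.dropna()

# ffill copies the last known value forward; leading NaN stay NaN
filled_fwd = df3.ffill()

# bfill copies the next known value backward; trailing NaN stay NaN
filled_bwd = df3.bfill()

print(dropped)
print(filled_fwd)
print(filled_bwd)
```

Note that neither ffill nor bfill alone removes every NaN here, because each column is missing values at a different end of the date range. Which option fits depends on what a missing day should mean in your data.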