You can combine two DataFrames in PySpark with a join, as follows:
>>> df1.show()
+---+---------+
| ID|     Role|
+---+---------+
|  1|   Author|
|  1|   Editor|
|  2|   Author|
|  2|Publisher|
|  3|   Editor|
|  3|Assistant|
+---+---------+
>>> df2.show()
+---+-----------+
| ID|       Name|
+---+-----------+
|  1| John Smith|
|  2|   John Doe|
|  3|Bob Jim Bob|
+---+-----------+
>>> df3 = df2.join(df1, "ID")
>>> df3.show()
+---+-----------+---------+
| ID|       Name|     Role|
+---+-----------+---------+
|  1| John Smith|   Author|
|  1| John Smith|   Editor|
|  2|   John Doe|   Author|
|  2|   John Doe|Publisher|
|  3|Bob Jim Bob|   Editor|
|  3|Bob Jim Bob|Assistant|
+---+-----------+---------+
Note: I am assuming "ID" is the foreign key. If you have any questions, please leave a comment.