1 Answer
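If I understand correctly, you can compare each column against the row-wise greatest value, return the name of every column that matches, and then concatenate those names. Example:
Input: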
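import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # the original assumes `spark` already exists
np.random.seed(111)
df = spark.createDataFrame(pd.DataFrame(np.random.randint(0,100,(5,5)),
    columns=list('ABCDE')))
df.show()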
+---+---+---+---+---+
|  A|  B|  C|  D|  E|
+---+---+---+---+---+
| 84| 84| 84| 86| 19|
| 41| 66| 82| 40| 71|
| 57|  7| 12| 10| 65|
| 88| 28| 14| 34| 21|
| 54| 72| 37| 76| 58|
+---+---+---+---+---+
Suggested solution:
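import pyspark.sql.functions as F
cols = ['A','B','C']
# F.when without otherwise yields null for non-matching columns, and concat_ws skips nulls,
# so only the names of columns equal to the row-wise maximum are joined.
df.withColumn("max_of_ABC", F.concat_ws("",
    *[F.when(F.col(i) == F.greatest(*cols), i) for i in cols])).show()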
+---+---+---+---+---+----------+
|  A|  B|  C|  D|  E|max_of_ABC|
+---+---+---+---+---+----------+
| 84| 84| 84| 86| 19|       ABC|
| 41| 66| 82| 40| 71|         C|
| 57|  7| 12| 10| 65|         A|
| 88| 28| 14| 34| 21|         A|
| 54| 72| 37| 76| 58|         B|
+---+---+---+---+---+----------+
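To make the per-row logic easier to follow, here is a minimal plain-Python sketch of what the expression computes for a single row, including ties. It is only a model for reasoning, not Spark code: the helper name max_column_names and its sep parameter are hypothetical, and the treatment of missing values (None) mirrors my understanding that greatest ignores nulls and concat_ws skips them.

```python
from typing import Mapping, Optional, Sequence


def max_column_names(
    row: Mapping[str, Optional[int]],
    cols: Sequence[str],
    sep: str = "",
) -> str:
    """Plain-Python model of the per-row logic (hypothetical helper, not a Spark API)."""
    # Like F.greatest, ignore missing (None) values when taking the maximum.
    present = [row[c] for c in cols if row[c] is not None]
    if not present:
        # Every input is missing: no column matches, so nothing is joined.
        return ""
    top = max(present)
    # Like F.when without otherwise: non-matching columns contribute nothing,
    # and the join (concat_ws in Spark) only sees the matching names.
    return sep.join(c for c in cols if row[c] == top)


# First three rows of the example input above.
rows = [
    {"A": 84, "B": 84, "C": 84, "D": 86, "E": 19},
    {"A": 41, "B": 66, "C": 82, "D": 40, "E": 71},
    {"A": 57, "B": 7, "C": 12, "D": 10, "E": 65},
]
result = [max_column_names(r, ["A", "B", "C"]) for r in rows]
print(result)  # ['ABC', 'C', 'A']
```

If the column names are longer than one character, a tie such as "ABC" becomes ambiguous; passing a non-empty separator (here sep=",", or "," as the first argument of concat_ws in the Spark version) keeps the names distinguishable.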