1 Answer
You can use first() with its ignorenulls parameter set to True. Also, apply rowsBetween(-sys.maxsize, sys.maxsize) to the window so the frame spans the entire partition rather than stopping at the current row.
from pyspark.sql import functions as F
from pyspark.sql.window import Window
import sys

# Partition by id and order by Date; the unbounded frame lets first()
# scan the whole partition instead of only the rows up to the current one.
w = Window.partitionBy("id").orderBy("Date").rowsBetween(-sys.maxsize, sys.maxsize)

# first() with ignorenulls=True returns the first non-null category in
# each partition, so the null categories are filled in.
df.withColumn("category", F.first("category", ignorenulls=True).over(w)) \
  .orderBy("id", "Date").show()
+---+--------+----------+
| id|category| Date|
+---+--------+----------+
| A1| Nixon|2010-01-02|
| A1| Nixon|2010-01-03|
| A1| Nixon|2010-01-04|
| A1| Nixon|2010-01-05|
| A9| Leonard|2010-05-02|
| A9| Leonard|2010-05-03|
| A9| Leonard|2010-05-04|
| A9| Leonard|2010-05-05|
+---+--------+----------+
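To see exactly what F.first("category", ignorenulls=True) over an unbounded window computes, here is a plain-Python sketch of the same semantics: every row in a partition receives the first non-null value found in that partition. The sample rows and the helper name are hypothetical, chosen to mirror the table above.

```python
def fill_first_non_null(rows, key, col):
    """Replace `col` in every row with its group's first non-null value.
    Groups are defined by `key`; rows are assumed pre-sorted by key, then Date,
    matching the window's partitionBy/orderBy."""
    first_seen = {}
    for r in rows:  # one pass: remember the first non-null value per group
        k = r[key]
        if k not in first_seen and r[col] is not None:
            first_seen[k] = r[col]
    # second pass: broadcast that value to every row of the group
    return [{**r, col: first_seen.get(r[key])} for r in rows]

# Hypothetical input with nulls, mirroring the DataFrame above
rows = [
    {"id": "A1", "category": None,      "Date": "2010-01-02"},
    {"id": "A1", "category": "Nixon",   "Date": "2010-01-03"},
    {"id": "A9", "category": None,      "Date": "2010-05-02"},
    {"id": "A9", "category": "Leonard", "Date": "2010-05-03"},
]
filled = fill_first_non_null(rows, "id", "category")
print([r["category"] for r in filled])  # → ['Nixon', 'Nixon', 'Leonard', 'Leonard']
```

This also shows why the unbounded frame matters: with the default frame (unbounded preceding to current row), rows that appear before the first non-null value would still come out null.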