1 Answer
You can use F.expr() to express a LIKE-style join condition. In your case, combine it with an inner join. Try this:
#%%
import pyspark.sql.functions as F
# Assumes an active SparkSession named `spark` (older shells expose `sqlContext` instead)
test1 = spark.createDataFrame([("Mike","apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)"),("kate","Whitewine,greenbeans,pineapple"),("Ben","Water,Spaghetti")],schema=["name","groceries"])
test2 = spark.createDataFrame([("001","redwine"),("002","greenbeans"),("003","cd")],schema=["id","item"])
#%%
# Keep every (test1, test2) pair where `item`, read as a regular expression, matches somewhere in `groceries`
test_join = test1.join(test2, F.expr("""groceries rlike item"""), how='inner')
Result:
test_join.show(truncate=False)
+----+-------------------------------------------------------------------------------------------------+---+----------+
|name|groceries |id |item |
+----+-------------------------------------------------------------------------------------------------+---+----------+
|Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|001|redwine |
|Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|002|greenbeans|
|Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|003|cd |
|kate|Whitewine,greenbeans,pineapple |002|greenbeans|
+----+-------------------------------------------------------------------------------------------------+---+----------+
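To make the semantics of the rlike join concrete, here is a minimal plain-Python sketch (not Spark code) that rebuilds the rows above by testing every (groceries, item) pair. It assumes Python's `re.search` behaves like Spark's `rlike` for these simple patterns; the names `people`, `items`, and `matches` are made up for illustration.

```python
import re

# Same data as test1 and test2 above, as plain Python tuples
people = [
    ("Mike", "apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)"),
    ("kate", "Whitewine,greenbeans,pineapple"),
    ("Ben", "Water,Spaghetti"),
]
items = [("001", "redwine"), ("002", "greenbeans"), ("003", "cd")]

# Inner join on a non-equality condition: keep each pair where the
# item, treated as a regular expression, is found in the groceries string
matches = [
    (name, item_id, item)
    for name, groceries in people
    for item_id, item in items
    if re.search(item, groceries)
]

for row in matches:
    print(row)
```

Since the condition is not an equality, every pair has to be checked, so a join like this can become expensive on large tables.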
For your more complex dataset, the contains() function should work:
import pyspark.sql.functions as F
test1 = spark.createDataFrame([("Mike","apple, oranges, red wine,green beans"),("Kate","Whitewine, green beans waterrr, pineapple, red wine"), ("Leah", "red wine, juice, rice, grapes, green beans"),("Ben","Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)")],schema=["name","groceries"])
test2 = spark.createDataFrame([("001","red wine"),("002","green beans waterrr"), ("003", "the little prince 70th anniversary gift set (book/cd/downloadable audio)")],schema=["id","item"])
#%%
test_join = test1.join(test2, F.col('groceries').contains(F.col('item')), how='inner')
Result:
+----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
|name|groceries |id |item |
+----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
|Mike|apple, oranges, red wine,green beans |001|red wine |
|Kate|Whitewine, green beans waterrr, pineapple, red wine |001|red wine |
|Kate|Whitewine, green beans waterrr, pineapple, red wine |002|green beans waterrr |
|Leah|red wine, juice, rice, grapes, green beans |001|red wine |
|Ben |Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)|003|the little prince 70th anniversary gift set (book/cd/downloadable audio)|
+----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
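One likely reason contains() is the safer choice for this dataset is that item 003 includes parentheses, which a regular expression reads as a group rather than as literal characters. The plain-Python sketch below (not Spark code) illustrates the difference; it assumes Python's `re` module treats parentheses the same way as the regex engine behind `rlike`, and the variable names are made up for illustration.

```python
import re

groceries = "Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)"
item = "the little prince 70th anniversary gift set (book/cd/downloadable audio)"

# As a regex, "(...)" is a capture group, so the pattern looks for
# "gift set book/cd/..." without the literal parentheses and fails
regex_match = re.search(item, groceries) is not None

# A literal substring test, which is what contains() does, succeeds
literal_match = item in groceries

# Escaping the metacharacters makes the regex behave literally again
escaped_match = re.search(re.escape(item), groceries) is not None

print(regex_match, literal_match, escaped_match)  # False True True
```

If regex matching is still needed for other reasons, the item values would have to be escaped before being used as patterns.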