Spark Dataset joinWith Error: Join condition

I want to join two Datasets in Spark. This is what I did:

    Dataset<Row> data = spark.read().format("parquet").load("hdfs://path");
    Dataset<Person> p1 = data.filter("id < 200").as(Encoders.bean(Person.class)).alias("ds1");
    Dataset<Person> p2 = data.filter("id < 100").as(Encoders.bean(Person.class)).alias("ds2");
    p1.joinWith(p2, p1.col("ds1.id").equalTo(p2.col("ds2.id")), "inner").show();

When I run the program, I get this error:

    Detected implicit cartesian product for INNER join between logical plans
    Project [named_struct(id, id#3L, fname, fname#1, lname, lname#4, email, email#0, gender, gender#2) AS _1#41]
    +- Filter (named_struct(id, id#3L, fname, fname#1, lname, lname#4, email, email#0, gender, gender#2).id = named_struct(id, id#3L, fname, fname#1, lname, lname#4, email, email#0, gender, gender#2).id)
       +- Relation[email#0,fname#1,gender#2,id#3L,lname#4] parquet
    and
    Project [named_struct(id, id#39L, fname, fname#37, lname, lname#40, email, email#36, gender, gender#38) AS _2#42]
    +- Relation[email#36,fname#37,gender#38,id#39L,lname#40] parquet
    Join condition is missing or trivial.
    Either: use the CROSS JOIN syntax to allow cartesian products between these
    relations, or: enable implicit cartesian products by setting the configuration
    variable spark.sql.crossJoin.enabled=true;

From the error message and from reading the source code, I understand that Spark treats this as a cross join (lines 1311-1328), but it is not one. Note that in the plan above both sides of the filter condition refer to the same column, id#3L, so the condition compares the left side with itself. I also saw this solution, which says the error occurs because both sides share the same lineage and that aliases should be used. I added aliases, but it still does not work. How can I solve this?
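One possible workaround, offered as a sketch and not as a confirmed fix: `p1.col(...)` and `p2.col(...)` each return a column already bound to the shared source attribute (id#3L in the plan above), so the aliases are never consulted. Building the condition with `functions.col("ds1.id")` and `functions.col("ds2.id")` leaves the names unresolved until the join is analyzed, at which point the aliases can tell the two sides apart. The example below makes several assumptions that are not in the question: `Person` is reduced to two fields, the data is generated locally instead of being read from Parquet, and the class and method names (`JoinWithSketch`, `joinOnId`) are hypothetical.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import scala.Tuple2;

public class JoinWithSketch {

    // Hypothetical, trimmed-down stand-in for the Person bean in the question.
    public static class Person implements Serializable {
        private long id;
        private String fname;

        public Person() {}
        public Person(long id, String fname) { this.id = id; this.fname = fname; }

        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getFname() { return fname; }
        public void setFname(String fname) { this.fname = fname; }
    }

    // Builds both sides from the same source, as in the question, but refers to
    // the join keys by alias-qualified name instead of via p1.col / p2.col.
    static Dataset<Tuple2<Person, Person>> joinOnId(Dataset<Row> data) {
        Dataset<Person> p1 = data.filter("id < 200").as(Encoders.bean(Person.class)).alias("ds1");
        Dataset<Person> p2 = data.filter("id < 100").as(Encoders.bean(Person.class)).alias("ds2");
        return p1.joinWith(
                p2,
                functions.col("ds1.id").equalTo(functions.col("ds2.id")),
                "inner");
    }

    public static void main(String[] args) {
        // Local session and generated rows are assumptions made for this sketch only.
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("joinWith-sketch")
                .getOrCreate();

        List<Person> people = new ArrayList<>();
        for (long i = 0; i < 300; i++) {
            people.add(new Person(i, "name" + i));
        }
        Dataset<Row> data = spark.createDataFrame(people, Person.class);

        joinOnId(data).show();
        spark.stop();
    }
}
```

Setting spark.sql.crossJoin.enabled=true, as the error message suggests, would only silence the check. The condition would still compare a column with itself, so the result would likely not be the intended join.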
