I'm working with an old PySpark script and trying to convert the DataFrame df to an RDD.

    # Importing the required libraries
    import pandas as pd
    from pyspark.sql.types import *
    from pyspark.ml.regression import RandomForestRegressor
    from pyspark.mllib.util import MLUtils
    from pyspark.ml import Pipeline
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.linalg import Vectors
    from pyspark.mllib.fpm import *
    from pyspark.sql import SparkSession

    spark = (SparkSession
             .builder
             .appName("Python Spark")
             .config("spark.some.config.option", "some-value")
             .getOrCreate())

    # read the data
    df = pd.read_json("events.json")
    df = (df.rdd
            .map(lambda x: (x[1], [x[0]]))
            .reduceByKey(lambda x, y: x + y)
            .sortBy(lambda k_v: (k_v[0], sorted(k_v[1], key=lambda x: x[1], reverse=True)))
            .collect())

Here's the error output:

    AttributeError: 'DataFrame' object has no attribute 'rdd'

What am I missing? How do I convert a DataFrame to an RDD? I have Anaconda 3.6.1 and Spark 2.3.1 installed.