I'd like to transform a DataFrame that contains lists of words into a DataFrame with each word in its own row. How do I do an explode on a column in a DataFrame?

Here is an example with some of my attempts. You can uncomment each code line and get the error listed in the comment that follows it. I am using PySpark in Python 2.7 with Spark 1.6.1.

    from pyspark.sql.functions import split, explode

    DF = sqlContext.createDataFrame([('cat \n\n elephant rat \n rat cat', )], ['word'])
    print 'Dataset:'
    DF.show()

    print '\n\n Trying to do explode: \n'
    DFsplit_explode = (
        DF
        .select(split(DF['word'], ' '))
    #   .select(explode(DF['word']))  # AnalysisException: u"cannot resolve 'explode(word)' due to data type mismatch: input to function explode should be array or map type, not StringType;"
    #   .map(explode)  # AttributeError: 'PipelinedRDD' object has no attribute 'show'
    #   .explode()  # AttributeError: 'DataFrame' object has no attribute 'explode'
    ).show()

    # Trying without split
    print '\n\n Only explode: \n'
    DFsplit_explode = (
        DF
        .select(explode(DF['word']))  # AnalysisException: u"cannot resolve 'explode(word)' due to data type mismatch: input to function explode should be array or map type, not StringType;"
    ).show()

Please advise.
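One possible direction, offered as a sketch rather than a confirmed answer from the original thread: the `AnalysisException` says `explode` needs an array or map column, so the `split` call can be nested inside `explode` within a single `select`. The helper names `split_words` and `explode_words` are hypothetical and only for illustration. Splitting on the regex `\s+` instead of a single space is also an assumption about the desired result, since the sample string contains newlines. `split_words` is a pure-Python reference for the expected tokens; `explode_words` assumes a working Spark installation and an existing DataFrame.

```python
import re


def split_words(text):
    """Pure-Python reference: split on runs of whitespace and drop empty tokens."""
    return [w for w in re.split(r'\s+', text) if w]


def explode_words(df, column='word'):
    """Return a DataFrame with one whitespace-separated token per row.

    Hypothetical helper; assumes `df` is a PySpark DataFrame with a string column.
    """
    # Imported lazily so the pure-Python helper above works without Spark installed.
    from pyspark.sql.functions import col, explode, split

    # split() turns the string column into an array column,
    # and explode() then emits one row per array element.
    words = df.select(explode(split(col(column), r'\s+')).alias(column))

    # Leading or trailing whitespace can produce empty tokens, so filter them out.
    return words.where(col(column) != '')
```

With the `DF` from the question, `explode_words(DF).show()` would be expected to print each of the five words on its own row, matching what `split_words('cat \n\n elephant rat \n rat cat')` returns in plain Python.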