Explode in PySpark

牧羊人nacy 2019-10-21 10:18:06
I want to transform a DataFrame that contains a string of words into a DataFrame where each word is in its own row. How do I explode a column in a DataFrame?

Here is what I have tried. You can uncomment each line of code to get the error listed in the comment that follows it. I am using PySpark with Spark 1.6.1 on Python 2.7.

    from pyspark.sql.functions import split, explode

    DF = sqlContext.createDataFrame([('cat \n\n elephant rat \n rat cat', )], ['word'])
    print 'Dataset:'
    DF.show()

    print '\n\n Trying to do explode: \n'
    DFsplit_explode = (
        DF
        .select(split(DF['word'], ' '))
    #   .select(explode(DF['word']))  # AnalysisException: u"cannot resolve 'explode(word)' due to data type mismatch: input to function explode should be array or map type, not StringType;"
    #   .map(explode)  # AttributeError: 'PipelinedRDD' object has no attribute 'show'
    #   .explode()  # AttributeError: 'DataFrame' object has no attribute 'explode'
    ).show()

    # Trying without split
    print '\n\n Only explode: \n'
    DFsplit_explode = (
        DF
        .select(explode(DF['word']))  # AnalysisException: u"cannot resolve 'explode(word)' due to data type mismatch: input to function explode should be array or map type, not StringType;"
    ).show()

Any advice would be appreciated.
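The AnalysisException above says that `explode` expects an array or map column, while `word` is a string. One possible way around it, offered as a sketch and not as the accepted answer from this thread, is to nest the calls so that `explode` receives the array produced by `split`. Splitting on a single space would leave newline-only tokens in the sample string, so the sketch splits on runs of whitespace instead. The helper names `expected_words`, `explode_words`, and `demo` are hypothetical, and `demo` assumes Spark 2.0 or later because it uses `SparkSession`.

```python
import re


def expected_words(text):
    """Pure-Python reference for the rows we hope to get: split on runs
    of whitespace and drop empty tokens. (Hypothetical helper.)"""
    return [w for w in re.split(r"\s+", text) if w]


def explode_words(df, column="word"):
    """Return a DataFrame with one row per word. (Hypothetical helper.)

    split() turns the string column into an array column, and explode()
    then emits one row per array element.
    """
    # Imported lazily so the reference helper above works without Spark.
    from pyspark.sql.functions import col, explode, split

    words = explode(split(col(column), r"\s+")).alias(column)
    # Leading whitespace would produce an empty first token, so filter it.
    return df.select(words).where(col(column) != "")


def demo():
    """Assumes Spark 2.0+ (SparkSession); not part of the original post."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([("cat \n\n elephant rat \n rat cat",)], ["word"])
    explode_words(df).show()
    spark.stop()
```

On Spark 1.6, the `DF` built with `sqlContext.createDataFrame` in the question can presumably be passed to `explode_words` directly, since `split`, `explode`, and `col` are all available in `pyspark.sql.functions` there.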