I am reading this JSON file with Spark:

```java
Dataset<Row> ds = spark.read()
        .option("multiLine", true)
        .option("mode", "PERMISSIVE")
        .json("/user/administrador/prueba_diario.txt")
        .toDF();
ds.printSchema();

Dataset<Row> ds2 = ds.select("articles").toDF();
ds2.printSchema();

spark.sql("drop table if exists table1");
ds2.write().saveAsTable("table1");
```

The schema looks like this:

```
root
 |-- articles: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- author: string (nullable = true)
 |    |    |-- content: string (nullable = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- publishedAt: string (nullable = true)
 |    |    |-- source: struct (nullable = true)
 |    |    |    |-- id: string (nullable = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |-- title: string (nullable = true)
 |    |    |-- url: string (nullable = true)
 |    |    |-- urlToImage: string (nullable = true)
 |-- status: string (nullable = true)
 |-- totalResults: long (nullable = true)
```

I want to save the `articles` array as a Hive table whose columns come from the struct fields. The table layout I want:

```
author      (string)
content     (string)
description (string)
publishedat (string)
source      (struct<id:string,name:string>)
title       (string)
url         (string)
urltoimage  (string)
```

The problem is that the table gets saved with a single column named `articles`, and all of the content ends up inside that one column.
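One common approach (a sketch, untested against this exact dataset) is to `explode` the `articles` array so each element becomes its own row, then expand the struct fields into top-level columns with `select("article.*")` before saving:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// One row per array element; "article" is a struct column here.
Dataset<Row> flattened = ds
        .select(explode(col("articles")).as("article"))
        // Promote every field of the struct to its own column
        // (author, content, description, publishedAt, source, title, url, urlToImage).
        .select("article.*");

flattened.printSchema();
flattened.write().saveAsTable("table1");
```

The nested `source` struct stays a single `struct<id:string,name:string>` column, which matches the desired layout; if you wanted it flattened too, you could select `col("article.source.id")` and `col("article.source.name")` explicitly instead.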