1 Answer
You can pass `coerce_timestamps` to force `to_parquet` to write timestamps in a format that Athena understands:
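import pandas
t = pandas.DataFrame([['Haiti', pandas.to_datetime('1804-01-01')]], columns=['Country', 'Independence'])
# Store timestamps at millisecond resolution; the keyword is passed through to the pyarrow engine.
t.to_parquet("s3://<mybucket>/tmp/t.parquet", coerce_timestamps='ms')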
| Country | Independence |
|---------|--------------|
| Haiti   | 1804-01-01   |
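Coercing to milliseconds drops anything finer than a millisecond, and pyarrow raises an error in that case unless `allow_truncated_timestamps=True` is also passed. A minimal sketch of a check you could run before writing, assuming the column holds pandas datetimes; the helper name `loses_precision_at` is hypothetical and not part of pandas or pyarrow:

```python
import pandas as pd


def loses_precision_at(series: pd.Series, unit: str = "ms") -> bool:
    """Return True if any timestamp has a component finer than `unit`.

    Hypothetical helper for illustration only.
    """
    ts = series.dropna()
    # Flooring to the target unit changes a value only when it carries
    # finer-grained detail that the coercion would have to discard.
    return bool((ts.dt.floor(unit) != ts).any())


t = pd.DataFrame(
    [["Haiti", pd.to_datetime("1804-01-01")]],
    columns=["Country", "Independence"],
)

# A midnight date has no sub-millisecond part, so 'ms' is lossless here.
print(loses_precision_at(t["Independence"]))
```

If the check reports a loss, you can either round the column yourself before writing or accept the truncation explicitly with `allow_truncated_timestamps=True`.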
Then create an external table in Athena over that S3 prefix:
CREATE EXTERNAL TABLE IF NOT EXISTS default.mytable (
  `Country` string,
  `Independence` timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
LOCATION 's3://<mybucket>/tmp/'
TBLPROPERTIES ('has_encrypted_data'='false');
Querying the table then returns the timestamp correctly:
SELECT * FROM "default"."mytable" LIMIT 10;
| Country | Independence            |
|---------|-------------------------|
| Haiti   | 1804-01-01 00:00:00.000 |