I have CSV files of geographic data. I use Apache Spark to import these files into a Dataset, and then I want to use GeoMesa. So I need to convert the Dataset to SimpleFeatures and save it to Cassandra in GeoMesa format.

```java
public class Main {
    public static void main(String[] args) throws IOException {
        Map<String, String> dsProperties = new HashMap<String, String>();
        dsProperties.put("cassandra.keyspace", "t1");
        dsProperties.put("cassandra.catalog", "testgeo");
        dsProperties.put("cassandra.tableName", "testgeo");
        dsProperties.put("cassandra.contact.point", "localhost:9042");
        DataStore ds = DataStoreFinder.getDataStore(dsProperties);

        SimpleFeatureType sft = SimpleFeatureTypes.createType("testgeo",
                "geoid:Integer,geopoint:Point:srid=4326");
        ds.createSchema(sft);

        SparkSession spark = SparkSession.builder().appName("my-app").master("local[*]")
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .config("spark.kryo.registrator",
                        "org.locationtech.geomesa.spark.GeoMesaSparkKryoRegistrator")
                .getOrCreate();
        org.apache.spark.sql.SQLTypes.init(spark.sqlContext());

        StructType schema = new StructType()
                .add(new StructField("id", DataTypes.IntegerType, true, Metadata.empty()))
                .add(new StructField("dt", DataTypes.TimestampType, true, Metadata.empty()))
                .add(new StructField("lat", DataTypes.DoubleType, true, Metadata.empty()))
                .add(new StructField("lon", DataTypes.DoubleType, true, Metadata.empty()));

        Dataset<Row> df = spark.read().format("geomesa").option("header", true)
                .option("inferSchema", true)
                .option("dateFormat", "yyyy-MM-dd HH:mm:ss").schema(schema)
                .option("delimiter", ",")
                .csv("C:\\Users\\h6\\Desktop\\dta.csv");
        df.createOrReplaceTempView("testgeo");
        df = spark.sql("SELECT id AS geoid, st_makePoint(lat, lon) AS geopoint FROM testgeo");
        df.show();
    }
}
```
1 Answer
泛舟湖上清波郎朗
Cassandra does not currently have a SpatialRDDProvider, so you will need to use the generic "GeoTools" one: https://www.geomesa.org/documentation/user/spark/providers.html#geotools-rdd-provider

In short, you need to add `"geotools" -> "true"` to your data store parameter map (`dsProperties` in your code). You will also need to make sure the appropriate Cassandra data store JARs are on the Spark classpath.
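The parameter change above can be sketched as follows. This is a minimal sketch, not a tested configuration: the `buildParams` helper is hypothetical, and the commented-out `df.write()` call follows the pattern in the GeoMesa Spark documentation (the `geomesa.feature` option names the SimpleFeatureType to write).

```java
import java.util.HashMap;
import java.util.Map;

public class GeoToolsParams {

    // Hypothetical helper: builds the Cassandra data store parameters with the
    // extra "geotools" flag that routes GeoMesa Spark through the generic
    // GeoTools RDD provider (no Cassandra-specific SpatialRDDProvider exists).
    static Map<String, String> buildParams() {
        Map<String, String> params = new HashMap<>();
        params.put("geotools", "true");
        params.put("cassandra.keyspace", "t1");
        params.put("cassandra.catalog", "testgeo");
        params.put("cassandra.contact.point", "localhost:9042");
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = buildParams();
        // With Spark and the GeoMesa/Cassandra JARs on the classpath, the
        // DataFrame could then be written back out along these lines:
        //   df.write().format("geomesa")
        //     .options(params)
        //     .option("geomesa.feature", "testgeo") // the SimpleFeatureType name
        //     .save();
        System.out.println(params.get("geotools")); // prints "true"
    }
}
```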