I'm running Spark in Docker for some processing. We have a Kafka container, a Spark master container, two Spark worker containers, and a Python container that orchestrates the whole flow. We normally bring everything up with docker-compose:

version: '3.4'

volumes:
  zookeeper-persistence:
  kafka-store:
  spark-store:

services:
  zookeeper-server:
    image: 'bitnami/zookeeper:3.6.1'
    expose:
      - '2181'
    environment:
      ...
    volumes:
      - zookeeper-persistence:/bitnami/zookeeper

  kafka-server:
    image: 'bitnami/kafka:2.6.0'
    expose:
      - '29092'
      - '9092'
    environment:
      ...
    volumes:
      - kafka-store:/bitnami/kafka
    depends_on:
      - zookeeper-server

  spark-master:
    image: bitnami/spark:3.0.1
    environment:
      SPARK_MODE: 'master'
      SPARK_MASTER_HOST: 'spark-master'
    ports:
      - '8080:8080'
    expose:
      - '7077'
    depends_on:
      - kafka-server

  spark-worker1:
    image: bitnami/spark:3.0.1
    environment:
      SPARK_MODE: 'worker'
      SPARK_WORKER_MEMORY: '4G'
      SPARK_WORKER_CORES: '2'
    depends_on:
      - spark-master

  spark-worker2:
    # same as spark-worker1

  compute:
    build: ./app
    image: compute
    environment:
      KAFKA_HOST: kafka-server:29092
      COMPUTE_TOPIC: DataFrames
      PYSPARK_SUBMIT_ARGS: "--master spark://spark-master:7077 --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell"
    depends_on:
      - spark-master
      - kafka-server
    volumes:
      - spark-store:/app/checkpoints

Data is sent in by another Python application, and the compute container reacts to the changes. We create a ComputeDeployment and call its start function to launch the Spark job.
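The ComputeDeployment code itself was not captured in this post. As a rough sketch of the kind of structured-streaming job such a container typically runs: only the names ComputeDeployment, start, KAFKA_HOST, COMPUTE_TOPIC, and the /app/checkpoints path come from the post; the class internals below are assumptions.

import os

from pyspark.sql import SparkSession


class ComputeDeployment:
    # Hypothetical sketch; the original implementation is not shown in the post.
    def __init__(self, kafka_host: str, topic: str):
        self.kafka_host = kafka_host
        self.topic = topic

    def start(self):
        # PYSPARK_SUBMIT_ARGS (set in docker-compose) points the session at
        # spark://spark-master:7077 and pulls in the Kafka connector package.
        spark = SparkSession.builder.appName("compute").getOrCreate()

        frames = (
            spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", self.kafka_host)
            .option("subscribe", self.topic)
            .load()
        )

        # The checkpoint directory is the spark-store volume mounted at
        # /app/checkpoints -- the same path the accepted answer chmods.
        query = (
            frames.writeStream
            .format("console")
            .option("checkpointLocation", "/app/checkpoints/compute")
            .start()
        )
        query.awaitTermination()


if __name__ == "__main__":
    ComputeDeployment(
        os.environ["KAFKA_HOST"], os.environ["COMPUTE_TOPIC"]
    ).start()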
1 Answer
牛魔王的故事
This is related to permissions: I found that the user inside the Spark container could not write to that directory. This entrypoint file fixed it:
#!/bin/bash
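# Recursively grant all users read/write access (and execute on directories)
# to the mounted checkpoint volume, then hand off to the application.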
chmod -R a+rwX /app/checkpoints/
python -u run.py
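Wiring this in means making the script the compute image's entrypoint. A minimal Dockerfile sketch follows; the base image, file layout, and the entrypoint.sh file name are assumptions, not taken from the post. The key point is that the container must run as a user with permission to chmod the mounted volume (the bitnami images default to a non-root user).

# Assumed base image; the post only says the compute image is built from ./app.
FROM bitnami/spark:3.0.1

# Run as root so the chmod on the mounted volume succeeds.
USER root

WORKDIR /app
COPY run.py entrypoint.sh ./
RUN chmod +x entrypoint.sh

ENTRYPOINT ["/app/entrypoint.sh"]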