I want my Dataproc cluster to download a custom library I created that cannot be installed via pip, so the cluster needs to clone it from Cloud Source Repositories and then run sudo python setup.py install. I tried creating a bash script for this; the cluster gets created without any problems, but I don't think it is running the bash script, because I haven't noticed any changes.

Here is the bash script I want the cluster to run at initialization:

#! /bin/bash
# download jars
gsutil -m cp gs://dataproc-featurelib/spark-lib/*.jar .
# download credential files
gsutil -m cp gs://mlflow_feature_pipeline/secrets/*.json .
# install feature_library
gcloud source repos clone feature_library --project=<project_id>
cd feature_library
sudo python3 setup.py install
cd ../

And here is how I create the cluster:

gcloud beta dataproc clusters create featurelib-cluster \
    --zone=us-east1-b \
    --master-machine-type n1-highmem-16 \
    --worker-machine-type n1-highmem-16 \
    --num-workers 4 \
    --image-version 1.4-debian9 \
    --initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh,gs://dataproc-featurelib/initialization-scripts/dataproc_featurelib_init.sh \
    --metadata 'PIP_PACKAGES=google-cloud-storage hvac cryptography mlflow sqlalchemy snowflake-sqlalchemy snowflake-connector-python snowflake' \
    --optional-components=ANACONDA \
    --enable-component-gateway \
    --project <project_id> \
    --autoscaling-policy=featurelib-policy \
    --tags feature-lib \
    --no-address \
    --subnet composer-us-east1 \
    --bucket dataproc-featurelib
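(Side note on diagnosing "I haven't noticed any changes": Dataproc writes each initialization action's output to /var/log/dataproc-initialization-script-X.log on every node, where X is the 0-based index of the action in --initialization-actions. A minimal check sketch follows; the master hostname featurelib-cluster-m is an assumption based on Dataproc's default <cluster-name>-m naming, and since the cluster uses --no-address, SSH may additionally need --tunnel-through-iap or other internal network access.)

# Hedged sketch: inspect the log of the second init action (index 1),
# i.e. dataproc_featurelib_init.sh, on the assumed default master node.
gcloud compute ssh featurelib-cluster-m \
    --zone=us-east1-b \
    --tunnel-through-iap \
    --command='cat /var/log/dataproc-initialization-script-1.log'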
1 Answer
慕标5832272
I solved this by authorizing the service account. Example bash script below:
#! /bin/bash
# download jars
gsutil -m cp gs://dataproc-featurelib/spark-lib/*.jar .
# download credential files
gsutil -m cp gs://mlflow_feature_pipeline/secrets/*.json .
# authenticate
gcloud config set account <gserviceaccount_email_id>
gcloud auth activate-service-account <gserviceaccount_email_id> --project=dao-aa-poc-uyim --key-file=<path_to_key_file>
# install package
gcloud source repos clone feature_library --project=<project_id>
cd feature_library
python3 setup.py install
cd ../
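(A related alternative, offered as a sketch rather than part of the answer above: instead of shipping a key file onto the nodes, you can grant the cluster's service account read access to Cloud Source Repositories, so that gcloud source repos clone works under the VM's default credentials. roles/source.reader is the standard read role for Cloud Source Repositories; <project_id> and <gserviceaccount_email_id> are the same placeholders used above.)

# Hedged sketch: bind the repo-read role to the cluster's service account
# at the project level, so no key file is needed for the clone.
gcloud projects add-iam-policy-binding <project_id> \
    --member="serviceAccount:<gserviceaccount_email_id>" \
    --role="roles/source.reader"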