Following on from this question, when I try to create a postgresql table from a dask.dataframe with more than one partition I get the following error:

```
IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(test1, 2200) already exists.
[SQL: '
CREATE TABLE test1 (
	"A" BIGINT,
	"B" BIGINT,
	"C" BIGINT,
	"D" BIGINT,
	"E" BIGINT,
	"F" BIGINT,
	"G" BIGINT,
	"H" BIGINT,
	"I" BIGINT,
	"J" BIGINT,
	idx BIGINT
)
']
```

You can recreate the error with the following code:

```python
import numpy as np
import dask.dataframe as dd
import dask
import pandas as pd
import sqlalchemy_utils as sqla_utils
import sqlalchemy as sqla

DATABASE_CONFIG = {
    'driver': '',
    'host': '',
    'user': '',
    'password': '',
    'port': 5432,
}
DBNAME = 'dask'
url = '{driver}://{user}:{password}@{host}:{port}/'.format(**DATABASE_CONFIG)
db_url = url.rstrip('/') + '/' + DBNAME
# create db if non-existent
if not sqla_utils.database_exists(db_url):
    print('Creating database \'{}\''.format(DBNAME))
    sqla_utils.create_database(db_url)
conn = sqla.create_engine(db_url)
# create pandas df with random numbers
df = pd.DataFrame(np.random.randint(0, 40, size=(100, 10)), columns=list('ABCDEFGHIJ'))
# add index so that it can be used as primary key later on
df['idx'] = df.index
# create dask df
ddf = dd.from_pandas(df, npartitions=4)
# write to psql
dto_sql = dask.delayed(pd.DataFrame.to_sql)
out = [dto_sql(d, 'test', db_url, if_exists='append', index=False, index_label='idx')
       for d in ddf.to_delayed()]
dask.compute(*out)
```

If npartitions is set to 1, the code produces no error. So I'm guessing this has to do with postgres not being able to handle parallel requests writing to the same SQL table...? How can I solve this?
3 Answers

HUWWW
I ran into the same error using ponyORM with PostgreSQL on Heroku. I solved it by holding a lock until the thread had finished its database operation. In my case:
```python
import threading

lock = threading.Lock()
with lock:
    PonyOrmEntity(name='my_name', description='description')
    PonyOrmEntity.get(lambda u: u.name == 'another_name')
```
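The same serialization idea applies to the dask example in the question: wrap each partition's write in a shared lock so only one thread does the check-then-create step at a time. A minimal, database-free sketch of the pattern (the dict stands in for the database and all names are illustrative, not from the thread):

```python
import threading

# Simulated table registry: the check-then-create race on this dict mirrors
# the duplicate CREATE TABLE race from the question.
tables = {}
lock = threading.Lock()

def append_rows(table_name, rows):
    # Without the lock, two threads could both see the table as missing and
    # both try to create it -- the same race behind the duplicate-key error.
    with lock:
        if table_name not in tables:
            tables[table_name] = []
        tables[table_name].extend(rows)

threads = [threading.Thread(target=append_rows, args=('test', [i])) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(tables['test']))  # all 8 writers appended to a single "table"
```

Note that a `threading.Lock` only serializes threads within one process; it would not help if the dask partitions were written from separate worker processes.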

繁星淼淼
This is what helped me in PostgreSQL:

```sql
set enable_parallel_hash=off;
```

Afterwards you can turn it back on:

```sql
set enable_parallel_hash=on;
```
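One caveat worth adding: `SET` only changes the setting for the current session, and dask's delayed `to_sql` calls each open their own connection, so a `SET` issued from a psql prompt would not reach them. To apply the setting to all new connections it would have to be changed globally, e.g. (a sketch, assuming sufficient privileges):

```sql
ALTER SYSTEM SET enable_parallel_hash = off;
SELECT pg_reload_conf();
```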