2 Answers
Contributor: 1,859 experience points · 6+ upvotes
Since you don't need the return value of the stored procedure, you should be able to insert the rows directly into the table with pandas' to_sql method. This code …
from time import time
import pandas as pd
import sqlalchemy as sa
from_engine = sa.create_engine("mssql+pyodbc://@mssqlLocal64")
to_engine = sa.create_engine(
    "mssql+pyodbc://sa:_whatever_@192.168.0.199/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=False,
)
# set up test
to_cnxn = to_engine.raw_connection()
to_cnxn.execute("TRUNCATE TABLE MillionRows")
to_cnxn.commit()
num_rows_to_upload = 10000
df = pd.read_sql_query(
    f"SELECT TOP {num_rows_to_upload} "
    "[TextField], [LongIntegerField], [DoubleField], [varchar_column] "
    "FROM MillionRows ORDER BY ID",
    from_engine,
)
# run test
t0 = time()
df.to_sql("MillionRows", to_engine, index=False, if_exists="append")
s = f"{(time() - t0):0.1f} seconds"
print(f"uploading {num_rows_to_upload:,d} rows took {s}")
… represents roughly the same amount of internal work as what you are doing now, i.e., uploading each individual row as a separate .execute call. The result was
uploading 10,000 rows took 60.2 seconds
However, simply changing to_engine to use fast_executemany=True gives
uploading 10,000 rows took 1.4 seconds
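To be explicit, the only change needed for that second timing is the flag passed to create_engine; everything else in the test script stays the same:

to_engine = sa.create_engine(
    "mssql+pyodbc://sa:_whatever_@192.168.0.199/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True,  # was False in the slow timing above
)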
Contributor: 1,801 experience points · 16+ upvotes
Turn off autocommit:
import pyodbc

conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server_name +
    ';Database=' + db_name + ';Trusted_Connection=yes;',
    timeout=5, autocommit=False)
and commit there, and again at the end of the loop:
if index % 1000 == 0:
    print("Imported %s rows" % (index))
With autocommit on, you have to wait for the transaction log to be written to disk after every single row.
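Putting those pieces together, here is a minimal sketch of the batched-commit loop; the rows iterable and the MillionRows column names (borrowed from the first answer) are illustrative assumptions:

# conn is the autocommit=False connection created above
cursor = conn.cursor()

# rows is assumed to be an iterable of (text, long_int, double, varchar) tuples
for index, row in enumerate(rows, start=1):
    cursor.execute(
        "INSERT INTO MillionRows "
        "(TextField, LongIntegerField, DoubleField, varchar_column) "
        "VALUES (?, ?, ?, ?)",
        row,
    )
    if index % 1000 == 0:
        conn.commit()  # flush the transaction log once per 1,000 rows
        print("Imported %s rows" % (index))

conn.commit()  # commit the final partial batch at the end of the loop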
For further optimization, if you are on SQL Server 2016+, send rows to SQL Server in batches as JSON and parse them server-side with OPENJSON.
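A rough sketch of that approach with pyodbc, reusing the MillionRows columns from the first answer; the types in the WITH clause are assumptions and would need to match the real table:

import json

def insert_batch_as_json(conn, batch):
    # batch is assumed to be a list of dicts keyed by column name
    payload = json.dumps(batch)
    conn.execute(
        """
        INSERT INTO MillionRows
            (TextField, LongIntegerField, DoubleField, varchar_column)
        SELECT TextField, LongIntegerField, DoubleField, varchar_column
        FROM OPENJSON(?)
        WITH (
            TextField        nvarchar(4000) '$.TextField',
            LongIntegerField bigint         '$.LongIntegerField',
            DoubleField      float          '$.DoubleField',
            varchar_column   varchar(255)   '$.varchar_column'
        );
        """,
        payload,
    )
    conn.commit()

Each call sends one batch of rows as a single JSON string and performs a single set-based INSERT on the server, instead of one round trip per row.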