2 Answers
Contributor with 1876 experience points, over 7 upvotes
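Extracting the information iteratively should be faster than using pandas.json_normalize.
As the sample data shows, the value of data is a str, which must be converted to a dict.
The main task is to extract each key/value pair from 'bid' and 'ask' so that each pair becomes its own record.
A list comprehension does the work of creating the separate records.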
import json
import pandas as pd

# list of tuples, where the value of data is a string
transaction_data = [('1599324732926-0', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'}),
                    ('1599324732926-1', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'}),
                    ('1599324732926-2', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'})]

# build a list of rows; the first row holds the column names
# each bid/ask key, value pair becomes its own row
data_key_list = [['timestamp', 'receipt_timestamp', 'delta', 'side', 'price', 'size']]
for entry in transaction_data:  # iterate through each transaction
    data = json.loads(entry[1]['data'])  # convert the string to a dict
    for side in ['bid', 'ask']:  # extract each key, value pair as a separate record
        data_key_list += [[data['timestamp'], data['receipt_timestamp'], data['delta'], side, float(price), size] for price, size in data[side].items()]

# create a dataframe
df = pd.DataFrame(data_key_list[1:], columns=data_key_list[0])

# display(df.head())
     timestamp  receipt_timestamp  delta  side   price     size
0  1.59932e+09        1.59932e+09   True   bid   338.9  0.06482
1  1.59932e+09        1.59932e+09   True   bid  338.67  3.95535
2  1.59932e+09        1.59932e+09   True   ask  339.12  2.47578
3  1.59932e+09        1.59932e+09   True   ask  339.13  6.43172
4  1.59932e+09        1.59932e+09   True   bid   338.9  0.06482
Convert to a list of dicts
df.to_dict(orient='records')
[out]:
[{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 338.9,
'size': 0.06482},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 338.67,
'size': 3.95535},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 339.12,
'size': 2.47578},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 339.13,
'size': 6.43172},
...]
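If the list of dicts is all you need, the DataFrame step could probably be skipped. The following is only a sketch under that assumption: flatten_transactions is a hypothetical helper name (not from the answer above), the sample is shortened to one bid and one ask, and data.get(side, {}) assumes a side may sometimes be missing from a message.

```python
import json


def flatten_transactions(transactions):
    """Return one dict per bid/ask price level found in each transaction."""
    records = []
    for _entry_id, payload in transactions:
        data = json.loads(payload["data"])  # the 'data' value is a JSON string
        for side in ("bid", "ask"):
            # .get() is an assumption: tolerate messages that lack one side
            for price, size in data.get(side, {}).items():
                records.append({
                    "timestamp": data["timestamp"],
                    "receipt_timestamp": data["receipt_timestamp"],
                    "delta": data["delta"],
                    "side": side,
                    "price": float(price),  # keys arrive as strings
                    "size": size,
                })
    return records


# shortened sample in the same shape as transaction_data above
sample = [("1599324732926-0", {"data": '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482}, "ask":{"339.12":2.47578}}'})]
records = flatten_transactions(sample)
print(records)
```

Whether this is faster than going through pandas depends on the data, so it is worth timing both on a realistic input.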
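Contributor with 1784 experience points, over 2 upvotes
This isn't exactly an answer to your question, since it is not a pandas or numpy implementation, but I think it should cover what you need.
Take a look at multiprocessing.pool.Pool.map.
Suppose you have a function that takes a tuple from the original list and returns the dict of data you want. Say its signature looks like this: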
def tuple_to_dict(item):
    # conversion code goes here
    return result_dict
You can then use multiprocessing.Pool() like this:
import multiprocessing

if __name__ == '__main__':
    input_list = [...]  # your input list
    with multiprocessing.Pool() as pool:
        result_list = pool.map(tuple_to_dict, input_list)
    print(result_list)
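Since one input tuple can hold several bid and ask levels, the worker may need to return a list of dicts rather than a single dict, in which case the result of pool.map is a list of lists. A rough sketch of that variation follows; tuple_to_records and flatten are hypothetical names, the payload is a trimmed stand-in for the real messages, and only a subset of the fields is kept for brevity.

```python
import itertools
import json
import multiprocessing


def tuple_to_records(item):
    """Hypothetical converter: one (id, payload) tuple -> list of record dicts."""
    data = json.loads(item[1]["data"])
    return [
        {"timestamp": data["timestamp"], "side": side, "price": float(price), "size": size}
        for side in ("bid", "ask")
        for price, size in data[side].items()
    ]


def flatten(nested):
    """Merge the per-tuple lists returned by pool.map into one flat list."""
    return list(itertools.chain.from_iterable(nested))


if __name__ == "__main__":
    # trimmed stand-in payload, not real market data
    payload = '{"timestamp": 1599324732.767, "bid": {"338.9": 0.06482}, "ask": {"339.12": 2.47578}}'
    input_list = [("1599324732926-0", {"data": payload}), ("1599324732926-1", {"data": payload})]
    with multiprocessing.Pool() as pool:
        nested = pool.map(tuple_to_records, input_list)
    print(flatten(nested))
```

The worker is defined at module level so that it can be pickled and sent to the worker processes. For small inputs the cost of starting processes can outweigh the gain, so measure before committing to this.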
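Notes:
The Pool() object should be created inside an if __name__ == "__main__" block, or in a function called (directly or indirectly) from there. Otherwise you will get a RuntimeError on platforms that start worker processes by spawning a fresh interpreter.
Put it in a with ... as ... statement so that the Pool object is shut down when you are done with it or when something fails. If you don't use the with/as syntax, use the pool inside a try block and add a pool.close() call in its finally block to make sure the pool is closed.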
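For the case without with/as, the second note could look roughly like the sketch below. The names double and map_with_explicit_cleanup are made up for illustration, and calling pool.join() after pool.close() is an addition of this sketch (it waits for the workers to exit), not something the note above requires.

```python
import multiprocessing


def double(x):
    # stand-in worker function for illustration only
    return 2 * x


def map_with_explicit_cleanup(func, items):
    """Run pool.map without a with-block, closing the pool in finally."""
    pool = multiprocessing.Pool()
    try:
        return pool.map(func, items)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit


if __name__ == "__main__":
    print(map_with_explicit_cleanup(double, [1, 2, 3]))
```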