
Finding the most profitable long/short pair permutation in Python - an optimization problem?

Python

肥皂起泡泡 2022-07-05 17:17:04

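I have daily profit data and I'm trying to find the combination of two assets that yields the highest profit. I need to go long one asset and short another, and find the best-performing pair over a time window.

I can do this by searching every permutation, but it is very slow (no surprise there). I suspect this is the kind of problem suited to linear optimization with a library like PuLP.

Below is an example that solves the problem exhaustively. I've kept the data deliberately simple, but I need to search 1,000 assets. The inefficient manual approach outlined below takes about 45 minutes.

Note: because going long "Alpha" and short "Bravo" is different from going long "Bravo" and short "Alpha", I'm using permutations, not combinations.

Edit: in case anyone is unfamiliar with going long and going short, I'm trying to pair the highest profit with the lowest profit (when shorting, the more negative the value, the more profit I make).

The logic is as follows: for all permutations of nodes, add node one's profit to the negation of node two's profit to get the total profit. Find the pair with the highest total profit.

Here is my very inefficient (but working) implementation:

    import pandas as pd

    # Sample data
    profits = [
        ('2019-11-18', 'Alpha', -79.629698),
        ('2019-11-19', 'Alpha', -17.452517),
        ('2019-11-20', 'Alpha', -19.069558),
        ('2019-11-21', 'Alpha', -66.061564),
        ('2019-11-18', 'Bravo', -87.698670),
        ('2019-11-19', 'Bravo', -73.812616),
        ('2019-11-20', 'Bravo', 198.513246),
        ('2019-11-21', 'Bravo', -69.579466),
        ('2019-11-18', 'Charlie', 66.302287),
        ('2019-11-19', 'Charlie', -16.132065),
        ('2019-11-20', 'Charlie', -123.735898),
        ('2019-11-21', 'Charlie', -30.046416),
        ('2019-11-18', 'Delta', -131.682322),
        ('2019-11-19', 'Delta', 13.296473),
        ('2019-11-20', 'Delta', 23.595053),
        ('2019-11-21', 'Delta', 14.103027),
    ]

    profits_df = pd.DataFrame(profits, columns=('Date','Node','Profit')).sort_values('Date')

profits_df looks like this:

+----+------------+---------+-------------+
|    | Date       | Node    | Profit      |
+----+------------+---------+-------------+
| 0  | 2019-11-18 | Alpha   | -79.629698  |
| 4  | 2019-11-18 | Bravo   | -87.698670  |
| 8  | 2019-11-18 | Charlie | 66.302287   |
| 12 | 2019-11-18 | Delta   | -131.682322 |
| 1  | 2019-11-19 | Alpha   | -17.452517  |
+----+------------+---------+-------------+

I'm sure there is a more efficient way to solve this. I don't understand the intricacies of optimization, but I know enough to know it is a possible solution. I don't understand the difference between linear and nonlinear optimization, so I apologize if I've got the nomenclature wrong.

Can anyone suggest an approach I should try?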
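For reference, the exhaustive logic described above might look roughly like the following. This is only a sketch of the described steps, not the original 45-minute implementation; the names `wide`, `pair_profit` and `best_pair` are made up for illustration.

```python
from itertools import permutations

import pandas as pd

# Sample rows from the question: (date, node, daily profit).
profits = [
    ('2019-11-18', 'Alpha', -79.629698),
    ('2019-11-19', 'Alpha', -17.452517),
    ('2019-11-20', 'Alpha', -19.069558),
    ('2019-11-21', 'Alpha', -66.061564),
    ('2019-11-18', 'Bravo', -87.698670),
    ('2019-11-19', 'Bravo', -73.812616),
    ('2019-11-20', 'Bravo', 198.513246),
    ('2019-11-21', 'Bravo', -69.579466),
    ('2019-11-18', 'Charlie', 66.302287),
    ('2019-11-19', 'Charlie', -16.132065),
    ('2019-11-20', 'Charlie', -123.735898),
    ('2019-11-21', 'Charlie', -30.046416),
    ('2019-11-18', 'Delta', -131.682322),
    ('2019-11-19', 'Delta', 13.296473),
    ('2019-11-20', 'Delta', 23.595053),
    ('2019-11-21', 'Delta', 14.103027),
]
profits_df = pd.DataFrame(profits, columns=('Date', 'Node', 'Profit'))

# One row per date, one column per node.
wide = profits_df.pivot(index='Date', columns='Node', values='Profit')

# For every ordered pair: long profit plus the negated short profit,
# summed over all dates in the window.
pair_profit = {
    (long_node, short_node): (wide[long_node] - wide[short_node]).sum()
    for long_node, short_node in permutations(wide.columns, 2)
}

best_pair = max(pair_profit, key=pair_profit.get)
print(best_pair, round(pair_profit[best_pair], 6))
```

With n nodes this evaluates n * (n - 1) ordered pairs, which is where the time goes once n reaches 1,000.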


2 Answers

慕莱坞森

Contributed 1810 experience points, received 4+ likes

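Summary of what I did:

Create a dictionary from the profits list, keyed by date
Run permutations over each key's (node, profit) values
Iterate over each pair to combine the names and the amounts separately.
Sort the container list by name, group by name, sum the amounts in each group, and load the final result into a dictionary.
Read the dictionary into a dataframe and sort the values by profit in descending order.

I believe all the processing should be done before the data goes into a dataframe, and you should get a significant speedup: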

    from collections import defaultdict
    from operator import itemgetter
    from itertools import permutations, groupby

    import pandas as pd

    # group the (node, profit) pairs by date
    d = defaultdict(list)
    for k, v, s in profits:
        d[k].append((v, s))

    container = []
    for k, v in d.items():
        l = permutations(v, 2)
        # here I combine the names and the amounts separately into A and B
        for i, j in l:
            A = i[0] + '_' + j[0]
            B = i[-1] + (j[-1] * -1)
            container.append([A, B])

    # here I sort the list, then groupby (groupby won't work if you don't sort first)
    container = sorted(container, key=itemgetter(0, 1))

    sam = dict()
    for name, amount in groupby(container, key=itemgetter(0)):
        sam[name] = sum(i[-1] for i in amount)

    # the parentheses let the method chain span several lines
    outcome = (
        pd.DataFrame
        .from_dict(sam, orient='index', columns=['Profit'])
        .sort_values(by='Profit', ascending=False)
    )

                   Profit
    Bravo_Alpha    149.635831
    Delta_Alpha    101.525568
    Charlie_Alpha   78.601245
    Bravo_Charlie   71.034586
    Bravo_Delta     48.110263
    Delta_Charlie   22.924323
    Charlie_Delta  -22.924323
    Delta_Bravo    -48.110263
    Charlie_Bravo  -71.034586
    Alpha_Charlie  -78.601245
    Alpha_Delta   -101.525568
    Alpha_Bravo   -149.635831

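When I ran it on my PC it took 1.24 ms, versus 14.1 ms for yours. Hopefully someone can come up with something faster.

Update:

Everything I did in the first version was unnecessary. There is no need for permutations, because the multiplier is just -1. All we need to do is get the sum for each name, pair the names (without repeats), multiply one of the values by -1 and add it to the other. Once we have the total for a pair, multiplying it by -1 again gives the result for the reversed pair. I got it down to about 18.6 μs, which rises to 273 μs once pandas is brought in. That is a significant speedup. Most of the compute time goes into reading the data into pandas. Here goes: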

    from collections import defaultdict
    from operator import itemgetter
    from itertools import combinations, chain

    import pandas as pd

    def optimizer(profits):
        # collect the daily profits for each node
        nw = defaultdict(list)
        for dat, node, profit in profits:
            nw[node].append(profit)

        # sum the total for each key
        B = {key: sum(value) for key, value in nw.items()}

        # multiply the value of the second item in the tuple by -1
        # add that to the value of the first item in the tuple
        # pair the result back to the tuple and form a dict
        sumr = {(first, last): sum((B[first], B[last] * -1))
                for first, last in combinations(B.keys(), 2)}

        # reverse the positions in the tuple for each key
        # multiply the value by -1 and pair to form a dict
        rev = {tuple(reversed(k)): v * -1 for k, v in sumr.items()}

        # join the two dictionaries into one
        # sort in descending order
        # and create a dictionary
        result = dict(sorted(chain(sumr.items(), rev.items()),
                             key=itemgetter(-1),
                             reverse=True))

        # load into pandas
        # trying to reduce the compute time here by reducing pandas workload
        return pd.DataFrame(list(result.values()),
                            index=list(result.keys()))

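I would probably put off reading into a dataframe until it is unavoidable. I'd love to know what the actual speed is when you finally run it.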
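One more consequence of the observation above: since a pair's total is just (sum of the long node) minus (sum of the short node), the single best pair is the node with the largest total held long against the node with the smallest total held short, so the top pair can be found without enumerating pairs at all. A minimal sketch, assuming only the best pair is needed and the input is the same list of (date, node, profit) tuples; `best_long_short` is a made-up name and the sample rows are invented:

```python
from collections import defaultdict


def best_long_short(profits):
    """Return (long_node, short_node, total_profit) for the best pair."""
    # Total profit per node over the whole window.
    totals = defaultdict(float)
    for _date, node, profit in profits:
        totals[node] += profit
    if len(totals) < 2:
        raise ValueError("need at least two distinct nodes")

    # Long the biggest total, short the smallest total.
    long_node = max(totals, key=totals.get)
    short_node = min(totals, key=totals.get)
    if long_node == short_node:
        # Every total is tied, so any two distinct nodes give a profit of 0.
        short_node = next(n for n in totals if n != long_node)
    return long_node, short_node, totals[long_node] - totals[short_node]


# Made-up rows, only to show the call.
sample = [
    ("d1", "Alpha", -10.0), ("d1", "Bravo", 5.0), ("d1", "Charlie", 1.0),
    ("d2", "Alpha", -2.0), ("d2", "Bravo", 3.0), ("d2", "Charlie", -4.0),
]
print(best_long_short(sample))
```

This is a single pass over the rows plus one max and one min over the nodes. The full ranking of every pair still needs the pairwise table.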

2022-07-05

慕姐4208626

Contributed 1852 experience points, received 7+ likes

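This is technically not an answer, since it doesn't solve the problem with optimization techniques, but hopefully someone will find it useful.

From testing, building and concatenating DataFrames is the slow part. Creating the matrix of pair profits with Numpy is very fast:

    arr = df['profit'].values + df['profit'].multiply(-1).values[:, None]

This produces a matrix of every node against every node: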

+---+-------------+------------+------------+------------+
|   | 0           | 1          | 2          | 3          |
+---+-------------+------------+------------+------------+
| 0 | 0.000000    | 149.635831 | 78.598163  | 101.525670 |
| 1 | -149.635831 | 0.000000   | -71.037668 | -48.110161 |
| 2 | -78.598163  | 71.037668  | 0.000000   | 22.927507  |
| 3 | -101.525670 | 48.110161  | -22.927507 | 0.000000   |
+---+-------------+------------+------------+------------+
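In case the broadcasting in that one-liner isn't obvious, here is a tiny standalone illustration with made-up numbers (not the data from this post). Adding a length-n vector to an (n, 1) column makes Numpy expand both to (n, n), so each cell holds the column node's profit minus the row node's profit:

```python
import numpy as np

# Made-up profits for three nodes on a single day.
daily = np.array([1.0, 4.0, 10.0])

# daily has shape (3,), (-daily)[:, None] has shape (3, 1).
# Broadcasting gives a (3, 3) result with pair[i, j] = daily[j] - daily[i].
pair = daily + (-daily)[:, None]

# The same matrix built with explicit loops, for comparison.
looped = np.array([[daily[j] - daily[i] for j in range(3)] for i in range(3)])

print(pair)
print(np.array_equal(pair, looped))
```

Reading the matrix: the column is the node held long and the row is the node held short, which is why the diagonal is zero and the matrix is antisymmetric.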

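If you construct an empty numpy array of dimension number of nodes * number of nodes, you can simply add each daily array to the totals array:

    total_arr = np.zeros((4, 4))

    # Do this for each day
    arr = df['profit'].values + df['profit'].multiply(-1).values[:, None]
    total_arr += arr

Once you have that, you need some Pandas voodoo to assign the node names to the matrix and break the matrix down into individual long/short/profit rows.

My original (exhaustive) search took 47 minutes with 60 days of data. That is now down to 13 seconds.

Full working example: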

    import numpy as np
    import pandas as pd

    profits = [
        {'date':'2019-11-18', 'node':'A', 'profit': -79.629698},
        {'date':'2019-11-19', 'node':'A', 'profit': -17.452517},
        {'date':'2019-11-20', 'node':'A', 'profit': -19.069558},
        {'date':'2019-11-21', 'node':'A', 'profit': -66.061564},
        {'date':'2019-11-18', 'node':'B', 'profit': -87.698670},
        {'date':'2019-11-19', 'node':'B', 'profit': -73.812616},
        {'date':'2019-11-20', 'node':'B', 'profit': 198.513246},
        {'date':'2019-11-21', 'node':'B', 'profit': -69.579466},
        {'date':'2019-11-18', 'node':'C', 'profit': 66.3022870},
        {'date':'2019-11-19', 'node':'C', 'profit': -16.132065},
        {'date':'2019-11-20', 'node':'C', 'profit': -123.73898},
        {'date':'2019-11-21', 'node':'C', 'profit': -30.046416},
        {'date':'2019-11-18', 'node':'D', 'profit': -131.68222},
        {'date':'2019-11-19', 'node':'D', 'profit': 13.2964730},
        {'date':'2019-11-20', 'node':'D', 'profit': 23.5950530},
        {'date':'2019-11-21', 'node':'D', 'profit': 14.1030270},
    ]

    # Initialize a Numpy array of node_length * node_length dimension
    profits_df = pd.DataFrame(profits)
    nodes = profits_df['node'].unique()
    total_arr = np.zeros((len(nodes), len(nodes)))

    # For each date, calculate the pairs profit matrix and add it to the total
    # (this relies on every date listing every node, in the same order)
    for date, date_df in profits_df.groupby('date'):
        df = date_df[['node', 'profit']].reset_index()
        arr = df['profit'].values + df['profit'].multiply(-1).values[:, None]
        total_arr += arr

    # This will label each column and row
    nodes_series = pd.Series(nodes, name='node')
    perms_df = pd.concat((nodes_series, pd.DataFrame(total_arr, columns=nodes_series)), axis=1)

    # This collapses our matrix back to long, short, and profit rows with the proper column names
    perms_df = perms_df.set_index('node').unstack().to_frame(name='profit').reset_index()
    perms_df = perms_df.rename(columns={'level_0': 'long', 'node': 'short'})

    # Get rid of long/short pairs where the nodes are the same (not technically necessary)
    perms_df = perms_df[perms_df['long'] != perms_df['short']]

    # Let's see our profit
    perms_df.sort_values('profit', ascending=False)

Result:

+----+------+-------+-------------+
|    | long | short | profit      |
+----+------+-------+-------------+
| 4  | B    | A     | 149.635831  |
| 12 | D    | A     | 101.525670  |
| 8  | C    | A     | 78.598163   |
| 6  | B    | C     | 71.037668   |
| 7  | B    | D     | 48.110161   |
| 14 | D    | C     | 22.927507   |
| 11 | C    | D     | -22.927507  |
| 13 | D    | B     | -48.110161  |
| 9  | C    | B     | -71.037668  |
| 2  | A    | C     | -78.598163  |
| 3  | A    | D     | -101.525670 |
| 1  | A    | B     | -149.635831 |
+----+------+-------+-------------+
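A possible further simplification, offered as a sketch rather than something I have timed: because the daily matrices are only ever added together, summing each node's profit first and broadcasting once should give the same totals without the per-day loop. The helper name `pair_profit_table` and the small data set below are made up; the `node` and `profit` column names match the example above.

```python
import numpy as np
import pandas as pd


def pair_profit_table(profits_df):
    """Long-format table of total profit for every (long, short) pair."""
    # Total profit per node across all dates (groupby sorts the node names).
    totals = profits_df.groupby("node")["profit"].sum()
    nodes = totals.index.to_numpy()
    values = totals.to_numpy()
    n = len(nodes)

    # matrix[i, j] = total[j] - total[i]: column j is long, row i is short.
    matrix = values[None, :] - values[:, None]

    # Flatten row by row, so the long (column) label changes fastest.
    table = pd.DataFrame({
        "long": np.tile(nodes, n),
        "short": np.repeat(nodes, n),
        "profit": matrix.ravel(),
    })
    table = table[table["long"] != table["short"]]
    return table.sort_values("profit", ascending=False, ignore_index=True)


# Tiny made-up data set, not the figures from the example above.
example_df = pd.DataFrame({
    "date": ["d1", "d1", "d1", "d2", "d2", "d2"],
    "node": ["A", "B", "C", "A", "B", "C"],
    "profit": [-10.0, 5.0, 1.0, -2.0, 3.0, -4.0],
})
ranked = pair_profit_table(example_df)
print(ranked.head(3))
```

Summing per node first also means a node missing on some dates just contributes nothing for those dates, whereas the per-day loop needs every date to list every node in the same order.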

Thanks to sammywemmy for helping me sort out the problem and come up with something useful.

2022-07-05
