Fastest way to rank items with multiple values and weightings

小编典典


algorithm

I have a set of key-value pairs like this:

{ 
   'key1': [value1_1, value2_1, value3_1, ...], 
   'key2': [value1_2, value2_2, value3_2, ...],
   ...
 }

I also have a list, in the same order as the value lists, that holds the weight to apply to each variable. It looks like [weight_1, weight_2, weight_3, ...].

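My goal is to end up with an ordered list of keys, ranked from the highest total score downward. Note that the values are not all normalized/standardized, so value1_x might range from 1 to 10 while value2_x ranges from 1 to 100000. That is the tricky part for me, because I have to normalize the data somehow.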

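I'm trying to make this algorithm scale to many values, so that it takes the same time whether there is 1 value or 100 (or at least logarithmic time). Is that possible? Is there a really efficient way to approach this?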


2020-07-28

1 answer

小编典典

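You can't do better than linear time, because every value has to be read at least once, but you can make it much faster in practice. This looks like a matrix multiplication to me, so I suggest using numpy: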

import numpy as np

keys = ['key1', 'key2', 'key3']

values = np.matrix([
    [1.1, 1.2, 1.3, 1.4],
    [2.1, 2.2, 2.3, 2.4],
    [3.1, 3.2, 3.3, 3.4]
])

weights = np.matrix([[10., 20., 30., 40.]]).transpose()

res = (values * weights).transpose().tolist()[0]

# zip() returns an iterator in Python 3, so build the sorted list directly
items = sorted(zip(res, keys), reverse=True)

This gives:

[(330.0, 'key3'), (230.0, 'key2'), (130.0, 'key1')]
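
The matrix product above is just a weighted sum per key: each score is the sum of weight_j * value_j over that key's values. As a rough illustration of what numpy is computing, here is a minimal pure-Python sketch on the same sample data. The helper name `weighted_scores` is made up for this example and is not part of the original answer.

```python
def weighted_scores(table, weights):
    """Return {key: sum(weight * value)} for each key's value list."""
    return {
        key: sum(w * v for w, v in zip(weights, values))
        for key, values in table.items()
    }

# Same sample data as in the answer, in the dict layout from the question
table = {
    'key1': [1.1, 1.2, 1.3, 1.4],
    'key2': [2.1, 2.2, 2.3, 2.4],
    'key3': [3.1, 3.2, 3.3, 3.4],
}
weights = [10., 20., 30., 40.]

scores = weighted_scores(table, weights)
# Sort the keys by their score, highest first
ranked = sorted(scores, key=scores.get, reverse=True)
```

The work is proportional to (number of keys) x (number of values per key), plus the sort over the keys. The numpy version does the same arithmetic in compiled code, which is where the speedup comes from.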

Edit: thanks to @Ondro for np.dot and to @unutbu for np.argsort, here is an improved version done entirely in numpy:

import numpy as np

# set up values
keys = np.array(['key1', 'key2', 'key3'])
values = np.array([
    [1.1, 1.2, 1.3, 1.4],    # key1: value1_1, value2_1, ...
    [2.1, 2.2, 2.3, 2.4],    # key2: value1_2, value2_2, ...
    [3.1, 3.2, 3.3, 3.4]     # key3: value1_3, value2_3, ...
])
weights = np.array([10., 20., 30., 40.])

# crunch the numbers
res = np.dot(values, -weights)   # negative of weights!

order = res.argsort(axis=0)  # sorting on negative value gives
                             # same order as reverse-sort; there does
                             # not seem to be any way to reverse-sort
                             # directly
sortedkeys = keys[order].tolist()

The result is ['key3', 'key2', 'key1'].
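
The question also notes that the value columns can have very different ranges (1-10 versus 1-100000), and neither version above rescales them. One common option, assumed here rather than taken from the answer, is to min-max scale each column to [0, 1] before applying the weights. The data and weights below are hypothetical and chosen only to show how scaling can change the ranking.

```python
import numpy as np

keys = np.array(['key1', 'key2', 'key3'])

# Hypothetical data: column 0 spans roughly 1-10, column 1 roughly 1-100000
values = np.array([
    [2.0, 90000.0],
    [9.0, 10000.0],
    [5.0, 50000.0],
])
weights = np.array([3.0, 1.0])

# Min-max scale each column to [0, 1] (an assumed normalization choice)
lo = values.min(axis=0)
span = values.max(axis=0) - lo
span[span == 0] = 1.0              # constant column: avoid division by zero
scaled = (values - lo) / span

# Weighted score per key, then sort descending by negating the scores
scores = scaled @ weights
ranked = keys[np.argsort(-scores)].tolist()

# For contrast: ranking on the raw values, where the large column dominates
raw_ranked = keys[np.argsort(-(values @ weights))].tolist()
```

With this data the raw ranking follows the second column almost entirely, while the scaled ranking lets the heavier weight on the first column take effect. Whether min-max scaling, z-scores, or some other transform is appropriate depends on the data, so treat this as a starting point.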

2020-07-28