find_most_similarity

czsc.find_most_similarity(vector: Series, matrix: DataFrame, n=10, metric='cosine', **kwargs)

Find the n vectors in a matrix that are most similar to a given vector.

Parameters:
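  • vector – 1-D vector, as a Series

  • matrix – 2-D matrix, as a DataFrame; each column is one vector, and the column name is that vector's label

  • n – int, the number of most similar vectors to return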

  • metric

    str, the method used to compute similarity (default 'cosine'). Listed values:

    • From scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']. These metrics support sparse matrix inputs. 'nan_euclidean' is also listed, but it does not yet support sparse matrices.

    • From scipy.spatial.distance: ['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']. See the documentation for scipy.spatial.distance for details on these metrics. These metrics do not support sparse matrix inputs.

  • kwargs – additional keyword arguments
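
The parameters above can be illustrated with a small sketch. This is not the czsc implementation: `most_similar_sketch` is a hypothetical stand-in that assumes the metric names are passed to scikit-learn's `pairwise_distances`, that rows of `vector` and `matrix` line up by position, and that "most similar" means "smallest distance". Whether `czsc.find_most_similarity` returns distances or similarity scores, and in which order, is not stated in this page, so check the source before relying on the output shape.

```python
import pandas as pd
from sklearn.metrics import pairwise_distances


def most_similar_sketch(vector: pd.Series, matrix: pd.DataFrame,
                        n: int = 10, metric: str = "cosine") -> pd.Series:
    """Hypothetical illustration only; not the czsc implementation."""
    # Each column of `matrix` is a vector, so transpose to one row per candidate.
    candidates = matrix.T.to_numpy(dtype=float)
    # The query must be 2-D with a single row for pairwise_distances.
    query = vector.to_numpy(dtype=float).reshape(1, -1)
    # Result has shape (1, number_of_columns); take the only row.
    dist = pairwise_distances(query, candidates, metric=metric)[0]
    # Label each distance with its column name and keep the n smallest.
    return pd.Series(dist, index=matrix.columns).nsmallest(n)


# Toy data: three labelled column vectors and one query vector.
matrix = pd.DataFrame({
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [2.0, 0.1, 0.0],
})
vector = pd.Series([1.0, 0.0, 0.0])

result = most_similar_sketch(vector, matrix, n=2, metric="cosine")
print(result)
```

With cosine distance, the columns pointing in nearly the same direction as the query ("a" and "c") rank ahead of the orthogonal column "b", regardless of vector length. Switching `metric` to 'euclidean' would instead rank by straight-line distance, where magnitude matters.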