ALP

class frlearn.data_descriptors.ALP(dissimilarity: str or float or Callable[[np.array], float] or Callable[[np.array, np.array], float] = 'boscovich', k: int or Callable[[int], float] or None = <function log_multiple.<locals>._f>, l: int or Callable[[int], float] or None = <function log_multiple.<locals>._f>, scale_weights: Callable[[int], np.array] | None = LinearWeights(), localisation_weights: Callable[[int], np.array] | None = LinearWeights(), nn_search: NeighbourSearchMethod = <frlearn.neighbours.neighbour_search_methods.KDTree object>, max_array_size: int = 67108864, preprocessors=(<frlearn.statistics.feature_preprocessors.IQRNormaliser object>,))

Implementation of the Average Localised Proximity (ALP) data descriptor [1]. It expresses the proximity of a query instance to the target data by localising its nearest neighbour distances against the local nearest neighbour distances in the target data.

Parameters
dissimilarity: str or float or (np.array -> float) or ((np.array, np.array) -> float) = ‘boscovich’

The dissimilarity measure to use.

A vector size measure np.array -> float induces a dissimilarity measure through application to y - x. A float is interpreted as Minkowski size with the corresponding value for p. For convenience, a number of popular measures can be referred to by name.

The default is the Boscovich norm (also known as cityblock, Manhattan or taxicab norm).
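To illustrate how a vector size measure induces a dissimilarity, here is a minimal sketch in plain Python. The helper names `minkowski_size` and `induced_dissimilarity` are hypothetical and not part of frlearn; they only mirror the description above, where a size measure is applied to y - x and a float p selects the Minkowski size.

```python
def minkowski_size(v, p):
    """Minkowski p-size of a vector: (sum |v_i|^p)^(1/p)."""
    return sum(abs(c) ** p for c in v) ** (1 / p)


def induced_dissimilarity(size):
    """Turn a vector size measure into a dissimilarity by applying it to y - x."""
    def dissimilarity(x, y):
        return size([b - a for a, b in zip(x, y)])
    return dissimilarity


# p = 1 gives the Boscovich (cityblock) norm, p = 2 the Euclidean norm.
boscovich = induced_dissimilarity(lambda v: minkowski_size(v, 1))
euclidean = induced_dissimilarity(lambda v: minkowski_size(v, 2))

x, y = [0.0, 0.0], [3.0, 4.0]
d_boscovich = boscovich(x, y)  # |3| + |4|
d_euclidean = euclidean(x, y)  # sqrt(3^2 + 4^2)
```

With these two points the Boscovich dissimilarity is 7 and the Euclidean one is 5, which shows how the choice of p changes the notion of distance.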

k: int or (int -> float) or None = 5.5 * log n

How many nearest neighbour distances / localised proximities to consider. Corresponds to the scale at which proximity is evaluated. Should be either a positive integer, or a function that takes the target class size n and returns a float, or None, which is resolved as n. All such values are rounded to the nearest integer in [1, n].

l: int or (int -> float) or None = 6 * log n

How many nearest neighbours to use for determining the local ith nearest neighbour distance, for each i <= k. Lower values correspond to more localisation. Should be either a positive integer, or a function that takes the target class size n and returns a float, or None, which is resolved as n. All such values are rounded to the nearest integer in [1, n].
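The resolution rule shared by k and l can be sketched as follows. This is an illustration under assumptions, not the library's code: `resolve` and this local `log_multiple` are hypothetical stand-ins, and the natural logarithm is assumed for "log n".

```python
import math


def resolve(value, n):
    """Resolve an int, a function of n, or None to an integer in [1, n]."""
    if value is None:
        raw = n                # None is resolved as n
    elif callable(value):
        raw = value(n)         # a function of the target class size
    else:
        raw = value            # a plain integer
    return min(max(int(round(raw)), 1), n)


def log_multiple(m):
    # Hypothetical stand-in for the default; natural log is an assumption.
    return lambda n: m * math.log(n)


n = 1000
k = resolve(log_multiple(5.5), n)
l = resolve(log_multiple(6), n)
```

Under these assumptions, both defaults grow slowly with the target class size, and any value that falls outside [1, n] is clipped to that range.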

scale_weights: (int -> np.array) or None = LinearWeights()

Weights to use for calculating the soft maximum of localised proximities. Determines to which extent scales with high localised proximity are emphasised.

localisation_weights: (int -> np.array) or None = LinearWeights()

Weights to use for calculating the local ith nearest neighbour distance, for each i <= k. Determines to which extent nearer neighbours dominate.
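As a rough illustration of how a weight function of type int -> array lets nearer neighbours dominate, the sketch below builds linearly decreasing weights and uses them for a weighted mean of sorted distances. The exact form of `LinearWeights` in frlearn is not given here, so `linear_weights` is a hypothetical stand-in written in plain Python.

```python
def linear_weights(k):
    """Linearly decreasing weights k, k-1, ..., 1, normalised to sum to 1."""
    total = k * (k + 1) / 2
    return [(k - i) / total for i in range(k)]


def weighted_mean(values, weights):
    """Weighted sum of values; the weights are assumed to sum to 1."""
    return sum(v * w for v, w in zip(values, weights))


weights = linear_weights(4)
# Distances sorted from nearest to furthest neighbour.
distances = [1.0, 2.0, 3.0, 4.0]
local_distance = weighted_mean(distances, weights)
plain_mean = sum(distances) / len(distances)
```

Because the largest weight goes to the nearest neighbour, the weighted value falls below the plain mean of the same distances.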

max_array_size: int = 2**26

Maximum array size to use. For a query set of size q, calculating local distances requires an array of size [q, l, k], which can be too large to fit in memory. If the size of this array is larger than max_array_size, a query set is batch-processed, which is slower. TODO: determine maximum array size dynamically, investigate lowering float precision
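The batching rule can be illustrated with a small calculation. This is only a sketch of the idea described above, assuming that a batch of b query instances needs an array of b * l * k elements; `batch_size` and `batch_bounds` are hypothetical helpers, not frlearn functions.

```python
def batch_size(q, l, k, max_array_size=2**26):
    """Largest number of query instances whose [batch, l, k] array fits."""
    per_query = l * k
    return max(1, min(q, max_array_size // per_query))


def batch_bounds(q, size):
    """Half-open (start, stop) index pairs covering the q query instances."""
    return [(start, min(start + size, q)) for start in range(0, q, size)]


q, l, k = 100_000, 41, 38
size = batch_size(q, l, k)
bounds = batch_bounds(q, size)
```

When q * l * k does not exceed max_array_size, the whole query set fits in a single batch; otherwise it is split into several consecutive slices.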

preprocessors: iterable = (IQRNormaliser(), )

Preprocessors to apply. The default interquartile range normaliser rescales all features to ensure that they all have the same interquartile range.
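The effect of interquartile range normalisation can be sketched with the standard library. This is an assumption-laden illustration, not the behaviour of `IQRNormaliser` itself: it simply divides each feature by its own interquartile range, does not centre the data, and does not handle features with zero range.

```python
import statistics


def iqr(column):
    """Interquartile range of a sequence of values."""
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    return q3 - q1


def iqr_normalise(rows):
    """Divide every feature (column) by its interquartile range."""
    columns = list(zip(*rows))
    scales = [iqr(c) for c in columns]
    return [[v / s for v, s in zip(row, scales)] for row in rows]


# Two features on very different scales.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
X_normalised = iqr_normalise(X)
ranges = [iqr(c) for c in zip(*X_normalised)]
```

After rescaling, both features have the same interquartile range, so neither dominates the dissimilarity purely because of its units.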

Notes

k and l are the two principal hyperparameters that can be tuned to increase performance. Their default values are based on the empirical evaluation in [1].

References

[1] Lenz OU, Peralta D, Cornelis C (2021). Average Localised Proximity: A new data descriptor with good default one-class classification performance. Pattern Recognition, vol 118, no 107991. doi: 10.1016/j.patcog.2021.107991

class Model