VectorSizeNormaliser

class frlearn.feature_preprocessors.VectorSizeNormaliser(measure: str = 'boscovich', target_size: float = 0.5)

Rescales each instance (seen as a vector) to a fixed size. Typically used on datasets of frequency counts, when only the relative frequencies are considered important, e.g. token counts of texts in NLP.

Parameters
measure: str or float or (np.array -> float) = 'boscovich'

The vector size measure to use. Must be positively homogeneous, i.e. multiplying a vector by a factor c > 0 must multiply its size by c. A float is interpreted as the Minkowski size with that value for p. For convenience, a number of popular measures can be referred to by name.
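As an illustration of what a Minkowski size measure computes, here is a minimal sketch in plain Python. It is not the library's implementation: the helper `minkowski_size` is hypothetical, and the assumption that 'boscovich' corresponds to p = 1 (the sum of absolute values) is ours and should be checked against the library.

```python
import math


def minkowski_size(vector, p):
    """Hypothetical helper: (sum |x_i|^p)^(1/p), or max |x_i| when p is infinite."""
    if math.isinf(p):
        return max(abs(x) for x in vector)
    return sum(abs(x) ** p for x in vector) ** (1 / p)


counts = [3.0, 4.0]

size_p1 = minkowski_size(counts, 1)          # assumed to match 'boscovich'
size_p2 = minkowski_size(counts, 2)          # Euclidean size
size_pinf = minkowski_size(counts, math.inf)  # largest absolute component

# Positive homogeneity: scaling the vector by c > 0 scales its size by c.
c = 2.5
scaled = [c * x for x in counts]
is_homogeneous = math.isclose(minkowski_size(scaled, 2), c * size_p2)
```

Positive homogeneity is what makes rescaling to a fixed size possible: dividing a vector by its size and multiplying by the target yields a vector whose size is exactly the target.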

target_size: float = 0.5

The size that all vectors will be rescaled to. With the default, 0.5, and a Minkowski size with p ≥ 1, the triangle inequality ensures that any two rescaled instances are at most 1 apart under the corresponding Minkowski distance. A more typical choice is 1, which places all instances on the unit sphere of the chosen measure (the unit hypersphere in the Euclidean case).
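The effect of the default target size can be checked with a small sketch. This is plain Python rather than the library's own code; the helper `rescale` is hypothetical, it uses the p = 1 Minkowski size by default, and it assumes every input has a non-zero, finite size.

```python
def rescale(vector, target_size=0.5, p=1):
    """Hypothetical helper: rescale a vector to a given Minkowski size.

    Assumes the size of `vector` is neither 0 nor infinite.
    """
    size = sum(abs(x) ** p for x in vector) ** (1 / p)
    return [x * target_size / size for x in vector]


def distance_p1(u, v):
    """Minkowski distance with p = 1 between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


# Token counts with no tokens in common end up exactly 1 apart ...
a = rescale([10.0, 0.0])
b = rescale([0.0, 3.0])
max_distance = distance_p1(a, b)

# ... while counts that share tokens end up closer together.
c = rescale([2.0, 6.0])
smaller_distance = distance_p1(a, c)
```

Note that the original magnitudes (10 versus 3 tokens) no longer matter after rescaling: only the relative frequencies determine the distances.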

Notes

If the size of an instance is 0, it will be left unscaled. If the size of an instance is ∞, it will be scaled to 0.
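The two edge cases can be sketched as follows. This is an illustrative plain-Python version under our own assumptions (a hypothetical `normalise` helper using the p = 1 size), not the library's implementation.

```python
import math


def normalise(vector, target_size=0.5):
    """Hypothetical helper mirroring the documented edge-case behaviour."""
    size = sum(abs(x) for x in vector)
    if size == 0:
        # A zero-size instance has no direction to preserve: leave it unscaled.
        return list(vector)
    if math.isinf(size):
        # An infinite-size instance is scaled to the zero vector.
        return [0.0 for _ in vector]
    return [x * target_size / size for x in vector]


zero_case = normalise([0.0, 0.0])
infinite_case = normalise([math.inf, 1.0])
regular_case = normalise([1.0, 3.0])
```

Handling these cases explicitly avoids dividing by zero for empty instances and avoids NaN values from dividing infinity by infinity.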

class Model