chemicalchecker.util.transform.gaussianize.Gaussianize

class Gaussianize(tol=0.000122, max_iter=100, verbose=False, strategy='lambert')

Bases: TransformerMixin

Gaussianize data using various methods.

Conventions

This class is a wrapper that follows scikit-learn naming and style (e.g. fit(x) to train). Inside the class, x is the input and y is the output. In the functions outside the class, however, I follow Georg's convention, in which Y is the input and X is the output (Gaussianized) data.

Parameters

tol : float, default = 0.000122

Convergence tolerance for the iterative search for the Lambert transform parameters.

max_iter : int, default = 100

Maximum number of iterations to search for correct parameters of Lambert transform.

strategy : str, default = 'lambert'

Options are 'lambert' [1], 'brute' [2] and 'boxcox' [3].
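For the 'boxcox' option, the power transform described in [3] can be sketched for a single positive value as follows. This is a minimal illustration only: the helper names boxcox and inv_boxcox are hypothetical, the exponent is fixed by hand rather than estimated from the data, and the class may implement this strategy differently.

```python
import math


def boxcox(x, lmbda):
    """Box-Cox power transform of one positive value (illustrative helper)."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for positive values")
    if abs(lmbda) < 1e-12:
        # The limit of (x**lmbda - 1) / lmbda as lmbda -> 0 is log(x).
        return math.log(x)
    return (x ** lmbda - 1.0) / lmbda


def inv_boxcox(y, lmbda):
    """Inverse of boxcox for the same exponent (illustrative helper)."""
    if abs(lmbda) < 1e-12:
        return math.exp(y)
    return (lmbda * y + 1.0) ** (1.0 / lmbda)


# Round trip on a right-skewed sample with a hand-picked exponent.
sample = [0.5, 1.0, 2.0, 8.0, 40.0]
transformed = [boxcox(v, 0.25) for v in sample]
recovered = [inv_boxcox(v, 0.25) for v in transformed]
```

An exponent below 1 compresses large values more than small ones, which is why the transform can reduce right skew.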

Attributes

coefs_ : list of tuples

Transformation parameters, one entry per variable. For the 'lambert' strategy, for example, each entry is a tuple (mu, sigma, delta) holding the parameters of the corresponding Lambert transform; see Eq. 6 and 8 in [1].
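To make the role of (mu, sigma, delta) concrete, here is a minimal pure-Python sketch of the Lambert W x Gaussian (Tukey's h) mapping and its inverse, following the form usually given for this transform. The function names (lambert_w0, to_heavy_tailed, to_gaussian) are hypothetical and not part of this class, and the Newton solver is a stand-in for a library Lambert W routine.

```python
import math


def lambert_w0(z, tol=1e-12, max_iter=100):
    """Principal branch of Lambert W for z >= 0, via Newton's method."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starting point at or above the root for z >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def to_heavy_tailed(x, mu, sigma, delta):
    """Map a Gaussian value x to its heavy-tailed counterpart."""
    u = (x - mu) / sigma
    return mu + sigma * u * math.exp(0.5 * delta * u * u)


def to_gaussian(y, mu, sigma, delta):
    """Map a heavy-tailed value y back to the Gaussian scale."""
    if delta < 1e-12:
        return y  # delta = 0 means no tail correction
    z = (y - mu) / sigma
    # W(delta * z**2) / delta recovers u**2; copysign restores the sign of z.
    u = math.copysign(math.sqrt(lambert_w0(delta * z * z) / delta), z)
    return mu + sigma * u


# Round trip for one value with hand-picked parameters.
heavy = to_heavy_tailed(1.5, 0.0, 1.0, 0.3)
back = to_gaussian(heavy, 0.0, 1.0, 0.3)
```

Larger delta values stretch the tails more strongly, while delta = 0 leaves the data unchanged.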

References

[1] Georg Goerg. The Lambert Way to Gaussianize heavy tailed data with the inverse of Tukey's h transformation as a special case. Author generously provides code in R: https://cran.r-project.org/web/packages/LambertW/

[2] Valero Laparra, Gustavo Camps-Valls, and Jesus Malo. Iterative Gaussianization: From ICA to Random Rotations.

[3] Box-Cox transformation and references: https://en.wikipedia.org/wiki/Power_transform

Methods

fit

Fit a Gaussianizing transformation to each variable/column in x.

fit_transform

Fit to data, then transform it.

inverse_transform

Recover original data from Gaussianized data.

qqplot

Show qq plots compared to normal before and after the transform.

transform

Transform new data using a previously learned Gaussianization model.

fit(x, y=None)

Fit a Gaussianizing transformation to each variable/column in x.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns

X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

inverse_transform(y)

Recover original data from Gaussianized data.

qqplot(x, prefix='qq')

Show qq plots compared to normal before and after the transform.
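A normal qq plot pairs the sorted sample values with the quantiles a standard normal distribution would have at the same ranks. The sketch below computes those pairs with the standard library only. The helper name normal_qq_pairs and the plotting positions (i + 0.5) / n are assumptions for illustration, not the method's own code.

```python
from statistics import NormalDist


def normal_qq_pairs(sample):
    """Return (theoretical quantile, observed value) pairs for a normal qq plot."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(sample)
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


pairs = normal_qq_pairs([3.0, 1.0, 2.0])
```

Points that fall close to a straight line suggest the sample is approximately normal, so comparing the pairs before and after the transform indicates how much the Gaussianization helped.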

transform(x)

Transform new data using a previously learned Gaussianization model.