chemicalchecker.util.transform.gaussianize.Gaussianize
- class Gaussianize(tol=0.000122, max_iter=100, verbose=False, strategy='lambert')[source]
Bases: TransformerMixin
Gaussianize data using various methods.
Conventions
This class is a wrapper that follows sklearn naming and style (e.g. fit(X) to train). Inside the class, x is the input and y is the output. The functions outside the class instead follow Georg's convention [1], in which Y is the input and X is the output (Gaussianized) data.
Parameters
- tol : float, default = 0.000122
- max_iter : int, default = 100
Maximum number of iterations to search for correct parameters of Lambert transform.
- strategy : str, default = 'lambert'
Possible values are 'lambert' [1], 'brute' [2], and 'boxcox' [3].
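The two non-Lambert strategies can be illustrated with a short, standard-library-only sketch. This is not the class's own code: it assumes that 'brute' maps each value to the standard-normal quantile of its rank, and that 'boxcox' applies the usual power transform for a given lambda (how the class chooses lambda is not stated here). The function names `rank_gaussianize` and `box_cox` are hypothetical.

```python
import math
from statistics import NormalDist


def rank_gaussianize(values):
    """Map each value to the standard-normal quantile of its rank.

    Assumption: ranks run from 1 to n and are mapped to (rank - 0.5) / n.
    Ties receive consecutive ranks in input order rather than averaged ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        out[idx] = NormalDist().inv_cdf((rank - 0.5) / n)
    return out


def box_cox(value, lam):
    """Box-Cox power transform of one strictly positive value."""
    if value <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(value)
    return (value ** lam - 1.0) / lam


if __name__ == "__main__":
    skewed = [10.0, 1.0, 1000.0]
    print(rank_gaussianize(skewed))
    print([box_cox(v, 0.0) for v in skewed])
```

The rank-based mapping depends only on the ordering of the data, so it yields a symmetric output for any continuous input, but it discards the distances between values.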
Attributes
- coefs_ : list of tuples
Transformation parameters, one tuple per variable. For the Lambert strategy, for example, each tuple is (mu, sigma, delta), the parameters of the corresponding Lambert transform. See Eq. 6 and 8 in [1].
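To show how a (mu, sigma, delta) tuple could be applied, here is a minimal sketch of the Lambert W transform for heavy tails and its inverse (Tukey's h transformation). It is a sketch under assumptions, not the module's implementation: the function names are hypothetical, the Lambert W function is evaluated with a simple Newton iteration instead of a library routine, and the small-delta cutoff of 1e-6 is an arbitrary choice.

```python
import math


def lambert_w(x, tol=1e-12, max_iter=100):
    """Principal branch of the Lambert W function for x >= 0 (Newton iteration)."""
    w = math.log1p(x)  # starting guess; an upper bound on W(x) for x >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def w_delta(z, delta):
    """Inverse of Tukey's h transform u * exp(delta / 2 * u**2)."""
    if delta < 1e-6:  # assumed cutoff: treat a tiny delta as "no heavy tails"
        return z
    return math.copysign(math.sqrt(lambert_w(delta * z * z) / delta), z)


def gaussianize_lambert(y, coefs):
    """Pull a heavy-tailed value y back towards a Gaussian scale."""
    mu, sigma, delta = coefs
    return mu + sigma * w_delta((y - mu) / sigma, delta)


def inverse_lambert(x, coefs):
    """Recover the heavy-tailed value from a Gaussianized value x."""
    mu, sigma, delta = coefs
    u = (x - mu) / sigma
    return mu + sigma * u * math.exp(0.5 * delta * u * u)


if __name__ == "__main__":
    coefs = (1.0, 2.0, 0.3)  # hypothetical fitted parameters for one column
    heavy = inverse_lambert(0.7, coefs)
    print(heavy, gaussianize_lambert(heavy, coefs))
```

A larger delta corresponds to heavier tails; with delta close to zero the transform reduces to the identity.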
References
- [1] Georg Goerg. The Lambert Way to Gaussianize heavy tailed data with the inverse of Tukey's h transformation as a special case. Author generously provides code in R: https://cran.r-project.org/web/packages/LambertW/
- [2] Valero Laparra, Gustavo Camps-Valls, and Jesus Malo. Iterative Gaussianization: From ICA to Random Rotations.
- [3] Box-Cox transformation and references: https://en.wikipedia.org/wiki/Power_transform
Methods
- Fit a Gaussianizing transformation to each variable/column in x.
- Fit to data, then transform it.
- Recover original data from Gaussianized data.
- Show Q-Q plots compared to normal before and after the transform.
- Transform new data using a previously learned Gaussianization model.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
Returns
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
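The contract described above, fitting and then transforming the same data, can be sketched without any dependencies. The classes below are hypothetical stand-ins: `FitTransformSketch` mimics the mixin's behaviour of calling fit followed by transform, and `CenterColumns` is a toy transformer used only for illustration, not a Gaussianizer.

```python
class FitTransformSketch:
    """Hypothetical stand-in for the mixin that supplies fit_transform."""

    def fit_transform(self, X, y=None, **fit_params):
        # Fit on X (and y, if given), then transform the same X.
        if y is None:
            return self.fit(X, **fit_params).transform(X)
        return self.fit(X, y, **fit_params).transform(X)


class CenterColumns(FitTransformSketch):
    """Toy transformer: subtracts each column's mean learned during fit."""

    def fit(self, X, y=None):
        n = len(X)
        self.means_ = [sum(col) / n for col in zip(*X)]
        return self  # returning self allows fit(X).transform(X) chaining

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


if __name__ == "__main__":
    X = [[1.0, 10.0], [3.0, 30.0]]
    print(CenterColumns().fit_transform(X))
```

Because fit returns the fitted object itself, calling fit_transform(X) gives the same result as fit(X) followed by transform(X).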