data_preprocessing

cosmo_utils.ml.ml_utils.data_preprocessing(feat_arr, pre_opt='min_max', reshape=False)

Preprocesses the feature data to clean it and make it more suitable for machine learning algorithms.

Parameters:

feat_arr : numpy.ndarray, list, pandas.DataFrame

Array of feature values. This array is used for training an ML algorithm.

pre_opt : {‘min_max’, ‘standard’, ‘normalize’, ‘no’} str, optional

Type of preprocessing to do on feat_arr.

Options:
  • ‘min_max’ : Rescales feat_arr to values between 0 and 1
  • ‘standard’ : Uses the StandardScaler method
  • ‘normalize’ : Uses the Normalizer method
  • ‘no’ : No preprocessing on feat_arr
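
As a rough illustration of what these options compute, the sketch below reimplements them with NumPy alone. It is not the cosmo_utils implementation: the helper name rescale_sketch is hypothetical, the input is assumed to be a 2D array of shape (n_samples, n_features), and the formulas follow the default behaviour of scikit-learn's MinMaxScaler, StandardScaler and Normalizer, which this function may or may not use with default settings.

```python
import numpy as np


def rescale_sketch(feat_arr, pre_opt="min_max"):
    """Illustrative stand-in (hypothetical), not the cosmo_utils code."""
    arr = np.asarray(feat_arr, dtype=float)  # assumed 2D: (n_samples, n_features)
    if pre_opt == "min_max":
        # Per column: (x - min) / (max - min), mapping each feature to [0, 1].
        col_min = arr.min(axis=0)
        col_range = arr.max(axis=0) - col_min
        col_range[col_range == 0.0] = 1.0  # constant columns: avoid dividing by zero
        return (arr - col_min) / col_range
    if pre_opt == "standard":
        # Per column: zero mean and unit variance (population standard deviation).
        col_std = arr.std(axis=0)
        col_std[col_std == 0.0] = 1.0
        return (arr - arr.mean(axis=0)) / col_std
    if pre_opt == "normalize":
        # Per row: divide each sample by its Euclidean (L2) norm.
        row_norm = np.linalg.norm(arr, axis=1, keepdims=True)
        row_norm[row_norm == 0.0] = 1.0
        return arr / row_norm
    if pre_opt == "no":
        return arr
    raise ValueError(f"Unknown pre_opt: {pre_opt!r}")


feat_arr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
scaled_min_max = rescale_sketch(feat_arr, "min_max")
scaled_standard = rescale_sketch(feat_arr, "standard")
scaled_normalize = rescale_sketch(feat_arr, "normalize")
```

Note that the first two options act on each column (feature) independently, while the last one acts on each row (sample).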

reshape : bool, optional

If True, feat_arr is reshaped into a 1D array if its shape is equal to (ncols, 1), where ncols is the number of columns. This variable is set to False by default.
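
The reshaping step can be pictured with the following minimal sketch. The helper name flatten_single_column is hypothetical, and the check shown here (a 2D array whose second dimension has length 1) is an assumption about how the condition is tested, not a copy of the library code.

```python
import numpy as np


def flatten_single_column(arr):
    """Return a 1D view of an (n, 1) array; leave any other shape untouched."""
    arr = np.asarray(arr)
    if arr.ndim == 2 and arr.shape[1] == 1:
        return arr.reshape(-1)
    return arr


column = np.array([[0.1], [0.5], [0.9]])   # shape (3, 1)
flat = flatten_single_column(column)       # shape (3,)
table = np.array([[0.1, 0.2], [0.3, 0.4]])
unchanged = flatten_single_column(table)   # shape stays (2, 2)
```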

Returns:

feat_arr_scaled : numpy.ndarray

Rescaled version of feat_arr based on the choice of pre_opt.

Notes

For more information on how to pre-process your data, see http://scikit-learn.org/stable/modules/preprocessing.html.