data_preprocessing
cosmo_utils.ml.ml_utils.data_preprocessing(feat_arr, pre_opt='min_max', reshape=False)
Preprocess the data to clean it and make it more suitable for machine learning algorithms.
Parameters:
feat_arr : numpy.ndarray
    Array of feature values. This array is used for training an ML algorithm.
pre_opt : {‘min_max’, ‘standard’, ‘normalize’, ‘no’} str, optional
    Type of preprocessing to do on feat_arr.
    Options:
    - ‘min_max’ : Turns feat_arr to values between 0 and 1
    - ‘standard’ : Uses the StandardScaler method
    - ‘normalize’ : Uses the Normalizer method
    - ‘no’ : No preprocessing on feat_arr
reshape : bool, optional
    If True, it reshapes feat_arr into a 1d array if its shape is equal to (ncols, 1), where ncols is the number of columns. This variable is set to False by default.
Returns:
feat_arr_scaled : numpy.ndarray
    Rescaled version of feat_arr based on the choice of pre_opt.
Notes
For more information on how to pre-process your data, see http://scikit-learn.org/stable/modules/preprocessing.html.
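As a rough illustration of what the pre_opt choices conventionally compute, the sketch below reimplements them with NumPy only. It is not the cosmo_utils source: the function name preprocess_sketch is hypothetical, and the arithmetic assumes the options follow the default behaviour of scikit-learn's MinMaxScaler, StandardScaler, and Normalizer (column-wise scaling for the first two, row-wise L2 normalization for the third). The reshape handling reflects one reading of the description above, namely that a two-dimensional array with a single column is flattened.

```python
import numpy as np


def preprocess_sketch(feat_arr, pre_opt="min_max", reshape=False):
    """Hypothetical stand-in for data_preprocessing (illustration only)."""
    arr = np.asarray(feat_arr, dtype=float)

    if pre_opt == "min_max":
        # Rescale each column so its minimum maps to 0 and its maximum to 1.
        col_min = arr.min(axis=0)
        col_range = arr.max(axis=0) - col_min
        col_range = np.where(col_range == 0, 1.0, col_range)  # constant columns
        scaled = (arr - col_min) / col_range
    elif pre_opt == "standard":
        # Center each column to zero mean and scale it to unit variance.
        col_std = arr.std(axis=0)
        col_std = np.where(col_std == 0, 1.0, col_std)  # constant columns
        scaled = (arr - arr.mean(axis=0)) / col_std
    elif pre_opt == "normalize":
        # Scale each row (sample) to unit L2 norm.
        row_norm = np.linalg.norm(arr, axis=1, keepdims=True)
        row_norm = np.where(row_norm == 0, 1.0, row_norm)  # all-zero rows
        scaled = arr / row_norm
    elif pre_opt == "no":
        # Leave the values untouched.
        scaled = arr.copy()
    else:
        raise ValueError(f"Unknown pre_opt: {pre_opt!r}")

    # Assumed reading of `reshape`: flatten a single-column 2d array to 1d.
    if reshape and scaled.ndim == 2 and scaled.shape[1] == 1:
        scaled = scaled.ravel()
    return scaled


# Small made-up feature array: 3 samples, 2 features.
feat = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
for option in ("min_max", "standard", "normalize", "no"):
    print(option)
    print(preprocess_sketch(feat, pre_opt=option))
```

Note that ‘min_max’ and ‘standard’ act on each feature (column) independently, whereas ‘normalize’ acts on each sample (row), so the choice depends on whether feature scales or sample magnitudes should be equalized.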