data_preprocessing
cosmo_utils.ml.ml_utils.data_preprocessing(feat_arr, pre_opt='min_max', reshape=False)
Preprocess the data to clean it and make it more suitable for machine learning algorithms.
Parameters:
feat_arr : numpy.ndarray, list, or pandas.DataFrame
    Array of feature values. This array is used for training an ML algorithm.
pre_opt : {‘min_max’, ‘standard’, ‘normalize’, ‘no’}, str, optional
    Type of preprocessing to do on feat_arr. Options:
    - ‘min_max’ : Rescales feat_arr to values between 0 and 1
    - ‘standard’ : Uses the StandardScaler method
    - ‘normalize’ : Uses the Normalizer method
    - ‘no’ : No preprocessing on feat_arr
reshape : bool, optional
    If True, reshapes feat_arr into a 1D array if its shape is equal to (ncols, 1), where ncols is the number of columns. This variable is set to False by default.
Returns:
feat_arr_scaled : numpy.ndarray
    Rescaled version of feat_arr, based on the choice of pre_opt.
Notes
For more information on how to preprocess your data, see http://scikit-learn.org/stable/modules/preprocessing.html
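The options described above can be illustrated with a minimal NumPy sketch. This is not the cosmo_utils implementation: `preprocess_sketch` is a hypothetical stand-in, and it assumes that ‘min_max’ behaves like scikit-learn's MinMaxScaler with default settings, that ‘standard’ and ‘normalize’ use the default StandardScaler and Normalizer behaviour, and that the input is a 2D array of shape (n_samples, n_features) with no constant columns or all-zero rows.

```python
import numpy as np


def preprocess_sketch(feat_arr, pre_opt="min_max", reshape=False):
    """Hypothetical NumPy stand-in for the documented behaviour.

    Not the cosmo_utils code. Assumes a 2D input of shape
    (n_samples, n_features), no constant columns, no all-zero rows.
    """
    arr = np.asarray(feat_arr, dtype=float)
    if pre_opt == "min_max":
        # Per-column rescaling to [0, 1] (MinMaxScaler default behaviour).
        col_min = arr.min(axis=0)
        scaled = (arr - col_min) / (arr.max(axis=0) - col_min)
    elif pre_opt == "standard":
        # Per-column zero mean, unit variance (StandardScaler default behaviour).
        scaled = (arr - arr.mean(axis=0)) / arr.std(axis=0)
    elif pre_opt == "normalize":
        # Per-row scaling to unit L2 norm (Normalizer default behaviour).
        scaled = arr / np.linalg.norm(arr, axis=1, keepdims=True)
    elif pre_opt == "no":
        # Leave the values untouched.
        scaled = arr
    else:
        raise ValueError(f"unknown pre_opt: {pre_opt!r}")
    # Flatten a single-column result to 1D when requested.
    if reshape and scaled.ndim == 2 and scaled.shape[1] == 1:
        scaled = scaled.ravel()
    return scaled


features = [[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]]
min_max_scaled = preprocess_sketch(features, pre_opt="min_max")
standard_scaled = preprocess_sketch(features, pre_opt="standard")
normalized = preprocess_sketch(features, pre_opt="normalize")
flat = preprocess_sketch([[1.0], [2.0], [3.0]], pre_opt="min_max", reshape=True)
```

Note that the first two options operate per feature (column), while ‘normalize’ operates per sample (row), so the choice depends on whether features or individual samples should be brought onto a common scale.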