When building machine learning
models in Python with libraries such as NumPy, pandas and scikit-learn,
the input data usually has to be in numerical format, depending on the
algorithm selected.
This post shows two ways to
convert string (or object) columns to numerical data types,
using different approaches from the pandas library. If this conversion is
skipped, an error such as the following appears: “ValueError: could not convert string to
float: ‘male’”.
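As a rough illustration of where that error comes from, the sketch below forces a small, made-up DataFrame into a float array with NumPy. The toy data and the variable name `error_message` are assumptions for illustration only; the post itself runs into the error when fitting a model.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one string column and one numeric column.
df = pd.DataFrame({"gender": ["male", "female"], "age": [22.0, 38.0]})

try:
    # Asking for a float array fails because 'male' cannot be parsed as a number.
    np.asarray(df, dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```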
The first example uses pd.get_dummies, which creates
one indicator column for each category:
X = pd.get_dummies(df[features].fillna(-1))
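To make the line above concrete, here is a minimal sketch on a made-up DataFrame. The toy values and the contents of `features` are assumptions standing in for the Titanic columns, and the string column is created with object dtype on purpose so that `fillna(-1)` behaves like it does in the post.

```python
import pandas as pd

# Hypothetical toy data; the real post uses the Titanic dataset.
df = pd.DataFrame({
    "gender": pd.Series(["male", "female", "female", "male"], dtype="object"),
    "age": [22.0, 38.0, None, 35.0],
})
features = ["gender", "age"]  # assumed feature list

# Fill missing values with -1, then one-hot encode the string column.
X = pd.get_dummies(df[features].fillna(-1))

print(X.columns.tolist())
print(X)
```

Numeric columns such as `age` pass through unchanged, while `gender` is replaced by one indicator column per category.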
The second example uses pandas category codes, which
replace each category with an integer:
X = df[features].fillna(-1)
X['gender'] = X['gender'].astype('category').cat.codes
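The same idea can be sketched on a made-up DataFrame. The toy values, the `features` list, and the helper names `gender_cat` and `code_to_label` are assumptions added for illustration; keeping the categorical Series around makes it possible to map the integer codes back to the original labels.

```python
import pandas as pd

# Hypothetical toy data; the real post uses the Titanic dataset.
df = pd.DataFrame({
    "gender": pd.Series(["male", "female", "female", "male"], dtype="object"),
    "age": [22.0, 38.0, None, 35.0],
})
features = ["gender", "age"]  # assumed feature list

X = df[features].fillna(-1)

# Convert to a categorical column, then replace each label with its integer code.
gender_cat = X["gender"].astype("category")
X["gender"] = gender_cat.cat.codes

# Lookup table from integer code back to the original label.
code_to_label = dict(enumerate(gender_cat.cat.categories))

print(X)
print(code_to_label)
```

Unlike the first method, this keeps a single column, but the integer codes imply an ordering between categories that some algorithms may treat as meaningful.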
These techniques are particularly useful when working with categorical data such as gender or geographical location (country, state, city), among many other categorical variables.
See below the program output using the Titanic dataset; the source code can be found at this GitHub link: