What do we mean by workflow?
In machine learning, a workflow is the series of steps a project goes through from start to completion.
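The general steps usually recommended for a machine learning project are: defining the problem, collecting the data, preparing the data, exploratory data analysis, feature engineering and selection, choosing the best model, tuning the hyperparameters, and interpreting the results.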
Now, how do we start?
Defining the problem:
We need a basic understanding of the problem: what it asks us to do, and what the possible logical solutions are.
Collecting the data:
Our model must be accurate, and both the quality of the model and the final results depend on the quantity and quality of the data collected. This step is therefore quite important.
Preparing the data:
Now that we have understood the problem statement and gathered the data set, we need to prepare the data before implementing any model. By prepare, I mean clean: a raw data set contains many unwanted values, such as null values, repeated values and irrelevant values, so we need to remove them. We also need to load the data set as a pandas DataFrame.
pandas is a Python library used to load the data set before we start working on it.
Matplotlib is also a Python library; it is used to plot graphs.
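To make the loading and cleaning step concrete, here is a minimal sketch. The tiny inline CSV, its column names and the choice of row_id as the "irrelevant" column are all made up for illustration; a real project would read its own file and decide for itself what counts as unwanted.

```python
import io

import pandas as pd

# Hypothetical raw data: a small inline CSV stands in for a real file.
# Row 2 has a missing income, and row 3 repeats row 1 (apart from row_id).
RAW_CSV = """row_id,age,city,income
1,25,Delhi,30000
2,32,Mumbai,
3,25,Delhi,30000
4,41,Pune,52000
"""

# Load the data set as a pandas DataFrame.
raw_df = pd.read_csv(io.StringIO(RAW_CSV))

clean_df = (
    raw_df
    .drop(columns=["row_id"])   # irrelevant values: an ID carries no signal here
    .dropna()                   # null values: drop rows with missing entries
    .drop_duplicates()          # repeated values: keep one copy of each row
    .reset_index(drop=True)
)

print(clean_df)
```

Dropping rows is only one way to handle null values; depending on the problem, filling them in (for example with fillna) may be a better choice.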
Exploratory Data Analysis:
Here we compute statistics to find trends in the data and to check whether there are any relationships between the variables. It is an open-ended process. These findings tell us which features we might choose for our model.
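As a rough illustration of what this can look like in pandas, the sketch below summarises a small made-up data set and checks how strongly each column is related to the target. The column names and numbers are invented for the example, and correlation is only one of many things we could look at.

```python
import pandas as pd

# Hypothetical data: hours studied, hours slept and the exam score obtained.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "hours_slept": [8, 6, 7, 5, 7, 6],
    "exam_score": [52, 58, 65, 70, 78, 85],
})

# Basic statistics (count, mean, std, min, quartiles, max) for every column.
summary = df.describe()
print(summary)

# Pearson correlation of each feature with the target column.
correlations = df.corr()["exam_score"].drop("exam_score")
print(correlations.sort_values(ascending=False))
```

A feature with a correlation close to 1 or -1 is a natural candidate for the model, while one close to 0 may deserve a second look before we keep it.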
Once we have found these features, it is time to improve them through feature engineering and selection.
Feature Engineering and Selection:
It is the process of finding the most important features in the data set that we have prepared. It helps us remove the features that the model does not need, which in turn helps us create a better model.
One-hot encoding is one of the steps of feature engineering. It lets us use categorical variables in our model by turning each category into its own 0/1 column.
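Here is a small sketch of one-hot encoding, using pandas.get_dummies as one convenient way to do it (scikit-learn's OneHotEncoder is another). The color and price columns are invented for the example.

```python
import pandas as pd

# Hypothetical data with one categorical column and one numeric column.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "price": [10, 12, 9, 11],
})

# Replace the "color" column with one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

Each row now has exactly one 1 among the color_blue, color_green and color_red columns, so the model sees numbers instead of text.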
Choosing the best model:
Choosing the best model depends on the accuracy score and error rate of each model. One way to choose is to train every candidate model and keep the one that gives the best results (obviously a time-consuming process, but quite interesting once we get familiar with it). This step also includes splitting the data set, fitting the model on the training data, and then testing it on unseen data to make predictions and get the accuracy score.
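The "train every model and compare" idea can be sketched with scikit-learn as below. The Iris data set and the three candidate models are my own picks for illustration, not a recommendation; any data set and any set of models would follow the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: the built-in Iris data set stands in for our prepared data.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Candidate models, chosen only as examples.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "k_nearest_neighbors": KNeighborsClassifier(),
}

# Fit each model on the training data and score it on the test data.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: accuracy = {score:.3f}, error rate = {1 - score:.3f}")
print("Best model:", best_name)
```

On a data set this small the scores can be very close, so a single train/test split is only a rough guide; cross-validation gives a steadier comparison.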
Tuning the hyperparameters:
Once the evaluation is over, we can look for better results by tuning the parameters. A model has several parameters whose values, when changed, change the results and, most importantly, the accuracy score. These are known as hyperparameters, and which ones exist and which values suit them depends entirely on the model we are working with.
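One common way to tune hyperparameters is a grid search, sketched here with scikit-learn's GridSearchCV. The k-nearest-neighbours model, the Iris data set and the grid of values are assumptions made for the example; every model has its own hyperparameters and sensible ranges.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the built-in Iris data set stands in for our prepared data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Hyperparameter values to try; this grid is only an example.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9],
    "weights": ["uniform", "distance"],
}

# Try every combination with 5-fold cross-validation on the training data.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Accuracy of the best combination on the held-out test data.
test_accuracy = search.score(X_test, y_test)

print("Best hyperparameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

The grid grows quickly as we add hyperparameters (here 5 x 2 = 10 combinations), so for larger searches a randomised search is often the more practical option.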
Interpreting the results:
And at last, interpreting the results. This varies from person to person and from problem to problem, so go back to your problem definition and interpret your own results in its light.
Go on, prepare a model of your own and find out its accuracy. I will talk about the coding part of a model in the next article.
Till then, keep learning!