One of the important concepts in machine learning is K-fold cross-validation. Before we consider K-fold cross-validation, remember that to train any machine learning model we have to divide the data into at least two parts. As noted earlier, the Iris dataset has 150 observations and only three species (classes), one of which is Setosa. How do we train a particular machine learning model with this simple dataset of Iris flowers, with 150 observations and 3 classes? In this case, we divide the dataset into two parts.
For example, suppose the data is randomly divided so that 100 observations go into one part, and these observations are given to a machine learning model as input. The algorithm learns from these observations: for each one it produces a predicted output, which is compared with the actual (known) output. Initially, the error may be high, but as the model is trained the error is reduced, and the predicted output comes very close to the actual output. Learning from examples whose actual outputs are known is called supervised learning.
Of the total 150 observations, we have used 100 for training the model; these are called the training data. The remaining 50, which the model has not seen, are known as the testing data. When we give these 50 observations to the trained model, we compare its predictions with the actual outputs to measure how accurate the model is on unseen data. So a machine learning workflow typically uses both parts: the training data to fit the model and minimize its errors, and the testing data to check its accuracy. K-fold cross-validation builds on this idea: we divide our data into K folds, where K may be less than, equal to, or more than 10, depending on the data.
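The 100/50 split described above can be sketched in a few lines of Python. This is only a minimal illustration that works on observation indices rather than on the actual Iris measurements; the function name `train_test_split_indices` and the `seed` parameter are assumptions made for this example, not part of any particular library.

```python
import random

def train_test_split_indices(n_observations, n_train, seed=0):
    """Shuffle observation indices and split them into training and testing parts."""
    rng = random.Random(seed)           # fixed seed so the split is repeatable
    indices = list(range(n_observations))
    rng.shuffle(indices)                # random division of the data
    return indices[:n_train], indices[n_train:]

# 150 Iris observations: 100 for training, the remaining 50 for testing
train_idx, test_idx = train_test_split_indices(150, 100)
print(len(train_idx), len(test_idx))
```

No observation appears in both parts, so the 50 testing observations remain unseen by the model during training.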
In K-fold cross-validation, the objective is to reduce overfitting by evaluating the model on every part of the data. Suppose the data is divided into four folds: fold 1, 2, 3 and 4. In the first round, fold 1 is the test set and the other three folds form the training data, so that we can train our model with these folds. In the second round, fold 2 becomes the test set and the remaining folds, including fold 1, become the training data. This continues until every fold has served as the test set exactly once and as part of the training data in all the other rounds. At last, we calculate the average error across all the rounds.
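The rotation of folds described above can be sketched as follows. This is a minimal sketch using only the Python standard library; the helper names `k_fold_indices` and `average_error` are hypothetical and chosen for this example. It produces the index splits and shows how the per-round errors would be averaged, but it does not train a model.

```python
import random

def k_fold_indices(n_observations, k, seed=0):
    """Yield (train, test) index lists, one pair per fold."""
    rng = random.Random(seed)
    indices = list(range(n_observations))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]                  # fold i is the test set in this round
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def average_error(fold_errors):
    """Average the error measured in each round."""
    return sum(fold_errors) / len(fold_errors)

# Four folds over the 150 Iris observations
splits = list(k_fold_indices(150, 4))
for round_number, (train, test) in enumerate(splits, start=1):
    print("round", round_number, "train:", len(train), "test:", len(test))
```

Because 150 is not divisible by 4, the folds differ in size by one observation. In practice, a model would be trained on `train` and scored on `test` in each round, and the four resulting errors would be passed to `average_error`.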