What is a Confusion Matrix?
A confusion matrix summarizes how well a classification model’s predictions match the true labels. It can be applied to binary or multiclass classification problems. It is not a single performance score in itself, but many metrics, such as accuracy, are computed from its entries. A confusion matrix is a two-dimensional table; in the convention used in this article, and by sklearn, each row corresponds to an actual class and each column to a predicted class. In a binary classification problem, the target variable takes two values, 1 or 0, which are the actual values and are also called True and False, respectively. The model’s predictions are referred to as predicted values.
Source: Explorium.AI
True Positives (TP)
True Positives are the number of cases in which the actual value of a data sample is 1 and the predicted value is also 1.
True Negatives (TN)
True Negatives are the number of cases in which the actual value of a data sample is 0 and the predicted value is also 0.
False Positives (FP)
False Positives refer to the number of occurrences in which the actual value of a data sample is 0, but the predicted value is 1.
False Negatives (FN)
False Negatives are the number of cases in which the actual value of a data sample is 1, but the predicted value is 0.
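The four definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of any library: the helper name `binary_counts` and the sample labels are hypothetical and were invented for this example.

```python
def binary_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for labels encoded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
    tn = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 0)
    fp = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 1)
    fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)
    return tp, tn, fp, fn


# Hypothetical labels, invented for this illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = binary_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1
```

Every sample falls into exactly one of the four groups, so the four counts always add up to the number of samples.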
Given these definitions, a model performs well when TP and TN are high and FP and FN are low. The model should therefore be trained to maximize TP and TN while minimizing FP and FN. Which of FP and FN matters more depends on the requirements of the classification problem. In the medical field, for example, keeping False Negatives to a minimum is crucial.
For example, suppose the classification task is to determine whether a patient has a serious disease such as cancer or HIV. Let 1 represent a patient who has cancer and 0 a patient who does not. In this scenario, reducing False Negatives is usually more important than reducing False Positives.
If a patient has cancer (1) and the model predicts a negative (0), a False Negative, the disease goes undetected and the patient’s health could be jeopardized. As a result, FN must be kept as low as feasible. On the other hand, if the patient does not have cancer (0) but the model predicts that they do (1), a False Positive, the consequences are less severe because, in most cases, follow-up tests are carried out for serious diseases before a diagnosis is confirmed. As a result, False Positives are preferable to False Negatives in this problem.
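The article does not say how False Negatives are reduced in practice. One common approach, assumed here purely for illustration, is to lower the score threshold above which a sample is predicted as positive. The sketch below uses hypothetical scores and a hypothetical helper, `count_fp_fn`, to show how a lower threshold trades False Negatives for False Positives.

```python
def count_fp_fn(y_true, scores, threshold):
    """Return (FP, FN) when a score >= threshold is predicted as 1."""
    y_pred = [1 if score >= threshold else 0 for score in scores]
    fp = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)
    return fp, fn


# Hypothetical data: 1 = patient has the disease, 0 = patient does not
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical model scores (higher means "more likely positive")
scores = [0.9, 0.6, 0.35, 0.4, 0.1, 0.2, 0.55, 0.45]

default = count_fp_fn(y_true, scores, threshold=0.5)
cautious = count_fp_fn(y_true, scores, threshold=0.3)
print(default)   # (1, 2): one false alarm, two missed patients
print(cautious)  # (2, 0): two false alarms, no missed patients
```

With this made-up data, the lower threshold removes both False Negatives at the cost of one extra False Positive, which is the trade the cancer example argues for.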
Benefits of Confusion Matrix
- It shows how a classification model gets confused when making predictions.
- The confusion matrix shows not only how many errors your classifier makes but also which types of errors they are.
- This breakdown helps you overcome the limitations of relying on classification accuracy alone.
- Each column of the confusion matrix represents the instances of a predicted class.
- Each row of the confusion matrix represents the instances of an actual class.
- It reveals which specific classes the classifier mixes up with one another, not just that errors occurred.
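The point about not relying on accuracy alone can be illustrated with a small sketch. The data below is hypothetical: an imbalanced test set and a model that always predicts the negative class.

```python
# Hypothetical imbalanced test set: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5
# A hypothetical model that always predicts the negative class
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
print(accuracy)        # 0.95
print(tp, tn, fp, fn)  # 0 95 0 5
```

Accuracy looks high at 0.95, yet the breakdown shows that every positive case was missed (TP is 0 and FN is 5), which accuracy by itself would hide.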
How Do You Calculate Confusion Matrices?
Below are the steps to calculate a confusion matrix:
- Start with a test or validation dataset whose expected outcome values are known.
- Next, make a prediction for each row in the test dataset.
- From the expected outcomes and the predictions, count:
  - The number of correct predictions for each class.
  - The number of incorrect predictions for each class, organized by the predicted class.
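The steps above can be sketched by hand in plain Python. This is a minimal illustration rather than how any library implements it; the helper name `confusion_counts` is hypothetical, and the labels are the same ones used in this article's sklearn example. Rows are actual classes and columns are predicted classes.

```python
def confusion_counts(y_true, y_pred):
    """Build a confusion matrix as nested lists: rows = actual, columns = predicted."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    # Tally each (actual, predicted) pair into its cell
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return labels, matrix


y_true = [1, 2, 0, 2, 1, 0]  # expected outcomes
y_pred = [1, 0, 1, 2, 0, 1]  # model predictions

labels, matrix = confusion_counts(y_true, y_pred)
for row in matrix:
    print(row)
# [0, 2, 0]
# [1, 1, 0]
# [1, 0, 1]

# Correct predictions per class sit on the diagonal
correct = [matrix[i][i] for i in range(len(labels))]
print(correct)  # [0, 1, 1]
```

The off-diagonal entries in each row are the incorrect predictions for that actual class, already organized by the class that was predicted.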
Implementation of Confusion Matrix in Sklearn
from sklearn.metrics import confusion_matrix
# the true labels of the given dataset
y_true = [1, 2, 0, 2, 1, 0]
# the predicted labels of the given dataset
y_pred = [1, 0, 1, 2, 0, 1]
# get the confusion matrix of the dataset
confusion_matrix(y_true, y_pred)
Output
array([[0, 2, 0],
       [1, 1, 0],
       [1, 0, 1]])
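For a binary problem, the four cells of the sklearn matrix correspond directly to TN, FP, FN and TP. The sketch below assumes hypothetical labels invented for this illustration; passing `labels=[0, 1]` fixes the class order so that the flattened matrix reads TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels, invented for this illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]];
# ravel() flattens it row by row and tolist() gives plain Python ints
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel().tolist()
print(tn, fp, fn, tp)  # 3 1 1 3

# Accuracy is one of the metrics derived from these counts
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.75
```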
Conclusion
We learned about the confusion matrix and its implementation in sklearn. Sklearn is a popular Python-based ML library that implements various metrics and algorithms. The confusion matrix breaks a classifier’s predictions down into true positives, true negatives, false positives, and false negatives, from which accuracy and other classification metrics are computed.