Confusion Matrix and Its Use in Cyber Security

Rishabh Manhas
4 min read · Jun 4, 2021

Internet usage has grown rapidly, particularly over the last decade. However, as the Internet becomes part of day-to-day activities, cybercrime is also steadily on the rise. Evaluating attacks and devising protective measures manually, using existing technical approaches and investigations, has often failed to control cybercrime because of the massive amount of data that needs to be evaluated. This is where machine learning techniques come into use. There are various methods for evaluating how well such techniques detect attacks, and the Confusion Matrix is a widely used one. So, first of all, let’s understand what a Confusion Matrix is.

What is a Confusion Matrix?

The confusion matrix traces back to Karl Pearson, who introduced the idea in 1904 under the name Contingency Table. A confusion matrix is a performance measurement technique for machine learning classification problems. It is a simple table that shows how a classification model performs on test data for which the true values are known. Because it presents the model’s errors in the form of a matrix, it is also known as an error matrix. Some features of the confusion matrix are given below:

  • For a classifier with 2 prediction classes, the matrix is a 2×2 table; for 3 classes, it is a 3×3 table, and so on.
  • The matrix has two dimensions, predicted values and actual values, along with the total number of predictions.
  • Predicted values are the values output by the model, and actual values are the true values for the given observations.
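  • For two classes, it looks like the table below:

                      Actual: No        Actual: Yes
    Predicted: No     True Negative     False Negative
    Predicted: Yes    False Positive    True Positive

The above table has the following cases: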

  • True Negative: The model has predicted No, and the actual value was also No.
  • True Positive: The model has predicted Yes, and the actual value was also Yes.
  • False Negative: The model has predicted No, but the actual value was Yes. It is also called a Type-II error.
  • False Positive: The model has predicted Yes, but the actual value was No. It is also called a Type-I error.
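The four cases above can be counted directly from a list of actual labels and a list of predicted labels. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the sample labels are made up for illustration and do not come from any real data set.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count TN, FP, FN and TP for a two-class problem."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Model said Yes: correct if the actual value is Yes too.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Model said No: a miss if the actual value was Yes.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical labels, invented for this example only.
actual = ["Yes", "No", "No", "Yes", "No", "Yes"]
predicted = ["Yes", "No", "Yes", "No", "No", "Yes"]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TN': 2, 'FP': 1, 'FN': 1, 'TP': 2}
```

The four counts always add up to the total number of predictions, which is a quick sanity check on any confusion matrix.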

Type-I Error

This is the False Positive case: the model predicted an attack, but in reality the system was safe. The security team gets notified and checks for malicious activity that is not there. Such errors are not very dangerous, since they do not cause any direct harm. They can be termed False Alarms.

Type-II Error

This is the False Negative case, and it can prove to be very dangerous. The model predicted no attack, but in reality an attack took place. In that case no notification reaches the security team and nothing can be done to prevent the attack. One of the aims of the model is therefore to minimize this value.
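To compare the two kinds of error, it helps to turn the raw counts into rates. The sketch below uses the standard definitions (Type-I = False Positive, Type-II = False Negative) and computes the false positive rate and the false negative rate; the function name `error_rates` and the counts are hypothetical and chosen only for illustration.

```python
def error_rates(tn, fp, fn, tp):
    """Return (false positive rate, false negative rate)."""
    # Type-I rate: share of actual "no attack" cases flagged as attacks.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Type-II rate: share of actual attacks that the model missed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical counts, not taken from any real system.
fpr, fnr = error_rates(tn=90, fp=10, fn=5, tp=20)
print(f"False alarm rate (Type-I): {fpr:.2f}")      # 0.10
print(f"Missed attack rate (Type-II): {fnr:.2f}")   # 0.20
```

In an intrusion detection setting, a model is often tuned to push the missed attack rate down even if the false alarm rate rises somewhat, since a missed attack is the costlier mistake.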

Example:

Suppose we are trying to create a model that predicts whether or not a person has a particular disease. The confusion matrix for this model is given as:

From the above example, we can conclude that:

  • The table is for a two-class classifier, which has two predictions, “Yes” and “No.” Here, Yes means that the patient has the disease, and No means that the patient does not have the disease.
  • The classifier has made a total of 100 predictions. Out of 100 predictions, 89 are correct predictions, and 11 are incorrect predictions.
  • The model has predicted “Yes” 32 times and “No” 68 times, whereas the actual value was “Yes” 27 times and “No” 73 times.
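The four cells of this matrix follow from the totals listed above, so they can be recovered with a little arithmetic. The sketch below derives them and is not a reproduction of the original table; the variable names are chosen for illustration.

```python
# Totals stated in the example above.
total = 100
correct = 89          # TP + TN
predicted_yes = 32    # TP + FP
actual_yes = 27       # TP + FN

incorrect = total - correct  # FP + FN = 11

# (TP + FP) + (TP + FN) = 2 * TP + (FP + FN), so solve for TP.
tp = (predicted_yes + actual_yes - incorrect) // 2
tn = correct - tp
fp = predicted_yes - tp
fn = actual_yes - tp

accuracy = correct / total

print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=24, TN=65, FP=8, FN=3
print(f"Accuracy = {accuracy:.2f}")           # Accuracy = 0.89
```

The derived cells agree with the remaining totals: TN + FN = 68 predicted “No”, and TN + FP = 73 actual “No”.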

Sample Case Study On How Confusion Matrix Is Used In Monitoring Cyber Attacks

This case study uses the data set from the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between “bad” connections, called intrusions or attacks, and “good” normal connections.

This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
In the KDD99 dataset, the four attack categories (DoS, U2R, R2L, and probe) are divided into 22 specific attack types, which are tabulated below:

In the KDD Cup 99, the criterion used to evaluate the participants’ entries is the Cost Per Test (CPT), computed using the confusion matrix and a given cost matrix.

  • True Positive (TP): The number of connections detected as attack that are actually attack.
  • True Negative (TN): The number of connections detected as normal that are actually normal.
  • False Positive (FP): The number of connections detected as attack that are actually normal (false alarm).
  • False Negative (FN): The number of connections detected as normal that are actually attack.
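As a rough sketch of how such a score can be computed, the example below assumes that the Cost Per Test is the average cost over all test examples: each cell of the confusion matrix is multiplied by the matching cell of the cost matrix, and the sum is divided by the number of test examples. The two-class matrices and the cost values are hypothetical and are not the ones used in the competition, which scored five classes.

```python
def cost_per_test(confusion, cost):
    """Average cost per test example.

    Both arguments are square lists of lists with rows = actual class
    and columns = predicted class.
    """
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        count * unit_cost
        for conf_row, cost_row in zip(confusion, cost)
        for count, unit_cost in zip(conf_row, cost_row)
    )
    return weighted / total


# Class order: [normal, attack]. All values are made up for illustration.
confusion = [
    [90, 10],  # actual normal: 90 TN, 10 FP
    [5, 20],   # actual attack:  5 FN, 20 TP
]
cost = [
    [0, 1],    # a false alarm costs 1
    [3, 0],    # a missed attack costs 3
]

cpt = cost_per_test(confusion, cost)
print(f"Cost per test = {cpt:.2f}")  # (10*1 + 5*3) / 125 = 0.20
```

Correct predictions sit on the diagonal and cost nothing, so a lower CPT means fewer or cheaper mistakes. The same function works unchanged for larger matrices with one row and column per class.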

So, this is one of the ways a Confusion Matrix can be used for monitoring and evaluating cybercrime attacks.

Thank you for reading.
