Artificial intelligence (AI) and machine learning (ML) are the buzzwords these days. These technologies have been introduced in every industry, and cybersecurity has joined the bandwagon. Past attacks and incidents have shown that companies need to respond to security incidents faster. Attacks are increasing in both number and volume, so companies are automating tasks using AI and ML. We need systems that analyze attacker behavior, identify new vulnerabilities, and prevent exploits and attacks. With conventional approaches, it can take anywhere from a day to a couple of years to identify a compromise. With AI in place, we expect the system to identify a weakness and patch it much faster, to detect when it is being attacked, and to take appropriate mitigation steps.

What are AI and ML?

People tend to use AI and ML interchangeably, but that is incorrect. Before we proceed, let’s discuss what exactly these two are. Machine learning is a subset of AI: all ML is AI, but not all AI is ML. When machines perform “smart” tasks that were not explicitly programmed into them, it is considered AI. ML is when machines process information to learn patterns and use what they have learned to predict patterns in new data.

I usually do bank transactions on weekday mornings. Once, when I did a transaction on a Saturday night, the banking site prompted me for an additional level of authentication because this was not my usual behavior. This is how ML works for fraud detection: the system had enough data about my transaction behavior to learn my usual pattern and flag a deviation from it as potential fraud.
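
The idea behind that prompt can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular bank does it: the function names, the time buckets, and the 5% threshold are all made up for the example, and a real system would learn from many more signals than the time of day.

```python
from collections import Counter
from datetime import datetime


def bucket(ts):
    """Reduce a timestamp to a coarse (day type, time of day) bucket."""
    day_type = "weekend" if ts.weekday() >= 5 else "weekday"
    if 6 <= ts.hour < 12:
        part = "morning"
    elif 12 <= ts.hour < 18:
        part = "afternoon"
    else:
        part = "night"
    return day_type, part


def build_profile(history):
    """Count how many past transactions fall into each bucket."""
    return Counter(bucket(ts) for ts in history)


def needs_extra_auth(profile, ts, min_share=0.05):
    """Ask for extra authentication when the bucket is rare for this user.

    min_share is an assumed cut-off: buckets holding less than 5% of the
    user's past transactions are treated as unusual.
    """
    total = sum(profile.values())
    if total == 0:
        return True  # no history yet, so nothing counts as usual
    return profile[bucket(ts)] / total < min_share


# Hypothetical history: ten weekday-morning transactions in January 2024.
history = [datetime(2024, 1, day, 9, 30) for day in (1, 2, 3, 4, 5, 8, 9, 10, 11, 12)]
profile = build_profile(history)

print(needs_extra_auth(profile, datetime(2024, 1, 13, 22, 15)))  # Saturday night -> True
print(needs_extra_auth(profile, datetime(2024, 1, 15, 10, 0)))   # Monday morning -> False
```

The "learning" here is just counting, but the shape is the same as in a real fraud model: build a profile from past behavior, then score each new event against it.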

Here are a few more examples of AI from our day-to-day life: Apple’s Siri, Samsung’s Bixby, the Netflix recommendation engine, games like Call of Duty, self-driving cars, spam filtering engines, ride sharing in Uber, speech and pattern recognition, etc.

Another prominent example of an AI-based system is Jarvis (not the one from the Iron Man movies). Facebook’s CEO Mark Zuckerberg built Jarvis for use in his smart home. Jarvis can process language; control sensors, doors, cameras, lights, and thermal controls; and perform face recognition.

Impact of AI on Cybersecurity

Let’s discuss an example of how AI and ML will impact cybersecurity. Organization A has 300 employees and a small network setup. The security team consists of two people who are responsible for the organization’s information security. It is practically impossible for such a team to handle all the day-to-day duties and also work on improvements.

AI-based solutions help reduce the monotony of these tasks and let the team focus on work that adds more value to the organization. AI solutions require experts to get started, but they then help across various domains and operations. Expert assistance is required to ensure that the system understands the environment and makes decisions that fit the organization well. AI solutions can assist the organization in:

Vulnerability analysis
Malware analysis
Threat detection
Security monitoring and response
Host anomaly detection

We will discuss some of these in detail later in the article. Stay tuned!

Application of ML in Cybersecurity

We have talked a lot about AI. Since ML is a subset of AI and is the part being used in practice, we will stick to the term ML and leave AI alone for some time. Before we jump to the applications of ML, let’s discuss the various ML types, the algorithms used, how the machine is trained, and the situations in which each can be applied to get the expected response. This understanding will help you break a problem down into smaller parts and apply the concepts to come up with a solution. We will also discuss the applications in cybersecurity.

Definitions:

Labeled data: Datasets used to train the machine. Labeled data consists of pairs of an input and its expected output.
Classification: The goal is to assign an input to one of a fixed set of categories. The set can be binary or a set of labels; e.g. 0 and 1, or threat and normal.
Regression: The goal is to predict values that are continuous in nature; e.g. stock prices, currency exchange rates, etc.
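
To make these three definitions concrete, here is a small Python sketch. The data sets and function names are invented for illustration. The classifier is a deliberately naive threshold rule and the regression is an ordinary least-squares line, chosen only because both fit in a few lines.

```python
# Labeled data: each item is an (input, expected output) pair.

# Classification data: failed logins per hour -> label from a fixed set.
login_data = [(1, "normal"), (2, "normal"), (3, "normal"),
              (40, "threat"), (55, "threat"), (60, "threat")]

# Regression data: hour of day -> traffic volume (a continuous value).
traffic_data = [(1, 12.0), (2, 14.0), (3, 16.0), (4, 18.0)]


def fit_threshold(pairs):
    """Learn a cut-off halfway between the two classes (assumes they do not overlap)."""
    highest_normal = max(x for x, label in pairs if label == "normal")
    lowest_threat = min(x for x, label in pairs if label == "threat")
    return (highest_normal + lowest_threat) / 2


def classify(x, threshold):
    """Classification: the answer is one label out of a fixed set."""
    return "threat" if x > threshold else "normal"


def fit_line(pairs):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in pairs)
             / sum((x - mean_x) ** 2 for x, _ in pairs))
    return slope, mean_y - slope * mean_x


threshold = fit_threshold(login_data)
slope, intercept = fit_line(traffic_data)

print(classify(30, threshold))   # a label: "threat"
print(slope * 5 + intercept)     # a continuous value for hour 5
```

The difference shows up in the return values: `classify` can only ever answer with one of two labels, while the fitted line can produce any number.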

Types of machine learning (based on usage):

Supervised learning: This requires the machine to be fed labeled data, that is, inputs together with their expected responses. From these the machine works out the dependencies and relations between input and output, and it uses what it has learned to generate responses for new inputs. Some of the algorithms used in supervised learning are decision trees, nearest neighbor, and linear regression.

From a cybersecurity standpoint, this type of learning technique can be used by intrusion detection systems (IDS). The system can be trained on labeled traffic to learn what to look for in the traffic stream, identify malicious or anomalous traffic, and respond accordingly. For starters, the response can be as simple as alerting the administrator. In the case of an intrusion prevention system (IPS), the system can act on its own and decide whether to block the traffic or allow it. You may think that this can be done by a conventional system as well. True, but an ML-based system can keep learning over time to identify more complex attacks, whereas a conventional rule-based system will miss attacks that its rules do not cover.
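
As a rough sketch of that idea, the snippet below trains a nearest-neighbor classifier (one of the supervised algorithms named above) on a handful of labeled traffic samples and maps its verdict to an IDS-style or IPS-style response. Everything specific here is an assumption made for the example: the two features, the sample values, and the `respond` helper. A real system would use far more data and would scale the features first.

```python
import math
from collections import Counter

# Hypothetical labeled traffic: (packets per second, distinct ports contacted) -> label.
training = [
    ((20, 2), "normal"), ((35, 3), "normal"), ((25, 1), "normal"), ((40, 2), "normal"),
    ((900, 60), "attack"), ((1200, 80), "attack"), ((1000, 75), "attack"), ((850, 50), "attack"),
]


def knn_predict(training, sample, k=3):
    """Label a sample by majority vote among its k closest training points."""
    nearest = sorted(training, key=lambda pair: math.dist(pair[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def respond(label, inline=False):
    """An IDS (inline=False) only alerts; an IPS (inline=True) can block."""
    if label == "normal":
        return "allow"
    return "block" if inline else "alert"


print(respond(knn_predict(training, (30, 2))))                  # allow
print(respond(knn_predict(training, (950, 70))))                # alert
print(respond(knn_predict(training, (950, 70)), inline=True))   # block
```

Adding newly labeled samples to `training` changes future predictions without anyone rewriting a rule, which is the practical difference from a fixed signature list.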

Semi-supervised learning: As the name suggests, this learning technique is used when the data is a mix of labeled and unlabeled data. This does not mean that the unlabeled data is less important. The machine gives both kinds of data equal priority and uses the labeled examples to help make sense of the unlabeled ones.

Unsupervised learning: Here, the data sets are not labeled at all. The learning is left to the machine itself, without any human training intervention. This is done when humans are not sure about the expected responses or about what to look for in the available data. It can be used for pattern identification and data grouping; e.g. a set of Twitter feeds or text messages can be grouped into various categories using this technique. Some common algorithms involved are clustering and association rule algorithms.
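
Grouping without labels can be illustrated with a bare-bones k-means clustering routine. This is a teaching sketch, not a production implementation: the message features are invented, and the centroids are initialized naively from the first k points so that the run is repeatable (real implementations choose starting points more carefully).

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly moving centroids to cluster means."""
    centroids = list(points[:k])  # naive, deterministic initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical text messages described by (length in characters, number of links).
points = [(10, 0), (200, 5), (12, 0), (210, 6), (9, 1), (190, 4)]

centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

Nobody told the routine which messages belong together. The short, link-free messages and the long, link-heavy ones end up in separate groups purely because of their distance from each other, and it is left to a human to decide what each group means.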

Unsupervised learning may also be applied to cryptanalysis. Although strong crypto algorithms are designed so that their output shows no pattern, there can still be gaps that ML can fill. A machine might learn to identify the algorithm used, the key size, and various other parameters, and could also be used to look for weaknesses in a crypto algorithm.

Reinforcement learning: In reinforcement learning, the machine makes decisions based on environmental factors. The situation and its environmental factors together form a state. When a decision is required, the machine observes the state and generates a response. Each response produces feedback, which is stored against that state and response and used in later decisions. This approach is used to minimize risk and maximize the expected reward. This learning technique can be used in self-driving cars, where the response is highly dependent on environmental factors.

Let’s continue with the same example of self-driving cars. These cars need a lot of sensors that collect and record data. This data is used to identify the type of environment. If someone tries to break into the car, how will this be identified, and what will the response be? The car may treat a wrong key being inserted as a forced attempt, but this can be due either to the driver’s carelessness or to a genuine attempt at stealing the car. This type of learning technique can be used by the car to tell the two apart and ensure physical security.
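
The observe, respond, and learn-from-feedback loop can be sketched with a very small value table. This is a simplified, one-step form of reinforcement learning (a contextual bandit with epsilon-greedy exploration), not a full algorithm for a real car. The states, actions, and reward numbers are all hypothetical and exist only to show how feedback shapes the response.

```python
import random

STATES = ("correct_key", "wrong_key_once", "wrong_key_repeated")
ACTIONS = ("unlock", "ignore", "alarm")

# Hypothetical feedback: how good each response is in each situation.
REWARDS = {
    ("correct_key", "unlock"): 1.0,
    ("correct_key", "ignore"): -0.5,
    ("correct_key", "alarm"): -1.0,
    ("wrong_key_once", "unlock"): -1.0,
    ("wrong_key_once", "ignore"): 0.5,    # probably just a careless driver
    ("wrong_key_once", "alarm"): -0.5,
    ("wrong_key_repeated", "unlock"): -1.0,
    ("wrong_key_repeated", "ignore"): -0.5,
    ("wrong_key_repeated", "alarm"): 1.0,  # looks like a theft attempt
}


def train(episodes=3000, alpha=0.1, epsilon=0.2, seed=0):
    """Learn a value for every (state, action) pair from trial and feedback."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(STATES)               # observe the situation
        if rng.random() < epsilon:               # sometimes explore
            action = rng.choice(ACTIONS)
        else:                                    # otherwise use the best known response
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward = REWARDS[(state, action)]        # feedback from the environment
        q[(state, action)] += alpha * (reward - q[(state, action)])
    return q


def best_action(q, state):
    """Pick the response with the highest learned value for this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = train()
for state in STATES:
    print(state, "->", best_action(q, state))
```

The table starts out knowing nothing. It ends up unlocking for the right key, ignoring a single mistake, and raising the alarm on repeated wrong keys, only because those responses earned the best feedback during training.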

ML algorithms: The goal of an ML algorithm is to learn a function that best maps a given set of inputs to their outputs. Let’s see what types of ML algorithms we have, without digging into the details of how each one works.

Linear regression
Logistic regression
Tree-based modeling
Bayesian modeling
Support vector machines
K nearest neighbors
K-means
Neural networks and perceptrons
Ensemble modeling
Anomaly detection

How can AI-powered cybersecurity tools benefit an organization?

We have now discussed enough of how AI and ML work and what they aim to achieve. Let’s look at a few AI-powered cybersecurity tools that are available in the market. We will also discuss how they work technically, in order to understand how they can benefit an organization.

Partnerships between companies are common these days, both to offer better solutions and to win market share. Two such companies are CrowdStrike and Vectra. CrowdStrike is a leading provider of cloud-delivered endpoint protection, and Vectra works on automating threat hunting for cyber attacks. Falcon Insight from CrowdStrike and Cognito from Vectra are combined to create an overall threat-hunting solution. Falcon covers the endpoint protection part and Cognito takes care of the network. Cognito detects hidden threats in the network and uses AI to determine the attacker’s behavior on the network and hosts, without requiring any signature or reputation database. Since AI is involved, Cognito learns the normal network behavior. Any deviation from that behavior triggers an alert, and remediation kicks in. Since the tool can detect anomalies on both the network and the hosts, it will:

Reduce the investigation time.
Assist in comprehensive detection.
Assist in selecting the correct course of action.
Provide a targeted response.
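
The general principle of learning normal behavior and flagging deviations, with no signature database involved, can be shown with a simple statistical baseline. To be clear, this is not how Cognito or any other vendor product works internally. It is a minimal sketch that assumes a single metric (connections per minute for one host), made-up sample values, and a conventional three-standard-deviation threshold.

```python
from statistics import mean, stdev


def learn_baseline(samples):
    """Summarize normal behavior as the mean and standard deviation of past samples."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # perfectly constant history: any change is a deviation
    return abs(value - mu) / sigma > threshold


# Hypothetical connections per minute observed for one host during normal operation.
normal_traffic = [48, 52, 50, 47, 53, 49, 51, 50]
baseline = learn_baseline(normal_traffic)

print(is_anomalous(51, baseline))    # within the usual range -> False
print(is_anomalous(120, baseline))   # far outside the usual range -> True
```

Nothing in this code describes what an attack looks like. It only knows what normal looks like, which is why this style of detection can also catch behavior that no signature exists for yet.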

The solution is agent-based, with features like log recording, threat hunting, exhaustive searching of network and host information, threat response, and containment of the suspect. While all this is being done automatically, the team can spend its time on preventing loss and making security operations more efficient.

Visit: https://vectra.ai/ for more information.

Another solution is the Enterprise Immune System from Darktrace. Just as life has developed over time and DNA has mutated to adapt to the environment, this tool evolves too. The tool models the “life” of each device on the network: each device has an operating pattern, which is learned over time. All this information is collected and correlated to identify new threats that would otherwise go undetected. It also has a threat visualizer that creates a 3D view of the threat notifications and activities across the network and devices (an enterprise-level overview). These can then be used to cluster anomalies, replay history, identify ransomware, etc. The tool works on its own, using unsupervised learning to find anomalies and make judgments. A specialized module named Darktrace Antigena kicks in to take response steps in case of a cyber threat. Consider a cyber attack that happens on a weekend, when the security team is unavailable: this module can take remediation steps on its own and prevent business loss.

For more product information and a free trial, visit: https://www.darktrace.com/.

Attacker’s take on AI

AI and ML can also be used by attackers for malicious tasks. So far, we have seen the advantages of AI and how it can boost cybersecurity, but this does not mean that AI cannot be defeated. We need to know its limitations before jumping to implementation. AI needs a lot of human intervention during training to ensure that false positives are removed; left on its own, it will produce incorrect responses. AI should be trained so that it takes up the monotonous tasks first and then moves on to the complex ones. The machine can then take its time to develop and come up with optimized solutions. Along similar lines, attackers can train AI to launch sophisticated attacks, observing the responses that the defending system has been taught to give and tweaking the attack accordingly.

Conclusion

AI and ML can be applied in various domains of cybersecurity, but they will only be effective if implemented correctly. The hardest part is the collection and processing of data, and we cannot always rely on unsupervised learning. One way to tackle this is a hybrid approach: a solution that learns from labeled data sets whenever they are available, combined with rule-based decisions and some unsupervised learning, so that it can detect both known and zero-day threats. Since you have come this far, I will share some bonus pointers. Here is a list of cybersecurity companies that work on AI-based solutions. Look them up and read the solution details. This will give you a better idea of the power of AI and of where the cybersecurity industry is going with AI in hand.

CrowdStrike
Cybereason
ENDGAME
Shape
perimeter
Obsidian

There is also something for those who want to take the first step in AI and ML. Learning AI can be tough since the field is relatively new. You can start by setting up an environment for AI development using the Anaconda distribution of Python (https://www.anaconda.com/download/).

You are on your way to learning AI. Good luck!