Blog

Email

Explore AI Email Security Approaches with Darktrace

Default blog imageDefault blog imageDefault blog imageDefault blog imageDefault blog imageDefault blog image
01
Feb 2021
01
Feb 2021
Stay informed on the latest AI approaches to email security. Explore Darktrace's comparisons to find the best solution for your cybersecurity needs!

Innovations in artificial intelligence (AI) have fundamentally changed the email security landscape in recent years, but it can often be hard to determine what makes one system different to the next. In reality, under that umbrella term there exists a significant distinction in approach which may determine whether the technology provides genuine protection or simply a perceived notion of defense.

One backward-looking approach involves feeding a machine thousands of emails that have already been deemed to be malicious, and training it to look for patterns in these emails in order to spot future attacks. The second approach uses an AI system to analyze the entirety of an organization’s real-world data, enabling it to establish a notion of what is ‘normal’ and then spot subtle deviations indicative of an attack.

In the below, we compare the relative merits of each approach, with special consideration to novel attacks that leverage the latest news headlines to bypass machine learning systems trained on data sets. Training a machine on previously identified ‘known bads’ is only advantageous in certain, specific contexts that don’t change over time: to recognize the intent behind an email, for example. However, an effective email security solution must also incorporate a self-learning approach that understands ‘normal’ in the context of an organization in order to identify unusual and anomalous emails and catch even the novel attacks.

Signatures – a backward-looking approach

Over the past few decades, cyber security technologies have looked to mitigate risk by preventing previously seen attacks from occurring again. In the early days, when the lifespan of a given strain of malware or the infrastructure of an attack was in the range of months and years, this method was satisfactory. But the approach inevitably results in playing catch-up with malicious actors: it always looks to the past to guide detection for the future. With decreasing lifetimes of attacks, where a domain could be used in a single email and never seen again, this historic-looking signature-based approach is now being widely replaced by more intelligent systems.

Training a machine on ‘bad’ emails

The first AI approach we often see in the wild involves harnessing an extremely large data set with thousands or millions of emails. Once these emails have come through, an AI is trained to look for common patterns in malicious emails. The system then updates its models, rules set, and blacklists based on that data.

This method certainly represents an improvement to traditional rules and signatures, but it does not escape the fact that it is still reactive, and unable to stop new attack infrastructure and new types of email attacks. It is simply automating that flawed, traditional approach – only instead of having a human update the rules and signatures, a machine is updating them instead.

Relying on this approach alone has one basic but critical flaw: it does not enable you to stop new types of attacks that it has never seen before. It accepts that there has to be a ‘patient zero’ – or first victim – in order to succeed.

The industry is beginning to acknowledge the challenges with this approach, and huge amounts of resources – both automated systems and security researchers – are being thrown into minimizing its limitations. This includes leveraging a technique called “data augmentation” that involves taking a malicious email that slipped through and generating many “training samples” using open-source text augmentation libraries to create “similar” emails – so that the machine learns not only the missed phish as ‘bad’, but several others like it – enabling it to detect future attacks that use similar wording, and fall into the same category.

But spending all this time and effort into trying to fix an unsolvable problem is like putting all your eggs in the wrong basket. Why try and fix a flawed system rather than change the game altogether? To spell out the limitations of this approach, let us look at a situation where the nature of the attack is entirely new.

The rise of ‘fearware’

When the global pandemic hit, and governments began enforcing travel bans and imposing stringent restrictions, there was undoubtedly a collective sense of fear and uncertainty. As explained previously in this blog, cyber-criminals were quick to capitalize on this, taking advantage of people’s desire for information to send out topical emails related to COVID-19 containing malware or credential-grabbing links.

These emails often spoofed the Centers for Disease Control and Prevention (CDC), or later on, as the economic impact of the pandemic began to take hold, the Small Business Administration (SBA). As the global situation shifted, so did attackers’ tactics. And in the process, over 130,000 new domains related to COVID-19 were purchased.

Let’s now consider how the above approach to email security might fare when faced with these new email attacks. The question becomes: how can you train a model to look out for emails containing ‘COVID-19’, when the term hasn’t even been invented yet?

And while COVID-19 is the most salient example of this, the same reasoning follows for every single novel and unexpected news cycle that attackers are leveraging in their phishing emails to evade tools using this approach – and attracting the recipient’s attention as a bonus. Moreover, if an email attack is truly targeted to your organization, it might contain bespoke and tailored news referring to a very specific thing that supervised machine learning systems could never be trained on.

This isn’t to say there’s not a time and a place in email security for looking at past attacks to set yourself up for the future. It just isn’t here.

Spotting intention

Darktrace uses this approach for one specific use which is future-proof and not prone to change over time, to analyze grammar and tone in an email in order to identify intention: asking questions like ‘does this look like an attempt at inducement? Is the sender trying to solicit some sensitive information? Is this extortion?’ By training a system on an extremely large data set collected over a period of time, you can start to understand what, for instance, inducement looks like. This then enables you to easily spot future scenarios of inducement based on a common set of characteristics.

Training a system in this way works because, unlike news cycles and the topics of phishing emails, fundamental patterns in tone and language don’t change over time. An attempt at solicitation is always an attempt at solicitation, and will always bear common characteristics.

For this reason, this approach only plays one small part of a very large engine. It gives an additional indication about the nature of the threat, but is not in itself used to determine anomalous emails.

Detecting the unknown unknowns

In addition to using the above approach to identify intention, Darktrace uses unsupervised machine learning, which starts with extracting and extrapolating thousands of data points from every email. Some of these are taken directly from the email itself, while others are only ascertainable by the above intention-type analysis. Additional insights are also gained from observing emails in the wider context of all available data across email, network and the cloud environment of the organization.

Only after having a now-significantly larger and more comprehensive set of indicators, with a more complete description of that email, can the data be fed into a topic-indifferent machine learning engine to start questioning the data in millions of ways in order to understand if it belongs, given the wider context of the typical ‘pattern of life’ for the organization. Monitoring all emails in conjunction allows the machine to establish things like:

  • Does this person usually receive ZIP files?
  • Does this supplier usually send links to Dropbox?
  • Has this sender ever logged in from China?
  • Do these recipients usually get the same emails together?

The technology identifies patterns across an entire organization and gains a continuously evolving sense of ‘self’ as the organization grows and changes. It is this innate understanding of what is and isn’t ‘normal’ that allows AI to spot the truly ‘unknown unknowns’ instead of just ‘new variations of known bads.’

This type of analysis brings an additional advantage in that it is language and topic agnostic: because it focusses on anomaly detection rather than finding specific patterns that indicate threat, it is effective regardless of whether an organization typically communicates in English, Spanish, Japanese, or any other language.

By layering both of these approaches, you can understand the intention behind an email and understand whether that email belongs given the context of normal communication. And all of this is done without ever making an assumption or having the expectation that you’ve seen this threat before.

Years in the making

It’s well established now that the legacy approach to email security has failed – and this makes it easy to see why existing recommendation engines are being applied to the cyber security space. On first glance, these solutions may be appealing to a security team, but highly targeted, truly unique spear phishing emails easily skirt these systems. They can’t be relied on to stop email threats on the first encounter, as they have a dependency on known attacks with previously seen topics, domains, and payloads.

An effective, layered AI approach takes years of research and development. There is no single mathematical model to solve the problem of determining malicious emails from benign communication. A layered approach accepts that competing mathematical models each have their own strengths and weaknesses. It autonomously determines the relative weight these models should have and weighs them against one another to produce an overall ‘anomaly score’ given as a percentage, indicating exactly how unusual a particular email is in comparison to the organization’s wider email traffic flow.

It is time for email security to well and truly drop the assumption that you can look at threats of the past to predict tomorrow’s attacks. An effective AI cyber security system can identify abnormalities with no reliance on historical attacks, enabling it to catch truly unique novel emails on the first encounter – before they land in the inbox.

INSIDE THE SOC
Darktrace cyber analysts are world-class experts in threat intelligence, threat hunting and incident response, and provide 24/7 SOC support to thousands of Darktrace customers around the globe. Inside the SOC is exclusively authored by these experts, providing analysis of cyber incidents and threat trends, based on real-world experience in the field.
AUTHOR
ABOUT ThE AUTHOR
Dan Fein
VP, Product

Based in New York, Dan joined Darktrace’s technical team in 2015, helping customers quickly achieve a complete and granular understanding of Darktrace’s product suite. Dan has a particular focus on Darktrace/Email, ensuring that it is effectively deployed in complex digital environments, and works closely with the development, marketing, sales, and technical teams. Dan holds a Bachelor’s degree in Computer Science from New York University.

Book a 1-1 meeting with one of our experts
share this article
COre coverage

More in this series

No items found.

Blog

Thought Leadership

The State of AI in Cybersecurity: Understanding AI Technologies

Default blog imageDefault blog image
24
Jul 2024

About the State of AI Cybersecurity Report

Darktrace surveyed 1,800 CISOs, security leaders, administrators, and practitioners from industries around the globe. Our research was conducted to understand how the adoption of new AI-powered offensive and defensive cybersecurity technologies are being managed by organizations.

This blog continues the conversation from “The State of AI in Cybersecurity: Unveiling Global Insights from 1,800 Security Practitioners”. This blog will focus on security professionals’ understanding of AI technologies in cybersecurity tools.

To access download the full report, click here.

How familiar are security professionals with supervised machine learning

Just 31% of security professionals report that they are “very familiar” with supervised machine learning.

Many participants admitted unfamiliarity with various AI types. Less than one-third felt "very familiar" with the technologies surveyed: only 31% with supervised machine learning and 28% with natural language processing (NLP).

Most participants were "somewhat" familiar, ranging from 46% for supervised machine learning to 36% for generative adversarial networks (GANs). Executives and those in larger organizations reported the highest familiarity.

Combining "very" and "somewhat" familiar responses, 77% had familiarity with supervised machine learning, 74% generative AI, and 73% NLP. With generative AI getting so much media attention, and NLP being the broader area of AI that encompasses generative AI, these results may indicate that stakeholders are understanding the topic on the basis of buzz, not hands-on work with the technologies.  

If defenders hope to get ahead of attackers, they will need to go beyond supervised learning algorithms trained on known attack patterns and generative AI. Instead, they’ll need to adopt a comprehensive toolkit comprised of multiple, varied AI approaches—including unsupervised algorithms that continuously learn from an organization’s specific data rather than relying on big data generalizations.  

Different types of AI

Different types of AI have different strengths and use cases in cyber security. It’s important to choose the right technique for what you’re trying to achieve.  

Supervised machine learning: Applied more often than any other type of AI in cyber security. Trained on human attack patterns and historical threat intelligence.  

Large language models (LLMs): Applies deep learning models trained on extremely large data sets to understand, summarize, and generate new content. Used in generative AI tools.  

Natural language processing (NLP): Applies computational techniques to process and understand human language.  

Unsupervised machine learning: Continuously learns from raw, unstructured data to identify deviations that represent true anomalies.  

What impact will generative AI have on the cybersecurity field?

More than half of security professionals (57%) believe that generative AI will have a bigger impact on their field over the next few years than other types of AI.

Chart showing the types of AI expected to impact security the most
Figure 1: Chart from Darktrace's State of AI in Cybersecurity Report

Security stakeholders are highly aware of generative AI and LLMs, viewing them as pivotal to the field's future. Generative AI excels at abstracting information, automating tasks, and facilitating human-computer interaction. However, LLMs can "hallucinate" due to training data errors and are vulnerable to prompt injection attacks. Despite improvements in securing LLMs, the best cyber defenses use a mix of AI types for enhanced accuracy and capability.

AI education is crucial as industry expectations for generative AI grow. Leaders and practitioners need to understand where and how to use AI while managing risks. As they learn more, there will be a shift from generative AI to broader AI applications.

Do security professionals fully understand the different types of AI in security products?

Only 26% of security professionals report a full understanding of the different types of AI in use within security products.

Confusion is prevalent in today’s marketplace. Our survey found that only 26% of respondents fully understand the AI types in their security stack, while 31% are unsure or confused by vendor claims. Nearly 65% believe generative AI is mainly used in cybersecurity, though it’s only useful for identifying phishing emails. This highlights a gap between user expectations and vendor delivery, with too much focus on generative AI.

Key findings include:

  • Executives and managers report higher understanding than practitioners.
  • Larger organizations have better understanding due to greater specialization.

As AI evolves, vendors are rapidly introducing new solutions faster than practitioners can learn to use them. There's a strong need for greater vendor transparency and more education for users to maximize the technology's value.

To help ease confusion around AI technologies in cybersecurity, Darktrace has released the CISO’s Guide to Cyber AI. A comprehensive white paper that categorizes the different applications of AI in cybersecurity. Download the White Paper here.  

Do security professionals believe generative AI alone is enough to stop zero-day threats?

No! 86% of survey participants believe generative AI alone is NOT enough to stop zero-day threats

This consensus spans all geographies, organization sizes, and roles, though executives are slightly less likely to agree. Asia-Pacific participants agree more, while U.S. participants agree less.

Despite expecting generative AI to have the most impact, respondents recognize its limited security use cases and its need to work alongside other AI types. This highlights the necessity for vendor transparency and varied AI approaches for effective security across threat prevention, detection, and response.

Stakeholders must understand how AI solutions work to ensure they offer advanced, rather than outdated, threat detection methods. The survey shows awareness that old methods are insufficient.

To access the full report, click here.

Continue reading
About the author
The Darktrace Community

Blog

Inside the SOC

Jupyter Ascending: Darktrace’s Investigation of the Adaptive Jupyter Information Stealer

Default blog imageDefault blog image
18
Jul 2024

What is Malware as a Service (MaaS)?

Malware as a Service (MaaS) is a model where cybercriminals develop and sell or lease malware to other attackers.

This approach allows individuals or groups with limited technical skills to launch sophisticated cyberattacks by purchasing or renting malware tools and services. MaaS is often provided through online marketplaces on the dark web, where sellers offer various types of malware, including ransomware, spyware, and trojans, along with support services such as updates and customer support.

The Growing MaaS Marketplace

The Malware-as-a-Service (MaaS) marketplace is rapidly expanding, with new strains of malware being regularly introduced and attracting waves of new and previous attackers. The low barrier for entry, combined with the subscription-like accessibility and lucrative business model, has made MaaS a prevalent tool for cybercriminals. As a result, MaaS has become a significant concern for organizations and their security teams, necessitating heightened vigilance and advanced defense strategies.

Examples of Malware as a Service

  • Ransomware as a Service (RaaS): Providers offer ransomware kits that allow users to launch ransomware attacks and share the ransom payments with the service provider.
  • Phishing as a Service: Services that provide phishing kits, including templates and email lists, to facilitate phishing campaigns.
  • Botnet as a Service: Renting out botnets to perform distributed denial-of-service (DDoS) attacks or other malicious activities.
  • Information Stealer: Information stealers are a type of malware specifically designed to collect sensitive data from infected systems, such as login credentials, credit card numbers, personal identification information, and other valuable data.

How does information stealer malware work?

Information stealers are an often-discussed type MaaS tool used to harvest personal and proprietary information such as administrative credentials, banking information, and cryptocurrency wallet details. This information is then exfiltrated from target networks via command-and-control (C2) communication, allowing threat actors to monetize the data. Information stealers have also increasingly been used as an initial access vector for high impact breaches including ransomware attacks, employing both double and triple extortion tactics.

After investigating several prominent information stealers in recent years, the Darktrace Threat Research team launched an investigation into indicators of compromise (IoCs) associated with another variant in late 2023, namely the Jupyter information stealer.

What is Jupyter information stealer and how does it work?

The Jupyter information stealer (also known as Yellow Cockatoo, SolarMarker, and Polazert) was first observed in the wild in late 2020. Multiple variants have since become part of the wider threat landscape, however, towards the end of 2023 a new variant was observed. This latest variant achieved greater stealth and updated its delivery method, targeting browser extensions such as Edge, Firefox, and Chrome via search engine optimization (SEO) poisoning and malvertising. This then redirects users to download malicious files that typically impersonate legitimate software, and finally initiates the infection and the attack chain for Jupyter [3][4]. In recently noted cases, users download malicious executables for Jupyter via installer packages created using InnoSetup – an open-source compiler used to create installation packages in the Windows OS.

The latest release of Jupyter reportedly takes advantage of signed digital certificates to add credibility to downloaded executables, further supplementing its already existing tactics, techniques and procedures (TTPs) for detection evasion and sophistication [4]. Jupyter does this while still maintaining features observed in other iterations, such as dropping files into the %TEMP% folder of a system and using PowerShell to decrypt and load content into memory [4]. Another reported feature includes backdoor functionality such as:

  • C2 infrastructure
  • Ability to download and execute malware
  • Execution of PowerShell scripts and commands
  • Injecting shellcode into legitimate windows applications

Darktrace Coverage of Jupyter information stealer

In September 2023, Darktrace’s Threat Research team first investigated Jupyter and discovered multiple IoCs and TTPs associated with the info-stealer across the customer base. Across most investigated networks during this time, Darktrace observed the following activity:

  • HTTP POST requests over destination port 80 to rare external IP addresses (some of these connections were also made via port 8089 and 8090 with no prior hostname lookup).
  • HTTP POST requests specifically to the root directory of a rare external endpoint.
  • Data streams being sent to unusual external endpoints
  • Anomalous PowerShell execution was observed on numerous affected networks.

Taking a further look at the activity patterns detected, Darktrace identified a series of HTTP POST requests within one customer’s environment on December 7, 2023. The HTTP POST requests were made to the root directory of an external IP address, namely 146.70.71[.]135, which had never previously been observed on the network. This IP address was later reported to be malicious and associated with Jupyter (SolarMarker) by open-source intelligence (OSINT) [5].

Device Event Log indicating several connections from the source device to the rare external IP address 146.70.71[.]135 over port 80.
Figure 1: Device Event Log indicating several connections from the source device to the rare external IP address 146.70.71[.]135 over port 80.

This activity triggered the Darktrace / NETWORK model, ‘Anomalous Connection / Posting HTTP to IP Without Hostname’. This model alerts for devices that have been seen posting data out of the network to rare external endpoints without a hostname. Further investigation into the offending device revealed a significant increase in external data transfers around the time Darktrace alerted the activity.

This External Data Transfer graph demonstrates a spike in external data transfer from the internal device indicated at the top of the graph on December 7, 2023, with a time lapse shown of one week prior.
Figure 2: This External Data Transfer graph demonstrates a spike in external data transfer from the internal device indicated at the top of the graph on December 7, 2023, with a time lapse shown of one week prior.

Packet capture (PCAP) analysis of this activity also demonstrates possible external data transfer, with the device observed making a POST request to the root directory of the malicious endpoint, 146.70.71[.]135.

PCAP of a HTTP POST request showing streams of data being sent to the endpoint, 146.70.71[.]135.
Figure 3: PCAP of a HTTP POST request showing streams of data being sent to the endpoint, 146.70.71[.]135.

In other cases investigated by the Darktrace Threat Research team, connections to the rare external endpoint 67.43.235[.]218 were detected on port 8089 and 8090. This endpoint was also linked to Jupyter information stealer by OSINT sources [6].

Darktrace recognized that such suspicious connections represented unusual activity and raised several model alerts on multiple customer environments, including ‘Compromise / Large Number of Suspicious Successful Connections’ and ‘Anomalous Connection / Multiple Connections to New External TCP Port’.

In one instance, a device that was observed performing many suspicious connections to 67.43.235[.]218 was later observed making suspicious HTTP POST connections to other malicious IP addresses. This included 2.58.14[.]246, 91.206.178[.]109, and 78.135.73[.]176, all of which had been linked to Jupyter information stealer by OSINT sources [7] [8] [9].

Darktrace further observed activity likely indicative of data streams being exfiltrated to Jupyter information stealer C2 endpoints.

Graph displaying the significant increase in the number of HTTP POST requests with No Get made by an affected device, likely indicative of Jupyter information stealer C2 activity.
Figure 4: Graph displaying the significant increase in the number of HTTP POST requests with No Get made by an affected device, likely indicative of Jupyter information stealer C2 activity.

In several cases, Darktrace was able to leverage customer integrations with other security vendors to add additional context to its own model alerts. For example, numerous customers who had integrated Darktrace with Microsoft Defender received security integration alerts that enriched Darktrace’s model alerts with additional intelligence, linking suspicious activity to Jupyter information stealer actors.

The security integration model alerts ‘Security Integration / Low Severity Integration Detection’ and (right image) ‘Security Integration / High Severity Integration Detection’, linking suspicious activity observed by Darktrace with Jupyter information stealer (SolarMarker).
Figure 5: The security integration model alerts ‘Security Integration / Low Severity Integration Detection’ and (right image) ‘Security Integration / High Severity Integration Detection’, linking suspicious activity observed by Darktrace with Jupyter information stealer (SolarMarker).

Conclusion

The MaaS ecosystems continue to dominate the current threat landscape and the increasing sophistication of MaaS variants, featuring advanced defense evasion techniques, poses significant risks once deployed on target networks.

Leveraging anomaly-based detections is crucial for staying ahead of evolving MaaS threats like Jupyter information stealer. By adopting AI-driven security tools like Darktrace / NETWORK, organizations can more quickly identify and effectively detect and respond to potential threats as soon as they emerge. This is especially crucial given the rise of stealthy information stealing malware strains like Jupyter which cannot only harvest and steal sensitive data, but also serve as a gateway to potentially disruptive ransomware attacks.

Credit to Nahisha Nobregas (Senior Cyber Analyst), Vivek Rajan (Cyber Analyst)

References

1.     https://www.paloaltonetworks.com/cyberpedia/what-is-multi-extortion-ransomware

2.     https://flashpoint.io/blog/evolution-stealer-malware/

3.     https://blogs.vmware.com/security/2023/11/jupyter-rising-an-update-on-jupyter-infostealer.html

4.     https://www.morphisec.com/hubfs/eBooks_and_Whitepapers/Jupyter%20Infostealer%20WEB.pdf

5.     https://www.virustotal.com/gui/ip-address/146.70.71.135

6.     https://www.virustotal.com/gui/ip-address/67.43.235.218/community

7.     https://www.virustotal.com/gui/ip-address/2.58.14.246/community

8.     https://www.virustotal.com/gui/ip-address/91.206.178.109/community

9.     https://www.virustotal.com/gui/ip-address/78.135.73.176/community

Appendices

Darktrace Model Detections

  • Anomalous Connection / Posting HTTP to IP Without Hostname
  • Compromise / HTTP Beaconing to Rare Destination
  • Unusual Activity / Unusual External Data to New Endpoints
  • Compromise / Slow Beaconing Activity To External Rare
  • Compromise / Large Number of Suspicious Successful Connections
  • Anomalous Connection / Multiple Failed Connections to Rare Endpoint
  • Compromise / Excessive Posts to Root
  • Compromise / Sustained SSL or HTTP Increase
  • Security Integration / High Severity Integration Detection
  • Security Integration / Low Severity Integration Detection
  • Anomalous Connection / Multiple Connections to New External TCP Port
  • Unusual Activity / Unusual External Data Transfer

AI Analyst Incidents:

  • Unusual Repeated Connections
  • Possible HTTP Command and Control to Multiple Endpoints
  • Possible HTTP Command and Control

List of IoCs

Indicators – Type – Description

146.70.71[.]135

IP Address

Jupyter info-stealer C2 Endpoint

91.206.178[.]109

IP Address

Jupyter info-stealer C2 Endpoint

146.70.92[.]153

IP Address

Jupyter info-stealer C2 Endpoint

2.58.14[.]246

IP Address

Jupyter info-stealer C2 Endpoint

78.135.73[.]176

IP Address

Jupyter info-stealer C2 Endpoint

217.138.215[.]105

IP Address

Jupyter info-stealer C2 Endpoint

185.243.115[.]88

IP Address

Jupyter info-stealer C2 Endpoint

146.70.80[.]66

IP Address

Jupyter info-stealer C2 Endpoint

23.29.115[.]186

IP Address

Jupyter info-stealer C2 Endpoint

67.43.235[.]218

IP Address

Jupyter info-stealer C2 Endpoint

217.138.215[.]85

IP Address

Jupyter info-stealer C2 Endpoint

193.29.104[.]25

IP Address

Jupyter info-stealer C2 Endpoint

Continue reading
About the author
Nahisha Nobregas
SOC Analyst
Our ai. Your data.

Elevate your cyber defenses with Darktrace AI

Start your free trial
Darktrace AI protecting a business from cyber threats.