Blog
/
/
October 30, 2023

Exploring AI Threats: Package Hallucination Attacks

Learn how malicious actors exploit errors in generative AI tools to launch packet attacks. Read how Darktrace products detect and prevent these threats!
Inside the SOC
Darktrace cyber analysts are world-class experts in threat intelligence, threat hunting and incident response, and provide 24/7 SOC support to thousands of Darktrace customers around the globe. Inside the SOC is exclusively authored by these experts, providing analysis of cyber incidents and threat trends, based on real-world experience in the field.
Written by
Charlotte Thompson
Cyber Analyst
Written by
Tiana Kelly
Deputy Team Lead, London & Cyber Analyst
Default blog imageDefault blog imageDefault blog imageDefault blog imageDefault blog imageDefault blog image
30
Oct 2023

AI tools open doors for threat actors

On November 30, 2022, the free conversational language generation model ChatGPT was launched by OpenAI, an artificial intelligence (AI) research and development company. The launch of ChatGPT was the culmination of development ongoing since 2018 and represented the latest innovation in the ongoing generative AI boom and made the use of generative AI tools accessible to the general population for the first time.

ChatGPT is estimated to currently have at least 100 million users, and in August 2023 the site reached 1.43 billion visits [1]. Darktrace data indicated that, as of March 2023, 74% of active customer environments have employees using generative AI tools in the workplace [2].

However, with new tools come new opportunities for threat actors to exploit and use them maliciously, expanding their arsenal.

Much consideration has been given to mitigating the impacts of the increased linguistic complexity in social engineering and phishing attacks resulting from generative AI tool use, with Darktrace observing a 135% increase in ‘novel social engineering attacks’ across thousands of active Darktrace/Email™ customers from January to February 2023, corresponding with the widespread adoption of ChatGPT and its peers [3].

Less overall consideration, however, has been given to impacts stemming from errors intrinsic to generative AI tools. One of these errors is AI hallucinations.

What is an AI hallucination?

AI “hallucination” is a term which refers to the predictive elements of generative AI and LLMs’ AI model gives an unexpected or factually incorrect response which does not align with its machine learning training data [4]. This differs from regular and intended behavior for an AI model, which should provide a response based on the data it was trained upon.  

Why are AI hallucinations a problem?

Despite the term indicating it might be a rare phenomenon, hallucinations are far more likely than accurate or factual results as the AI models used in LLMs are merely predictive and focus on the most probable text or outcome, rather than factual accuracy.

Given the widespread use of generative AI tools in the workplace employees are becoming significantly more likely to encounter an AI hallucination. Furthermore, if these fabricated hallucination responses are taken at face value, they could cause significant issues for an organization.

Use of generative AI in software development

Software developers may use generative AI for recommendations on how to optimize their scripts or code, or to find packages to import into their code for various uses. Software developers may ask LLMs for recommendations on specific pieces of code or how to solve a specific problem, which will likely lead to a third-party package. It is possible that packages recommended by generative AI tools could represent AI hallucinations and the packages may not have been published, or, more accurately, the packages may not have been published prior to the date at which the training data for the model halts. If these hallucinations result in common suggestions of a non-existent package, and the developer copies the code snippet wholesale, this may leave the exchanges vulnerable to attack.

Research conducted by Vulcan revealed the prevalence of AI hallucinations when ChatGPT is asked questions related to coding. After sourcing a sample of commonly asked coding questions from Stack Overflow, a question-and-answer website for programmers, researchers queried ChatGPT (in the context of Node.js and Python) and reviewed its responses. In 20% of the responses provided by ChatGPT pertaining to Node.js at least one un-published package was included, whilst the figure sat at around 35% for Python [4].

Hallucinations can be unpredictable, but would-be attackers are able to find packages to create by asking generative AI tools generic questions and checking whether the suggested packages exist already. As such, attacks using this vector are unlikely to target specific organizations, instead posing more of a widespread threat to users of generative AI tools.

Malicious packages as attack vectors

Although AI hallucinations can be unpredictable, and responses given by generative AI tools may not always be consistent, malicious actors are able to discover AI hallucinations by adopting the approach used by Vulcan. This allows hallucinated packages to be used as attack vectors. Once a malicious actor has discovered a hallucination of an un-published package, they are able to create a package with the same name and include a malicious payload, before publishing it. This is known as a malicious package.

Malicious packages could also be recommended by generative AI tools in the form of pre-existing packages. A user may be recommended a package that had previously been confirmed to contain malicious content, or a package that is no longer maintained and, therefore, is more vulnerable to hijack by malicious actors.

In such scenarios it is not necessary to manipulate the training data (data poisoning) to achieve the desired outcome for the malicious actor, thus a complex and time-consuming attack phase can easily be bypassed.

An unsuspecting software developer may incorporate a malicious package into their code, rendering it harmful. Deployment of this code could then result in compromise and escalation into a full-blown cyber-attack.

Figure 1: Flow diagram depicting the initial stages of an AI Package Hallucination Attack.

For providers of Software-as-a-Service (SaaS) products, this attack vector may represent an even greater risk. Such organizations may have a higher proportion of employed software developers than other organizations of comparable size. A threat actor, therefore, could utilize this attack vector as part of a supply chain attack, whereby a malicious payload becomes incorporated into trusted software and is then distributed to multiple customers. This type of attack could have severe consequences including data loss, the downtime of critical systems, and reputational damage.

How could Darktrace detect an AI Package Hallucination Attack?

In June 2023, Darktrace introduced a range of DETECT™ and RESPOND™ models designed to identify the use of generative AI tools within customer environments, and to autonomously perform inhibitive actions in response to such detections. These models will trigger based on connections to endpoints associated with generative AI tools, as such, Darktrace’s detection of an AI Package Hallucination Attack would likely begin with the breaching of one of the following DETECT models:

  • Compliance / Anomalous Upload to Generative AI
  • Compliance / Beaconing to Rare Generative AI and Generative AI
  • Compliance / Generative AI

Should generative AI tool use not be permitted by an organization, the Darktrace RESPOND model ‘Antigena / Network / Compliance / Antigena Generative AI Block’ can be activated to autonomously block connections to endpoints associated with generative AI, thus preventing an AI Package Hallucination attack before it can take hold.

Once a malicious package has been recommended, it may be downloaded from GitHub, a platform and cloud-based service used to store and manage code. Darktrace DETECT is able to identify when a device has performed a download from an open-source repository such as GitHub using the following models:

  • Device / Anomalous GitHub Download
  • Device / Anomalous Script Download Followed By Additional Packages

Whatever goal the malicious package has been designed to fulfil will determine the next stages of the attack. Due to their highly flexible nature, AI package hallucinations could be used as an attack vector to deliver a large variety of different malware types.

As GitHub is a commonly used service by software developers and IT professionals alike, traditional security tools may not alert customer security teams to such GitHub downloads, meaning malicious downloads may go undetected. Darktrace’s anomaly-based approach to threat detection, however, enables it to recognize subtle deviations in a device’s pre-established pattern of life which may be indicative of an emerging attack.

Subsequent anomalous activity representing the possible progression of the kill chain as part of an AI Package Hallucination Attack could then trigger an Enhanced Monitoring model. Enhanced Monitoring models are high-fidelity indicators of potential malicious activity that are investigated by the Darktrace analyst team as part of the Proactive Threat Notification (PTN) service offered by the Darktrace Security Operation Center (SOC).

Conclusion

Employees are often considered the first line of defense in cyber security; this is particularly true in the face of an AI Package Hallucination Attack.

As the use of generative AI becomes more accessible and an increasingly prevalent tool in an attacker’s toolbox, organizations will benefit from implementing company-wide policies to define expectations surrounding the use of such tools. It is simple, yet critical, for example, for employees to fact check responses provided to them by generative AI tools. All packages recommended by generative AI should also be checked by reviewing non-generated data from either external third-party or internal sources. It is also good practice to adopt caution when downloading packages with very few downloads as it could indicate the package is untrustworthy or malicious.

As of September 2023, ChatGPT Plus and Enterprise users were able to use the tool to browse the internet, expanding the data ChatGPT can access beyond the previous training data cut-off of September 2021 [5]. This feature will be expanded to all users soon [6]. ChatGPT providing up-to-date responses could prompt the evolution of this attack vector, allowing attackers to publish malicious packages which could subsequently be recommended by ChatGPT.

It is inevitable that a greater embrace of AI tools in the workplace will be seen in the coming years as the AI technology advances and existing tools become less novel and more familiar. By fighting fire with fire, using AI technology to identify AI usage, Darktrace is uniquely placed to detect and take preventative action against malicious actors capitalizing on the AI boom.

Credit to Charlotte Thompson, Cyber Analyst, Tiana Kelly, Analyst Team Lead, London, Cyber Analyst

References

[1] https://seo.ai/blog/chatgpt-user-statistics-facts

[2] https://darktrace.com/news/darktrace-addresses-generative-ai-concerns

[3] https://darktrace.com/news/darktrace-email-defends-organizations-against-evolving-cyber-threat-landscape

[4] https://vulcan.io/blog/ai-hallucinations-package-risk?nab=1&utm_referrer=https%3A%2F%2Fwww.google.com%2F

[5] https://twitter.com/OpenAI/status/1707077710047216095

[6] https://www.reuters.com/technology/openai-says-chatgpt-can-now-browse-internet-2023-09-27/

Inside the SOC
Darktrace cyber analysts are world-class experts in threat intelligence, threat hunting and incident response, and provide 24/7 SOC support to thousands of Darktrace customers around the globe. Inside the SOC is exclusively authored by these experts, providing analysis of cyber incidents and threat trends, based on real-world experience in the field.
Written by
Charlotte Thompson
Cyber Analyst
Written by
Tiana Kelly
Deputy Team Lead, London & Cyber Analyst

More in this series

No items found.

Blog

/

Email

/

December 18, 2025

Why organizations are moving to label-free, behavioral DLP for outbound email

Man at laptopDefault blog imageDefault blog image

Why outbound email DLP needs reinventing

In 2025, the global average cost of a data breach fell slightly — but remains substantial at USD 4.44 million (IBM Cost of a Data Breach Report 2025). The headline figure hides a painful reality: many of these breaches stem not from sophisticated hacks, but from simple human error: mis-sent emails, accidental forwarding, or replying with the wrong attachment. Because outbound email is a common channel for sensitive data leaving an organization, the risk posed by everyday mistakes is enormous.

In 2025, 53% of data breaches involved customer PII, making it the most commonly compromised asset (IBM Cost of a Data Breach Report 2025). This makes “protection at the moment of send” essential. A single unintended disclosure can trigger compliance violations, regulatory scrutiny, and erosion of customer trust –consequences that are disproportionate to the marginal human errors that cause them.

Traditional DLP has long attempted to mitigate these impacts, but it relies heavily on perfect labelling and rigid pattern-matching. In reality, data loss rarely presents itself as a neat, well-structured pattern waiting to be caught – it looks like everyday communication, just slightly out of context.

How data loss actually happens

Most data loss comes from frustratingly familiar scenarios. A mistyped name in auto-complete sends sensitive data to the wrong “Alex.” A user forwards a document to a personal Gmail account “just this once.” Someone shares an attachment with a new or unknown correspondent without realizing how sensitive it is.

Traditional, content-centric DLP rarely catches these moments. Labels are missing or wrong. Regexes break the moment the data shifts formats. And static rules can’t interpret the context that actually matters – the sender-recipient relationship, the communication history, or whether this behavior is typical for the user.

It’s the everyday mistakes that hurt the most. The classic example: the Friday 5:58 p.m. mis-send, when auto-complete selects Martin, a former contractor, instead of Marta in Finance.

What traditional DLP approaches offer (and where gaps remain)

Most email DLP today follows two patterns, each useful but incomplete.

  • Policy- and label-centric DLP works when labels are correct — but content is often unlabeled or mislabeled, and maintaining classification adds friction. Gaps appear exactly where users move fastest
  • Rule and signature-based approaches catch known patterns but miss nuance: human error, new workflows, and “unknown unknowns” that don’t match a rule

The takeaway: Protection must combine content + behavior + explainability at send time, without depending on perfect labels.

Your technology primer: The three pillars that make outbound DLP effective

1) Label-free (vs. data classification)

Protects all content, not just what’s labeled. Label-free analysis removes classification overhead and closes gaps from missing or incorrect tags. By evaluating content and context at send time, it also catches misdelivery and other payload-free errors.

  • No labeling burden; no regex/rule maintenance
  • Works when tags are missing, wrong, or stale
  • Detects misdirected sends even when labels look right

2) Behavioral (vs. rules, signatures, threat intelligence)

Understands user behavior, not just static patterns. Behavioral analysis learns what’s normal for each person, surfacing human error and subtle exfiltration that rules can’t. It also incorporates account signals and inbound intel, extending across email and Teams.

  • Flags risk without predefined rules or IOCs
  • Catches misdelivery, unusual contacts, personal forwards, odd timing/volume
  • Blends identity and inbound context across channels

3) Proprietary DSLM (vs. generic LLM)

Optimized for precise, fast, explainable on-send decisions. A DSLM understands email/DLP semantics, avoids generative risks, and stays auditable and privacy-controlled, delivering intelligence reliably without slowing mail flow.

  • Low-latency, on-send enforcement
  • Non-generative for predictable, explainable outcomes
  • Governed model with strong privacy and auditability

The Darktrace approach to DLP

Darktrace / EMAIL – DLP stops misdelivery and sensitive data loss at send time using hold/notify/justify/release actions. It blends behavioral insight with content understanding across 35+ PII categories, protecting both labeled and unlabeled data. Every action is paired with clear explainability: AI narratives show exactly why an email was flagged, supporting analysts and helping end-users learn. Deployment aligns cleanly with existing SOC workflows through mail-flow connectors and optional Microsoft Purview label ingestion, without forcing duplicate policy-building.

Deployment is simple: Microsoft 365 routes outbound mail to Darktrace for real-time, inline decisions without regex or rule-heavy setup.

A buyer’s checklist for DLP solutions

When choosing your DLP solution, you want to be sure that it can deliver precise, explainable protection at the moment it matters – on send – without operational drag.  

To finish, we’ve compiled a handy list of questions you can ask before choosing an outbound DLP solution:

  • Can it operate label free when tags are missing or wrong? 
  • Does it truly learn per user behavior (no shortcuts)? 
  • Is there a domain specific model behind the content understanding (not a generic LLM)? 
  • Does it explain decisions to both analysts and end users? 
  • Will it integrate with your label program and SOC workflows rather than duplicate them? 

For a deep dive into Darktrace’s DLP solution, check out the full solution brief.

[related-resource]

Continue reading
About the author
Carlos Gray
Senior Product Marketing Manager, Email

Blog

/

Email

/

December 17, 2025

Beyond MFA: Detecting Adversary-in-the-Middle Attacks and Phishing with Darktrace

Beyond MFA: Detecting Adversary-in-the-Middle Attacks and Phishing with DarktraceDefault blog imageDefault blog image

What is an Adversary-in-the-middle (AiTM) attack?

Adversary-in-the-Middle (AiTM) attacks are a sophisticated technique often paired with phishing campaigns to steal user credentials. Unlike traditional phishing, which multi-factor authentication (MFA) increasingly mitigates, AiTM attacks leverage reverse proxy servers to intercept authentication tokens and session cookies. This allows attackers to bypass MFA entirely and hijack active sessions, stealthily maintaining access without repeated logins.

This blog examines a real-world incident detected during a Darktrace customer trial, highlighting how Darktrace / EMAILTM and Darktrace / IDENTITYTM identified the emerging compromise in a customer’s email and software-as-a-service (SaaS) environment, tracked its progression, and could have intervened at critical moments to contain the threat had Darktrace’s Autonomous Response capability been enabled.

What does an AiTM attack look like?

Inbound phishing email

Attacks typically begin with a phishing email, often originating from the compromised account of a known contact like a vendor or business partner. These emails will often contain malicious links or attachments leading to fake login pages designed to spoof legitimate login platforms, like Microsoft 365, designed to harvest user credentials.

Proxy-based credential theft and session hijacking

When a user clicks on a malicious link, they are redirected through an attacker-controlled proxy that impersonates legitimate services.  This proxy forwards login requests to Microsoft, making the login page appear legitimate. After the user successfully completes MFA, the attacker captures credentials and session tokens, enabling full account takeover without the need for reauthentication.

Follow-on attacks

Once inside, attackers will typically establish persistence through the creation of email rules or registering OAuth applications. From there, they often act on their objectives, exfiltrating sensitive data and launching additional business email compromise (BEC) campaigns. These campaigns can include fraudulent payment requests to external contacts or internal phishing designed to compromise more accounts and enable lateral movement across the organization.

Darktrace’s detection of an AiTM attack

At the end of September 2025, Darktrace detected one such example of an AiTM attack on the network of a customer trialling Darktrace / EMAIL and Darktrace / IDENTITY.

In this instance, the first indicator of compromise observed by Darktrace was the creation of a malicious email rule on one of the customer’s Office 365 accounts, suggesting the account had likely already been compromised before Darktrace was deployed for the trial.

Darktrace / IDENTITY observed the account creating a new email rule with a randomly generated name, likely to hide its presence from the legitimate account owner. The rule marked all inbound emails as read and deleted them, while ignoring any existing mail rules on the account. This rule was likely intended to conceal any replies to malicious emails the attacker had sent from the legitimate account owner and to facilitate further phishing attempts.

Darktrace’s detection of the anomalous email rule creation.
Figure 1: Darktrace’s detection of the anomalous email rule creation.

Internal and external phishing

Following the creation of the email rule, Darktrace / EMAIL observed a surge of suspicious activity on the user’s account. The account sent emails with subject lines referencing payment information to over 9,000 different external recipients within just one hour. Darktrace also identified that these emails contained a link to an unusual Google Drive endpoint, embedded in the text “download order and invoice”.

Darkrace’s detection of an unusual surge in outbound emails containing suspicious content, shortly following the creation of a new email rule.
Figure 2: Darkrace’s detection of an unusual surge in outbound emails containing suspicious content, shortly following the creation of a new email rule.
Darktrace / EMAIL’s detection of the compromised account sending over 9,000 external phishing emails, containing an unusual Google Drive link.
Figure 3: Darktrace / EMAIL’s detection of the compromised account sending over 9,000 external phishing emails, containing an unusual Google Drive link.

As Darktrace / EMAIL flagged the message with the ‘Compromise Indicators’ tag (Figure 2), it would have been held automatically if the customer had enabled default Data Loss Prevention (DLP) Action Flows in their email environment, preventing any external phishing attempts.

Figure 4: Darktrace / EMAIL’s preview of the email sent by the offending account.
Figure 4: Darktrace / EMAIL’s preview of the email sent by the offending account.

Darktrace analysis revealed that, after clicking the malicious link in the email, recipients would be redirected to a convincing landing page that closely mimicked the customer’s legitimate branding, including authentic imagery and logos, where prompted to download with a PDF named “invoice”.

Figure 5: Download and login prompts presented to recipients after following the malicious email link, shown here in safe view.

After clicking the “Download” button, users would be prompted to enter their company credentials on a page that was likely a credential-harvesting tool, designed to steal corporate login details and enable further compromise of SaaS and email accounts.

Darktrace’s Response

In this case, Darktrace’s Autonomous Response was not fully enabled across the customer’s email or SaaS environments, allowing the compromise to progress,  as observed by Darktrace here.

Despite this, Darktrace / EMAIL’s successful detection of the malicious Google Drive link in the internal phishing emails prompted it to suggest ‘Lock Link’, as a recommended action for the customer’s security team to manually apply. This action would have automatically placed the malicious link behind a warning or screening page blocking users from visiting it.

Autonomous Response suggesting locking the malicious Google Drive link sent in internal phishing emails.
Figure 6: Autonomous Response suggesting locking the malicious Google Drive link sent in internal phishing emails.

Furthermore, if active in the customer’s SaaS environment, Darktrace would likely have been able to mitigate the threat even earlier, at the point of the first unusual activity: the creation of a new email rule. Mitigative actions would have included forcing the user to log out, terminating any active sessions, and disabling the account.

Conclusion

AiTM attacks represent a significant evolution in credential theft techniques, enabling attackers to bypass MFA and hijack active sessions through reverse proxy infrastructure. In the real-world case we explored, Darktrace’s AI-driven detection identified multiple stages of the attack, from anomalous email rule creation to suspicious internal email activity, demonstrating how Autonomous Response could have contained the threat before escalation.

MFA is a critical security measure, but it is no longer a silver bullet. Attackers are increasingly targeting session tokens rather than passwords, exploiting trusted SaaS environments and internal communications to remain undetected. Behavioral AI provides a vital layer of defense by spotting subtle anomalies that traditional tools often miss

Security teams must move beyond static defenses and embrace adaptive, AI-driven solutions that can detect and respond in real time. Regularly review SaaS configurations, enforce conditional access policies, and deploy technologies that understand “normal” behavior to stop attackers before they succeed.

Credit to David Ison (Cyber Analyst), Bertille Pierron (Solutions Engineer), Ryan Traill (Analyst Content Lead)

Appendices

Models

SaaS / Anomalous New Email Rule

Tactic – Technique – Sub-Technique  

Phishing - T1566

Adversary-in-the-Middle - T1557

Continue reading
About the author
Your data. Our AI.
Elevate your network security with Darktrace AI