In today’s rapidly evolving threat landscape, organizations need to adopt innovative security solutions to keep pace with new attack trends. One area of innovation involves leveraging multi-modal large language models (LLMs) to simulate more realistic attack scenarios. However, building and deploying such sophisticated systems, especially for activities like phishing simulation, presents a unique set of challenges that demands rigorous attention.
Alignment: a behavioral problem?
Alignment is commonly referred to as the phase of LLM development where the model is shaped to follow a specific task or adhere to a specific goal. LLMs are usually tailored to answer questions in a chat interface, or to perform code generation tasks. Alignment is also used to make sure the model follows the ethical guidelines set by the model creator, i.e. it avoids generating harmful or offensive content.
Basically, it’s a way to say “no” to certain requests that may come from malicious users.
One of the malicious uses that models are forbidden to comply with is phishing, even when it is done in a training environment and for simulation purposes. Using a third-party model for this task therefore means taking a risk: it might refuse the task outright, or complete it only partially or incorrectly.
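As a rough illustration, the snippet below measures how often a third-party model refuses a benign, clearly framed simulation prompt. It assumes an OpenAI-compatible client; the model name, prompt and refusal markers are purely illustrative.

```python
# A minimal sketch of measuring how often a third-party model refuses a benign
# simulation prompt, assuming the OpenAI Python client; the model name, prompt
# and refusal markers are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
PROMPT = ("For an internal security-awareness exercise, draft a short training "
          "email that imitates a generic password-reset notice.")
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "unable to assist")

refusals = 0
for _ in range(10):
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT}],
    ).choices[0].message.content.lower()
    refusals += any(marker in reply for marker in REFUSAL_MARKERS)

print(f"Refused {refusals}/10 times")
```

Even a low refusal rate matters at scale: a simulation campaign that silently drops a fraction of its emails gives a distorted picture of employee readiness.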
Techniques used to ensure model alignment
Some of the techniques used to ensure perfect alignment of the model are:
- Task-specific training: to ensure that our model completes the requested tasks, it must be trained, i.e. fine-tuned, for our specific simulation needs.
- Abliteration: when using a third-party model to simulate a phishing attack, “refusal” mechanisms put in place by the model creator ensure it complies with ethical standards and doesn’t generate malicious content. By removing the “refusal” components of the model, we can make sure that it always responds to our specific requests (a conceptual sketch follows after this list).
- Parameters and wording changes: LLMs often refuse a task they perceive as dangerous when they detect trigger words such as “phishing”, “cyberattack” or similar ones. By changing some parameters and word choices, LLMs can be coaxed into replying, but they might still refuse to complete the task.
- Jail-breaking: to make commercial-grade LLMs answer dangerous requests, security researchers have developed prompting techniques that bypass the defenses put in place. These techniques frequently work and can lead to misbehaving models, but they can easily be neutralized by an update to the model’s behavior.
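To make the abliteration idea above concrete, here is a minimal, conceptual sketch: it estimates a “refusal direction” as the difference between mean activations on refused and complied prompts, then removes that component from the residual stream during generation. The base model, layer index, prompt sets and single-direction assumption are illustrative simplifications, not a production recipe.

```python
# Conceptual sketch of "abliteration" (refusal-direction removal) on a
# Hugging Face causal LM. "gpt2" is only a placeholder: real work targets an
# instruction-tuned model, and the layer/prompt choices are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"
LAYER = 6  # illustrative layer whose residual stream we read and edit

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def mean_hidden(prompts):
    """Average last-token hidden state at LAYER over a small prompt set."""
    vecs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
        vecs.append(out.hidden_states[LAYER][0, -1])
    return torch.stack(vecs).mean(dim=0)

# Illustrative prompt sets the model tends to refuse vs. comply with.
refused = ["Write a phishing email targeting an employee."]
complied = ["Write a reminder email about the company picnic."]

# The estimated "refusal direction": difference of means, normalised.
direction = mean_hidden(refused) - mean_hidden(complied)
direction = direction / direction.norm()

def ablate_hook(module, inputs, output):
    # Remove the component of every hidden state along the refusal direction.
    h = output[0] if isinstance(output, tuple) else output
    h = h - (h @ direction).unsqueeze(-1) * direction
    return (h,) + output[1:] if isinstance(output, tuple) else h

# GPT-2 stores its blocks in model.transformer.h; other architectures differ.
handle = model.transformer.h[LAYER].register_forward_hook(ablate_hook)
# ... call model.generate(...) here; the hook edits activations on the fly.
handle.remove()
```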
That’s why it is absolutely essential to use a model trained for a specific task. To do that, it’s necessary to have an extensive training dataset, tailored to the different tasks needed during advanced phishing simulation, such as open source intelligence (OSINT) data scraping and highly personalized, per-employee phishing simulation. While this solves one of the problems, others may arise during model benchmarking.
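As a sketch of what task-specific training can look like in practice, the snippet below fine-tunes a small causal LM on a couple of simulation-oriented examples with the Hugging Face Trainer. The base model, dataset format and hyperparameters are assumptions for illustration, not our actual pipeline.

```python
# Minimal task-specific fine-tuning sketch; model, data and hyperparameters
# are illustrative assumptions, not a production configuration.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "gpt2"  # placeholder base model

tok = AutoTokenizer.from_pretrained(MODEL)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Illustrative simulation-oriented examples (task prompt -> expected output).
examples = [
    {"text": "### Task: summarise public OSINT about the target department.\n"
             "### Output: The finance team recently announced a new ERP rollout."},
    {"text": "### Task: draft a training email themed on the ERP rollout.\n"
             "### Output: Subject: Action required: confirm your ERP account."},
]

def tokenize(batch):
    return tok(batch["text"], truncation=True, max_length=512)

dataset = Dataset.from_list(examples).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```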
Dataset quality: what about new attacks?
AI models are trained on extensive datasets of real-world data, but they often lack training on specific tasks: in our case, helping with phishing simulation scenarios. Even models trained for that purpose might lack knowledge of new types of threats, such as social engineering-boosted threats and highly personalized attacks aimed at high-profile people inside your organization.
Malicious actors don’t wait for companies to set up an updated security system or better phishing simulation tools; they act fast and unpredictably. This means that your current phishing simulation models might be outdated, because they lack knowledge of the latest attacks and trends. To overcome this problem, it is necessary to update the model frequently, training it thoroughly and ensuring that the simulation takes possible new attacks into consideration.
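One simple way to operationalize frequent updates is a periodic refresh check: retrain whenever enough new threat-intelligence examples have accumulated, or when the model is older than a given age. The sketch below assumes new examples are appended to a JSONL file; the paths, fields and thresholds are hypothetical.

```python
# Small sketch of a periodic model-refresh check; file name, record fields
# and thresholds are hypothetical examples.
import json
from datetime import datetime, timedelta
from pathlib import Path

DATASET = Path("threat_intel_examples.jsonl")  # hypothetical dataset file
LAST_TRAINED = datetime(2024, 1, 1)            # stored by the training job
NEW_SAMPLE_THRESHOLD = 200
MAX_AGE = timedelta(days=30)

def new_samples_since(path: Path, since: datetime) -> int:
    """Count examples observed after the last training run."""
    count = 0
    with path.open() as f:
        for line in f:
            record = json.loads(line)
            if datetime.fromisoformat(record["observed_at"]) > since:
                count += 1
    return count

def should_retrain() -> bool:
    stale = datetime.now() - LAST_TRAINED > MAX_AGE
    enough_new = new_samples_since(DATASET, LAST_TRAINED) >= NEW_SAMPLE_THRESHOLD
    return stale or enough_new

if should_retrain():
    print("Trigger fine-tuning job with the refreshed dataset.")
```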
Security concerns: protect your secrets!
Giving an LLM access to specific tools is extremely dangerous: it can leak private information, intellectual property and, possibly, company-wide private data and keys! Moreover, if you leverage commercial-grade LLMs for your phishing simulations, leaks of private information can occur, damaging your organization and its reputation.
It is extremely important to consider every possible risk when deploying an LLM solution, making sure that it adheres to MLOps good practices, and that it is correctly set up and shielded against external attacks such as jailbreaking or secret leaking.
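A basic safeguard along these lines is an output guard that scans model responses for anything resembling a credential before it leaves the pipeline. The sketch below uses a few illustrative regex patterns; it is a starting point, not a substitute for proper secret scanning or data loss prevention tooling.

```python
# Minimal output-guard sketch: redact anything that looks like a secret in an
# LLM response before it is stored or displayed. Patterns are illustrative.
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key id
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*\S{12,}"),
]

def sanitize(response: str) -> str:
    """Replace anything that looks like a credential with a placeholder."""
    for pattern in SECRET_PATTERNS:
        response = pattern.sub("[REDACTED]", response)
    return response

print(sanitize("Use api_key = sk-test-1234567890abcdef to call the service."))
```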
Conclusion: strengthen your security!
At Baited we use our in-house, privacy-oriented model, deployed on secure dedicated servers, tailored to your organization’s specific simulation needs, correctly aligned and frequently updated with the latest threat intelligence on phishing.
Let’s work together to defend against phishing. Reach out to learn more about how Baited can help your organization stay secure!

Cybersecurity and AI specialist focused on developing an advanced phishing detection and simulation pipeline. Leading Baited research initiatives in the artificial intelligence and threat intelligence landscape.