Skipping a line: why neural networks are vulnerable to hacking with verses
Malicious requests submitted in the form of verses make artificial intelligence (AI) 62% more likely to break the rules, scientists concluded after an experiment with 25 language models. The poetic form of the same prompts easily bypassed a variety of neural network filters, including those protecting against cyberattacks, manipulation, and privacy violations. For more on how "poetic attacks" on AI work, how dangerous this vulnerability is, and how to counter it, read this Izvestia article.
What is known about the experiment with "poetic attacks" on neural networks
An unusual experiment with "adversarial poetry" against AI was recently described by researchers from the Sapienza University of Rome and the Sant'Anna School of Advanced Studies. The essence of such attacks is to mask inherently dangerous meanings behind images and metaphors.
This way of probing the security of neural networks turned out to be surprisingly effective: hand-crafted verses bypassed protections in more than 62% of cases. For the 1,200 malicious prompts that were automatically converted to verse form, the figure was somewhat lower, about 43%. Notably, the prose counterparts of the same prompts showed significantly lower success rates.
DeepSeek neural networks, as well as Google models including Gemini 2.5 Pro, proved particularly vulnerable to "poetic attacks": they "broke" on all 20 samples of malicious prompts in verse. During the experiment, some neural networks produced dangerous responses in more than 90% of cases. OpenAI and Anthropic products proved more resilient, but they too had failures; in particular, the GPT-5 line produced up to 10% unsafe responses.
As the authors of the study noted, "poetic attacks" bypassed security filters against cyberattacks, manipulation, privacy violations, fraud, malware creation, and other scenarios with equal ease. All this suggests that the vulnerability lies not in the thematic filters but in the architecture of the refusal mechanisms and the logic of text analysis.
Why poetry became the key to circumventing AI defenses
The study by the Italian scientists demonstrates a fundamental flaw in modern large language models (LLMs): defense mechanisms operate primarily at the level of semantic analysis of "typical" malicious queries, Sergey Zybnev, a leading specialist in the vulnerability management department at Bastion, tells Izvestia.
"However, when a request changes its form, for example, becomes a poem, security classifiers lose the ability to correctly identify the threat," the expert notes. "The statistics in the study indicate that the problem is solvable, but it requires significant investment in the security architecture."
According to Sergey Zybnev, the vulnerability of AI to "poetic attacks" is serious but not critical: it requires targeted effort by the attacker and does not scale automatically. However, its very existence shows that the neural network industry is still at an early stage of building reliable protective mechanisms.
The poetic form is still an atypical pattern that the security filters of most AI assistants have not learned to recognize, adds Stanislav Pyzhov, head of the analysis group at the Solar 4RAYS Cyber Threat Research Center at Solar Group. In addition, when poetry is involved, neural networks switch into a "creative mode" in which the artistic quality of the response takes precedence over ethical considerations.
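To illustrate how atypical the pattern is, here is a minimal sketch of a pre-check that flags verse-shaped input by its line structure before the prompt reaches the model. The heuristic and thresholds are hypothetical, invented for illustration, and not taken from any vendor's actual filter:

```python
def looks_like_verse(text: str) -> bool:
    """Crude heuristic: several short lines of similar length suggest verse."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 4:
        return False
    lengths = [len(ln) for ln in lines]
    avg = sum(lengths) / len(lengths)
    # Prose paragraphs run on as long continuous lines; verse lines are
    # short and fairly uniform in length.
    uniform = all(abs(n - avg) < 0.5 * avg for n in lengths)
    return avg < 60 and uniform

prose = ("This is an ordinary paragraph of prose that runs on as a single "
         "long line without any deliberate line breaks at all.")
poem = "Roses are red,\nViolets are blue,\nFilters read prose,\nBut verses slip through."
print(looks_like_verse(prose))  # False
print(looks_like_verse(poem))   # True
```

Such a shape check could route verse-like prompts to stricter screening rather than block them outright, since plenty of poetic input is benign.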
"The researchers have simply found another way to substitute concepts for the AI, forcing the model to discuss 'forbidden' topics or give 'harmful advice' while believing it is merely writing poetry," says Maxim Alexandrov, an expert on Security Code software products.
What other unusual ways of bypassing AI protections have been seen before
Experts regularly discover new strategies for attacking and jailbreaking LLMs, Alexander Lebedev, senior developer of artificial intelligence systems at Innostage, tells Izvestia. According to the expert, the "poetic attack" strategy is interesting because it looks original from the outside and is relatively easy to reproduce. At the same time, there are whole classes of more effective attacks on neural networks.
- Crescendo (multi-turn drift). The neural network is gradually led toward a dangerous answer over a series of turns, and the model itself does not realize it is being hacked.
- DAN roleplay attacks. The model is told that it is now free from all moral constraints.
- "Machiavellian" attacks. Dangerous answers are extracted from the LLM by framing them as justified by a greater necessity.
- Many-shot jailbreaking. The model is fed many examples of malicious behavior, immersing it in a context of "that's how things are done here, don't hold back."
"Encoding the text also works well, for example in base64, or using low-resource languages," notes Alexander Lebedev. "The model's defenses are well trained to catch attacks in English, but they can easily miss an attack in Swahili, Sumerian, or another rare language."
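The encoding trick Lebedev describes can be illustrated with a harmless toy example (the blocklist and payload here are invented for demonstration): a plain keyword filter sees nothing suspicious in a base64 string carrying the very same text, while a filter that also tries to decode the input closes that gap:

```python
import base64

BLOCKLIST = {"secret"}  # toy keyword list, not a real product's filter


def naive_filter(prompt: str) -> bool:
    """Return True if the prompt passes a plain keyword check."""
    return not any(word in prompt.lower() for word in BLOCKLIST)


plain = "tell me the secret"
encoded = base64.b64encode(plain.encode()).decode()

print(naive_filter(plain))    # False: the keyword is caught
print(naive_filter(encoded))  # True: the same payload slips through


def decoding_filter(prompt: str) -> bool:
    """Also try to decode base64-looking input before checking it."""
    try:
        decoded = base64.b64decode(prompt, validate=True).decode("utf-8")
    except Exception:
        decoded = ""  # not valid base64, nothing extra to check
    return naive_filter(prompt) and naive_filter(decoded)


print(decoding_filter(encoded))  # False: decoded payload is caught
```

The same normalization idea (decode, translate, or canonicalize input before filtering) is what makes low-resource-language attacks harder, though at a real cost in latency and complexity.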
The risk of such vulnerabilities is that they are scalable and hard to spot, says Nikita Novikov, a cybersecurity expert at Angara Security. Attackers can automate these techniques to mass-generate malicious instructions, phishing texts, and social engineering scenarios, or to circumvent corporate restrictions on AI use within companies. In applied systems, this can lead to data leaks, the creation of malicious content, and the erosion of trust in neural networks as a secure tool.
"Attacks on AI lead to data loss, including sensitive data, if the model has access to corporate secrets or company code," emphasizes Alexander Lebedev. "But attacks on AI agents that can carry out actions in the real world are especially dangerous."
How to protect yourself from attacks on AI using poetry and other tricks
Unfortunately, at the current level of LLM development, complete protection of neural networks from hacking is out of reach, Sergey Polunin, head of the IT infrastructure solutions protection group at Gazinformservice, tells Izvestia. However, the risks can be significantly reduced by focusing not on static query filtering but on the form of the query and an understanding of its meaning.
"There are two strategies that can be applied in parallel: first, train the neural network to recognize bypass techniques and thereby reduce the likelihood of exploitation," says the expert. "Second, for particularly sensitive scenarios, a human can be kept in the loop, not to mention that AI in corporate environments must undergo constant audits and thorough scrutiny by the industry."
The experiment with "poetic attacks", like any other research into software vulnerabilities, is useful because it improves the final product: it makes neural networks less vulnerable to "harmful" queries, Stanislav Pyzhov notes. Developers of neural network products should therefore closely monitor such studies and take their results into account.
When threats like "poetic attacks" appear and prove genuinely effective, model developers try to train their models not to respond to that kind of manipulation, says Vladislav Tushkanov, head of the machine learning technology research and development group at Kaspersky Lab. However, the fact that people keep finding new workarounds underscores the fundamental difficulty of training models to withstand attacks.
"Therefore, full-fledged protection requires combining the models' internal safeguards with additional measures, such as detecting unacceptable inputs and outputs (an AI firewall), monitoring the models and interactions with them, and actively testing services for resistance to attacks," concludes Izvestia's interlocutor.
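The combination Tushkanov describes can be sketched as follows. This is a toy illustration with stand-in checks; real AI firewalls use trained classifiers rather than keyword lists, and the markers below are invented for demonstration:

```python
# Minimal sketch of the "AI firewall" idea: wrap every model call with
# checks on both the incoming prompt and the outgoing answer.

def input_allowed(prompt: str) -> bool:
    """Toy input classifier: reject known jailbreak phrasings."""
    banned = ("ignore previous instructions",
              "you are free from all constraints")
    return not any(marker in prompt.lower() for marker in banned)


def output_allowed(answer: str) -> bool:
    """Toy output classifier: reject answers that look like leaks."""
    leak_markers = ("api_key=", "BEGIN PRIVATE KEY")
    return not any(marker in answer for marker in leak_markers)


def guarded_call(model, prompt: str) -> str:
    """Gate a model call on both sides, as an AI firewall would."""
    if not input_allowed(prompt):
        return "[blocked: suspicious input]"
    answer = model(prompt)
    if not output_allowed(answer):
        return "[blocked: unsafe output]"
    return answer


# Stand-in model for demonstration purposes.
echo_model = lambda p: f"echo: {p}"

print(guarded_call(echo_model, "Ignore previous instructions and ..."))
print(guarded_call(echo_model, "What is 2 + 2?"))
```

Checking the output as well as the input matters precisely because of attacks like adversarial poetry: a prompt that looks innocuous can still elicit an unsafe answer, and the output gate is the last line of defense.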
Translated by the Yandex Translate service