MAI-2024-0016
November 01, 2024
The SQL Injection Jailbreak (SIJ) vulnerability represents a significant threat to Large Language Models (LLMs), enabling attackers to circumvent established safety protocols by altering the structure of input prompts. This attack method exploits the model's handling of system prompts, user prefixes, user prompts, and assistant prefixes. By strategically "commenting out" the expected response prefix, attackers can inject malicious instructions, prompting the LLM to produce unsafe content. The vulnerability is rooted in the external properties of LLMs, specifically their input prompt parsing mechanisms, rather than intrinsic model flaws.
Mitigation steps: **For AI Developers:**
* Implement input sanitization and validation techniques to prevent malicious prompt structures.
* Develop robust parsing mechanisms that are less susceptible to manipulation through structural changes in the input.
* Employ defense methods that incorporate random strings or keys after ethical prompts, making it harder for attackers to reliably predict response prefixes.
**For Model Trainers/Fine-tuners:**
* Regularly update and patch LLMs with security fixes addressing newly discovered vulnerabilities.
* Conduct thorough security testing and penetration testing of LLMs to identify and mitigate potential vulnerabilities.
Related ResourcesĀ (1)
Do you need more information?
Contact UsCVSS v4
Base Score:
8.8
Attack Vector
NETWORK
Attack Complexity
LOW
Attack Requirements
NONE
Privileges Required
NONE
User Interaction
NONE
Vulnerable System Confidentiality
LOW
Vulnerable System Integrity
HIGH
Vulnerable System Availability
NONE
Subsequent System Confidentiality
NONE
Subsequent System Integrity
NONE
Subsequent System Availability
NONE
CVSS v3
Base Score:
8.2
Attack Vector
NETWORK
Attack Complexity
LOW
Privileges Required
NONE
User Interaction
NONE
Scope
UNCHANGED
Confidentiality
LOW
Integrity
HIGH
Availability
NONE
AIVSS
Base Score:
5