Security researchers circumvent Microsoft Azure AI Content Safety

Stress testing

Mindgard deployed these two filters (AI Text Moderation and Prompt Guard) in front of ChatGPT 3.5 Turbo using Azure OpenAI, then accessed the target LLM through Mindgard’s Automated AI Red Teaming Platform.

Two attack techniques were used against the filters: character injection (adding specific types of characters, irregular text patterns, and so on) and adversarial ML evasion (finding blind spots within ML classification).
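To illustrate why character injection can work at all, here is a minimal sketch against a deliberately naive, hypothetical blocklist filter. Azure AI Content Safety uses ML classifiers, not a word list, so this is not a model of how Prompt Guard or AI Text Moderation work. It only shows how a small character-level edit can break the exact matching a simple detector relies on. The names `BLOCKLIST`, `naive_filter`, and `inject_characters` are illustrative assumptions.

```python
# Hypothetical blocklist filter for illustration only; real content-safety
# services use ML classifiers rather than exact word matching.
BLOCKLIST = {"forbidden"}


def naive_filter(text: str) -> bool:
    """Return True if any whitespace-separated token is on the blocklist."""
    return any(token.lower() in BLOCKLIST for token in text.split())


def inject_characters(word: str) -> str:
    """Insert a space between every character (the 'spaced characters' variant)."""
    return " ".join(word)


original = "this is a forbidden request"
perturbed = original.replace("forbidden", inject_characters("forbidden"))

print(naive_filter(original))   # True: the token matches the blocklist exactly
print(naive_filter(perturbed))  # False: 'f o r b i d d e n' no longer matches
```

An ML classifier is harder to fool than this, but the reported drop in detection suggests that similar perturbations can still push inputs into regions the classifier was not trained to handle.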

Character injection reduced Prompt Guard’s jailbreak detection effectiveness from 89% to 7% when exposed to diacritics (e.g., changing the letter a to á), homoglyphs (e.g., closely resembling characters such as 0 and O), numerical replacement (“leetspeak”), and spaced characters. The effectiveness of AI Text Moderation was also reduced using similar techniques.
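The four perturbation types named above can be sketched as simple string transformations. This is a minimal illustration, not Mindgard’s tooling: the character maps (`HOMOGLYPHS`, `LEET`) and function names are hypothetical choices, since the article does not list the exact substitutions used in the tests.

```python
import unicodedata

# Hypothetical substitution maps chosen for illustration only.
HOMOGLYPHS = {
    "O": "0",       # digit zero for capital O, as in the article's example
    "a": "\u0430",  # Cyrillic small a, visually identical to Latin 'a'
    "e": "\u0435",  # Cyrillic small ie, visually identical to Latin 'e'
}
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}


def add_diacritics(text: str) -> str:
    """Add a combining acute accent (U+0301) after each vowel, e.g. a -> á."""
    out = []
    for ch in text:
        out.append(ch)
        if ch.lower() in "aeiou":
            out.append("\u0301")
    # NFC composes base letter + combining mark into a single code point.
    return unicodedata.normalize("NFC", "".join(out))


def swap_homoglyphs(text: str) -> str:
    """Replace characters with look-alikes that are different code points."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def leetspeak(text: str) -> str:
    """Replace letters with digits that resemble them (numerical replacement)."""
    return "".join(LEET.get(ch.lower(), ch) for ch in text)


def space_characters(text: str) -> str:
    """Separate every character with a space."""
    return " ".join(text)


print(add_diacritics("banana"))  # bánáná
print(leetspeak("test"))         # 7357
print(swap_homoglyphs("OK"))     # 0K
print(space_characters("hi"))    # h i
```

Each transformation leaves the text readable to a person (and often to the LLM behind the filter) while changing the underlying code points or tokens that a classifier sees.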
