How a Seemingly Harmless Image Can Jailbreak Vision-Language AI Models

Slashdot reader BrianFagioli writes: Florida International University researchers have developed a technique called JaiLIP (Jailbreaking with Loss-guided Image Perturbation) that uses subtle image modifications to bypass AI safety guardrails. Unlike traditional jailbreaks that rely on carefully crafted prompts, the attack works through images that appear normal to human viewers.
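To make the idea of a loss-guided perturbation concrete, here is a minimal sketch of a generic, textbook signed-gradient perturbation under a small per-pixel budget. It is not the JaiLIP method, whose actual loss and optimizer are not described in this summary. The toy linear scorer, the `epsilon` budget of 2/255, and every function name below are assumptions made purely for illustration.

```python
import random

def score(weights, pixels):
    """Toy linear scorer standing in for a model's loss on an image (assumption)."""
    return sum(w * p for w, p in zip(weights, pixels))

def perturb(weights, pixels, epsilon=2 / 255, steps=10):
    """Iterated signed-gradient ascent on the score, kept within epsilon per pixel."""
    step_size = epsilon / steps
    adv = list(pixels)
    for _ in range(steps):
        for i, w in enumerate(weights):
            # For a linear scorer, the gradient w.r.t. pixel i is weights[i].
            direction = (w > 0) - (w < 0)
            moved = adv[i] + step_size * direction
            # Project back into the epsilon-ball around the original pixel...
            moved = max(pixels[i] - epsilon, min(pixels[i] + epsilon, moved))
            # ...and into the valid pixel range [0, 1].
            adv[i] = max(0.0, min(1.0, moved))
    return adv

rng = random.Random(0)
weights = [rng.uniform(-1, 1) for _ in range(64)]   # hypothetical scorer weights
pixels = [rng.uniform(0.1, 0.9) for _ in range(64)]  # hypothetical 8x8 grayscale image

adv = perturb(weights, pixels)
max_change = max(abs(a - p) for a, p in zip(adv, pixels))

print("score before:", score(weights, pixels))
print("score after: ", score(weights, adv))
print("largest per-pixel change:", max_change)
```

No pixel moves by more than `epsilon`, which is why such an image can look unchanged to a person, yet the scorer's output shifts consistently in one direction. A real vision-language model is far from linear, so this only illustrates the general principle.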

The researchers tested the technique against BLIP-2, a multimodal AI model, and found that manipulated images significantly increased the likelihood of harmful responses. According to the study, the approach outperformed previous image-based jailbreak methods and nearly doubled the number of unsafe outputs generated during testing.

The findings highlight a potential security risk for businesses deploying AI systems that process both images and text. While most discussions about AI safety focus on text prompts, the research suggests that seemingly harmless images can also serve as an attack vector.
