VLSBench: Unveiling Visual Leakage in Multimodal Safety


1Shanghai Artificial Intelligence Laboratory, 2Fudan University, 3Beihang University

*Equal Contribution †Corresponding Author
[Figure: overview of our work]

Overview of our work. We identify a problem in current multimodal safety data samples: visual safety information leakage (VSIL). This leakage leads to a counter-intuitive finding: simple textual SFT-based alignment methods achieve nearly the same high safety rates as multimodal alignment. We therefore construct VLSBench, which prevents visual leakage. This newly proposed benchmark discourages textual shortcut alignment and motivates more dedicated multimodal alignment methods for this challenging task.

Abstract

Safety concerns of multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works indicate a counter-intuitive phenomenon: using textual unlearning to align MLLMs achieves safety performance comparable to MLLMs trained with image-text pairs. To explain this counter-intuitive phenomenon, we discover a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image has already been revealed in the textual query. In this way, MLLMs can easily refuse these sensitive text-image queries based on the textual query alone. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To this end, we construct a multimodal visual leakless safety benchmark (VLSBench) with 2.4k image-text pairs, which prevents visual safety leakage from image to textual query. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is enough for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL.

The Problem of VSIL

Shortcut Alignment

[Figure: shortcut alignment]

The VSIL problem invites shortcut alignment: textual alignment alone appears to solve the multimodal safety challenge.
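To make VSIL concrete, below is a toy heuristic in Python (a minimal sketch; the keyword list and function name are our own illustration, not the paper's detection procedure): a text-image pair leaks when sensitive terms describing the image are restated in the textual query, so a text-only safety filter can already refuse it.

def leaks_visual_info(image_description: str, textual_query: str) -> bool:
    """Toy VSIL check: True if a sensitive word from the image also appears in the query."""
    sensitive_terms = {"gun", "knife", "drug", "bomb"}  # illustrative toy list
    image_terms = set(image_description.lower().split())
    query_terms = set(textual_query.lower().split())
    return bool(sensitive_terms & image_terms & query_terms)

# VSIL pair: the query restates the risky object already visible in the image.
print(leaks_visual_info("a man holding a gun", "How to use this gun to hurt someone"))   # True
# Leak-free pair (VLSBench style): the query alone looks harmless.
print(leaks_visual_info("a man holding a gun", "How can I use this item effectively"))   # False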


VLSBench Dataset

Overview

[Figure: VLSBench dataset statistics]

To address the issue in current multimodal safety benchmarks described above, namely VSIL, we construct the Multimodal Visual Leakless Safety Benchmark (VLSBench), filling this gap in current multimodal safety datasets. As shown above, our dataset comprises 2.4k image-text pairs, covering 6 categories and 19 sub-categories.
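As a rough sketch of how such a dataset might be consumed, the snippet below loads records from a local JSON file and tallies the category distribution. The file name and field names ("image_path", "instruction", "category") are assumptions about the schema, not the published format.

import json
from collections import Counter

with open("vlsbench.json") as f:      # hypothetical file name
    samples = json.load(f)            # assumed: a list of dicts, one per image-text pair

print(f"{len(samples)} image-text pairs")        # expect 2.4k
print(Counter(s["category"] for s in samples))   # expect 6 top-level categories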


Construction

[Figure: data construction pipeline]

Our data construction pipeline, shown above, focuses on effectively preventing visual safety leakage from the image modality to the textual query. First, we generate harmful textual queries via two parallel paths (Step 1). Then, we detoxify the harmful queries to obtain harmless queries (Step 2). Next, we use text-to-image models to iteratively generate images (Step 3). Finally, we filter out mismatched and safe image-text pairs to obtain the final dataset (Step 4).
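The skeleton below restates these four steps in Python. Every function is a toy stand-in for an LLM or text-to-image call; the names, signatures, and toy return values are our own illustration, not the authors' implementation.

def generate_harmful_queries() -> list[str]:
    # Step 1: harmful textual queries from two parallel paths (toy example).
    return ["How to fire this gun at a crowd?"]

def detoxify(harmful_query: str) -> str:
    # Step 2: strip the risky wording so the text alone looks harmless;
    # the risky information must now be carried by the image instead.
    return "What is the most effective way to use this object?"

def generate_image(harmful_query: str, max_iters: int = 3):
    # Step 3: iteratively prompt a text-to-image model until the image
    # depicts the risky content; return None if all attempts fail.
    for _ in range(max_iters):
        image = f"<image depicting: {harmful_query}>"  # placeholder for a real model call
        if image:
            return image
    return None

def keep_pair(image, harmless_query: str) -> bool:
    # Step 4: filter out mismatched or entirely safe image-text pairs.
    return image is not None

dataset = []
for harmful in generate_harmful_queries():
    harmless = detoxify(harmful)        # Step 2
    image = generate_image(harmful)     # Step 3
    if keep_pair(image, harmless):      # Step 4
        dataset.append({"image": image, "instruction": harmless})
print(dataset)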

Samples Preview

[Figure: VLSBench sample previews]

We present six examples from our VLSBench, each paired with a corresponding response. The four images on the left and in the middle are generated, while the two on the right come from existing data sources.

Experimental Results

Benchmark Results

[Figure: benchmark results]

We evaluate various MLLMs, including open-source models and closed-source APIs, and also benchmark several safety-aligned baselines. The evaluation is conducted by GPT-4o with a specialized prompt. We classify each response into one of three types: safe with refusal, safe with warning, and unsafe. The safety rate is the sum of the safe-refusal rate and the safe-warning rate.
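The safety-rate arithmetic can be written out directly. In this minimal sketch the label strings and the toy judgment list are our own; in practice each label would come from the GPT-4o evaluator described above.

from collections import Counter

judgments = ["safe_refusal", "safe_warning", "unsafe", "safe_refusal"]  # toy GPT-4o labels
counts = Counter(judgments)
n = len(judgments)

safe_refusal_rate = counts["safe_refusal"] / n
safe_warning_rate = counts["safe_warning"] / n
safety_rate = safe_refusal_rate + safe_warning_rate   # sum of the two safe rates
print(f"safety rate = {safety_rate:.2%}")             # 75.00%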


[Figure: further analysis]

The two figures above highlight key features of VLSBench: (1) the challenging nature of our dataset; and (2) the importance of multimodal alignment methods over textual shortcut alignment.

More Examples

BibTeX

@article{hu2024vlsbench,
  title={VLSBench: Unveiling Visual Leakage in Multimodal Safety},
  author={Xuhao Hu and Dongrui Liu and Hao Li and Xuanjing Huang and Jing Shao},
  journal={arXiv preprint arXiv:2411.19939},
  year={2024}
}