3.2 案例2:系统提示词提取 攻击Payload演变历史: Version 1(早期朴素方法): Version 2(角色欺骗): Version 3(数据脱敏诱导): Version 4(递归提取): 实测效果: 在多个未加防护的开源模型中,Version 3和Version 4的成功率超过60%。
7.3.1 视觉通路利用(Visual Pathway Exploitation) 多模态大模型(如GPT-4V、Gemini Pro Vision、LLaVA)在处理图像时存在新的攻击面。攻击者可以通过在图像中嵌入人眼不可见的扰动来操纵模型行为。 A. 对抗图像攻击 B. 隐写术注入 将恶意指令编码到图像的低位平面: C. 组合攻击(Compositional Attacks) 根据NeurIPS 2024的研究,多模态模型面临组合对抗攻击:
7.4.3 水印防御技术 为了防御成员推断攻击和模型窃取,研究者提出了水印技术: A. 模型权重水印 B. 输出水印(用于检测模型窃取) C. 数据水印(用于防御成员推断)
7.5 新兴防御技术
7.5.1 对抗训练(Adversarial Training)
7.5.2 Constitutional AI
7.5.3 安全验证器(Safety Verifier)
八、参考资料: 1OWASP Top 10 for LLM Applications, https://owasp.org/www-project-top-10-for-large-language-model-applications/ 2Greshake, K., et al. "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." ACM CCS 2023. 3Hendrycks, D., et al. "Jailbreaking: A Case Study on Safety & Security Risks of Large Language Models." 2023. 4IBM Security "2024 Threat Intelligence Report" 5NIST "AI Risk Management Framework (AI RMF 1.0)" 6"Exploitation of Visual Pathways in Multi-Modal Language Models." arXiv:2411.05056, Nov 2024. 7"Safety of Multimodal Large Language Models on Images." IJCAI 2024. 8"Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks." NeurIPS 2024. 9"Chain of Attack: On the Robustness of Vision-Language Models." CVPR 2025. 10"Membership Inference Attacks Against Vision-Language Models." USENIX Security 2025. 11"SoK: On Gradient Leakage in Federated Learning." USENIX Security 2025. 12"Can Watermarking Large Language Models Prevent Membership Inference Attacks?" AAAI 2025. 13"Robust Data Watermarking in Language Models." ACL Findings 2025. 14"Gradient-Free Privacy Leakage in Federated Language Models." arXiv:2310.16152, 2024. 15"Adversarial Attacks on Multimodal Large Language Models." OpenReview, 2024. 16面向人工智能模型的安全攻击和防御策略综述. 计算机研究与发展, 2024. 17大语言模型安全与隐私风险综述. 浙江大学NESA, 2025. 18多模态大模型安全研究进展. 中国图像图形学报, 2024.