Papers

2,781 papers found
Bias in the Mirror: Are LLMs opinions robust to their own adversarial attacks
Virgile Rennard, Christos Xypolopoulos, Michalis Vazirgiannis
2025 ACL
Taming LLMs with Gradient Grouping
Siyuan Li, Juanxi Tian, Zedong Wang et al.
2025 ACL
Stepwise Reasoning Disruption Attack of LLMs
Jingyu Peng, Maolin Wang, Xiangyu Zhao et al.
2025 ACL
GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
Maxim Zhelnin, Viktor Moskvoretskii, Egor Shvetsov et al.
2025 ACL
Biased LLMs can Influence Political Decision-Making
Jillian Fisher, Shangbin Feng, Robert Aron et al.
2025 ACL