Adversarial Learning

4,854 papers

Papers

[MASK] Insertion: a robust method for anti-adversarial attacks EACL 2023

Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues} EACL 2023

Unveiling the Implicit Toxicity in Large Language Models EMNLP 2023

Lion: Adversarial Distillation of Proprietary Large Language Models EMNLP 2023

TrojanSQL: SQL Injection against Natural Language Interface to Database EMNLP 2023

CT-GAT: Cross-Task Generative Adversarial Attack based on Transferability EMNLP 2023

Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection EMNLP 2023

Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models EMNLP 2023

MeaeQ: Mount Model Extraction Attacks with Efficient Queries EMNLP 2023

“Are Your Explanations Reliable?” Investigating the Stability of LIME in Explaining Text Classifiers by Marrying XAI and Adversarial Attack EMNLP 2023

Generative Adversarial Training with Perturbed Token Detection for Model Robustness EMNLP 2023

Poisoning Retrieval Corpora by Injecting Adversarial Passages EMNLP 2023

RobustQA: A Framework for Adversarial Text Generation Analysis on Question Answering Systems EMNLP 2023

Improving Classifier Robustness through Active Generative Counterfactual Data Augmentation EMNLP 2023

How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts EMNLP 2023

Attack Prompt Generation for Red Teaming and Defending Large Language Models EMNLP 2023

No offence, Bert - I insult only humans! Multilingual sentence-level attack on toxicity detection networks EMNLP 2023

RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training EMNLP 2023

Multi-step Jailbreaking Privacy Attacks on ChatGPT EMNLP 2023

Robustness Tests for Automatic Machine Translation Metrics with Adversarial Attacks EMNLP 2023

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models EMNLP 2023

Effects of Human Adversarial and Affable Samples on BERT Generalization EMNLP 2023

Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers EMNLP 2023

Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems EMNLP 2023

A Black-Box Attack on Code Models via Representation Nearest Neighbor Search EMNLP 2023