Cross-Domain Transferability of Adversarial Examples in NLP

Existing adversarial attacks rely directly or indirectly on the source training data, which hampers their transferability to other domains. Previous work in computer vision [1] explored the transferability of adversarial examples across domains, with compelling results demonstrating that such domain-invariant adversaries exist. In this work we bring this idea to NLP, examining how adversarial examples transfer across different domain and task settings. To the best of our knowledge, the work closest to ours is [2], which explores only the similar-domain, same-task setting.

  • [1] M. Naseer, S. H. Khan, H. Khan, F. S. Khan, and F. Porikli, “Cross-domain transferability of adversarial perturbations,” 2019.
  • [2] S. Datta, “Learn2weight: Parameter adaptation against similar-domain adversarial attacks,” 2022.

The code is available on GitHub: Cross-Domain-Attacks-NLP.
