CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation


Named entity recognition (NER) suffers from the scarcity of annotated training data, especially for low-resource languages without labeled data. Cross-lingual NER has been proposed to alleviate this issue by transferring knowledge from high-resource languages to low-resource languages via aligned cross-lingual representations or machine translation results. However, the performance of cross-lingual NER methods is severely affected by the unsatisfactory quality of translation or label projection. To address these problems, we propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER with the help of a multilingual labeled sequence translation model. Specifically, the target sequence is first translated into the source language and then tagged by a source NER model. We further adopt a labeled sequence translation model to project the tagged sequence back to the target language and label the raw target sentence. Finally, the whole pipeline is integrated into an end-to-end model via self-training. Experimental results on two benchmarks demonstrate that our method outperforms the previous strong baseline by a large margin of +3 to +7 F1 points and achieves state-of-the-art performance.
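The labeling pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two translation models and the source NER model are passed in as plain callables (toy dictionary stubs in the usage example), labels are carried through translation by wrapping entities in hypothetical `<TYPE> ... </TYPE>` marker tokens, and translated entities are mapped onto the raw target sentence by exact token matching. All function names (`insert_markers`, `extract_entities`, `label_raw`, `pseudo_label`) are illustrative.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (start, end_exclusive, entity_type) over a token list; spans assumed non-overlapping.
Span = Tuple[int, int, str]


def insert_markers(tokens: List[str], spans: List[Span]) -> List[str]:
    """Wrap each entity span in hypothetical <TYPE> ... </TYPE> marker tokens."""
    by_start = {start: (end, etype) for start, end, etype in spans}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in by_start:
            end, etype = by_start[i]
            out += [f"<{etype}>", *tokens[i:end], f"</{etype}>"]
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return out


def extract_entities(marked: List[str]) -> List[Tuple[List[str], str]]:
    """Read (entity tokens, type) pairs back out of a marked token sequence."""
    entities: List[Tuple[List[str], str]] = []
    current: List[str] = []
    etype: Optional[str] = None
    for tok in marked:
        if tok.startswith("</") and tok.endswith(">"):      # closing marker
            if etype is not None and current:
                entities.append((current, etype))
            current, etype = [], None
        elif tok.startswith("<") and tok.endswith(">"):     # opening marker
            current, etype = [], tok[1:-1]
        elif etype is not None:                              # token inside an entity
            current.append(tok)
    return entities


def label_raw(tokens: List[str], entities: List[Tuple[List[str], str]]) -> List[str]:
    """Assign BIO tags to the raw target sentence by exact token matching (a simplification)."""
    tags = ["O"] * len(tokens)
    for ent, etype in entities:
        n = len(ent)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == ent and all(t == "O" for t in tags[i:i + n]):
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{etype}"
                break  # label only the first unlabeled occurrence
    return tags


def pseudo_label(
    target_tokens: List[str],
    to_source: Callable[[List[str]], List[str]],
    source_ner: Callable[[List[str]], List[Span]],
    labeled_translate: Callable[[List[str]], List[str]],
) -> List[str]:
    source_tokens = to_source(target_tokens)          # 1. translate target -> source
    spans = source_ner(source_tokens)                 # 2. tag with a source-language NER model
    marked_source = insert_markers(source_tokens, spans)
    marked_target = labeled_translate(marked_source)  # 3. project labels back to the target language
    return label_raw(target_tokens, extract_entities(marked_target))  # 4. label the raw target sentence


# Toy stand-ins for the real models (hypothetical, word-by-word German <-> English).
DE_EN: Dict[str, str] = {"besuchte": "visited"}
EN_DE: Dict[str, str] = {v: k for k, v in DE_EN.items()}
GAZETTEER: Dict[str, str] = {"Merkel": "PER", "Paris": "LOC"}


def toy_to_source(tokens: List[str]) -> List[str]:
    return [DE_EN.get(t, t) for t in tokens]


def toy_ner(tokens: List[str]) -> List[Span]:
    return [(i, i + 1, GAZETTEER[t]) for i, t in enumerate(tokens) if t in GAZETTEER]


def toy_labeled_translate(marked: List[str]) -> List[str]:
    return [EN_DE.get(t, t) for t in marked]  # markers pass through unchanged


print(pseudo_label(["Merkel", "besuchte", "Paris"], toy_to_source, toy_ner, toy_labeled_translate))
# ['B-PER', 'O', 'B-LOC']
```

In the self-training step the abstract mentions, pseudo-labeled target sentences like this one would then serve as training data for a target-language NER model; that training loop is omitted here.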

In Findings of the Association for Computational Linguistics: EMNLP 2022


  title = "{CROP}: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation",
  author = "Yang, Jian and Huang, Shaohan and Ma, Shuming and Yin, Yuwei and Dong, Li and Zhang, Dongdong and Guo, Hongcheng and Li, Zhoujun and Wei, Furu",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
  publisher = "Association for Computational Linguistics",
  pages = "486--496",
  year = "2022",
  month = "12",
  url = "",
  address = "Abu Dhabi, United Arab Emirates",
Yuwei Yin
Adventurer | Seeker