Luís Pina

Reproducibility in Computational Linguistics: Is Source Code Enough?


Mohammad Arvan and Luís Pina and Natalie Parde
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics
December, 2022


The availability of source code has been put forward as one of the most critical factors for improving the reproducibility of scientific research. This work studies trends in source code availability at major computational linguistics conferences, namely ACL, EMNLP, LREC, NAACL, and COLING. We observe positive trends, especially at conferences that actively promote reproducibility. We follow this with a reproducibility study of eight papers published at EMNLP 2021 and find that their source code releases leave much to be desired. Moving forward, we suggest that all conferences require self-contained artifacts and provide a venue to evaluate such artifacts at the time of publication. To improve the reproducibility of their work, authors can also include small-scale experiments and explicit scripts that generate each result.


  title     = {Reproducibility in Computational Linguistics: Is Source Code Enough?},
  author    = {Arvan, Mohammad and Pina, Lu\'{\i}s and Parde, Natalie},
  booktitle = {Proceedings of the {Association for Computational Linguistics} 2022 Empirical Methods in Natural Language Processing},
  year      = {2022},
  month     = DEC,
  series    = {EMNLP '22},
  location  = {Abu Dhabi, United Arab Emirates},
  publisher = {Association for Computational Linguistics},