Towards Low-resource Language Generation with Limited Supervision

Kaushal Kumar Maurya and Maunendra Sankar Desarkar


Proceedings of the Big Picture Workshop, Association for Computational Linguistics

Abstract

We present a research narrative aimed at enabling language technology for multiple natural language generation (NLG) tasks in low-resource languages (LRLs). With approximately 7,000 languages spoken globally, many lack the resources required for model training. NLG applications for LRLs present two additional key challenges: (i) the scarcity of training data is more pronounced, and (ii) zero-shot modeling is a viable research direction for scalability, yet generating well-formed text in target LRLs in the zero-shot setting remains challenging. Addressing these concerns, this narrative introduces three promising research explorations that serve as a step toward enabling language technology for many LRLs. These approaches make effective use of transfer learning and limited supervision techniques for modeling. Evaluations were conducted mostly in the zero-shot setting, enabling scalability. This research narrative is part of an ongoing doctoral thesis.
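To make the zero-shot setting concrete, the sketch below shows how a multilingual sequence-to-sequence model, fine-tuned on a high-resource language, can be queried directly in an unseen target LRL. This is a minimal illustration assuming a generic HuggingFace Transformers checkpoint (google/mt5-small) and a summarization-style prompt; the model choice, task, and prompt framing are assumptions for illustration, not the pipeline used in the paper.

# Minimal sketch of zero-shot cross-lingual generation. The checkpoint,
# task framing, and prompt are illustrative assumptions, not the
# authors' actual setup.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/mt5-small"  # hypothetical multilingual seq2seq model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Assume the model was fine-tuned for a generation task (e.g., summarization)
# on a high-resource language only. Zero-shot evaluation feeds it input in a
# low-resource target language that contributed no task supervision.
lrl_input = "summarize: <document text in the target low-resource language>"
inputs = tokenizer(lrl_input, return_tensors="pt")

# Multilingual pretraining is what transfers: the model must generate
# well-formed text in the target language without task-specific LRL data.
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))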

BibTeX

@inproceedings{maurya-desarkar-2023-towards,
  title     = {Towards Low-resource Language Generation with Limited Supervision},
  author    = {Maurya, Kaushal Kumar  and
              Desarkar, Maunendra Sankar},
  editor    = {Elazar, Yanai  and
              Ettinger, Allyson  and
              Kassner, Nora  and
              Ruder, Sebastian  and
              Smith, Noah A.},
  booktitle = {Proceedings of the Big Picture Workshop},
  month     = dec,
  year      = {2023},
  address   = {Singapore},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2023.bigpicture-1.7/},
  doi       = {10.18653/v1/2023.bigpicture-1.7},
  pages     = {80--92},
  abstract  = {We present a research narrative aimed at enabling language technology for multiple natural language generation (NLG) tasks in low-resource languages (LRLs). With approximately 7,000 languages spoken globally, many lack the resources required for model training. NLG applications for LRLs present two additional key challenges: (i) the scarcity of training data is more pronounced, and (ii) zero-shot modeling is a viable research direction for scalability, yet generating well-formed text in target LRLs in the zero-shot setting remains challenging. Addressing these concerns, this narrative introduces three promising research explorations that serve as a step toward enabling language technology for many LRLs. These approaches make effective use of transfer learning and limited supervision techniques for modeling. Evaluations were conducted mostly in the zero-shot setting, enabling scalability. This research narrative is part of an ongoing doctoral thesis.}
}