DOI: 10.11118/978-80-7509-990-7-0195

SURVEY OF LARGE LANGUAGE MODELS ON THE TEXT GENERATION TASK

Michaela Veselá1, Oldřich Trenz1
1 Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic

This paper compares the GPT, GPT-2, XLNet and T5 models on the text generation task. No autoencoder models are included in the comparison because they are unsuitable for text generation. The models were compared using the BERTScore metric, which computes precision, recall and F1 values for each sentence; the median of these per-sentence scores was used to obtain the final results. A preprocessed dataset of empathetic dialogues, which is presented in this paper and compared with other datasets containing dialogues in English, was used to test the models. The tested models were only pre-trained; no fine-tuning on the test dataset was performed. The models were tested in Python with the Transformers library from Hugging Face. The results show that, on the empathetic dialogues dataset, the T5 model achieved the highest precision, while GPT-2 achieved the highest recall and F1.

Keywords: natural language processing, auto-regressive transformers, large-scale model, natural language generation, decoder transformer, auto-encoding transformers, sequence to sequence model

pages: 195-200, online: 2024
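
The evaluation pipeline described in the abstract can be illustrated with a short script. The following is a minimal sketch, not the authors' exact code: it assumes the Hugging Face transformers and bert-score Python packages, uses GPT-2 as a stand-in for the compared models (T5 would use the "text2text-generation" pipeline instead), and replaces the preprocessed empathetic dialogues dataset with two placeholder prompt/reference pairs. The per-sentence BERTScore precision, recall and F1 values are then summarised with the median, as described above.

# Minimal sketch of the evaluation pipeline (illustrative, not the authors' script).
# Assumes: pip install transformers torch bert-score
import statistics

from bert_score import score
from transformers import pipeline, set_seed

set_seed(42)  # make sampling reproducible

# Placeholder dialogue prompts and reference replies (stand-ins for the
# preprocessed empathetic dialogues data used in the paper).
prompts = [
    "I just got a new puppy and I am so excited.",
    "I failed my driving test again today.",
]
references = [
    "That is wonderful, congratulations on the new puppy!",
    "I am sorry to hear that, I am sure you will pass next time.",
]

# Pre-trained GPT-2 with no fine-tuning; GPT, XLNet and T5 are evaluated
# analogously in the paper (T5 via the "text2text-generation" pipeline).
generator = pipeline("text-generation", model="gpt2")

candidates = []
for prompt in prompts:
    output = generator(
        prompt,
        max_new_tokens=30,
        num_return_sequences=1,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    # The pipeline returns the prompt plus its continuation; keep only the reply.
    candidates.append(output[0]["generated_text"][len(prompt):].strip())

# BERTScore yields per-sentence precision, recall and F1 tensors.
P, R, F1 = score(candidates, references, lang="en")

# Aggregate the per-sentence scores with the median, as in the paper.
print("median precision:", statistics.median(P.tolist()))
print("median recall:   ", statistics.median(R.tolist()))
print("median F1:       ", statistics.median(F1.tolist()))

With real data, the same loop would run over the whole test split for each of the compared models, and the reported precision, recall and F1 would be the medians over all generated sentences.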



References

  1. BROWN, T. B., MANN, B., RYDER, N., SUBBIAH, M., KAPLAN, J., DHARIWAL, P., NEELAKANTAN, A., SHYAM, P., SASTRY, G., ASKELL, A., AGARWAL, S., HERBERT-VOSS, A., KRUEGER, G., HENIGHAN, T., CHILD, R., RAMESH, A., ZIEGLER, D. M., WU, J., WINTER, C. … and AMODEI, D. 2020. Language Models are Few-Shot Learners (Version 4). arXiv. https://doi.org/10.48550/ARXIV.2005.14165
  2. CLARK, K., LUONG, M.-T., LE, Q. V. and MANNING, C. D. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2003.10555
  3. DEVLIN, J., CHANG, M.-W., LEE, K. and TOUTANOVA, K. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1810.04805
  4. JANGIR, S. 2021. Finetuning BERT and XLNet for Sentiment Analysis of Stock Market Tweets using Mixout and Dropout Regularization. Technological University Dublin. https://doi.org/10.21427/K0YS-5B82
  5. KHALIQ, Z., FAROOQ, S. U. and KHAN, D. A. 2022. A deep learning-based automated framework for functional User Interface testing. Information and Software Technology, 150, 106969. Elsevier BV. https://doi.org/10.1016/j.infsof.2022.106969
  6. KIM, H., HESSEL, J., JIANG, L., WEST, P., LU, X., YU, Y., ZHOU, P., BRAS, R. L., ALIKHANI, M., KIM, G., SAP, M. and CHOI, Y. 2022. SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2212.10465
  7. KIM, H., YU, Y., JIANG, L., LU, X., KHASHABI, D., KIM, G., CHOI, Y. and SAP, M. 2022. ProsocialDialog: A Prosocial Backbone for Conversational Agents (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2205.12688
  8. LAN, Z., CHEN, M., GOODMAN, S., GIMPEL, K., SHARMA, P. and SORICUT, R. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (Version 6). arXiv. https://doi.org/10.48550/ARXIV.1909.11942
  9. LEE, Y.-J., KO, B., KIM, H.-G. and CHOI, H.-J. 2022. DialogCC: Large-Scale Multi-Modal Dialogue Dataset (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2212.04119
  10. LI, Q., LI, P., REN, Z., REN, P. and CHEN, Z. 2020. Knowledge Bridging for Empathetic Dialogue Generation (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2009.09708
  11. LI, Y., SU, H., SHEN, X., LI, W., CAO, Z. and NIU, S. 2017. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1710.03957
  12. LIU, Y., OTT, M., GOYAL, N., DU, J., JOSHI, M., CHEN, D., LEVY, O., LEWIS, M., ZETTLEMOYER, L. and STOYANOV, V. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1907.11692
  13. LOWE, R., POW, N., SERBAN, I. and PINEAU, J. 2015. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems (Version 3). arXiv. https://doi.org/10.48550/ARXIV.1506.08909
  14. NGUYEN-MAU, T., LE, A.-C., PHAM, D.-H. and HUYNH, V.-N. 2024. An information fusion based approach to context-based fine-tuning of GPT models. Information Fusion, 104, 102202. Elsevier BV. https://doi.org/10.1016/j.inffus.2023.102202
  15. OPENAI, ACHIAM, J., ADLER, S., AGARWAL, S., AHMAD, L., AKKAYA, I., ALEMAN, F. L., ALMEIDA, D., ALTENSCHMIDT, J., ALTMAN, S., ANADKAT, S., AVILA, R., BABUSCHKIN, I., BALAJI, S., BALCOM, V., BALTESCU, P., BAO, H., BAVARIAN, M. … ZOPH, B. 2023. GPT-4 Technical Report (Version 4). arXiv. https://doi.org/10.48550/ARXIV.2303.08774
  16. PAPINENI, K., ROUKOS, S., WARD, T. and ZHU, W.-J. 2002. Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311-318.
  17. RADFORD, A., NARASIMHAN, K., SALIMANS, T. and SUTSKEVER, I. 2018. Improving language understanding by generative pre-training. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
  18. RADFORD, A., WU, J., CHILD, R., LUAN, D., AMODEI, D. and SUTSKEVER, I. 2019. Language Models are Unsupervised Multitask Learners. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
  19. RAFFEL, C., SHAZEER, N., ROBERTS, A., LEE, K., NARANG, S., MATENA, M., ZHOU, Y., LI, W. and LIU, P. J. 2019. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Version 4). arXiv. https://doi.org/10.48550/ARXIV.1910.10683
  20. RAHALI, A. and AKHLOUFI, M. A. 2023. End-to-End Transformer-Based Models in Textual-Based NLP. AI, 4(1), 54-110. MDPI AG. https://doi.org/10.3390/ai4010004
  21. SAI, A. B., MOHANKUMAR, A. K. and KHAPRA, M. M. 2020. A Survey of Evaluation Metrics Used for NLG Systems (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2008.12009
  22. SIVARAJKUMAR, S. and WANG, Y. 2022. HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2203.05061
  23. STASTNY, J. and SKORPIL, V. 2007. Analysis of Algorithms for Radial Basis Function Neural Network. In: Personal Wireless Communications. Springer New York, vol. 245, pp. 54-62, ISSN 1571-5736, ISBN 978-0-387-74158-1, WOS:000250717300005.
  24. STASTNY, J., SKORPIL, V., BALOGH, Z. and KLEIN, R. 2021. Job shop scheduling problem optimization by means of graph-based algorithm. Applied Sciences, 11(4), 1921. ISSN 2076-3417. https://doi.org/10.3390/app11041921
  25. THOPPILAN, R., DE FREITAS, D., HALL, J., SHAZEER, N., KULSHRESHTHA, A., CHENG, H.-T., JIN, A., BOS, T., BAKER, L., DU, Y., LI, Y., LEE, H., ZHENG, H. S., GHAFOURI, A., MENEGALI, M., HUANG, Y., KRIKUN, M., LEPIKHIN, D., QIN, J. … and LE, Q. 2022. LaMDA: Language Models for Dialog Applications (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2201.08239
  26. TUNSTALL, L., VON WERRA, L., WOLF, T. and GÉRON, A. 2022. Natural Language Processing with Transformers: Building Language Applications with Hugging Face. Revised edition. Beijing: O'Reilly. ISBN 978-1-098-13679-6
  27. YANG, Z., DAI, Z., YANG, Y., CARBONELL, J., SALAKHUTDINOV, R. and LE, Q. V. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1906.08237
  28. ZHANG, T., KISHORE, V., WU, F., WEINBERGER, K. Q. and ARTZI, Y. 2019. BERTScore: Evaluating Text Generation with BERT (Version 3). arXiv. https://doi.org/10.48550/ARXIV.1904.09675