doi.mendelu.cz: SURVEY OF LARGE LANGUAGE MODELS ON THE TEXT GENERATION TASK

DOI: 10.11118/978-80-7509-990-7-0195

Michaela Veselá¹, Oldřich Trenz¹: ¹ Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic

This paper compares the GPT, GPT-2, XLNet, and T5 models on text generation tasks. Autoencoding models are excluded from the comparison because they are unsuitable for text generation. The models were compared using the BERTScore metric, which calculates precision, recall, and F1 values for each sentence. The final results were obtained as the median of these per-sentence values. The models were tested on a preprocessed version of the Empathetic Dialogues dataset, which is presented in this paper and compared with other English-language dialogue datasets. The tested models were only pre-trained; they were not fine-tuned on the dataset used for testing. The models were tested in Python using the transformers library from Hugging Face. For the pre-trained models evaluated on the Empathetic Dialogues dataset, T5 achieved the highest precision, while GPT-2 achieved the highest recall and F1.
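The scoring and aggregation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes the greedy-matching formulation of BERTScore (each token is matched to its most similar counterpart by cosine similarity) and replaces real BERT embeddings with hypothetical two-dimensional toy vectors so that it runs without downloading a model. The names `cosine`, `bertscore_like`, `median_scores`, and `toy_pairs` are illustrative assumptions.

```python
import math
from statistics import median


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bertscore_like(candidate, reference):
    """Greedy-matching precision, recall, and F1 over token vectors.

    candidate and reference are lists of token embeddings. In a real
    setup these would come from a BERT-style encoder; here they are
    toy vectors (an assumption made for illustration only).
    """
    # Precision: each candidate token is matched to its best reference token.
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    # Recall: each reference token is matched to its best candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    total = precision + recall
    f1 = 0.0 if total == 0 else 2 * precision * recall / total
    return precision, recall, f1


def median_scores(pairs):
    """Score every (candidate, reference) pair, then take per-metric medians."""
    scores = [bertscore_like(c, r) for c, r in pairs]
    precisions, recalls, f1s = zip(*scores)
    return median(precisions), median(recalls), median(f1s)


# Hypothetical toy data: three sentence pairs with 2-D "embeddings".
toy_pairs = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),  # identical sentences
    ([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]),              # candidate misses a token
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]),              # candidate adds a token
]

p, r, f = median_scores(toy_pairs)
print(f"median precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

The median is taken separately for each metric, so the reported precision, recall, and F1 may come from different sentences. This is one reason the median F1 need not equal the harmonic mean of the median precision and the median recall.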

Keywords: natural language processing, auto-regressive transformers, large-scale model, natural language generation, decoder transformer, auto-encoding transformers, sequence-to-sequence model

Pages: 195-200, online: 2024
