
DOI: 10.11118/978-80-7509-990-7-0195
SURVEY OF LARGE LANGUAGE MODELS ON THE TEXT GENERATION TASK
- Michaela Veselá1, Oldřich Trenz1
- 1 Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic
This paper compares the GPT, GPT-2, XLNet and T5 models on the text generation task. No autoencoder models are included in the comparison because they are unsuitable for text generation. The models were compared using the BERTScore metric, which computes precision, recall and F1 values for each sentence; the median of these per-sentence scores was used to obtain the final results. A preprocessed dataset of empathetic dialogues, which is presented in this paper and compared with other English dialogue datasets, was used to test the models. The tested models were only pre-trained; there was no fine-tuning on the dataset used for testing. The Transformers library from Hugging Face and the Python language were used to test the models. The research showed that, on the preprocessed empathetic dialogues dataset, T5 achieved the highest precision, while GPT-2 achieved the highest recall and F1.
Keywords: natural language processing, auto-regressive transformers, large-scale model, natural language generation, decoder transformer, auto-encoding transformers, sequence to sequence model
pages: 195-200, online: 2024
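The evaluation pipeline described in the abstract can be illustrated with a short sketch: generate continuations with a pre-trained Hugging Face model, score them sentence by sentence with BERTScore, and aggregate the per-sentence precision, recall and F1 with the median, as in the paper. This is a minimal illustration only, assuming the bert-score package and GPT-2 as the model; the prompts and reference replies below are made up and do not come from the empathetic dialogues dataset used in the paper.

```python
# Minimal sketch of the evaluation described in the abstract:
# generate continuations with a pre-trained model, score them with
# BERTScore, and take the median of the per-sentence scores.
from statistics import median

from transformers import pipeline          # pip install transformers
from bert_score import score               # pip install bert-score

# Pre-trained model only, no fine-tuning (as in the paper).
# The model choice (gpt2) is illustrative.
generator = pipeline("text-generation", model="gpt2")

# Illustrative prompts and reference replies (not from the paper's dataset).
prompts = ["I just got a new puppy and", "My exam went badly, so"]
references = [
    "I just got a new puppy and I am so excited to play with it.",
    "My exam went badly, so I felt disappointed all day.",
]

# Generate one continuation per prompt.
candidates = [
    generator(p, max_new_tokens=20, num_return_sequences=1)[0]["generated_text"]
    for p in prompts
]

# BERTScore returns per-sentence precision, recall and F1 tensors.
P, R, F1 = score(candidates, references, lang="en")

print("median precision:", median(P.tolist()))
print("median recall:   ", median(R.tolist()))
print("median F1:       ", median(F1.tolist()))
```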
References
- BROWN, T. B., MANN, B., RYDER, N., SUBBIAH, M., KAPLAN, J., DHARIWAL, P., NEELAKANTAN, A., SHYAM, P., SASTRY, G., ASKELL, A., AGARWAL, S., HERBERT-VOSS, A., KRUEGER, G., HENIGHAN, T., CHILD, R., RAMESH, A., ZIEGLER, D. M., WU, J., WINTER, C. … and AMODEI, D. 2020. Language Models are Few-Shot Learners (Version 4). arXiv. https://doi.org/10.48550/ARXIV.2005.14165
- CLARK, K., LUONG, M.-T., LE, Q. V. and MANNING, C. D. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2003.10555
- DEVLIN, J., CHANG, M.-W., LEE, K. and TOUTANOVA, K. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1810.04805
- JANGIR, S. 2021. Finetuning BERT and XLNet for Sentiment Analysis of Stock Market Tweets using Mixout and Dropout Regularization. Technological University Dublin. https://doi.org/10.21427/K0YS-5B82
- KHALIQ, Z., FAROOQ, S. U. and KHAN, D. A. 2022. A deep learning-based automated framework for functional User Interface testing. Information and Software Technology, 150, 106969. Elsevier BV. https://doi.org/10.1016/j.infsof.2022.106969
- KIM, H., HESSEL, J., JIANG, L., WEST, P., LU, X., YU, Y., ZHOU, P., BRAS, R. L., ALIKHANI, M., KIM, G., SAP, M. and CHOI, Y. 2022. SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2212.10465
- KIM, H., YU, Y., JIANG, L., LU, X., KHASHABI, D., KIM, G., CHOI, Y. and SAP, M. 2022. ProsocialDialog: A Prosocial Backbone for Conversational Agents (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2205.12688
- LAN, Z., CHEN, M., GOODMAN, S., GIMPEL, K., SHARMA, P. and SORICUT, R. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (Version 6). arXiv. https://doi.org/10.48550/ARXIV.1909.11942
- LEE, Y.-J., KO, B., KIM, H.-G. and CHOI, H.-J. 2022. DialogCC: Large-Scale Multi-Modal Dialogue Dataset (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2212.04119
- LI, Q., LI, P., REN, Z., REN, P. and CHEN, Z. 2020. Knowledge Bridging for Empathetic Dialogue Generation (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2009.09708
- LI, Y., SU, H., SHEN, X., LI, W., CAO, Z. and NIU, S. 2017. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1710.03957
- LIU, Y., OTT, M., GOYAL, N., DU, J., JOSHI, M., CHEN, D., LEVY, O., LEWIS, M., ZETTLEMOYER, L. and STOYANOV, V. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1907.11692
- LOWE, R., POW, N., SERBAN, I. and PINEAU, J. 2015. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems (Version 3). arXiv. https://doi.org/10.48550/ARXIV.1506.08909
- NGUYEN-MAU, T., LE, A.-C., PHAM, D.-H. and HUYNH, V.-N. 2024. An information fusion based approach to context-based fine-tuning of GPT models. Information Fusion, 104, 102202. Elsevier BV. https://doi.org/10.1016/j.inffus.2023.102202
- OPENAI, ACHIAM, J., ADLER, S., AGARWAL, S., AHMAD, L., AKKAYA, I., ALEMAN, F. L., ALMEIDA, D., ALTENSCHMIDT, J., ALTMAN, S., ANADKAT, S., AVILA, R., BABUSCHKIN, I., BALAJI, S., BALCOM, V., BALTESCU, P., BAO, H., BAVARIAN, M. … ZOPH, B. 2023. GPT-4 Technical Report (Version 4). arXiv. https://doi.org/10.48550/ARXIV.2303.08774
- PAPINENI, K., ROUKOS, S., WARD, T. and ZHU, W.-J. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311-318.
- RADFORD, A., NARASIMHAN, K., SALIMANS, T. and SUTSKEVER, I. 2018. Improving language understanding by generative pre-training. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- RADFORD, A., WU, J., CHILD, R., LUAN, D., AMODEI, D. and SUTSKEVER, I. 2019. Language Models are Unsupervised Multitask Learners. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- RAFFEL, C., SHAZEER, N., ROBERTS, A., LEE, K., NARANG, S., MATENA, M., ZHOU, Y., LI, W. and LIU, P. J. 2019. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Version 4). arXiv. https://doi.org/10.48550/ARXIV.1910.10683
- RAHALI, A. and AKHLOUFI, M. A. 2023. End-to-End Transformer-Based Models in Textual-Based NLP. AI, 4(1), 54-110. MDPI AG. https://doi.org/10.3390/ai4010004
- SAI, A. B., MOHANKUMAR, A. K. and KHAPRA, M. M. 2020. A Survey of Evaluation Metrics Used for NLG Systems (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2008.12009
- SIVARAJKUMAR, S. and WANG, Y. 2022. HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2203.05061
- STASTNY, J. and SKORPIL, V. 2007. Analysis of Algorithms for Radial Basis Function Neural Network. In: Personal Wireless Communications. Springer New York, vol. 245, pp. 54-62, ISSN 1571-5736, ISBN 978-0-387-74158-1, WOS:000250717300005.
- STASTNY, J., SKORPIL, V., BALOGH, Z. and KLEIN, R. 2021. Job shop scheduling problem optimization by means of graph-based algorithm. Applied Sciences, 11(4), 1921. ISSN 2076-3417. https://doi.org/10.3390/app11041921
- THOPPILAN, R., DE FREITAS, D., HALL, J., SHAZEER, N., KULSHRESHTHA, A., CHENG, H.-T., JIN, A., BOS, T., BAKER, L., DU, Y., LI, Y., LEE, H., ZHENG, H. S., GHAFOURI, A., MENEGALI, M., HUANG, Y., KRIKUN, M., LEPIKHIN, D., QIN, J. … and LE, Q. 2022. LaMDA: Language Models for Dialog Applications (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2201.08239
- TUNSTALL, L., WERRA, L. von, WOLF, T. and GÉRON, A. 2022. Natural Language Processing with Transformers: Building Language Applications with Hugging Face. Revised edition. Beijing: O'Reilly. ISBN 978-1-098-13679-6.
- YANG, Z., DAI, Z., YANG, Y., CARBONELL, J., SALAKHUTDINOV, R. and LE, Q. V. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1906.08237
- ZHANG, T., KISHORE, V., WU, F., WEINBERGER, K. Q. and ARTZI, Y. 2019. BERTScore: Evaluating Text Generation with BERT (Version 3). arXiv. https://doi.org/10.48550/ARXIV.1904.09675