
DOI: 10.11118/978-80-7509-990-7-0195
SURVEY OF LARGE LANGUAGE MODELS ON THE TEXT GENERATION TASK
- Michaela Veselá1, Oldřich Trenz1
- 1 Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic
This paper compares the GPT, GPT-2, XLNet and T5 models on the text generation task. Autoencoder models are not included in the comparison because they are unsuitable for text generation. The models were compared using the BERTScore metric, which calculates precision, recall and F1 values for each sentence; the median of these values was used to obtain the final results. The models were tested on a preprocessed empathetic dialogues dataset, which is presented in this paper and compared with other datasets containing dialogues in English. The tested models were only pre-trained and were not fine-tuned on the test dataset. The transformers library from Hugging Face and the Python language were used to test the models. The research showed that, on the empathetic dialogues dataset, the T5 model achieved the highest precision, while GPT-2 achieved the highest recall and F1.
Keywords: natural language processing, auto-regressive transformers, large-scale model, natural language generation, decoder transformer, auto-encoding transformers, sequence to sequence model
Pages: 195-200, online: 2024
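As an illustration of the evaluation procedure summarised in the abstract (generation with a pre-trained model from the Hugging Face transformers library, per-sentence BERTScore, median aggregation), the following is a minimal sketch. It assumes the bert-score package is installed; the prompts, references and GPT-2 generation settings are illustrative placeholders rather than the paper's exact data or configuration.

```python
# Minimal sketch, assuming the bert-score package and Hugging Face transformers
# are installed. The prompts/references below are illustrative placeholders,
# not the preprocessed empathetic dialogues data used in the paper.
from statistics import median

from transformers import pipeline
from bert_score import score

prompts = [
    "I just lost my job and I don't know what to do.",
    "My best friend is moving to another country.",
]
references = [
    "I'm so sorry to hear that. Do you have any plans for what comes next?",
    "That must be hard. Will you be able to visit each other?",
]

# Generate continuations with a pre-trained (not fine-tuned) GPT-2.
generator = pipeline("text-generation", model="gpt2")
candidates = []
for prompt in prompts:
    generated = generator(prompt, max_new_tokens=30)[0]["generated_text"]
    # The pipeline output includes the prompt; keep only the continuation.
    candidates.append(generated[len(prompt):].strip())

# BERTScore returns per-sentence precision, recall and F1 tensors.
P, R, F1 = score(candidates, references, lang="en", verbose=False)

print("median precision:", median(P.tolist()))
print("median recall:   ", median(R.tolist()))
print("median F1:       ", median(F1.tolist()))
```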
References
- BROWN, T. B., MANN, B., RYDER, N., SUBBIAH, M., KAPLAN, J., DHARIWAL, P., NEELAKANTAN, A., SHYAM, P., SASTRY, G., ASKELL, A., AGARWAL, S., HERBERT-VOSS, A., KRUEGER, G., HENIGHAN, T., CHILD, R., RAMESH, A., ZIEGLER, D. M., WU, J., WINTER, C. … and AMODEI, D. 2020. Language Models are Few-Shot Learners (Version 4). arXiv. https://doi.org/10.48550/ARXIV.2005.14165
- CLARK, K., LUONG, M.-T., LE, Q. V. and MANNING, C. D. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2003.10555
- DEVLIN, J., CHANG, M.-W., LEE, K. and TOUTANOVA, K. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1810.04805
- JANGIR, S. 2021. Finetuning BERT and XLNet for Sentiment Analysis of Stock Market Tweets using Mixout and Dropout Regularization. Technological University Dublin. https://doi.org/10.21427/K0YS-5B82
- KHALIQ, Z., FAROOQ, S. U. and KHAN, D. A. 2022. A deep learning-based automated framework for functional User Interface testing. Information and Software Technology, 150, 106969. Elsevier BV. https://doi.org/10.1016/j.infsof.2022.106969
- KIM, H., HESSEL, J., JIANG, L., WEST, P., LU, X., YU, Y., ZHOU, P., BRAS, R. L., ALIKHANI, M., KIM, G., SAP, M. and CHOI, Y. 2022. SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2212.10465
- KIM, H., YU, Y., JIANG, L., LU, X., KHASHABI, D., KIM, G., CHOI, Y. and SAP, M. 2022. ProsocialDialog: A Prosocial Backbone for Conversational Agents (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2205.12688
- LAN, Z., CHEN, M., GOODMAN, S., GIMPEL, K., SHARMA, P. and SORICUT, R. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (Version 6). arXiv. https://doi.org/10.48550/ARXIV.1909.11942
- LEE, Y.-J., KO, B., KIM, H.-G. and CHOI, H.-J. 2022. DialogCC: Large-Scale Multi-Modal Dialogue Dataset (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2212.04119
- LI, Q., LI, P., REN, Z., REN, P. and CHEN, Z. 2020. Knowledge Bridging for Empathetic Dialogue Generation (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2009.09708
- LI, Y., SU, H., SHEN, X., LI, W., CAO, Z. and NIU, S. 2017. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1710.03957
- LIU, Y., OTT, M., GOYAL, N., DU, J., JOSHI, M., CHEN, D., LEVY, O., LEWIS, M., ZETTLEMOYER, L. and STOYANOV, V. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1907.11692
- LOWE, R., POW, N., SERBAN, I. and PINEAU, J. 2015. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems (Version 3). arXiv. https://doi.org/10.48550/ARXIV.1506.08909
- NGUYEN-MAU, T., LE, A.-C., PHAM, D.-H. and HUYNH, V.-N. 2024. An information fusion based approach to context-based fine-tuning of GPT models. Information Fusion, 104, 102202. Elsevier BV. https://doi.org/10.1016/j.inffus.2023.102202
- OPENAI, ACHIAM, J., ADLER, S., AGARWAL, S., AHMAD, L., AKKAYA, I., ALEMAN, F. L., ALMEIDA, D., ALTENSCHMIDT, J., ALTMAN, S., ANADKAT, S., AVILA, R., BABUSCHKIN, I., BALAJI, S., BALCOM, V., BALTESCU, P., BAO, H., BAVARIAN, M. … ZOPH, B. 2023. GPT-4 Technical Report (Version 4). arXiv. https://doi.org/10.48550/ARXIV.2303.08774
- PAPINENI, K., ROUKOS, S., WARD, T. and ZHU, W.-J. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311-318.
- RADFORD, A., NARASIMHAN, K., SALIMANS, T. and SUTSKEVER, I. 2018. Improving language understanding by generative pre-training. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- RADFORD, A., WU, J., CHILD, R., LUAN, D., AMODEI, D. and SUTSKEVER, I. 2019. Language Models are Unsupervised Multitask Learners. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- RAFFEL, C., SHAZEER, N., ROBERTS, A., LEE, K., NARANG, S., MATENA, M., ZHOU, Y., LI, W. and LIU, P. J. 2019. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Version 4). arXiv. https://doi.org/10.48550/ARXIV.1910.10683
- RAHALI, A. and AKHLOUFI, M. A. 2023. End-to-End Transformer-Based Models in Textual-Based NLP. AI, 4(1), 54-110. MDPI AG. https://doi.org/10.3390/ai4010004
- SAI, A. B., MOHANKUMAR, A. K. and KHAPRA, M. M. 2020. A Survey of Evaluation Metrics Used for NLG Systems (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2008.12009
- SIVARAJKUMAR, S. and WANG, Y. 2022. HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2203.05061
- STASTNY, J. and SKORPIL, V. 2007. Analysis of Algorithms for Radial Basis Function Neural Network. In: Personal Wireless Communications. Springer New York, vol. 245, pp. 54-62, ISSN 1571-5736, ISBN 978-0-387-74158-1, WOS:000250717300005.
- STASTNY, J., SKORPIL, V., BALOGH, Z. and KLEIN, R. 2021. Job shop scheduling problem optimization by means of graph-based algorithm. Applied Sciences, 11(4), 1921. ISSN 2076-3417. https://doi.org/10.3390/app11041921
- THOPPILAN, R., DE FREITAS, D., HALL, J., SHAZEER, N., KULSHRESHTHA, A., CHENG, H.-T., JIN, A., BOS, T., BAKER, L., DU, Y., LI, Y., LEE, H., ZHENG, H. S., GHAFOURI, A., MENEGALI, M., HUANG, Y., KRIKUN, M., LEPIKHIN, D., QIN, J. … and LE, Q. 2022. LaMDA: Language Models for Dialog Applications (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2201.08239
- TUNSTALL, L., VON WERRA, L., WOLF, T. and GÉRON, A. 2022. Natural Language Processing with Transformers: Building Language Applications with Hugging Face. Revised edition. Beijing: O'Reilly. ISBN 978-1-098-13679-6
- YANG, Z., DAI, Z., YANG, Y., CARBONELL, J., SALAKHUTDINOV, R. and LE, Q. V. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1906.08237
- ZHANG, T., KISHORE, V., WU, F., WEINBERGER, K. Q. and ARTZI, Y. 2019. BERTScore: Evaluating Text Generation with BERT (Version 3). arXiv. https://doi.org/10.48550/ARXIV.1904.09675