Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 정우환 (Jung, Woohwan) | - |
dc.date.accessioned | 2024-01-09T03:32:36Z | - |
dc.date.available | 2024-01-09T03:32:36Z | - |
dc.date.issued | 2023-12-10 | - |
dc.identifier.citation | Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2101-2112 | en_US |
dc.identifier.uri | https://arxiv.org/abs/2310.13312 | en_US |
dc.identifier.uri | https://repository.hanyang.ac.kr/handle/20.500.11754/188097 | - |
dc.description.abstract | Over the past few years, various domain-specific pretrained language models (PLMs) have been proposed and have outperformed general-domain PLMs in specialized areas such as the biomedical, scientific, and clinical domains. In addition, financial PLMs have been studied because of the high economic impact of financial data analysis. However, we found that financial PLMs were not pretrained on sufficiently diverse financial data. This lack of diverse training data leads to subpar generalization performance, so general-purpose PLMs, including BERT, often outperform financial PLMs on many downstream tasks. To address this issue, we collected a broad range of financial corpora and trained the Financial Language Model (FiLM) on these diverse datasets. Our experimental results confirm that FiLM outperforms not only existing financial PLMs but also general-domain PLMs. Furthermore, we provide empirical evidence that this improvement can be achieved even for unseen corpus groups. | en_US |
dc.description.sponsorship | This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (No. RS-2023-00261068, Development of Lightweight Multimodal Anti-Phishing Models and Split-Learning Techniques for Privacy-Preserving Anti-Phishing; No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA); and No. 2018-0-00192, the National Program for Excellence in SW). This work was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2022R1G1A1013549). Finally, we thank the reviewers for their detailed feedback, which helped to improve the quality of this paper. | en_US |
dc.language | en_US | en_US |
dc.publisher | Association for Computational Linguistics | en_US |
dc.relation.ispartofseries | EMNLP 2023;2101-2112 | - |
dc.subject | Computation and Language (cs.CL) | en_US |
dc.title | Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models | en_US |
dc.type | Article | en_US |
dc.identifier.doi | 10.18653/v1/2023.findings-emnlp.138 | en_US |
dc.relation.page | 2101-2112 | - |
dc.contributor.googleauthor | Choe, Jaeyoung | - |
dc.contributor.googleauthor | Noh, Keonwoong | - |
dc.contributor.googleauthor | Kim, Nayeon | - |
dc.contributor.googleauthor | Ahn, Seyun | - |
dc.contributor.googleauthor | Jung, Woohwan | - |
dc.relation.code | 20230059 | - |
dc.sector.campus | E | - |
dc.sector.daehak | COLLEGE OF COMPUTING[E] | - |
dc.sector.department | DEPARTMENT OF ARTIFICIAL INTELLIGENCE | - |
dc.identifier.pid | whjung | - |
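
The abstract above frames FiLM as the result of continuing to pretrain a language model on a deliberately diverse mix of financial corpora. As an illustration only, the following is a minimal sketch of that general style of domain-adaptive masked-language-model pretraining with the Hugging Face `transformers` library; the `bert-base-uncased` base checkpoint, the `financial_corpus.txt` file name, the output directory, and all hyperparameters are assumptions of this sketch, not details taken from the paper.

```python
# Hypothetical sketch of domain-adaptive MLM pretraining on a financial corpus.
# Checkpoint, file paths, and hyperparameters are illustrative assumptions.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Start from a general-domain PLM (the abstract's BERT baseline).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# "financial_corpus.txt" is a placeholder for the mixed financial corpora
# (news, filings, reports, etc.) whose diversity the paper argues matters.
dataset = load_dataset("text", data_files={"train": "financial_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% masking for the masked-language-model objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="film-like-checkpoint",  # hypothetical output name
    per_device_train_batch_size=16,
    num_train_epochs=1,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```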