Repository at Hanyang University: Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs

Full metadata record

DC Field	Value	Language
dc.contributor.author	Yun, Sukmin (윤석민)	-
dc.date.accessioned	2024-06-07T00:25:49Z	-
dc.date.available	2024-06-07T00:25:49Z	-
dc.date.issued	2024-01-16	-
dc.identifier.citation	Conference paper at ICLR 2024, pp. 1-19	en_US
dc.identifier.uri	https://arxiv.org/abs/2404.10308	en_US
dc.identifier.uri	https://repository.hanyang.ac.kr/handle/20.500.11754/190522	-
dc.description.abstract	Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications in positional encoding to relax the constraint, but they often require expensive training or do not address the computational demands of self-attention. In this paper, we present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the limitations. HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. Each chunk is then processed collectively, employing a hierarchical strategy that merges adjacent chunks at progressive transformer layers. A token reduction technique precedes each merging, ensuring memory usage efficiency. We also propose an optimized computational order reducing the memory requirement to logarithmically scale with respect to input length, making it especially favorable for environments with tight memory restrictions. Our experiments demonstrate the proposed method's superior performance and memory efficiency, enabling the broader use of LLMs in contexts requiring extended context.	en_US
dc.description.sponsorship	This work was supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST); No. 2021-0-02068, Artificial Intelligence Innovation Hub; No. 2022-0-00959, Few-shot Learning of Causal Inference in Vision and Language for Decision Making).	en_US
dc.language	en_US	en_US
dc.publisher	International Conference on Learning Representations (ICLR)	en_US
dc.relation.ispartofseries	;1-19	-
dc.subject	Machine Learning (cs.LG)	en_US
dc.subject	Artificial Intelligence (cs.AI)	en_US
dc.title	Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs	en_US
dc.type	Working Paper	en_US
dc.identifier.doi	10.48550/arXiv.2404.10308	en_US
dc.relation.page	1-19	-
dc.contributor.googleauthor	Song, Woomin	-
dc.contributor.googleauthor	Oh, Seunghyuk	-
dc.contributor.googleauthor	Mo, Sangwoo	-
dc.contributor.googleauthor	Kim, Jaehyung	-
dc.contributor.googleauthor	Yun, Sukmin	-
dc.contributor.googleauthor	Ha, Jung-Woo	-
dc.contributor.googleauthor	Shin, Jinwoo	-
dc.sector.campus	E	-
dc.sector.daehak	COLLEGE OF COMPUTING[E]	-
dc.sector.department	DEPARTMENT OF ARTIFICIAL INTELLIGENCE	-
dc.identifier.pid	sukminyun	-
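
The abstract describes HOMER only at a high level: split the input into chunks, reduce tokens before each merge, merge adjacent chunks level by level, and order the computation so memory grows logarithmically with input length. The toy Python sketch below illustrates those ideas on plain lists of tokens. It is not the authors' implementation. The names reduce_tokens and hierarchical_merge, the per-token importance score, and the depth-first (stack-based) merge order are all assumptions made for illustration, and the transformer layers in which the real method performs each merge are omitted entirely.

```python
from typing import List, Tuple

# A token is (position in the original input, importance score).
# The score is a hypothetical stand-in for whatever pruning criterion HOMER actually uses.
Token = Tuple[int, float]
Chunk = List[Token]


def reduce_tokens(chunk: Chunk, keep: int) -> Chunk:
    """Keep the `keep` highest-scoring tokens, restoring their original order."""
    top = sorted(chunk, key=lambda t: t[1], reverse=True)[:keep]
    return sorted(top, key=lambda t: t[0])


def hierarchical_merge(chunks: List[Chunk]) -> Tuple[Chunk, int]:
    """Merge adjacent chunks bottom-up in depth-first order (hypothetical helper).

    Returns the final merged chunk and the peak number of chunks held at once.
    Assumes a power-of-two number of equal-length chunks with an even length.
    """
    n = len(chunks)
    assert n > 0 and (n & (n - 1)) == 0, "toy sketch assumes a power-of-two chunk count"
    chunk_len = len(chunks[0])
    assert chunk_len % 2 == 0 and all(len(c) == chunk_len for c in chunks)

    stack: List[Tuple[int, Chunk]] = []  # (tree level, chunk waiting for its right sibling)
    peak = 0
    for chunk in chunks:
        level, current = 0, chunk
        # A finished left sibling at the same level is waiting: reduce both, then concatenate.
        while stack and stack[-1][0] == level:
            _, left = stack.pop()
            half = chunk_len // 2
            current = reduce_tokens(left, half) + reduce_tokens(current, half)
            level += 1  # one level up the merge tree (a later layer in the real method)
        stack.append((level, current))
        peak = max(peak, len(stack))
    assert len(stack) == 1
    return stack[0][1], peak


# Usage: 8 chunks of 4 tokens covering positions 0..31, with a made-up score per token.
tokens = [(pos, float((pos * 7) % 11)) for pos in range(32)]
chunks = [tokens[i:i + 4] for i in range(0, 32, 4)]
merged, peak = hierarchical_merge(chunks)
print(len(merged), peak)  # 4 3
```

Because each merged chunk is reduced back to the original chunk length, and the stack never holds more than about log2(n) unmerged chunks (3 for the 8 chunks above), peak storage in this toy grows logarithmically with the number of chunks, mirroring the memory claim in the abstract.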

Appears in Collections: ETC[S] > 연구정보 (Research Information)


The Hanyang University Repository was built through the National Library of Korea's OAK (Open Access Korea) dissemination project.