
Full metadata record

dc.contributor.author: Yun, Sukmin (윤석민)
dc.date.accessioned: 2024-06-07T00:25:49Z
dc.date.available: 2024-06-07T00:25:49Z
dc.date.issued: 2024-01-16
dc.identifier.citation: Conference paper at ICLR 2024, pp. 1-19 [en_US]
dc.identifier.uri: https://arxiv.org/abs/2404.10308 [en_US]
dc.identifier.uri: https://repository.hanyang.ac.kr/handle/20.500.11754/190522
dc.description.abstract: Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications to positional encoding to relax this constraint, but they often require expensive training or do not address the computational demands of self-attention. In this paper, we present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome these limitations. HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. Each chunk is then processed collectively, employing a hierarchical strategy that merges adjacent chunks at progressive transformer layers. A token reduction technique precedes each merging, ensuring memory efficiency. We also propose an optimized computational order that reduces the memory requirement to scale logarithmically with input length, making the method especially favorable for environments with tight memory restrictions. Our experiments demonstrate the proposed method's superior performance and memory efficiency, enabling the broader use of LLMs in scenarios requiring extended context. [en_US] (An illustrative sketch of the merging scheme follows this record.)
dc.description.sponsorship: This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST); No. 2021-0-02068, Artificial Intelligence Innovation Hub; No. 2022-0-00959, Few-shot Learning of Causal Inference in Vision and Language for Decision Making). [en_US]
dc.language: en_US [en_US]
dc.publisher: International Conference on Learning Representations (ICLR) [en_US]
dc.relation.ispartofseries: 1-19
dc.subject: Machine Learning (cs.LG) [en_US]
dc.subject: Artificial Intelligence (cs.AI) [en_US]
dc.title: Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [en_US]
dc.type: Working Paper [en_US]
dc.identifier.doi: 10.48550/arXiv.2404.10308 [en_US]
dc.relation.page: 1-19
dc.contributor.googleauthor: Song, Woomin
dc.contributor.googleauthor: Oh, Seunghyuk
dc.contributor.googleauthor: Mo, Sangwoo
dc.contributor.googleauthor: Kim, Jaehyung
dc.contributor.googleauthor: Yun, Sukmin
dc.contributor.googleauthor: Ha, Jung-Woo
dc.contributor.googleauthor: Shin, Jinwoo
dc.sector.campus: E
dc.sector.daehak: COLLEGE OF COMPUTING[E]
dc.sector.department: DEPARTMENT OF ARTIFICIAL INTELLIGENCE
dc.identifier.pid: sukminyun
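
The abstract describes HOMER's divide-and-conquer procedure: split a long input into chunks, apply token reduction to each chunk, and merge adjacent chunks hierarchically as computation progresses. The following is a minimal sketch of that control flow only, not the authors' implementation: the per-token importance scores, the fixed keep ratio, and the flat (token_id, score) representation are all placeholder assumptions, whereas HOMER performs reduction and merging on hidden states inside the transformer layers.

```python
# Minimal sketch (assumptions labeled) of hierarchical context merging:
# chunks are repeatedly token-reduced and pairwise merged until one remains.

from typing import List, Tuple
import random

Token = Tuple[int, float]  # (token_id, importance_score) -- hypothetical stand-in
                           # for a hidden state plus a learned/derived score


def reduce_tokens(chunk: List[Token], keep_ratio: float = 0.5) -> List[Token]:
    """Token reduction before each merge: keep the highest-scoring tokens,
    preserving their original order. The 0.5 ratio is an arbitrary choice."""
    k = max(1, int(len(chunk) * keep_ratio))
    keep = sorted(range(len(chunk)), key=lambda i: chunk[i][1], reverse=True)[:k]
    return [chunk[i] for i in sorted(keep)]


def hierarchical_merge(chunks: List[List[Token]]) -> List[Token]:
    """Merge adjacent chunks level by level (one level per group of
    transformer layers in the paper). Because every merge is preceded by
    token reduction, the merged context stays a manageable size."""
    while len(chunks) > 1:
        next_level = []
        for i in range(0, len(chunks), 2):
            pair = [reduce_tokens(c) for c in chunks[i:i + 2]]
            next_level.append([t for c in pair for t in c])  # concatenate pair
        chunks = next_level
    return chunks[0]


if __name__ == "__main__":
    random.seed(0)
    long_input = [(i, random.random()) for i in range(4096)]  # toy "tokens"
    chunk_size = 512
    chunks = [long_input[i:i + chunk_size]
              for i in range(0, len(long_input), chunk_size)]
    merged = hierarchical_merge(chunks)
    print(f"{len(long_input)} tokens in, {len(merged)} tokens after merging")
```

On this toy input, 4096 tokens in 8 chunks collapse over three merge levels to a single 512-token context. Note that the loop above merges level by level, keeping every chunk of a level in memory at once; the optimized computational order mentioned in the abstract instead traverses the merge tree depth-first, so only one chunk per level (logarithmically many in the input length) needs to be resident at a time.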
Appears in Collections:
ETC[S] > Research Information (연구정보)
Files in This Item:
There are no files associated with this item.

