Repository at Hanyang University: Automatic Generation of Information Extraction Rules Through User-Interface Agents

Full metadata record

DC Field	Value	Language
dc.contributor.author	최중민	-
dc.date.accessioned	2020-03-23T06:32:01Z	-
dc.date.available	2020-03-23T06:32:01Z	-
dc.date.issued	2004-04	-
dc.identifier.citation	정보과학회논문지 : 소프트웨어 및 응용, v. 31, No. 4, Page. 447-456	en_US
dc.identifier.issn	1229-6848	-
dc.identifier.uri	http://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE00617631&language=ko_KR	-
dc.identifier.uri	https://repository.hanyang.ac.kr/handle/20.500.11754/139365	-
dc.description.abstract	Information extraction is the task of recognizing and extracting the specific components of a document that convey its central meaning. To extract information uniformly from many heterogeneous information sources, a set of information extraction rules, called a wrapper, must be produced for each source. Existing approaches to wrapper generation are either manual or automatic. In the manual approach, a human expert analyzes the documents and writes the rules, so the resulting wrapper is accurate, but the approach suffers in scalability and efficiency. In the automatic approach, an agent program analyzes a set of example documents and learns a wrapper. This approach scales well, but generating correct rules is difficult in itself, and the generated rules are sometimes unreliable. This paper combines the two approaches to provide both accuracy and scalability, proposing a rule-generation method based on supervised learning. A user-interface agent obtains hints from the user about what to extract from a document, and from these inputs the agent learns information extraction rules expressed in XML. The interface agent is used not only to generate new rules but also to modify and extend existing ones, which improves the precision and recall of the extraction system. In tests on information sources for job advertisements and calls for papers, the method extracted information that other techniques could not, with precision and recall of at least 80%. We expect the system to be applicable to practical applications such as information-mediator agents.	en_US
dc.language.iso	ko_KR	en_US
dc.publisher	한국정보과학회	en_US
dc.subject	인터페이스 에이전트	en_US
dc.subject	정보추출	en_US
dc.subject	기계학습	en_US
dc.subject	interface agent	en_US
dc.subject	information extraction	en_US
dc.subject	machine learning	en_US
dc.title	사용자 인터페이스 에이전트를 통한 정보추출 규칙의 자동 생성	en_US
dc.title.alternative	Automatic Generation of Information Extraction Rules Through User-Interface Agents	en_US
dc.type	Article	en_US
dc.relation.journal	정보과학회논문지	-
dc.contributor.googleauthor	김용기	-
dc.contributor.googleauthor	양재영	-
dc.contributor.googleauthor	최중민	-
dc.relation.code	2012210811	-
dc.sector.campus	E	-
dc.sector.daehak	COLLEGE OF ENGINEERING SCIENCES[E]	-
dc.sector.department	DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING	-
dc.identifier.pid	jmchoi	-

Appears in Collections: COLLEGE OF ENGINEERING SCIENCES[E](공학대학) > COMPUTER SCIENCE AND ENGINEERING(컴퓨터공학과) > Articles

