Sensitive Data Identification in Structured Data through GenNER Model based on Text Generation and NER

Title
Sensitive Data Identification in Structured Data through GenNER Model based on Text Generation and NER
Author
이동호
Issue Date
2020-04
Publisher
ACM, New York, United States
Citation
Computing, Networks and Internet of Things, pp. 36-40
Abstract
Many organizations, from companies to governments, share large numbers of documents on on-premise storage or in the cloud. Some of those documents may contain sensitive information such as names, social security numbers, and addresses. In particular, a large amount of sensitive information written in Korean has been leaked recently. Such leaks can cause severe problems not only for individuals but also for organizations. Data loss prevention (DLP) is therefore needed to protect information. DLP systems based on pattern matching were popular in the past, but they have difficulty handling new types of sensitive data as they appear. To address this problem, sensitive data identification with named entity recognition (NER) has been proposed as a useful method for DLP systems. NER classifies the words in a document into categories such as name and location, and these categories are treated as sensitive information. This approach performs well at identifying sensitive information in unstructured data (e.g., sentences), which carries contextual information, but it is weak at identifying sensitive information in structured data (e.g., personal names in table cells). In practice, a large amount of sensitive information is organized in structured data, and the form of structured data varies from document to document. Furthermore, NER also has difficulty identifying data written in Korean because of the characteristics of the language. We propose a primary preventive measure for DLP that identifies sensitive data in the tables of Korean documents by combining text generation and NER models, regardless of the form of the tables, and masks it so that documents can be shared without disclosing sensitive information.
URI
https://dl.acm.org/doi/abs/10.1145/3398329.3398335
https://repository.hanyang.ac.kr/handle/20.500.11754/164484
DOI
10.1145/3398329.3398335
Appears in Collections:
ETC[S] > Research Information
Files in This Item:
There are no files associated with this item.