박희진
2019-04-12T05:34:06Z
2019-04-12T05:34:06Z
2016-12
THEORETICAL COMPUTER SCIENCE, v. 656, NO. 20, Page. 173-179
0304-3975
1879-2294
https://www.sciencedirect.com/science/article/pii/S0304397516300676?via%3Dihub
https://repository.hanyang.ac.kr/handle/20.500.11754/101819
Given two strings X (|X| = m) and Y (|Y| = n) over an alphabet Sigma, the edit distance between X and Y can be computed in 0 (mn/t) time with the help of the Four Russians' lookup table whose block size is t. The Four-Russians' lookup table can be constructed in O ((3|Sigma|)(2t)t(2)) time using O((3|Sigma|(2t)t) space. However, the construction time and space requirement of the lookup table grow very fast as the alphabet size increases and thus it has been used only when |Sigma| is very small. For example, when a string is a protein sequence, |Sigma| = 20 and thus it is almost impossible to use the Four-Russians' lookup table on typical workstations. In this paper, we present an efficient alphabet-independent Four-Russians' lookup table. It requires O (3(2t)(2t)!t) space and can be constructed in O (3(2t)(2t)!t(2)) time. Thus, the Four-Russians' lookup table can be constructed and used irrespective of the alphabet size. The time and space complexity were achieved by compacting the lookup table using a clever encoding of the preprocessed strings. Experimental results show that the space requirement of the lookup table is reduced to about 1/5,172,030 of its original size when |Sigma| = 26 and t = 4. Furthermore, we present efficient multithreaded parallel algorithms for edit distance computation using the Four Russians' lookup table. The parallel algorithm for lookup table construction runs in O(t) time and the parallel algorithm for edit distance computation between X and Y runs in O (m + n) time. Experiments performed on CUDA-supported GPU show that our algorithm runs about 942 times faster than the sequential version of the original Four-Russians' algorithm for 100 pairs of random strings of length approximately 1,000 when |Sigma| = 4 and t = 4. (C) 2016 Elsevier B.V. All rights reserved.
Joong Chae Na was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2014R1A1A1004901). Heejin Park was supported by the research fund of Signal Intelligence Research Center supervised by Defense Acquisition Program Administration and Agency for Defense Development of Korea. Jeong Seop Sim was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (No. 2014R1A2A1A11050337) and by the Genome Program for Fostering New Post-Genome industry of the National Research Foundation (NRF) & funded by the Korean government (MSIP) (NRF-2014M3C9A3064706).
en
ELSEVIER SCIENCE BV
Approximate string matching
Edit distance
Four-Russians' algorithm
Parallelization
A space-efficient alphabet-independent Four-Russians' lookup table and a multithreaded Four-Russians' edit distance algorithm
Article
20
656 Part B
10.1016/j.tcs.2016.04.028
173-179
THEORETICAL COMPUTER SCIENCE
Kim, Youngho
Na, Joong Chae
Park, Heejin
Sim, Jeong Seop
2016002626
S
COLLEGE OF ENGINEERING[S]
DEPARTMENT OF COMPUTER SCIENCE
hjpark