Efficient Features for Function Matching in Multi-Architecture Binary Executables
- Title
- Efficient Features for Function Matching in Multi-Architecture Binary Executables
- Author
- 오희국
- Keywords
- Binary diffing; efficient features; function matching; multi-architecture; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
- Issue Date
- 2021-07
- Publisher
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- Citation
- IEEE ACCESS, v. 9, Page. 104950-104968
- Abstract
- The binary-binary function matching problem underpins many reverse engineering
techniques, such as binary diffing, malware analysis, and code plagiarism detection. In the literature, function
matching is performed by first extracting function features (syntactic and semantic) and then using these features
as selection criteria to formulate an approximate 1:1 correspondence between binary functions.
The accuracy of the approximation depends on the selection of efficient features. Although substantial
research has been conducted on this topic, we have identified two major drawbacks in previous research:
(i) the features are optimized only for a single architecture, and their matching efficiency drops for other
architectures; (ii) function matching algorithms mainly focus on the structural properties of a function,
which are not inherently resilient against compiler optimizations. To address architecture dependency
and compiler optimizations, we use the intermediate representation (IR) of function assembly
and propose a set of syntactic and semantic (embedding-based) features which are efficient across
multiple architectures and sensitive to compiler-based optimizations. The proposed function matching algorithm
employs one-shot encoding that is flexible to small changes and uses a KNN-based approach to effectively
map similar functions. We evaluated the proposed features and algorithms on various binaries
compiled for x86 and ARM architectures, and compared the prototype implementation with Diaphora
(an industry-standard tool) and other baseline research. The proposed prototype achieved a matching
accuracy of approximately 96%, which is higher than that of the compared tools and remains consistent across
optimizations and multi-architecture binaries.
- URI
- https://doaj.org/article/0a39ed9a30004392bf7ee183c727ba24
- https://repository.hanyang.ac.kr/handle/20.500.11754/172495
- ISSN
- 2169-3536
- DOI
- 10.1109/ACCESS.2021.3099429
- Appears in Collections:
- ETC[S] > Research Information
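
The matching step summarised in the abstract (one feature vector per function, then a KNN-based mapping that yields an approximate 1:1 correspondence) can be illustrated with a minimal sketch. This is not the paper's implementation: the feature vectors, the function names, the cosine similarity measure, the greedy assignment, and the `k` and `min_sim` parameters are all assumptions chosen for illustration only.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_candidates(query: Vector, targets: Dict[str, Vector], k: int) -> List[Tuple[float, str]]:
    """Return the k target functions most similar to the query vector."""
    scored = [(cosine(query, vec), name) for name, vec in targets.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


def match_functions(source: Dict[str, Vector], target: Dict[str, Vector],
                    k: int = 2, min_sim: float = 0.8) -> Dict[str, str]:
    """Greedy approximate 1:1 matching over each function's k nearest neighbours."""
    pairs = []
    for s_name, s_vec in source.items():
        for sim, t_name in knn_candidates(s_vec, target, k):
            if sim >= min_sim:  # hypothetical cut-off, not taken from the paper
                pairs.append((sim, s_name, t_name))
    # Accept the most similar pairs first; each function is used at most once.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    matched: Dict[str, str] = {}
    used_targets = set()
    for sim, s_name, t_name in pairs:
        if s_name in matched or t_name in used_targets:
            continue
        matched[s_name] = t_name
        used_targets.add(t_name)
    return matched


# Hypothetical feature vectors (for example, counts of IR statement kinds).
x86_funcs = {
    "parse_header": [4.0, 1.0, 0.0, 2.0],
    "checksum": [0.0, 3.0, 5.0, 1.0],
    "log_error": [1.0, 0.0, 0.0, 6.0],
}
arm_funcs = {
    "sub_8120": [0.0, 3.0, 4.0, 1.0],
    "sub_8044": [5.0, 1.0, 0.0, 2.0],
    "sub_8300": [1.0, 0.0, 1.0, 6.0],
}

matches = match_functions(x86_funcs, arm_funcs)
print(matches)
```

Sorting candidate pairs by similarity before assigning them keeps the result one-to-one; functions whose best candidate falls below the cut-off are simply left unmatched, which is why the correspondence is only approximate.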