Efficient Features for Function Matching in Multi-Architecture Binary Executables

Title
Efficient Features for Function Matching in Multi-Architecture Binary Executables
Author
오희국
Keywords
Binary diffing; efficient features; function matching; multi-architecture; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Issue Date
2021-07
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation
IEEE ACCESS, v. 9, Page. 104950-104968
Abstract
The binary-to-binary function matching problem underpins many reverse engineering techniques, such as binary diffing, malware analysis, and code plagiarism detection. In the literature, function matching is performed by first extracting function features (syntactic and semantic) and then using these features as selection criteria to formulate an approximate 1:1 correspondence between binary functions. The accuracy of the approximation depends on the selection of efficient features. Although substantial research has been conducted on this topic, we have identified two major drawbacks in previous research: (i) the features are optimized for only a single architecture, and their matching efficiency drops for other architectures; (ii) function matching algorithms mainly focus on the structural properties of a function, which are not inherently resilient against compiler optimizations. To address architecture dependency and compiler optimizations, we use the intermediate representation (IR) of function assembly and propose a set of syntactic and semantic (embedding-based) features that are efficient across multiple architectures and sensitive to compiler-based optimizations. The proposed function matching algorithm employs one-shot encoding that is flexible to small changes and uses a KNN-based approach to effectively map similar functions. We evaluated the proposed features and algorithm using various binaries compiled for the x86 and ARM architectures, and compared the prototype implementation with Diaphora (an industry-standard tool) and other baseline research. Our prototype achieved a matching accuracy of approximately 96%, which is higher than that of the compared tools and remains consistent across optimizations and multi-architecture binaries.
URI
https://doaj.org/article/0a39ed9a30004392bf7ee183c727ba24
https://repository.hanyang.ac.kr/handle/20.500.11754/172495
ISSN
2169-3536
DOI
10.1109/ACCESS.2021.3099429
Appears in Collections:
ETC[S] > Research Information
Files in This Item:
There are no files associated with this item.