Over the last several decades, visual tracking of 3D objects has been studied steadily in computer vision. Recently, it has been widely used in applications such as augmented reality and human-computer interaction. However, achieving good performance remains difficult in visual tracking. This dissertation proposes methods for visual tracking of textureless 3D rigid objects in monocular RGB camera views. It addresses two challenges: real-time constraints and tracking robustness. The technical contributions are demonstrated in three cases where these two challenges often conflict and must be traded off against each other.
Firstly, an efficient visual tracking framework is presented for complex scenes containing a variety of 3D objects whose precise and detailed modeling is not trivial. In this framework, the 3D target objects are modeled only partially rather than completely. The target scene is also sparsely reconstructed using its texture information. To estimate camera poses, this prior knowledge is efficiently integrated into model-based tracking. The proposed fusion approach reduces the ambiguity of camera poses, handles partial occlusions, and supports initialization and recovery of camera poses.
Secondly, real-time visual tracking of textureless 3D objects is explored on mobile platforms, where hardware resources are insufficient to fully support real-time performance. In the proposed framework, redundant processing in edge-based tracking is thoroughly analyzed to improve computational efficiency. Distinctive geometric or photometric knowledge of the background is efficiently exploited for reliable tracking. The resulting visual tracking framework is suitable for real-time vision processing on mobile hardware.
Finally, to overcome a critical problem of edge-based tracking in highly cluttered backgrounds, an optimal local searching method is proposed. In the proposed scheme, the search directions for 3D-2D correspondences are constrained by object region knowledge. Correspondence candidates are evaluated on their region levels and region appearance. The proposed method alleviates the numerous false matches caused by heavy background clutter. Edge-based tracking is therefore substantially improved even in highly cluttered backgrounds, while real-time performance is retained.