In-Vehicle Gaze Estimation

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Yihua Cheng¹, Yaning Zhu², Zongji Wang³, Hongquan Hao⁴, Yongwei Liu⁴, Shiqing Cheng⁴, Xi Wang⁴, Hyung Jin Chang¹

¹University of Birmingham, ²HUST, ³Chinese Academy of Sciences, ⁴CalmCar

(This work has been accepted to CVPR 2024)

Abstract

In this paper, we present three novel elements to advance vision-based in-vehicle gaze research.

1. Dataset. We introduce IVGaze, a pioneering dataset capturing in-vehicle gaze, compiled from 125 individuals and covering a large range of gaze and head poses within vehicles. Conventional gaze collection systems are inadequate for in-vehicle use. For this dataset, we propose a new vision-based solution for in-vehicle gaze collection and introduce a refined gaze target calibration method to tackle annotation challenges.

2. Gaze Estimation. Our research focuses on in-vehicle gaze estimation leveraging IVGaze. Images of in-vehicle faces often suffer from low resolution, which prompts us to introduce a gaze pyramid transformer that uses a transformer to integrate multilevel features. Expanding upon this, we introduce the dual-stream gaze pyramid transformer (GazeDPTR). We rotate virtual cameras via perspective transformation to normalize images, and use the camera pose to merge normalized and original images for accurate gaze estimation. GazeDPTR shows SOTA performance on the IVGaze dataset.

3. Extensive Application. We explore a novel strategy for gaze zone classification by extending GazeDPTR. We define a foundational tri-plane and project gaze onto these planes. Leveraging both positional features from the projection points and visual features from images, we achieve higher accuracy than when relying solely on visual features, demonstrating the advantage of gaze estimation.
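Projecting gaze onto a plane can be pictured as a ray-plane intersection. The sketch below is only an illustration of that geometric step, not the authors' implementation: the function name `project_gaze_onto_plane`, the point-plus-normal plane parameterization, and the example coordinates are all assumptions made for this example.

```python
def _dot(a, b):
    """Dot product of two 3D vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def project_gaze_onto_plane(origin, direction, plane_point, plane_normal):
    """Intersect a gaze ray with a plane (hypothetical helper).

    origin, direction: start point and direction of the gaze ray.
    plane_point, plane_normal: any point on the plane and its normal.
    Returns the 3D intersection point, or None if the ray is parallel
    to the plane or the plane lies behind the ray origin.
    """
    denom = _dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None  # ray runs parallel to the plane
    offset = [p - o for p, o in zip(plane_point, origin)]
    t = _dot(offset, plane_normal) / denom
    if t < 0:
        return None  # intersection would be behind the viewer
    return [o + t * d for o, d in zip(origin, direction)]


# Example with made-up coordinates: a plane 2 units in front of the eye.
hit = project_gaze_onto_plane(
    origin=(0.0, 0.0, 0.0),
    direction=(0.0, 0.0, 1.0),
    plane_point=(0.0, 0.0, 2.0),
    plane_normal=(0.0, 0.0, 1.0),
)
print(hit)
```

With three planes, the same call would be repeated once per plane, and the resulting points could serve as positional features alongside the visual features.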

IVGaze Dataset

Data Collection System

We introduce an in-vehicle vision-based gaze collection system. This system utilizes an infrared camera to capture human faces and is calibrated with a depth camera. Gaze targets are represented by red points on stickers affixed within the vehicle.

Data Statistics

We collect 44,795 images from 125 subjects. Horizontal gaze ranges from -50° to 90°, and vertical gaze ranges from -40° to 40°.

Dual-Stream Gaze Pyramid Transformer

Network Architecture

We propose a gaze pyramid transformer (GazePTR) that utilizes a transformer to integrate multilevel features. Expanding upon this, we propose a dual-stream gaze pyramid transformer (GazeDPTR). We rotate virtual cameras via perspective transformation to normalize images, and leverage camera pose to merge normalized and original images. Finally, we extend GazeDPTR to the downstream gaze zone classification task with a foundational tri-plane.
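Rotating a virtual camera can be expressed as a perspective transformation because a pure rotation R of a pinhole camera with intrinsics K maps pixels through the homography K R K⁻¹. The sketch below applies that mapping to a single pixel. It is a minimal illustration under stated assumptions, not the normalization code used for GazeDPTR: the helper names `rotation_y` and `warp_pixel`, the intrinsics values, and the choice of a yaw-only rotation are all hypothetical.

```python
import math


def rotation_y(theta):
    """3x3 rotation about the camera's y axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def warp_pixel(u, v, fx, fy, cx, cy, R):
    """Map pixel (u, v) into the rotated virtual camera (hypothetical helper).

    Equivalent to applying the homography K R K^-1, where R takes
    coordinates in the original camera frame to the virtual camera frame.
    """
    # Back-project the pixel to a viewing ray: K^-1 [u, v, 1]^T.
    ray = [(u - cx) / fx, (v - cy) / fy, 1.0]
    # Rotate the ray into the virtual camera frame.
    rotated = [sum(R[i][j] * ray[j] for j in range(3)) for i in range(3)]
    if rotated[2] <= 0:
        raise ValueError("point falls behind the virtual camera")
    # Re-project with the same intrinsics.
    return (fx * rotated[0] / rotated[2] + cx,
            fy * rotated[1] / rotated[2] + cy)


# Example with made-up intrinsics: the identity rotation leaves pixels in place.
print(warp_pixel(320.0, 240.0, 500.0, 500.0, 320.0, 240.0, rotation_y(0.0)))
```

Warping a whole image amounts to applying the same mapping (or its inverse, for backward sampling) at every pixel, which image libraries usually provide as a perspective warp.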

Quantitative Comparison

We evaluate our method on the IVGaze dataset. Our work builds on GazeTR. We define a new metric, Average Precision (AP): for AP at <k°, an estimate is considered correct if its angular error is below k°.
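The metric can be sketched in a few lines. This is an illustration rather than the official evaluation script: it assumes gaze directions are given as 3D vectors and that AP at <k° is reported as the fraction of estimates counted as correct (the text above defines only when an estimate is correct). The function names `angular_error_deg` and `ap_below` are hypothetical.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = (math.sqrt(sum(p * p for p in pred))
            * math.sqrt(sum(t * t for t in target)))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def ap_below(errors_deg, k):
    """Fraction of estimates whose angular error is strictly below k degrees.

    Assumption: AP is the share of correct estimates over all samples.
    """
    if not errors_deg:
        raise ValueError("errors_deg must not be empty")
    return sum(e < k for e in errors_deg) / len(errors_deg)


# Example with made-up errors: two of four estimates fall below 5 degrees.
print(ap_below([1.0, 4.0, 6.0, 10.0], 5))  # prints 0.5
```

Reporting AP at several thresholds shows how errors are distributed, which a single mean angular error does not reveal.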

BibTeX

@InProceedings{cheng2024ivgaze,
  author    = {Yihua Cheng and Yaning Zhu and Zongji Wang and Hongquan Hao and Yongwei Liu and Shiqing Cheng and Xi Wang and Hyung Jin Chang},
  title     = {What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024},
}