Innovative 3D Object Detection Makes Autonomous Driving Perception Sharper

Unlocking new autonomous driving skills based on evolving perception


Intelligent “Eyes” for Autonomous Driving


Autonomous trucks at ports must operate 24/7 amid stacked containers, through scorching heat, torrential rain, and the dark of night. Meeting these challenges demands a pair of intelligent “eyes” that can accurately identify and avoid obstacles.


The intelligent “eyes” of trucks are realized through 3D detection technology. This technology plays a key role in the precise environmental perception of autonomous driving, enabling vehicles to make safe decisions. Currently, most 3D detection solutions adopt monocular cameras for their easy deployment and lower cost. However, monocular cameras struggle to estimate depth information in complex scenarios, which poses safety risks. Solution developers have therefore turned to stereo cameras, which have limitations of their own.
















3D detection technology helps autonomous vehicles “understand” and “perceive” their surroundings


Stereo 3D detection partially resolves the depth estimation problem by mimicking the imaging principle of human eyes, calculating depth from “disparity”. Nevertheless, its perception of distant or small objects remains imprecise under strong lighting changes or interference from dynamic objects. This “low vision” is what undermines vehicles’ operational safety and efficiency. Enhancing the precision and robustness of depth perception has therefore become the bottleneck holding back 3D detection technology.
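The disparity principle mentioned above can be made concrete with a short sketch. The relation below (depth = focal length × baseline / disparity) is the standard stereo triangulation formula; the focal length and baseline values are hypothetical, not Westwell's actual rig parameters. Note how a small disparity (a distant object) makes depth highly sensitive to pixel-level matching errors, which is exactly the weakness described above.

```python
# Minimal sketch of disparity-based depth estimation for a stereo
# ("binocular") camera. Focal length and baseline are hypothetical values.
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d: the larger the disparity, the closer the object."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 0.54 m baseline, 720 px focal length.
near = depth_from_disparity(64.0, 720.0, 0.54)  # ~6.1 m: nearby object
far = depth_from_disparity(4.0, 720.0, 0.54)    # ~97.2 m: distant object
```

At 4 px of disparity, a ±1 px matching error swings the estimate by tens of meters, while at 64 px the same error barely matters; this illustrates why distant objects are the hard case for stereo depth.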


“Vision” Enhancement


If we could overcome this technical bottleneck, it would not only improve safety and automation efficiency but also reduce cost and accelerate the process of intelligence. This is undoubtedly an innovative project worth exploring. In response, Westwell and Tongji University have jointly developed a new 3D detection method—the Stereo Pyramid Transformer (SPT), providing the intelligent “eyes” of autonomous driving with a new perspective and clearer vision.


At IROS 2024 (IEEE/RSJ International Conference on Intelligent Robots and Systems), the two parties jointly presented their research results in the paper “3D Object Detection via Stereo Pyramid Transformers with Rich Semantic Feature Fusion”, which was officially accepted and published by the conference.



















The paper “3D Object Detection via Stereo Pyramid Transformers with Rich Semantic Feature Fusion” was officially accepted and published by IROS, attracting significant attention at the conference


The SPT stands out due to its multi-layer “pyramid” structure, which enables the model to comprehend and handle complex information by extracting and fusing image features layer by layer. This structure allows each layer to focus on specific image areas, capturing more scenario details. Meanwhile, the semantic attention mechanism helps the model understand the spatial relationships between objects, providing essential information for driving tasks such as route planning.
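The layer-by-layer extraction and fusion idea can be sketched with a toy feature pyramid: coarse levels capture global context, fine levels keep detail, and fusion folds upsampled coarse features back into finer ones. This is an illustrative simplification only, not the SPT's actual architecture (which uses transformer blocks and semantic attention rather than plain pooling and addition).

```python
import numpy as np

def downsample(x):
    """2x2 average pooling (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def pyramid_fuse(feat, levels=3):
    """Build a pyramid of progressively coarser features,
    then fuse top-down from coarsest back to finest."""
    pyramid = [feat]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    fused = pyramid[-1]  # start from the coarsest, most global level
    for finer in reversed(pyramid[:-1]):
        fused = finer + upsample(fused)  # inject context into finer detail
    return fused

feat = np.arange(64, dtype=float).reshape(8, 8)  # stand-in feature map
out = pyramid_fuse(feat)  # same resolution as input, context-enriched
```

The fused output keeps the finest resolution while every pixel has absorbed information from coarser, wider-context levels, which is the core intuition behind pyramid feature fusion.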


Additionally, the SPT innovatively combines stereo detection with monocular depth prediction, improving depth perception precision in complex environments. Even in adverse weather such as heavy rain or smog, it maintains stable, efficient detection.
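One simple way to combine two depth sources is a confidence-weighted blend that falls back to the monocular estimate where stereo matching is unreliable (distant objects, occlusions, bad weather). The weighting scheme below is purely illustrative, not the SPT's actual fusion method.

```python
import numpy as np

def fuse_depth(stereo_depth, mono_depth, stereo_conf):
    """Blend per-pixel depth estimates; stereo_conf in [0, 1],
    where 1 means fully trust the stereo estimate."""
    stereo_conf = np.clip(stereo_conf, 0.0, 1.0)
    return stereo_conf * stereo_depth + (1.0 - stereo_conf) * mono_depth

stereo = np.array([10.0, 50.0, 90.0])  # stereo depth estimates (m)
mono = np.array([11.0, 48.0, 80.0])    # monocular depth estimates (m)
conf = np.array([0.9, 0.5, 0.1])       # stereo reliability drops with range
fused = fuse_depth(stereo, mono, conf)  # [10.1, 49.0, 81.0]
```

Near the camera the result tracks the stereo estimate; far away it leans on the monocular prediction, mirroring the complementary strengths the paragraph describes.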


Technology-validated Milestone Achievements



Tests based on the KITTI dataset and proprietary datasets show that the SPT significantly outperforms previous 3D detection technologies in handling complex environments. It achieves a detection accuracy (mAP3D) of 85.14%, demonstrating its stability across varied environments. Through rigorous ablation experiments, our researchers clarified the contribution of each module, further proving the important role of “depth information” in model construction.



Experimental results on the KITTI open benchmark dataset



Ablation experiment results


Even under extreme test conditions, such as severe occlusion and complex lighting, the SPT can still accurately identify target objects, showing its strong adaptability to dynamic environments. Meanwhile, our depth error analysis shows that the SPT reduces the average per-pixel depth estimation error from 0.25 m (other models) to 0.1567 m, providing reliable technical support for spatial position estimation.
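The average per-pixel depth error cited above is typically computed as a mean absolute error over pixels that have ground-truth depth. A minimal sketch of such a metric, on synthetic numbers (not the paper's data):

```python
import numpy as np

def mean_depth_error(pred, gt):
    """Mean absolute per-pixel depth error, evaluated only where
    ground-truth depth exists (gt > 0); a common convention for
    sparse LiDAR-derived ground truth such as KITTI's."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    valid = gt > 0  # pixels with no ground-truth depth are skipped
    return float(np.abs(pred[valid] - gt[valid]).mean())

gt = np.array([[10.0, 0.0], [25.0, 40.0]])    # 0 marks missing ground truth
pred = np.array([[10.2, 5.0], [24.8, 40.3]])  # model's depth predictions (m)
err = mean_depth_error(pred, gt)  # averages 0.2, 0.2, 0.3 over 3 valid pixels
```

Lowering this number from 0.25 m toward 0.15 m, as reported for the SPT, directly tightens the 3D boxes the detector can place around objects.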













E-Truck is applied to port transportation.




Q-Tractor is applied to airport cargo terminal transportation.




Q-Truck handles transportation in factories and industrial parks.




The SPT’s detection capabilities have been put to full use in actual seaport scenarios. It enables Westwell’s self-developed commercial NEVs, including E-Truck (upgradeable autonomous new-energy heavy truck), Q-Truck (autonomous heavy truck), and Q-Tractor (autonomous tractor), not only to see distant environments clearly but also to quickly identify their surroundings and make accurate judgments. These vehicles can operate smoothly in complex scenarios such as airports, land ports, railway ports, factories, and industrial parks, effectively avoiding potential collisions with other vehicles or equipment and completing transportation tasks safely and efficiently.


Industry-academia-research Collaboration Pushes Autonomous Driving Ahead


By combining theory with practical application, the industry-academia-research collaboration between Westwell and Tongji University ensures the reliability of this innovative 3D detection technology. The solution strikes a sound balance between cost and precision, laying a solid foundation for future large-scale deployment.


Currently, the SPT primarily runs on Westwell’s self-developed binocular cameras, which are installed on autonomous heavy trucks for scenario-based intelligent cargo detection. These intelligent “eyes” will evolve through software system upgrades, improving perception precision and automation efficiency while contributing meaningfully to operational safety and sustainable development.













Westwell’s self-developed binocular cameras are applied to vehicle-mounted and scenario-based intelligent detection


Guided by the concept of intelligent and green development, Westwell aims to continuously advance the innovation and application of “AI + New Energy” technologies, promoting a sustainable, environmentally friendly transformation across scenarios in the global logistics industry. Through its practical results, the SPT demonstrates Westwell’s competitiveness in autonomous driving research and development while helping propel and safeguard the efficient evolution of green logistics.

