Sensor Setup and Data Collection
Sensor Setup
All data in the AIODrive dataset is collected with the sensor configuration shown in the figure below, which consists of five high-resolution RGB cameras (including one stereo pair); five depth cameras placed at the same locations as the RGB cameras; LiDAR at three levels of density (up to 1000 meter range and up to 1M points per frame); a 1000 meter range SPAD-LiDAR; four Radar sensors; and an IMU/GPS. Four of the sensor types (camera, LiDAR, SPAD-LiDAR, Radar) have 360° horizontal coverage for full-surround perception.

5x High-resolution RGB cameras:
- 10Hz capture frequency
- 360° horizontal coverage with 5 viewpoints: left, right, front left, front right, back. Each camera has a 120° field of view
- 1920 × 720 resolution
- Images are stored in PNG uint8 format
- Includes one front-facing stereo pair with a baseline of 0.8 meters
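A minimal sketch of how the numbers above translate into camera geometry. It assumes (this is not stated in the spec) that the 120° field of view is horizontal, pixels are square, and the principal point sits at the image center, which is how Carla builds its pinhole cameras. `intrinsics` and `depth_from_disparity` are hypothetical helpers, not part of any AIODrive toolkit.

```python
import math

import numpy as np

# Values from the spec above: 1920 x 720 images, 120 deg FOV, 0.8 m stereo baseline.
# Assumption: the 120 deg FOV is horizontal, pixels are square, and the
# principal point is at the image center (Carla-style pinhole camera).
WIDTH, HEIGHT = 1920, 720
HFOV_DEG = 120.0
BASELINE_M = 0.8


def intrinsics(width: int = WIDTH, height: int = HEIGHT, hfov_deg: float = HFOV_DEG) -> np.ndarray:
    """Return a 3x3 pinhole intrinsic matrix K for the given horizontal FOV."""
    f = width / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))
    return np.array([[f, 0.0, width / 2.0],
                     [0.0, f, height / 2.0],
                     [0.0, 0.0, 1.0]])


def depth_from_disparity(disparity_px, focal_px: float, baseline_m: float = BASELINE_M) -> np.ndarray:
    """Stereo depth Z = f * B / d; pixels with non-positive disparity map to +inf."""
    d = np.asarray(disparity_px, dtype=np.float64)
    with np.errstate(divide="ignore"):  # d == 0 would otherwise warn
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)


K = intrinsics()
print(K[0, 0])  # focal length in pixels, about 554.26
print(depth_from_disparity([10.0, 0.0], K[0, 0]))
```

With a 0.8 m baseline and a focal length of roughly 554 px, a disparity of 10 px corresponds to about 44 m, so distant objects produce sub-pixel disparities; the depth cameras are the more direct source of far-range depth.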
5x High-resolution depth cameras:
- 10Hz capture frequency
- 360° horizontal coverage, same as the RGB cameras above
- 1920 × 720 resolution
- Images are stored in PNG uint8 format
- Range of depth: 1000 meters
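The spec pairs a 1000 meter depth range with uint8 PNG storage, so depth has to be packed across the color channels. The sketch below assumes (unconfirmed here) that AIODrive keeps Carla's standard depth-camera encoding, in which R, G and B form one 24-bit integer normalized to the 1000 m far plane; `decode_depth` is a hypothetical helper.

```python
import numpy as np

MAX_DEPTH_M = 1000.0  # depth range from the spec above


def decode_depth(rgb, max_depth_m: float = MAX_DEPTH_M) -> np.ndarray:
    """Decode an H x W x 3 uint8 image in R, G, B channel order into metric depth.

    Assumption: Carla's 24-bit depth encoding,
    normalized = (R + G * 256 + B * 256**2) / (256**3 - 1), depth = normalized * 1000 m.
    """
    rgb = np.asarray(rgb)
    if rgb.ndim != 3 or rgb.shape[2] < 3:
        raise ValueError("expected an H x W x 3 image")
    # Promote to float64 before combining so the 24-bit sum cannot overflow uint8.
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    normalized = (r + g * 256.0 + b * 256.0 ** 2) / (256.0 ** 3 - 1.0)
    return normalized * max_depth_m
```

If the PNG is loaded with OpenCV (`cv2.imread`), reverse the channel order first (`img[..., ::-1]`), since OpenCV returns BGR.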
4x Radar:
- 10Hz capture frequency
- 360° horizontal coverage with 4 viewpoints: left, right, front, back. Each viewpoint has a 120° horizontal and 90° vertical field of view
- Up to 150 thousand points per second
- Range of depth: 1000 meters
- Point cloud with velocity measurements, stored in float32 format
1x IMU/GPS:
- 10Hz capture frequency
1x Spinning Velodyne-64 LiDAR:
- 10Hz capture frequency
- 360° horizontal coverage, +2° to -24.5° vertical field of view
- 64 channels
- Up to 2.2 million points per second
- Range of depth: 120 meters
- Point cloud stored in float32 format
2x Spinning long-range high-density LiDAR:
- 10Hz capture frequency
- 360° horizontal coverage, +90° to -90° vertical field of view
- 800/1280 channels
- Up to 10/16 million points per second
- Range of depth: 1000 meters
- Point cloud stored in float32 format
1x Long-range high-density SPAD-LiDAR:
- 10Hz capture frequency
- 360° horizontal coverage, +70° to -70° vertical field of view
- 700 channels
- Up to 10 million points per second
- Range of depth: 1000 meters
- Raw sensor data is stored as a 3D tensor representing photon counts
- The two strongest returns (top-2) are stored as point clouds in float32 format
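The per-second rates and channel counts above can be turned into per-frame figures with simple arithmetic. The sketch below divides each peak rate by the 10 Hz capture frequency and, assuming channels are spread uniformly across the vertical field of view (not stated in the spec), estimates the elevation step between adjacent channels. `PointSensor` is a hypothetical helper, and pairing "800/1280 channels" with "10/16 million points per second" in that order is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CAPTURE_HZ = 10  # every point-producing sensor above is captured at 10 Hz


@dataclass(frozen=True)
class PointSensor:
    """Spec-sheet numbers for one point-producing sensor (illustrative helper)."""
    name: str
    points_per_second: int
    channels: Optional[int] = None
    vfov_deg: Optional[Tuple[float, float]] = None  # (upper, lower) elevation in degrees

    def max_points_per_frame(self, hz: float = CAPTURE_HZ) -> float:
        """Upper bound on points per frame: peak point rate / capture frequency."""
        return self.points_per_second / hz

    def vertical_spacing_deg(self) -> Optional[float]:
        """Mean elevation step between adjacent channels.

        Assumes channels are spread uniformly from the upper to the lower
        edge of the vertical field of view.
        """
        if self.channels is None or self.vfov_deg is None or self.channels < 2:
            return None
        upper, lower = self.vfov_deg
        return (upper - lower) / (self.channels - 1)


SENSORS = [
    PointSensor("Velodyne-64", 2_200_000, 64, (2.0, -24.5)),
    # Assumption: "800/1280 channels" pairs with "10/16 million points per second".
    PointSensor("Long-range LiDAR (800 ch)", 10_000_000, 800, (90.0, -90.0)),
    PointSensor("Long-range LiDAR (1280 ch)", 16_000_000, 1280, (90.0, -90.0)),
    PointSensor("SPAD-LiDAR", 10_000_000, 700, (70.0, -70.0)),
    # The spec does not say whether this rate is per Radar or for all four combined.
    PointSensor("Radar", 150_000),
]

for sensor in SENSORS:
    spacing = sensor.vertical_spacing_deg()
    detail = f", ~{spacing:.3f} deg between channels" if spacing is not None else ""
    print(f"{sensor.name}: <= {sensor.max_points_per_frame():,.0f} points/frame{detail}")
```

These are peak-rate bounds; individual frames may contain fewer points, for example where beams hit no surface within range.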
Scene Map
We collect 2.8 hours of data in all available maps (eight in total) of the Carla simulator. Each map has its own characteristics. For example, Town1 and Town2 have basic layouts with T-junctions, while Town3 is more complex, with a 5-lane junction, a roundabout, uneven roads, and a tunnel. Town4 and Town6 include highways, which allow higher driving speeds. Town5 has a bridge and multi-lane roads, which provide a large amount of lane-changing data. More information about each map can be found on the Carla website. Data from six maps (Town1-6) is used for training and validation, and data from the remaining two maps (Town7, Town10) is used for testing.
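The town-level split can be written down as a small lookup. The split labels `trainval` and `test` and the helper `split_for_town` are hypothetical names for illustration; the frame count is plain arithmetic (2.8 h × 3600 s/h × 10 Hz), assuming continuous capture at 10 Hz.

```python
# Town-level split described above: Town1-6 for training/validation,
# Town7 and Town10 held out for testing. Labels are hypothetical.
TRAIN_VAL_TOWNS = frozenset(f"Town{i}" for i in range(1, 7))
TEST_TOWNS = frozenset({"Town7", "Town10"})


def split_for_town(town: str) -> str:
    """Map a Carla town name to its (hypothetical) split label."""
    if town in TRAIN_VAL_TOWNS:
        return "trainval"
    if town in TEST_TOWNS:
        return "test"
    raise ValueError(f"unknown map: {town}")


# Approximate frame count, assuming all 2.8 hours are captured continuously at 10 Hz.
HOURS, CAPTURE_HZ = 2.8, 10
TOTAL_FRAMES = round(HOURS * 3600 * CAPTURE_HZ)

print(split_for_town("Town3"), split_for_town("Town10"), TOTAL_FRAMES)
```

Splitting by map rather than by frame keeps every test frame in road layouts never seen during training.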

[Figure: map overviews of Town1, Town2, Town3, Town4, Town5, Town6, Town7, and Town10]
RGB Camera Data

[Figure: sample RGB images from View 0 (back), View 1 (left), View 2 (front left), View 3 (front right), and View 4 (right)]
Depth Camera Data

[Figure: sample depth images from View 0 (back), View 1 (left), View 2 (front left), View 3 (front right), and View 4 (right)]
High-Density and Long-Range Point Cloud
