Perception
Perception system for the AV2.
Overview
The perception system processes sensor data to understand the environment:
Occupancy Grid: 2D probabilistic obstacle map from LIDAR
YOLOPv2: Object detection and lane segmentation from camera
IPM: Inverse perspective mapping for bird’s-eye view
Occupancy Grid
The occupancy grid maintains a 2D probabilistic map of obstacles around the vehicle.
Algorithm
Bayesian Log-Odds Update
The grid uses log-odds representation for numerical stability:
Where: - \(l(m)\) = log-odds of cell m being occupied - \(z\) = sensor measurement
Update Process:
Receive LIDAR point cloud
Filter points by height (ground removal)
Transform to grid coordinates
Update log-odds for occupied cells
Apply temporal decay
Implementation
class OccupancyGrid:
def __init__(self, size=40.0, resolution=0.1):
self.size = size
self.resolution = resolution
self.grid_size = int(size / resolution)
self.grid = np.zeros((self.grid_size, self.grid_size))
# Log-odds parameters
self.log_odds_hit = 0.4
self.log_odds_miss = -0.2
self.decay_factor = 0.95
def update_from_lidar(self, points):
"""Update grid from LIDAR points."""
# Filter by height
mask = (points[:, 2] > -0.3) & (points[:, 2] < 0.5)
filtered = points[mask]
# Convert to grid coordinates
grid_x = ((filtered[:, 0] / self.resolution) +
self.grid_size // 2).astype(int)
grid_y = ((filtered[:, 1] / self.resolution) +
self.grid_size // 2).astype(int)
# Clip to grid bounds
valid = ((grid_x >= 0) & (grid_x < self.grid_size) &
(grid_y >= 0) & (grid_y < self.grid_size))
grid_x = grid_x[valid]
grid_y = grid_y[valid]
# Update log-odds
self.grid[grid_x, grid_y] += self.log_odds_hit
# Apply decay
self.grid *= self.decay_factor
# Clip values
np.clip(self.grid, -5.0, 5.0, out=self.grid)
def to_probability(self):
"""Convert log-odds to probability."""
return 1.0 / (1.0 + np.exp(-self.grid))
def get_closest_obstacle(self, direction):
"""Find closest obstacle in given direction."""
prob = self.to_probability()
# Search along direction...
Parameters
Parameter |
Default |
Description |
|---|---|---|
size |
40.0 |
Grid size in meters |
resolution |
0.1 |
Meters per cell |
log_odds_hit |
0.4 |
Increment for occupied |
log_odds_miss |
-0.2 |
Decrement for free |
decay_factor |
0.95 |
Temporal decay per frame |
height_min |
-0.3 |
Min height filter (meters) |
height_max |
0.5 |
Max height filter (meters) |
YOLOPv2 Detection
YOLOPv2 is a multi-task deep learning model for driving perception.
Tasks
Object Detection: Vehicles, pedestrians, cyclists
Drivable Area Segmentation: Road surface mask
Lane Line Segmentation: Lane markings mask
Architecture
Input Image (640×640×3)
│
▼
┌─────────────────┐
│ Backbone │ (CSPDarknet)
│ (Shared) │
└────────┬────────┘
│
┌─────┼─────┐
│ │ │
▼ ▼ ▼
┌─────┐ ┌─────┐ ┌─────┐
│Det. │ │Drive│ │Lane │
│Head │ │Head │ │Head │
└──┬──┘ └──┬──┘ └──┬──┘
│ │ │
▼ ▼ ▼
Boxes Mask Mask
Usage
from perception.preprocessing.yolopv2_adapter import YOLOPv2Adapter
# Initialize
detector = YOLOPv2Adapter(model_path='weights/YOLOPv2.pt')
# Process frame
frame = camera.get_frame()
output = detector.process(frame)
# Access results
detections = output.detections # List of Detection objects
drivable_mask = output.drivable_mask # Binary mask
lane_mask = output.lane_mask # Binary mask
Output Format
@dataclass
class Detection:
class_id: int
class_name: str
confidence: float
bbox: Tuple[int, int, int, int] # x1, y1, x2, y2
@dataclass
class YOLOPv2Output:
detections: List[Detection]
drivable_mask: np.ndarray # H×W binary
lane_mask: np.ndarray # H×W binary
Performance
Input Size: 640×640
Inference Time: ~50ms on GTX 1060 (GPU)
Inference Time: ~500ms on CPU (not recommended)
Inverse Perspective Mapping
IPM transforms the camera image to a bird’s-eye view (BEV).
Theory
Using camera intrinsics and extrinsics, we compute a homography matrix that maps image pixels to ground plane coordinates.
Where H is the homography matrix.
Usage
from perception.preprocessing.ipm_processor import IPMProcessor
# Initialize with camera calibration
ipm = IPMProcessor(
camera_matrix=K,
dist_coeffs=D,
camera_height=1.2,
camera_pitch=-10 # degrees
)
# Transform image
bev_image = ipm.transform(image)
Configuration
@dataclass
class BEVConfig:
output_width: int = 400
output_height: int = 400
meters_per_pixel: float = 0.05
roi_front: float = 20.0 # meters ahead
roi_rear: float = 5.0 # meters behind
roi_side: float = 10.0 # meters to each side
Perception Pipeline Integration
The complete perception pipeline:
class PerceptionPipeline:
def __init__(self):
self.occupancy_grid = OccupancyGrid()
self.yolo = YOLOPv2Adapter()
self.ipm = IPMProcessor()
self.state = PerceptionState()
def process(self, lidar_points, camera_frame):
# Update occupancy grid from LIDAR
self.occupancy_grid.update_from_lidar(lidar_points)
# Run YOLOPv2 on camera image
yolo_output = self.yolo.process(camera_frame)
# Generate bird's-eye view
bev = self.ipm.transform(camera_frame)
# Update shared state
self.state.update(
grid=self.occupancy_grid,
detections=yolo_output.detections,
bev=bev
)
return self.state