Perception

Perception system for the AV2.

Overview 

The perception system processes sensor data to understand the environment:

Occupancy Grid: 2D probabilistic obstacle map from LIDAR
YOLOPv2: Object detection and lane segmentation from camera
IPM: Inverse perspective mapping for bird’s-eye view

Occupancy Grid 

The occupancy grid maintains a 2D probabilistic map of obstacles around the vehicle.

Algorithm 

Bayesian Log-Odds Update

The grid uses log-odds representation for numerical stability:

\[l(m|z) = l(m) + \log\frac{p(m|z)}{1-p(m|z)}\]

Where: - \(l(m)\) = log-odds of cell m being occupied - \(z\) = sensor measurement

Update Process:

Receive LIDAR point cloud
Filter points by height (ground removal)
Transform to grid coordinates
Update log-odds for occupied cells
Apply temporal decay

Implementation 

class OccupancyGrid:
    def __init__(self, size=40.0, resolution=0.1):
        self.size = size
        self.resolution = resolution
        self.grid_size = int(size / resolution)
        self.grid = np.zeros((self.grid_size, self.grid_size))

        # Log-odds parameters
        self.log_odds_hit = 0.4
        self.log_odds_miss = -0.2
        self.decay_factor = 0.95

    def update_from_lidar(self, points):
        """Update grid from LIDAR points."""
        # Filter by height
        mask = (points[:, 2] > -0.3) & (points[:, 2] < 0.5)
        filtered = points[mask]

        # Convert to grid coordinates
        grid_x = ((filtered[:, 0] / self.resolution) +
                  self.grid_size // 2).astype(int)
        grid_y = ((filtered[:, 1] / self.resolution) +
                  self.grid_size // 2).astype(int)

        # Clip to grid bounds
        valid = ((grid_x >= 0) & (grid_x < self.grid_size) &
                 (grid_y >= 0) & (grid_y < self.grid_size))
        grid_x = grid_x[valid]
        grid_y = grid_y[valid]

        # Update log-odds
        self.grid[grid_x, grid_y] += self.log_odds_hit

        # Apply decay
        self.grid *= self.decay_factor

        # Clip values
        np.clip(self.grid, -5.0, 5.0, out=self.grid)

    def to_probability(self):
        """Convert log-odds to probability."""
        return 1.0 / (1.0 + np.exp(-self.grid))

    def get_closest_obstacle(self, direction):
        """Find closest obstacle in given direction."""
        prob = self.to_probability()
        # Search along direction...

Parameters 

Parameter	Default	Description
size	40.0	Grid size in meters
resolution	0.1	Meters per cell
log_odds_hit	0.4	Increment for occupied
log_odds_miss	-0.2	Decrement for free
decay_factor	0.95	Temporal decay per frame
height_min	-0.3	Min height filter (meters)
height_max	0.5	Max height filter (meters)

YOLOPv2 Detection 

YOLOPv2 is a multi-task deep learning model for driving perception.

Tasks 

Object Detection: Vehicles, pedestrians, cyclists
Drivable Area Segmentation: Road surface mask
Lane Line Segmentation: Lane markings mask

Architecture 

Input Image (640×640×3)
      │
      ▼
┌─────────────────┐
│    Backbone     │  (CSPDarknet)
│   (Shared)      │
└────────┬────────┘
         │
   ┌─────┼─────┐
   │     │     │
   ▼     ▼     ▼
┌─────┐ ┌─────┐ ┌─────┐
│Det. │ │Drive│ │Lane │
│Head │ │Head │ │Head │
└──┬──┘ └──┬──┘ └──┬──┘
   │       │       │
   ▼       ▼       ▼
Boxes   Mask    Mask

Usage 

from perception.preprocessing.yolopv2_adapter import YOLOPv2Adapter

# Initialize
detector = YOLOPv2Adapter(model_path='weights/YOLOPv2.pt')

# Process frame
frame = camera.get_frame()
output = detector.process(frame)

# Access results
detections = output.detections      # List of Detection objects
drivable_mask = output.drivable_mask  # Binary mask
lane_mask = output.lane_mask        # Binary mask

Output Format 

@dataclass
class Detection:
    class_id: int
    class_name: str
    confidence: float
    bbox: Tuple[int, int, int, int]  # x1, y1, x2, y2

@dataclass
class YOLOPv2Output:
    detections: List[Detection]
    drivable_mask: np.ndarray  # H×W binary
    lane_mask: np.ndarray      # H×W binary

Performance 

Input Size: 640×640
Inference Time: ~50ms on GTX 1060 (GPU)
Inference Time: ~500ms on CPU (not recommended)

Inverse Perspective Mapping 

IPM transforms the camera image to a bird’s-eye view (BEV).

Theory 

Using camera intrinsics and extrinsics, we compute a homography matrix that maps image pixels to ground plane coordinates.

\[\begin{split}\begin{bmatrix} x_{bev} \\ y_{bev} \\ 1 \end{bmatrix} = H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}\end{split}\]

Where H is the homography matrix.

Usage 

from perception.preprocessing.ipm_processor import IPMProcessor

# Initialize with camera calibration
ipm = IPMProcessor(
    camera_matrix=K,
    dist_coeffs=D,
    camera_height=1.2,
    camera_pitch=-10  # degrees
)

# Transform image
bev_image = ipm.transform(image)

Configuration 

@dataclass
class BEVConfig:
    output_width: int = 400
    output_height: int = 400
    meters_per_pixel: float = 0.05
    roi_front: float = 20.0   # meters ahead
    roi_rear: float = 5.0     # meters behind
    roi_side: float = 10.0    # meters to each side

Perception Pipeline Integration 

The complete perception pipeline:

class PerceptionPipeline:
    def __init__(self):
        self.occupancy_grid = OccupancyGrid()
        self.yolo = YOLOPv2Adapter()
        self.ipm = IPMProcessor()
        self.state = PerceptionState()

    def process(self, lidar_points, camera_frame):
        # Update occupancy grid from LIDAR
        self.occupancy_grid.update_from_lidar(lidar_points)

        # Run YOLOPv2 on camera image
        yolo_output = self.yolo.process(camera_frame)

        # Generate bird's-eye view
        bev = self.ipm.transform(camera_frame)

        # Update shared state
        self.state.update(
            grid=self.occupancy_grid,
            detections=yolo_output.detections,
            bev=bev
        )

        return self.state