==========
Perception
==========

Perception system for the AV2.

.. contents:: Contents
   :local:
   :depth: 2

Overview
========

The perception system processes sensor data to understand the environment:

- **Occupancy Grid**: 2D probabilistic obstacle map from LIDAR
- **YOLOPv2**: Object detection and lane segmentation from camera
- **IPM**: Inverse perspective mapping for bird's-eye view

Occupancy Grid
==============

The occupancy grid maintains a 2D probabilistic map of obstacles around the vehicle.

Algorithm
---------

**Bayesian Log-Odds Update**

The grid uses log-odds representation for numerical stability:

.. math::

   l(m|z) = l(m) + \log\frac{p(m|z)}{1-p(m|z)}

Where:
- :math:`l(m)` = log-odds of cell m being occupied
- :math:`z` = sensor measurement

**Update Process**:

1. Receive LIDAR point cloud
2. Filter points by height (ground removal)
3. Transform to grid coordinates
4. Update log-odds for occupied cells
5. Apply temporal decay

Implementation
--------------

.. code-block:: python

   class OccupancyGrid:
       def __init__(self, size=40.0, resolution=0.1):
           self.size = size
           self.resolution = resolution
           self.grid_size = int(size / resolution)
           self.grid = np.zeros((self.grid_size, self.grid_size))

           # Log-odds parameters
           self.log_odds_hit = 0.4
           self.log_odds_miss = -0.2
           self.decay_factor = 0.95

       def update_from_lidar(self, points):
           """Update grid from LIDAR points."""
           # Filter by height
           mask = (points[:, 2] > -0.3) & (points[:, 2] < 0.5)
           filtered = points[mask]

           # Convert to grid coordinates
           grid_x = ((filtered[:, 0] / self.resolution) +
                     self.grid_size // 2).astype(int)
           grid_y = ((filtered[:, 1] / self.resolution) +
                     self.grid_size // 2).astype(int)

           # Clip to grid bounds
           valid = ((grid_x >= 0) & (grid_x < self.grid_size) &
                    (grid_y >= 0) & (grid_y < self.grid_size))
           grid_x = grid_x[valid]
           grid_y = grid_y[valid]

           # Update log-odds
           self.grid[grid_x, grid_y] += self.log_odds_hit

           # Apply decay
           self.grid *= self.decay_factor

           # Clip values
           np.clip(self.grid, -5.0, 5.0, out=self.grid)

       def to_probability(self):
           """Convert log-odds to probability."""
           return 1.0 / (1.0 + np.exp(-self.grid))

       def get_closest_obstacle(self, direction):
           """Find closest obstacle in given direction."""
           prob = self.to_probability()
           # Search along direction...

Parameters
----------

+-------------------+---------+--------------------------------+
| Parameter         | Default | Description                    |
+===================+=========+================================+
| size              | 40.0    | Grid size in meters            |
+-------------------+---------+--------------------------------+
| resolution        | 0.1     | Meters per cell                |
+-------------------+---------+--------------------------------+
| log_odds_hit      | 0.4     | Increment for occupied         |
+-------------------+---------+--------------------------------+
| log_odds_miss     | -0.2    | Decrement for free             |
+-------------------+---------+--------------------------------+
| decay_factor      | 0.95    | Temporal decay per frame       |
+-------------------+---------+--------------------------------+
| height_min        | -0.3    | Min height filter (meters)     |
+-------------------+---------+--------------------------------+
| height_max        | 0.5     | Max height filter (meters)     |
+-------------------+---------+--------------------------------+

YOLOPv2 Detection
=================

YOLOPv2 is a multi-task deep learning model for driving perception.

Tasks
-----

1. **Object Detection**: Vehicles, pedestrians, cyclists
2. **Drivable Area Segmentation**: Road surface mask
3. **Lane Line Segmentation**: Lane markings mask

Architecture
------------

.. code-block:: text

   Input Image (640×640×3)
         │
         ▼
   ┌─────────────────┐
   │    Backbone     │  (CSPDarknet)
   │   (Shared)      │
   └────────┬────────┘
            │
      ┌─────┼─────┐
      │     │     │
      ▼     ▼     ▼
   ┌─────┐ ┌─────┐ ┌─────┐
   │Det. │ │Drive│ │Lane │
   │Head │ │Head │ │Head │
   └──┬──┘ └──┬──┘ └──┬──┘
      │       │       │
      ▼       ▼       ▼
   Boxes   Mask    Mask

Usage
-----

.. code-block:: python

   from perception.preprocessing.yolopv2_adapter import YOLOPv2Adapter

   # Initialize
   detector = YOLOPv2Adapter(model_path='weights/YOLOPv2.pt')

   # Process frame
   frame = camera.get_frame()
   output = detector.process(frame)

   # Access results
   detections = output.detections      # List of Detection objects
   drivable_mask = output.drivable_mask  # Binary mask
   lane_mask = output.lane_mask        # Binary mask

Output Format
-------------

.. code-block:: python

   @dataclass
   class Detection:
       class_id: int
       class_name: str
       confidence: float
       bbox: Tuple[int, int, int, int]  # x1, y1, x2, y2

   @dataclass
   class YOLOPv2Output:
       detections: List[Detection]
       drivable_mask: np.ndarray  # H×W binary
       lane_mask: np.ndarray      # H×W binary

Performance
-----------

- **Input Size**: 640×640
- **Inference Time**: ~50ms on GTX 1060 (GPU)
- **Inference Time**: ~500ms on CPU (not recommended)

Inverse Perspective Mapping
===========================

IPM transforms the camera image to a bird's-eye view (BEV).

Theory
------

Using camera intrinsics and extrinsics, we compute a homography matrix that maps image pixels to ground plane coordinates.

.. math::

   \begin{bmatrix} x_{bev} \\ y_{bev} \\ 1 \end{bmatrix} = H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}

Where H is the homography matrix.

Usage
-----

.. code-block:: python

   from perception.preprocessing.ipm_processor import IPMProcessor

   # Initialize with camera calibration
   ipm = IPMProcessor(
       camera_matrix=K,
       dist_coeffs=D,
       camera_height=1.2,
       camera_pitch=-10  # degrees
   )

   # Transform image
   bev_image = ipm.transform(image)

Configuration
-------------

.. code-block:: python

   @dataclass
   class BEVConfig:
       output_width: int = 400
       output_height: int = 400
       meters_per_pixel: float = 0.05
       roi_front: float = 20.0   # meters ahead
       roi_rear: float = 5.0     # meters behind
       roi_side: float = 10.0    # meters to each side

Perception Pipeline Integration
===============================

The complete perception pipeline:

.. code-block:: python

   class PerceptionPipeline:
       def __init__(self):
           self.occupancy_grid = OccupancyGrid()
           self.yolo = YOLOPv2Adapter()
           self.ipm = IPMProcessor()
           self.state = PerceptionState()

       def process(self, lidar_points, camera_frame):
           # Update occupancy grid from LIDAR
           self.occupancy_grid.update_from_lidar(lidar_points)

           # Run YOLOPv2 on camera image
           yolo_output = self.yolo.process(camera_frame)

           # Generate bird's-eye view
           bev = self.ipm.transform(camera_frame)

           # Update shared state
           self.state.update(
               grid=self.occupancy_grid,
               detections=yolo_output.detections,
               bev=bev
           )

           return self.state