Part 3 | Seeing the World | Chapter 12

Live Video and the Pi Camera

The Pi Camera Module is a purpose-built sensor connected directly to the GPU, and treating it like a USB webcam throws away most of its capability.

The trap is treating the Pi Camera like any USB webcam. You plug in a Camera Module, try cv2.VideoCapture(0), and either it works but runs at half the frame rate the hardware supports, or it doesn't work at all and throws a cryptic GStreamer error. The Camera Module connects through the CSI ribbon cable — a dedicated bus with direct access to the Pi's GPU for hardware-accelerated encoding. When you treat it as a generic Video4Linux device, you bypass that acceleration entirely.

USB cameras and the Pi Camera Module need different capture strategies. I use both, but for different reasons: USB cameras for quick prototyping and compatibility testing, the Pi Camera for anything that needs to run in production at consistent frame rates.

The difference isn't just API convenience — it's architectural. A USB camera shares bandwidth with every other USB device on the bus. A Pi Camera Module has a dedicated hardware path to the GPU. When your application runs alongside USB storage, a USB network adapter, or a USB keyboard, the USB camera's frame delivery gets interrupted. The Pi Camera's frame delivery doesn't. That consistency is what makes the difference between a system that works on your bench and one that works in deployment.

A USB webcam is a peripheral. The Pi Camera Module is an integrated sensor. The code you write for each should reflect that difference.

USB Cameras: The Quick Start Path

USB cameras work through OpenCV's VideoCapture — the same API you'd use on a desktop. Plug in any UVC-compliant webcam and capture frames:

import cv2

cap = cv2.VideoCapture(0)  # 0 = first camera device

if not cap.isOpened():
    print("Cannot open camera")
    exit(1)

# Set resolution (request — camera may not support it)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Verify what we actually got
actual_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
actual_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(f"Capturing at: {actual_w}x{actual_h}")

while True:
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame")
        break

    # Process the frame here
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Display (only if you have a monitor)
    cv2.imshow('USB Camera', gray)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

cap.set() is a request, not a command

When you call cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920), you're requesting a resolution. The camera driver decides whether to honor it. Always read back the actual values with cap.get() after setting. I've seen teams set 1080p, assume it worked, and then wonder why their bounding box coordinates are off, when the camera had silently fallen back to 640x480.
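A minimal sketch of that read-back check, written against any object that exposes OpenCV's set()/get() interface so it can be exercised without a camera attached. The helper name request_resolution and the FallbackCamera stand-in are hypothetical, and the two property IDs are hard-coded to the integer values of cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT; in real code you would pass the cv2 constants themselves.

```python
# Integer values of cv2.CAP_PROP_FRAME_WIDTH / cv2.CAP_PROP_FRAME_HEIGHT,
# hard-coded here only so the sketch runs without OpenCV installed.
PROP_FRAME_WIDTH = 3
PROP_FRAME_HEIGHT = 4


def request_resolution(cap, width, height):
    """Ask for a resolution, then report what the driver actually granted.

    Returns (actual_width, actual_height, honored).
    """
    cap.set(PROP_FRAME_WIDTH, width)
    cap.set(PROP_FRAME_HEIGHT, height)
    actual_w = int(cap.get(PROP_FRAME_WIDTH))
    actual_h = int(cap.get(PROP_FRAME_HEIGHT))
    return actual_w, actual_h, (actual_w, actual_h) == (width, height)


class FallbackCamera:
    """Hypothetical stand-in for a webcam that only supports 640x480."""

    def set(self, prop_id, value):
        return False  # request ignored, as a real driver may do

    def get(self, prop_id):
        return {PROP_FRAME_WIDTH: 640.0, PROP_FRAME_HEIGHT: 480.0}[prop_id]


w, h, honored = request_resolution(FallbackCamera(), 1920, 1080)
if not honored:
    print(f"Requested 1920x1080, got {w}x{h}; scale coordinates accordingly")
```

With a real camera, pass the cv2.VideoCapture object instead of the stand-in and decide up front whether a fallback resolution is acceptable or should abort the run.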

The ret, frame = cap.read() pattern is the fundamental frame-grab loop in OpenCV. ret is a boolean — True if a frame was successfully captured, False if the camera disconnected or the stream ended. Always check ret. When a USB camera overheats or the cable wiggles loose, ret goes False and frame becomes None. Processing None as a NumPy array crashes your entire script.

There's a subtlety about USB camera initialization that tutorials skip. The first few frames from a USB camera are often garbage — black frames, frames with wrong exposure, or frames with auto-white-balance still converging. I always add a warm-up phase: read and discard the first 10–15 frames before entering the main processing loop.

# Warm-up: discard the first 15 frames
for _ in range(15):
    cap.read()

This takes about half a second at 30 FPS and gives auto-exposure and white balance time to settle before your processing code sees a frame. Without it, your first detection attempt can run on a dark or color-shifted frame and produce misleading results, which is especially confusing during development when you're trying to verify that your pipeline works.
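The warm-up loop above ignores the return value of cap.read(), so a camera that fails during warm-up goes unnoticed until the main loop. One possible hardening, sketched below, checks ret while discarding frames. The warm_up helper and the FakeCapture stand-in are hypothetical names used only to make the example runnable without hardware.

```python
def warm_up(cap, frames=15):
    """Read and discard `frames` frames; fail loudly if the camera drops out."""
    for i in range(frames):
        ret, _ = cap.read()
        if not ret:
            raise RuntimeError(
                f"Camera stopped delivering frames during warm-up (frame {i})"
            )
    return frames


class FakeCapture:
    """Hypothetical stand-in that delivers a fixed number of frames."""

    def __init__(self, available):
        self.available = available
        self.reads = 0

    def read(self):
        self.reads += 1
        if self.reads > self.available:
            return False, None
        return True, f"frame-{self.reads}"


cap = FakeCapture(available=20)
warm_up(cap, frames=15)
ret, first_real_frame = cap.read()
print(first_real_frame)  # frame-16
```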

The Pi Camera Module: Native Performance

The Pi Camera Module (v2 or v3) connects through the CSI (Camera Serial Interface) port, using a flat ribbon cable that plugs directly into the board. This isn't a USB device. It's a sensor with a dedicated data path to the GPU, which gives you hardware H.264 encoding (on the Pi 4 and earlier; the Pi 5 has no hardware encoder and encodes in software) and, on the v3, fast autofocus and high-dynamic-range capture that no $20 USB webcam can match.

Modern Raspberry Pi OS uses the libcamera stack. The old raspicam tools (raspistill, raspivid) are deprecated. If a tutorial tells you to use those, it was written before 2022. The Python library you want is picamera2:

# picamera2 is pre-installed on Raspberry Pi OS with desktop.
# On Lite, install it:
sudo apt install -y python3-picamera2

from picamera2 import Picamera2
import cv2

picam2 = Picamera2()

# Configure for still capture
still_config = picam2.create_still_configuration(
    main={"size": (1920, 1080), "format": "RGB888"}
)
picam2.configure(still_config)
picam2.start()

# Capture a single frame as a NumPy array
frame = picam2.capture_array()
print(f"Frame shape: {frame.shape}")  # (1080, 1920, 3)

# Save it. No channel swap is needed: "RGB888" arrays are already
# in B, G, R order, which is what OpenCV expects
cv2.imwrite('/home/pi/capture.jpg', frame)

picam2.stop()

Watch the format names, because they read backwards from a NumPy point of view. According to the Picamera2 manual, "RGB888" produces arrays whose pixels are ordered [B, G, R], which is OpenCV's convention, so the frame above can go straight to imwrite or to drawing functions that take BGR color tuples. Running cv2.cvtColor(frame, cv2.COLOR_RGB2BGR) on it would swap red and blue and give you wrong colors.

If you need true RGB-ordered arrays, request them directly from picamera2 by specifying "BGR888" in the configuration, which yields pixels ordered [R, G, B]. This avoids the conversion overhead entirely:

config = picam2.create_video_configuration(
    main={"size": (640, 480), "format": "BGR888"}  # Arrays ordered R, G, B
)

I use RGB888 when the pipeline is entirely OpenCV-based. I use BGR888 when most frames go to libraries that expect RGB order (PIL, matplotlib, web APIs). The choice depends on where the majority of your downstream processing happens. Pick the format that minimizes conversions: every cvtColor call costs time, and at 30 FPS even sub-millisecond savings per frame compound across 1,800 frames per minute.

Framework · The Frame Budget · Know your FPS ceiling before you design

Your Pi can process roughly 15–30 FPS at 640x480 with simple operations (grayscale conversion, a single blur). Every processing step you add (detection, annotation, saving) eats into that budget. Measure your actual FPS before you design your pipeline, not after. A 15 FPS target gives you 66.7 ms per frame. If your pipeline already takes 80 ms per frame at the minimum viable resolution, you're about 13 ms over budget and no amount of optimization will save you.
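The budget arithmetic is simple enough to keep as a helper next to your pipeline code. This is a small sketch; the function names are hypothetical.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at the target frame rate."""
    return 1000.0 / target_fps


def budget_overrun_ms(target_fps, pipeline_ms):
    """Positive result: over budget by that many ms. Negative: headroom."""
    return pipeline_ms - frame_budget_ms(target_fps)


def achievable_fps(pipeline_ms):
    """Upper bound on FPS if each frame costs `pipeline_ms` end to end."""
    return 1000.0 / pipeline_ms


print(f"Budget at 15 FPS: {frame_budget_ms(15):.1f} ms")              # 66.7 ms
print(f"Overrun at 80 ms/frame: {budget_overrun_ms(15, 80):.1f} ms")  # 13.3 ms
print(f"Ceiling at 80 ms/frame: {achievable_fps(80):.1f} FPS")        # 12.5 FPS
```

An 80 ms pipeline tops out at 12.5 FPS no matter how fast the camera delivers frames, which is why the measurement has to come before the design.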

Video Streaming with picamera2

For continuous video processing — the use case you actually care about — configure picamera2 for video and grab frames in a loop:

from picamera2 import Picamera2
import cv2
import time

picam2 = Picamera2()

# Video configuration: lower resolution for real-time processing
video_config = picam2.create_video_configuration(
    main={"size": (640, 480), "format": "RGB888"}
)
picam2.configure(video_config)
picam2.start()

# Let the camera warm up (auto-exposure needs a few frames)
time.sleep(2)

fps_counter = 0
fps_start = time.perf_counter()
current_fps = 0.0

try:
    while True:
        frame = picam2.capture_array()

        # "RGB888" arrays from picamera2 are already in B, G, R order
        # (OpenCV's convention), so no channel swap is needed here

        # Your processing pipeline goes here
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        processed = cv2.GaussianBlur(gray, (5, 5), 0)

        # Measure FPS
        fps_counter += 1
        elapsed = time.perf_counter() - fps_start
        if elapsed >= 1.0:
            current_fps = fps_counter / elapsed
            fps_counter = 0
            fps_start = time.perf_counter()
            print(f"FPS: {current_fps:.1f}")

except KeyboardInterrupt:
    print("Stopping...")

finally:
    picam2.stop()

Measure your FPS with your actual processing pipeline running, not with an empty loop. The camera's maximum frame rate is irrelevant — what matters is how many frames per second survive your processing code.

The time.sleep(2) after picam2.start() is not a hack. The camera's auto-exposure and auto-white-balance algorithms need a few frames to converge. If you start processing immediately, the first 15–30 frames will be too dark or too bright, and your detection code will misfire on every one of them. This is the same warm-up principle I mentioned for USB cameras, but the Pi Camera's auto-exposure convergence is slower because the libcamera stack runs more sophisticated algorithms — adaptive histogram-based exposure, gain control, and white-balance estimation. Two seconds is the conservative default; in well-lit environments, one second is often enough. In dim or mixed-lighting environments, I increase it to three.

Measuring Real Pipeline Performance

Frame rate measurement is not optional — it's your primary engineering constraint. Here's a reusable FPS calculator class I put in every project:

import time

class FPSCounter:
    def __init__(self, avg_over=30):
        self.timestamps = []
        self.avg_over = avg_over

    def tick(self):
        now = time.perf_counter()
        self.timestamps.append(now)
        # Keep only the last N timestamps
        if len(self.timestamps) > self.avg_over:
            self.timestamps = self.timestamps[-self.avg_over:]

    def fps(self):
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        if elapsed == 0:
            return 0.0
        return (len(self.timestamps) - 1) / elapsed

Use it in your processing loop:

counter = FPSCounter(avg_over=30)

while True:
    frame = picam2.capture_array()
    # ... process frame ...
    counter.tick()
    print(f"\rFPS: {counter.fps():.1f}", end="", flush=True)

Averaging over 30 frames smooths the jitter

Single-frame timing fluctuates wildly because of OS scheduling, garbage collection, and thermal throttling. Averaging over 30 frames gives you a stable number you can actually make engineering decisions against. If you need per-frame profiling, use time.perf_counter() deltas, but report the rolling average.
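For the per-frame profiling case, one way to combine perf_counter() deltas with a rolling average is a small timer per pipeline stage. The StageTimer class below is a hypothetical sketch, not part of any library, and the summation inside the loop is only a placeholder for a real stage.

```python
import time
from collections import deque


class StageTimer:
    """Rolling average of per-frame durations for one pipeline stage."""

    def __init__(self, window=30):
        # deque drops the oldest sample automatically once the window is full
        self.samples = deque(maxlen=window)
        self._start = None

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        self.samples.append(time.perf_counter() - self._start)

    def average_ms(self):
        if not self.samples:
            return 0.0
        return 1000.0 * sum(self.samples) / len(self.samples)


blur_timer = StageTimer(window=30)
for _ in range(100):
    blur_timer.start()
    sum(range(1000))  # placeholder for a real stage such as cv2.GaussianBlur
    blur_timer.stop()

print(f"blur: {blur_timer.average_ms():.3f} ms "
      f"(avg of last {len(blur_timer.samples)} frames)")
```

One timer per stage (capture, convert, blur, detect) shows which step is consuming the frame budget, while the window keeps a single slow frame from distorting the number.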

Resource Cleanup: The Part Everyone Forgets

Cameras are system resources. If your script crashes without releasing the camera, the device stays locked. The next time you try to open it, you get "resource busy" or a silent failure. This is especially painful with the Pi Camera Module, which has a single CSI bus — no fallback device to try.

# USB camera cleanup
cap = cv2.VideoCapture(0)
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # process...
finally:
    cap.release()
    cv2.destroyAllWindows()

# Pi Camera cleanup
from picamera2 import Picamera2

picam2 = Picamera2()
picam2.configure(picam2.create_video_configuration())
picam2.start()
try:
    while True:
        frame = picam2.capture_array()
        # process...
except KeyboardInterrupt:
    pass
finally:
    picam2.stop()
    picam2.close()

Always use try/finally for camera resources

Ctrl+C triggers KeyboardInterrupt, which skips any cleanup code after the loop. Without a try/finally block, your camera stays locked. On a headless Pi running via SSH, a locked camera means you have to reboot the Pi or manually kill the Python process with kill -9. Neither is acceptable in a production system.

The pattern is the same as database connections, file handles, and network sockets: acquire in setup, release in finally, never assume your loop will exit cleanly.
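The guarantee can be checked without hardware. The sketch below uses a hypothetical FakeCamera and simulates Ctrl+C by raising KeyboardInterrupt from inside the processing step, then confirms that release() still ran.

```python
class FakeCamera:
    """Hypothetical stand-in for a camera; records whether it was released."""

    def __init__(self):
        self.released = False

    def read(self):
        return True, "frame"

    def release(self):
        self.released = True


def run_capture(camera, process):
    """Grab frames until the stream ends or the user interrupts."""
    frames_done = 0
    try:
        while True:
            ret, frame = camera.read()
            if not ret:
                break
            process(frame)
            frames_done += 1
    except KeyboardInterrupt:
        pass  # Ctrl+C is a normal way to stop; fall through to cleanup
    finally:
        camera.release()  # runs on break, Ctrl+C, or any other exception
    return frames_done


def interrupt_on_third_frame():
    seen = 0

    def process(frame):
        nonlocal seen
        seen += 1
        if seen == 3:
            raise KeyboardInterrupt  # simulate Ctrl+C mid-frame

    return process


camera = FakeCamera()
completed = run_capture(camera, interrupt_on_third_frame())
print(completed, camera.released)  # 2 True
```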

For production deployments, I go further than try/finally. I wrap the entire camera lifecycle in a context manager so that resource cleanup is automatic:

import time
from contextlib import contextmanager
from picamera2 import Picamera2

@contextmanager
def pi_camera(width=640, height=480, warmup=2):
    picam2 = Picamera2()
    config = picam2.create_video_configuration(
        main={"size": (width, height), "format": "RGB888"}
    )
    picam2.configure(config)
    picam2.start()
    time.sleep(warmup)
    try:
        yield picam2
    finally:
        picam2.stop()
        picam2.close()

# Usage — cleanup is guaranteed
with pi_camera(640, 480) as cam:
    while True:
        frame = cam.capture_array()
        # process...

This pattern means you never forget cleanup, even if an exception you didn't anticipate terminates your processing loop. The with block guarantees stop() and close() run regardless of how the block exits.
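The same idea carries over to USB cameras. The sketch below is one possible shape, not an established API: usb_camera is a hypothetical helper that takes the capture factory as an argument (cv2.VideoCapture in real use) so the example runs here against a FakeCapture stand-in.

```python
from contextlib import contextmanager


@contextmanager
def usb_camera(open_capture, index=0, warmup_frames=15):
    """Open a capture device, discard warm-up frames, and always release it.

    `open_capture` is any callable returning a VideoCapture-like object,
    for example cv2.VideoCapture.
    """
    cap = open_capture(index)
    try:
        if not cap.isOpened():
            raise RuntimeError(f"Cannot open camera {index}")
        for _ in range(warmup_frames):
            cap.read()
        yield cap
    finally:
        cap.release()  # runs even if opening or warm-up failed


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture, used to exercise the sketch."""

    def __init__(self, index):
        self.index = index
        self.reads = 0
        self.released = False

    def isOpened(self):
        return True

    def read(self):
        self.reads += 1
        return True, f"frame-{self.reads}"

    def release(self):
        self.released = True


with usb_camera(FakeCapture, index=0) as cap:
    ret, frame = cap.read()

print(frame, cap.released)  # frame-16 True
```

With OpenCV installed, the call becomes with usb_camera(cv2.VideoCapture, index=0) as cap: and the rest is unchanged.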

Key takeaway

A camera pipeline has three engineering constraints: resolution (determines CPU load), frame rate (determines responsiveness), and cleanup (determines reliability). Get all three right or your system fails in production in ways that are hard to diagnose remotely.

USB vs Pi Camera: When to Use Each

My rule: prototype with a USB camera because it's faster to set up. Ship with the Pi Camera Module because it's faster to run. The CSI bus gives you consistent frame delivery that doesn't compete with your USB keyboard, mouse, and network adapter for bandwidth.

There's also a practical consideration around multi-camera setups. Most Pi boards have one CSI port (the Pi 5 has two, on smaller 22-pin connectors that need the matching cable), so once you need more camera angles than you have CSI ports, USB cameras are the simplest option. A common pattern for monitoring projects is one Pi Camera as the primary high-quality feed and one or two USB cameras for supplementary angles. Each USB camera gets its own VideoCapture instance with its own device index (0, 1, 2), and you process them in sequence or with threading.
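For the threading route, one common arrangement is a background reader per camera that keeps only the newest frame, so a slow consumer never works through a backlog of stale frames. The sketch below is illustrative: LatestFrameReader and FakeCapture are hypothetical names, and the stand-in streams end by themselves so the example terminates.

```python
import threading


class LatestFrameReader:
    """Reads one camera on a background thread and keeps only the newest frame."""

    def __init__(self, cap):
        self.cap = cap
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._latest = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stop.is_set():
            ret, frame = self.cap.read()
            if not ret:
                break  # camera disconnected or stream ended
            with self._lock:
                self._latest = frame

    def latest(self):
        with self._lock:
            return self._latest

    def wait(self):
        """Block until the stream ends by itself."""
        self._thread.join()

    def stop(self):
        """Ask the reader to finish after the current read, then wait for it."""
        self._stop.set()
        self._thread.join()


class FakeCapture:
    """Hypothetical stand-in that delivers a fixed list of frames, then fails."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


readers = [
    LatestFrameReader(FakeCapture(["a1", "a2", "a3"])).start(),
    LatestFrameReader(FakeCapture(["b1", "b2"])).start(),
]
for reader in readers:
    reader.wait()

print([reader.latest() for reader in readers])  # ['a3', 'b2']
```

With real hardware each reader would wrap its own cv2.VideoCapture(index); call stop() on shutdown and release each capture afterwards.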

One more thing worth mentioning: the Pi Camera Module v3 has a built-in autofocus motor. For projects where the camera-to-subject distance varies (monitoring a doorway where people pass at different distances, for instance), the v3's autofocus is a significant advantage over fixed-focus USB cameras. You control it through picamera2:

# Set autofocus mode
picam2.set_controls({"AfMode": 2})   # 2 = continuous autofocus
# Or manual focus. LensPosition is in dioptres (1 / distance in metres),
# so 4.0 focuses at roughly 0.25 m
picam2.set_controls({"AfMode": 0, "LensPosition": 4.0})

For fixed-distance applications (a camera mounted above a conveyor belt at a known height, for example), manual focus locked to the correct distance gives sharper images than autofocus, because autofocus occasionally hunts and produces soft frames during transitions. Soft frames during autofocus hunting are the kind of intermittent failure that makes detection accuracy look random: your algorithm works on 95% of frames and fails on the 5% where the camera was refocusing. Locking manual focus eliminates that entire class of problem.
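To lock focus at a known distance you need the matching LensPosition value. According to the Picamera2 manual, LensPosition is expressed in dioptres, the reciprocal of the focus distance in metres, so the conversion is a single division. The helper name below is hypothetical, and the usable range depends on the lens; picam2.camera_controls["LensPosition"] should report the limits for your module.

```python
def lens_position_for_distance(distance_m):
    """Convert a subject distance in metres to a LensPosition value in dioptres."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance_m


# A camera mounted 0.5 m above a conveyor belt
position = lens_position_for_distance(0.5)
print(position)  # 2.0

# Assumed use with an already configured Picamera2 instance:
# picam2.set_controls({"AfMode": 0, "LensPosition": position})
```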

What to Do Monday Morning

Capture frames from a USB camera

Plug in any USB webcam and run the basic VideoCapture loop from this chapter. Verify the actual resolution matches what you requested by reading it back with cap.get(). Save 10 frames to disk to confirm the capture pipeline works end-to-end.

Set up picamera2 with the Pi Camera Module

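Connect the CSI ribbon cable (power off first) and run the still-capture example. On current Raspberry Pi OS the libcamera stack detects the camera automatically; if raspi-config offers a legacy camera option, leave it disabled, because it switches to the old stack that picamera2 does not use. Confirm you get a valid NumPy array with the correct shape. Remember that "RGB888" arrays from picamera2 are already in OpenCV's B, G, R order, so they need no conversion before you pass them to OpenCV functions.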

Measure your baseline FPS

Run the video streaming loop with the FPS counter class. First measure with an empty processing body (just grab and count). Then add grayscale conversion. Then add Gaussian blur. Record the FPS drop at each stage. This is your Frame Budget in action.

Break the cleanup to understand it

Start a camera capture script, then kill it with Ctrl+C without a try/finally block. Try to restart the script. Observe the error. Then add proper cleanup and confirm the camera releases correctly on interrupt. This experience is worth more than any documentation.

Benchmark USB vs Pi Camera at the same resolution

Run the same processing pipeline (grayscale + blur + edge detection) at 640x480 on both a USB camera and the Pi Camera Module. Compare FPS. The Pi Camera should win by 20–40% on sustained throughput because it doesn't share the USB bus.