The typical architecture I see in beginner Pi projects looks like this: a sensor reads a value, the Pi packages it into a JSON blob, and a script fires it at a cloud endpoint — an AWS Lambda, a Google Cloud Function, a self-hosted server somewhere. Every reading. Every second. Raw, unprocessed, and duplicated. The Pi is doing the work of a $2 WiFi-enabled microcontroller, and the cloud server is doing all the thinking. The architecture treats a quad-core Linux computer with 8 GB of RAM as a serial-to-HTTP bridge.
That's backwards.
The Pi shouldn't just harvest data and forward it. It should serve data locally, process it at the edge, and only send what matters upstream.
Your Pi runs a full operating system. It can host a web server, a database, a cron daemon, and a Python runtime. It can filter, aggregate, deduplicate, and serve data locally, without a network round trip. It can expose an API that other devices on the network query directly. It can receive commands over HTTP and act on them in real time, with latency measured in milliseconds instead of the 50–200 ms you'd get from a cloud round trip.
This chapter is about turning your Pi into an edge API server — a device that doesn't just collect data but serves it, processes it, and makes decisions locally.
Flask is the right starting point for a Pi API. It's minimal, well-documented, and runs comfortably on a Pi Zero. A Flask application that serves sensor data and accepts commands fits in about 100 lines of Python, simulation fallback included.
The network is the bottleneck, not the Pi. Every reading you can process locally is a reading you don't pay to transmit, store, or query remotely.
Here is a complete Flask application that reads GPIO state, serves sensor data, and controls an LED via HTTP:
#!/usr/bin/env python3
"""edge_api.py: Flask API server running on a Raspberry Pi."""
from flask import Flask, jsonify, request
from datetime import datetime, timezone
import time

# ── Try real GPIO; fall back to simulation for development ──────────
try:
    from gpiozero import LED, Button, CPUTemperature
    led = LED(17)
    button = Button(27)
    cpu = CPUTemperature()
    SIMULATED = False
except Exception:
    SIMULATED = True
    print("GPIO not available, running in simulated mode")

    class FakeLED:
        def __init__(self):
            self.is_lit = False
        def on(self):
            self.is_lit = True
        def off(self):
            self.is_lit = False
        def toggle(self):
            self.is_lit = not self.is_lit

    class FakeButton:
        is_pressed = False

    class FakeCPU:
        temperature = 42.0

    led = FakeLED()
    button = FakeButton()
    cpu = FakeCPU()

app = Flask(__name__)

# ── In-memory storage for recent readings ───────────────────────────
readings = []
MAX_READINGS = 1000

def get_current_reading():
    """Capture a sensor snapshot."""
    return {
        "cpu_temp_c": round(cpu.temperature, 1),
        "led_state": "on" if led.is_lit else "off",
        "button_pressed": button.is_pressed,
        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "simulated": SIMULATED,
    }
# ── GET endpoints ───────────────────────────────────────────────────
@app.route("/api/status", methods=["GET"])
def status():
    """Current device status: single snapshot."""
    return jsonify(get_current_reading())

@app.route("/api/readings", methods=["GET"])
def get_readings():
    """Return recent readings. Optional ?limit=N query param."""
    limit = request.args.get("limit", 50, type=int)
    # Clamp to 1..MAX_READINGS; readings[-0:] would return the whole list
    limit = max(1, min(limit, MAX_READINGS))
    recent = readings[-limit:]
    return jsonify({
        "count": len(recent),
        "readings": recent,
    })

# Record the start time at import so it also exists under a WSGI server
# such as Gunicorn, which never runs the __main__ block.
app.config.setdefault("START_TIME", time.time())

@app.route("/api/health", methods=["GET"])
def health():
    """Health check for monitoring systems."""
    return jsonify({
        "status": "healthy",
        "uptime_seconds": round(time.time() - app.config["START_TIME"], 1),
        "readings_stored": len(readings),
    })
# ── POST endpoints (commands) ───────────────────────────────────────
@app.route("/api/led", methods=["POST"])
def control_led():
    """Control the LED. Body: {"action": "on" | "off" | "toggle"}"""
    data = request.get_json(silent=True)
    if not isinstance(data, dict):
        data = {}  # missing, malformed, or non-object JSON body
    action = str(data.get("action", "")).lower()
    if action == "on":
        led.on()
    elif action == "off":
        led.off()
    elif action == "toggle":
        led.toggle()
    else:
        return jsonify({"error": f"Unknown action: {action}",
                        "valid": ["on", "off", "toggle"]}), 400
    return jsonify({
        "action": action,
        "led_state": "on" if led.is_lit else "off",
    })
# ── Background: collect readings periodically ──────────────────────
def collect_reading():
    """Store a reading (called by the scheduler or manually)."""
    reading = get_current_reading()
    readings.append(reading)
    if len(readings) > MAX_READINGS:
        readings.pop(0)
    return reading

if __name__ == "__main__":
    import threading
    app.config["START_TIME"] = time.time()

    # Background thread: collect a reading every 10 seconds.
    # This block does not run when a WSGI server imports the module,
    # so the collector only starts when the script is run directly.
    def collector():
        while True:
            collect_reading()
            time.sleep(10)

    t = threading.Thread(target=collector, daemon=True)
    t.start()

    # Run Flask on all interfaces, port 5000
    app.run(host="0.0.0.0", port=5000, debug=False)
Save this as edge_api.py, install Flask in a virtualenv, and run it:
python3 -m venv ~/edge-env
source ~/edge-env/bin/activate
pip install flask gpiozero
python edge_api.py
From any machine on the same network:
# Get current status
curl http://pi-sensor-01.local:5000/api/status
# Turn the LED on
curl -X POST http://pi-sensor-01.local:5000/api/led \
-H "Content-Type: application/json" \
-d '{"action": "on"}'
# Get the last 10 readings
curl http://pi-sensor-01.local:5000/api/readings?limit=10
A Flask API on the Pi turns a data-collection device into a queryable, controllable edge server. Other devices on the network can read state and issue commands without going through the cloud.
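The curl calls above translate directly to Python for any other device on the network. The following is a minimal client sketch using only the standard library. The helper names (`build_led_request`, `call`, `get_status`) are hypothetical, and the hostname is simply the one used in the curl examples.

```python
import json
import urllib.request

BASE_URL = "http://pi-sensor-01.local:5000"  # hostname from the curl examples


def build_led_request(action, base_url=BASE_URL):
    """Build (but do not send) the POST request for /api/led."""
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/led",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request_or_url, timeout=5):
    """Send a request (or plain URL for GET) and decode the JSON reply."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_status(base_url=BASE_URL):
    """GET /api/status and return the decoded snapshot."""
    return call(f"{base_url}/api/status")
```

With the server running, `get_status()` returns the same snapshot as the first curl command, and `call(build_led_request("on"))` turns the LED on. Splitting request building from sending keeps the request shape easy to inspect without touching the network.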
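Flask's built-in server is a development server. It has no process management, no worker supervision, and it prints a warning telling you not to use it in production. It's right to warn you. Since Flask 1.0 it handles requests in threads by default, but it is not built for stability, efficiency, or security: nothing restarts it when it crashes, and nothing limits how many slow or stalled connections pile up. A client that opens connections and never closes them can tie up threads and memory until the API stops responding.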
Gunicorn solves this. It's a production-grade WSGI server that runs multiple worker processes, handles connection management, and restarts crashed workers automatically:
pip install gunicorn
# Run with 2 workers (good for Pi 4; use 1 for Pi Zero)
gunicorn --bind 0.0.0.0:5000 --workers 2 --timeout 30 edge_api:app
Wrap it in a systemd service for automatic startup:
# /etc/systemd/system/edge-api.service
[Unit]
Description=Edge API server (Flask + Gunicorn)
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=hesham
WorkingDirectory=/home/hesham/edge-api
ExecStart=/home/hesham/edge-env/bin/gunicorn \
--bind 0.0.0.0:5000 \
--workers 2 \
--timeout 30 \
--access-logfile - \
edge_api:app
Restart=on-failure
RestartSec=5
Environment=PYTHONUNBUFFERED=1
[Install]
WantedBy=multi-user.target
sudo systemctl enable edge-api.service
sudo systemctl start edge-api.service
The rule of thumb for Gunicorn workers is (2 × CPU cores) + 1. On a Pi 4 (4 cores), that's 9 workers, which is too many. Each worker is a full Python process consuming 30–50 MB of RAM. Each worker also keeps its own copy of in-memory state such as the readings list, so two workers can return different results for /api/readings. On a Pi 4 with 2 GB of RAM, I use 2–3 workers. On a Pi Zero 2 W, I use 1. More workers than your RAM can support triggers swapping, and swap on an SD card is catastrophically slow.
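One way to make that trade-off explicit is to cap the CPU rule of thumb by a RAM budget. This is only a sketch: the function name `worker_count` and the idea of a fixed `ram_budget_mb` for the API are assumptions, and the 50 MB default is the upper end of the per-worker figure quoted above.

```python
def worker_count(cpu_cores: int, ram_budget_mb: int, per_worker_mb: int = 50) -> int:
    """Gunicorn worker count: the CPU rule of thumb, capped by a RAM budget."""
    by_cpu = 2 * cpu_cores + 1               # (2 x cores) + 1
    by_ram = ram_budget_mb // per_worker_mb  # workers that fit in the budget
    return max(1, min(by_cpu, by_ram))       # always run at least one worker


print(worker_count(4, 150))  # → 3
print(worker_count(4, 50))   # → 1
```

The RAM budget is a choice, not a measurement. Check actual per-worker memory on your own Pi before trusting the number.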
If you're starting a new project and your Pi runs Python 3.8+, FastAPI is worth considering. It generates interactive API documentation automatically, validates request bodies with Pydantic, and supports async handlers natively. The performance difference over Flask on a Pi is negligible — both are I/O-bound by network and GPIO speed, not by the framework — but the developer experience is meaningfully better.
#!/usr/bin/env python3
"""edge_api_fast.py: FastAPI version of the edge API."""
from datetime import datetime, timezone
from enum import Enum

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(
    title="Pi Edge API",
    description="Sensor data and device control on a Raspberry Pi",
    version="1.0.0",
)

class LEDAction(str, Enum):
    on = "on"
    off = "off"
    toggle = "toggle"

class LEDCommand(BaseModel):
    action: LEDAction

# Placeholder state; in production, read and drive a gpiozero LED instead
led_is_lit = False

@app.get("/api/status")
async def status():
    return {
        "cpu_temp_c": 42.0,  # replace with real sensor read
        "led_state": "on" if led_is_lit else "off",
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
    }

@app.post("/api/led")
async def control_led(command: LEDCommand):
    global led_is_lit
    if command.action is LEDAction.toggle:
        led_is_lit = not led_is_lit
    else:
        led_is_lit = command.action is LEDAction.on
    return {"action": command.action, "led_state": "on" if led_is_lit else "off"}
pip install fastapi uvicorn
# Run in production with uvicorn
uvicorn edge_api_fast:app --host 0.0.0.0 --port 5000
Navigate to http://pi-sensor-01.local:5000/docs and you get a full Swagger UI — interactive documentation where you can test every endpoint from a browser. For teams where multiple people interact with the Pi's API, this auto-documentation saves hours of "what parameters does this endpoint accept?" conversations.
FastAPI generates interactive API documentation automatically. For a team project, that Swagger page eliminates every "what does this endpoint expect?" question before it's asked.
If you want to build a browser-based dashboard that calls your Pi's API directly — a React app, a plain HTML page with fetch calls, anything running in a browser — you'll hit CORS errors immediately. Browsers block JavaScript from making requests to a different origin (your Pi's IP) unless the server explicitly allows it.
pip install flask-cors
from flask_cors import CORS
app = Flask(__name__)
CORS(app, resources={r"/api/*": {"origins": "*"}})
Setting "origins": "*" lets JavaScript on any web page call your Pi's API, as long as the browser running that page can reach the Pi. For a home lab, this is fine. For a Pi connected to actuators (motors, relays, locks), restrict CORS to the specific origins that should have access, for example "origins": ["http://192.168.1.100:3000"].
An unauthenticated API on your local network means anyone who connects to your WiFi — a guest, a compromised IoT device, a neighbor on a shared network — can toggle your relays, read your sensor data, or issue arbitrary commands. The minimum viable boundary is an API key.
from functools import wraps
from flask import request, jsonify

API_KEY = "your-secret-key-here"  # In production, load from env var

def require_api_key(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        key = request.headers.get("X-API-Key")
        if key != API_KEY:
            return jsonify({"error": "Invalid or missing API key"}), 401
        return f(*args, **kwargs)
    return decorated

@app.route("/api/led", methods=["POST"])
@require_api_key
def control_led():
    # ... same as before
    pass
# Authenticated request
curl -X POST http://pi-sensor-01.local:5000/api/led \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret-key-here" \
-d '{"action": "on"}'
This is not industrial-grade security. It's a shared secret in a header. But it's the difference between "anyone on the network can control my hardware" and "only clients with the key can control my hardware." For a home lab or a small deployment, that difference matters.
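The comment in the decorator says to load the key from an environment variable in production. The sketch below shows one way to do that with only the standard library, along with generating a random key and comparing keys in constant time. The variable name `EDGE_API_KEY` and the three helper names are hypothetical. Constant-time comparison is a hardening step added here, not something the decorator above does.

```python
import hmac
import os
import secrets


def generate_api_key(n_bytes: int = 32) -> str:
    """Return a random URL-safe key; run once and store it outside the code."""
    return secrets.token_urlsafe(n_bytes)


def load_api_key(env=os.environ, var: str = "EDGE_API_KEY") -> str:
    """Read the key from the environment; fail loudly if it is missing."""
    key = env.get(var, "")
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key


def keys_match(presented, expected: str) -> bool:
    """Compare keys in constant time; a missing header (None) never matches."""
    return hmac.compare_digest((presented or "").encode(), expected.encode())
```

In the decorator, `key != API_KEY` would become `not keys_match(key, API_KEY)`, with `API_KEY = load_api_key()` at startup. Under systemd, the variable could be supplied with an `Environment=` or `EnvironmentFile=` line in the unit file.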
An API key in a custom header is the minimum viable security boundary for a Pi API. It takes ten lines of code and prevents casual unauthorized access to hardware controls.
The deeper reason to run APIs on the Pi — rather than forwarding everything to the cloud — is reliability. Networks fail. ISPs go down. Cloud services have outages. A Pi that depends on a cloud endpoint to make decisions stops making decisions when the network drops. A Pi that processes data locally and serves it locally keeps working.
I've seen this pattern play out. A greenhouse monitoring system sends every sensor reading to a cloud dashboard. The system works beautifully for months. Then the rural ISP has a twelve-hour outage. The cloud dashboard shows a gap in the data. Worse, the ventilation triggers, which depend on cloud-side logic evaluating temperature thresholds, stop firing. The greenhouse overheats. The crop damage costs more than the entire monitoring system.
If the Pi had processed the temperature threshold locally and controlled the ventilation relay directly, the ISP outage would have been invisible. The dashboard would have a gap, but the plants would be alive.
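A local threshold check of that kind can be very small. The sketch below is one possible shape, not the greenhouse system's actual logic: the class name `VentController`, the `set_relay` callable, and both temperature thresholds are assumptions. It adds a hysteresis band (separate on and off thresholds) so the relay does not chatter when the temperature hovers near the limit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VentController:
    """Switch ventilation locally; no network needed for the decision."""
    set_relay: Callable[[bool], None]  # hypothetical relay driver: True = vent on
    on_above_c: float = 30.0           # assumed switch-on threshold
    off_below_c: float = 27.0          # assumed switch-off threshold
    vent_on: bool = False

    def update(self, temp_c: float) -> bool:
        """Feed one temperature reading; return the resulting vent state."""
        if not self.vent_on and temp_c >= self.on_above_c:
            self.vent_on = True
            self.set_relay(True)
        elif self.vent_on and temp_c <= self.off_below_c:
            self.vent_on = False
            self.set_relay(False)
        return self.vent_on
```

On a Pi, `set_relay` could wrap a gpiozero output device. Calling `update()` from the same loop that collects readings keeps the decision on the device, so an ISP outage only affects the dashboard.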
The edge-first architecture doesn't mean you never send data to the cloud. It means the Pi processes data first, makes time-critical decisions locally, and sends only what matters upstream — summaries, alerts, aggregated metrics. A sensor that reads every second generates 86,400 readings per day. An edge-first Pi sends a 5-minute average every 5 minutes — 288 data points — and an immediate alert when a threshold is crossed. The cloud gets the information it needs. The network carries 0.3% of the raw traffic. The Pi handles the rest.
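The arithmetic behind those figures is easy to check and to adapt to other intervals. A small worked example follows. It counts messages rather than bytes and ignores the occasional alert; `daily_points` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_points(interval_s: int) -> int:
    """Number of data points per day at one point every interval_s seconds."""
    return SECONDS_PER_DAY // interval_s


raw = daily_points(1)           # one reading per second
summarized = daily_points(300)  # one summary per 5 minutes
ratio = summarized / raw

print(f"{raw} raw readings vs {summarized} summaries: {ratio:.2%} of the traffic")
# → 86400 raw readings vs 288 summaries: 0.33% of the traffic
```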
Here is the concrete pattern for edge aggregation — a function that buffers raw readings and periodically sends only the statistical summary upstream:
import statistics
import requests
import time

# ── Edge aggregation buffer ─────────────────────────────────────────
buffer = []
FLUSH_INTERVAL = 300  # 5 minutes
CLOUD_ENDPOINT = "https://your-cloud.example.com/api/ingest"

def add_reading(value):
    """Buffer a raw reading locally."""
    buffer.append({"value": value, "timestamp": time.time()})

def flush_to_cloud():
    """Send a statistical summary, not the raw data."""
    if not buffer:
        return
    values = [r["value"] for r in buffer]
    summary = {
        "mean": round(statistics.mean(values), 2),
        "min": round(min(values), 2),
        "max": round(max(values), 2),
        "stdev": round(statistics.stdev(values), 2) if len(values) > 1 else 0,
        "count": len(values),
        "period_start": buffer[0]["timestamp"],
        "period_end": buffer[-1]["timestamp"],
    }
    try:
        response = requests.post(CLOUD_ENDPOINT, json=summary, timeout=10)
        response.raise_for_status()  # treat HTTP 4xx/5xx as a failed upload too
    except requests.RequestException as e:
        print(f"Cloud upload failed, data retained locally: {e}")
        return  # don't clear buffer if upload failed
    buffer.clear()
The key insight in this code is the return without clearing the buffer when the upload fails. If the network is down, the Pi retains the raw data and tries again on the next flush cycle. Nothing is lost as long as the process keeps running, because the buffer lives in memory. The cloud eventually gets the summary when connectivity returns, and that summary covers the whole outage period. Compare this to the naive approach of firing every reading at a cloud endpoint and silently dropping the ones that fail, which is exactly what happens in most beginner Pi projects.
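The aggregation code defines FLUSH_INTERVAL but leaves the scheduling to the reader. One way to drive the "next flush cycle" is a small daemon thread, sketched below with the standard library only. The helper name `start_periodic` is hypothetical, and a cron job or systemd timer would work as well.

```python
import threading


def start_periodic(task, interval_s, stop_event=None):
    """Call task() every interval_s seconds on a daemon thread until stopped."""
    stop_event = stop_event or threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop_event is set,
        # so the loop sleeps between runs and exits promptly on stop.
        while not stop_event.wait(interval_s):
            try:
                task()
            except Exception as exc:  # one failed run must not kill the loop
                print(f"periodic task failed: {exc}")

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop_event, thread
```

With the aggregation code above, `start_periodic(flush_to_cloud, FLUSH_INTERVAL)` would attempt an upload every five minutes. Calling `.set()` on the returned event stops the loop.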
Edge-first means the Pi processes data locally, makes time-critical decisions without the network, and sends only statistical summaries upstream. The network carries 0.3% of the raw traffic. The Pi handles the rest.
Copy edge_api.py to your Pi, install Flask in a virtualenv, and run it. From your laptop, use curl to hit /api/status, /api/readings, and /api/led. Confirm the LED responds to POST commands (or that simulated mode reports state changes correctly).
Install Gunicorn, create the systemd service file, enable it, and reboot the Pi. After reboot, confirm the API is accessible. Check journalctl -u edge-api.service for any startup errors.
Implement the require_api_key decorator on any endpoint that controls hardware (LED, relay, motor). Test that requests without the key get a 401 response. Keep read endpoints (status, readings) open for now — or protect them too if your network is shared.
If you currently send raw sensor data to a cloud service, modify the flow: process the data on the Pi, send only aggregated summaries or threshold alerts upstream. Measure the reduction in network traffic and cloud API calls.
Install FastAPI and uvicorn. Create a single endpoint. Open /docs in a browser and interact with the Swagger UI. Decide whether the auto-documentation justifies switching from Flask for your next project.
The network is the bottleneck, not the Pi. Every reading you can process locally is a reading you don't pay to transmit, store, or query remotely.