Create a real-time object tracking camera with TensorFlow and Raspberry Pi

Get started with machine learning by building a portable computer vision and motion tracking system on a budget.
193 readers like this.
Vector, generic Raspberry Pi board

Are you just getting started with machine/deep learning, TensorFlow, or Raspberry Pi?

I created rpi-deep-pantilt as an interactive demo of object detection in the wild, and in this article, I'll show you how to reproduce the video below, which depicts a camera panning and tilting to track my movement across a room.

Real-time tracking setup

Raspberry Pi 4GB, Pi Camera v2.1, Pimoroni Pan-Tilt HAT, Coral Edge TPU USB Accelerator

This article will cover:

  1. Build materials and hardware assembly instructions.
  2. Deploying a TensorFlow Lite object-detection model (MobileNetV3-SSD) to a Raspberry Pi.
  3. Sending tracking instructions to pan/tilt servo motors using a proportional–integral–derivative (PID) controller.
  4. Accelerating inferences of any TensorFlow Lite model with Coral's USB Edge TPU Accelerator and Edge TPU Compiler.

Terms and references

  • Raspberry Pi: A small, affordable computer popular with educators, hardware hobbyists, and robot enthusiasts.
  • Raspbian: The Raspberry Pi Foundation's official operating system for the Pi. Raspbian is derived from Debian Linux.
  • TensorFlow: An open source framework for dataflow programming used for machine learning and deep neural learning.
  • TensorFlow Lite: An open source framework for deploying TensorFlow models on mobile and embedded devices.
  • Convolutional neural network: CNN is a type of neural network architecture that is well-suited for image classification and object detection tasks.
  • Single-shot detector: SSD is a type of CNN architecture specialized for real-time object detection, classification, and bounding box localization.
  • MobileNetV3: A state-of-the-art computer vision model optimized for performance on modest mobile phone processors.
  • MobileNetV3-SSD: An SSD based on MobileNet architecture. This tutorial will use MobileNetV3-SSD models available through TensorFlow's object-detection model zoo.

    Comparison of computer vision neural networks

    Comparison of computer vision neural networks

  • Edge TPU: a tensor processing unit (TPU) is an integrated circuit for accelerating computations performed by TensorFlow. The Edge TPU was developed with a small footprint for mobile and embedded devices "at the edge."
Cloud TPUv1
Cloud TPUv2
Edge TPUs

Cloud TPUs (left and center) accelerate TensorFlow model training and inference. Edge TPUs (right) accelerate inferences in mobile devices.

Build list



Looking for a project with fewer moving pieces? Check out Portable Computer Vision: TensorFlow 2.0 on a Raspberry Pi to create a hand-held image classifier.

Set up the Raspberry Pi

There are two ways you can install Raspbian to your MicroSD card:

  1. NOOBS ("New Out Of Box Software") is a GUI operating system installation manager. If this is your first Raspberry Pi project, I'd recommend starting here.
  2. Write the Raspbian image to an SD card.

This tutorial and supporting software were written using Raspbian (Buster). If you're using a different version of Raspbian or another platform, you'll probably experience some pains.

Before proceeding, you'll need to:

Install software

  1. Install system dependencies:
    $ sudo apt-get update && sudo apt-get install -y python3-dev libjpeg-dev libatlas-base-dev raspi-gpio libhdf5-dev python3-smbus
  2. Create a new project directory:
    $ mkdir rpi-deep-pantilt && cd rpi-deep-pantilt
  3. Create a new virtual environment:
    $ python3 -m venv .venv
  4. Activate the virtual environment:
    $ source .venv/bin/activate && python3 -m pip install --upgrade pip
  5. Install TensorFlow 2.0 from a community-built wheel:
    $ pip install
  6. Install the rpi-deep-pantilt Python package:
    $ python3 -m pip install rpi-deep-pantilt

Assemble Pan-Tilt HAT hardware

If you purchased a pre-assembled Pan-Tilt HAT kit, you can skip to the next section. Otherwise, follow the steps in Assembling Pan-Tilt HAT before proceeding.

Connect the Pi Camera

  1. Turn off the Raspberry Pi.
  2. Locate the camera module between the USB module and HDMI modules.
  3. Unlock the black plastic clip by (gently) pulling upward.
  4. Insert the camera module's ribbon cable (with metal connectors facing away from the Ethernet/USB ports on a Raspberry Pi 4).
  5. Lock the black plastic clip.

Getting started with the Pi Camera

Enable the Pi Camera

  1. Turn the Raspberry Pi on.
  2. Run sudo raspi-config and select Interfacing Options from the Raspberry Pi Software Configuration Tool's main menu. Press Enter.

    Raspberry Pi Software Configuration Tool
  3. Select the Enable Camera menu option and press Enter.

    Enable Camera
  4. In the next menu, use the Right arrow key to highlight Enable and press Enter.

    Enable Raspberry Pi Camera

Test the Pan-Tilt HAT

Next, test the installation and setup of your Pan-Tilt HAT module.

  1. SSH into your Raspberry Pi.
  2. Activate your virtual environment:
    source .venv/bin/activate
  3. Run:
    rpi-deep-pantilt test pantilt
  4. Exit the test with Ctrl+C.

If you installed the HAT correctly, you should see both servos moving in a smooth sinusoidal motion while the test is running.

Pan-Tilt HAT in motion

Test the Pi Camera

Next, verify that the Pi Camera is installed correctly by starting the camera's preview overlay. The overlay will render on the Pi's primary display (HDMI).

  1. Plug your Raspberry Pi into an HDMI screen.
  2. SSH into your Raspberry Pi.
  3. Activate your virtual environment:
    $ source .venv/bin/activate
  4. Run:
    $ rpi-deep-pantilt test camera
  5. Exit the test with Ctrl+C.

If you installed the Pi Camera correctly, you should see footage from the camera rendered on your HDMI or composite display.

Test object detection

Next, verify you can run an object-detection model (MobileNetV3-SSD) on your Raspberry Pi.

  1. SSH into your Raspberry Pi.
  2. Activate your Virtual Environment:
    $ source .venv/bin/activate
  3. Run:
    $ rpi-deep-pantilt detect

Your Raspberry Pi should detect objects, attempt to classify them, and draw bounding boxes around them. Note: Only the following objects can be detected and tracked using the default MobileNetV3-SSD model.

$ rpi-deep-pantilt list-labels
[‘person’, ‘bicycle’, ‘car’, ‘motorcycle’, ‘airplane’, ‘bus’, ‘train’, ‘truck’, ‘boat’, ‘traffic light’, ‘fire hydrant’, ‘stop sign’, ‘parking meter’, ‘bench’, ‘bird’, ‘cat’, ‘dog’, ‘horse’, ‘sheep’, ‘cow’, ‘elephant’, ‘bear’, ‘zebra’, ‘giraffe’, ‘backpack’, ‘umbrella’, ‘handbag’, ‘tie’, ‘suitcase’, ‘frisbee’, ‘skis’, ‘snowboard’, ‘sports ball’, ‘kite’, ‘baseball bat’, ‘baseball glove’, ‘skateboard’, ‘surfboard’, ‘tennis racket’, ‘bottle’, ‘wine glass’, ‘cup’, ‘fork’, ‘knife’, ‘spoon’, ‘bowl’, ‘banana’, ‘apple’, ‘sandwich’, ‘orange’, ‘broccoli’, ‘carrot’, ‘hot dog’, ‘pizza’, ‘donut’, ‘cake’, ‘chair’, ‘couch’, ‘potted plant’, ‘bed’, ‘dining table’, ‘toilet’, ‘tv’, ‘laptop’, ‘mouse’, ‘remote’, ‘keyboard’, ‘cell phone’, ‘microwave’, ‘oven’, ‘toaster’, ‘sink’, ‘refrigerator’, ‘book’, ‘clock’, ‘vase’, ‘scissors’, ‘teddy bear’, ‘hair drier’, ‘toothbrush’]

Track objects at ~8FPS

This is the moment you've been waiting for! Take the following steps to track an object at roughly eight frames per second (FPS) using the Pan-Tilt HAT.

  1. SSH into your Raspberry Pi.
  2. Activate your virtual environment:
    $source .venv/bin/activate
  3. Run:
    $ rpi-deep-pantilt track

By default, this will track objects with the label person. You can track a different type of object using the --label parameter.

For example, to track a banana, you would run:

$ rpi-deep-pantilt track --label=banana

On a Raspberry Pi 4 (4GB), I benchmarked my model at roughly 8FPS.

INFO:root:FPS: 8.100870481091935
INFO:root:FPS: 8.130448201926173
INFO:root:FPS: 7.6518234817241355
INFO:root:FPS: 7.657477766009717
INFO:root:FPS: 7.861758172395542
INFO:root:FPS: 7.8549541944597
INFO:root:FPS: 7.907857699044301

Track objects in real-time with Edge TPU

You can accelerate model inference speed with Coral's USB Accelerator. The USB Accelerator contains an Edge TPU, which is an ASIC chip specialized for TensorFlow Lite operations. For more info, check out Getting started with the USB Accelerator.

  1. SSH into your Raspberry Pi.
  2. Install the Edge TPU runtime:
    $ echo "deb coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
    $ curl | sudo apt-key add -
    $ sudo apt-get update && sudo apt-get install libedgetpu1-std
  3. Plug in the Edge TPU (preferably into a USB 3.0 port). If your Edge TPU was already plugged in, remove and re-plug it so the udev device manager can detect it.
  4. Try the detect command with the --edge-tpu option. You should be able to detect objects in real-time!
    $ rpi-deep-pantilt detect --edge-tpu --loglevel=INFO

    Note that loglevel=INFO will show you the FPS at which objects are detected and bounding boxes are rendered to the Raspberry Pi Camera's overlay.

    You should see around ~24FPS, which is the rate at which frames are sampled from the Pi Camera into a frame buffer:

    INFO:root:FPS: 24.716493958392558
    INFO:root:FPS: 24.836166606505206
    INFO:root:FPS: 23.031063233367547
    INFO:root:FPS: 25.467177106703623
    INFO:root:FPS: 27.480438524486594
    INFO:root:FPS: 25.41399952505432
  5. Try the track command with the --edge-tpu option:
    $ rpi-deep-pantilt track --edge-tpu

Wrapping up

Congratulations! You're now the proud owner of a DIY object-tracking system, which uses a single-shot detector (a type of convolutional neural network) to classify and localize objects.

PID controller

The pan/tilt tracking system uses a proportional–integral–derivative (PID) controller to track the centroid of a bounding box smoothly.

PID Controller Architecture

TensorFlow model zoo

The models in this tutorial are derived from ssd_mobilenet_v3_small_coco and ssd_mobilenet_edgetpu_coco in the TensorFlow detection model zoo.

My models are available for download via GitHub releases notes in leigh-johnson/rpi-deep-pantilt.

I added the custom TFLite_Detection_PostProcess operation, which implements a variation of non-maximum suppression (NMS) on model output. NMS is a technique that filters many bounding box proposals using set operations.

Non-maximum Suppression (NMS)

Special thanks and acknowledgments

  • MobileNetEdgeTPU SSDLite contributors: Yunyang Xiong, Bo Chen, Suyog Gupta, Hanxiao Liu, Gabriel Bender, Mingxing Tan, Berkin Akin, Zhichao Lu, Quoc Le
  • MobileNetV3 SSDLite contributors: Bo Chen, Zhichao Lu, Vivek Rathod, Jonathan Huang
  • Adrian Rosebrock for writing Pan/tilt face tracking with a Raspberry Pi and OpenCV, which was the inspiration for this whole project
  • Jason Zaman for reviewing this article and early release candidates

This article was originally published on the Towards Data Science Medium channel and is reused with permission.

What to read next

7 favorite Raspberry Pi projects

Having recently co-authored a book about building things with the Raspberry Pi ( Raspberry Pi Hacks), I've spent a lot of the last couple of years talking about this credit…

User profile image.
Leigh is a Staff Machine Learning Engineer at Slack, and is recognized as a Google Developer Expert in Machine Learning. She is a GDG Cloud San Francisco and Django Girls organizer.

Comments are closed.

Creative Commons LicenseThis work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.