Create a real-time object tracking camera with TensorFlow and Raspberry Pi

Get started with machine learning by building a portable computer vision and motion tracking system on a budget.

Are you just getting started with machine/deep learning, TensorFlow, or Raspberry Pi?

I created rpi-deep-pantilt as an interactive demo of object detection in the wild, and in this article, I'll show you how to reproduce the video below, which depicts a camera panning and tilting to track my movement across a room.


Real-time tracking setup

Raspberry Pi 4GB, Pi Camera v2.1, Pimoroni Pan-Tilt HAT, Coral Edge TPU USB Accelerator

This article will cover:

  1. Build materials and hardware assembly instructions.
  2. Deploying a TensorFlow Lite object-detection model (MobileNetV3-SSD) to a Raspberry Pi.
  3. Sending tracking instructions to pan/tilt servo motors using a proportional–integral–derivative (PID) controller.
  4. Accelerating inferences of any TensorFlow Lite model with Coral's USB Edge TPU Accelerator and Edge TPU Compiler.

Terms and references

  • Raspberry Pi: A small, affordable computer popular with educators, hardware hobbyists, and robot enthusiasts.
  • Raspbian: The Raspberry Pi Foundation's official operating system for the Pi. Raspbian is derived from Debian Linux.
  • TensorFlow: An open source framework for dataflow programming, used for machine learning and deep learning.
  • TensorFlow Lite: An open source framework for deploying TensorFlow models on mobile and embedded devices.
  • Convolutional neural network: CNN is a type of neural network architecture that is well-suited for image classification and object detection tasks.
  • Single-shot detector: SSD is a type of CNN architecture specialized for real-time object detection, classification, and bounding box localization.
  • MobileNetV3: A state-of-the-art computer vision model optimized for performance on modest mobile phone processors.
  • MobileNetV3-SSD: An SSD based on MobileNet architecture. This tutorial will use MobileNetV3-SSD models available through TensorFlow's object-detection model zoo.


    Comparison of computer vision neural networks

  • Edge TPU: A tensor processing unit (TPU) is an integrated circuit for accelerating computations performed by TensorFlow. The Edge TPU was developed with a small footprint for mobile and embedded devices "at the edge."

Cloud TPUs (left and center) accelerate TensorFlow model training and inference. Edge TPUs (right) accelerate inferences in mobile devices.

Build list



Looking for a project with fewer moving pieces? Check out Portable Computer Vision: TensorFlow 2.0 on a Raspberry Pi to create a hand-held image classifier.

Set up the Raspberry Pi

There are two ways to install Raspbian on your microSD card:

  1. NOOBS ("New Out Of Box Software") is a GUI operating system installation manager. If this is your first Raspberry Pi project, I'd recommend starting here.
  2. Write the Raspbian image to an SD card.

This tutorial and supporting software were written using Raspbian (Buster). If you're using a different version of Raspbian or another platform, you may run into compatibility problems.

Before proceeding, you'll need to:

Install software

  1. Install system dependencies:
    $ sudo apt-get update && sudo apt-get install -y python3-dev libjpeg-dev libatlas-base-dev raspi-gpio libhdf5-dev python3-smbus
  2. Create a new project directory:
    $ mkdir rpi-deep-pantilt && cd rpi-deep-pantilt
  3. Create a new virtual environment:
    $ python3 -m venv .venv
  4. Activate the virtual environment and upgrade pip:
    $ source .venv/bin/activate && python3 -m pip install --upgrade pip
  5. Install TensorFlow 2.0 from a community-built wheel:
    $ pip install
  6. Install the rpi-deep-pantilt Python package:
    $ python3 -m pip install rpi-deep-pantilt

Assemble Pan-Tilt HAT hardware

If you purchased a pre-assembled Pan-Tilt HAT kit, you can skip to the next section. Otherwise, follow the steps in Assembling Pan-Tilt HAT before proceeding.

Connect the Pi Camera

  1. Turn off the Raspberry Pi.
  2. Locate the camera module port between the USB and HDMI ports.
  3. Unlock the black plastic clip by (gently) pulling upward.
  4. Insert the camera module's ribbon cable (with metal connectors facing away from the Ethernet/USB ports on a Raspberry Pi 4).
  5. Lock the black plastic clip.

Enable the Pi Camera

  1. Turn the Raspberry Pi on.
  2. Run sudo raspi-config and select Interfacing Options from the Raspberry Pi Software Configuration Tool's main menu. Press Enter.
  3. Select the Enable Camera menu option and press Enter.
  4. In the next menu, use the Right arrow key to highlight Enable and press Enter.

Test the Pan-Tilt HAT

Next, test the installation and setup of your Pan-Tilt HAT module.

  1. SSH into your Raspberry Pi.
  2. Activate your virtual environment:
    $ source .venv/bin/activate
  3. Run:
    $ rpi-deep-pantilt test pantilt
  4. Exit the test with Ctrl+C.

If you installed the HAT correctly, you should see both servos moving in a smooth sinusoidal motion while the test is running.
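
For a sense of what a smooth sinusoidal sweep involves, here is a minimal sketch. It is not the code behind rpi-deep-pantilt test pantilt. The function names, the 90-degree amplitude, and the sweep speed are assumptions for illustration. The hardware loop assumes Pimoroni's pantilthat Python library, whose pan() and tilt() functions take an angle in degrees.

```python
import math
import time


def sweep_angle(t, amplitude=90.0, speed=2.0):
    """Return a servo angle in degrees that follows a sine wave at time t (seconds).

    amplitude and speed are illustrative defaults, not values from rpi-deep-pantilt.
    """
    return int(math.sin(t * speed) * amplitude)


def run_sweep(duration=10.0):
    """Drive both servos along the sine wave for `duration` seconds (needs the HAT attached)."""
    # Imported here so sweep_angle() can be used on a machine without the hardware library.
    import pantilthat

    start = time.time()
    while time.time() - start < duration:
        angle = sweep_angle(time.time() - start)
        pantilthat.pan(angle)
        pantilthat.tilt(angle)
        time.sleep(0.005)
```

Because the angle is a continuous function of time, each update moves the servos only a small step, which is what makes the motion look smooth rather than jerky.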

Test the Pi Camera

Next, verify that the Pi Camera is installed correctly by starting the camera's preview overlay. The overlay will render on the Pi's primary display (HDMI).

  1. Plug your Raspberry Pi into an HDMI screen.
  2. SSH into your Raspberry Pi.
  3. Activate your virtual environment:
    $ source .venv/bin/activate
  4. Run:
    $ rpi-deep-pantilt test camera
  5. Exit the test with Ctrl+C.

If you installed the Pi Camera correctly, you should see footage from the camera rendered on your HDMI or composite display.

Test object detection

Next, verify you can run an object-detection model (MobileNetV3-SSD) on your Raspberry Pi.

  1. SSH into your Raspberry Pi.
  2. Activate your virtual environment:
    $ source .venv/bin/activate
  3. Run:
    $ rpi-deep-pantilt detect

Your Raspberry Pi should detect objects, attempt to classify them, and draw bounding boxes around them. Note: Only the following objects can be detected and tracked using the default MobileNetV3-SSD model.

$ rpi-deep-pantilt list-labels
['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

Track objects at ~8FPS

This is the moment you've been waiting for! Take the following steps to track an object at roughly eight frames per second (FPS) using the Pan-Tilt HAT.

  1. SSH into your Raspberry Pi.
  2. Activate your virtual environment:
    $ source .venv/bin/activate
  3. Run:
    $ rpi-deep-pantilt track

By default, this will track objects with the label person. You can track a different type of object using the --label parameter.

For example, to track a banana, you would run:

$ rpi-deep-pantilt track --label=banana
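
Conceptually, tracking by label means keeping only the detections whose class matches the requested label and then following one of them. The sketch below illustrates that idea. The Detection record, the select_target and centroid functions, the 0.5 score threshold, and the normalized box layout are all assumptions for illustration, not the package's internals.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax), each in [0, 1]


class Detection(NamedTuple):
    """Hypothetical detection record: class label, confidence score, and box."""
    label: str
    score: float
    box: Box


def select_target(detections: Sequence[Detection], label: str = "person",
                  min_score: float = 0.5) -> Optional[Detection]:
    """Return the highest-scoring detection with the given label, or None if there is none."""
    candidates = [d for d in detections if d.label == label and d.score >= min_score]
    return max(candidates, key=lambda d: d.score, default=None)


def centroid(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a (ymin, xmin, ymax, xmax) box."""
    ymin, xmin, ymax, xmax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)
```

The centroid of the selected box is the point a tracker would try to keep in the middle of the frame.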

On a Raspberry Pi 4 (4GB), I benchmarked my model at roughly 8FPS.

INFO:root:FPS: 8.100870481091935
INFO:root:FPS: 8.130448201926173
INFO:root:FPS: 7.6518234817241355
INFO:root:FPS: 7.657477766009717
INFO:root:FPS: 7.861758172395542
INFO:root:FPS: 7.8549541944597
INFO:root:FPS: 7.907857699044301
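
To turn log lines like these into a single figure, you can average them. The helper below is a small sketch. mean_fps is a hypothetical name, and the parsing assumes the INFO:root:FPS: format shown above.

```python
import re
from statistics import mean

# Matches the numeric reading that follows "FPS:" on each log line.
FPS_PATTERN = re.compile(r"FPS:\s*([0-9.]+)")


def mean_fps(log_text):
    """Average every FPS reading found in the given log text."""
    values = [float(m.group(1)) for m in FPS_PATTERN.finditer(log_text)]
    if not values:
        raise ValueError("no FPS readings found")
    return mean(values)


sample = """\
INFO:root:FPS: 8.100870481091935
INFO:root:FPS: 8.130448201926173
INFO:root:FPS: 7.6518234817241355
INFO:root:FPS: 7.657477766009717
INFO:root:FPS: 7.861758172395542
INFO:root:FPS: 7.8549541944597
INFO:root:FPS: 7.907857699044301
"""

print(f"Mean FPS: {mean_fps(sample):.2f}")
```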

Track objects in real-time with Edge TPU

You can accelerate model inference with Coral's USB Accelerator. The USB Accelerator contains an Edge TPU, which is an application-specific integrated circuit (ASIC) specialized for TensorFlow Lite operations. For more info, check out Getting started with the USB Accelerator.

  1. SSH into your Raspberry Pi.
  2. Install the Edge TPU runtime:
    $ echo "deb coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

    $ curl | sudo apt-key add -

    $ sudo apt-get update && sudo apt-get install libedgetpu1-std
  3. Plug in the Edge TPU (preferably into a USB 3.0 port). If your Edge TPU was already plugged in, remove and re-plug it so the udev device manager can detect it.
  4. Try the detect command with the --edge-tpu option. You should be able to detect objects in real-time!
    $ rpi-deep-pantilt detect --edge-tpu --loglevel=INFO

    Note that --loglevel=INFO will show you the FPS at which objects are detected and bounding boxes are rendered to the Raspberry Pi Camera's overlay.

    You should see around 24FPS, which is the rate at which frames are sampled from the Pi Camera into a frame buffer:

    INFO:root:FPS: 24.716493958392558
    INFO:root:FPS: 24.836166606505206
    INFO:root:FPS: 23.031063233367547
    INFO:root:FPS: 25.467177106703623
    INFO:root:FPS: 27.480438524486594
    INFO:root:FPS: 25.41399952505432
  5. Try the track command with the --edge-tpu option:
    $ rpi-deep-pantilt track --edge-tpu

Wrapping up

Congratulations! You're now the proud owner of a DIY object-tracking system, which uses a single-shot detector (a type of convolutional neural network) to classify and localize objects.

PID controller

The pan/tilt tracking system uses a proportional–integral–derivative (PID) controller to track the centroid of a bounding box smoothly.
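
As a rough illustration, a PID controller turns an error (here, how far the bounding box's centroid is from the center of the frame) into a correction made of three terms: one proportional to the current error, one to its accumulated sum, and one to its rate of change. The sketch below is a textbook discrete PID for a single axis, not the implementation in rpi-deep-pantilt. The class and function names, the gains, the 320-pixel frame width, and the sign convention are assumptions.

```python
class PID:
    """Textbook discrete PID controller for one axis (illustrative only)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0       # running sum of error * dt
        self.prev_error = None    # no derivative term until a second sample arrives

    def update(self, error, dt):
        """Return the control output for the current error, dt seconds after the last one."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def clamp(value, low=-90.0, high=90.0):
    """Keep a servo angle inside the range the hardware accepts."""
    return max(low, min(high, value))


# Usage: one control step for the pan axis, with placeholder gains.
pan_pid = PID(kp=0.05, ki=0.1, kd=0.0)
frame_center_x = 160     # assumes a 320-pixel-wide frame
object_center_x = 200    # centroid of the tracked bounding box
error = frame_center_x - object_center_x  # the sign depends on how the servo is mounted
pan_angle = clamp(pan_pid.update(error, dt=0.1))
```

In practice, the pan and tilt axes would each get their own controller, and the gains would need tuning by hand: too high and the camera overshoots and oscillates, too low and it lags behind the subject.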

TensorFlow model zoo

The models in this tutorial are derived from ssd_mobilenet_v3_small_coco and ssd_mobilenet_edgetpu_coco in the TensorFlow detection model zoo.

My models are available for download via the GitHub release notes in leigh-johnson/rpi-deep-pantilt.

I added the custom TFLite_Detection_PostProcess operation, which implements a variation of non-maximum suppression (NMS) on model output. NMS is a technique that filters many bounding box proposals using set operations.
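
For intuition, here is a greedy NMS in plain Python. It is a textbook version, not the TFLite_Detection_PostProcess operation itself. The function names and the 0.5 overlap threshold are assumptions. The set operations in question are the intersection and union of two boxes; the ratio of their areas (intersection over union, or IoU) measures how much the boxes overlap.

```python
def iou(a, b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Given two heavily overlapping proposals for the same object, this keeps the one with the higher score and discards the other, while a distant box is left alone.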

Special thanks and acknowledgments

  • MobileNetEdgeTPU SSDLite contributors: Yunyang Xiong, Bo Chen, Suyog Gupta, Hanxiao Liu, Gabriel Bender, Mingxing Tan, Berkin Akin, Zhichao Lu, Quoc Le
  • MobileNetV3 SSDLite contributors: Bo Chen, Zhichao Lu, Vivek Rathod, Jonathan Huang
  • Adrian Rosebrock for writing Pan/tilt face tracking with a Raspberry Pi and OpenCV, which was the inspiration for this whole project
  • Jason Zaman for reviewing this article and early release candidates

This article was originally published on the Towards Data Science Medium channel and is reused with permission.

About the author

Leigh Johnson - Leigh is a Staff Machine Learning Engineer at Slack, and is recognized as a Google Developer Expert in Machine Learning. She is a GDG Cloud San Francisco and Django Girls organizer. Leigh's expertise intersects machine learning, data engineering, database reliability, and distributed systems. She blogs and speaks about TensorFlow Lite on Raspberry Pi, automation for distributed systems, and data science.