The most challenging part of deep learning is labeling, as you saw in part one of this two-part series, Learn how to classify images with TensorFlow. Proper training is critical to effective classification later on, and for training to work, we need lots of accurately labeled data. In part one, I skipped over this challenge by downloading 3,000 prelabeled images. I then showed you how to use this labeled data to train your classifier with TensorFlow. In this part, we'll train on a new data set, and I'll introduce the TensorBoard suite of data visualization tools to make it easier to understand, debug, and optimize our TensorFlow code.
Given my work as VP of engineering and compliance at healthcare technology company C-SATS, I was eager to build a classifier for something related to surgery. Suturing seemed like a great place to start. It is immediately useful, and I know how to recognize it. It is useful because, for example, if a machine can see when suturing is occurring, it can automatically identify the step (phase) of a surgical procedure where suturing takes place, e.g. anastomosis. And I can recognize it because the needle and thread of a surgical suture are distinct, even to my layperson's eyes.
My goal was to train a machine to identify suturing in medical videos.
I have access to billions of frames of non-identifiable surgical video, many of which contain suturing. But I'm back to the labeling problem. Luckily, C-SATS has an army of experienced annotators who are experts at doing exactly this. My source data were video files and annotations in JSON.
The annotations look like this:
[
  {
    "annotations": [
      {
        "endSeconds": 2319.541,
        "label": "suturing",
        "startSeconds": 2115.215
      },
      {
        "endSeconds": 2976.301,
        "label": "suturing",
        "startSeconds": 2528.884
      }
    ],
    "durationSeconds": 2975,
    "videoId": 5
  },
  {
    "annotations": [
      // ...etc...
I wrote a Python script that uses the JSON annotations to decide which frames to grab from the .mp4 video files; ffmpeg does the actual grabbing. I decided to grab at most one frame per second, then divided the total number of video seconds by four to get roughly 10k seconds (10k frames). After figuring out which seconds to grab, I ran a quick test to see whether a particular second fell inside or outside a segment annotated as suturing (isWithinSuturingSegment() in the code below). Here's grab.py:
#!/usr/bin/python
# Grab frames from videos with ffmpeg. Use multiple cores.
# Minimum resolution is 1 second--this is a shortcut to get less frames.
# (C)2017 Adam Monsen. License: AGPL v3 or later.

import json
import subprocess
from multiprocessing import Pool
import os

frameList = []

def isWithinSuturingSegment(annotations, timepointSeconds):
    for annotation in annotations:
        startSeconds = annotation['startSeconds']
        endSeconds = annotation['endSeconds']
        if timepointSeconds > startSeconds and timepointSeconds < endSeconds:
            return True
    return False

with open('available-suturing-segments.json') as f:
    j = json.load(f)

for video in j:
    videoId = video['videoId']
    videoDuration = video['durationSeconds']

    # generate many ffmpeg frame-grabbing commands
    start = 1
    stop = videoDuration
    step = 4  # Reduce to grab more frames

    for timepointSeconds in xrange(start, stop, step):
        inputFilename = '/home/adam/Downloads/suturing-videos/{}.mp4'.format(videoId)
        outputFilename = '{}-{}.jpg'.format(video['videoId'], timepointSeconds)
        if isWithinSuturingSegment(video['annotations'], timepointSeconds):
            outputFilename = 'suturing/{}'.format(outputFilename)
        else:
            outputFilename = 'not-suturing/{}'.format(outputFilename)
        outputFilename = '/home/adam/local/{}'.format(outputFilename)

        commandString = 'ffmpeg -loglevel quiet -ss {} -i {} -frames:v 1 {}'.format(
            timepointSeconds, inputFilename, outputFilename)

        frameList.append({
            'outputFilename': outputFilename,
            'commandString': commandString,
        })

def grabFrame(f):
    if os.path.isfile(f['outputFilename']):
        print 'already completed {}'.format(f['outputFilename'])
    else:
        print 'processing {}'.format(f['outputFilename'])
        subprocess.check_call(f['commandString'].split())

p = Pool(4)  # for my 4-core laptop
p.map(grabFrame, frameList)
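Note that grab.py drops each frame into a suturing/ or not-suturing/ subdirectory under /home/adam/local, which is the one-folder-per-label layout that retrain.py from part one expects for its image directory. Before retraining, it's worth a quick sanity check that both classes actually received frames. Here's a minimal sketch; it assumes the output root hard-coded in grab.py above, so adjust the path if yours differs:

#!/usr/bin/python
# Sanity check: count grabbed frames per label directory.
# Assumes the output root hard-coded in grab.py above.
import os

imageDir = '/home/adam/local'
for label in ('suturing', 'not-suturing'):
    labelDir = os.path.join(imageDir, label)
    jpegs = [f for f in os.listdir(labelDir) if f.endswith('.jpg')]
    print '{}: {} frames'.format(label, len(jpegs))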
Now we're ready to retrain the model, exactly as in part one.
Using this script to snip out 10k frames took me about 10 minutes, and retraining Inception to recognize suturing at 90% accuracy took another hour or so. I did spot checks with new data that wasn't part of the training set, and every frame I tried was correctly identified (mean confidence score: 88%, median confidence score: 91%).
Here are my spot checks. (WARNING: Contains links to images of blood and guts.)
Image | Not suturing score | Suturing score |
---|---|---|
Not-Suturing-01.jpg | 0.71053 | 0.28947 |
Not-Suturing-02.jpg | 0.94890 | 0.05110 |
Not-Suturing-03.jpg | 0.99825 | 0.00175 |
Suturing-01.jpg | 0.08392 | 0.91608 |
Suturing-02.jpg | 0.08851 | 0.91149 |
Suturing-03.jpg | 0.18495 | 0.81505 |
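For reference, here's roughly how one of these spot checks can be scripted. This is a sketch, not the exact code from part one: it assumes retrain.py wrote its retrained graph and labels to /tmp/output_graph.pb and /tmp/output_labels.txt (its defaults) and that the final layer kept its default name, final_result. The image filename is just an example.

#!/usr/bin/python
# Score a single frame with the retrained Inception graph.
# Assumes retrain.py's default output paths and tensor names.
import tensorflow as tf

imagePath = 'Suturing-01.jpg'  # frame to spot-check

# Labels are written one per line, in the same order as the softmax output.
labels = [line.strip() for line in tf.gfile.GFile('/tmp/output_labels.txt')]

# Load the retrained graph.
with tf.gfile.GFile('/tmp/output_graph.pb', 'rb') as f:
    graphDef = tf.GraphDef()
    graphDef.ParseFromString(f.read())
    tf.import_graph_def(graphDef, name='')

with tf.Session() as sess:
    imageData = tf.gfile.GFile(imagePath, 'rb').read()
    softmaxTensor = sess.graph.get_tensor_by_name('final_result:0')
    scores = sess.run(softmaxTensor, {'DecodeJpeg/contents:0': imageData})[0]
    for label, score in zip(labels, scores):
        print '{}: {:.5f}'.format(label, score)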
How to use TensorBoard
Visualizing what's happening under the hood, and communicating it to others, is at least as hard in deep learning as in any other kind of software. TensorBoard to the rescue!
The retrain.py script from part one automatically generates the files TensorBoard uses to produce graphs representing what happened during retraining. To set up TensorBoard, run the following inside the container after running retrain.py:
pip install tensorboard
tensorboard --logdir /tmp/retrain_logs
Watch the output and open the printed URL in a browser.
Starting TensorBoard 41 on port 6006
(You can navigate to http://172.17.0.2:6006)
You'll see the TensorBoard dashboard in your browser.
I hope this will help; if not, you'll at least have something cool to show. During retraining, I found it helpful to see under the "SCALARS" tab how accuracy increases while cross-entropy decreases as we perform more training steps. This is what we want.
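Those graphs come from summary ops that retrain.py writes into /tmp/retrain_logs as it trains. If you ever want to chart your own numbers in TensorBoard, the pattern is small. This is a generic TensorFlow 1.x sketch with a made-up accuracy value, not code from retrain.py:

#!/usr/bin/python
# Minimal example of writing scalar summaries for TensorBoard to plot.
import tensorflow as tf

accuracy = tf.placeholder(tf.float32, name='accuracy')
tf.summary.scalar('accuracy', accuracy)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter('/tmp/retrain_logs/demo', sess.graph)
    for step in range(100):
        fakeAccuracy = step / 100.0  # stand-in for a real evaluation result
        summary = sess.run(merged, {accuracy: fakeAccuracy})
        writer.add_summary(summary, step)
    writer.close()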
Learn more
If you'd like to learn more, explore these resources:
- Pete Warden's wonderful TensorFlow for poets is a nice, wholly practical take on transfer learning with Inception, but some of the links are broken. This step-by-step tutorial is current and the breakdown is handy.
- For more code and deeper explanations, try the image recognition and retraining tutorials on tensorflow.org.
- I prefer reading to viewing when learning, but I found the video Build a TensorFlow Image Classifier in 5 Min surprisingly useful and complete. If you prefer something with less silliness, maybe steer toward Josh Gordon's tutorials, like this one.
Here are other resources that I used in writing this series, which may help you, too:
- Original and derived source code used in this series
- 3 cool machine learning projects using TensorFlow and the Raspberry Pi
- Create a simple image classifier using TensorFlow
- Ready-to-use TensorFlow examples
- TensorBoard demo & hints video (from TensorBoard: Visualizing learning)
- Stanford's CS 20SI: TensorFlow for Deep Learning Research course (syllabus and GitHub)
- Access an unexposed container port: docker inspect tensorflow | grep --max-count=1 '\bIPAddress'
- Continuous online video classification with TensorFlow, Inception, and a Raspberry Pi Part 1 and Part 2
- Multi-label image classification with Inception net
- The unreasonable effectiveness of recurrent neural networks
- TensorFlow for poets 2: Optimize for mobile (and its GitHub repo)
If you'd like to chat about this topic, please drop by the ##tfadam topical channel on Freenode IRC. You can also email me or leave a comment below.
This series would never have happened without great feedback from Eva Monsen, Brian C. Lane, Rob Smith, Alex Simes, VM Brasseur, Bri Hatch, and the editors at Opensource.com.