How to write a web service using Python Flask

What if you could write your own web services? Get started with this tutorial.

Many of our customers are building useful services using our webhook feature, but unfortunately, others are not. Often we hear that no one on their team is proficient enough to write a service that can ingest a webhook payload and do something with the data. That leaves them either hoping to get cycles from their development team (unlikely) or continuing to do without.

But what if you could write your own web services? How many routine tasks that involve taking data from system A and inputting it into system B could you automate?

Learning to code well enough to build these services is a valuable skill to add to your tool chest and a major asset for optimizing security processes in your organization. In this post, I'm going to walk you through a tutorial that will get you started writing your own web services using Python Flask.

What we're building

Specifically, I'm going to walk through the creation of a simple Python Flask app that provides a RESTful web service. The service will provide an endpoint to:

Ingest a JSON formatted payload (webhook) from Threat Stack
Parse the payload for Threat Stack Alert IDs
Retrieve detailed alert data from Threat Stack
Archive the webhook and alert data to AWS S3
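The code later in this article reads only three things from the webhook: an alerts array, and each alert's id and created_at (milliseconds since the Unix epoch). Below is a hypothetical minimal payload containing just those fields, plus a sketch of the "parse the payload for alert IDs" step. Real Threat Stack payloads carry more data, so treat the Threat Stack API documentation as the authority on the actual schema.

```python
# Hypothetical, minimal webhook payload: only the fields this article's code
# reads are shown. Real Threat Stack payloads contain more data.
SAMPLE_WEBHOOK = {
    "alerts": [
        {
            "id": "587c0159a907346eccb84004",
            "created_at": 1484661060000,  # ms since the epoch (2017-01-17 13:51 UTC)
        }
    ]
}


def alert_ids(payload):
    """Return the alert IDs found in a webhook payload."""
    # A missing "alerts" key is treated as an empty list instead of an error.
    return [alert.get("id") for alert in payload.get("alerts", [])]


print(alert_ids(SAMPLE_WEBHOOK))  # ['587c0159a907346eccb84004']
```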

But before I jump in, keep a couple of things in mind. First, I won't be bothering with any sort of frontend display functionality, so you don't need to worry about HTML or CSS. Second, this project's layout follows Flask's own suggested organization. I'm going to skip the single-module pattern and go straight to the Packages and Blueprints patterns.

There is a large range of Flask tutorials. On one hand, there are tutorials that explain how to build small, simple apps (where the entire app fits in a single file). On the other hand, there are tutorials that explain how to build much larger, complicated apps. This tutorial fills a sweet spot in the middle and demonstrates a structure that is simple, but which can immediately accommodate increasingly complex requirements.

Project structure

The structure of the project that I'm going to build, which comes from Explore Flask, is shown below:

threatstack-to-s3
├── app
│   ├── __init__.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── s3.py
│   │   └── threatstack.py
│   └── views
│       ├── __init__.py
│       └── s3.py
├── gunicorn.conf.py
├── requirements.osx.txt
├── requirements.txt
└── threatstack-to-s3.py

Top-level files

I'll start the discussion with the top-level files that are useful to me as I build the service:

gunicorn.conf.py: This is a configuration file for the Gunicorn WSGI HTTP server that will serve up this app. While the application can run and accept connections on its own, Gunicorn is more efficient at handling multiple connections and allowing the app to scale with load.

requirements.txt/requirements.osx.txt: The app's dependencies are listed in these files, which the pip utility uses to install the needed Python packages. For information on installing dependencies, see the Setup section of the project's README.md.

threatstack-to-s3.py: This is the application launcher. It can be run directly using "python" if you are doing local debugging, or it can be passed as an argument to "gunicorn" as the application entry point. For information on how to launch the service, see README.md.
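The article doesn't show the contents of gunicorn.conf.py. As an illustration only, a minimal Gunicorn config might look like the sketch below. Gunicorn config files are plain Python, and bind, workers, timeout, accesslog, and errorlog are documented Gunicorn settings; the values chosen here (including binding to localhost behind a reverse proxy) are assumptions, not the project's actual settings.

```python
# gunicorn.conf.py: a hypothetical sketch, not the project's actual file.
# Gunicorn reads module-level assignments in this file as its settings.
import multiprocessing

# Listen only on localhost, port 8080 (the same port the debug launcher uses).
# Assumes a reverse proxy such as Nginx on the same host handles outside traffic.
bind = "127.0.0.1:8080"

# Common starting point suggested in the Gunicorn docs: (2 x CPU cores) + 1.
workers = multiprocessing.cpu_count() * 2 + 1

# Restart a worker that stays silent for longer than this many seconds.
timeout = 30

# "-" sends the access log to stdout and the error log to stderr.
accesslog = "-"
errorlog = "-"
```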

App package (app/ directory)

The app package is my application package. The logic for the application is underneath this directory. As I mentioned earlier, I have chosen to break the app into a collection of smaller modules rather than use a single, monolithic module file.

The four usable modules defined in this package are:

app
app.views.s3
app.models.threatstack
app.models.s3

Note: app.views and app.models do not provide anything, and their __init__.py files are empty.

App module

The app module has the job of creating the Flask application. It exports a single function, create_app(), that creates a Flask application object and configures it. Currently it initializes the application blueprints that correspond to my application views. Eventually, create_app() will do other things, such as initializing logging, but I'm leaving that out for now to keep things clear and simple.

app/__init__.py

from flask import Flask

def _initialize_blueprints(application):
    '''
    Register Flask blueprints
    '''
    from app.views.s3 import s3
    application.register_blueprint(s3, url_prefix='/api/v1/s3')

def create_app():
    '''
    Create an app by initializing components.
    '''
    application = Flask(__name__)

    _initialize_blueprints(application)

    # Do it!
    return application

This module is used by threatstack-to-s3.py to start the application. It imports create_app() and then uses it to create a Flask application instance.

threatstack-to-s3.py

#!/usr/bin/env python
from app import create_app

# Gunicorn entry point.
application = create_app()

if __name__ == '__main__':
    # Entry point when run via Python interpreter.
    print("== Running in debug mode ==")
    application.run(host='localhost', port=8080, debug=True)

Views and Flask blueprints

Before discussing the remaining three modules, I'll talk about what views and Flask blueprints are and then dive into the app.views.s3 module.

Views: Views are what the application consumer sees. There's no front end to this application, but there is a public API endpoint. Think of a view as what can and should be exposed to the person or thing (i.e., the consumer) that is using this application. The best practice is to keep views as simple as possible. If an endpoint's job is to take data in and copy it to S3, make it perform that function, but hide the details of how that was done in the application models. Views should mostly represent the actions a consumer wants to see happen, while the details (which consumers shouldn't care about) live in the application models (described later).

Flask Blueprints: Earlier I said that I am going to use a Packages and Blueprints layout instead of a single module application. Blueprints contain a portion of my API endpoint structure. This lets me logically group related portions of my API. In my case, each view module is its own blueprint.

Learn more

Modular Applications with Blueprints documentation on the Flask website.

Explore Flask is a book about best practices and patterns for developing web applications with Flask.

App.views.s3 module

The threatstack-to-s3 service takes Threat Stack webhook HTTP requests in and stores a copy of the alert data in S3. This is where I store the set of API endpoints that allow someone to do this. If you look back at app/__init__.py, you will see that I have rooted the set of endpoints at /api/v1/s3.

From app/__init__.py:

    from app.views.s3 import s3
    application.register_blueprint(s3, url_prefix='/api/v1/s3')

I used this path for a few reasons:

API: To note that this is an API and I should not expect a front end. Maybe one day I'll add a front end. Probably not, but I find this useful mentally and as a sign to others.
V1: This is version 1 of the API. If I need to make breaking changes to accommodate new requirements, I can add a v2 so that two APIs exist as I migrate all consumers over to the new version.
S3: This is the service I'm connecting to and manipulating. I have some freedom here to name this portion of the path whatever I want, but I like to keep it descriptive. If the service were relaying data to HipChat, for example, I could name this portion of the path hipchat.

In app.views.s3, I am providing a single endpoint for now, /alert, which represents the object I'm manipulating, and that responds only to the HTTP POST request method.

Remember: When building APIs, URL paths should represent nouns and HTTP request methods should represent verbs.
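To see the prefix and the POST-only rule in action, here's a stripped-down, self-contained sketch that uses Flask's built-in test client. It assumes Flask 1.0 or later is installed. The blueprint mirrors the shape of app.views.s3 but is a stand-in: it makes no Threat Stack or S3 calls and simply counts the alerts it receives.

```python
from flask import Blueprint, Flask, jsonify, request

# Stand-in blueprint shaped like app.views.s3; it only counts incoming alerts.
s3 = Blueprint("s3", __name__)


@s3.route("/alert", methods=["POST"])
def put_alert():
    webhook_data = request.get_json()
    count = len(webhook_data.get("alerts", []))
    return jsonify({"success": True, "received": count}), 200


def create_app():
    application = Flask(__name__)
    # Every route in the blueprint is mounted under this prefix.
    application.register_blueprint(s3, url_prefix="/api/v1/s3")
    return application


client = create_app().test_client()

# A POST to the noun-style path is routed to put_alert().
resp = client.post("/api/v1/s3/alert", json={"alerts": [{"id": "abc"}]})
print(resp.status_code, resp.get_json())

# GET is not listed in methods=["POST"], so Flask answers 405 Method Not Allowed.
print(client.get("/api/v1/s3/alert").status_code)
```

Because the route is registered only for POST, other verbs on the same noun are rejected by Flask itself before the view function runs.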

app/views/s3.py

'''
API to archive alerts from Threat Stack to S3
'''

from flask import Blueprint, jsonify, request
import app.models.s3 as s3_model
import app.models.threatstack as threatstack_model

s3 = Blueprint('s3', __name__)


@s3.route('/alert', methods=['POST'])
def put_alert():
    '''
    Archive Threat Stack alerts to S3.
    '''
    webhook_data = request.get_json()
    for alert in webhook_data.get('alerts'):
        alert_full = threatstack_model.get_alert_by_id(alert.get('id'))
        s3_model.put_webhook_data(alert)
        s3_model.put_alert_data(alert_full)

    status_code = 200
    success = True
    response = {'success': success}

    return jsonify(response), status_code  

Now I'll walk through some key parts of the module. If you're familiar enough with Python, you can skip the next few lines on imports, but if you're wondering why I rename what I import, then follow along.

from flask import Blueprint, jsonify, request
import app.models.s3 as s3_model
import app.models.threatstack as threatstack_model  

I'm a fan of typing brevity and consistency. I could have done this the following way to import the model modules:

import app.models.s3
import app.models.threatstack

But that would mean I'd be using functions like:

app.models.s3.put_webhook_data(alert)

I could have done this as well:

from app.models import s3, threatstack

However, this would break when I create the s3 Blueprint object a few lines later because I'd overwrite the s3 model module.

s3 = Blueprint('s3', __name__) # We've just overwritten the s3 module we imported.  

For these reasons, importing the model modules and renaming them slightly is just easier.

Now I'll walk through the app endpoint and function associated with it.

@s3.route('/alert', methods=['POST'])
def put_alert():
    '''
    Archive Threat Stack alerts to S3.
    '''

The first line is a decorator. It adds a route called /alert (which expands to /api/v1/s3/alert) to the s3 blueprint, so that an HTTP POST request to that path causes put_alert() to be called.
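If decorators are new to you, the toy sketch below shows the mechanism in plain Python. It is not Flask's implementation: ToyBlueprint is hypothetical, and it takes the URL prefix at construction time for simplicity, whereas Flask applies url_prefix when the blueprint is registered. Writing @s3.route(...) above a function is equivalent to put_alert = s3.route(...)(put_alert), which is how the function ends up recorded in the route table.

```python
# A toy route table, NOT Flask's real implementation: it only shows how a
# decorator like @s3.route('/alert', methods=['POST']) can record which
# function handles which path and HTTP method.
class ToyBlueprint:
    def __init__(self, url_prefix=""):
        self.url_prefix = url_prefix
        self.routes = {}  # (full path, method) -> view function

    def route(self, rule, methods=("GET",)):
        # route() runs first and returns the actual decorator.
        def decorator(func):
            for method in methods:
                self.routes[(self.url_prefix + rule, method)] = func
            return func  # the decorated function is returned unchanged
        return decorator

    def dispatch(self, path, method):
        handler = self.routes.get((path, method))
        if handler is None:
            return "no route"
        return handler()


s3 = ToyBlueprint(url_prefix="/api/v1/s3")


@s3.route("/alert", methods=["POST"])
def put_alert():
    return "put_alert() called"


print(s3.dispatch("/api/v1/s3/alert", "POST"))  # put_alert() called
print(s3.dispatch("/api/v1/s3/alert", "GET"))   # no route
```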

The body of the function is pretty simple:

Get the request's JSON data
Iterate over the array in the alerts key
For each alert:
- Retrieve the alert detail from Threat Stack
- Store the alert data from the webhook request in S3
- Store the alert detail in S3

    webhook_data = request.get_json()
    for alert in webhook_data.get('alerts'):
        alert_full = threatstack_model.get_alert_by_id(alert.get('id'))
        s3_model.put_webhook_data(alert)
        s3_model.put_alert_data(alert_full)

Once that's done, I return a simple JSON doc back, indicating the success or failure of the transaction. (Note: There's no error handling in place, so of course I've hardcoded the success response and HTTP status code. I'll change that when error handling is added at a later date.)

    status_code = 200
    success = True
    response = {'success': success}

    return jsonify(response), status_code

At this point, I've handled the request and done what the consumer asked for. Notice that I haven't included any code demonstrating how I fulfilled the request. What did I have to do to get the alert's detail? What actions did I perform to store the alert? How are the alerts stored and named in S3? The consumer doesn't really care about those details. This is a good way to think about organizing your code in your own service: What the consumer needs to know about should live in your view. The details the consumer doesn't need to know should live in your model, which I am about to cover.
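One way to make this split concrete is to pull the loop out of the view into a plain function that receives the model functions as arguments (dependency injection). The sketch below is hypothetical and not part of the project: archive_alerts() and the fake models are illustrative names, and it uses .get("alerts", []) so a payload without alerts doesn't crash. Because the view logic only says what happens to each alert, it can be exercised without Flask, Threat Stack, or S3.

```python
def archive_alerts(webhook_data, get_alert_by_id, put_webhook_data, put_alert_data):
    """Same loop as put_alert(), with the model functions passed in.

    The caller decides *how* alerts are fetched and stored; this function
    only decides *what* happens to each alert.
    """
    for alert in webhook_data.get("alerts", []):
        alert_full = get_alert_by_id(alert.get("id"))
        put_webhook_data(alert)
        put_alert_data(alert_full)
    return {"success": True}


# Hypothetical in-memory stand-ins for app.models.threatstack and app.models.s3.
stored = []


def fake_get_alert_by_id(alert_id):
    return {"id": alert_id, "detail": "full alert from the API"}


result = archive_alerts(
    {"alerts": [{"id": "a1"}, {"id": "a2"}]},
    get_alert_by_id=fake_get_alert_by_id,
    put_webhook_data=lambda alert: stored.append(("webhook", alert["id"])),
    put_alert_data=lambda alert: stored.append(("alert", alert["id"])),
)
print(result)  # {'success': True}
print(stored)
```

In the real view, the arguments would simply be threatstack_model.get_alert_by_id, s3_model.put_webhook_data, and s3_model.put_alert_data.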

Before discussing the remaining modules, I'll talk about models, which are how the application talks to the services it uses, such as Threat Stack and S3.

Models

Models describe "things," and these "things" are what I want to perform actions on. Typically, when you search for help on Flask models, blogs and documentation like to use databases in their examples. What I'm doing right now isn't far off; I'm just storing data in an object store instead of a database. And storage isn't the only thing I might do with the data received from Threat Stack in the future.

Additionally, I've chosen to skip an object-oriented approach in favor of a procedural style. In more advanced Python, I would model an alert object and provide a means of manipulating it. But this introduces more complexity than is needed for the given task of storing data in S3 and also makes the code more complicated for demonstrating a simple task. I've chosen brevity and clarity over technical correctness for this.

App.models.threatstack module

The app.models.threatstack module, as you can guess, handles communication with Threat Stack.

'''
Communicate with Threat Stack
'''
import os
import requests

THREATSTACK_BASE_URL = os.environ.get('THREATSTACK_BASE_URL', 'https://app.threatstack.com/api/v1')
THREATSTACK_API_KEY = os.environ.get('THREATSTACK_API_KEY')

def get_alert_by_id(alert_id):
    '''
    Retrieve an alert from Threat Stack by alert ID.
    '''
    alerts_url = '{}/alerts/{}'.format(THREATSTACK_BASE_URL, alert_id)

    resp = requests.get(
        alerts_url,
        headers={'Authorization': THREATSTACK_API_KEY}
    )

    return resp.json()

Just a quick run through of a few spots of note:

THREATSTACK_BASE_URL = os.environ.get('THREATSTACK_BASE_URL', 'https://app.threatstack.com/api/v1')
THREATSTACK_API_KEY = os.environ.get('THREATSTACK_API_KEY')

I don't want to keep the Threat Stack API key in my code. This is just good clean code/security living. I'm going to get the API key from my environment for now because it's a quick and simple solution. At some point, I should centralize all configuration in a single file instead of hiding it here, so the code and setup are a little cleaner. That's a job for another time, and for now the setup is documented in README.md.
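As a sketch of what "centralizing configuration" could look like, the hypothetical load_config() below gathers every environment variable the two model modules read into one place and fails at startup if a required one is missing. The variable names and the default base URL come from the modules in this article; the function, the ConfigError class, and the choice of which settings count as required are assumptions.

```python
import os


class ConfigError(RuntimeError):
    """Raised when a required setting is missing."""


def load_config(environ=os.environ):
    """Collect the app's settings from the environment in one place (sketch)."""
    api_key = environ.get("THREATSTACK_API_KEY")
    bucket = environ.get("TS_AWS_S3_BUCKET")

    missing = [name for name, value in (("THREATSTACK_API_KEY", api_key),
                                        ("TS_AWS_S3_BUCKET", bucket))
               if not value]
    if missing:
        # Fail when the app starts, not on the first webhook request.
        raise ConfigError("missing settings: " + ", ".join(missing))

    return {
        "THREATSTACK_BASE_URL": environ.get(
            "THREATSTACK_BASE_URL", "https://app.threatstack.com/api/v1"),
        "THREATSTACK_API_KEY": api_key,
        "TS_AWS_S3_BUCKET": bucket,
        "TS_AWS_S3_PREFIX": environ.get("TS_AWS_S3_PREFIX"),  # optional
    }


cfg = load_config({"THREATSTACK_API_KEY": "my-api-key", "TS_AWS_S3_BUCKET": "my-bucket"})
print(cfg["THREATSTACK_BASE_URL"])  # https://app.threatstack.com/api/v1
```

Passing a plain dict instead of os.environ makes the function easy to test. Flask's own app.config object would be another natural home for these values once create_app() grows.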

def get_alert_by_id(alert_id):
    '''
    Retrieve an alert from Threat Stack by alert ID.
    '''
    alerts_url = '{}/alerts/{}'.format(THREATSTACK_BASE_URL, alert_id)

    resp = requests.get(
        alerts_url,
        headers={'Authorization': THREATSTACK_API_KEY}
    )

    return resp.json()

The get_alert_by_id() function takes an alert ID, queries the Threat Stack platform for the alert data, and returns that data. I'm using the Python requests module to make an HTTP GET request to the Threat Stack API endpoint that returns alert info for the given alert.
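Here's a hypothetical variant of get_alert_by_id(), offered as a sketch under a few assumptions rather than the project's code. The base URL and API key are passed in instead of read from module globals. The HTTP client can be injected, so the function can be tested offline; by default it falls back to the requests module, whose requests.get() the original uses. Two optional additions are labeled in the comments: an explicit timeout (requests waits indefinitely unless one is given) and raise_for_status(), which turns 4xx/5xx responses into exceptions.

```python
def get_alert_by_id(alert_id, base_url, api_key, http=None, timeout=10):
    """Fetch one alert's detail from the Threat Stack API (hypothetical variant)."""
    if http is None:
        import requests  # only needed when no client is injected
        http = requests  # the requests module itself provides get()
    alerts_url = "{}/alerts/{}".format(base_url, alert_id)
    resp = http.get(
        alerts_url,
        headers={"Authorization": api_key},
        timeout=timeout,  # addition: don't hang forever if the API is unreachable
    )
    resp.raise_for_status()  # addition: raise on 4xx/5xx instead of parsing an error body
    return resp.json()


class FakeResponse:
    """Minimal stand-in for requests.Response."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # pretend the request succeeded

    def json(self):
        return self._payload


class FakeHttp:
    """Records each GET call and answers with a canned alert."""

    def __init__(self):
        self.calls = []

    def get(self, url, headers=None, timeout=None):
        self.calls.append((url, headers, timeout))
        return FakeResponse({"id": url.rsplit("/", 1)[-1]})


fake = FakeHttp()
alert = get_alert_by_id("587c0159a907346eccb84004",
                        "https://app.threatstack.com/api/v1", "my-api-key",
                        http=fake)
print(fake.calls[0][0])
```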

Read the Threat Stack API documentation.

App.models.s3 module

The app.models.s3 module handles connectivity to AWS S3.

'''
Manipulate objects in AWS S3.
'''
import boto3
import json
import os
import time

TS_AWS_S3_BUCKET = os.environ.get('TS_AWS_S3_BUCKET')
TS_AWS_S3_PREFIX = os.environ.get('TS_AWS_S3_PREFIX', None)

def put_webhook_data(alert):
    '''
    Put alert webhook data in S3 bucket.
    '''
    alert_time = time.gmtime(alert.get('created_at')/1000)
    alert_time_path = time.strftime('%Y/%m/%d/%H/%M', alert_time)
    alert_key = '/'.join(['webhooks', alert_time_path, alert.get('id')])
    if TS_AWS_S3_PREFIX:
        alert_key = '/'.join([TS_AWS_S3_PREFIX, alert_key])

    s3_client = boto3.client('s3')
    s3_client.put_object(
        Body=json.dumps(alert),
        Bucket=TS_AWS_S3_BUCKET,
        Key=alert_key
    )

    return None

def put_alert_data(alert):
    '''
    Put alert data in S3.
    '''
    alert_id = alert.get('id')
    alert_key = '/'.join(['alerts',
                          alert_id[0:2],
                          alert_id[2:4],
                          alert_id
                          ])

    if TS_AWS_S3_PREFIX:
        alert_key = '/'.join([TS_AWS_S3_PREFIX, alert_key])

    s3_client = boto3.client('s3')
    s3_client.put_object(
        Body=json.dumps(alert),
        Bucket=TS_AWS_S3_BUCKET,
        Key=alert_key
    )

    return None

I'll walk through the interesting parts:

TS_AWS_S3_BUCKET = os.environ.get('TS_AWS_S3_BUCKET')
TS_AWS_S3_PREFIX = os.environ.get('TS_AWS_S3_PREFIX', None)

Again, there's no config file for this app, but I need to set an S3 bucket name and an optional prefix. I should fix this eventually; for now, the setup is documented in README.md, which is good enough.

The functions put_webhook_data() and put_alert_data() have a lot of duplicate code. I haven't refactored them because it's easier to see the logic before refactoring. If you look closely, you'll realize that the only difference between them is how the alert_key is defined. I'll focus on put_webhook_data():
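For comparison, here's a sketch of how the duplication could be factored out, assuming the S3 client, bucket, and prefix are passed in as arguments instead of read from module globals (in the app, the client would come from boto3.client('s3')). The helper names _with_prefix() and _put_json() and the RecordingClient stand-in are hypothetical; the key layouts are the ones the original functions use, so each public function shrinks to its key definition.

```python
import json
import time


def _with_prefix(key, prefix=None):
    # Same rule as the original: prepend the prefix only when it is set.
    return "/".join([prefix, key]) if prefix else key


def _put_json(client, bucket, key, data):
    # The part both original functions shared: serialize, then upload.
    client.put_object(Body=json.dumps(data), Bucket=bucket, Key=key)


def put_webhook_data(alert, client, bucket, prefix=None):
    alert_time = time.gmtime(alert.get("created_at") / 1000)
    time_path = time.strftime("%Y/%m/%d/%H/%M", alert_time)
    key = "/".join(["webhooks", time_path, alert.get("id")])
    _put_json(client, bucket, _with_prefix(key, prefix), alert)


def put_alert_data(alert, client, bucket, prefix=None):
    alert_id = alert.get("id")
    key = "/".join(["alerts", alert_id[0:2], alert_id[2:4], alert_id])
    _put_json(client, bucket, _with_prefix(key, prefix), alert)


class RecordingClient:
    """Stand-in for boto3.client('s3') that records put_object() calls."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Body, Bucket, Key):
        self.objects[(Bucket, Key)] = Body


client = RecordingClient()
alert = {"id": "587c0159a907346eccb84004", "created_at": 1484661060000}
put_webhook_data(alert, client, "my-bucket", prefix="threatstack")
put_alert_data(alert, client, "my-bucket", prefix="threatstack")
for bucket, key in sorted(client.objects):
    print(key)
```

Injecting the client also means the key logic can be checked without AWS credentials.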

def put_webhook_data(alert):
    '''
    Put alert webhook data in S3 bucket.
    '''
    alert_time = time.gmtime(alert.get('created_at')/1000)
    alert_time_path = time.strftime('%Y/%m/%d/%H/%M', alert_time)
    alert_key = '/'.join(['webhooks', alert_time_path, alert.get('id')])
    if TS_AWS_S3_PREFIX:
        alert_key = '/'.join([TS_AWS_S3_PREFIX, alert_key])

    s3_client = boto3.client('s3')
    s3_client.put_object(
        Body=json.dumps(alert),
        Bucket=TS_AWS_S3_BUCKET,
        Key=alert_key
    )

    return None

This function takes a single argument named alert. Looking back at app/views/s3.py, alert is one element of the alerts array in the JSON data that was sent to the endpoint. Webhook data is stored in S3 by date and time: the alert 587c0159a907346eccb84004, occurring at 2017-01-17 13:51 (UTC), is stored in S3 as webhooks/2017/01/17/13/51/587c0159a907346eccb84004.

I start by getting the alert time. Threat Stack sends the alert time in milliseconds since the Unix epoch, and that needs to be converted into seconds, which is how Python's time functions handle it. I take that time and format it as a string that will be the directory path. I then join the top-level directory where I store webhook data, the time-based path, and finally the alert ID to form the path to the webhook data in S3.
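The conversion can be checked in isolation. This short sketch uses a created_at value chosen to match the example alert above (it is illustrative, not taken from a real payload). It also shows why time.gmtime() matters: it interprets the timestamp as UTC, so the S3 path doesn't depend on the server's local time zone the way time.localtime() would.

```python
import calendar
import time

created_at_ms = 1484661060000  # what the webhook sends: ms since the Unix epoch

# Python's time functions expect seconds, so divide by 1000 first.
seconds = created_at_ms / 1000

# gmtime() interprets the timestamp as UTC, independent of the server's zone.
alert_time = time.gmtime(seconds)
alert_time_path = time.strftime("%Y/%m/%d/%H/%M", alert_time)
print(alert_time_path)  # 2017/01/17/13/51

# calendar.timegm() is the UTC inverse of gmtime(): converting back and
# multiplying by 1000 recovers the original millisecond value.
print(calendar.timegm(alert_time) * 1000 == created_at_ms)  # True
```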

Boto 3 is the primary module in Python for working with AWS resources. I initialize a boto3 client object so I can talk to S3 and put the object there. The s3_client.put_object() is fairly straightforward with its Bucket and Key arguments, which are the name of the S3 bucket and the path to the S3 object I want to store. The Body argument is my alert converted back to a string.

Wrapping up

What I have now is a functional Python Flask web service that can take a Threat Stack webhook request, get the alert's detail, and archive it in S3. It's a great start, but there's still more to be done for this to be production ready. Immediately you might be asking, "What happens if something goes wrong?" There's no exception handling to deal with issues such as communication failures with Threat Stack or S3. I intentionally omitted it to keep the code clear. There's also no authorization key checking. This means that anyone can send data to it. (And since I don't do any error checking or exception handling, they can crash the service.) There's also no TLS encryption handling. That's something I'd leave up to Nginx or Apache, which would be the webserver fronting this application. All these and more are issues you need to tackle before putting this web service into production. But for now this is a start that should help you become more comfortable as you start building your own services.

Resources

View the GitHub repository for Threat Stack to S3 service.

Because the application goes through revisions, review the version used in this article.

Check out Tom's new tutorial on exception handling in Python Flask.

This article originally appeared on the Threat Stack blog. Reposted with permission.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.