At the BuzzFeed Open Lab, we've been thinking a lot about automated journalism. In particular, we'd like to build open source tools that newsrooms big and small can use to empower journalists instead of replacing them. As a first small step in this direction, we've built a tool for monitoring RSS feeds in bulk, which we're using internally to make the Securities and Exchange Commission's (SEC) EDGAR system more accessible.
EDGAR, or the Electronic Data Gathering, Analysis, and Retrieval system, is the SEC's public filings database. For certain kinds of events, like when a company is going public and is required to disclose information to everyone at the same time, EDGAR is the source of truth.
We wanted an automated system that extracts and processes new information from EDGAR as it appears, to save our journalists time and energy. We reviewed many of the existing tools that attempt to solve some of these problems, but none of them quite fit. There are many RSS readers, but few provide immediate notifications. IFTTT has a wonderful interface, but setting up recipes is time-consuming and monitoring multiple feeds quickly becomes untenable.
Eventually, we decided to build an RSS watchdog tool we're calling RSS Puppy. It's designed to watch bulk collections of RSS feeds and emit events that other systems can listen for and act on. It runs using only a single instance of Node.js and a Postgres database for storage. It's also lightweight, so it can be deployed on any local server or cloud service provider. Finally, it's modular, so you can develop your own output handlers that do anything you need.
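As a rough illustration of the watch-and-emit idea, here is a minimal sketch built on Node's standard EventEmitter. It is not RSS Puppy's actual code: the FeedWatcher class, the ingest method, the 'entry' event name, and the entry shape ({ id, title, link }) are all assumptions made for this example, and an in-memory Set stands in for the Postgres table that would remember old entries. Fetching and parsing the feed is left out.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical watcher, for illustration only (not RSS Puppy's real API).
class FeedWatcher extends EventEmitter {
  constructor() {
    super();
    // Stand-in for persistent storage of entries we have already seen.
    this.seen = new Set();
  }

  // Takes the already-parsed entries of one feed, emits an 'entry' event
  // for each entry not seen before, and returns the new entries.
  ingest(feedUrl, entries) {
    const fresh = [];
    for (const entry of entries) {
      const key = `${feedUrl}#${entry.id}`;
      if (this.seen.has(key)) continue;
      this.seen.add(key);
      fresh.push(entry);
      this.emit('entry', { feedUrl, entry });
    }
    return fresh;
  }
}

// Usage: other parts of the system listen for events and act on them.
const watcher = new FeedWatcher();
watcher.on('entry', ({ feedUrl, entry }) => {
  console.log(`new entry on ${feedUrl}: ${entry.title}`);
});
watcher.ingest('https://example.com/feed.xml', [
  { id: '1', title: 'First filing', link: 'https://example.com/1' },
]);
```

Keying the seen set on the feed URL plus the entry id means the same id appearing in two different feeds is still treated as two distinct entries.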
Our instance of RSS Puppy watches for filings from many different companies and sends out notifications to our journalists as soon as something happens. It also pulls up past versions of filings and highlights changes in the new ones, giving journalists a head start on understanding developments. Control of the story remains in journalists' hands, but the system helps get some of the manual labor out of the way.
If you have RSS feeds that you'd like to monitor, getting started is as simple as setting up a database, cloning RSS Puppy, and changing a few fields in the sample config. You can find more details in the readme, and see an example of how we use Docker to manage our instance here.
Once those basics are in place, you can create output handlers that do anything you like. RSS Puppy will deal with periodically checking feeds and keeping track of old entries. When something new appears, it will invoke your handlers with the data. Handlers are written in JavaScript and can be single functions or large modules with lots of dependencies. We intend to keep developing output handlers for commonly used services, and we gladly welcome contributions on this front. So check out RSS Puppy and let us know if you build anything cool with it!
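To give a feel for what a small handler might look like, here is a sketch under stated assumptions. The handler signature (a function that receives one new entry), the entry fields (title, link), and the helper names formatAlert, makeCollectingHandler, and dispatch are all hypothetical; the readme describes the interface RSS Puppy really expects.

```javascript
// Hypothetical handler shape: a function that receives one new feed entry.

// Turn an entry into a one-line alert message.
function formatAlert(entry) {
  const title = entry.title || '(untitled)';
  const link = entry.link || '';
  return `New entry: ${title} ${link}`.trim();
}

// Build a handler that appends alerts to an array. A real handler might
// post to a chat service or send an email instead.
function makeCollectingHandler(sink) {
  return (entry) => {
    sink.push(formatAlert(entry));
  };
}

// Invoke every handler with the entry. A handler that throws is recorded
// and skipped so that it does not block the remaining handlers.
function dispatch(handlers, entry) {
  const errors = [];
  for (const handler of handlers) {
    try {
      handler(entry);
    } catch (err) {
      errors.push(err);
    }
  }
  return errors;
}

// Usage
const alerts = [];
dispatch([makeCollectingHandler(alerts)], {
  title: 'Form S-1',
  link: 'https://example.com/filing',
});
console.log(alerts);
```

Catching errors per handler is a design choice for this sketch: one misbehaving output should not keep the others from hearing about a new filing.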