Interview with Frank Huerta and Mason Sharp on Postgres-XL
Postgres-XL released to tackle big data analytics and more
Sometimes it's best to go with what you know. For countless developers, database managers, analysts, and others who need to store data in a traditional relational database system, PostgreSQL is that system. But as the demands on databases grow, so too must the software that underlies them.
Modern databases need to support concurrent access across multiple systems, and users expect data to be synchronized quickly, easily, and without fail. Doing this with PostgreSQL requires teaching an old dog a few new tricks. Meet TransLattice, a company specializing in distributed SQL databases, which recently acquired StormDB, an innovator in modern database technology.
As a step toward bringing some modern enhancements to Postgres, TransLattice today announced the open sourcing of Postgres-XL, based on the technology it acquired from StormDB. Postgres-XL is a clustered, parallel SQL database designed for both online transaction processing (OLTP) and big data analytics. I spoke with Frank Huerta, TransLattice’s CEO, and Mason Sharp, its Chief Architect, to learn a little more about Postgres-XL and what it means for the open source community.
Tell me a little about what you're excited about with Postgres-XL and how it came to be.
Postgres-XL is based on StormDB, which in turn is based on Postgres. Postgres uses a multi-version concurrency control (MVCC) model to manage concurrency. To keep the database consistent across the cluster, we pulled that management out of the core database and made it a separate component. That way, you always get a consistent view of the data cluster-wide, and it also allows for OLTP write scalability: not just reads, but we can scale out writes across multiple nodes. In addition, down at the data node level, we made some changes so that the nodes interact with each other. We allow for massively parallel processing (MPP), where queries can be parallelized, so we can use all of the resources in the cluster to process queries a lot faster.
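The idea of pulling MVCC bookkeeping out of each database and into one cluster-wide component can be illustrated with a toy model. This is a simplified Python sketch, not Postgres-XL's actual code: the `GlobalTransactionManager` class, `Snapshot`, and `visible()` are hypothetical names, and the model assumes every finished transaction committed (no aborts) and ignores deleted rows. One component hands out transaction IDs and snapshots to every node, and `visible()` applies a simplified Postgres-style rule to the ID of the transaction that wrote a row version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int               # every xid below this had finished when the snapshot was taken
    xmax: int               # xids >= this had not started yet
    in_progress: frozenset  # xids still running at snapshot time


class GlobalTransactionManager:
    """Hypothetical, simplified cluster-wide transaction manager.

    Every data node asks this single component for transaction IDs and
    snapshots, so all nodes agree on which transactions are visible.
    """

    def __init__(self) -> None:
        self._next_xid = 1
        self._running = set()  # xids of transactions still in progress

    def begin(self) -> int:
        xid = self._next_xid
        self._next_xid += 1
        self._running.add(xid)
        return xid

    def commit(self, xid: int) -> None:
        # Simplification: a finished transaction is assumed committed (no aborts).
        self._running.discard(xid)

    def snapshot(self) -> Snapshot:
        xmin = min(self._running, default=self._next_xid)
        return Snapshot(xmin, self._next_xid, frozenset(self._running))


def visible(creator_xid: int, snap: Snapshot) -> bool:
    """Is a row version written by creator_xid visible under snap? (Deletes ignored.)"""
    if creator_xid >= snap.xmax:
        return False  # the writer started after the snapshot was taken
    return creator_xid not in snap.in_progress  # the writer must have finished


# Usage: two transactions write rows that live on different data nodes.
gtm = GlobalTransactionManager()
t1 = gtm.begin()  # e.g. inserts a row on data node A
t2 = gtm.begin()  # e.g. inserts a row on data node B
gtm.commit(t1)
snap = gtm.snapshot()
print(visible(t1, snap), visible(t2, snap))  # True False
```

Because the snapshot comes from one place, a query that reads from both node A and node B sees t1's row and not t2's on both nodes, rather than each node deciding visibility on its own.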
Postgres is a great general-purpose database. There are other databases out there that specialize in write transactions, and people use Hadoop to process large amounts of data. Postgres-XL takes this great general-purpose capability and scales it out so it can handle a variety of workloads, including mixed-workload environments.
What are some sample types of applications or use cases for which someone might be interested in using Postgres-XL?
As we emphasized before, there is parallelized big data analytics: for example, financial services, or a security company that collects a lot of traffic data and wants a way to analyze it. Or there are write-intensive OLTP workloads; for example, we're talking to an online ad company that has a lot of impressions and clickthroughs to track. And in mixed workloads, you might have different kinds of activities going on at the same time.
You can also use it to consolidate data from a variety of sources, as in an operational data store. Because of its Postgres heritage, it supports JSON data types, so it can do one of the common things people use NoSQL databases for. Postgres has that built in, so we're able to leverage it. We can similarly give you a key-value store across multiple servers without sacrificing consistency.
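A key-value store spread across servers rests on assigning each entry to a node by its key. The sketch below is a hypothetical Python model, not Postgres-XL's implementation: `ShardedKV`, `node_for()`, and the node names are made up for illustration, and hashing with SHA-256 is an assumed choice (made because, unlike Python's built-in `hash()`, it is stable across processes). It shows a single-key lookup being routed to exactly one node, and a cluster-wide count computed as local partial results that a coordinator sums, in the spirit of the parallel query processing described earlier. It does not model the cluster-wide consistency mentioned above.

```python
import hashlib

# Hypothetical data node names for illustration.
NODES = ["datanode1", "datanode2", "datanode3"]


def node_for(key: str, nodes: list[str]) -> str:
    """Route a key to a node by hashing it.

    hashlib is used instead of built-in hash(), which is randomized per
    process for strings and so would not give a stable placement.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class ShardedKV:
    """Toy key-value store whose entries are hash-distributed across nodes."""

    def __init__(self, nodes: list[str]) -> None:
        self.nodes = list(nodes)
        self.stores = {n: {} for n in self.nodes}  # one local dict per node

    def put(self, key: str, value) -> None:
        self.stores[node_for(key, self.nodes)][key] = value

    def get(self, key: str, default=None):
        # A single-key lookup only touches the one node that owns the key.
        return self.stores[node_for(key, self.nodes)].get(key, default)

    def count_all(self) -> int:
        # Scatter-gather: each node counts locally, the coordinator sums the partials.
        partials = [len(store) for store in self.stores.values()]
        return sum(partials)


kv = ShardedKV(NODES)
for i in range(1000):
    kv.put(f"user:{i}", {"clicks": i % 7})
print(kv.get("user:42"), kv.count_all())  # {'clicks': 0} 1000
```

With plain modulo placement, changing the number of nodes remaps most keys, so this toy is only meant to show routing and parallel aggregation, not rebalancing.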
Also leveraging the Postgres heritage, there's PostGIS, which adds powerful geospatial capabilities. You can plug that right in; if you have a lot of GPS data, a lot of location data, it can be compute-intensive to go through all of it. Now you can leverage multiple servers in a cluster to process that data faster.
Also, we designed it with the idea that if you have traditional hardware resources and different applications using them, their workloads may vary at different times and may be bursty, so you can have a cluster of machines with multiple databases sharded across those servers. At the same time, we wanted to make it a little more secure: out of the box, Postgres lets you get an idea of the namespaces of the other databases and users. So we locked that down to better support multi-tenancy, and we added more statistics and tracking of what's going on in all of the databases. This also makes it useful as a hosted database solution.
Why did you decide to open source this technology?
We thought it was important to contribute back to the community, a community that has fueled what we've already been doing here at TransLattice. We are going to incorporate some of this technology into our products anyway. We're engaging partners to provide services for this technology, and we'll be providing those services as well.
Is the hope that this will become a community project and that you'll get commits from users outside of your organization?
Yes, absolutely! A couple of premier Postgres consulting companies have already said that they are going to contribute to this project. We've also had some conversations with members of the larger Postgres community, and it looks like there will be contributions from there as well. We have our roots in Postgres, so contributing code back is something that we're already doing.