ScyllaDB: Cassandra compatibility at 1.8 million requests per node

No readers like this yet.
Server room

Cory Doctorow. Modified by Opensource.com. CC BY-SA 2.0.

ScyllaDB is designed to be a resilient NoSQL database and is currently in beta testing. It is designed from the ground up to take advantage of multiple core systems and to provide very high performance.

Don Marti, techical marketing manager for ScyllaDB, co-founded the Linux consulting firm Electric Lichen. He is a strategic advisor for Mozilla and has previously served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for USENIX, CodeCon, and LinuxWorld Conference and Expo.

Don talked to us in advance of SCaLE 14x in Pasadena, California, where he'll be giving talks on ScyllaDB, JavaScript, and turning software demos into software projects.

How did you get started with ScyllaDB?

I have been involved with Linux and open source since the mid-1990s, and Scylla is a natural progression for open source to move up the stack and provide more value for some of the most demanding companies out there. The problems involved in running a resilient database are some of the hardest and most valuable in IT today.

What is the current state of ScyllaDB?

Scylla is currently in beta and on track for a GA release at the end of January. Beta releases (including Amazon AMIs) are available now.

Please describe the ScyllaDB "share nothing" architectural concept and why it was necessary in order to achieve the performance you describe.

Scylla runs one thread of execution per core, and cores only communicate by message passing—using a dedicated pair of queues for each pair of cores on the system. This means that Scylla can avoid costly locking. For example, all memory allocation in Scylla is handled locally to the core that needs the memory, so no core that needs memory ever has to wait for a lock.

Why is it needed? One word: multicore. Today's processor designers are giving us more and more cores, which means that we have to rethink software design both at the kernel and userspace levels. Kernel developers have been removing contention between cores for years, but most NoSQL databases are still adapted to the hardware assumptions of the 1990s—threads and locks everywhere. (It's not a coincidence that many of the Scylla developers come from a kernel background.) The end result is a NoSQL database with the functionality and resilience properties of Apache Cassandra, but with an order of magnitude more throughput per node.

How many total nodes can be combined into a ScyllaDB engine? How many nodes are actually practical?

The architecture and communication between nodes are based on Apache Cassandra, which can handle tens of thousands of nodes in multiple data centers. Because Scylla offers the same design with lower latency—and a simpler, more reliable native software stack—it should be possible to run Scylla clusters that are even larger. However, a typical Scylla cluster can be a tenth the size of a Cassandra cluster and supply the same throughput. In most cases we see that you can do more with fewer nodes, for example replacing a 1,000-node Cassandra cluster with a 100-node Scylla cluster.

What is the primary performance chokepoint with this architecture and how do you foresee it being overcome?

Scylla uses the same on-disk storage format as Cassandra to make migration easy. Because the on-disk format is designed for 100% compatibility, not for maxed-out performance, it's probably the slowest part of the design.

At this point, though, Scylla is already so fast that raw performance issues are less important than other enhancements. Raw performance is even higher than we anticipated at the start of the project, so we have some time to focus on customer feature requests.

What is the next step for ScyllaDB?

Right now we're focused on early pilot customers who have been evaluating Scylla, mostly as an upgrade path from Cassandra but also as an alternative to other databases.

What do you hope to accomplish with your talk at SCaLE 14x?

The takeaway is that you can do a short series of commands and have a fast, resilient database running right away on the cloud or your own server. Instead of thinking about tweaking garbage collection and other complex DevOps tasks, you can focus on your project.

Where can our readers go to learn more about ScyllaDB?

Visit one of our getting started pages to get Scylla running in your environment of choice: Amazon AWS, Docker, or your own server with RPM or deb packages.

David Both
David Both is an Open Source Software and GNU/Linux advocate, trainer, writer, and speaker. He has been working with Linux and Open Source Software since 1996 and with computers since 1969. He is a strong proponent of and evangelist for the "Linux Philosophy for System Administrators."

Comments are closed.

Creative Commons LicenseThis work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.