SCaLE 14x interview with Don Marti, ScyllaDB

ScyllaDB: Cassandra compatibility at 1.8 million requests per node

Image by: Cory Doctorow. Modified by Opensource.com. CC BY-SA 2.0.


ScyllaDB is a resilient NoSQL database, currently in beta testing. It is designed from the ground up to take advantage of multicore systems and to provide very high performance.

Don Marti, technical marketing manager for ScyllaDB, co-founded the Linux consulting firm Electric Lichen. He is a strategic advisor for Mozilla and has previously served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for USENIX, CodeCon, and LinuxWorld Conference and Expo.

Don talked to us in advance of SCaLE 14x in Pasadena, California, where he'll be giving talks on ScyllaDB, JavaScript, and turning software demos into software projects.

How did you get started with ScyllaDB?

I have been involved with Linux and open source since the mid-1990s, and Scylla is a natural progression for open source to move up the stack and provide more value for some of the most demanding companies out there. The problems involved in running a resilient database are some of the hardest and most valuable in IT today.

What is the current state of ScyllaDB?

Scylla is currently in beta and on track for a GA release at the end of January. Beta releases (including Amazon AMIs) are available now.

Please describe the ScyllaDB "share nothing" architectural concept and why it was necessary in order to achieve the performance you describe.

Scylla runs one thread of execution per core, and cores only communicate by message passing—using a dedicated pair of queues for each pair of cores on the system. This means that Scylla can avoid costly locking. For example, all memory allocation in Scylla is handled locally to the core that needs the memory, so no core that needs memory ever has to wait for a lock.

Why is it needed? One word: multicore. Today's processor designers are giving us more and more cores, which means that we have to rethink software design both at the kernel and userspace levels. Kernel developers have been removing contention between cores for years, but most NoSQL databases are still adapted to the hardware assumptions of the 1990s—threads and locks everywhere. (It's not a coincidence that many of the Scylla developers come from a kernel background.) The end result is a NoSQL database with the functionality and resilience properties of Apache Cassandra, but with an order of magnitude more throughput per node.
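The ownership model described above can be illustrated with a small sketch. This is not Scylla's code (Scylla is a native application, and the interview does not show its internals); it is a toy Python model in which each "shard" stands in for one core, owns a private dictionary, and is reached only through messages. The names `Shard`, `shard_for`, `request`, and `run_demo` are hypothetical, and the sketch simplifies the design to one inbox per shard rather than a dedicated pair of queues for each pair of cores. Python's `queue.Queue` also takes a lock internally, so this shows who owns the data, not lock-free performance.

```python
import queue
import threading
import zlib

NUM_SHARDS = 4  # assumption: stands in for the number of CPU cores


class Shard:
    """Owns a private slice of the data; only its own thread touches it."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.data = {}              # private state, never shared with other shards
        self.inbox = queue.Queue()  # the only way to reach this shard
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Process one message at a time; no lock is needed around self.data
        # because no other thread ever reads or writes it while we run.
        while True:
            msg = self.inbox.get()
            if msg is None:  # shutdown signal
                break
            op, key, value, reply = msg
            if op == "set":
                self.data[key] = value
                reply.put(True)
            elif op == "get":
                reply.put(self.data.get(key))


def shard_for(key, shards):
    # Deterministic routing: the same key always maps to the same shard.
    return shards[zlib.crc32(key.encode()) % len(shards)]


def request(shards, op, key, value=None):
    # Send a message to the owning shard and wait for its reply.
    reply = queue.Queue(maxsize=1)
    shard_for(key, shards).inbox.put((op, key, value, reply))
    return reply.get()


def run_demo():
    shards = [Shard(i) for i in range(NUM_SHARDS)]
    for s in shards:
        s.thread.start()
    for i in range(100):
        request(shards, "set", f"user:{i}", i * i)
    result = request(shards, "get", "user:7")
    for s in shards:
        s.inbox.put(None)
        s.thread.join()
    # Safe to inspect shard state now that every shard thread has stopped.
    total_keys = sum(len(s.data) for s in shards)
    return result, total_keys


if __name__ == "__main__":
    print(run_demo())
```

Because every key has exactly one owner, adding shards adds capacity without adding contention on shared structures, which is the property the answer above attributes to the design.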

How many total nodes can be combined into a ScyllaDB engine? How many nodes are actually practical?

The architecture and communication between nodes are based on Apache Cassandra, which can handle tens of thousands of nodes in multiple data centers. Because Scylla offers the same design with lower latency—and a simpler, more reliable native software stack—it should be possible to run Scylla clusters that are even larger. However, a typical Scylla cluster can be a tenth the size of a Cassandra cluster and supply the same throughput. In most cases we see that you can do more with fewer nodes, for example replacing a 1,000-node Cassandra cluster with a 100-node Scylla cluster.
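The sizing claim above is simple arithmetic, sketched here for illustration. The per-node Scylla figure is taken from the article's headline (1.8 million requests per node); the Cassandra figure is simply that number divided by ten to match the "tenth the size" statement, not a measured benchmark, and the target load is a made-up value chosen so the result matches the 1,000-node example. The function name `nodes_needed` is hypothetical, and real sizing would also account for replication, storage, and headroom.

```python
import math


def nodes_needed(target_requests_per_sec, per_node_requests_per_sec):
    """Smallest node count whose combined throughput meets the target."""
    return math.ceil(target_requests_per_sec / per_node_requests_per_sec)


SCYLLA_PER_NODE = 1_800_000                   # from the article's headline
CASSANDRA_PER_NODE = SCYLLA_PER_NODE // 10    # assumption: one tenth of Scylla
TARGET_LOAD = 180_000_000                     # hypothetical cluster-wide load

if __name__ == "__main__":
    print("Cassandra nodes:", nodes_needed(TARGET_LOAD, CASSANDRA_PER_NODE))
    print("Scylla nodes:", nodes_needed(TARGET_LOAD, SCYLLA_PER_NODE))
```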

What is the primary performance chokepoint with this architecture and how do you foresee it being overcome?

Scylla uses the same on-disk storage format as Cassandra to make migration easy. Because the on-disk format is designed for 100% compatibility, not for maxed-out performance, it's probably the slowest part of the design.

At this point, though, Scylla is already so fast that raw performance issues are less important than other enhancements. Raw performance is even higher than we anticipated at the start of the project, so we have some time to focus on customer feature requests.

What is the next step for ScyllaDB?

Right now we're focused on early pilot customers who have been evaluating Scylla, mostly as an upgrade path from Cassandra but also as an alternative to other databases.

What do you hope to accomplish with your talk at SCaLE 14x?

The takeaway is that you can run a short series of commands and have a fast, resilient database running right away in the cloud or on your own server. Instead of thinking about tweaking garbage collection and other complex DevOps tasks, you can focus on your project.

Where can our readers go to learn more about ScyllaDB?

Visit one of our getting started pages to get Scylla running in your environment of choice: Amazon AWS, Docker, or your own server with RPM or deb packages.

About the author

David Both is a Linux and open source advocate who resides in Raleigh, North Carolina. He has been in the IT industry for over forty years and taught OS/2 for IBM, where he worked for over 20 years. While at IBM, he wrote the first training course for the original IBM PC in 1981. He has taught RHCE classes for Red Hat and has worked at MCI Worldcom, Cisco, and the State of North Carolina. He has been working with Linux and open source software for almost 20 years. David has written articles for...