The evolution of the big data platform at Netflix

Eva Tse will be speaking at this year's OSCON about her experiences at Netflix in her talk, The evolution of the big data platform at Netflix.

I caught up with Eva to get a bit of background on her, Netflix, and how open source is being used to improve services at Netflix. Not only has Netflix used and contributed to existing open source projects, but they have also released their own projects, like Genie, as open source. To learn more about Netflix's open source projects, you can peruse their GitHub. Be sure to read all the way through to learn the answer to the most important question of all: Eva's favorite Netflix original series!

Tell us about yourself and your background.

I manage the Big Data Platform team at Netflix. My team architects and manages the Netflix big data platform in the AWS cloud. Prior to Netflix, I built data warehousing software for Informatica's flagship product, PowerCenter, where I led the server and metadata service teams.

I am naturally attracted to solving large-scale data-related problems. Data is one of the most important assets for a business. For a fast-growing Internet-scale business like Netflix, it is challenging to make data available to decision makers in both a timely and reliable manner. I enjoy the challenge of getting ahead of the tidal wave of data and making sense out of it!

How did you first get involved in open source?

When I first joined Netflix, we wanted to really understand the streaming experience of our customers. With the growing demand for our streaming service and our plan to eventually go global, we needed to build a data analytics platform that could serve our needs for many years to come. We recognized the momentum behind Apache Hadoop and saw the advantages of leveraging it as our distributed data processing engine. We decided to try it, and the rest is history.

Throughout the last few years, we've continued to evaluate and adopt different open source technologies around the Hadoop ecosystem. We also see an opportunity to open source and share the service layers we built in our big data platform. So we started open sourcing some of these components, such as Genie, Inviso, and Lipstick, as part of Netflix OSS.

What is your favorite (non-Netflix) open source product? And your favorite Netflix OSS product?

We like all the ones we've leveraged in our big data platform, i.e., Presto, Spark, Hive, Pig, Hadoop, and Parquet. These are all great technologies. Each of these projects serves a specific need that we have, and each has a very vibrant and passionate open source community.

However, Hadoop is my favorite. It is a key enabler of the industry movement toward open source big data technologies. It has also come a long way, from a MapReduce execution engine to a resource management system that can host different data processing engines.

My favorite Netflix OSS project in the big data space is Genie. Genie helps abstract the details of running data processing jobs from our users and exposes our platform as a service to the rest of the company. It is a simple yet powerful concept, and it is a key component in our architecture. We spoke to a couple of companies about Genie, and they are incorporating it into their big data environments!

How long has Netflix been contributing to open source projects? What advice would you have to other companies out there that don't see the value in using and contributing to open source?

In general, our philosophy is to contribute back to the open source projects that we leverage. It simply makes sense to contribute back so that these projects continue to evolve and the community also benefits.

In my talk, we will discuss some of the benefits we see in using and contributing to open source. In general, there are a lot of great open source technologies that are built by really talented and passionate engineers. It is likely that someone has already solved the same problem you have, so instead of building something from scratch, you can build or improve upon their work. Open source is like crowdsourcing engineering brain power, which is extremely powerful.

Can you give us a sneak peek into your OSCON 2015 talk?

In this talk, I will focus on how we tackle the scale issues we have in our big data platform beyond just data volume. We will talk about our overall architecture and all of the open source projects that we pull together to build our platform. We will also take a look at how each open source project serves a specific need in our stack, and at our philosophy of leveraging and contributing back to these open source technologies.

Just for fun, what's your favorite Netflix original series?

This is a tricky question. Obviously, they are all very good. But I like Unbreakable Kimmy Schmidt the best. I like her invincible attitude :)

This article is part of the Speaker Interview Series for OSCON 2015. OSCON is everything open source: the full stack, with all of the languages, tools, frameworks, and best practices that you use in your work every day. OSCON 2015 will be held July 20-24 in Portland, Oregon.

Nicole C. Baratta (Engard) is a Senior Content Strategist at Red Hat. She received her MLIS from Drexel University and her BA from Juniata College. Nicole volunteers as the Director of ChickTech Austin. Nicole is known for many different publications, including her books "Library Mashups", "More Library Mashups", and "Practical Open Source Software for Libraries".

1 Comment

Read the Netflix contract.
Your viewing history and preferences become property of Netflix, and Netflix reserves the right to sell this data.
Never mind about people's privacy ... big thumbs up to Netflix, they use open source.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.