Interview with David Smith of Revolution Analytics
Big data influencer on how R is paving the way
The R programming language is used for data visualization and experimental analysis by the likes of Facebook and has a rapidly growing user base of more than two million. What began in 1995 as an open source academic research tool has evolved for use among commercial and industrial businesses around the world.
Revolution Analytics supports the R community and the ever-growing needs of commercial users. The company was recently named a top 10 influencer on the topic of Big Data, so I asked David Smith, its Chief Community Officer, to share with me what keeps this programming language ticking. R has been around since the 90s: it was released in 1995 under GPLv2 by two statistics professors looking to develop a new language for statistical computing. Even so, a new breath of life has energized a rowdy team of innovators around R.
David tells me that developers have contributed more than 6000 packages to extend R’s capabilities. Read more in this interview.
How is R used to implement data science applications at commercial institutions like Facebook?
R has been adopted at just about every company that has a data science team or where statistical analysis is a strategic initiative. One of the reasons is that R is ubiquitous in academia: anyone who learns statistics or data science at university has learned R, so for companies that are hiring new graduates it makes sense to use R as a platform. In any industry that deals with data, R is used for exploratory data analysis, data visualization, experimental design, statistical modeling and forecasting... just about any kind of advanced data analysis, in fact. Facebook uses R mainly for data visualization and experimental analysis, and I’ve created a list of R applications at several other companies.
How are the goals set by CEO Dave Rich and CTO Greg Todd innovative and different?
First and foremost, Dave and Greg are focused on bringing the R language to companies around the world. More and more companies today see data science as a strategic imperative, as the means to unlock value from the data they have collected. We’re helping those companies innovate with the R language: we provide big-data capabilities and integration frameworks for R, along with the technical support, training and consulting services these companies need to make their data-driven applications work. As for the future, we’re riding the wave of R’s continued growth, and bringing Revolution R Enterprise to the cloud and to new database and Hadoop platforms.
What does your day look like as you lead the open source solutions group at Revolution Analytics?
Recently, I’ve been busy launching AdviseR, our new technical support service for open source R users. On a typical day I’ll have a couple of meetings with our open source development team, who work on our community projects like RHadoop. We’re also working on some new projects, to bring some of the proprietary components of Revolution R Enterprise to open source. Evangelism for the R project is also a big part of my job: I’ll meet with companies several times a week to introduce them to R, and I also post daily to the Revolutions blog with resources and applications related to R.
Forbes named Revolution Analytics a top 10 influencer on the topic of Big Data, so tell us: What is the future of big data? Are most companies doing it right?
Some companies figured out big data years ago; other companies are only now realizing that data is one of the most valuable assets a company can have. In all cases, though, the demand for data scientists is growing rapidly, as companies recognize that making the best use of all that data is the key to being successful in a competitive marketplace. (That’s one reason why salaries for R programmers are at a premium right now.) It’s kept us all very busy at Revolution Analytics as companies look for help in adopting and using R to become data-driven organizations.
Chief Data Scientist Sue Ranney produced ExaStat, an open source environment for analyzing huge data sets. What's she working on at Revolution Analytics and what's her next big idea?
Sue continues in her passion for developing fast, portable, scalable analytics. In the past year she has been working with the team to provide high performance in-database analytics, both in-Hadoop and in-Teradata, using our ‘big data’ R package, RevoScaleR. She is currently involved in a project with our Chief Scientist, Lee Edlefsen, to provide a framework for R programmers to easily write their own portable, scalable analytics. The goal is to be able to write and test R code for customized analytics on your desktop, and then be able to deploy the same code to automatically run in parallel on big data on a cluster of computers.
What do you see in the future for the R project?
With the increasing demand for data scientists recently, R has been growing rapidly, and I expect that to continue. On the technical side, the contributions to R show no sign of slowing, and so R will continue to be the place to find the leading edge in statistical analysis, data mining, and data science. I’m really proud to see the R project continue to be so successful!
Revolution Analytics was recently named a visionary in Advanced Analytics Platforms by Gartner.