Head of Open Source at Facebook opens up


What follows is a partial transcription of James Pearce's OSCON session, Rebooting Open Source at Facebook.

For hundreds of years, open has trumped closed—sharing has trumped secrecy.

In a humble way, this informs our program at Facebook. We have 200 active projects at Facebook, with 10 million lines of code. Many hundreds of engineers work on these, and the projects have over 100,000 followers and 20,000 forks. We contribute to a wide range of projects (e.g., the Linux kernel, Mercurial, and D). We've even open sourced the designs of our data centers and machines in the Open Compute Project. We want to share a collection of things we've learned along the way.

Why is this so important?

The reason is that open source is dorm-room friendly. Our roots stretch back to a young undergrad in 2004 who picked the free and open source software (FOSS) that was available: the classic LAMP stack. Since then, our capacity to participate in communities, and to make them a better place, has increased.

When we find a piece of open source software (OSS), we first try to scale it, and then we find the limitations of the project. So we try to improve it and make it work in scaled environments, and we see this pattern happening over and over again. Mark's decision to use PHP, for instance, had limitations. We built the HipHop "compiler," the HHVM project, and, even more recently, the PHP-derived language called Hack, launched back in March. The same pattern applies to data, web, infrastructure, and front-end: all of our technology stack. Open source is closely aligned with our hacker culture, and with how our organization is perceived. We asked our employees...

"Were you aware of the open source software program at Facebook?"

  • 2/3 said "Yes"
  • 1/2 said that the program positively contributed to their decision to work for us

These are not marginal numbers, and, I hope, this is a trend that continues.

A large number of those people said their experience using our projects in the open helped them get ramped up prior to being hired. That is a huge win for our company.

This is an important part of why open source is valuable to our company. And you need to be able to articulate the value.

#0: Always articulate the value FOSS brings to your company

There are always costs and investments, so understand what your return is. Naive ideology only goes so far; you need data to support continuation. We're confident it helps us do a better job. It helps us keep our tech fresh, justify architectural decisions, and bring more eyes to our code. Open source is like the breeze from an open window; it keeps things from going stale.

But if you wind the clock back a year, you'd find this Three20 project, which has been discontinued... Our PHP SDK... deprecated. Our fork of Memcached, with a description of "test" and commit messages of "5," "6," and "7"...

*audience laughter*

This is the "throw it over the wall" syndrome. We're guilty of this, I'm sad to say, and it is almost worse than not doing it at all.

You need to continue to care about the things you release, or how can you expect others to care about them?

#1: Use your own open source

It is essential to continue using the version you release. Don't create internal forks; keep the code fresh and keep working on it. The community will notice if you don't. Eat your own dog food.

Sometimes you'll have to integrate your open source code with closed or proprietary tools internally. It usually means you create plugins or adapters, and make architecture decisions that make your project better. Take Presto: we needed it to integrate with both open and internal databases. We had a strong plugin architecture, with plugins for open databases and then plugins for our internal ones.

Nevertheless, we weren't doing that well last year. We decided to refresh our team and get our house in order. At that time, our web team open sourced React at JSConf. React is one of the most exciting projects in the JavaScript world in recent years, with a great community response. It reminded us at Facebook that we knew how to get great projects out there. That initiative came from the developers themselves. There was no promotional team internally; it came directly from engineers.

#2: Decentralize project ownership

Make sure the engineers are the sole custodians. External engineers work with internal engineers directly. No monolithic structure. As we looked at the reboot, we needed to figure out what we already had and get the portfolio under control.

We needed to answer 3 key questions:

  1. Which projects did we own?
  2. Who contributes?
  3. How healthy are they?

Most were on GitHub. GitHub, of course, has a great API, so we wrote a script (in Hack) to enumerate our projects and get:

  • every repository
  • every commit
  • every pull request
  • every issue

So we stored all this data, and put it into MySQL.
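
As a rough illustration of the import step described above, here is a minimal Python sketch that walks a paginated listing and writes one snapshot row per repository. It is not the script from the talk, which was written in Hack and stored its data in MySQL. Several things here are assumptions: sqlite3 stands in for MySQL so the example is self-contained, the table layout and the names fetch_all_pages and store_snapshot are hypothetical, and fetch_page is a caller-supplied function rather than a real HTTP client. The field names full_name, stargazers_count, forks_count, and open_issues_count are the ones found on repository objects returned by the GitHub REST API.

```python
import sqlite3

# Hypothetical schema: the talk does not describe the real tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS repo_snapshot (
    full_name   TEXT NOT NULL,
    taken_at    TEXT NOT NULL,
    stargazers  INTEGER NOT NULL,
    forks       INTEGER NOT NULL,
    open_issues INTEGER NOT NULL,
    PRIMARY KEY (full_name, taken_at)
)
"""


def fetch_all_pages(fetch_page):
    """Collect every item from a paginated listing.

    fetch_page(page_number) is supplied by the caller. It must return a list
    of repository dicts for that page, and an empty list once pages run out.
    """
    items, page = [], 1
    while True:
        batch = fetch_page(page)
        if not batch:
            return items
        items.extend(batch)
        page += 1


def store_snapshot(conn, repos, taken_at):
    """Write one row per repository for the given snapshot timestamp."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO repo_snapshot VALUES (?, ?, ?, ?, ?)",
        [
            (
                repo["full_name"],
                taken_at,
                repo["stargazers_count"],
                repo["forks_count"],
                repo["open_issues_count"],
            )
            for repo in repos
        ],
    )
    conn.commit()
```

Because every row carries its own timestamp, running the same import repeatedly builds up a history rather than overwriting the previous numbers.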

I love GitHub, but I find it easier to use SQL to filter what is going on. We found some things to address. We realized we could do this import process again and again, and see how trends evolve over time. I am now one of the world's experts in the GitHub API throttling mechanism, and we've got it running very efficiently. All of this is to implement two things: instrumentation and publishing.
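
The talk does not say how the import copes with throttling. One plausible approach, sketched here as an assumption, is to read the X-RateLimit-Remaining and X-RateLimit-Reset response headers that the GitHub REST API sends and pause until the reset time when the remaining budget runs low. The function name seconds_to_wait and the reserve parameter are hypothetical, and the headers argument is assumed to be a plain dict using exactly that header capitalization.

```python
def seconds_to_wait(headers, now, reserve=10):
    """Return how long to pause before the next API request.

    headers: dict of response headers from the previous request.
    now:     current time as Unix epoch seconds.
    reserve: hypothetical safety margin; pause once this few calls remain.
    """
    remaining = headers.get("X-RateLimit-Remaining")
    reset_at = headers.get("X-RateLimit-Reset")  # epoch seconds
    if remaining is None or reset_at is None:
        return 0.0  # no rate-limit information, so do not pause
    if int(remaining) > reserve:
        return 0.0  # plenty of budget left
    return max(0.0, float(reset_at) - now)
```

A polling loop would call this after each response and sleep for the returned number of seconds before issuing the next request.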

#3: Invest in instrumentation

We now have time series data and can create metrics. This is Argus, and it shows the total number of watchers over time. With up to 100,000 followers, and polling every minute, we can watch the numbers over time and find inflection points that GitHub itself didn't show us. We launched an iOS library called Shimmer, and then tweaked it, and those surges can be seen after investing in the iOS community. Being able to monitor and publish data and progress shows that we are being disciplined, and earns respect through empirical data.
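
How Argus finds inflection points is not described in the talk. As one simple way to spot surges in a follower time series, the sketch below flags any sample whose gain over the previous sample is several times the typical (median) gain. The function name find_surges and the factor of 3 are illustrative assumptions.

```python
from statistics import median


def find_surges(counts, factor=3.0):
    """Return the indices where the follower count jumped unusually fast.

    counts: follower totals sampled at regular intervals, oldest first.
    factor: hypothetical threshold; a gain counts as a surge when it
            exceeds factor times the median gain (with a floor of 1).
    """
    gains = [later - earlier for earlier, later in zip(counts, counts[1:])]
    if not gains:
        return []
    threshold = factor * max(median(gains), 1)
    return [index + 1 for index, gain in enumerate(gains) if gain > threshold]


# Steady growth of 2 per sample, with one jump of 46 at index 3.
print(find_surges([100, 102, 104, 150, 152, 154, 156]))  # prints [3]
```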

We have over 35 metrics we follow.

Five most important metrics:

  • Average number of Followers
  • Number of Forks per repository
  • Average Pull Request age
  • Average Issue age
  • Number of External commits
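
To make the two age metrics concrete, here is a minimal sketch of how an average age in days could be computed from opening timestamps. The talk does not say whether the average covers only open items or also closed ones; this example assumes the list contains only items that are still open, and the function name average_age_days is hypothetical.

```python
from datetime import datetime


def average_age_days(opened_at, as_of):
    """Mean age, in days, of items opened at the given datetimes.

    opened_at: datetimes at which still-open pull requests (or issues)
               were created. This is an assumption about the metric.
    as_of:     the datetime at which the metric is evaluated.
    """
    if not opened_at:
        return 0.0
    total_seconds = sum((as_of - opened).total_seconds() for opened in opened_at)
    return total_seconds / len(opened_at) / 86400


# Two open pull requests, 10 and 30 days old, average out to 20 days.
print(average_age_days(
    [datetime(2014, 7, 21), datetime(2014, 7, 1)],
    as_of=datetime(2014, 7, 31),
))  # prints 20.0
```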

#4: Invest in tools

These are mostly internal, to help teams run projects. They are internal dashboards, visible to everyone at the company. Everyone is aware of the metrics we follow internally. There are "big" views on all projects, showing which ones are doing well or badly, and you can drill down and see the owner for each project. Each owner is clearly defined, is an employee, and can be assigned tasks directly. I can hassle them, and also, if they ever leave (as happened with Tornado), we can find new stewards for the project. For Tornado, we transferred ownership to the community. We have engineers associate their Facebook profile with their GitHub profile via OAuth. We can then track who contributes, whether internal or external. This workflow unlocked so much valuable data about what is going on.
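
Once employee profiles are linked to GitHub accounts, splitting commits into internal and external becomes a set lookup. The sketch below assumes the linked accounts are available as a collection of GitHub logins; the function name and the data shapes are hypothetical and are not taken from the talk.

```python
def count_commits_by_origin(commit_authors, employee_logins):
    """Tally commits as internal or external.

    commit_authors:  GitHub login of the author of each commit.
    employee_logins: logins that employees have linked to their profile
                     (assumed to be available from the OAuth association).
    """
    # GitHub logins are case-insensitive, so compare in lower case.
    employees = {login.lower() for login in employee_logins}
    tally = {"internal": 0, "external": 0}
    for author in commit_authors:
        origin = "internal" if author.lower() in employees else "external"
        tally[origin] += 1
    return tally


print(count_commits_by_origin(["alice", "Bob", "carol", "alice"], ["alice", "bob"]))
# prints {'internal': 3, 'external': 1}
```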

#5: Establish ownership

Don't let projects be orphaned, or flap in the wind. We can show graphs and metrics scoped to projects or teams. Individual teams often set quarterly or half-yearly goals for themselves. That social pressure helps projects do well.

#6: Gamification of good behavior

We have teams competing now. React and the iOS Pop project have about the same number of followers, and there is a bit of a space race to get the most followers. In the absence of managing projects directly, you can still influence them. We don't want engineers spinning their wheels with lawyers, wasting time. We want them to do it with discipline, working through questions like these:

  • How core to Facebook is this technology?
  • Who will use it, who is it useful to, how valuable is it?
  • What else already exists, that is similar to this technology?
  • Is there anything novel in the project?
  • Does it include third-party code, including third-party open source?
  • Who will maintain the project, accept contributions, and liaise with the community?
  • Where and how should the project be distributed?
  • What is the public release date?

We have a very strong template for licensing: we stick to BSD, occasionally Apache or Boost, and the only reason we'd look at other licenses is when the target community has a strong culture of using that license. We don't impose licenses unfamiliar to a community.

#6.5: Choose your lawyers wisely

We have a linter to make sure license headers are all there and everything is good to go in a private repository. Then we release mid-week, tweet about it, do a Facebook open source social media blast, and then post it to the code blog. Then the social media magic takes hold, and we get good momentum on the first day. We have an internal group of 600-700 employees interested in FOSS.
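
The linter itself was not shown in the talk. A license-header check can be as small as the sketch below, which looks for a marker string near the top of each source file. The marker text, the file suffixes, the ten-line window, and both function names are assumptions chosen for illustration.

```python
from pathlib import Path


def has_license_header(text, marker="Copyright", lines_to_check=10):
    """Return True if the marker appears within the first few lines."""
    head = text.splitlines()[:lines_to_check]
    return any(marker in line for line in head)


def files_missing_header(root, suffixes=(".py", ".js", ".php")):
    """List source files under root whose header lacks the marker.

    Paths are returned relative to root, sorted, as strings.
    """
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            if not has_license_header(text):
                missing.append(str(path.relative_to(root)))
    return missing
```

A pre-release check could run files_missing_header over the private repository and block the release until the returned list is empty.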

Every Friday, Mark gathers everyone at 4pm for Q&A. At the start of the session, Mark talks about new apps, products, and releases. He's taken to announcing our OSS projects in these meetings, and you can only imagine how motivating it is to know that the CEO is aware of a project and announces it to the whole company. Much of it comes from infrastructure teams, and that is a huge boost for them. I get a huge surge in interest after Mark talks about them.

#7: Launch is only step zero

You have to know how to keep it successful. I look at the number of followers over time. We can see the bumps of interest over the first week, and a gradual slope over time. It is the gradient of the second half that matters, not just what happens on the first day.
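
One way to put a number on that long-term gradient is to fit a straight line to the follower counts after the launch spike and read off its slope. The sketch below uses an ordinary least-squares slope. Treating the first seven days as the launch window is an assumption, as are the function names.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def long_term_gradient(daily_followers, launch_window=7):
    """Followers gained per day once the launch window has passed.

    launch_window is a hypothetical cut-off separating the launch spike
    from the steadier growth that follows.
    """
    return slope(daily_followers[launch_window:])
```

For example, a project that gains 6,000 followers in its first week and then 10 per day afterwards has a long-term gradient of about 10, however large the first-day spike was.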

Some exceptional cases:

  • fb-flo and Origami beat this curve; flo was released at a JavaScript conference and tripled its community; face-to-face PR hugely grows FOSS success
  • KVO Controller did sessions at two-week intervals and saw strong growth after each one; practice makes perfect
  • our climax was the release of Pop, which blew everything away; it got 4,000 followers on the first day, 6,000 in the first week, and is way north of 7,000 now

Obviously we benefit from our reputation, but the success was built on the success of previous iOS projects. Pop had a closed beta for two weeks before launch. Out of the gate, we had a strong pick-up. Our closed beta users were our best advocates and helped early growth. The reaction from the iOS community was strong.

We encourage major projects to have their own website. Our design teams have built entire sites for Origami. It shows you care, and take care of your project.

We have IRC, Facebook groups and pages, meetups, and hackathons. It is all important, and it all works.

We have one technique called a community round-up. The React.js team will gather all the mentions, all the projects, and all the demos and presentations, and then show them to the rest of the community, not just at Facebook. This gives extra authenticity.

The first couple of weeks of external commits are vital! On the first day, you'll get a swath of PRs; most will be typo fixes in documentation. This is not a bug; it shows that people are feeling comfortable.

#8: Leave breadcrumbs

Docs, unimplemented features, to-dos. As projects go on, they change their destiny. There are many paths: Snapshot, Upstream, Fly the nest, Deprecate, Reboot.

Snapshot: usually read-only, academic exercises; many are created in order to get upstream, and FBThrift is a good example of this

Upstream: we teamed up with Twitter and LinkedIn to get changes upstream in WebScaleSQL

Fly the nest: the project goes on to become "its own thing"; some of our major projects will do this, and then we'll eventually become "just a user" like everyone else

Deprecate: the project served a useful purpose, and finishes

Reboot: the project starts over again

#9: Understand OSS project lifecycles

We launched 65 new projects in the last couple of months. That's about 2.5 projects per week. It is more about quality than quantity, but each has a goal. There is a variety of project types: mobile, infrastructure, and programming languages. All are very broad.

Metrics          June 2013          July 2014
Total Repos      129                202
Followers        50.1K              97.6K
Forks            11.8K              20.7K
Pull-requests    1400 (502 days)    1973 (208 days)
Issues           404 (323 days)     427 (186 days)
Commits          30.7K              42.4K

#10: Be open and connected

It has been a pleasure to share our journey with you today.

Q: In the Facebook license, it looked like "for more information."

A: Straightforward BSD license, and a patent grant. We have a patent grant for the developers, same as what happens in the Apache License.

Q: Does Facebook have a Contributor's License Agreement (CLA)?

A: We didn't have a slide for the CLA, but it is basically the Apache CLA. It is there to assure users that contributions that came from external contributors were theirs to give. We then have a bot that comes around to do a GitHub auth. Exactly the same as the Google/Apache process.

Q: Have we open sourced the GitHub scripts?

A: I knew someone was going to ask that! We'll share as much of that as we can soon.

What is your background?

Name: James Pearce
Title: Open Source Program Lead
Twitter: @jamespearce
Link: code.facebook.com/projects

I've been in the tech industry for years, mostly in mobile. I worked on very early mobile tech, when it was called "WAP." I've been waiting for it to become the next big thing, and it finally has. I joined Facebook about three years ago, working on mobile developer relations, talking about app integration.

When it came to open source software, it was serendipity. We saw it needed love, and here I am. I'm still learning a lot as I go along. We try to federate as much activity as we can, and make it as light touch as possible. We're doing better than we were, but we've got a long way to go. We've got lots of projects, but we want to do more, work with more communities, and think more about how we provide stewardship over time.

How do we do more in mobile? We have lots to offer in Android, and we want to continue to run the program as efficiently as possible.

How can people get involved?

Check out our careers site. All our open source projects are on GitHub, we're friendly, and we're responsive when people send pull-requests.

This derivative work by Remy Decausemaker is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

At the Fedora Project Remy served as Community Action and Impact Lead, bringing more heat and light to the distro's user and contributor base.
