Using Git and mailing list time zones to find out where developers live


Opensource.com

Where do the developers in my FOSS community live? For large open source communities where personal contact with developers is impossible, answering this simple question may be difficult.

In some projects, developers have the option of registering personal geographical information, such as a country or city of residence or GPS coordinates. This is the case with Debian, for example (shown below). Other projects collect IP addresses, on which geolocation analysis can be performed later. This information permits tracking different kinds of access (to the development repositories, to the download area, to the forums, etc.). But most projects don't have these tracking capabilities.

This map shows the location of Debian developers. CC BY-SA 4.0.

Fortunately, there is an approach that, although it does not pinpoint exact locations, is useful for visualizing how your community is spread around the world: time zone analysis. Time zones are too coarse for fine-grained location, but they are enough to give an idea of large geographical areas.

Time zone analysis uses information provided as a byproduct when developers interact with some repositories:

  • Git records the author's local time, including a time zone tag, in every commit. When commits are merged into the project's Git repository, the author time (with its time zone tag) is usually not altered. Be warned that some actions on commits may alter their time, replacing the time zone tag with that of the person performing the action. Still, in most cases the information is reliable enough to determine the time zones of commit authors.

An example of time zone analysis. It shows the number of Git authors per time zone for OpenStack in 2014. CC BY-SA 4.0.
  • Mailers include the local time, with its time zone tag, in every message they send. In many cases, the software that archives mailing lists keeps this time unaltered. In those cases, analyzing mailing list archives permits identifying the time zone of each sender.

Another example of time zone analysis showing the number of messages per time zone sent to Eclipse mailing lists during 2014. CC BY-SA 4.0.
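Both sources described above can be tallied with a few lines of Python. The following is a minimal sketch, not the tooling used to produce the charts in this article. It assumes the Git history has been exported with `git log --format='%ae|%ai'` (author email, then author date with its UTC offset) and that mail `Date` headers are available as strings. The function names `authors_per_offset` and `message_offset` are made up for this illustration.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime


def authors_per_offset(log_text):
    """Count distinct Git authors per UTC offset.

    Expects one commit per line, "email|YYYY-MM-DD HH:MM:SS +HHMM",
    as printed by: git log --format='%ae|%ai'
    """
    authors = defaultdict(set)
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        email, date = line.rsplit("|", 1)
        offset = date.split()[-1]  # last field of the date, e.g. "+0100"
        authors[offset].add(email.lower())
    return {offset: len(emails) for offset, emails in authors.items()}


def message_offset(date_header):
    """Return the UTC offset of a mail Date header as "+HHMM", or None."""
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return None  # header could not be parsed
    if sent.tzinfo is None:
        return None  # "-0000" means the sender's offset is unknown
    return sent.strftime("%z")
```

Counting distinct author emails rather than commits keeps one very active developer from dominating a time zone. The same person committing under two email addresses is still counted twice, so merging identities first would give a cleaner picture.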

In both cases, be aware of at least three sources of trouble:

  1. Bots that perform commits or send messages can have their time zone set to whatever is convenient for the machine where they run, and in any case their activity is not human activity. In some projects, bots account for a large share of the activity in Git repositories or mailing lists, so they must be identified and removed for the analysis to be reliable.
  2. Some people set the time zone on their machines to something other than the time zone where they live. For example, frequent worldwide travelers may set it to UTC+0 (Coordinated Universal Time, formerly Greenwich Mean Time). As a result, UTC+0 can be over-represented.
  3. Many countries shift their clocks twice per year (daylight saving time), but not all countries do, and those that do don't switch on the same dates. The same person can therefore appear under two adjacent time zones during the year.
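Two of these problems can be reduced with simple preprocessing. The sketch below drops identities that look like bots and assigns each remaining person the offset they used most often, which smooths over daylight saving switches and occasional travel. The pattern in `BOT_PATTERN` is a hypothetical starting point, not a list of real project bots, and the function names are invented for this example. Nothing here corrects for people who keep their machines permanently on UTC+0.

```python
import re
from collections import Counter

# Hypothetical patterns; replace them with the bots known in your project.
BOT_PATTERN = re.compile(r"\b(jenkins|bot|no-?reply)\b", re.IGNORECASE)


def is_bot(identity):
    """Guess whether an author or sender identity belongs to a bot."""
    return bool(BOT_PATTERN.search(identity))


def dominant_offsets(records):
    """Map each non-bot identity to the UTC offset it used most often.

    records: iterable of (identity, offset) pairs,
             e.g. ("a@example.org", "+0100").
    """
    per_identity = {}
    for identity, offset in records:
        if is_bot(identity):
            continue
        per_identity.setdefault(identity, Counter())[offset] += 1
    return {
        identity: counts.most_common(1)[0][0]
        for identity, counts in per_identity.items()
    }
```

Name matching is crude, so it is worth reviewing what gets filtered out before trusting the result.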

A map of world time zones. By Phoenix B 1of3. CC BY-SA 3.0.

Time zone analysis provides only a high-level view of the geographical distribution of the community. For example, you cannot tell European from African contributors, because they are in the same time zones. You can roughly identify people from large regions (the list is not exact, so look at the map for details and a more accurate description):

  • UTC+12: New Zealand.
  • UTC+10, UTC+11: Australia.
  • UTC+9: Japan, Korea.
  • UTC+7, UTC+8: China, Eastern Russia, Indochina.
  • UTC+6: India (strictly speaking UTC+5:30, so it falls here only when offsets are rounded up to whole hours).
  • UTC+3 to UTC+5: Western Russia, East Africa, Middle East.
  • UTC+0 to UTC+2: Western and Central Europe, West Africa.
  • UTC-2, UTC-3: Brazil, Argentina, Chile.
  • UTC-4 to UTC-6: North America Central and East Coast (US, Canada, Mexico), Central America, South America West Coast.
  • UTC-8, UTC-7: North America West Coast (US, Canada).
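The list above translates directly into a lookup table. This is a rough sketch that takes offsets in the `+HHMM` form used by Git and mail headers. The name `region_for_offset`, the "other" label for offsets the list does not cover, and the special case that sends `+0530` to India are all assumptions made for this example.

```python
# Whole-hour UTC offsets mapped to the rough regions from the list above.
REGION_RANGES = [
    (range(12, 13), "New Zealand"),
    (range(10, 12), "Australia"),
    (range(9, 10), "Japan, Korea"),
    (range(7, 9), "China, Eastern Russia, Indochina"),
    (range(6, 7), "India"),
    (range(3, 6), "Western Russia, East Africa, Middle East"),
    (range(0, 3), "Western and Central Europe, West Africa"),
    (range(-3, -1), "Brazil, Argentina, Chile"),
    (range(-6, -3), "North America Central and East Coast, "
                    "Central America, South America West Coast"),
    (range(-8, -6), "North America West Coast"),
]
REGIONS = {hour: name for hours, name in REGION_RANGES for hour in hours}


def region_for_offset(offset):
    """Return the rough region for an offset string such as "+0100"."""
    sign = -1 if offset[0] == "-" else 1
    hours, minutes = int(offset[1:3]), int(offset[3:5])
    if (sign, hours, minutes) == (1, 5, 30):
        return "India"  # assumption: treat +05:30 like the list's India entry
    if minutes:
        return "other"  # other fractional offsets are not in the list
    return REGIONS.get(sign * hours, "other")
```

Because the list is approximate, the labels name a band of the map, not a country. An author at `+0100` could just as well be in West Africa as in Central Europe.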

For some uses, this is enough. For example, the chart above of OpenStack Git authors clearly shows that most of the developers are from North America and western Europe, with some participation from the Far East and other regions. The distribution of the Eclipse mail senders is even more centered on western Europe, with large participation from North America and only some presence from the rest of the world.

You can use this kind of study to track the results of policies for increasing geographical diversity, to know where developers come from, or to decide on a meeting location or chat session start time. In general, time zone analysis is a simple way to learn about the big picture of where your developers come from.
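As an illustration of the chat session use case, the sketch below ranks UTC hours by how many people would be inside their local working hours at that time. It assumes whole-hour offsets given as integers and a hypothetical working day of 09:00 to 18:00 local time. The function name and the sample numbers in the usage note are invented, not taken from the charts above.

```python
def best_meeting_hours(people_per_offset, start=9, end=18):
    """Return the UTC hours at which the most people are in working hours.

    people_per_offset: dict mapping UTC offsets in hours (e.g. 1, -5)
                       to the number of people at that offset.
    start, end: assumed local working hours, as the interval [start, end).
    """
    scores = []
    for utc_hour in range(24):
        covered = sum(
            count
            for offset, count in people_per_offset.items()
            if start <= (utc_hour + offset) % 24 < end
        )
        scores.append((covered, utc_hour))
    best = max(covered for covered, _ in scores)
    return [utc_hour for covered, utc_hour in scores if covered == best]
```

With ten people at UTC+1, eight at UTC-5 and three at UTC+9, `best_meeting_hours({1: 10, -5: 8, 9: 3})` returns `[14, 15, 16]`: those UTC hours suit the first two groups, and no hour suits all three.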

This text is based on content from the book Evaluating FOSS Projects.

Jesus M. Gonzalez-Barahona is co-founder of Bitergia, a software development analytics company specializing in the analysis of free / open source software projects. He also teaches and does research at Universidad Rey Juan Carlos (Spain), in the context of the GSyC/LibreSoft research group.

2 Comments

This is a really cool idea. I used to lead the Fedora Docs group, and finding a suitable IRC meeting time for a worldwide team was always a challenge. Something like this could have been really helpful.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.