When we started talking about hosting a 'back to school' week at Opensource.com, I decided to take that quite literally, and went back to NC State University earlier this month to attend the inaugural Geospatial Forum at the Center for Geospatial Analytics. Geospatial analytics and GIS (geospatial information science) make up a huge field, with a number of open source tools available for research and teaching, and I wanted to learn more about how these tools are being used in the real world.
Speaking at the forum was Dr. Markus Neteler, Head of the GIS and Remote Sensing unit at Fondazione Edmund Mach (FEM) in Trento, Italy, and Chair of the GRASS open source GIS project steering committee. GRASS (Geographic Resources Analysis Support System) is an open source GIS software tool used for managing and analyzing geospatial data. It includes functions for modeling, visualization, and image processing, as well as for creating custom output maps.
While there are some commercial entities using GRASS, the business side of GIS work is still heavily dominated by closed source tools, especially Esri ArcGIS. But within the academic world, tools like GRASS have gained a good bit of traction. The GRASS project is also a founding member of the Open Source Geospatial Foundation (OSGeo), an organization which exists to help foster the development, use, and adoption of open source tools for GIS.
When I was in graduate school, I was fortunate to have among my professors Dr. Helena Mitasova, a developer of GRASS and OSGeo Foundation member, so I gained some experience working with GRASS, but I also knew that the work I was doing was only scratching the surface of what the tool could do. In fact, I had a little bit of fun looking back at my old portfolio of GIS work and discovering that one of the assignments I had completed in GRASS involved using it to map the viewshed from the top of Red Hat Tower (in Raleigh, NC) and then determine what percentage of that viewshed was made up of various land use types.
The project Dr. Neteler spoke on was considerably more advanced than my fairly simple assignments in learning GRASS. He shared with us a number of applications he has been working on that use GRASS, along with several other open source tools, to process gigantic temperature datasets collected by various satellite projects. The data is first cleaned for accuracy and its gaps are filled in; the resulting long-term record is then used to learn about temperature variation in Europe and how it affects a number of public health problems.
As you might imagine, high resolution readings of temperatures over a long period of time create a large dataset. For the projects Dr. Neteler described, research was primarily done with MODIS data, which provided daily readings across the study area at 250 meter resolution, meaning that a single snapshot of Europe could contain well over a million pixels. Of course, each cell of each raster could require considerable processing, based on neighboring cells or other information, to correct its contents.
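To get a feel for the scale, here is a rough back-of-the-envelope sketch in Python. The 4,000 km by 4,000 km extent is a hypothetical round number chosen for illustration, not the study area used at FEM; only the 250 meter cell size and the daily cadence come from the talk.

```python
import math


def raster_shape(width_km, height_km, resolution_m):
    """Return (columns, rows) for a grid covering the extent at the given cell size."""
    cols = math.ceil(width_km * 1000 / resolution_m)
    rows = math.ceil(height_km * 1000 / resolution_m)
    return cols, rows


# Hypothetical extent, chosen only to illustrate the arithmetic.
cols, rows = raster_shape(4000, 4000, 250)
cells_per_day = cols * rows           # one daily snapshot
cells_per_year = cells_per_day * 365  # a year of daily snapshots

print(f"{cols} x {rows} cells per snapshot")
print(f"{cells_per_day:,} cells per day")
print(f"{cells_per_year:,} cells per year")
```

Even with these made-up dimensions, a single day comes to hundreds of millions of cells, which is why correcting every cell against its neighbors quickly becomes a job for a cluster rather than a desktop.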
To conquer this large data set, FEM uses a cluster of computers running a number of open source tools. Overall, their cluster contains 300 nodes, 610 gigabytes of RAM, and 132 terabytes of disk space. Each of these nodes runs Scientific Linux, a derivative of Red Hat Enterprise Linux targeted at scientific uses and initially created by Fermi National Accelerator Laboratory. The storage is managed by GlusterFS, an open source distributed filesystem. And while the analysis on these nodes is primarily provided by GRASS, it is conducted through scripted command line calls rather than the graphical user interface with which I was more familiar. FEM also uses open source tools including PROJ.4 for projecting their data, GDAL for some additional data processing, and Grid Engine for scheduling the processing.
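To give a flavor of what scripted command line calls can look like, here is a minimal Python sketch that builds one GRASS command per daily map. It is not FEM's actual pipeline: the map naming scheme (`lst_YYYYMMDD`), the start date, and the choice of a 3x3 neighborhood average with `r.neighbors` are all assumptions made for illustration. The script only prints the commands; in a real setup each one would have to run inside a GRASS session, and a scheduler such as Grid Engine would distribute the jobs across nodes.

```python
import shlex
from datetime import date, timedelta


def daily_commands(start, days, prefix="lst"):
    """Build one r.neighbors command per daily map, without running anything.

    The map names are hypothetical; r.neighbors is a standard GRASS module
    that computes each cell from the values of its neighbors.
    """
    commands = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        name = f"{prefix}_{day:%Y%m%d}"
        commands.append([
            "r.neighbors",
            f"input={name}",
            f"output={name}_smooth",
            "method=average",
            "size=3",
        ])
    return commands


# Print the command lines for three illustrative days.
for cmd in daily_commands(date(2012, 1, 1), 3):
    print(shlex.join(cmd))
```

Because each day's map is handled by an independent command, work of this shape splits naturally into many small jobs, which is the kind of workload a batch scheduler is designed to handle.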
So what can you do with a very accurate picture of temperature over a large region and a long span of time? Many of the examples Dr. Neteler gave of their research focused on tracking infectious disease. Like the part of the United States where I live, Europe has seen the tiger mosquito introduced as a non-native invasive species that is gradually increasing its range across the continent. Tiger mosquitoes are capable of carrying a number of infectious pathogens, so public health experts need to understand where their range may expand, based on the range of temperatures the mosquito can tolerate. This allows eradication measures to be targeted at the right areas and helps with planning for range changes as climate change affects temperature patterns.
A slightly more upbeat example of FEM's research is in viticulture, the growing of grapes for wine. As farmers look to choose grape varieties which will be tolerant to conditions which can vary heavily in a mountainous region over a relatively short distance, they can look to the predictive modeling of temperature data to better understand which varieties will be most likely to thrive in their specific temperature microclimate.
These are just two of many examples; the underlying theme of Dr. Neteler's talk was that this research was made possible both by the open source tools for conducting analysis and by the open data provided by the MODIS satellite program. Much of the data collected by European agencies, unlike the data from NASA's MODIS program, is not freely available; it can cost up to hundreds of thousands of dollars, making it prohibitively expensive for academic and research use. "Public data should be public," Neteler said, and so too should the tools for conducting analysis, making any research done with them replicable and verifiable.