Education for the real world: Open course on open source NoSQL databases


Back in March of this year, the University at Albany Student Chapter of the American Society for Information Science & Technology (ASIS&T) organized its second Open Source Festival. The event brought together enthusiasts of open source from industry, government, and academia in the New York-Albany area. There, I shared my experience of teaching an open source class at RPI and the work that OSEHRA was doing to further promote the use of open source software in healthcare. Among other topics of discussion was the need to educate college students on the basic concepts of NoSQL databases.

The concern stemmed from the now widespread use of the M database across healthcare applications and the lack of awareness about M in the academic community, where most courses focus on relational databases. It is worth pointing out that M is both a language and a database (more specifically, a hierarchical NoSQL database), and it is used in thousands of clinical facilities worldwide.
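For readers who have only worked with tables, the idea of a hierarchical database can be sketched roughly in Python. This is an illustration under stated assumptions, not M syntax and not how any M implementation works internally: the class `HierarchicalStore`, its methods `put`, `get`, and `children`, and the sample patient data are all hypothetical names invented for this sketch. The point is only that each value lives at a path of subscripts in a sparse tree, rather than in a row of a table.

```python
# Illustrative only: a tiny in-memory model of a hierarchical key-value store.
# HierarchicalStore, put, get, and children are hypothetical names and are not
# part of any M implementation. The sample data is made up.


class HierarchicalStore:
    """Sparse tree in which every value is addressed by a path of subscripts."""

    def __init__(self):
        # Maps a tuple of subscripts, e.g. ("Patient", 1, "name"), to a value.
        self._nodes = {}

    def put(self, path, value):
        """Store a value at the given path; intermediate levels need no setup."""
        self._nodes[tuple(path)] = value

    def get(self, path, default=None):
        """Return the value stored exactly at the path, or the default."""
        return self._nodes.get(tuple(path), default)

    def children(self, path):
        """Return the sorted, distinct next-level subscripts under a path.

        Sorting assumes sibling subscripts share a comparable type.
        """
        prefix = tuple(path)
        depth = len(prefix)
        found = {
            key[depth]
            for key in self._nodes
            if len(key) > depth and key[:depth] == prefix
        }
        return sorted(found)


store = HierarchicalStore()
store.put(("Patient", 1, "name"), "Ada")
store.put(("Patient", 1, "visit", "2012-11-06"), "clinic")
store.put(("Patient", 2, "name"), "Grace")

print(store.get(("Patient", 1, "name")))  # Ada
print(store.children(("Patient",)))       # [1, 2]
print(store.children(("Patient", 1)))     # ['name', 'visit']
```

Note that nothing declares a schema up front: a patient gains a "visit" branch simply by storing a value under it, which is the kind of flexibility that tends to surprise students coming from the relational model.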

Dima Kassab, who was one of the organizers of the Open Source Festival 2012, was preparing her fall class on databases at SUNY Albany in early July, and she took the initiative to include the topic of open source NoSQL databases in her syllabus. Her motivation was twofold: to expose students to the new NoSQL technologies and to the use of open source software. We agreed to work together to prepare hands-on sessions on open source databases to be included as activities in the fall semester.


At the Informatics Department of the State University of New York, University at Albany, other instructors, such as David Adkins and Alexander Jurkat, were very excited to introduce NoSQL databases to students. Others were more reluctant because they were not sure whether learning about these databases would be the best fit for their Information Science majors or whether the course was mature enough to be included in the curriculum.

Alex, David, and Dima met to discuss how to approach teaching NoSQL databases, and the main challenge they faced was the lack of educational materials in this area. For all of them, this was the first time working with many of the databases in the NoSQL category. Over several meetings they decided which databases to focus on, in what sequence to introduce them to students, and how much time to devote to each.

In addition, students are hesitant when they are first introduced to NoSQL databases because standard database courses have incorporated little to no NoSQL material. This creates a disconnect between what students learn in their formal college courses and what is being used in industry, where NoSQL databases are gaining adoption as a complement to relational databases in applications where they perform better.

Designing the course

The course was built on the experience that Dima Kassab had with Team Based Learning (TBL) and my experience in organizing hands-on tutorials in conferences and practical activities in the open source software practice class at RPI. TBL turns out to be extremely compatible with open source practices because it is rooted in empowering students to take control of their own learning process. It engages students and drives discussions in groups and self-directed activities.

We immediately agreed on hands-on exercises that would familiarize the students, through direct experience, with the concepts and uses of open source NoSQL databases. We built on our previous experience using Sphinx (the RST-based documentation system from the Python community) with hands-on exercises preconfigured in virtual machines or on servers in the cloud.

Next, we decided to store all of our materials in a GitHub repository. The simple RST text markup used by Sphinx makes it easy to compose pages that look nice and that can be maintained in a revision control system. We chose the CC BY 3.0 license for the texts and the Apache 2.0 license for all code snippets. Our hope is that this will facilitate reuse by others, as well as collaboration with anyone interested in helping improve the tutorial materials. The current version of the tutorial is here and the repository is here.

To avoid being biased towards any particular database, we wanted to follow the approach Andreas Kollegger took in his OSCON talk, "Past, Present and Future of NoSQL." This great presentation placed the spectrum of NoSQL databases in a common context, which made the pros and cons for different applications much clearer; he also hinted at how the landscape may evolve in the near future.

That led us to discover the excellent book "Seven Databases in Seven Weeks" by Eric Redmond and Jim R. Wilson. We appreciated its friendly comparative overview, without database-religious overtones, where the reader can get familiar with many fine databases by doing practical hands-on exercises. It is refreshing to read an objective database comparison that goes beyond the tired conversation of "My Database is Better than Yours." Redmond and Wilson achieved this with great style, covering PostgreSQL, Riak, HBase, MongoDB, CouchDB, Neo4j, and Redis.

We then started crafting an online tutorial with hands-on exercises. Our version of "Seven Databases in Seven Weeks" turned "weeks" into "classes" and added a database to the list (M). Because one semester was not enough time, we began with "three databases in three classes." The first three picks were MongoDB, Neo4j, and M. To facilitate the practical sessions, we set up a Linux server in the Amazon EC2 cloud and created accounts for every student.
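To give a feel for why these picks are worth comparing side by side, here is a minimal sketch of the same small fact ("a course covers three databases") expressed in a document style, as MongoDB favors, and in a graph style, as Neo4j favors. It uses only plain Python structures and no database driver, and it is not taken from the tutorial itself: every name (`course_document`, `nodes`, `edges`, `neighbors`, and so on) and every sample value is an assumption made up for illustration.

```python
# Illustrative only: plain Python structures, no database driver involved.
# All names and sample values below are hypothetical.

# Document style: one self-contained, nested record.
course_document = {
    "title": "Databases",
    "instructor": {"name": "A. Instructor"},
    "topics": ["MongoDB", "Neo4j", "M"],
}

# Graph style: separate nodes, connected by typed relationships.
nodes = {
    "course": {"title": "Databases"},
    "instructor": {"name": "A. Instructor"},
    "mongodb": {"name": "MongoDB"},
    "neo4j": {"name": "Neo4j"},
    "m": {"name": "M"},
}
edges = [
    ("instructor", "TEACHES", "course"),
    ("course", "COVERS", "mongodb"),
    ("course", "COVERS", "neo4j"),
    ("course", "COVERS", "m"),
]


def neighbors(node_id, relation):
    """Follow every outgoing edge of the given type from a node."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


def topics_from_document(doc):
    """In the document style, the answer is a field lookup."""
    return list(doc["topics"])


def topics_from_graph(course_id):
    """In the graph style, the answer is a traversal over relationships."""
    return [nodes[n]["name"] for n in neighbors(course_id, "COVERS")]


print(topics_from_document(course_document))
print(topics_from_graph("course"))
```

Both functions return the same list of topics; what differs is whether the answer comes from reading a nested field or from walking relationships, and that difference is a useful prompt for group discussion.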

Day one and two of class

Our first class took place on Tuesday, November 6th with 23 students at the Informatics Department of SUNY Albany and covered our MongoDB coursework. The second class was held on Thursday, November 8th and covered our Neo4j coursework. Our last class of this group will be on Tuesday, November 13th and cover M.

Our early assessment of the sessions is that the exercises require more than one hour to be executed properly. With our current design of activities, we probably need two or three hours to complete the exercises of populating the database, querying it in various ways, and then discussing the results among the groups using TBL practices.

If only an hour is allocated, to make the classes fit we propose that the database should be populated before the class and that the class itself be focused on querying exercises. It is quite essential to preserve the time for the TBL group discussions to take place, given that the NoSQL concepts are a paradigm shift from what the students have already been exposed to as they studied the Relational model.

One tempting option is to let the students follow the tutorial instructions as a homework activity before the class, and then devote the class to discussions and further hands-on exercises. This is guided by the approach advocated by Salman Khan (the founder of Khan Academy) in his book, "The One World Schoolhouse," where students are allowed to work on class material at their own pace and classroom time is focused on interacting with peers and instructors.

The experience has been very exciting, and we anticipate progressive improvement as we continue to develop these open course materials, providing practical training on modern databases.

Stay tuned for a follow-up article discussing the third class teaching M.

Luis Ibáñez works as Senior Software Engineer at Google Inc in Chicago.


This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.