A final lesson on teaching open source NoSQL databases


In two recent posts, we described a set of tutorial sessions teaching open source NoSQL databases at SUNY Albany. These practical sessions covered:

  1. MongoDB: a document database
  2. Neo4j: a graph database
  3. M: a hierarchical database 
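
The three data models differ mainly in how they shape the same information. As a rough illustration only, the plain-Python sketch below stores one hypothetical student record (a student and their favorite movies) in three ways: as a MongoDB-style document, as Neo4j-style nodes and relationships, and as M-style hierarchical subscripts. It does not use the actual databases; the names, fields, and subscript layout are assumptions chosen for the example.

```python
# Hypothetical example data: one student and their favorite movies.
student = "alice"
movies = ["Alien", "Casablanca"]

# 1. Document model (MongoDB-style): one nested record per student.
document = {"_id": student, "name": "Alice", "favorite_movies": movies}

# 2. Graph model (Neo4j-style): labeled nodes plus typed relationships.
nodes = {("Student", student)} | {("Movie", m) for m in movies}
relationships = [(student, "LIKES", m) for m in movies]

# 3. Hierarchical model (M-style): values stored under ordered subscripts,
#    roughly like SET ^STUDENT("alice","movies",1)="Alien" in M.
m_globals = {("STUDENT", student, "name"): "Alice"}
for i, m in enumerate(movies, start=1):
    m_globals[("STUDENT", student, "movies", i)] = m


def movies_from_document(doc):
    # The whole list lives inside the one document.
    return list(doc["favorite_movies"])


def movies_from_graph(rels, who):
    # Follow LIKES relationships leaving the student's node.
    return [dst for src, rel, dst in rels if src == who and rel == "LIKES"]


def movies_from_hierarchy(store, who):
    # Walk the subscripts under ("STUDENT", who, "movies") in sorted order,
    # mimicking how M traverses subscripts in collation order.
    prefix = ("STUDENT", who, "movies")
    keys = sorted(k for k in store if k[:3] == prefix and len(k) == 4)
    return [store[k] for k in keys]


print(movies_from_document(document))
print(movies_from_graph(relationships, "alice"))
print(movies_from_hierarchy(m_globals, "alice"))
```

All three lookups return the same list; what changes is where the structure lives: inside the record, in the relationships, or in the subscript path.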

The tutorials were prepared online using Sphinx and were planned to be executed as hands-on exercises by students in about one hour each. They were delivered in two undergraduate Database classes, with instructors Dima Kassab and Alex Jurkat, and to one group in the MBA program, with instructor Shobha Chengalur-Smith.

From the class reviews the students wrote at the end of the semester, we gathered that they appreciated three main aspects of these exercises:

  • Having practical sessions to interact with database concepts.
  • Using a server with a preconfigured database along with web-based tutorial materials.
  • Having external visitors sharing and assisting with the class.

Two aspects of the practical nature of the exercises are worth highlighting. On one hand, they illustrated how several of the database concepts that students had learned in class apply to real-world situations, particularly in the case of MongoDB, as used for document management, and in the case of M, as used as a standard NoSQL database in healthcare and financial applications. On the other hand, the tutorial sessions were structured as "lessons" in which the students followed a series of step-by-step instructions and executed them at their own pace. This, of course, meant that some of the students finished the exercises very early, while others continued working on them in the days following the class. For us, students coming back to finish the exercises afterward was a sign that we had managed to capture their interest in the subject.

The students worked on a simplified version of a social network, combining basic information about themselves with information about their favorite movies. Using a topic the students could relate to quickly was one of the key ingredients for getting them interested in the exercise. Also, the students were all working in the same database, so as they added records to it, they could see the work of their peers. This shared resource added a flavor of interactive group work and created the conditions to expose them to some of the typical challenges of databases, in particular:

  • Inconsistencies
  • Duplication
  • Race conditions
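
The race conditions students ran into are easiest to see as a lost update. The sketch below is a plain-Python stand-in, not MongoDB: a hypothetical shared dictionary plays the role of the database, two students each read a movie's "likes" counter and write back the value plus one, and the interleaving is written out by hand so the outcome is reproducible. A lock then makes the increment atomic. In MongoDB itself, a single-document update using the `$inc` operator serves a similar purpose, since the server applies it atomically.

```python
import threading

# Hypothetical shared "database": one movie document both students update.
db = {"Casablanca": {"likes": 10}}


def read_likes(title):
    return db[title]["likes"]


def write_likes(title, value):
    db[title]["likes"] = value


# Read-modify-write, interleaved by hand to reproduce the race:
a_seen = read_likes("Casablanca")       # student A reads 10
b_seen = read_likes("Casablanca")       # student B also reads 10
write_likes("Casablanca", a_seen + 1)   # A writes 11
write_likes("Casablanca", b_seen + 1)   # B overwrites it with 11
print(db["Casablanca"]["likes"])        # 11, not 12: one like was lost

# Fix: perform the read and the write as one step under a lock.
lock = threading.Lock()


def like(title):
    with lock:
        db[title]["likes"] += 1


db["Casablanca"]["likes"] = 10
students = [threading.Thread(target=like, args=("Casablanca",)) for _ in range(2)]
for t in students:
    t.start()
for t in students:
    t.join()
print(db["Casablanca"]["likes"])        # 12
```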

Because the database was reused by subsequent tutorials, students in later sessions also benefited from starting with a populated database, one that grew to several hundred records. Since we ran six sessions in total, we as instructors also got to experiment with several approaches, some of which worked better than others, and converged on the following observations.

Avoid:

  • The use of the command line.
  • SSH logins to remote servers.
  • Copying files across servers.
  • Synchronized instruction (the instructor talking to all of the students at the same time).
  • Reading tutorial background material during the class.

Embrace:

  • Reading the background material as homework in preparation for the practical session in class.
  • Use of web-based interfaces during the exercises, in particular the Neo4j web interface for displaying graphs and its online console for Gremlin.
  • Use of IPython Notebook combined with PyMongo to perform interactive exercises in Python, as well as with the Python bindings for M.
  • Have multiple helpers. In some cases we were lucky to have up to four people assisting the students. For example, in Dima Kassab’s class we had her Teaching Assistant, Ocieka Bakou, and also the help of Amir Sadoughi (from RPI), who kindly helped us in four other sessions. Multiple helpers are essential when guiding the students to follow the lessons at their own pace: the helpers found themselves going from chair to chair, providing clarifications and useful hints.
  • Pre-populate the databases with a couple hundred records. This makes it easier to start with query exercises as the first activity, then move on to updates and insertions as the next activities.
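
A minimal sketch of the "query first, then update and insert" ordering, assuming a pre-populated collection of two hundred hypothetical student records. To keep it runnable without a server, plain Python lists and dictionaries stand in for the MongoDB collection; with PyMongo, the same steps would roughly correspond to `insert_many` for pre-population, `find({"favorite_movies": "Alien"})` for the query, and `update_one` and `insert_one` for the later activities.

```python
# Hypothetical movie catalog and generated student records.
CATALOG = ["Alien", "Casablanca", "Amelie", "Up", "Heat"]


def make_records(n=200):
    """Build n hypothetical student documents, each with two favorite movies."""
    return [
        {
            "_id": f"student{i:03d}",
            "favorite_movies": [CATALOG[i % len(CATALOG)],
                                CATALOG[(i + 2) % len(CATALOG)]],
        }
        for i in range(n)
    ]


def fans_of(records, title):
    # Query: every student whose favorites list contains the title.
    return [r["_id"] for r in records if title in r["favorite_movies"]]


students = make_records()

# Activity 1 (query): runs against existing data from the first minute.
print(len(fans_of(students, "Alien")))   # 80

# Activity 2 (update, then insert).
students[1]["favorite_movies"].append("Alien")           # update student001
students.append({"_id": "newcomer", "favorite_movies": ["Alien"]})
print(len(fans_of(students, "Alien")))   # 82
```

Starting with queries means the first exercise already returns interesting results, before any student has typed in a record.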

The Command Line

I must say that it is unfortunate not to be able to use the command line, and this is certainly a gap that has to be filled at some point in the education of undergraduates in technology. The time available for the database class just didn’t allow us to properly train the students in the use of the command line. It is also true that the core of the class was supposed to focus on database concepts, not necessarily on the practical details of deployment. For example, security issues were not among the topics the class intended to address. This was by design.

As for web-based interfaces, I can’t say enough good things about IPython Notebook. It is simply a superb tool for teaching, and I just wish I had started using it earlier. A few pragmatic recommendations on its use:

  1. Create notebooks for every student in advance (for example, using their names). Otherwise, you get into race conditions when 25 students are editing notebooks at the same time.
  2. Take the time to give students a quick tour of it before diving into the exercise.
  3. Be prepared for the wireless network to not work properly at times! (Sometimes I think networks and printers exist just to test our character.)
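
One way to follow the first recommendation is to generate an empty notebook per student before class. The sketch below writes minimal notebook files (nbformat version 4 JSON) using only the standard library; the naming scheme, the greeting cell, and the target directory are assumptions, and the `nbformat` package offers helpers such as `nbformat.v4.new_notebook` if you prefer not to build the JSON by hand. Existing files are left untouched, so rerunning the script never overwrites a student's work.

```python
import json
import re
import tempfile
from pathlib import Path


def notebook_filename(student_name):
    # Hypothetical naming scheme: lowercase, runs of other characters -> "_".
    slug = re.sub(r"[^a-z0-9]+", "_", student_name.lower()).strip("_")
    return f"{slug}.ipynb"


def empty_notebook(student_name):
    # Minimal nbformat 4 document with one markdown cell naming its owner.
    return {
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": f"# NoSQL tutorial notebook for {student_name}",
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def create_notebooks(names, directory):
    # Returns the path of every student's notebook, creating missing ones.
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in names:
        path = out / notebook_filename(name)
        if not path.exists():  # never overwrite a student's existing work
            path.write_text(json.dumps(empty_notebook(name), indent=1))
        paths.append(path)
    return paths


# Example run in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for p in create_notebooks(["Ada Lovelace", "Alan Turing"], tmp):
        print(p.name)
```

Two students whose names map to the same file name would end up sharing a notebook, so a real class roster might be better keyed by student ID.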

A final observation

We instructors discussed among ourselves the importance of having a server dedicated to the class, where instructors and students have access to play with the tools. Such a server is certainly not set up with security as its main feature, and it is essential to find the balance between quick usability for educational purposes and a light exposure to the realities of deploying applications in production. In our sessions we used a server in the Amazon EC2 cloud, where we set up the databases and their web interfaces. However, it is costly to keep the server running continuously, so we will be exploring some of the options that Amazon offers for education.

This wraps up our teaching sessions on NoSQL databases for 2012. We are planning to run the sessions again during the Spring 2013 semester, and we are considering a similar approach for the Web Development course (also during the Spring semester).

""
Creative Commons License
