I had the pleasure of interviewing Rosaria Silipo about her upcoming talk at OSCON: Advanced analytics for the Internet of Things. I recently taught a group of librarians about the Internet of Things and how we can use it to provide better services to our patrons, so when I saw Rosaria's talk summary I thought it was right up my alley.
One of the biggest concerns folks in my workshop had was security, and I love Rosaria's response to my security question:
"Different kinds of data can be shared through the Internet of Things. Some of this data is sensitive, but not all of it. Sharing your medical history is not the same as sharing the amount of charged battery remaining in your toothbrush."
Tell us about yourself and your background.
I have been a data analyst (now the term seems to be data scientist) for more than 20 years. I have been mining data for academic projects, corporate solutions, and nonprofit institutions. I have been involved with all phases of the data analytics process in my career, from data collection and data warehousing through to model production and deployment. I have analytically explored many different application fields. The first ones that come to mind are biomedical signals (the topic of my master's and doctoral theses), Internet of Things, speech recognition, customer intelligence, social media, energy, finance, banking, and web analytics, but I am sure there are many more. Indeed, once you have a good domain expert working with you to help in preparing the data and interpreting the final results, the part in the middle always involves the same techniques applied to data.
How did you first get involved in open source? What is your favorite open source product?
To be honest, I do not remember. Open source has always been around since my first steps in the data analytics space. Very likely my first open source tool, as for many other computer scientists, was Linux. I must say, though, that I did not fully appreciate the potential of open source until the Internet came along. One of the best advantages of open source, in my opinion, is not so much the price, but the community behind it. The Internet allows for easier and more frequent interactions among people and has given the open source communities the central role they deserve. I often hear the statement that open source tools do not offer support. If the open source tool they are talking about is really open source, the community provides not only support, but also an environment to learn more. Well, my favorite open source product is of course KNIME :).
For those who might be unfamiliar, what can you tell us about KNIME? Why would we want to explore this project further?
KNIME is an open data analytics platform, i.e., it is a tool for all of your analytical needs: data acquisition, data manipulation, statistics, machine learning, data visualization, and finally deployment. It is based on a graphical interface, with no scripting language to learn, which makes it easier to use and shortens the learning curve. It is open source. Being open source is just one aspect of a more general "open" philosophy. KNIME believes in openness meant as integration of different specialized tools, as collaboration among people with different skills, as transparency in code development, as agility in empowering users' creativity. I often get asked whether KNIME is better than R. I think this is an ill-posed question. KNIME is not better than R; R has a longer history and a larger community. However, KNIME can integrate R, as well as Weka, Python, Perl, Google Analytics, or any other tool, through a very intuitive graphical interface. The integration of all those great and highly specialized tools under one roof through a graphical, intuitive interface is what makes KNIME so powerful and versatile. This is also what makes KNIME suitable for collaborative work: I might be a KNIME expert, but I do not need only KNIME experts around me to work together productively. In summary, KNIME is a powerful, easy-to-use, agile, integrative, open data analytics platform. I think anybody who is running a data science lab should have a look at it.
One of the biggest concerns people have about the Internet of Things is security and privacy. How can we encourage people to share their data if they want to secure their (and their customers') privacy?
Security is a big concern every time you let somebody else host your data. Privacy laws also differ from country to country. So, you need to make sure that whoever is hosting your data has a policy in place that respects the privacy laws of the country you are working in. It is quite a complex issue. However, I find that sometimes it is also a bit overblown. Different kinds of data can be shared through the Internet of Things. Some of this data is sensitive, but not all of it. Sharing your medical history is not the same as sharing the amount of charged battery remaining in your toothbrush. I am willing to share non-sensitive information as long as I get a return on it. A very good example of this can be seen in the youngest generations. They are willing to share tons of information on social media platforms as long as they can use them for free to keep in touch with their circle of friends. If the benefits outweigh the loss of privacy, I might be willing to share more.
Do you use Internet of Things devices at home (or work)? Which ones have you had success with (or liked) and which do you think could be improved?
I am just starting to use Internet of Things home devices. The most curious one that I have at the moment monitors my plants at home. It sends me emails on behalf of my plants when something is needed (water, light, fertilizer). I do not have much of a green thumb, and it's useful to know what is going on. The Bromeliad, for example, just emailed me that she is doing fine!
This article is part of the Speaker Interview Series for OSCON 2015. OSCON is everything open source: the full stack, with all of the languages, tools, frameworks, and best practices that you use in your work every day. OSCON 2015 will be held July 20-24 in Portland, Oregon.