How open source is advancing the Semantic Web

Sir Tim Berners-Lee's vision of the Semantic Web is becoming clearer through open source contributions.

The Semantic Web, a term coined by World Wide Web (WWW) inventor Sir Tim Berners-Lee, refers to the concept that all the information in all the websites on the internet should be able to interoperate and communicate. That vision, of a web of knowledge that supplies information to anyone who wants it, is continuing to emerge and grow.

In the first generation of the WWW, Web 1.0, most people were consumers of content, and if you had a web presence, it consisted of a series of static pages written in HTML. Websites had guest books and HTML forms, powered by Perl and other server-side scripting languages, that people could fill out. While HTML provides structure and syntax to the web, it doesn't provide meaning; therefore Web 1.0 couldn't inject meaning into the vast resources of the WWW.

Next came Web 2.0 and the emergence of user-generated content like blogs, wikis, video sharing, social media, and so forth. Dynamically generated content created two-way interaction. Sites like Flickr and Twitter employed user-generated tags (called folksonomies) to organize content into categories. While this represented a vast improvement in both interface and interaction over Web 1.0, it's not the full level of interactivity envisioned by Berners-Lee's definition of the Semantic Web.

The urgency to realize the Semantic Web has grown with the rapidly expanding Internet of Things (IoT), as connected devices together form a web of semantic data that can be queried with appropriate tools. The intersection of artificial intelligence, big data, the IoT, and connected web technologies is creating the opportunity to derive more meaning and context from the data we share in our increasingly interconnected world. As this web of data continues to grow, we need software tools and frameworks to create and read this information.

Architecture of the Semantic Web

The World Wide Web Consortium (W3C) has identified a stack of complementary technologies for describing data that is embedded in web content and can be identified and queried with the appropriate software.

Figure: Semantic Web Architecture (Marobi1, CC0)

This new ecosystem of information is defined by a theoretical stack whose foundation includes Uniform Resource Identifiers (URIs), strings of characters that identify a resource, and Unicode, the character encoding standard that allows web content to be displayed in different languages. Other major elements of the stack are as follows:

Extensible Markup Language (XML) embedded in these web documents establishes a common syntax that describes the content provided.
Resource Description Framework (RDF) is defined by W3C as "a flexible and extensible way to represent information about World Wide Web resources. It is used to represent, among other things, personal information, social networks, metadata about digital artifacts, as well as provide a means of integration over disparate sources of information."
Web Ontology Language (OWL) is a semantic markup language designed to describe relationships between things and can be used by other programs to interpret the data provided.
SPARQL (pronounced "sparkle"), short for SPARQL Protocol and RDF Query Language, is the protocol and query language used to retrieve RDF data found on the web. According to Wikipedia, "SPARQL allows users to write queries against what can loosely be called 'key-value' data or, more specifically, data that follows the RDF specification of the W3C."
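
To make the relationship between RDF and SPARQL more concrete, here is a minimal sketch in plain Python. It is a toy illustration, not a real RDF library or SPARQL engine: the example.org URIs, the sample data, and the `match` helper are all hypothetical. It shows the core idea that RDF data is a set of subject-predicate-object triples, and that a query is a pattern in which some positions are variables.

```python
# Toy illustration only; a real application would use an RDF library
# and a SPARQL engine. All URIs and data here are hypothetical.

EX = "http://example.org/"

# An RDF graph can be viewed as a set of (subject, predicate, object) triples.
triples = {
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "name", "Alice"),
    (EX + "bob", EX + "name", "Bob"),
    (EX + "bob", EX + "knows", EX + "carol"),
}


def match(pattern, graph):
    """Yield variable bindings for a single triple pattern.

    Terms that start with '?' are treated as variables, as in SPARQL;
    any other term must equal the corresponding term of the triple.
    """
    for triple in graph:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Roughly equivalent to: SELECT ?who WHERE { ex:alice ex:knows ?who }
pattern = (EX + "alice", EX + "knows", "?who")
friends = sorted(b["?who"] for b in match(pattern, triples))
print(friends)
```

A real SPARQL query combines many such patterns and adds filters, ordering, and aggregation, but each pattern is matched against triples in much the same way.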

Open source projects advancing the Semantic Web

How does a web page distinguish information? How can my web content literally talk to other content in a way that the receiver knows my intent? How can information in a wiki's text and multimedia files, for example, be queried to determine what active projects took place in 2016? One open source tool that enables this type of interaction is Semantic MediaWiki.

In addition, a growing list of open source projects has emerged to extract meaning from the Semantic Web. These include DBpedia, a project that aims to extract structured content from Wikipedia. Creative Commons uses RDF data to embed license information in web pages and MP3 files. Simple Knowledge Organization System (SKOS) is used in thesaurus apps like the UNESCO Thesaurus. Apache Jena is an open source Java framework that includes a SPARQL implementation. Another open source framework that implements SPARQL is Sesame, which has since been continued as Eclipse RDF4J.
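
As a rough illustration of how a program might ask DBpedia a question, the sketch below builds a SPARQL protocol request URL using only the Python standard library. It assumes DBpedia's public endpoint at https://dbpedia.org/sparql and the `dbo:abstract` property; the query itself and the `build_request_url` helper are hypothetical examples, and the code only constructs the URL without sending it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed public SPARQL endpoint for DBpedia.
ENDPOINT = "https://dbpedia.org/sparql"

# Hypothetical example query: fetch the English abstract of one resource.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {
  dbr:Semantic_Web dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
LIMIT 1
"""


def build_request_url(endpoint, query):
    """Return a GET URL that carries the query text in the 'query' parameter,
    which is how the SPARQL protocol passes a query over HTTP GET."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL gives back the original query text.
recovered = parse_qs(urlsplit(url).query)["query"][0]

print(url)
```

Sending the request, for example with `urllib.request.urlopen` and an `Accept: application/sparql-results+json` header, would return the matching bindings, subject to the endpoint's availability and usage limits.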

If you'd like to learn more about these and other advancements, visit the W3C's Semantic Web page, which contains technology information, news, upcoming events, and more.