Overview of the Elastic Stack, open source software tools for data insights

Gathering insights from data: An overview of the Elastic stack

Image credits: Tony Smith via Flickr (CC BY 2.0)

The Elastic stack is a versatile collection of open source software tools that make gathering insights from data easier. It was formerly referred to as the ELK stack (in reference to Elasticsearch, Logstash, and Kibana), but as more tools integrate with the platform (such as Beats), the stack has outgrown the acronym while providing ever-growing capability for users and developers alike.

At the upcoming Southeast Linuxfest 2016, I'll be covering some of the steps to get started with each part of the stack. In this article, we'll look at each in turn and summarize its capabilities, requirements, and interesting use cases.

Beats & Logstash

Collecting data is where it all starts. Logstash and Beats both serve this need, though each is tuned for different uses. Whether your priority is light resource usage or extensive features, one of the two has been designed to meet it.

Beats

Beats are lightweight, fast data shippers and collectors that typically do one thing and do it well. For example, Packetbeat collects and ships data about packet activity on a network interface, while Filebeat can tail a log file and send logs to be processed downstream. Beats are designed to be fast, portable, and easy to deploy on individual hosts for specific needs. There are even community-created Beats for uses like monitoring HTTP endpoints and NGINX health.
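
Filebeat itself is written in Go, so the Python sketch below is only an illustration of the core idea behind tailing a log file: remember a byte offset, read only the complete lines written since then, and wrap each one in an event. The function names and event fields are hypothetical and do not reflect Filebeat's actual event schema or how it tracks file state.

```python
import socket
from datetime import datetime, timezone
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended to `path` since byte `offset`, plus the new offset.

    A trailing partial line (not yet newline-terminated) is left unread so it
    can be picked up whole on a later call.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n")
    if end == -1:
        return [], offset  # no complete line has been written yet
    chunk = data[: end + 1]
    return chunk.decode("utf-8").splitlines(), offset + len(chunk)


def to_event(line: str, path: str) -> dict:
    """Wrap a raw log line in a minimal event (hypothetical field names)."""
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "message": line,
        "source": path,
        "host": socket.gethostname(),
    }
```

A real shipper would also persist the offset between runs and poll or watch the file in a loop; both are left out of this sketch.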

Written in Go and built on the shared foundation of libbeat, Beats are meant to be straightforward to build and light to run, so that even resource-constrained environments can be measured without much overhead.

Logstash

Logstash can also collect machine data, but where it shines is the plethora of open source plugins available to enrich that data. For example, collecting webserver logs is useful on its own, but deeply parsing the user-agent data to extract traffic statistics adds real value, and the useragent filter does exactly that. Or, if you collect tweets with the Twitter plugin, you may want to perform sentiment analysis on them.
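
To make the user-agent example concrete, here is a toy enrichment step in Python: it adds a browser field to each event and then counts events per browser. The rule list, the agent and browser field names, and the functions are all hypothetical; Logstash's useragent filter relies on a far more complete pattern database and also extracts versions, operating systems, and devices.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Hypothetical, deliberately tiny rule set. Order matters: Edge user agents
# also contain "Chrome/", and Chrome user agents also contain "Safari/".
BROWSER_RULES = [
    ("Edge", re.compile(r"Edg(?:e|A|iOS)?/")),
    ("Chrome", re.compile(r"Chrome/")),
    ("Firefox", re.compile(r"Firefox/")),
    ("Safari", re.compile(r"Version/[\d.]+.*Safari/")),
]


def enrich(event: Dict) -> Dict:
    """Return a copy of `event` with a `browser` field derived from `agent`."""
    agent = event.get("agent", "")
    browser = next((name for name, rx in BROWSER_RULES if rx.search(agent)), "Other")
    return {**event, "browser": browser}


def browser_stats(events: Iterable[Dict]) -> Counter:
    """Count events per detected browser: a simple traffic statistic."""
    return Counter(enrich(e)["browser"] for e in events)
```

Returning a copy rather than mutating the event keeps the step easy to chain with other filters.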

Custom plugins are simple Ruby libraries, which let users extend functionality and prototype new features quickly. Performance isn't an afterthought, however: Logstash runs on JRuby by default, which provides real JVM threads and opens up possibilities for concurrency.

Elasticsearch

Once data has been collected and enriched, storing it in Elasticsearch opens up a range of possibilities.
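
One common way to store collected events is Elasticsearch's _bulk API, which accepts newline-delimited JSON: an action line followed by the document itself, with the whole body ending in a newline. The Python sketch below only builds such a body; the weblogs index name and the sample documents are hypothetical, and actually sending the body (for example, as a POST to /_bulk with the Content-Type application/x-ndjson) is left out.

```python
import json
from typing import Dict, Iterable


def bulk_body(index: str, docs: Iterable[Dict]) -> str:
    """Build a _bulk request body that indexes each document into `index`."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

Batching many documents into one request like this avoids paying a round trip per document.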

Elasticsearch is a search engine at heart, with a myriad of use cases born of its flexibility and ease of use. Based on Apache Lucene, Elasticsearch strives to make both operational challenges (such as scalability and reliability) and application needs (like free-text search and autocomplete) easier for end users.
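
The idea at the core of Lucene is the inverted index, which maps each term to the documents that contain it. The toy Python class below sketches that idea, along with a prefix lookup in the spirit of autocomplete. The class and method names are hypothetical, and the lowercase-and-split tokenizer is a crude stand-in for Lucene's analyzers and term dictionary, which are far more sophisticated.

```python
import re
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumerics (a crude stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # term -> doc ids
        self.docs: Dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """Return sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty postings for unknown terms.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)

    def complete(self, prefix: str, limit: int = 5) -> List[str]:
        """Suggest indexed terms that start with `prefix`, in sorted order."""
        prefix = prefix.lower()
        terms = sorted(self.postings)
        out: List[str] = []
        for term in terms[bisect_left(terms, prefix):]:
            if not term.startswith(prefix) or len(out) == limit:
                break
            out.append(term)
        return out
```

Because each query term is looked up directly, search cost depends on the size of the matching postings rather than on scanning every document.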

Operationally, Elasticsearch's elasticity stems from splitting indices into shards, which can be spread across multiple hosts to balance load and boost performance. With planning, this means that datasets can grow well beyond what a single machine could handle.
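
In its basic form, Elasticsearch picks a document's primary shard with a formula along the lines of shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. The Python sketch below uses CRC32 as a stand-in for the Murmur3 hash Elasticsearch actually applies, so its shard numbers will not match a real cluster; it only shows how hashing spreads documents across shards. The function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (CRC32 stand-in, not Murmur3)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def spread(doc_ids: Iterable[str], num_primary_shards: int) -> Counter:
    """Count how many documents each shard would receive."""
    return Counter(shard_for(doc_id, num_primary_shards) for doc_id in doc_ids)
```

Because the result depends on the number of primary shards, changing that number would route existing documents to different shards, which is one reason it is chosen when an index is created.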

Some examples of the analytics that can be performed on Elasticsearch include:

  • Geo searching. When documents are inserted with geo metadata, results can be overlaid on a map to visualize how documents relate to real-world longitudes and latitudes.
  • Graphing. Recent plugins have added graph search to Elasticsearch, which can answer interesting questions such as relationships between data in the Panama Papers.
  • Aggregations. Answering questions such as which pages returned the most 500 errors becomes a matter of forming the right query against the fields in a log file.
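
As a sketch of the aggregation example, the Python function below builds a search request body that filters for 500 responses and asks for a terms aggregation over the requested page. The field names response and request.keyword, and the top_pages aggregation name, are assumptions about how the web logs were mapped. A second function computes the same answer in memory, which shows what the aggregation is meant to return.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_error_pages_query(status: int = 500, size: int = 10) -> Dict:
    """Search body: count documents per page among responses with `status`.

    Field names are assumptions about the log mapping.
    """
    return {
        "size": 0,  # skip returning the matching hits; only the buckets matter
        "query": {"term": {"response": status}},
        "aggs": {
            "top_pages": {"terms": {"field": "request.keyword", "size": size}}
        },
    }


def top_error_pages_local(
    events: Iterable[Dict], status: int = 500, size: int = 10
) -> List[Tuple[str, int]]:
    """The same question answered in memory, for comparison."""
    return Counter(
        e["request"] for e in events if e["response"] == status
    ).most_common(size)
```

The body would be sent to an index's _search endpoint; the top_pages buckets in the response then list each page with its document count.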

While all these capabilities are driven by Elasticsearch's core, exposing them in a user-friendly interface is left to the next layer in the stack.

Kibana

Kibana is a browser-based visualization frontend for Elasticsearch. It makes it easy to consume, in aggregate, data that would otherwise be difficult to process, making logs, metrics, and unstructured data searchable and more usable for humans. Additional plugins cover specialized cases, such as Timelion for time series data.

Because Kibana persists most of its data within Elasticsearch, managing Kibana dashboards and visualizations is much like managing any other index in Elasticsearch. Charts, graphs, and other visualizations sit atop Elasticsearch APIs, so the underlying requests can be easily inspected for closer analysis or reused in other systems.

An open ecosystem

Like other multi-faceted systems, the Elastic stack is supported by tools that help manage deployment and configuration, such as Ansible, Puppet, and Chef. Packages are similarly available from standard distribution repositories.

It's worth noting that these open source projects span a wide array of languages. Beats are written in Go for portable, efficient distribution of compiled binaries, while Kibana uses JavaScript for unified development of frontend and backend components. With many languages and an open codebase, users should feel free to get involved in the feature development and bug-fixing efforts that matter to them.

Now is a great time to play with the stack and see what you can accomplish with it, with a wide variety of solutions and a global community standing by to support users and developers alike.

About the author

Tyler Langlois - Tyler is an Infrastructure Engineer at Elastic where he wears many hats. In previous lives he's worked in cybersecurity, datacenter floors, and a variety of operations roles. He's an active contributor in the open source community and enjoys working in Linux, functional programming languages, and living in zsh.