Overview of the Elastic Stack, open source software tools for data insights
Gathering insights from data: An overview of the Elastic stack
The Elastic stack is a versatile collection of open source software tools that make gathering insights from data easier. Formerly called the ELK stack (for Elasticsearch, Logstash, and Kibana), it has outgrown the acronym as more tools (such as Beats) integrate with the platform, giving users and developers ever more capability.
At the upcoming Southeast Linuxfest 2016, I'll be covering some of the steps to get started using each of these parts of the stack. In this article, we'll look at each in turn to summarize the capabilities, requirements, and interesting use cases that apply to each.
Beats & Logstash
Collecting data is where it all starts. Logstash and Beats both serve this need, though each is tuned for different uses: Beats favor light resource usage, while Logstash favors extensive features.
Beats are lightweight, fast data shippers and collectors that typically do one thing and do it well. For example, Packetbeat collects and ships data about packet activity on a network interface, while Filebeat can tail a log file and send logs to be processed downstream. Beats are designed to be fast, portable, and easy to deploy on individual hosts for specific needs. There are even community-created Beats for uses like monitoring HTTP endpoints and NGINX health.
Written in Go and built on the shared foundation of libbeat, Beats are meant to be simple to deploy, so even resource-constrained environments can be measured without much overhead.
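To make the idea of tailing a log file concrete, here is a minimal Python sketch of the polling step: read whatever has been appended since the last known offset and hand back only complete lines. This is an illustration of the concept, not how Filebeat is implemented (Filebeat is written in Go), and the function name `read_new_lines` is made up for this example.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines, new_offset) for content appended to `path` since `offset`."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    # Only ship complete lines; a partial last line is left for the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


if __name__ == "__main__":
    # Demo with a throwaway log file (hypothetical contents).
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as log:
        log.write(b"GET /index.html 200\n")
        log_path = log.name
    try:
        lines, offset = read_new_lines(log_path, 0)
        print(lines)
        with open(log_path, "ab") as log:
            log.write(b"GET /missing 404\nincomplete")
        lines, offset = read_new_lines(log_path, offset)
        print(lines)  # the unfinished last line is held back
    finally:
        os.remove(log_path)
```

A real shipper would also need to persist the offset between runs and cope with log rotation, which this sketch leaves out.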
Logstash is similarly capable of collecting machine data, but where it shines is the plethora of open source plugins available to enrich data. For example, collecting webserver logs is useful on its own, but the useragent filter can also parse the user-agent string in depth to extract traffic statistics. Or, if using the Twitter plugin, you may want to perform sentiment analysis on user tweets.
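As a rough picture of what enrichment means, the Python sketch below takes a parsed log event and adds browser fields derived from its user-agent string. It is only a conceptual stand-in: the real useragent filter recognizes far more clients than these three hand-written rules, and the field names `agent` and `browser` are assumptions chosen for this example.

```python
import re

# Hypothetical, deliberately tiny rule list: (family name, pattern).
# Order matters, because Chrome's user-agent string also contains "Safari".
BROWSER_RULES = [
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]


def enrich_with_user_agent(event):
    """Return a copy of `event` with a browser name and major version added."""
    enriched = dict(event)
    agent = event.get("agent", "")
    for name, pattern in BROWSER_RULES:
        match = pattern.search(agent)
        if match:
            enriched["browser"] = {"name": name, "major": int(match.group(1))}
            break
    else:
        # No rule matched, so record the client as unknown.
        enriched["browser"] = {"name": "Other", "major": None}
    return enriched


if __name__ == "__main__":
    event = {
        "request": "/index.html",
        "agent": "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0",
    }
    print(enrich_with_user_agent(event))
```

Once each event carries structured browser fields, counting traffic per browser becomes a simple grouping problem downstream.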
Custom plugins are simple Ruby libraries, which enable users to extend functionality and prototype new features quickly. Performance isn't an afterthought, however: Logstash ships with JRuby by default, which opens up possibilities for concurrency and real threads.
Once data has been collected and enriched, storing it in Elasticsearch opens up a range of possibilities.
Elasticsearch
Elasticsearch is a search engine at its heart, with a myriad of use cases born of its flexibility and ease of use. Based on Apache Lucene, Elasticsearch strives to make both operational challenges (such as scalability and reliability) and application needs (like free-text search and autocomplete) easier for end users.
Operationally, Elasticsearch's elasticity stems from splitting indices into shards, which can be spread across multiple hosts to balance load and boost performance. With planning, this means that datasets can grow well beyond what a single machine can handle.
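The core idea behind spreading documents across shards can be sketched in a few lines of Python: hash a routing key (by default the document ID) and take the result modulo the number of primary shards. The sketch uses CRC32 purely as a stand-in hash that ships with Python; it is not the hash Elasticsearch uses, and the helper name `pick_shard` and the `log-N` document IDs are made up for illustration.

```python
import zlib
from collections import Counter


def pick_shard(routing_key, num_primary_shards):
    """Map a routing key (the document ID by default) to a shard number."""
    # Stand-in hash: CRC32 is used only because it is in the standard library
    # and stable across runs. It is not the hash Elasticsearch itself uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # Hypothetical document IDs, routed across five primary shards.
    doc_ids = [f"log-{n}" for n in range(1000)]
    spread = Counter(pick_shard(doc_id, 5) for doc_id in doc_ids)
    for shard, count in sorted(spread.items()):
        print(f"shard {shard}: {count} documents")
```

Because the shard number depends on the modulus, changing the number of primary shards would send existing keys to different shards, which is part of why shard counts call for the planning mentioned above.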
Some examples of the analytics that can be performed on Elasticsearch include:
- Geo searching. When documents are inserted with geo metadata, results can be overlaid on a map to visualize how documents relate to real-world longitudes and latitudes.
- Graphing. Recent plugins have added graph search to Elasticsearch, which can surface relationships within data sets such as the Panama Papers.
- Aggregations. Answering questions such as which pages returned the most 500 errors becomes a matter of forming the right query for the fields in a log file.
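To give a feel for the aggregation case, the Python sketch below builds a search body that filters on a status code and counts matching documents per page using a `terms` aggregation. The field names `request` and `response` are assumptions about how the log fields were indexed (the page field is assumed to be stored as an exact, unanalyzed value), and the aggregation name `top_error_pages` is arbitrary.

```python
import json


def top_error_pages_query(status=500, page_field="request",
                          status_field="response", limit=10):
    """Build a search body that counts documents with `status` per page."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"term": {status_field: status}},
        "aggs": {
            "top_error_pages": {
                # Buckets come back ordered by document count, largest first.
                "terms": {"field": page_field, "size": limit}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(top_error_pages_query(), indent=2))
```

The resulting JSON would be sent as the body of a request to an index's `_search` endpoint; adjust the field names to match your own mapping.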
While all these capabilities are driven by Elasticsearch's core, exposing them in a user-friendly interface is left to the next layer in the stack.
Kibana
Kibana is a browser-based visualization frontend for Elasticsearch. It enables users to easily consume data in aggregate that would otherwise be difficult to process, making logs, metrics, and unstructured data searchable and more usable for humans. Additional plugins can be used for specialized cases, such as Timelion for time-series data.
Because Kibana persists most of its data within Elasticsearch, managing Kibana dashboards and visualizations is much like managing other indices in Elasticsearch. Charts, graphs, and other visualizations sit atop Elasticsearch APIs, which can be easily inspected for closer analysis or use in other systems.
An open ecosystem
Like other multi-faceted systems, the Elastic stack is supported by tools that help manage deployment and configuration, such as Ansible, Puppet, and Chef. Standard distribution package repositories are available as well.
Now is a great time to play with the stack and see what you can accomplish with it: a wide variety of solutions and a global community stand ready to support users and developers alike.