A checklist for building your enterprise logging layer

As you build infrastructure to collect, unify, and analyze your logs, use this checklist to make logging an invaluable record and tool.

In any discussion about big-picture topics such as cloud migration services, data centers, and microservices, enterprise logging can get relegated to an afterthought. But relegate it at your peril: without logging, you won't have the visibility into your services you need to diagnose and debug efficiently. What's more, if you're a large enterprise, you may be violating compliance requirements.

As you add applications and infrastructure components to your data centers, the amount of log data you collect grows enormously. Combine that with the movement toward ephemeral components such as Docker containers and Kubernetes pods, and log data becomes essential: often it is the only record of a component that no longer exists.

Proprietary tools for logging layers haven't kept up with cloud-native trends, so many enterprises are turning to open source software to meet their logging requirements. These open source tools let users unify data streams from network devices, firewalls, applications, and infrastructure, and they are evolving alongside the other open source projects already powering enterprises. In many cases, open source tools are no longer mere replacements but more powerful offerings that can outperform their proprietary counterparts.

As you build infrastructure to collect, unify, and analyze your logs, be sure to check off all these boxes to ensure logging fulfills its potential to be an invaluable record and tool.

  1. Security: Because logs contain data from your applications, infrastructure, and network devices, they may also contain sensitive information. To prevent an operator or network intruder from accessing that information, ensure your logs are secure both at rest and in transit. Also, don't protect sensitive values with weak, easily broken algorithms such as MD5 or SHA-1; use a strong hash such as SHA-256 for masking and integrity checks, and strong encryption such as AES-256 for confidentiality. (A minimal masking sketch follows this list.)
  2. High availability: When transmitting log data to your required backend for compliance or debugging, missing portions of logs create a hole in your enterprise history. That means you may not be able to determine which application failed, which network device streamed malicious content, or which country hacked your firewall. Remember, when your infrastructure is overwhelmed, your logging infrastructure tends to go with it. High availability ensures you have a secondary logging pipeline that can route and audit your logs if the primary pipeline is offline (see the failover sketch after this list).
  3. Load balancing: Log volume tends to grow along with your enterprise, and can go from a couple hundred messages per second to millions. Your logging infrastructure needs to balance these growing message rates across instances so that no single instance is overwhelmed (a round-robin sketch appears after this list).
  4. Stream splitting: A single log message might matter to multiple stakeholders in your enterprise, such as IT, security, and an application developer. When sending production log data, make sure you can route data efficiently to each stakeholder's toolset and duplicate data when multiple stakeholders need to analyze it separately (the stream-splitting sketch after this list shows one way to do this).
  5. Vendor lock-in evasion: Sending data to a single analytics toolset makes the pipeline easy to build, but it might not yield the best return in analytics and insight. Locking into a single backend vendor can also be expensive, and your data can be held hostage. Your logging pipeline should instead be flexible and efficient, letting you quickly adopt new technologies and route data wherever it can be processed most effectively.
  6. Processing at the edge: Running analytics on top of log data often means expensive, computation-heavy queries that take ages to finish. Instead of waiting until all the data reaches your backend to begin analysis, your logging pipeline should do some preliminary work, augmenting streams with relevant information or distributing processing across multiple nodes (see the enrichment sketch after this list).
  7. Filtering at the edge: As much as you try to log only the most important data, sometimes logs are just a bunch of noise. Your logging pipeline should be able to intelligently identify noisy messages and quickly filter them out. It should also be able to filter on a variety of fields, including the source of the log, its severity, the content of the message, and, of course, the log time (the stream-splitting sketch below also demonstrates severity filtering at the edge).
  8. Application-independent logging: When your logging pipeline is not separate from your application (that is, monolithic, non-modular logging), you need to redeploy the application every time you change the logging pipeline. That means you lose logs on every redeploy, and if the application malfunctions, logging stops with it. Even small pipeline changes force a full redeploy, which is expensive in the case of a mobile application and nearly impossible if your application ships as part of a hardware offering.
  9. Reliability: The backend where your logs end up should always be up and running, but it can become unavailable for any number of reasons. Your logging pipeline should detect when an error occurs and buffer the data while it waits for the backend to come back. If the backend is unavailable for a long time, your pipeline should use its high availability properties to switch to a backup (the failover sketch after this list covers both cases).
  10. Monitoring: When logs stop flowing, you have no visibility into your infrastructure, applications, and network devices; everything could be running smoothly, or all hell could be breaking loose. That's why, when logging goes down in Google Cloud and Microsoft Azure, engineers consider it a severity 0. The only way to know your logging infrastructure is healthy is to monitor it: confirm that logs are flowing, that throughput is fast and steady, and that any anomalies are accounted for (a watchdog sketch follows this list).
  11. Logging agent weight: When you deploy a logging agent next to your application, you want that agent to be lightweight, performant, and reliable. A heavy logging agent can impact your business application by consuming CPU, memory, and I/O that should go to the application itself.
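
For item 1, here is a minimal Python sketch of masking sensitive fields with SHA-256 before they ever reach the log stream. The field names are hypothetical, and securing transport (for example, TLS between agent and backend) is a separate, equally necessary step:

```python
import hashlib
import logging

# Hypothetical field names; adjust to whatever your events actually carry.
SENSITIVE_FIELDS = {"user_email", "api_key"}

def mask(value: str) -> str:
    """Replace a sensitive value with its SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def log_event(logger: logging.Logger, event: dict) -> None:
    """Log an event dict, hashing any fields marked sensitive."""
    safe = {k: mask(v) if k in SENSITIVE_FIELDS else v for k, v in event.items()}
    logger.info("event: %s", safe)

logging.basicConfig(level=logging.INFO)
log_event(logging.getLogger("app"), {"user_email": "a@example.com", "action": "login"})
```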
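
For high availability (item 2) and reliability (item 9), here is a rough sketch, not a production implementation: a handler that buffers records while the primary backend is erroring and reroutes the backlog to a secondary pipeline after repeated failures. The primary and secondary senders are placeholders assumed to raise an exception on delivery failure; a real pipeline would point them at your actual collectors:

```python
import logging
from collections import deque

class FailoverHandler(logging.Handler):
    """Buffer records while the primary backend is down and fail over
    to a secondary pipeline after repeated errors."""

    def __init__(self, primary, secondary, max_buffer=10000, max_failures=3):
        super().__init__()
        self.primary = primary          # callable; assumed to raise on failure
        self.secondary = secondary      # backup pipeline, same contract
        self.buffer = deque(maxlen=max_buffer)  # drop-oldest if the outage persists
        self.failures = 0
        self.max_failures = max_failures

    def emit(self, record):
        line = self.format(record)
        try:
            self.primary(line)
            while self.buffer:          # primary healthy again: replay the backlog
                self.primary(self.buffer.popleft())
            self.failures = 0
        except Exception:
            self.failures += 1
            self.buffer.append(line)
            if self.failures >= self.max_failures:
                while self.buffer:      # reroute the backlog to the backup
                    self.secondary(self.buffer.popleft())

# Illustrative senders; real ones would write to your log collectors.
handler = FailoverHandler(primary=lambda line: print("primary:", line),
                          secondary=lambda line: print("secondary:", line))
logging.getLogger("app").addHandler(handler)
```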
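
Load balancing (item 3) can be sketched client-side as simple round-robin across a pool of collectors. The endpoint hostnames below are placeholders; a production setup would more likely put a dedicated load balancer or forwarder tier in front of the collectors:

```python
import itertools
import logging
import logging.handlers

# Placeholder collector endpoints; substitute your real hosts.
COLLECTORS = [("logs-1.internal", 514),
              ("logs-2.internal", 514),
              ("logs-3.internal", 514)]

class RoundRobinHandler(logging.Handler):
    """Spread records across a pool of syslog collectors, round-robin,
    so no single instance absorbs the full message rate."""

    def __init__(self, endpoints):
        super().__init__()
        self.pool = itertools.cycle(
            logging.handlers.SysLogHandler(address=ep) for ep in endpoints)

    def emit(self, record):
        next(self.pool).emit(record)

logging.getLogger("app").addHandler(RoundRobinHandler(COLLECTORS))
```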
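
Stream splitting (item 4) and filtering at the edge (item 7) can both be illustrated with the Python standard library's logging module: one logger fans out to a security sink and a developer sink, and each sink keeps only the records its stakeholder cares about. The file path and the "audit" tag are illustrative:

```python
import logging

class SecurityFilter(logging.Filter):
    """Keep only records explicitly tagged as security-relevant."""
    def filter(self, record):
        return getattr(record, "audit", False)

logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)

# Security stakeholders get their own copy of audit-tagged records.
security_sink = logging.FileHandler("security.log")   # illustrative destination
security_sink.addFilter(SecurityFilter())

# Developers watch the console; DEBUG/INFO noise is dropped at the edge.
dev_sink = logging.StreamHandler()
dev_sink.setLevel(logging.WARNING)

logger.addHandler(security_sink)
logger.addHandler(dev_sink)

logger.warning("failed login for user id=42", extra={"audit": True})  # both sinks
logger.debug("cache miss on key=settings")                            # neither sink
```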
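
Processing at the edge (item 6) can be as simple as enriching each record with context before it ships, so backend queries don't have to join that information in later. In this sketch the environment label is an assumed value:

```python
import logging
import socket

class EnrichFilter(logging.Filter):
    """Stamp each record with host and environment metadata at the edge."""
    def filter(self, record):
        record.host = socket.gethostname()
        record.env = "production"          # assumed deployment label
        return True

handler = logging.StreamHandler()
handler.addFilter(EnrichFilter())
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(host)s %(env)s %(levelname)s %(message)s"))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.warning("disk usage above 90 percent")
```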
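
Finally, for monitoring the pipeline itself (item 10), here is a sketch of a watchdog that tracks when the last record passed through and flags prolonged silence. The 60-second threshold is illustrative, and a real deployment would page an on-call engineer rather than print:

```python
import logging
import time

class HeartbeatHandler(logging.Handler):
    """Track when the last record passed through the pipeline."""
    def __init__(self):
        super().__init__()
        self.last_seen = time.monotonic()

    def emit(self, record):
        self.last_seen = time.monotonic()

    def silent_for(self) -> float:
        return time.monotonic() - self.last_seen

heartbeat = HeartbeatHandler()
logging.getLogger().addHandler(heartbeat)

# A monitor loop (e.g., a background thread) would periodically check:
if heartbeat.silent_for() > 60:    # a minute of silence: treat as an incident
    print("ALERT: logging pipeline appears to be down")
```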

Letting your logging layer become an afterthought can have serious ramifications for your business. Following these recommendations ensures that your enterprise logging not only supports compliance needs but also helps your business run smoothly and effectively.

Anurag Gupta is a Product Manager at Treasure Data driving the development of the unified logging layer, Fluentd Enterprise. Anurag has worked on large data technologies including Azure Log Analytics, and enterprise IT services such as Microsoft System Center.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.