Grafana Loki Review
At CWISE we believe that log management is one of the most important pillars of observability in a company.
Options for operational log analytics in the DevOps space are fairly limited. Most tools in the self-hosted or semi-self-hosted spectrum are either:
a) free or relatively free, but hard to maintain even in small installations (Graylog/Elastic stack)
b) good but expensive. Splunk is the gold standard for logging/analytics, but it is very expensive
c) mediocre (not naming anyone)
Grafana Loki came out of Grafana Labs with the promise of being "Prometheus for logging". Loki is built around the concept that there is no need to index the full log text, only labels. This leads to much more efficient log storage.
It's cloud-native, i.e. it runs perfectly on Kubernetes. Architecture-wise Loki is simple; it comes with two components:
- client: promtail (or, via plugins, Fluentd, Fluent Bit, or Logstash) processes logs and ships them to the server side
- server: the Loki server itself; it is fairly easy to run it horizontally scaled
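To give a feel for the client side, here is a minimal promtail config sketch for a Kubernetes setup. The Loki URL, namespace labels, and file paths are illustrative, not from our actual installation:

```yaml
# Minimal promtail sketch (values are illustrative)
server:
  http_listen_port: 9080

positions:
  filename: /run/promtail/positions.yaml   # remembers how far each file was read

clients:
  - url: http://loki:3100/loki/api/v1/push # Loki push endpoint

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # turn pod metadata into Loki labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

The relabeling step is where you decide which Kubernetes metadata becomes a searchable label.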
An important piece is storage: until the 2.0 release, Loki needed to store two types of artifacts in two separate storage locations: the index and the chunks (the logs themselves). Since v2.0 it is possible to use a single storage backend for both artifact types via boltdb-shipper (this is the recommended setup).
Supported storage backends: S3, GCS, local filesystem, and Cassandra.
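A single-backend setup with boltdb-shipper looks roughly like the sketch below; the bucket name, region, and schema date are placeholders:

```yaml
# Loki storage sketch: boltdb-shipper index + S3 chunks in one bucket
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    shared_store: s3
  aws:
    s3: s3://us-east-1/my-loki-bucket   # bucket is illustrative
```

With this, both indexes and chunks end up in the same object store, so there is only one storage system to operate and back up.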
Querying is handled via the LogQL language, which is inspired by PromQL. There are three ways to access the data: the logcli utility, a Grafana data source, or directly through the API.
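To illustrate the PromQL flavor, here are two LogQL query sketches (the label names are hypothetical):

```logql
# Label selector plus a line filter: lines containing "timeout"
{namespace="prod", app="checkout"} |= "timeout"

# Prometheus-style metric query: per-second rate of error lines over 1m
rate({namespace="prod"} |= "error" [1m])
```

The same queries work from logcli, a Grafana Explore panel, or the HTTP API.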
So far so good. Here is our experience running Loki.
Our setup for AWS-based EKS installations is:
- promtail on EKS nodes as a DaemonSet
- Loki on EKS
- S3 as the storage backend; auth from EKS to S3 is handled via IRSA
- Grafana as visualization frontend (surprise surprise)
- All our logs are emitted in JSON format
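The IRSA part boils down to annotating Loki's ServiceAccount with an IAM role that grants S3 access; the names and ARN below are purely illustrative:

```yaml
# ServiceAccount annotated for IRSA (role ARN is a placeholder)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: loki
  namespace: logging
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/loki-s3-access
```

Pods using this ServiceAccount get temporary AWS credentials for the role, so no static keys need to be stored in the cluster.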
Operational side: simple and neat. The S3 backend works well, even without investing much time in caching. Upgrades have been smooth so far.
Querying and data storage: as mentioned earlier, Loki indexes only labels as searchable fields. That means the log message itself is not easily searchable, since it is not indexed. For example, with Kubernetes, fields like the namespace or pod name will be searchable labels, but the log message content will not. And beware: too many labels will cause high cardinality.
However, if Grafana is used as the querying frontend, it will parse the output; JSON logs, for example, are nicely parsed.
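Since v2.0, LogQL can also parse JSON at query time, which compensates for the message body not being indexed. A sketch, assuming our logs carry a `level` field (the label and field names are assumptions):

```logql
# Parse each JSON log line into labels, then filter on an extracted field
{namespace="prod"} | json | level="error"
```

This filters on fields inside the log line without them ever being part of the index, so it adds no label cardinality.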
Grafana Loki is a nice addition to the log analytics space. It is really lightweight and easy to run, even for medium-sized deployments.
Nice things:
- Easy to set up and maintain. As we believe in a tactical, "less technology" approach similar to serverless, Loki's low maintenance burden and easy setup are very welcome.
- Visualization support from Grafana.
- Integration with Alertmanager can be very handy if you already use Alertmanager for your deployments: the same alerting approach now works for logs.
- It's still a very "green" project: the community is thriving and a lot of features are being developed.
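The Alertmanager integration works through Loki's ruler, which evaluates LogQL expressions like Prometheus alerting rules. A sketch, with an entirely illustrative threshold and labels:

```yaml
# Loki ruler rule sketch: fires when error lines in prod exceed a threshold
groups:
  - name: loki-log-alerts
    rules:
      - alert: HighLogErrorRate
        expr: sum(rate({namespace="prod"} |= "error" [5m])) > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: High error rate in prod logs
```

The resulting alerts are sent to the same Alertmanager you already run for metrics, so routing and silencing work identically.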
Not so nice things:
- Archiving: there is none yet; this is a huge obstacle if you are a regulated company.
- Some backends, like GCS/S3, can become costly when running high-cardinality log streams.
- It's still a very "green" project: expect rough edges and fast-moving changes.
The "client-side" processing approach significantly reduces operational overhead, such as running Logstash instances. The storage approach of indexing labels rather than full text makes Loki a serious rival to Elasticsearch-based OSS log analytics systems.
At CWISE, for a self-hosted OSS solution with full-blown enterprise features, we would still pick an Elasticsearch-based option like the Elastic stack or Graylog. For smaller projects, Grafana Loki is a nice place to start. And taking into account Loki's pace of development, it will be (or already is) an alternative worth mentioning within a year or two.