I've been experimenting with Grafana/Prometheus. Do people actually use this instead of Nagios/Cacti/whatever traditional monitoring system? Obviously it's very generalized, but I wonder if it's reasonable for alerting and stuff.

@crunklord420 Having worked with all of the products you've mentioned, what do you wanna know?

As for Prometheus/Grafana, it works for simple monitoring stuff. If you wish to add complexity, you can look at Prometheus's Alertmanager.

@xyfdi do you think using Grafana/Prometheus is considered an edge case, or is it a normal use-case? I just checked and saw PagerDuty has integrated support, so I guess there's acceptance.

@crunklord420 I would say it's a normal use-case, to be honest. A lot of software has Prometheus support, and when it comes to Cloud Native ( 🤢 ) software it's the go-to self-hosted solution.

As for me, I've used them to monitor 100+ boxes (using node_exporter), and the devs used it to monitor their shitty single-instance applications. It's fairly rock solid; feed it enough disk space and memory and it can run a very long time without you having to look at it.

@xyfdi would you say it takes a lot of memory and disk compared to the traditional tools? It looks like you can just set a disk usage limit and it prunes, I guess based on date? I wonder if you can give specific time-series different pruning limits.

@crunklord420 You tell Prometheus how long you wish to retain the information it scrapes from the exporters. By default this is 15 days, and that's fine.

You have to keep in mind that Prometheus is built to give you an accurate picture of what is happening here and now. So it's excellent with frequent updates and storing loads of data in its TSDB. For the long term it isn't recommended. (I mean, I did it at work, with a year or so of retention, 1.5TB on a simple t2.medium box. It worked, you could go back really far and it had great detail. But damn, you could hear the instance just generating a fire in AWS's datacenter.)

Ultimately it depends on how much you wish to store and for how long, which in turn determines the size. As for pruning, this happens automagically, but it applies to all time-series, not specific ones.
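For reference (this is from memory, so check the docs for your Prometheus version), retention is set with flags on the server and is global, not per-series:

./prometheus --storage.tsdb.retention.time=90d --storage.tsdb.retention.size=500GB

Whichever limit is hit first wins; if you want different retention for different series you're into remote-write/Thanos territory.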

@xyfdi @crunklord420 to be fair it takes almost no effort to add a prometheus endpoint. you just need to pull off a single http1.1 response frame with a plaintext response

blobcat_shitpost{user="icedquinn"} 60000
blobcat_shitpost_count 1000000
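if you're instrumenting your own app instead of using an exporter, a rough client_golang sketch (untested, port and metric names made up to match the example above) is about this much code:

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// counter backing the blobcat_shitpost_count metric from the lines above
var shitposts = promauto.NewCounter(prometheus.CounterOpts{
    Name: "blobcat_shitpost_count",
    Help: "total number of shitposts observed",
})

func main() {
    shitposts.Add(1000000)
    // /metrics serves the plaintext exposition format prometheus scrapes
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":2112", nil))
}

promhttp handles the text format and content negotiation for you.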

@icedquinn @crunklord420 Yeah, it's easy. Just verify with promtool that it isn't absolutely retarded and you are good to go.
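Something along these lines does it (the port is whatever your endpoint listens on, 2112 in the sketch above):

curl -s http://localhost:2112/metrics | promtool check metrics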
