• Jo Miran
    12 points · 6 months ago

    We do Grafana + Prometheus for most of our clients, but I think adding Loki into the mix might be necessary. The number of clients that are missing basic events like “you’ve run out of disk space… two days ago” is too damn high.

    • @darvit@lemmy.darvit.nl
      7 points · 6 months ago

      Sounds like you need an alerting/monitoring system and not a logging system. Something like Nagios, where you immediately get an alert if something is past its limits and you don’t have to rely on logging.
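
      To make the "past its limits" idea concrete, here is a rough sketch of a threshold check in the style of a Nagios plugin. The 80%/90% thresholds and the function names are just assumptions for illustration; the exit codes (0 OK, 1 WARNING, 2 CRITICAL) follow the usual Nagios plugin convention.

```python
import shutil
import sys
from typing import Tuple


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify(pct: float, warn_pct: float = 80.0, crit_pct: float = 90.0) -> Tuple[int, str]:
    """Map a usage percentage to a (exit_code, label) pair.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if pct >= crit_pct:
        return 2, "CRITICAL"
    if pct >= warn_pct:
        return 1, "WARNING"
    return 0, "OK"


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    code, label = classify(pct)
    print(f"DISK {label} - {pct:.1f}% used on /")
    sys.exit(code)
```

      The point is that the check runs on a schedule and pages someone as soon as the limit is crossed, instead of waiting for anyone to read a log.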

      • Jo Miran
        5 points · 6 months ago

        Preaching to the choir. They hire us to performance-tune their app, but then their IT staff manages to not notice the most basic things.

    • @Machindo@lemmy.ml
      2 points · 6 months ago

      I would add Alertmanager to your stack if you haven’t already. It’s pretty tightly integrated with Prometheus. There are some canned alerting rules that predict whether disk space will be full within X days. We wire Alertmanager to PagerDuty.
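
      Roughly, those predictive rules fit a straight line through recent free-space samples and extrapolate it, which is the idea behind PromQL's predict_linear(). Below is a small sketch of that calculation, not Prometheus's actual implementation: the function names are made up, and it extrapolates from the last sample rather than from the rule evaluation time.

```python
from typing import Sequence, Tuple

SECONDS_PER_DAY = 86400


def predict_linear(samples: Sequence[Tuple[float, float]], horizon_s: float) -> float:
    """Least-squares line through (timestamp_s, value) samples,
    extrapolated `horizon_s` seconds past the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + horizon_s)


def will_fill_within(samples: Sequence[Tuple[float, float]], days: float) -> bool:
    """True if the fitted trend predicts free space dropping below zero within `days`."""
    return predict_linear(samples, days * SECONDS_PER_DAY) < 0


# Hypothetical data: free space in GB, shrinking by 1 GB per hour.
free_gb = [(0.0, 100.0), (3600.0, 99.0), (7200.0, 98.0)]
print(will_fill_within(free_gb, days=4))
print(will_fill_within(free_gb, days=5))
```

      An alert on this fires while there is still free space left, which is what gives you the days of warning.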

    • @dan@upvote.au
      1 point · 6 months ago

      The amount of clients that are missing basic events like "you’ve run out of disk space

      For my personal servers, I use Netdata for this. Works pretty well.