Prometheus is a cloud-native monitoring system and time-series database used to collect and process metrics (it was the second project to join the CNCF, after Kubernetes). It was originally developed at SoundCloud, starting in 2012.
Prometheus is often considered a successor to programs like *Zabbix*. Unlike Zabbix, which stores its data in a relational database such as MySQL, Prometheus uses its own built-in time-series database.
Prometheus also has auto-discovery. Its pull-based model (Prometheus scrapes its targets over HTTP rather than waiting for them to push) can detect targets such as `agents` and exporters and start collecting from them.
![[Pasted image 20250305174842.png]]
### Alertmanager
Custom thresholds can be defined in Prometheus as alerting rules. When a threshold is reached, Prometheus *fires* the alert to Alertmanager. Alertmanager then, based on its own configuration, *notifies* the matching destination (for example e-mail or a chat channel).
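
The flow above can be sketched as a toy model. This is not how Prometheus or Alertmanager are implemented; the rule name `HighCpu`, the label names, the sample values and the receiver names are all hypothetical, and the routing is reduced to exact label matches.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    name: str
    labels: dict = field(default_factory=dict)
    value: float = 0.0


def evaluate_rule(name, threshold, samples, extra_labels):
    """Return an Alert for every series whose current value exceeds the threshold."""
    alerts = []
    for instance, value in samples.items():
        if value > threshold:
            labels = {"alertname": name, "instance": instance, **extra_labels}
            alerts.append(Alert(name=name, labels=labels, value=value))
    return alerts


def route(alert, routes, default_receiver):
    """Pick the first receiver whose matchers all equal the alert's labels."""
    for matchers, receiver in routes:
        if all(alert.labels.get(key) == want for key, want in matchers.items()):
            return receiver
    return default_receiver


# Hypothetical data: CPU utilisation per instance and two notification targets.
samples = {"web-1": 0.93, "web-2": 0.41}
routes = [({"severity": "page"}, "pagerduty"), ({"severity": "warn"}, "slack")]

# "Prometheus side": evaluate the threshold. "Alertmanager side": route by labels.
firing = evaluate_rule("HighCpu", 0.9, samples, {"severity": "page"})
for alert in firing:
    print(alert.labels["instance"], "->", route(alert, routes, "email"))
```

The split mirrors the division of labour in the note: the threshold lives with Prometheus, while the decision about who gets notified lives with Alertmanager.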
### Data types
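1. `counter`: an ever-increasing number that only resets to zero on restart (HTTP hits)
2. `gauge`: a number that can go up and down (temperature)
3. [`histogram`](https://prometheus.io/docs/practices/histograms/): samples observations and counts them in configurable buckets (response time, duration)
4. `summary`: also samples observations and tracks their total count and sum (as a histogram does), but computes configurable quantiles on the client over a sliding time window instead of exposing buckets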
> [!tip]
> Histograms can be used to measure SLOs such as "at least 90% of requests complete in under 20 ms", and to approximate an Apdex score.
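
As a rough illustration of the tip above, the sketch below checks such an SLO and computes an Apdex score from cumulative bucket counts. The bucket bounds and counts are made-up data, and the choice of 20 ms as the target and 80 ms as the tolerated bound is an assumption for the example.

```python
# Cumulative bucket counts, keyed by upper bound in seconds ("le").
# Each bucket counts all observations less than or equal to its bound,
# which is how histogram buckets accumulate. Values are illustrative.
INF = float("inf")
buckets = {0.005: 700, 0.02: 930, 0.08: 990, INF: 1000}


def fraction_within(buckets, le):
    """Share of observations that finished within the bucket bound `le`."""
    return buckets[le] / buckets[INF]


def apdex(buckets, target, tolerated):
    """Apdex: satisfied requests count fully, tolerated ones count half."""
    satisfied = buckets[target]
    tolerating = buckets[tolerated] - satisfied
    return (satisfied + tolerating / 2) / buckets[INF]


slo_met = fraction_within(buckets, 0.02) >= 0.90
print("share under 20 ms:", fraction_within(buckets, 0.02))
print("SLO met:", slo_met)
print("Apdex:", apdex(buckets, target=0.02, tolerated=0.08))
```

Both calculations only work when a bucket bound sits exactly at the threshold of interest, which is why bucket layout is usually chosen with the SLO in mind.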
### Remote Write
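To add durability and high availability to Prometheus, [*remote write*](https://prometheus.io/docs/specs/prw/remote_write_spec/) can be used: Prometheus forwards the samples it scrapes to a remote endpoint. Combined with agent mode, where Prometheus keeps only a short-lived write-ahead log instead of a queryable local database, this makes the instance close to stateless.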
![[Pasted image 20250305192302.png]]
The `sender` and the `receiver` do not have to be Prometheus itself: other compatible software can take either role (for example *Mimir* as a receiver, or *M3* as both).
### Recording rules
To optimise read-heavy or expensive queries, [*recording rules*](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) can be defined. They precompute an expression at a regular interval and save the result as a new set of time series.
Querying the pre-computed result will then often be much faster than executing the original expression every time it is needed. This is especially useful for dashboards, which need to query the same expression repeatedly every time they refresh.
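
The idea can be illustrated with a toy model, assuming a tiny in-memory store rather than Prometheus itself. The series data and the `RecordedStore` class are hypothetical; the recorded name only follows the `level:metric:operations` naming convention suggested for recording rules.

```python
from collections import defaultdict

# Illustrative raw series: one request rate per instance.
raw_series = [
    {"job": "api", "instance": "api-1", "rate": 12.0},
    {"job": "api", "instance": "api-2", "rate": 8.0},
    {"job": "web", "instance": "web-1", "rate": 5.0},
]


def sum_by_job(series):
    """The 'expensive' expression: aggregate the per-instance rates by job."""
    totals = defaultdict(float)
    for sample in series:
        totals[sample["job"]] += sample["rate"]
    return dict(totals)


class RecordedStore:
    """Holds precomputed results under a new series name."""

    def __init__(self):
        self.recorded = {}

    def evaluate_rule(self, record_name, expr, series):
        # Runs once per evaluation interval, not once per dashboard refresh.
        self.recorded[record_name] = expr(series)

    def query(self, record_name):
        # A cheap lookup of the stored result; the expression is not re-run.
        return self.recorded[record_name]


store = RecordedStore()
store.evaluate_rule("job:http_requests:rate5m", sum_by_job, raw_series)
print(store.query("job:http_requests:rate5m"))
```

In Prometheus the equivalent is a rule with a `record` name and an `expr`, evaluated on the rule group's interval.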
### Federation
Federation allows a Prometheus server to scrape selected time series from another Prometheus server.
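This is different from *remote write*. With remote write, the sender pushes samples to the receiver and (in agent mode) does not need to keep a queryable copy itself. With federation, the data is pulled: the federating server scrapes the other server's `/federate` endpoint, so the selected series exist on both servers.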
![[Pasted image 20250305203244.png]]
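
Selecting the series to federate is done with one or more `match[]` parameters on the `/federate` endpoint. The sketch below only builds such a URL; the host name and the selectors are hypothetical. In a real setup the federating server would be given a scrape job with `metrics_path: /federate` and the same `match[]` values under `params`, rather than a hand-built URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical source server and selectors.
selectors = ['{job="prometheus"}', '{__name__=~"job:.*"}']
url = federate_url("http://prometheus-a.example:9090/", selectors)
print(url)

# Decoding the query string returns the original selectors.
print(parse_qs(urlsplit(url).query)["match[]"])
```

The second selector picks up series whose names start with `job:`, which is a common way to federate only aggregated, recorded series instead of every raw series.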
### Service/Auto discovery
Prometheus also supports service discovery, which adds or removes scrape targets automatically. Targets can be read from *files* (`file_sd_configs`) or fetched over *http* (`http_sd_configs`), among other mechanisms.
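
For the file-based variant, something has to produce the target file. A minimal sketch, assuming a JSON file made of groups with `targets` and `labels`; the host addresses, the `env` label and the file name are hypothetical, and the port is the node exporter default.

```python
import json
import os
import tempfile


def build_target_groups(hosts, port, labels):
    """Return the list-of-groups structure that file-based discovery reads."""
    targets = [f"{host}:{port}" for host in sorted(hosts)]
    return [{"targets": targets, "labels": dict(labels)}]


def write_targets(path, groups):
    """Write the target file atomically so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Hypothetical hosts running node_exporter on its default port 9100.
    groups = build_target_groups({"10.0.0.6", "10.0.0.5"}, 9100, {"env": "prod"})
    with tempfile.TemporaryDirectory() as workdir:
        target_file = os.path.join(workdir, "node.json")
        write_targets(target_file, groups)
        with open(target_file) as handle:
            print(handle.read())
```

Writing to a temporary file and renaming it is a design choice here: Prometheus re-reads the file when it changes, so the rename avoids exposing a half-written list of targets.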
### Security
Prometheus and its other components (agents, exporters) expose a web UI or HTTP endpoint. These often end up listening on an internet-facing port without authentication.
Use TLS and proper authentication for any exposed port.
### Prometheus Operator
The Prometheus Operator is an external program that deploys and manages Prometheus from outside. It is helpful for running an observability stack on Kubernetes.
![[Pasted image 20250305205327.png]]