
Monitoring and Alert System

Component Description

When deploying with Docker Compose, the following components are deployed automatically. When deploying from source code, these components must be enabled manually in docker-compose.yaml (a hypothetical sketch follows the table below).

| Component Name | Description | Deployment Instructions |
| --- | --- | --- |
| openim-admin | Admin backend providing a monitoring page entry. | Automatically enabled in both Docker and source code deployments. |
| prometheus | Monitoring system component for collecting and storing metric data. | Automatically enabled in Docker deployments; must be enabled manually for source code deployments. |
| alertmanager | Component for managing and sending alerts. | Automatically enabled in Docker deployments; must be enabled manually for source code deployments. |
| grafana | Dashboard component for displaying monitoring data. | Automatically enabled in Docker deployments; must be enabled manually for source code deployments. |
| node-exporter | Collects node (e.g., server) metric information. | Automatically enabled in Docker deployments; must be enabled manually for source code deployments. |
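
For source code deployments, enabling these components in docker-compose.yaml roughly means adding or uncommenting service entries like the hypothetical sketch below. The images, volume paths, and port mappings shown here are assumptions for illustration only; use the service definitions shipped with your OpenIM release.

    # Hypothetical sketch of the monitoring services in docker-compose.yaml; adapt to the file in your release.
    services:
      prometheus:
        image: prom/prometheus # assumed image
        volumes:
          - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
          - ./config/instance-down-rules.yml:/etc/prometheus/instance-down-rules.yml
        ports:
          - "19090:9090" # 19090 is the Prometheus port used elsewhere in this guide
      alertmanager:
        image: prom/alertmanager # assumed image
        volumes:
          - ./config/alertmanager.yml:/etc/alertmanager/alertmanager.yml
          - ./config/email.tmpl:/etc/alertmanager/email.tmpl
        ports:
          - "19093:9093" # matches the Alertmanager target configured in prometheus.yml
      grafana:
        image: grafana/grafana # assumed image
      node-exporter:
        image: prom/node-exporter # assumed image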

Configuration File Description

| File Name | Description | Modification Items |
| --- | --- | --- |
| config/config.yaml | OpenIM service configuration | prometheus.enable: true enables monitoring |
| config/prometheus.yml | Prometheus configuration | No modification required |
| config/instance-down-rules.yml | Alert rules | Two default rules configured (instance_down, database_insert_failure_alerts) |
| config/alertmanager.yml | Alertmanager configuration | Sender and recipient email information must be configured |
| config/email.tmpl | Email alert template | Default email template; modifiable |
| config/templates/prometheus-dashboard.yaml | Custom dashboard | No modification required |
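
For reference, the switch mentioned above in config/config.yaml is the prometheus.enable field; a minimal sketch (surrounding sections omitted, no other keys shown):

    prometheus:
      enable: true # set to true so the OpenIM services expose metrics for Prometheus to collect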

Logging into the Admin Backend

Open http://ip:11002 in your browser to access the admin backend, where ip is the server's OPENIM_IP; make sure the address is reachable from your browser. The default username and password are both chatAdmin.


Logging into Grafana

First log into the admin backend, then click the data monitoring menu on the left and enter the default username (admin) and password (admin) to log into Grafana.


Adding Prometheus Data Source

As shown below, enter the URL of the Prometheus data source: http://172.28.0.1:19090 (19090 is the default Prometheus port) and click "Save and Test" to save.
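
If you manage Grafana through provisioning files instead of the UI, an equivalent data source definition might look like the sketch below; the provisioning file path is an assumption, and the UI step above is sufficient on its own.

    # Sketch of a Grafana data source provisioning file (e.g. provisioning/datasources/prometheus.yaml, path assumed)
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://172.28.0.1:19090 # same URL as entered in the UI step above
        isDefault: true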


Importing a Custom Dashboard

Click the import button as shown below to import the dashboard.


Copy the content from https://github.com/openimsdk/open-im-server/tree/main/config/templates/prometheus-dashboard.yaml into the area shown below, then click the load button.


Select your Data Source and job, then customize the metric information as shown below.


Importing the node-exporter Dashboard

Enter dashboard ID 1860 to import the node-exporter dashboard, or find other node-exporter dashboards on the official website (https://grafana.com/grafana/dashboards/).


The node-exporter metric information is then displayed as shown below.


Alert Configuration File Explanation

  1. Email Alert Architecture: Prometheus loads the alert rules from instance-down-rules.yml and sends alerts that match those rules to the Alertmanager component. Alertmanager loads alertmanager.yml and email.tmpl, then sends emails using the configured sender and recipient information and the email template.

  2. prometheus.yml File Explanation: Configures the path to the alert rules file, the Alertmanager service address, and the scrape targets from which metric data is collected (a sketch of the scrape section follows the snippet below). The defaults usually require no modification.

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['172.28.0.1:19093']

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      - "instance-down-rules.yml"
  3. instance-down-rules.yml File Explanation: Two email alert rules are implemented by default (instance_down, database_insert_failure_alerts). Additional alert rules can be added to instance-down-rules.yml, as in the hypothetical example after the default rules below:

    groups:
      - name: instance_down # Alert Rule One: triggers if a monitored module is down for over a minute
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

      - name: database_insert_failure_alerts # Alert Rule Two: triggers on an increase in msg_insert_redis_failed_total or msg_insert_mongo_failed_total
        rules:
          - alert: DatabaseInsertFailed
            expr: (increase(msg_insert_redis_failed_total[5m]) > 0) or (increase(msg_insert_mongo_failed_total[5m]) > 0)
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "Increase in MsgInsertRedisFailedCounter or MsgInsertMongoFailedCounter detected"
              description: "Either MsgInsertRedisFailedCounter or MsgInsertMongoFailedCounter has increased in the last 5 minutes, indicating failed message inserts into Redis or MongoDB; Redis or MongoDB may be down."
  4. alertmanager.yml File Explanation: Modify the sender and recipient email settings to receive alert messages. To send alerts through DingTalk, WeChat Work, or other channels, alertmanager.yml must be rewritten (a webhook sketch follows the example below); refer to the official Alertmanager documentation: https://prometheus.io/docs/alerting/latest/alertmanager/

    global:
      resolve_timeout: 5m
      smtp_from: alert@openim.io # Sender email address
      smtp_smarthost: smtp.163.com:465 # SMTP server address and port for sending email
      smtp_auth_username: alert@openim.io # Email authorization username, usually the same as smtp_from
      smtp_auth_password: YOURAUTHPASSWORD # Email authorization code
      smtp_require_tls: false
      smtp_hello: openim alert

    templates:
      - /etc/alertmanager/email.tmpl # Email template

    route:
      group_by: ['alertname']
      group_wait: 5s
      group_interval: 5s
      repeat_interval: 5m
      receiver: email

    receivers:
      - name: email
        email_configs:
          - to: 'alert@example.com' # Recipient email address
            html: '{{ template "email.to.html" . }}'
            headers: { Subject: "[OPENIM-SERVER]Alarm" } # Email subject
            send_resolved: true
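
     For channels such as DingTalk or WeChat Work, Alertmanager only provides generic webhook receivers, so the usual approach is to point a webhook receiver at a bridge service that forwards alerts to the chat tool. A minimal sketch is shown below; the bridge URL is a placeholder you must supply.

    # Sketch: route alerts to a webhook receiver instead of (or in addition to) email; the bridge URL is a placeholder
    route:
      receiver: webhook
    receivers:
      - name: webhook
        webhook_configs:
          - url: 'http://127.0.0.1:8060/dingtalk/webhook/send' # placeholder address of your DingTalk/WeCom bridge
            send_resolved: true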
  5. Email Template File email.tmpl Explanation: This template is an HTML file. Alertmanager fills in the alert variables and renders the result into the HTML body of the alert email. It can be rewritten to suit your needs:
    {{ define "email.to.html" }}
    {{ range .Alerts }}
    <!-- Begin of OpenIM Alert -->
    <div style="border:1px solid #ccc; padding:10px; margin-bottom:10px;">
      <h3>OpenIM Alert</h3>
      <p><strong>Alert Program:</strong> Prometheus Alert</p>
      <p><strong>Severity Level:</strong> {{ .Labels.severity }}</p>
      <p><strong>Alert Type:</strong> {{ .Labels.alertname }}</p>
      <p><strong>Affected Host:</strong> {{ .Labels.instance }}</p>
      <p><strong>Affected Service:</strong> {{ .Labels.job }}</p>
      <p><strong>Alert Subject:</strong> {{ .Annotations.summary }}</p>
      <p><strong>Trigger Time:</strong> {{ .StartsAt.Format "2006-01-02 15:04:05" }}</p>
    </div>
    <!-- End of OpenIM Alert -->
    {{ end }}
    {{ end }}

Triggering a Test Alert

You can trigger the instance_down alert rule manually. If you deployed OpenIM from source code, run the make stop command to stop the openim-server service. After waiting more than 5 minutes, you will receive an alert email like the one shown below:


Logging System

If the OpenIM service is deployed in a Kubernetes environment using the Helm chart, you can view the logs of all OpenIM services through Grafana. Binary and Docker deployments currently do not integrate the Loki log collection component. To use Loki log collection, deploy with the Helm chart; for details, see https://github.com/openimsdk/helm-charts/blob/main/docs/user-guide-zh.md