Use the Healthchecks service

About Healthchecks

Healthchecks is a powerful open-source monitoring and alerting tool for applications and services. It lets you easily monitor the availability of a system or the execution of a program by having it periodically send HTTP “keep-alive” requests (pings) to custom endpoints. All ping endpoints support:

  • HTTP (1.0, 1.1, 2) and HTTPS
  • IPv4 and IPv6
  • HEAD, GET, and POST request methods.

HTTP POST requests can optionally include diagnostic information in the request body. If the request body looks like a UTF-8 string, Healthchecks.io will store it (limited to the first 100 kB). The ping API also supports receiving signals on the dedicated */start and */fail endpoints. Successful responses have the “200 OK” HTTP status code and a short “OK” string in the response body. A rate limit of 5 requests per minute is in place for each check; beyond that the response will be “200 OK (rate limited)” as a warning, but further requests will not be blocked.

For each check the user can set a period, the expected time between pings, and a grace time, the time to wait before sending an alert if a check hasn’t received a ping during the aforementioned period.

Healthchecks can then notify the user when an alert has been emitted, or periodically send a report email with the status of selected checks. The service uses email as its main channel of communication, but various third-party services can also be configured using webhooks or their specific APIs.

In addition to the instant notifications, the service can provide a weekly or monthly report email with the status of the checks and an ongoing reminder email if a check is down.

The service provides a user-friendly web interface to manage checks, view their status history, and configure the various alerting options. Healthchecks.io also offers advanced features like maintenance windows, tags for organizing checks, and integration with popular monitoring tools like Grafana and Prometheus.

This guide is intended to help new users get started with the Healthchecks service provided by INFN Cloud. The full documentation of the service can be found at https://healthchecks.cloud.infn.it/docs/ .

Prerequisites

The user has to be registered in the IAM system for INFN-Cloud https://iam.cloud.infn.it/login. Only registered users can log in to the service endpoint https://healthchecks.cloud.infn.it.

  • For more details regarding the registration process please see Getting Started. To use this service you don’t need the “system admin” nomination.

Access the service

Healthchecks is set up as a highly available service, accessible through a single endpoint served by two different sites: Bari and CNAF.

You will be immediately asked to log in to the system using the INFN Cloud IAM.

Fig 1: INFN Cloud IAM login

Click the “Sign in with INFN CCR-AAI” button and enter your AAI credentials.

Fig 2: Using INFN-AAI identity

You will then be redirected to the Healthchecks homepage.

Fig 3: Healthchecks homepage

All your projects are shown here. By default, a project named after your email address is created, but you can create more projects to organize your checks. Projects are useful to group checks together and share them with other users.

Manage checks

In this section we will use the default project as an example. Enter the project by clicking on the project name in the homepage:

Fig 4: Open project page

Here you will see all the checks in the project and a header bar with a menu, which we will come back to in the following sections. By default, a check named “My first check” is already created here. As you can see from Fig. 5, the following is shown for each check:

  • status of the check (up, down, paused)
  • name of the check
  • ping url (the endpoint to ping to keep the check alive)
  • integrations associated with the check (more on this later)
  • period and grace time of the check
  • last ping time

At the end of each row there is a bell button, which lets you pause or resume a check, and a three-dot button, which lets you edit the check. Let’s click on the latter to see the check details.

Fig 5: Access check details

See and edit check details

The check details page will now open, showing all the information about the check.

You can immediately see that the page is divided into two columns: the left one shows the check details and the right one shows the event history. As there is a lot going on in this page, let’s break it down into sections, starting from the left column.

Check information

Here you can see the name of the check and its description, which is useful to remember what the check is about.

Fig 7: Check information

By clicking on the “Edit” button you can change the name, slug, tags and description of the check. The slug is a human-readable string that you can customize, useful to identify the check in the ping url (more on this later). You can use the suggested value, which is generated from the check name by removing whitespace and special characters, or set your own. Tags help you organize checks and can be used to filter them in the project page.

Fig 8: Edit check information

How to ping

In this section you can see the ping url, the endpoint you need to send an HTTP request to in order to keep the check alive. You can use either the url based on the uuid, a random and unique string generated by the service, or the url based on the slug, which you can customize as seen previously, in combination with the ping API key of the project. For more information on the ping key see the “Edit project settings” section.
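As a rough illustration of the two url forms, the sketch below builds both in plain shell. It assumes this instance follows the upstream Healthchecks convention of placing the ping key before the slug in the path; HC_PING_BASE, uuid_ping_url and slug_ping_url are hypothetical names introduced here, and all placeholder values must be replaced with your own.

```shell
#!/bin/sh
# Base of the ping endpoint, as used in the examples of this guide.
HC_PING_BASE="https://healthchecks.cloud.infn.it/ping"

# uuid_ping_url UUID
# Prints the ping url based on the check's uuid.
uuid_ping_url() {
    printf '%s/%s\n' "$HC_PING_BASE" "$1"
}

# slug_ping_url PING_KEY SLUG
# Prints the ping url based on the project's ping key and the check's slug.
# Assumption: the ping key comes first, followed by the slug.
slug_ping_url() {
    printf '%s/%s/%s\n' "$HC_PING_BASE" "$1" "$2"
}

# Example usage (placeholders, not real values):
#   curl --retry 3 "$(uuid_ping_url your-uuid-here)"
#   curl --retry 3 "$(slug_ping_url your-ping-key your-slug)"
```

The slug form keeps urls readable in scripts, but every check in the project then depends on the same ping key, so revoking that key affects all of them.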

Next, a direct link to the Healthchecks API documentation is provided, explaining how to explicitly signal a failure or measure job execution time.

In the bottom right part three buttons are shown. The first one lets you edit the filtering rules for the check, which can be useful to avoid false positives.

Fig 9: Ping filtering rules

Here you can set the allowed request methods for HTTP requests and what to do when a paused check receives a ping.

The second button opens a pop-up showing you some examples of how to ping the check by adding a crontab entry or by using curl, python, and other languages. Finally, the third button lets you copy the ping url to the clipboard.

Current status

This section shows the current status of the check and the last time it was pinged. In addition, a table summarizes the status over the last three months, with information on the number of downtimes, the total downtime, and the uptime percentage.

Fig 10: Check status

Finally, at the bottom right there is a button to pause or resume the check and a button to ping the check immediately.

Schedule

In this section you can see the period and grace time of the check. The period is the expected time between pings, while the grace time is the time to wait before sending an alert if a check hasn’t received a ping during the aforementioned period.
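To make the interplay between period and grace time concrete, here is a minimal sketch of the arithmetic in shell. It is not the service's actual implementation: check_state is a hypothetical helper, the intermediate “late” label (period elapsed, grace time still running, no alert yet) follows upstream Healthchecks terminology, and the exact handling of the boundaries is an assumption.

```shell
#!/bin/sh
# check_state NOW LAST_PING PERIOD GRACE
# NOW and LAST_PING are Unix timestamps; PERIOD and GRACE are in seconds.
# Prints the state a check would be in under the simple schedule described above.
check_state() (
    now=$1
    last=$2
    period=$3
    grace=$4
    elapsed=$(( now - last ))

    if [ "$elapsed" -le "$period" ]; then
        # The last ping arrived within the expected period.
        echo up
    elif [ "$elapsed" -le $(( period + grace )) ]; then
        # The period has elapsed, but the grace time has not: no alert yet.
        echo late
    else
        # Period and grace time have both elapsed: an alert is due.
        echo down
    fi
)

# Example: period of 1 hour (3600 s), grace time of 10 minutes (600 s),
# last ping received at t=0.
#   check_state 1000 0 3600 600
#   check_state 4000 0 3600 600
#   check_state 5000 0 3600 600
```

In other words, with these values an alert is only expected once more than 4200 seconds (period plus grace time) have passed without a ping.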

Fig 11: Check schedule

You can edit the schedule by clicking on the “Edit” button and either using the simple graphical interface or a Crontab/OnCalendar expression. You can set the period and the grace time from 1 minute to 365 days.

Fig 12: Edit check schedule

Notification methods

In this section you can see the notification methods enabled in the project that contains your check. You can turn each notification method on or off by clicking on it; a green ‘ON’ label or a gray ‘OFF’ label shows the status of the method.

Fig 13: Check notifications

Danger zone

Finally, at the bottom of the left column there is the “Danger zone” section, where you can:

  • create a copy of the check
  • transfer the check to another project
  • clear all the events of the check
  • delete the check

Fig 14: Check danger zone

Events

In the right column you can see the event history of the check, which is a complete log of all the events, and their details, that have happened to the check. Here you can see, for example, when the check was pinged, when it switched status, and when an alert was sent.

Fig 15: Check events history

Create a new check

If you want to create a new check, you can do so by clicking on the “Create a new check” button in the project page.

Fig 16: New check button

A pop-up will open where you can set the following for the check:

  • the name
  • the slug
  • possible tags
  • schedule (period and grace time)

Fig 17: Create a new check

Ping your check

To ping your check you can use the ping url provided in the check details page. As shown before, in the “How to ping” section of the details page, you can find some examples of how to ping the check using curl, python, and other languages.

Be aware that the ping endpoint is not protected by any authentication mechanism: anyone who knows the uuid or, if you have created a slug for the check, the ping API key of the project can ping the check. Both the uuid and the ping key are long random strings that are hard to guess, so you should not share either of them publicly. To share a check and/or a project see the “Share a project” section.

To ping a check you can use curl from any terminal:

curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-ping-url

However, the real advantages of Healthchecks are that you can easily send the HTTP request from within your own software, regardless of the programming language used, and that you can include diagnostic information in the request body, which Healthchecks will store.
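As an illustration of sending diagnostic information, the following sketch posts a text message in the request body of a ping. The ping_with_body helper is a hypothetical name introduced here, and the extra curl options (-fsS to fail quietly on HTTP errors, -m 10 to time out after ten seconds) are a suggestion rather than a requirement of the service.

```shell
#!/bin/sh
# ping_with_body PING_URL MESSAGE
# Sends MESSAGE as the body of an HTTP POST request to PING_URL.
# Only the first 100 kB of the body are stored by the service.
ping_with_body() {
    # --data-raw makes curl issue a POST and send the text as-is,
    # without treating a leading "@" as a file name.
    curl -fsS -m 10 --retry 3 --data-raw "$2" "$1"
}

# Example usage (placeholder url): attach the disk usage of / to the ping.
#   ping_with_body "https://healthchecks.cloud.infn.it/ping/your-uuid-here" "$(df -h / 2>&1)"
```

The stored body can then be inspected in the “Events” section of the check details page.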

Moreover, you can use the Healthchecks API to explicitly signal a failure or measure job execution time. To signal a failure, append either */fail or any non-zero exit status to the ping url:

# Reports failure by appending the /fail suffix:
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here/fail

# Reports failure by appending a non-zero exit status:
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here/1

The latter approach is particularly useful when you want to report in the ping the exit status of a script or a program:

#!/bin/sh

/usr/bin/certbot renew
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here/$?

To measure job execution time, append */start to the ping url: this starts a timer that is stopped by the next success ping received.

# Starts a timer by appending the /start suffix:
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here/start

# Stops the timer by sending a success ping:
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here

You can see the execution time in the check details page, in the “Events” section.
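The start signal and the exit-status ping can be combined in a small wrapper, sketched below under a few assumptions: run_timed is a hypothetical helper name, the curl options -fsS, -m 10 and -o /dev/null are additions not required by the service, and a failed start ping is deliberately ignored so that the job still runs.

```shell
#!/bin/sh
# run_timed PING_URL COMMAND [ARGS...]
# Signals the start of a job, runs it, then reports its exit status.
run_timed() (
    url=$1
    shift

    # Start the timer; do not abort the job if this ping fails.
    curl -fsS -m 10 --retry 3 -o /dev/null "$url/start" || true

    # Run the job and remember its exit status (0 means success).
    status=0
    "$@" || status=$?

    # Report the exit status: 0 is a success ping, anything else a failure.
    curl -fsS -m 10 --retry 3 -o /dev/null "$url/$status" || true

    # Hand the job's own exit status back to the caller.
    exit "$status"
)

# Example usage (placeholder url):
#   run_timed "https://healthchecks.cloud.infn.it/ping/your-uuid-here" /usr/bin/certbot renew
```

Because the wrapper returns the exit status of the job itself, it can replace the original command in a script or crontab entry without changing how failures are handled elsewhere.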

Manage projects

Create a new project

To create a new project you can click on the “Create a new project” button in the homepage.

Fig 18: New project button

A pop-up will open where you can set the name of the project.

Fig 19: Create a new project

You will then be redirected to the project page, where you can see all the checks in the project. Since the project is new, it contains no checks and a warning message is shown. You can create a new check by clicking on the “Create a new check” button and following the procedure described above.

Another warning will appear on the “Integrations” menu entry, as no integrations are enabled by default in a newly created project. See the next section for how to enable them.

Add integrations

An important feature of Healthchecks is the ability to send notifications about checks to various third-party services. The main channel of communication is email: it is already enabled in the default project, using the address associated with your AAI profile, but you have to enable it manually for any new project. In either case, you can notify multiple email addresses per project by adding further email integrations, or add any other service among those available in the integrations page.

You can access this page by clicking on the “Integrations” entry in the header bar menu.

Fig 20: Access Integrations page

If your project has no integrations enabled, a warning message will be shown as in Fig. 21. To add a new integration, scroll down to the “Add more” section and click on the green “Add Integration” button next to the service you want to add.

Fig 21: Integrations page (the list of services shown in the screenshot is not exhaustive)

Depending on the service you choose, you will be guided through providing the information needed to configure the integration. In the case of email, you will simply be asked to provide an email address to send the notifications to and to specify when to send them (on down and/or up events).

Fig 22: New email integration

Please be aware that, to avoid malicious use, whenever you enter an email address different from the one associated with your profile, the service sends a confirmation link to that address. This is repeated each time you add the address to a project, even if the address was already confirmed in another project.

Only confirmed addresses can receive notifications. The default address is always considered confirmed for any project.

Fig 23: Email integration confirmation link

To verify the email simply click on the link in the email you will receive as shown in Fig. 23. You will be redirected to the Healthchecks page where you will see a confirmation message such as the one in Fig. 24 and the email will be marked as verified in the integrations page.

Fig 24: Email integration verified

In addition to the instant alerts received through the chosen integration, you can also receive a weekly or monthly report email with the status of the checks and an ongoing reminder email if a check is down. To set the frequency of the reports and the reminder emails see the “Email report” paragraph under the “Edit Account settings” section of this guide.

Edit project settings

To edit the settings of a project, click on the “Settings” entry in the header bar menu.

Fig 25: Access project settings

This page is also divided into sections, described below.

Project name

Here you can simply change the name of the project.

Fig 25: Edit project name

API access

In this section you can see the read-write API key, the read-only API key and the ping API key of the project. To get more information on the API keys click on the “API documentation” link shown in the page or visit the Healthchecks API documentation. The ping API key is the key you need in order to ping a check with the slug, as seen in the “How to ping” section of the check details page.

To create a key, in this example the ping key, click on the “Create” link on the proper row.

Fig 26: Create API ping key

The page will update and the newly created key will be shown alongside a green “Key created” confirmation message. The “Create” link will be replaced by a “Revoke” link, that will let you revoke the key if necessary.
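Once a read-only or read-write key exists, it can be used to query the project from a script. The sketch below lists the checks of a project; it assumes that this instance exposes the upstream Healthchecks Management API under /api/v1/ on the same host and accepts the key in the X-Api-Key header, so verify both against the API documentation linked in the page. list_checks and HC_API_BASE are hypothetical names introduced here.

```shell
#!/bin/sh
# Assumed base url of the Management API on this instance.
HC_API_BASE="https://healthchecks.cloud.infn.it/api/v1"

# list_checks API_KEY
# Prints the JSON list of checks of the project the key belongs to.
list_checks() {
    # The key is passed in a request header, never in the url.
    curl -fsS -m 10 --retry 3 --header "X-Api-Key: $1" "$HC_API_BASE/checks/"
}

# Example usage (placeholder key):
#   list_checks "your-read-only-api-key"
```

A read-only key is enough for this kind of query; keep the read-write key for scripts that actually need to create or modify checks.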

Team access

Here you can see the list of users that have access to the project, together with their roles, and add a new user by clicking on the green “Invite a Team Member” button.

Fig 27: Team access view

A pop-up will open where you can enter the email address of the user you want to invite and choose the role among “Team Member”, “Manager” and “Read-only” that best fits the user needs.

Fig 28: Invite a team member

Please note that the email address must match the one associated with the user’s IAM profile, as it is the one used to log in to the service. Each user can check their email address by clicking on the “Account” entry in the header bar menu and looking at the field shown in Fig. 29.

Fig 29: Access account email

Transfer ownership

In this section you can transfer the ownership of the project to another user already included in the team. To do so, click on the “Transfer ownership” button and select the user you want to transfer the ownership to.

Fig 29: Transfer team ownership

Remove project

Finally, at the bottom of the page you can permanently delete the project by clicking on the “Remove project” button.

Fig 30: Remove project

Edit Account settings

You can edit a few settings specific to your user account by clicking on the “Account” entry in the header bar menu and then on “Account settings” in the drop-down menu that opens.

Fig 31: Access account settings

The settings page is organized in three tabs, visible on the left: “Account”, “Appearance” and “Email Reports”.

Account

Clicking on the “Account” tab lets you see all your projects and your role in each of them. You can also quickly access a project’s settings page by clicking on “Settings” next to the project name.

Fig 31: Account settings

It is also possible to completely delete the account, together with all the projects and checks it owns, by clicking on the “Close account” button. This irreversibly deletes all the data associated with the account from the database. If you access the service again by visiting the webpage, a new empty account with the same email address will be created automatically.

Appearance

Here you can choose the theme of the web interface among “Light”, “Dark” and “System”, the last of which follows the system settings.

Fig 32: Appearance settings

Email Reports

This last section is particularly important as it lets you choose the frequency of the periodic email reports that Healthchecks will send you. You can choose between “Off”, “Weekly on Mondays” and “Monthly on the 1st day” and the time of the day when the report will be sent. You can also change the time zone here to match your local time, as the periodic reports will be sent between 9am and 11am in the selected time zone.

Moreover you can set the frequency of the ongoing reminder emails that Healthchecks will send you if a check is down.

Fig 33: Email reports settings