Use the Healthchecks service
About Healthchecks
Healthchecks is a powerful open-source monitoring and alerting tool for applications and services. It lets you easily monitor the availability of a system or the execution of a program by periodically sending HTTP "keep-alive" requests to custom endpoints. All ping endpoints support:
- HTTP (1.0, 1.1, 2) and HTTPS
- IPv4 and IPv6
- HEAD, GET, and POST request methods.
HTTP POST requests can optionally include diagnostic information in the request body. If the request body looks like a UTF-8 string, Healthchecks stores it (limited to the first 100 kB). The ping API also supports receiving signals on the dedicated /start and /fail endpoints. Successful responses have the 200 OK HTTP response status code and a short "OK" string in the response body. A rate limit of 5 requests per minute is in place for each check; beyond that, the response will be "200 OK (rate limited)" as a warning, but further requests will not be blocked.
For each check the user can set a period, the expected time between pings, and a grace time, the time to wait before sending an alert if a check hasn't received a ping during the aforementioned period.
Healthchecks can then notify the user when an alert is raised, or periodically send a report email with the status of selected checks. The service uses email as its main communication channel, but various third-party services can also be configured using webhooks or their specific APIs.
In addition to the instant notifications, the service can provide a weekly or monthly report email with the status of the checks and an ongoing reminder email if a check is down.
The service provides a user-friendly web interface to manage checks, view their status history, and configure the various alerting options. Healthchecks also offers advanced features such as maintenance windows, tags for organizing checks, and integration with popular monitoring tools like Grafana and Prometheus.
This guide is intended to help new users get started with the Healthchecks service provided by INFN Cloud. The full documentation of the service can be found at https://healthchecks.cloud.infn.it/docs/ .
Prerequisites
The user has to be registered in the IAM system for INFN Cloud, https://iam.cloud.infn.it/login. Only registered users can log in to the service endpoint https://healthchecks.cloud.infn.it.
- For more details regarding the registration process please see Getting Started. To use this service you don't need the "system admin" nomination.
Access the service
Healthchecks is set up as a highly available service that can be accessed via the endpoint given above, which refers to two different sites: Bari and CNAF.
You will be immediately asked to log in to the system using the INFN Cloud IAM.
Select the "Sign in with INFN CCR-AAI" button and use your AAI credentials.
You will then be redirected to the Healthchecks homepage.
Here all your projects are shown. By default, a project named after your email address is created, but you can create more projects to organize your checks. Projects are useful to group checks together and share them with other users.
Manage checks
In this section we will use the default project as an example. Enter the project by clicking on the project name in the homepage:
Here you will see all the checks in the project and a header bar with a menu, which we will come back to in the following sections. By default, a check named "My first check" is already created here. As you can see from Fig. 5, the following is shown for each check:
- status of the check (up, down, paused)
- name of the check
- ping url (the endpoint to ping to keep the check alive)
- integrations associated with the check (more on this later)
- period and grace time of the check
- last ping time
At the end of each row there is a bell button, which lets you pause or resume a check, and a three-dot button, which lets you edit the check. Let's click on the latter to see the check details.
See and edit check details
The check details page will now open, showing all the information about the check.
You can immediately see that the page is divided into two columns: the left one shows the check details and the right one shows the event history. As there is a lot going on in this page, let's break it down into sections, starting from the left column.
Check information
Here you can see the name of the check and its description, useful to remember what the check is about.
By clicking on the "Edit" button you can change the name, slug, tags and description of the check. The slug is a human-readable string that you can customize, useful to identify the check in the ping url (more on this later). You can use the suggested value, which is generated from the check name by removing whitespace and special characters, or set your own. Tags are useful to organize checks and can be used to filter checks in the project page.
How to ping
In this section you can see the ping url, the endpoint that you need to send an HTTP request to in order to keep the check alive. You can use either the url based on the uuid, which is a random and unique string generated by the service, or the url based on the slug, which you can customize as seen previously, in combination with the ping API key of the project. For more information on the ping key see the "Edit project settings" section.
Next, a direct link to the Healthchecks API documentation is provided, explaining how to explicitly signal a failure or measure job execution time.
In the bottom right part three buttons are shown. The first one lets you edit the filtering rules for the check, which can be useful to avoid false positives.
Here you can set the allowed request methods for HTTP requests and what to do when a paused check receives a ping.
The second button opens a pop-up showing you some examples of how to ping the check by adding a crontab entry or by using curl, python, and other languages. Finally, the third button lets you copy the ping url to the clipboard.
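The two url forms described in this section can be sketched as follows. This is only an illustration: the helper names are hypothetical, the base url is the one used in the curl examples of this guide, and the ping-key/slug layout is assumed from the upstream Healthchecks convention, so compare the result with the url shown in the web interface before relying on it.

```shell
#!/bin/sh
# Base ping url, as used in the curl examples of this guide.
HC_PING_BASE="https://healthchecks.cloud.infn.it/ping"

# Hypothetical helper: ping url for a check identified by its uuid ($1).
uuid_ping_url() {
    printf '%s/%s\n' "$HC_PING_BASE" "$1"
}

# Hypothetical helper: ping url built from the project ping key ($1)
# and the check slug ($2). The layout is an assumption (see above).
slug_ping_url() {
    printf '%s/%s/%s\n' "$HC_PING_BASE" "$1" "$2"
}

# Placeholders only: replace them with the values shown in the web interface.
uuid_ping_url "your-uuid-here"
slug_ping_url "your-ping-key" "my-first-check"
```

The slug form keeps the url readable, at the cost of depending on a key shared by every check in the project.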
Current status
This section shows the current status of the check and the last time it was pinged. Moreover, a table shows a summary of the status for the last three months, with information on the number of downtimes, the total downtime, and the uptime percentage.
Finally in the bottom right part there is a button to pause or resume the check and a button to ping the check immediately.
Schedule
In this section you can see the period and grace time of the check. The period is the expected time between pings, while the grace time is the time to wait before sending an alert if a check hasn't received a ping during the aforementioned period.
You can edit the schedule by clicking on the "Edit" button and either using the simple graphical interface or a Crontab/OnCalendar expression. You can set the period and the grace time from 1 minute to 365 days.
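As a small worked example of how the two values combine for a simple (non-cron) schedule: an alert is due once the period plus the grace time has elapsed since the last ping. The sketch below uses hypothetical helper names and expresses both values in seconds; the limits correspond to the range of 1 minute to 365 days mentioned above.

```shell
#!/bin/sh
HC_MIN=60          # 1 minute, in seconds
HC_MAX=31536000    # 365 days, in seconds

# Hypothetical helper: succeeds if $1 (seconds) is within the allowed range.
valid_interval() {
    [ "$1" -ge "$HC_MIN" ] && [ "$1" -le "$HC_MAX" ]
}

# Hypothetical helper: seconds after the last ping at which an alert is due.
# $1 = period, $2 = grace time (both in seconds).
alert_delay() {
    valid_interval "$1" && valid_interval "$2" || return 1
    echo $(( $1 + $2 ))
}

# A daily job (86400 s) with one hour of grace (3600 s): prints 90000
alert_delay 86400 3600
```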
Notification methods
In this section you can see the notification methods enabled in the project that contains your check. You can turn each notification method on or off by clicking on it; a green 'ON' label or a gray 'OFF' label shows the status of the method.
Danger zone
Finally, at the bottom of the left column there is the "Danger zone" section, where you can:
- create a copy of the check
- transfer the check to another project
- clear all the events of the check
- delete the check
Events
In the right column you can see the event history of the check, which is a complete log of all the events, and their details, that have happened to the check. Here you can see, for example, when the check was pinged, when it switched status, and when an alert was sent.
Create a new check
If you want to create a new check, you can do so by clicking on the "Create a new check" button in the project page.
A pop-up will open where you can set for the check:
- the name
- the slug
- possible tags
- schedule (period and grace time)
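The same fields can also be set programmatically. The sketch below is an assumption-heavy illustration, not the documented procedure for this service: the /api/v1/checks/ path, the X-Api-Key header and the JSON field names ("timeout" for the period and "grace", both in seconds) are taken from the upstream Healthchecks management API and should be verified against the API documentation linked from the project settings. The helper names are hypothetical, and the payload builder does no JSON escaping, so values must not contain quotes or backslashes.

```shell
#!/bin/sh
# Assumed API base path for this instance (verify in the API documentation).
HC_API_BASE="https://healthchecks.cloud.infn.it/api/v1"

# Hypothetical helper: build the JSON payload for a new check.
# $1 = name, $2 = slug, $3 = space-separated tags,
# $4 = period in seconds, $5 = grace time in seconds.
# No JSON escaping is done: avoid quotes and backslashes in the values.
check_payload() {
    printf '{"name": "%s", "slug": "%s", "tags": "%s", "timeout": %s, "grace": %s}\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: create the check using the read-write API key ($1);
# the remaining arguments are passed to check_payload.
create_check() {
    hc_key=$1
    shift
    check_payload "$@" | curl -fsS -m 10 \
        -H "X-Api-Key: $hc_key" \
        -H "Content-Type: application/json" \
        --data-binary @- "$HC_API_BASE/checks/"
}

# Example (placeholder key, not run here):
#   create_check "your-read-write-key" "nightly backup" "nightly-backup" "prod backup" 86400 3600
```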
Ping your check
To ping your check you can use the ping url provided in the check details page. As shown before, in the "How to ping" section of the details page, you can find some examples of how to ping the check using curl, python, and other languages.
Be aware that the ping endpoint is not protected by any authentication mechanism: anyone who knows either the uuid or, if you have created a slug for the check, the ping API key of the project can send pings. Both the uuid and the ping key are long random strings that are hard to guess, so you should not share either of them publicly. To share a check and/or a project see the "Share a project" section.
To ping a check you can use curl from any terminal:
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-ping-url
However, the real advantages of Healthchecks are that you can easily send the HTTP request from your own software, whatever programming language it is written in, and that you can include diagnostic information in the request body, which Healthchecks will store.
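As a minimal sketch of sending diagnostic information, the helpers below POST whatever arrives on standard input as the ping body. The helper names are hypothetical, curl is assumed to be installed, and 100 kB is treated as 100000 bytes. Because only the first 100 kB of a body are stored, the sketch keeps the last bytes of the input, on the assumption that the end of a log is usually the most useful part.

```shell
#!/bin/sh
# Assumption: the 100 kB body limit is treated as 100000 bytes.
HC_BODY_LIMIT=100000

# Hypothetical helper: keep only the last N bytes of standard input
# ($1, defaulting to the body limit), so the most recent lines are retained.
last_bytes() {
    tail -c "${1:-$HC_BODY_LIMIT}"
}

# Hypothetical helper: POST standard input as the body of a ping.
# $1 = full ping url.
ping_with_stdin() {
    last_bytes | curl -fsS -m 10 --retry 3 --data-binary @- "$1"
}

# Example (placeholder url, not run here):
#   tail -n 200 /var/log/backup.log | ping_with_stdin "https://healthchecks.cloud.infn.it/ping/your-uuid-here"
```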
Moreover, you can use the Healthchecks API to explicitly signal a failure or measure job execution time. To signal a failure you can append either /fail or any exit status different from 0 to the ping url:
# Reports failure by appending the /fail suffix:
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here/fail
# Reports failure by appending a non-zero exit status:
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here/1
The latter approach is particularly useful when you want to report in the ping the exit status of a script or a program:
#!/bin/sh
/usr/bin/certbot renew
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here/$?
To measure job execution time you can append /start to the ping url to start a timer, which will be stopped by any success ping received.
# Starts a timer by appending the /start suffix:
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here/start
# Stops the timer by sending a success ping:
curl --retry 3 https://healthchecks.cloud.infn.it/ping/your-uuid-here
You can see the execution time in the check details page, in the "Events" section.
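The pieces above (start signal, exit status, diagnostic body) can be combined in one wrapper. The following is a sketch under stated assumptions rather than an official client: hc_run and the HC_CURL variable are hypothetical names, and ping failures are deliberately ignored so that a monitoring outage never blocks the job itself.

```shell
#!/bin/sh
# Command used to send pings; set HC_CURL=true for a dry run without network.
HC_CURL=${HC_CURL:-curl}

# Hypothetical wrapper: $1 = ping url, remaining arguments = command to run.
# Plain POSIX sh has no local variables, hence the hc_ prefixes.
hc_run() {
    hc_url=$1
    shift
    # Start the timer; a failed ping must not prevent the job from running.
    "$HC_CURL" -fsS -m 10 --retry 3 -o /dev/null "$hc_url/start" || true
    # Run the job, capturing its output and exit status.
    hc_out=$("$@" 2>&1)
    hc_rc=$?
    # Report the exit status (0 is a success ping) with the output as body.
    printf '%s' "$hc_out" | "$HC_CURL" -fsS -m 10 --retry 3 -o /dev/null \
        --data-binary @- "$hc_url/$hc_rc" || true
    return "$hc_rc"
}

# Example (placeholder uuid, not run here):
#   hc_run "https://healthchecks.cloud.infn.it/ping/your-uuid-here" /usr/bin/certbot renew
```

The wrapper returns the exit status of the wrapped command, so it can be used in a crontab entry or another script without hiding failures.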
Manage projects
Create a new project
To create a new project you can click on the "Create a new project" button in the homepage.
A pop-up will open where you can set the name of the project.
You will then be redirected to the project page, where you can see all the checks in the project. Of course, since the project is new, there will be no checks in it and a warning message will be shown. You can create a new check by clicking on the "Create a new check" button and following the procedure described above.
Another warning will appear on the "Integrations" menu entry, as no integrations are enabled by default in a new project you create. See the next section for how to enable them.
Add integrations
An important feature of Healthchecks is the possibility to send notifications about the checks to various third-party services. The main channel of communication is email: it is already enabled in the default project with the address associated with your AAI profile, but for any new project you will have to enable it manually. In any case, you can also set up multiple email addresses to notify for each project by adding another email integration, or add any other service from those available in the integrations page.
You can access this page by clicking on the "Integrations" entry in the header bar menu.
If your project has no integrations enabled, a warning message will be shown as in Fig. 21. To add a new integration scroll down to the Add more section and click on the green Add Integration button next to the service you want to add.
Depending on the service you choose, you will be guided on how to provide the necessary information to configure the integration. Please remember that once you have added the integration, you will have to enable it for each check in the project you want to receive notifications for.
Other than the instant alert received through the chosen integration, you can also receive a weekly or monthly report email with the status of the checks and an ongoing reminder email if a check is down. To set the frequency of the reports and the reminder emails see the "Email report" paragraph under the "Edit Account settings" section of this guide.
In the case of email, you will simply be asked to provide an email address to send the notifications to and to specify when to send the notifications (on down and/or up events).
Please be aware that, to avoid malicious use, after you enter an email address different from the one associated with your profile, the service will send a confirmation link to the specified address. This will be repeated each time you add the address in a project, even if the address was already confirmed in another project.
Only confirmed addresses can receive notifications. The default address is always considered confirmed for any project.
To verify the email simply click on the link in the email you will receive as shown in Fig. 23. You will be redirected to the Healthchecks page where you will see a confirmation message such as the one in Fig. 24 and the email will be marked as verified in the integrations page.
Push notifications with ntfy
If you want to receive notifications on your mobile device, you can use ntfy, a free and open-source service that sends push notifications to your mobile device or desktop: you subscribe to a topic where messages can be published and read. You can use the official instance at https://ntfy.sh, which offers a free plan and a paid 'pro' plan. The main difference between the two is that the paid plan lets you make the topic private, so you can manage who can read and write to it. For the typical Healthchecks use case the free plan is perfectly fine, as we will create the topic with a random name that is hard to guess, just like the uuid of a check. Please also note that the free plan has a limit of 250 messages per day.
To use ntfy, you will need to create an account on the ntfy website and also download the app on your mobile device if you want to receive push notifications there. You can follow the official documentation at https://docs.ntfy.sh/ for the details.
The following steps show how to set up the ntfy integration in Healthchecks. Open the ntfy.sh dashboard and log in with your account, if needed. Click on the Subscribe to topic button: a pop-up will open where you can enter the name of the topic you want to subscribe to. As topics are never created or deleted and are public by default, you need to choose a topic name that is hard to guess and possibly random. To do so you could use the uuid of the project you are adding the integration to (remember that integrations are project-specific and not check-specific!), which you can find in the browser address bar (see Fig. 25), or generate a random string using the Generate name button (see Fig. 26).
Once you have chosen the topic name, click on the Subscribe button and the page will update, showing the topic in the left side menu bar.
You can click on the topic name to see all the messages sent to it. Now go back to the Healthchecks integrations page and find ntfy in the list of all available integrations, then click on its Add Integration button.
Now you will be asked to enter the topic name you just created and to specify the priority level of the notifications for the down and up events; each priority has a different vibration and sound intensity. When you are done, click on the Save integration button and the integration will be added to the project.
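If you prefer to generate the topic name yourself and to check that the subscription works before relying on it, the following sketch may help. The helper names are hypothetical; it assumes a system with /dev/urandom, od and tr, and it uses the plain HTTP publishing form of ntfy (a POST of the message text to the topic url).

```shell
#!/bin/sh
# Hypothetical helper: print a hard-to-guess topic name made of
# 32 hexadecimal characters read from the system random source.
random_topic() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
    echo
}

# Hypothetical helper: url of a topic ($1) on the official ntfy instance.
ntfy_topic_url() {
    printf 'https://ntfy.sh/%s\n' "$1"
}

# Example (not run here): publish a test message to the topic you subscribed to.
#   curl -fsS -d "Test message for the Healthchecks integration" "$(ntfy_topic_url your-topic-name)"
```

The test message should appear in the ntfy dashboard and in the mobile app if the topic names match exactly.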
Microsoft Teams
You can also receive Healthchecks alerts in a Microsoft Teams channel you have admin access to. To do so, go to the integrations page again, look for the Microsoft Teams integration and click on its Add Integration button. A web page will open with detailed step-by-step instructions on how to create a webhook in Microsoft Teams and add it to the integration.
If you have the Teams app installed on your mobile device, you can also receive push notifications there.
Edit project settings
To edit the settings of a project you can click on the "Settings" entry in the header bar menu.
This page is also divided into sections, described in the following.
Project name
Here you can simply change the name of the project.
API access
In this section you can see the read-write API key, the read-only API key and the ping API key of the project. To get more information on the API keys click on the "API documentation" link shown in the page or visit the Healthchecks API documentation. The ping API key is the key you need to use to ping a check with the slug, as seen in the "How to ping" section of the check details page.
To create a key, in this example the ping key, click on the "Create" link on the proper row.
The page will update and the newly created key will be shown alongside a green "Key created" confirmation message. The "Create" link will be replaced by a "Revoke" link, that will let you revoke the key if necessary.
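Once a key exists, it can be used from scripts. The sketch below lists the checks of a project with the read-only key. It rests on assumptions that should be verified against the API documentation linked above: the /api/v1/ path and the X-Api-Key header follow the upstream Healthchecks management API, and the helper names are hypothetical.

```shell
#!/bin/sh
# Assumed API base path for this instance (verify in the API documentation).
HC_API_BASE="https://healthchecks.cloud.infn.it/api/v1"

# Hypothetical helper: full url for an API path such as "checks/".
hc_api_url() {
    printf '%s/%s\n' "$HC_API_BASE" "$1"
}

# Hypothetical helper: authenticated GET request.
# $1 = API key (the read-only key is enough for listing), $2 = API path.
hc_api_get() {
    curl -fsS -m 10 -H "X-Api-Key: $1" "$(hc_api_url "$2")"
}

# Example (placeholder key, not run here):
#   hc_api_get "your-read-only-key" "checks/"
```

Using the read-only key for this kind of query limits the damage if the script or its configuration is ever exposed.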
Team access
Here you can see the list of users that have access to the project, together with their roles, and add a new user by clicking on the green "Invite a Team Member" button.
A pop-up will open where you can enter the email address of the user you want to invite and choose the role among "Team Member", "Manager" and "Read-only" that best fits the user needs.
Please note that the email address must be the same as the one associated with the user's IAM profile, as it is the one used to log in to the service. Users can check their email address by simply clicking on the "Account" entry in the header bar menu and looking at the field as shown in Fig. 29.
Transfer ownership
In this section you can transfer the ownership of the project to another user already included in the team. To do so, click on the "Transfer ownership" button and select the user you want to transfer the ownership to.
Remove project
Finally, at the bottom of the page you can permanently delete the project by clicking on the "Remove project" button.
Edit Account settings
It is possible to edit a few settings specific to the user account by clicking on the "Account" entry in the header bar menu and then on "Account settings" in the drop-down menu that opens.
The settings page is organized in three tabs, visible on the left: "Account", "Appearance" and "Email Reports".
Account
Clicking on the "Account" tab lets you see all your projects and your role in them. You can also quickly access the project settings page by clicking on "Settings" near the project name.
It is also possible to completely delete the account and all the projects and checks owned by the account by clicking on the "Close account" button. This will irreversibly delete all the data associated with the account from the database. If you access the service again by visiting the webpage, a new empty account with the same email address will be created automatically.
Appearance
Here you can choose the theme of the web interface between "Light", "Dark" or "System" that will follow the system settings.
Email Reports
This last section is particularly important as it lets you choose the frequency of the periodic email reports that Healthchecks will send you. You can choose between "Off", "Weekly on Mondays" and "Monthly on the 1st day" and the time of the day when the report will be sent. You can also change the time zone here to match your local time, as the periodic reports will be sent between 9am and 11am in the selected time zone.
Moreover you can set the frequency of the ongoing reminder emails that Healthchecks will send you if a check is down.