How to use Nagios and NRPE to monitor remote OPNSense HA secondary routers
by firestorm_v1 on Jul.19, 2024, under Linux, Networking, Software
In this article, I’ll be discussing how to use the Nagios NRPE (Nagios Remote Plugin Executor) service to monitor the secondary OPNSense firewall of a remote high availability pair, working around a VPN routing limitation that leaves the secondary instance unreachable. The root cause lies in the way OPNSense performs VPN routing: a routing conflict keeps the incoming traffic flow from communicating with the backup node. This isn’t an OPNSense-specific issue; it also affects pfSense and other firewalls that use HA/CARP with VPN.
Not just OPNSense, but pfSense and others too!
In order to effectively monitor both nodes of an HA cluster regardless of the state of the individual nodes (either primary or backup), the use of an NRPE reflector is necessary. While this article is written around using OPNSense, the same method can be used to monitor both nodes in a pfSense HA configuration and others that use the HA/CARP and VPN termination together.
Why do I call it a reflector? Because the checks in question are not intended for the NRPE server itself, but rather for servers that the NRPE server can access. From Nagios’s perspective, the checks are being reflected from the NRPE server to the firewalls I want to check instead of going to the firewalls directly. If anything, treat this as a practical example of using NRPE to check things on the remote end of a VPN, not just on the NRPE host itself. Most documentation online demonstrates using NRPE to perform checks on the host NRPE is installed on, but that is not the only way to use it.
Prerequisites
While there may be specific environments that require additional adjustments, generally all that’s needed is a Linux host in some form on the remote end that’s reachable from Nagios. In my case, I’m using a Raspberry Pi that is already dedicated to running scripts to also run the Nagios NRPE service. The requirements are fairly simple:
- A Linux host that has build tools installed and is located on the remote LAN side of the VPN.
- A repository of the Nagios checks you want to perform.
- A working Nagios host on the local LAN side of the VPN.
My Environment
In my environment, the remote LAN side network is 10.0.0.0/24 with the primary OPNSense host at 10.0.0.2 (DNS: ruckus01.lan.home.matrix) and the secondary at 10.0.0.3 (DNS: ruckus02.lan.home.matrix), with the VIP at 10.0.0.1. The script host is at 10.0.0.197 and the Nagios server is at 10.1.0.8.
Installation and configuration of Nagios-NRPE-Server
- Install Nagios NRPE by using your host’s package manager (apt-get install nagios-nrpe-server on Debian-based systems; on RHEL-based systems the package is typically named nrpe and comes from EPEL, so yum install nrpe)
- Download the Nagios Plugins wget https://nagios-plugins.org/download/nagios-plugins-2.4.10.tar.gz
- Install the OpenSSL development headers (apt-get install libssl-dev or yum install openssl-devel)
- Un-tar the nagios plugins source code using tar -xzvf nagios-plugins-2.4.10.tar.gz
- Go into the directory using cd nagios-plugins-2.4.10
- Compile and install the Nagios plugins: ./configure && make && make install
- If you are using a repository for your Nagios plugins, be aware of which architecture each plugin is compiled for. Plugins compiled on x86-64 for the Nagios server will not run on the aarch64 (ARM) architecture. If you are doing what I’m doing, with a Raspberry Pi as the reflector and Nagios running in a standard 64-bit x86 VM, omit the “make install” part of the above command and copy the checks you want to your repository under a variant name (for example, check_http is the x86_64 binary and check_http_arm is the aarch64 version).
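The copy-under-a-variant-name step can be sketched as a small shell function. This is only an illustration under a few assumptions: the function name copy_variants and the "arm" suffix argument are hypothetical, and it assumes the freshly built binaries sit in the plugins/ subdirectory of the source tree after make, which is where I would expect them but is worth confirming on your build.

```shell
#!/bin/sh
# Copy selected freshly built plugins to a destination directory under an
# architecture-specific name, e.g. check_http -> check_http_arm.
# usage: copy_variants <src_dir> <dest_dir> <suffix> <plugin>...
# (copy_variants and its arguments are illustrative, not part of Nagios.)
copy_variants() {
    src_dir=$1
    dest_dir=$2
    suffix=$3
    shift 3
    for plugin in "$@"; do
        # Stop early if the build did not produce the requested plugin.
        if [ ! -f "$src_dir/$plugin" ]; then
            echo "missing: $src_dir/$plugin" >&2
            return 1
        fi
        cp "$src_dir/$plugin" "$dest_dir/${plugin}_${suffix}"
        chmod 755 "$dest_dir/${plugin}_${suffix}"
    done
}

# Example, run from the nagios-plugins-2.4.10 directory after ./configure && make
# (assumes the binaries are built into ./plugins):
# copy_variants plugins /usr/lib/nagios/plugins arm check_http check_ping
```

Keeping the suffix as an argument means the same function could produce other variants later without touching the x86_64 copies already in the repository.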
Once installed, we can start on the Nagios NRPE Server configuration. I implemented five checks; you can add or remove them as necessary:
- check_http – This command checks that the LAN-side interface of the target router is reachable from the LAN on port 80. Even though port 80 redirects to port 443, it’s still necessary to check that it’s open.
- check_https – This command checks that the LAN-side interface of the target router is reachable from the LAN on port 443.
- check_ping – This command checks that the LAN-side interface is pingable with a minimum threshold to alert if the router becomes unreachable.
- check_carp – This is a script I wrote that checks the OPNSense CARP VIP statuses using the OPNSense API. It accepts two parameters (site) and (role) to determine which API keys, hosts, and expected states of the CARP VIPs for alerting. This is not included in the Nagios Plugins package.
- check_https_cert – This command checks that the SSL cert presented by the router is valid, not expired, and maps to an internal CA used for the certs in my environment. In the NRPE configuration below it is run through the check_ssl_cert plugin.
The five aspects (http, https, ping, carp status, and SSL certificate) will be run against both the master and the backup node, so the Nagios NRPE configuration will have two sets of checks. To keep them apart, we will prefix each command name with primary_ or secondary_ to denote which host the check targets.
command[primary_webui_http]=/usr/lib/nagios/plugins/check_http_arm -I 10.0.0.2 -p80
command[primary_webui_https]=/usr/lib/nagios/plugins/check_http_arm -I 10.0.0.2 -p443 -S
command[primary_ping]=/usr/lib/nagios/plugins/check_ping_arm -H 10.0.0.2 -w 3000.0,80% -c 5000.0,100% -p 5
command[primary_carp_master]=/usr/lib/nagios/plugins/check_carp home master
command[primary_webui_ssl]=/usr/lib/nagios/plugins/check_ssl_cert -H 10.0.0.2 --sni ruckus01.lan.home.matrix -r /etc/ssl/certs/Matrix_CA_Root.crt
command[secondary_webui_http]=/usr/lib/nagios/plugins/check_http_arm -I 10.0.0.3 -p80
command[secondary_webui_https]=/usr/lib/nagios/plugins/check_http_arm -I 10.0.0.3 -p443 -S
command[secondary_ping]=/usr/lib/nagios/plugins/check_ping_arm -H 10.0.0.3 -w 3000.0,80% -c 5000.0,100% -p 5
command[secondary_carp_backup]=/usr/lib/nagios/plugins/check_carp home slave
command[secondary_webui_ssl]=/usr/lib/nagios/plugins/check_ssl_cert -H 10.0.0.3 --sni ruckus02.lan.home.matrix -r /etc/ssl/certs/Matrix_CA_Root.crt
Before you restart NRPE, make sure to add your Nagios server’s IP address to the allowed_hosts directive.
Finally, restart NRPE using systemctl restart nagios-nrpe-server
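If checks are later refused, the allowed_hosts entry is the first thing to look at. A minimal sketch of a helper that confirms an address is listed follows; the function name nrpe_allows is hypothetical, and /etc/nagios/nrpe.cfg is assumed to be the configuration path (typical for the Debian package, but possibly different on your system). It only matches addresses listed literally, so a network entry in CIDR form would not be recognized.

```shell
#!/bin/sh
# Return 0 if the IP address ($2) is listed literally in an allowed_hosts
# line of the NRPE config file ($1), non-zero otherwise.
# (nrpe_allows is an illustrative helper, not part of NRPE.)
nrpe_allows() {
    cfg=$1
    ip=$2
    grep -E '^[[:space:]]*allowed_hosts[[:space:]]*=' "$cfg" \
        | cut -d= -f2 \
        | tr ',' '\n' \
        | tr -d ' \t' \
        | grep -Fxq "$ip"
}

# Example (path is an assumption):
# nrpe_allows /etc/nagios/nrpe.cfg 10.1.0.8 && echo "Nagios server is allowed"
```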
Nagios Configuration
Configuring Nagios comes in two parts: first, the definition of the check_reflector_nrpe check command, and then the host and service configuration that uses the new command to perform the checks.
In objects/commands.cfg, add the below to the command definition section:
#check_nrpe_opnsense_reflector
define command {
    command_name    check_reflector_nrpe
    command_line    $USER1$/check_nrpe $ARG1$
}
In the above command, we’ll use Nagios’s check_nrpe plugin to run the check using the arguments passed in $ARG1$. The command name prevents conflicts from other commands that also use check_nrpe and will help to indicate what the command does.
Nagios host configurations consist of two essential parts: the host definition and the service definition(s). Since Nagios can’t directly ping whichever node is currently in backup mode, we will define the host with a check_command that uses the NRPE reflector to determine whether the firewall is truly down or just in failover mode. This gives us accurate host statuses regardless of the host’s CARP status.
define host {
    use                     linux-server
    host_name               ruckus01.lan.home.matrix
    alias
    check_command           check_reflector_nrpe!-H 10.0.0.197 -4 -c primary_ping
    address                 10.0.0.2
    max_check_attempts      5
    check_period            24x7
    notification_interval   30
    notification_period     24x7
}
In the above host definition, we’ve implemented the check_reflector_nrpe command along with a few parameters:
- -H 10.0.0.197 – This is the IP address of the NRPE server.
- -4 – Specify using IPv4
- -c primary_ping – Specify we want to run the primary_ping command on NRPE.
This same check_command format is used for all the services as well; the only thing that changes is the value passed with the -c argument. Note: the arguments you pass here are consumed by check_nrpe on the Nagios server (through check_reflector_nrpe) and are not sent on to the NRPE server. Only the command name given with -c is sent, and NRPE runs that command with the arguments defined in nrpe.cfg.
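To see what Nagios effectively runs, the same check_nrpe call can be tried by hand from the Nagios server. The sketch below wraps it in a loop; run_reflected is a hypothetical helper name, and the default CHECK_NRPE path is an assumption that should be changed to wherever $USER1$ points in your installation.

```shell
#!/bin/sh
# Assumption: location of check_nrpe on the Nagios server. Adjust it to the
# directory that $USER1$ resolves to in your resource.cfg.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}
# The NRPE reflector from this article.
REFLECTOR=${REFLECTOR:-10.0.0.197}

# Run each named NRPE command through the reflector and print its output
# together with the plugin exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
# (run_reflected is an illustrative helper, not part of Nagios.)
run_reflected() {
    for cmd in "$@"; do
        output=$("$CHECK_NRPE" -H "$REFLECTOR" -4 -c "$cmd")
        status=$?
        printf '%s [exit %s]: %s\n' "$cmd" "$status" "$output"
    done
}

# Example:
# run_reflected primary_ping secondary_ping
```

If a command works here but not from Nagios, the difference is usually in the check_command line rather than on the reflector.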
For service definitions, the same check_command is used with the appropriate command defined in nrpe.cfg on the reflector server.
define service {
    use                     generic-service
    host_name               ruckus01.lan.home.matrix
    service_description     https_cert
    check_command           check_reflector_nrpe!-H 10.0.0.197 -4 -c primary_webui_ssl
}
Just like the host definition, the service definition’s check_command uses the same syntax. The secondary is configured the same way, with the secondary_ commands in its check_command entries. Once both routers have been defined in Nagios along with their respective host and service definitions, restart Nagios and the configuration is complete.
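Since the secondary’s service definitions differ only in host name and command name, they can be generated rather than typed. The following is a rough sketch, not the way I built my own configuration: emit_services is a hypothetical helper, and deriving service_description from the part of the command name after the first underscore is purely an assumption for illustration.

```shell
#!/bin/sh
# Print one Nagios service definition per NRPE command name.
# usage: emit_services <host_name> <reflector_ip> <nrpe_command>...
# (emit_services is an illustrative helper, not part of Nagios.)
emit_services() {
    host=$1
    reflector=$2
    shift 2
    for cmd in "$@"; do
        # Assumption: describe the service by the text after the first "_",
        # e.g. secondary_webui_ssl -> webui_ssl.
        desc=${cmd#*_}
        printf 'define service {\n'
        printf '    use                     generic-service\n'
        printf '    host_name               %s\n' "$host"
        printf '    service_description     %s\n' "$desc"
        printf '    check_command           check_reflector_nrpe!-H %s -4 -c %s\n' "$reflector" "$cmd"
        printf '}\n\n'
    done
}

# Service definitions for the secondary router, using the command names
# defined in nrpe.cfg on the reflector.
emit_services ruckus02.lan.home.matrix 10.0.0.197 \
    secondary_webui_http secondary_webui_https \
    secondary_carp_backup secondary_webui_ssl
```

The output can be redirected into a .cfg file that Nagios loads and then reviewed by hand before restarting.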
Implementing additional checks is easy. Just remember that the -c parameter in your Nagios configuration must match the name of a command definition in nrpe.cfg on the NRPE server.
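A mismatch between the two files is easy to miss, so here is a small sketch for listing the command names that nrpe.cfg actually defines and testing for one of them. Both function names are hypothetical, and the example path assumes the Debian location of nrpe.cfg.

```shell
#!/bin/sh
# Print the name of every command[...] definition in an NRPE config file,
# skipping commented-out lines.
# (list_nrpe_commands and nrpe_has_command are illustrative helpers.)
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)].*/\1/p' "$1"
}

# Return 0 if the command name ($2) is defined in the config file ($1).
nrpe_has_command() {
    list_nrpe_commands "$1" | grep -Fxq "$2"
}

# Example (path is an assumption):
# nrpe_has_command /etc/nagios/nrpe.cfg secondary_webui_ssl || echo "not defined"
```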
Happy hacking monitoring!
FIRESTORM_v1