Script interval in milliseconds. #2546

Closed
muhammedShanSyber opened this issue Feb 7, 2025 · 15 comments
Comments

@muhammedShanSyber

I've successfully implemented clustering using keepalived with 2 Ubuntu servers. It is working fine, except that I can't change the interval in vrrp_script to milliseconds.

Server 2 - backup configuration :

global_defs {
    enable_script_security
    script_user clusterseg2
}
vrrp_script chk_script {
    script "/etc/keepalived/health_check.sh"
    interval 1
    weight -20
    fall 2
    rise 5
}
vrrp_instance VI_1 {
    state BACKUP
    interface enp1s0
    virtual_router_id 51
    priority 90   
    #max_auto_priority 100
    advert_int 1
    unicast_src_ip 10.176.21.146
    unicast_peer {
	10.176.21.108
    }

    authentication {
        auth_type PASS
        auth_pass 1234 
    }

    virtual_ipaddress {
        10.176.21.250  
    }
    track_script {
	chk_script
    }
    preempt_delay 0
    notify_master "/etc/keepalived/start.sh"
    notify_backup "/etc/keepalived/stop.sh"
    notify_fault "/etc/keepalived/stop.sh"
}

I'm trying to eliminate downtime, which is not possible (100%). But I would at least like to reduce the downtime to milliseconds.

@pqarmitage
Collaborator

We don't want keepalived to run track scripts at millisecond intervals, because the overhead of forking and exec'ing the script would put a not insignificant load on the system.

If you can post your /etc/keepalived/health_check.sh script, we may be able to suggest some alternatives that would give you the sort of response time that you want.

With your current VRRP configuration it will take between 2.648 and 3.648 seconds before a backup takes over as master once health_check.sh identifies a failure (the 1 second range is because the fault could occur anywhere in the 1 second interval between adverts). Also, specifying fall 2 causes a further delay of 1 second from when the fault is first detected.
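
The window quoted above can be reproduced with a short calculation. This is a sketch in Python, assuming the VRRPv2 master-down timer is three advert intervals plus a skew of (256 - priority) / 256 seconds, which matches the 3.648 figure for priority 90. The function names are illustrative and not part of keepalived.

```python
def master_down_interval(priority: int, advert_int: float) -> float:
    """VRRPv2 timer: three advert intervals plus a priority-based skew in seconds."""
    skew = (256 - priority) / 256
    return 3 * advert_int + skew


def takeover_window(priority: int, advert_int: float) -> tuple[float, float]:
    """Earliest and latest takeover delay after the script detects a failure.

    The last good advert may have been sent anywhere from 0 to advert_int
    seconds before the failure, so the window is advert_int wide.
    """
    down = master_down_interval(priority, advert_int)
    return down - advert_int, down


low, high = takeover_window(priority=90, advert_int=1)
print(f"{low:.3f}s to {high:.3f}s")  # 2.648s to 3.648s
```

The extra second caused by fall 2 is not included here; it would be added on top of both ends of the window.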

There are various ways that you can improve these times:

  1. Specify vrrp_version 3 and reduce the advert interval to a minimum of 0.01 seconds (the advert interval for VRRP v3 can be any multiple of 0.01 seconds up to a maximum of 40.95 seconds).
  2. Increase the priority of your VRRP instances, to perhaps 250 and 245. The calculation for the down timer for VRRPv3 is (3 + (256 - priority) / 256 ) * advert_interval, so increasing the priority and decreasing the advert interval reduces the time before a backup instance takes over as master.
  3. If you don't get false failures detected by health_check.sh script, remove the fall 2, so the priority change will occur as soon as the first failure is detected.
  4. Would it work for you if the VRRP instance went to fault state when the health_check.sh script detects a failure? If so, remove the weight -20 line. This will reduce the time for a backup to take over to (256 - priority) / 256 * advert_interval seconds (i.e. there is not the additional delay of 3 advert intervals, because the master instance sends an advert with priority set to 0).
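
To get a feel for how much each suggestion helps, the two formulas from the list above can be evaluated side by side. A minimal Python sketch, using the formulas exactly as quoted; the helper names and the sample scenarios are only for illustration, and the priority passed in is that of the backup that would take over.

```python
def v3_down_timer(priority: int, advert_int: float) -> float:
    """Down timer quoted above: (3 + (256 - priority) / 256) * advert_interval."""
    return (3 + (256 - priority) / 256) * advert_int


def v3_fault_takeover(priority: int, advert_int: float) -> float:
    """Takeover delay after a priority-0 advert: only the skew term remains."""
    return (256 - priority) / 256 * advert_int


# Backup priority and advert interval for each scenario (sample values).
scenarios = [
    ("priority 90, advert_int 1", 90, 1.0),
    ("priority 245, advert_int 1", 245, 1.0),
    ("priority 245, advert_int 0.01", 245, 0.01),
]

for label, priority, advert_int in scenarios:
    timer = v3_down_timer(priority, advert_int)
    fault = v3_fault_takeover(priority, advert_int)
    print(f"{label}: down timer {timer * 1000:.1f} ms, after fault {fault * 1000:.1f} ms")
```

Shortening the advert interval dominates the result; raising the priority only trims the skew term, which is why it is described as a smaller gain.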

Regarding the health_check.sh script itself, keepalived has various options to detect failure states much quicker (and more efficiently) than a track script can, so if we can see what the script is doing we may be able to suggest faster alternatives.

@muhammedShanSyber
Author

How do I specify vrrp_version 3 in the configuration file?
The installed version on my Ubuntu server is 2.3.2.

@pqarmitage
Collaborator

@muhammedShanSyber Apologies, it should be version 3 in the vrrp_instance block; I would suggest you make it the first entry.

@muhammedShanSyber
Author

muhammedShanSyber commented Feb 11, 2025

Doesn't keepalived support VRRP version 3? That's what I heard from an AI. If keepalived works with version 3, what should I change in the configuration? Should I change vrrp_instance VI_1 to vrrp_instance VI_3?
Example configuration:

cat /etc/keepalived/keepalived.conf

global_defs {
    enable_script_security
    script_user clusterseg2
}
vrrp_script chk_script {
    script "/etc/keepalived/health_check.sh"
    interval 1
    rise 5
}
vrrp_instance VI_3 {
    state BACKUP
    interface enp1s0
    virtual_router_id 51
    priority 90   
    #max_auto_priority 100
    advert_int 1
    unicast_src_ip 10.176.21.146
    unicast_peer {
	10.176.21.108
    }

    authentication {
        auth_type PASS
        auth_pass 1234 
    }

    virtual_ipaddress {
        10.176.21.250  
    }
    track_script {
	chk_script
    }
    preempt_delay 0
    notify_master "/etc/keepalived/start.sh"
    notify_backup "/etc/keepalived/stop.sh"
    notify_fault "/etc/keepalived/stop.sh"
}

@pqarmitage
Collaborator

You either add version 3 to the vrrp_instance block, so it becomes:

vrrp_instance VI_3 {
    version 3
    state BACKUP
    interface enp1s0
    virtual_router_id 51
...

or you add vrrp_version 3 to the global_defs block to make all vrrp instances default to version 3:

global_defs {
    enable_script_security
    script_user clusterseg2
    vrrp_version 3
}

@muhammedShanSyber
Author

Okay. Is it possible to do something about advert_int 1? Can it go from 1 second to 500 milliseconds?

@muhammedShanSyber
Author

Will there be an update in the coming days to implement milliseconds? I tried renaming VI_1 to VI_3 and adding vrrp_version 3 to global_defs; neither works. The service status of keepalived shows errors saying that it doesn't support version 3. I changed back to the defaults, ran into issues with keepalived again, and have now reinstalled it. It is working now, but there is a delay of about 1 second.

@pqarmitage
Collaborator

There will be no update since keepalived already supports advert intervals in multiples of 10ms.

If you specify advert_int 0.01 (with VRRP version 3) then adverts will be sent every 10ms.
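
The 10 ms granularity follows from how VRRPv3 carries the interval on the wire: as a count of centiseconds in a 12-bit field, which is also where the 40.95 second maximum comes from. The following Python sketch checks whether a proposed value fits; the function name is hypothetical and this is not how keepalived itself parses its configuration.

```python
from decimal import Decimal

MAX_CENTISECONDS = 4095  # largest value of a 12-bit field, i.e. 40.95 s


def advert_int_to_centiseconds(value: str) -> int:
    """Convert an advert interval in seconds to VRRPv3 centiseconds.

    Raises ValueError if the value is not a multiple of 0.01 s or is
    outside the 0.01 s to 40.95 s range.
    """
    centis = Decimal(value) * 100
    if centis != centis.to_integral_value():
        raise ValueError(f"{value} s is not a multiple of 0.01 s")
    if not 1 <= centis <= MAX_CENTISECONDS:
        raise ValueError(f"{value} s is outside 0.01 s to 40.95 s")
    return int(centis)


for candidate in ("0.01", "0.5", "1", "0.005", "41"):
    try:
        print(candidate, "->", advert_int_to_centiseconds(candidate), "cs")
    except ValueError as err:
        print(candidate, "-> rejected:", err)
```

Decimal is used instead of float so that values such as 0.01 are compared exactly rather than through binary rounding.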

@pqarmitage
Collaborator

There is no need to change the vrrp instance name from VI_1 to VI_3 - it doesn't make any difference.

Please post your configuration and the full logs when keepalived starts up, so we can see the error message saying that version 3 is not supported.

@muhammedShanSyber
Author

Okay. I have changed advert_int 1 to advert_int 0.01.
Here is the configuration file, without version 3 specified:

global_defs {
    enable_script_security
    script_user clusterseg
    vrrp_no_swap
    checker_no_swap
    }
vrrp_instance VI_1 {
    state MASTER
    interface eno1 
    virtual_router_id 51
    priority 100  
    #max_auto_priority 100
    advert_int 0.01
    unicast_src_ip 10.176.21.146
    unicast_peer {
	10.176.21.108
    }

    authentication {
        auth_type PASS
        auth_pass 1234  # Password for VRRP communication
    }

    virtual_ipaddress {
        10.176.21.250
    }
    preempt_delay 0
    notify_master "/etc/keepalived/start.sh"
    notify_backup "/etc/keepalived/stop.sh"
    notify_fault "/etc/keepalived/stop.sh"
}

And here is the service status :

keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; preset: enabled)
     Active: active (running) since Thu 2025-02-13 10:00:26 IST; 51min ago
       Docs: man:keepalived(8)
             man:keepalived.conf(5)
             man:genhash(1)
             https://keepalived.org
   Main PID: 62782 (keepalived)
      Tasks: 5 (limit: 6887)
     Memory: 38.5M (peak: 38.8M)
        CPU: 51min 40.843s
     CGroup: /system.slice/keepalived.service
             ├─62782 /usr/sbin/keepalived --dont-fork
             ├─62783 /usr/sbin/keepalived --dont-fork
             └─62797 /usr/bin/python3 /home/clusterseg/vm_communication_backend_diff_lang/V3/main.py

Feb 13 10:00:26 clusterseg Keepalived[62782]: Starting VRRP child process, pid=62783
Feb 13 10:00:26 clusterseg Keepalived_vrrp[62783]: (VI_1) VRRPv2 advertisement interval 0.010000s must be an integer. rounding
Feb 13 10:00:26 clusterseg Keepalived[62782]: Startup complete
Feb 13 10:00:26 clusterseg systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).
Feb 13 10:00:26 clusterseg Keepalived_vrrp[62783]: (VI_1) Entering BACKUP STATE (init)
Feb 13 10:00:27 clusterseg Keepalived_vrrp[62783]: (VI_1) received lower priority (90) advert from 10.176.21.146 - discarding
Feb 13 10:00:28 clusterseg Keepalived_vrrp[62783]: (VI_1) received lower priority (90) advert from 10.176.21.146 - discarding
Feb 13 10:00:29 clusterseg Keepalived_vrrp[62783]: (VI_1) received lower priority (90) advert from 10.176.21.146 - discarding
Feb 13 10:00:30 clusterseg Keepalived_vrrp[62783]: (VI_1) received lower priority (90) advert from 10.176.21.146 - discarding
Feb 13 10:00:30 clusterseg Keepalived_vrrp[62783]: (VI_1) Entering MASTER STATE

@pqarmitage
Collaborator

As it says in the logs:
(VI_1) VRRPv2 advertisement interval 0.010000s must be an integer. rounding
so you are still using VRRPv2.

You need to add
version 3
to the vrrp_instance block, so that it looks like:

vrrp_instance VI_1 {
    version 3        # this line is added
    state MASTER
    interface eno1 
    virtual_router_id 51
    priority 100  
    #max_auto_priority 100
    advert_int 0.01
    unicast_src_ip 10.176.21.146
    unicast_peer {
	10.176.21.108
    }

@muhammedShanSyber
Author

I updated the configuration file and restarted the service. Here is the current status:

keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-02-17 12:01:26 IST; 1s ago
       Docs: man:keepalived(8)
             man:keepalived.conf(5)
             man:genhash(1)
             https://keepalived.org
   Main PID: 8509 (keepalived)
      Tasks: 2 (limit: 9189)
     Memory: 1.9M (peak: 3.5M)
        CPU: 21ms
     CGroup: /system.slice/keepalived.service
             ├─8509 /usr/sbin/keepalived --dont-fork
             └─8511 /usr/sbin/keepalived --dont-fork

Feb 17 12:01:26 clusterseg2 Keepalived[8509]: Running on Linux 6.11.0-17-generic #17~24.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 20 22:>
Feb 17 12:01:26 clusterseg2 Keepalived[8509]: Command line: '/usr/sbin/keepalived' '--dont-fork'
Feb 17 12:01:26 clusterseg2 Keepalived[8509]: Configuration file /etc/keepalived/keepalived.conf
Feb 17 12:01:26 clusterseg2 Keepalived[8509]: NOTICE: setting config option max_auto_priority should result in better keepalived perfo>
Feb 17 12:01:26 clusterseg2 Keepalived[8509]: Starting VRRP child process, pid=8511
Feb 17 12:01:26 clusterseg2 Keepalived_vrrp[8511]: (VI_1) VRRP version 3 does not support authentication. Ignoring.
Feb 17 12:01:26 clusterseg2 Keepalived[8509]: Startup complete
Feb 17 12:01:26 clusterseg2 systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).
Feb 17 12:01:26 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering BACKUP STATE (init)
Feb 17 12:01:26 clusterseg2 Keepalived_vrrp[8511]: (VI_1) wrong version. Received 2 and expect 3

After a few seconds the status looks like this:

● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-02-17 12:01:26 IST; 58s ago
       Docs: man:keepalived(8)
             man:keepalived.conf(5)
             man:genhash(1)
             https://keepalived.org
   Main PID: 8509 (keepalived)
      Tasks: 5 (limit: 9189)
     Memory: 37.2M (peak: 37.7M)
        CPU: 55.356s
     CGroup: /system.slice/keepalived.service
             ├─8509 /usr/sbin/keepalived --dont-fork
             ├─8511 /usr/sbin/keepalived --dont-fork
             └─8526 /usr/bin/python3 /home/clusterseg2/vm_communication_backend_diff_lang/V3/main.py

Feb 17 12:01:26 clusterseg2 Keepalived[8509]: Startup complete
Feb 17 12:01:26 clusterseg2 systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).
Feb 17 12:01:26 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering BACKUP STATE (init)
Feb 17 12:01:26 clusterseg2 Keepalived_vrrp[8511]: (VI_1) wrong version. Received 2 and expect 3
Feb 17 12:01:27 clusterseg2 Keepalived_vrrp[8511]: (VI_1) wrong version. Received 2 and expect 3
Feb 17 12:01:28 clusterseg2 Keepalived_vrrp[8511]: (VI_1) wrong version. Received 2 and expect 3
Feb 17 12:01:29 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering MASTER STATE
Feb 17 12:01:30 clusterseg2 Keepalived_vrrp[8511]: (VI_1) wrong version. Received 2 and expect 3
Feb 17 12:01:31 clusterseg2 Keepalived_vrrp[8511]: (VI_1) wrong version. Received 2 and expect 3
Feb 17 12:01:32 clusterseg2 Keepalived_vrrp[8511]: (VI_1) wrong version. Received 2 and expect 3

config file :

global_defs {
    enable_script_security
    script_user clusterseg2
    vrrp_no_swap
    checker_no_swap
    }
#vrrp_script chk_script {
    #script "/etc/keepalived/health_check.sh"
    #interval 1
    #rise 1
    #}
vrrp_instance VI_1 {
    version 3
    state BACKUP
    interface enp1s0
    virtual_router_id 51
    priority 90   
    #max_auto_priority 100
    advert_int 1
    unicast_src_ip 10.176.21.146
    unicast_peer {
	10.176.21.108
    }

    authentication {
        auth_type PASS
        auth_pass 1234 
    }

    virtual_ipaddress {
        10.176.21.250  
    }
    #track_script {
	#chk_script
    #}
    preempt_delay 0
    notify_master "/etc/keepalived/start.sh"
    notify_backup "/etc/keepalived/stop.sh"
    notify_fault "/etc/keepalived/stop.sh"
}

@pqarmitage
Collaborator

According to the logs you have changed the configuration on one system to use vrrp version 3, but you have not updated the other system to do so. The vrrp version needs to match on both systems.
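
A mismatch like this is easy to miss when only the tail of the log is visible. As an illustration, the Python sketch below scans log text for the two messages quoted in this thread. The patterns are taken from those log lines only, and the function name and hint wording are made up for the example.

```python
import re

# Patterns copied from the log lines quoted in this thread; other keepalived
# versions may word these messages differently.
HINTS = [
    (re.compile(r"\((\S+)\) wrong version\. Received (\d+) and expect (\d+)"),
     "instance {0}: peer sends VRRPv{1}, local config expects VRRPv{2}"),
    (re.compile(r"\((\S+)\) VRRPv2 advertisement interval \S+ must be an integer"),
     "instance {0}: sub-second advert_int ignored because the instance runs VRRPv2"),
]


def diagnose(log_text: str) -> list[str]:
    """Return one hint per distinct problem found in the log text."""
    found = []
    for line in log_text.splitlines():
        for pattern, template in HINTS:
            match = pattern.search(line)
            if match:
                hint = template.format(*match.groups())
                if hint not in found:
                    found.append(hint)
    return found


sample = """\
Feb 17 12:01:26 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering BACKUP STATE (init)
Feb 17 12:01:26 clusterseg2 Keepalived_vrrp[8511]: (VI_1) wrong version. Received 2 and expect 3
Feb 17 12:01:27 clusterseg2 Keepalived_vrrp[8511]: (VI_1) wrong version. Received 2 and expect 3
"""
for hint in diagnose(sample):
    print(hint)
```

Feeding it the complete journal output rather than the last ten lines of the service status makes it more likely that startup messages are caught.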

Once you have fixed the above, you can then reduce the configured advert interval on both systems from advert_int 1 to as low as advert_int 0.01 (this may be too low to be reliable but it is the minimum value allowed). If you want a 500 millisecond advert interval, specify advert_int 0.5.

As I have previously asked, will you please post the FULL logs of keepalived from the time it starts, and not just the latest few entries produced by systemctl status keepalived, for example `journalctl --since=today --unit="Keepalived*"`. We cannot see what is happening unless we see all keepalived log entries.

Also, again as I have previously asked, will you please post the contents of /etc/keepalived/health_check.sh so that we can make recommendations of how that can be speeded up.

@muhammedShanSyber
Author

I have changed the configuration on both servers and restarted the services. I then stopped the service to check whether failover works.
Server 1 service status:

● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; preset: enabled)
     Active: active (running) since Tue 2025-02-18 10:52:27 IST; 4min 37s ago
       Docs: man:keepalived(8)
             man:keepalived.conf(5)
             man:genhash(1)
             https://keepalived.org
   Main PID: 4328 (keepalived)
      Tasks: 5 (limit: 6887)
     Memory: 37.3M (peak: 37.6M)
        CPU: 4min 39.720s
     CGroup: /system.slice/keepalived.service
             ├─4328 /usr/sbin/keepalived --dont-fork
             ├─4329 /usr/sbin/keepalived --dont-fork
             └─4335 /usr/bin/python3 /home/clusterseg/vm_communication_backend_diff_lang/V3/main.py

Feb 18 10:52:27 clusterseg Keepalived[4328]: Running on Linux 6.8.0-52-generic #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan 11 00:06:25 UTC >
Feb 18 10:52:27 clusterseg Keepalived[4328]: Command line: '/usr/sbin/keepalived' '--dont-fork'
Feb 18 10:52:27 clusterseg Keepalived[4328]: Configuration file /etc/keepalived/keepalived.conf
Feb 18 10:52:27 clusterseg Keepalived[4328]: NOTICE: setting config option max_auto_priority should result in better keepalived perfor>
Feb 18 10:52:27 clusterseg Keepalived[4328]: Starting VRRP child process, pid=4329
Feb 18 10:52:27 clusterseg Keepalived_vrrp[4329]: (VI_1) VRRP version 3 does not support authentication. Ignoring.
Feb 18 10:52:27 clusterseg Keepalived[4328]: Startup complete
Feb 18 10:52:27 clusterseg systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).
Feb 18 10:52:27 clusterseg Keepalived_vrrp[4329]: (VI_1) Entering BACKUP STATE (init)
Feb 18 10:52:27 clusterseg Keepalived_vrrp[4329]: (VI_1) Entering MASTER STATE

server 2 service status :

 keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-02-17 12:01:26 IST; 22h ago
       Docs: man:keepalived(8)
             man:keepalived.conf(5)
             man:genhash(1)
             https://keepalived.org
   Main PID: 8509 (keepalived)
      Tasks: 2 (limit: 9189)
     Memory: 2.2M (peak: 39.4M)
        CPU: 6h 24.840s
     CGroup: /system.slice/keepalived.service
             ├─8509 /usr/sbin/keepalived --dont-fork
             └─8511 /usr/sbin/keepalived --dont-fork

Feb 17 17:56:51 clusterseg2 Keepalived_vrrp[8511]: Netlink reports enp1s0 down
Feb 17 17:56:51 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering FAULT STATE
Feb 18 09:53:15 clusterseg2 Keepalived_vrrp[8511]: Netlink reports enp1s0 up
Feb 18 09:53:15 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering BACKUP STATE
Feb 18 09:53:19 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering MASTER STATE
Feb 18 09:57:16 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Master received advert from 10.176.21.108 with higher priority 100, ours 90
Feb 18 09:57:16 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering BACKUP STATE
Feb 18 10:52:21 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering MASTER STATE
Feb 18 10:52:47 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Master received advert from 10.176.21.108 with higher priority 100, ours 90
Feb 18 10:52:47 clusterseg2 Keepalived_vrrp[8511]: (VI_1) Entering BACKUP STATE

Should I do anything about the authentication? The log says version 3 doesn't support authentication. Here is the configuration of server 1 and server 2:

server 1 config :

global_defs {
    enable_script_security
    script_user clusterseg
    vrrp_no_swap
    checker_no_swap
    }
vrrp_instance VI_1 {
    version 3
    state MASTER
    interface eno1 
    virtual_router_id 51
    priority 100  
    #max_auto_priority 100
    advert_int 0.01
    unicast_src_ip 10.176.21.108
    unicast_peer {
	10.176.21.146
    }

    authentication {
        auth_type PASS
        auth_pass 1234  # Password for VRRP communication
    }

    virtual_ipaddress {
        10.176.21.250
    }
    preempt_delay 0
    notify_master "/etc/keepalived/start.sh"
    notify_backup "/etc/keepalived/stop.sh"
    notify_fault "/etc/keepalived/stop.sh"
}

server 2 config :

global_defs {
    enable_script_security
    script_user clusterseg2
    vrrp_no_swap
    checker_no_swap
    }
#vrrp_script chk_script {
    #script "/etc/keepalived/health_check.sh"
    #interval 1
    #rise 1
    #}
vrrp_instance VI_1 {
    version 3
    state BACKUP
    interface enp1s0
    virtual_router_id 51
    priority 90   
    #max_auto_priority 100
    advert_int 1
    unicast_src_ip 10.176.21.146
    unicast_peer {
	10.176.21.108
    }

    authentication {
        auth_type PASS
        auth_pass 1234 
    }

    virtual_ipaddress {
        10.176.21.250  
    }
    #track_script {
	#chk_script
    #}
    preempt_delay 0
    notify_master "/etc/keepalived/start.sh"
    notify_backup "/etc/keepalived/stop.sh"
    notify_fault "/etc/keepalived/stop.sh"
}

and here is the health check script :

#!/bin/bash
# Health check script to ping the Master server

# Ping the Master server with a timeout of 100ms
if ping -c 1 -W 0.1 10.176.21.108 > /dev/null; then
    exit 0  # Success
else
    exit 1  # Failure
fi

@pqarmitage
Collaborator

You should remove the authentication block since, as it says, VRRPv3 does not support authentication.

Is there a reason that you are using unicast rather than multicast? If multicast would work for you, I suggest removing the unicast_peer and unicast_src_ip entries and replacing them with use_vmac. Your implementation will then conform to the VRRP RFC.

You have commented out/removed the track_script and I think that is right, because VRRP will now detect the other system is down/unreachable before the track_script can.

I also recommend removing state MASTER and state BACKUP - they don't do anything.

For a marginally faster failover time, increase the priorities and use 250 and 240.

If you want a faster indication than 0.04 seconds that the remote system is not available, you could use track_bfd but that is complicated.

If you make the above changes (excluding track_bfd), your configuration on server 1 will be

global_defs {
    enable_script_security
    script_user clusterseg
    vrrp_no_swap
    checker_no_swap
    }
vrrp_instance VI_1 {
    version 3
    interface eno1 
    virtual_router_id 51
    priority 250        # 240 on server2
    #max_auto_priority 100
    advert_int 0.01
    use_vmac

    virtual_ipaddress {
        10.176.21.250
    }
    preempt_delay 0
    notify_master "/etc/keepalived/start.sh"
    notify_backup "/etc/keepalived/stop.sh"
    notify_fault "/etc/keepalived/stop.sh"
}

On server 2 the configuration is the same apart from the priority (and the host-specific interface enp1s0 and script_user clusterseg2 already in your file).
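
Since the two files are meant to be near-identical, a quick comparison can catch leftovers such as a different advert_int on one host. A rough Python sketch, assuming a deliberately naive parser that only understands simple `keyword value` lines; it is not a keepalived config parser, and the names below are invented for the example.

```python
def simple_settings(config_text: str) -> dict[str, str]:
    """Collect 'keyword value' lines, ignoring comments, braces and blocks."""
    settings = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) == 2 and "{" not in parts and "}" not in parts:
            settings[parts[0]] = parts[1]
    return settings


def differing_settings(a: str, b: str) -> dict[str, tuple]:
    """Return keywords whose values differ, mapped to (value_a, value_b)."""
    sa, sb = simple_settings(a), simple_settings(b)
    return {
        key: (sa.get(key), sb.get(key))
        for key in sorted(sa.keys() | sb.keys())
        if sa.get(key) != sb.get(key)
    }


# Trimmed-down sample inputs, not the full files from this thread.
server1 = """
vrrp_instance VI_1 {
    version 3
    priority 250
    advert_int 0.01
}
"""
server2 = """
vrrp_instance VI_1 {
    version 3
    priority 240
    advert_int 1   # leftover from the VRRPv2 setup
}
"""
print(differing_settings(server1, server2))
```

In this sample the priority difference is expected, while the advert_int difference is the kind of leftover worth fixing before testing failover.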
