Description

Hi,

after upgrading from 1.3 to the newest 1.4 version on a Linux system I noticed some performance issues.
In my case the average CPU time of the server increased around 20-fold compared to the 1.3 version, noticeable even when running an idle server without any connections. So I took a look into it.
After some investigation, trying different 1.4 versions and recording traces, all 1.4 versions show the same behavior, which seems to be caused by the switch from the select to the epoll kernel API.
On every epoll_wait wakeup the OPC UA server makes many additional nonblocking epoll_wait calls with timeout 0. Too many to count exactly; in my case it seems to be something in the high double-digit to low-hundreds range, usually adding up to more than 500 us of CPU time per server call. So the overhead usually amounts to far more than the expected CPU load of the server itself.
After building the library and debugging it, the following issue seems to occur in the eventloop_posix.c and eventloop_posix_epoll.c modules, in the functions UA_EventLoopPOSIX_run and UA_EventLoopPOSIX_pollFDs:
All timeouts of the internally added callbacks are managed with a precision of 100 ns.
The epoll_wait call used only supports a precision of 1 ms; the input timeout is rounded down by integer division.
epoll_wait therefore wakes up up to a millisecond too early, so the server main loop in UA_Server_run loops nonblocking until the timeout, a few hundred us away in the future, is fulfilled, the callback is executed, and the timeout is reset.
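The rounding step can be illustrated with a small sketch. The tick constant reflects open62541's 100 ns internal resolution; the function names are made up for illustration and do not exist in the library:

```c
#include <assert.h>

/* Illustration only: converting a timeout held in 100 ns ticks
 * (open62541's internal resolution) to the millisecond argument of
 * epoll_wait(2). Plain integer division truncates toward zero, so a
 * 1.7 ms timeout becomes 1 ms and the wakeup comes ~0.7 ms early. */
#define TICKS_PER_MS 10000LL   /* 10000 * 100 ns = 1 ms */

static int timeout_ms_truncated(long long ticks_100ns) {
    return (int)(ticks_100ns / TICKS_PER_MS);
}

/* Rounding up instead guarantees epoll_wait never wakes up early. */
static int timeout_ms_ceiled(long long ticks_100ns) {
    return (int)((ticks_100ns + TICKS_PER_MS - 1) / TICKS_PER_MS);
}
```

For example, a 1.7 ms timeout (17000 ticks) maps to 1 ms when truncated, but to 2 ms when ceiled.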
Depending a bit on the starting time and jitter, in the worst case any callback takes around 1 ms of extra time per execution, independent of system performance, creating a noticeable amount of overhead. Even though epoll only supports ms precision, it usually triggers rather precisely, with minimal jitter in the double-digit microsecond range, x ms after it was called. So how big the overhead gets is consistent after the first call.
All in all, this should be a general problem that affects all Linux builds of the 1.4.x versions.
Background Information / Reproduction Steps
Run a server on Linux.
Monitor CPU usage.
Record traces with kernel events (trace-cmd, lttng, etc.).
Issue #6593 seems to be related.
Possible Solution:

At first glance I see a few possible options to fix the problem, but maybe you have better ideas.
Use epoll_pwait2:
There is already a commented-out epoll_pwait2 implementation right under the epoll_wait implementation in UA_EventLoopPOSIX_pollFDs. epoll_pwait2 would probably work fine, but could be a breaking change for older systems; I have already encountered some older Linux versions that didn't support it.
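A sketch of what that could look like with a runtime fallback for kernels before 5.11. The wrapper name and the fallback policy are my own assumptions, not the existing commented-out code:

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Sketch: try the nanosecond-precision epoll_pwait2 syscall (Linux >= 5.11)
 * and fall back to millisecond epoll_wait when the running kernel does not
 * implement it. Going through syscall(2) avoids a hard dependency on a
 * glibc that exports the wrapper. */
static int wait_events(int epfd, struct epoll_event *evs, int maxevents,
                       long long timeout_ns) {
#ifdef SYS_epoll_pwait2
    struct timespec ts = { .tv_sec  = (time_t)(timeout_ns / 1000000000LL),
                           .tv_nsec = (long)(timeout_ns % 1000000000LL) };
    long rc = syscall(SYS_epoll_pwait2, epfd, evs, maxevents, &ts, NULL, 0);
    if(rc >= 0 || errno != ENOSYS)
        return (int)rc;
#endif
    /* Fallback: round the timeout *up* so we never wake a ms too early. */
    int timeout_ms = (int)((timeout_ns + 999999) / 1000000);
    return epoll_wait(epfd, evs, maxevents, timeout_ms);
}
```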
Ceil the timeout that is going into epoll:
A rather quick and dirty solution would be to ceil the timeout that is passed to epoll_wait.
I tried it as a temporary workaround and it seems to work rather well in my case. Subsequent calls of the repeated callbacks will usually have the same offset to the epoll_wait call, so they still get called rather precisely at x ms intervals. It would also only affect the Linux version.
Manage the timeouts with ms precision:
A cleaner solution would probably be to manage the callback times only with ms precision and account for the jitter by checking whether the timestamp is near the expected one. The callbacks only get parameterized with ms precision anyway. I'm not sure whether there are good use cases for more precision than that on different platforms.
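One way to sketch that jitter check. The tolerance value and the names are assumptions for illustration, not values from the library:

```c
#include <stdbool.h>

/* Sketch of the ms-precision idea: treat a callback as due when the
 * current timestamp is within a jitter tolerance of its deadline, so an
 * epoll wakeup that lands a few hundred us early still fires the callback
 * instead of forcing extra nonblocking poll rounds. The 500 us tolerance
 * is an assumption. Timestamps are in 100 ns ticks. */
#define JITTER_TOLERANCE_100NS 5000LL   /* 500 us in 100 ns ticks */

static bool callback_is_due(long long now, long long deadline) {
    return now >= deadline - JITTER_TOLERANCE_100NS;
}
```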
User Workarounds:
Run UA_Server_run_iterate with the nonblocking waitInternal yourself and sleep a millisecond longer than it returns.
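A minimal sketch of that loop. The helper name is mine; UA_Server_run_iterate returns the number of ms until the next scheduled callback:

```c
#include <time.h>

/* Sketch of the workaround: convert "wait_ms plus one extra millisecond"
 * into a timespec for nanosleep, so the wakeup lands after the callback
 * deadline instead of just before it. */
static struct timespec sleep_duration(unsigned wait_ms) {
    unsigned padded = wait_ms + 1;               /* one extra ms */
    struct timespec ts = { .tv_sec  = (time_t)(padded / 1000),
                           .tv_nsec = (long)(padded % 1000) * 1000000L };
    return ts;
}

/* Usage in your own main loop (requires <open62541/server.h>):
 *
 *   while(running) {
 *       UA_UInt16 wait_ms = UA_Server_run_iterate(server, false);
 *       struct timespec ts = sleep_duration(wait_ms);
 *       nanosleep(&ts, NULL);
 *   }
 */
```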
Checklist
Please provide the following information:
open62541 Version (release number or git tag): All 1.4.x versions
Used CMake options (should not really matter):
Other OPC UA SDKs used (client or server):
Operating system: Debian Bookworm
Logs (with UA_LOGLEVEL set as low as necessary) attached
Wireshark network dump attached
Self-contained code example attached
Critical issue