I am trying to debug some issues with our dedicated box, and so far I have not been able to locate the cause.
Our unmanaged dedicated box is the E3-1270v3, and it ran our custom mission file in ARMA3 perfectly fine at 60-70 people. It was this way for months.
Recently, even on a fresh restart, the server FPS is lower than it has ever been, and the server randomly "crashes" at about 2-2.5 hours of uptime. What is interesting is that the server does not actually crash (the process remains up), but it locks up and all network packets are dropped (according to the RPT log).
I have been attempting to debug this issue for about 2 months now and have not had any success. I have re-written entire chunks of code assuming it was some code in our mission, but on a whim, I set up a server on my local PC and it runs significantly better than our server does.
I have tried many things to attempt to fix this issue such as:
1.) Removing any process on the box that we aren't using. This server is currently the only long-running process, apart from its MySQL instance.
2.) Trying various memory allocators to see if I could increase the performance of ARMA itself.
3.) Trying several different basic.cfg entries to tweak the settings.
4.) Restarting the box as a whole to "clean up" Windows.
At the end of the day, the box itself appears to be in good health: CPU, RAM, disk IO and network IO are all operating in a healthy manner.
I opened a support ticket and they recommended trying some vanilla testing to see if the issue could be in the mission still.
OK, so after many hours of testing (with about 6-8 people), we did see a lot of network errors start showing up in the log. It made everyone leave the server, and everyone had a red chain. The graphs still look absolutely amazing, but the log is filling with these messages. Shortly before this occurs, the CPS of the server will usually tank as well (again, for no apparent reason). The log messages look like this:
NetServer::SendMsg: cannot find channel #814647724, users.card=7
NetServer: users.get failed when sending to 814647724
Message not sent - error 0, message ID = ffffffff, to 814647724 ([GSN] Kherune)
Server: Network message 3366be is pending
Server: Network message 3366be is pending
Server: Network message 3366f6 is pending
Server: Network message 3366f6 is pending
Server: Network message 3366f8 is pending
Server: Network message 3366f8 is pending
Server: Network message 336713 is pending
Server: Network message 3367bd is pending
Server: Network message 33682e is pending
Server: Network message 33682f is pending
Server: Network message 3368be is pending
Server: Network message 3369a9 is pending
Server: Network message 3369a9 is pending
Server: Network message 3369fb is pending
Server: Network message 336a5d is pending
Server: Network message 336ae3 is pending
Server: Network message 336ae3 is pending
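In case it helps anyone look at a log like this, here is a rough sketch of how the pending messages could be tallied per message ID. It assumes the lines have exactly the "Server: Network message <hex id> is pending" shape shown above; the function name and the sample list are only for illustration and are not part of any ARMA tooling.

```python
import re
from collections import Counter

# Matches lines of the form seen in the RPT excerpt above.
PENDING = re.compile(r"Server: Network message ([0-9a-f]+) is pending")

def summarize_pending(lines):
    """Count how many times each message ID is reported as pending."""
    counts = Counter()
    for line in lines:
        match = PENDING.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# A few lines copied from the log above; a real run would read the RPT file instead.
sample = [
    "Server: Network message 3366be is pending",
    "Server: Network message 3366be is pending",
    "Server: Network message 336713 is pending",
    "NetServer: users.get failed when sending to 814647724",
]

counts = summarize_pending(sample)
print(len(counts), "distinct IDs,", sum(counts.values()), "pending lines")
```

Message IDs that keep reappearing would suggest the same messages are stuck, rather than new ones failing each time, but that reading is a guess on my part.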
It's worth noting that when this happens, there are very rare occasions where the server will "catch up".
Even after this test, I'm not sure what to make of it all. I'm just baffled. It is also worth noting that even on these vanilla missions, we saw better client-side FPS when hosted on my machine than when on the server.
I know this is a lot of information to digest, but any help would be greatly appreciated. I'm beginning to think there is something wonky with the box/OS install.
cfg values:
MinBandwidth=15000000;
MaxBandwidth=100000000;
MaxMsgSend=512;
MaxSizeGuaranteed=1024;
MaxSizeNonguaranteed=64;
MinErrorToSend=0.0024999999;
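To make these numbers easier to compare against the box's actual uplink, here is a small sketch that parses the values and prints the bandwidth limits in Mbit/s. It assumes MinBandwidth and MaxBandwidth are given in bits per second, which is my understanding of basic.cfg but worth double-checking; the parser is a hypothetical helper and only handles simple `name=value;` lines.

```python
import re

# Simple "name=value;" lines only; anything else is ignored.
CFG_LINE = re.compile(r"^\s*(\w+)\s*=\s*([^;]+);")

def parse_basic_cfg(text):
    """Return a dict of numeric settings found in the given text."""
    values = {}
    for line in text.splitlines():
        match = CFG_LINE.match(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values

cfg_text = """MinBandwidth=15000000;
MaxBandwidth=100000000;
MaxMsgSend=512;
MaxSizeGuaranteed=1024;
MaxSizeNonguaranteed=64;
MinErrorToSend=0.0024999999;
"""

cfg = parse_basic_cfg(cfg_text)
for key in ("MinBandwidth", "MaxBandwidth"):
    # Assumption: the values are bits per second.
    print(f"{key}: {cfg[key] / 1e6:.0f} Mbit/s")
```

Under that assumption, the settings above tell the server it is guaranteed 15 Mbit/s and may use up to 100 Mbit/s.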