Need Help with Server Performance

Paronity
A semi-regular
Posts: 23
https://www.youtube.com/channel/UC40BgXanDqOYoVCYFDSTfHA
Joined: Sat Mar 03, 2012 9:46 pm
Location: WV, USA

Need Help with Server Performance

Post by Paronity »

Hello human life forms.

I am attempting to debug some issues with our dedicated box whose cause I have not been able to locate.

Our unmanaged dedicated box is the E3-1270v3, and it ran our custom mission file in ARMA3 perfectly fine at 60-70 people. It was this way for months.

Recently, even on a fresh restart the server FPS is at a lower rate than it ever has been and it randomly "crashes" at about 2-2.5 hours of up-time. What is interesting about this is that the server itself does not actually crash (the process remains up), but the server is locked up and all network packets are being dropped (according to the RPT log).

I have been attempting to debug this issue for about 2 months now and have not had any success. I have re-written entire chunks of code assuming it was some code in our mission, but on a whim, I set up a server on my local PC and it runs significantly better than our server does.

I have tried many things to attempt to fix this issue such as:

1.) Removing any process on the box that we aren't using (this server is currently the only long-running process, apart from its MySQL instance).
2.) Tried various memory allocators to see if I could increase the performance of ARMA itself.
3.) Tried several basic.cfg entries to tweak the settings.
4.) Restarting the box as a whole to "clean up" Windows.

At the end of the day, the box itself appears to be in good health: CPU, RAM, disk I/O, and network I/O are all operating in a healthy manner.

I opened a support ticket and they recommended trying some vanilla testing to see if the issue could be in the mission still.

Ok - so after many hours of testing (with about 6-8 people), we did see a lot of network errors start showing up in the log:
NetServer::SendMsg: cannot find channel #814647724, users.card=7
NetServer: users.get failed when sending to 814647724
Message not sent - error 0, message ID = ffffffff, to 814647724 ([GSN] Kherune)

Server: Network message 3366be is pending
Server: Network message 3366be is pending
Server: Network message 3366f6 is pending
Server: Network message 3366f6 is pending
Server: Network message 3366f8 is pending
Server: Network message 3366f8 is pending
Server: Network message 336713 is pending
Server: Network message 3367bd is pending
Server: Network message 33682e is pending
Server: Network message 33682f is pending
Server: Network message 3368be is pending
Server: Network message 3369a9 is pending
Server: Network message 3369a9 is pending
Server: Network message 3369fb is pending
Server: Network message 336a5d is pending
Server: Network message 336ae3 is pending
Server: Network message 336ae3 is pending
Everyone got a red chain and left the server. Graphs still look absolutely amazing, but the log is filling with these messages. Shortly before this occurs, the CPS of the server will usually tank as well (again, for no apparent reason).

It's worth noting that when this happens, there are very rare occasions where the server will "catch up".

Even after this test, I'm not sure what to make of it all. I'm just baffled. It is also worth noting that even on these vanilla missions, we saw better client-side FPS on the server hosted on my machine than on the dedicated box.

cfg values:
MinBandwidth=15000000;
MaxBandwidth=100000000;
MaxMsgSend=512;
MaxSizeGuaranteed=1024;
MaxSizeNonguaranteed=64;
MinErrorToSend=0.0024999999;
I know this is a lot of information to digest, but any help would be greatly appreciated. I'm beginning to think there is something wonky with the box/OS install.
Creator of Paronicon and ARMAcon
J-English
This is my homepage
Posts: 618
Joined: Thu Apr 15, 2010 4:06 am
Location: United Kingdom

Re: Need Help with Server Performance

Post by J-English »

We run a busy Arma 3 server. We found that Wasteland 1.1 ran very well at the beginning, but the desync on 1.1 was horrible. We rolled back to 1.0b and the desync issue has gone away. The 1.38 Arma physics update also caused us issues: we now get 500 MB or more in RPT files, so we added the -noLogs parameter. For the camera shake, we added enableCamShake false; to our init.sqf file.
Caliban55
This is my homepage
Posts: 439
Joined: Sat Sep 04, 2010 10:20 am
Location: Cologne, Germany

Re: Need Help with Server Performance

Post by Caliban55 »

Usually these network problems come up if the mission/clients are using too many public variable commands, there are too many objects created (wrecks, weapons objects, etc.) and not removed, or, god forbid, if the mission creator used persistent network calls.
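
As a rough illustration of cutting down publicVariable traffic, the sketch below broadcasts a value only when it has changed, rather than on every loop pass. The names TAG_score and TAG_lastSentScore and the 5-second interval are made up for the example and are not taken from any mission in this thread.

```sqf
// Hypothetical example: TAG_score stands in for any value the mission shares with clients.
if (isServer) then {
    TAG_score = 0;
    TAG_lastSentScore = -1; // differs from TAG_score, so the first pass broadcasts once

    [] spawn {
        while { true } do {
            // Broadcast only when the value differs from the last one sent.
            if (TAG_score != TAG_lastSentScore) then {
                publicVariable "TAG_score";
                TAG_lastSentScore = TAG_score;
            };
            sleep 5; // assumed interval, tune to the mission
        };
    };
};
```

The same pattern applies to any frequently updated variable: the cost shifts from one network message per loop pass to one per actual change.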

BI is also changing the netcode from version to version, so it may well be that a previous version works better, or an RC.

Testing it on your home computer, or with another mission usually does not work well, as you would have to create a similar environment (same client number, clients connected over several hours and actually doing something).

The basic.cfg file also does not really help that much, as long as there are no completely unrealistic values in it - it has some use in (fine) tuning, but don't expect miraculous improvements.

What you can try to do:
1) Check whether the server's computational cycles are dropping when you notice the desync issues: log into the server as an administrator and start a monitor console by typing in the chat (it doesn't matter which channel): #monitor 2. You can turn it off by typing #monitor 0. If the values are above 5, that is usually still OK.

2) Test a legacy build (1.36) and an RC build (1.39) and check if the mission runs better with those.

3) Reduce the number of clients that can connect to somewhere between 30 and 40.

4) Use the start-up parameters -loadMissionToMemory and -autoInit to reduce server load.

Re: Need Help with Server Performance

Post by Paronity »

Awesome, thanks for the tips.

I have been continually debugging for days on this stuff and I have actually made a good bit of progress. I was using the netlog for a 3rd party tool to help map playerids to GUIDs on the fly. Disabling the netlog actually removed some overhead on its own.

Next, I went through the code, tweaked any looping that I could find, and disabled as many things as I could to eliminate as many objects as possible.

For the last couple of days, things have been running the full restart cycle without any issues. I was shocked to see just how much of an impact that netlog had on the overall system performance. It's astonishing.

Thanks for the monitor commands though. I will be using those in the future for debugging of these types of issues as well.

Finally, I am the coder for most of the changes in the missions. ARMA is a new beast to me and I'm still learning what is good/bad when it comes to coding standards, so it's always possible that it's my fault, since ARMA is all kinds of quirky. I think your notes will go a long way in helping me debug performance in the future though. Thanks again!

Re: Need Help with Server Performance

Post by Caliban55 »

Glad that it works better now.

When it comes to coding for Arma, this is what I would recommend as a guideline, especially for larger mission projects.

Make a plan of which functions/code should be run on:
  • the clients,
  • the server,
  • both client and server.
Then separate the code in a way that it is only executed where it is intended (if (isServer)..., if (hasInterface)...). In the long run you will do yourself a favour as this makes sure that the performance is more stable and the mission code can be adapted easily in the future. And try to avoid the above mentioned persistent network calls :D .
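
A minimal sketch of that split, assuming it lives in init.sqf. The function names TAG_fnc_serverInit and TAG_fnc_clientInit are hypothetical placeholders. The client branch uses hasInterface because, as far as I know, SQF has no isClient command.

```sqf
// Hypothetical placeholders for the mission's real server and client setup code.
TAG_fnc_serverInit = { diag_log "[INIT] server-side setup"; };
TAG_fnc_clientInit = { diag_log "[INIT] client-side setup"; };

if (isServer) then {
    // Server only: spawning, cleanup loops, database access.
    [] call TAG_fnc_serverInit;
};

if (hasInterface) then {
    // Machines with a player interface only: UI, local effects, player event handlers.
    [] spawn {
        waitUntil { !isNull player }; // wait until the player object exists
        [] call TAG_fnc_clientInit;
    };
};

// Anything outside both branches runs on every machine, so keep it minimal.
```

On a player-hosted (non-dedicated) server both conditions are true and both branches run, which is usually what you want there.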

Re: Need Help with Server Performance

Post by Paronity »

Well, the symptoms are back. The irony of it all is that NOTHING on the server has changed with the exception of perhaps the upgrade to 1.4.

I am now back to pulling my hair out over this issue. The charts look absolutely amazing (granted, the population count is low ATM).

Image


(Ignore the blip in the graph, restarted the master ASM)

Even with the pop in the 50s or 60s that chart usually looks like that. It's stable, and solid. Then out of nowhere, it will drop and the server will have a stroke and it requires a reboot to fix.

When it happens I check these snippets to see what's going on and get an idea of what is currently on the map and in the mission namespaces:

Code:

// Log every vehicle with its class name and position.
{
	_text = format["Vehicle%1",_forEachIndex];
	_type = typeOf _x;
	diag_log format["%1 (%2) | Pos: %3",_text,_type, position _x];
} forEach vehicles;

// Log every mission object (allMissionObjects can be slow on a busy map).
{
	_text = format["MissionObject%1",_forEachIndex];
	_type = typeOf _x;
	diag_log format["%1 (%2) | Pos: %3",_text,_type, position _x];
} forEach allMissionObjects "All";

// Log every entity.
{
	_text = format["Entity%1",_forEachIndex];
	_type = typeOf _x;
	diag_log format["%1 (%2) | Pos: %3",_text,_type, position _x];
} forEach (entities "All");

None of them are out of hand (never over 100-200 total at any given time, which in my experience is more than acceptable).
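
For comparing one snapshot against another, a per-class tally may be easier to read than one line per object. The following is only a sketch of that idea, meant to be run on the server (for example from the debug console). The [TALLY] prefix is an arbitrary choice for the example.

```sqf
// Count mission objects per class name and write one RPT line per class.
private ["_types", "_counts", "_idx"];
_types = [];   // class names seen so far
_counts = [];  // matching counters, same index as _types

{
    _idx = _types find (typeOf _x);
    if (_idx < 0) then {
        // First object of this class.
        _types pushBack (typeOf _x);
        _counts pushBack 1;
    } else {
        _counts set [_idx, (_counts select _idx) + 1];
    };
} forEach (allMissionObjects "All");

{
    diag_log format ["[TALLY] %1 x %2", _counts select _forEachIndex, _x];
} forEach _types;
```

A class whose count keeps climbing between two runs is a candidate for whatever the cleanup scripts are missing.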

I have several custom cleanup scripts that take care of vehicles, bodies, crates, weapons, and just about anything else you can iterate through and clean up.

I just can't find any rhyme or reason to it and it's starting to make me want to stop hosting these servers. Of all the servers I have hosted over the years, these have by far been the most cumbersome. I'm not above putting the work in and getting my "hands dirty", but I'm just hitting brick wall after brick wall.

Anybody have anything that I can do to try to narrow this down? Even a "monitoring" script that I can run that will write to RPT every now and then with some stats that I could be watching (much like ASM does, but on a greater scale). I'm truly open to ANYTHING at this point. Thanks in advance!
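
One possible shape for such a monitoring script is sketched below. It only uses stock scripting commands. The [STATS] prefix, the chosen counters, and the 60-second interval are assumptions for illustration, not a known-good recipe for this particular problem.

```sqf
// Server-side stats logger sketch: writes one line to the RPT every 60 seconds.
// Start it once on the server, e.g. from init.sqf.
if (isServer) then {
    [] spawn {
        while { true } do {
            diag_log format [
                "[STATS] uptime=%1s fps=%2 fpsMin=%3 players=%4 units=%5 dead=%6 vehicles=%7",
                round diag_tickTime,                  // seconds since the server process started
                round diag_fps,                       // average server FPS over recent frames
                round diag_fpsMin,                    // lowest server FPS over recent frames
                { isPlayer _x } count playableUnits,  // connected players in playable slots
                count allUnits,
                count allDead,
                count vehicles
            ];
            sleep 60; // assumed interval
        };
    };
};
```

Lining these lines up against the timestamps of the "Network message ... is pending" entries may show whether object counts or server FPS move before the lock-up or only after it.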

EDIT: Resized that image and made it click to open full size, had no idea the forums wouldn't auto-size it. :D

Re: Need Help with Server Performance

Post by Caliban55 »

I don't think that there is any good solution at this point. You can try to run a different stable mission and test if you notice the same behavior there, but you would have to create a similar client connection environment. And then it would only tell you that there is something in the mission causing those desync issues. You would have to go through every code line in every function and check it, which is probably just too much work for a mission someone else wrote. At this point it would be easier and better to write a mission from scratch that you are familiar with and can control.

Your best option would be to accept these desync issues (they are not uncommon in the Arma engine and only get worse in Coop missions with high client numbers) and restart the server at a fixed interval, every 4 - 5 hours for example.

Or, if it is a Coop mission, you might try to disable any AI components.