WebPagetest Forums
Unrealistic performance at beginning of agent start (EC2) - Printable Version


RE: Unrealistic performance at beginning of agent start (EC2) - pmeenan - 11-17-2016 07:34 AM

The default security group should generally be no access from anywhere (all ports blocked) AFAIK.

Both SpeedCurve and my own testing use separate scripts to do the scaling "manually" (starting instances before a batch of tests and killing them when done), but at least my code looks just like the auto-scaling code.

RE: Unrealistic performance at beginning of agent start (EC2) - ShaneLabs - 11-18-2016 03:33 PM

Alright - was able to recreate the issue while logged into the agent and I'm pretty sure I know what's going on, yay!

For the record, the default security group had all ports open, but only to connections from instances in the same security group. Once I changed 'source' to my IP address, I could connect. I'm glad to say that has calmed my security concerns.

Here's a timeline of what happened after I saw the agent auto-launch due to a new test, in minutes:
0:09 - was finally able to connect via remote desktop. Immediately saw an error in the WPT driver window that said "Problem loading settings, trying again..."
0:11 - Installing Chrome
0:14 - Installing Firefox
0:18 - Installing Flash
0:20 - Installing Python and running python script
0:22 - Started first WPT test

During the first test, it was obvious the box was doing something other than just running the test.
Here's a screenshot of a blank page during the test that lasted for almost 30 seconds, along with a list of the processes running in the background.
Next, here's a screenshot after the test completed: there is still CPU and network usage, although the agent is processing the test results.
This screenshot is from a few minutes after the test completed, yet there is still some disk and network activity.
Finally, here you can see things calm down, as they should.
I decided to run a second test after this calm activity, and you can see the normal expected resource usage of a test.

To prove that the issue happened during my spying, here's the first test, and here's the second. Same URL, minutes apart, huge performance difference.

Do you see what I see?

I see a "SoftwareUpdate.exe" from Apple and a "Setup.exe" from Google Chrome during the first test. Actually, there's also a "GoogleUpdate.exe". I wasn't able to capture them using too much CPU, but the screenshots definitely show them eating up network bandwidth (and something is obviously eating up disk even while the agent waits for work after the first run).

Could it be that even though Chrome and Safari have just been installed, they're still downloading (or checking for) updates?

How do you interpret this?

RE: Unrealistic performance at beginning of agent start (EC2) - pmeenan - 11-18-2016 11:45 PM

It's possible. Safari won't have any updates because the browser is ancient and hasn't been supported in years. I should be able to add logic to wptdriver to automatically kill it.

For Chrome, we install 54, but it may not be the absolute latest sub-version, so it's entirely possible that it is doing patch updates in the background. Since we install the desired Chrome build ourselves, I should be able to disable automatic updates for Chrome as well.
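The "kill the updater" half of that plan could look roughly like the Python sketch below. It is not wptdriver's actual code (the agent is a separate Windows program); the `find_updater_pids` helper and the `(pid, name)` snapshot format are hypothetical, and actually terminating the processes is left out. "Setup.exe" is deliberately not on the list because that name is too generic to kill blindly.

```python
# Hypothetical sketch: pick out background updater processes that an agent
# might terminate before a test. The process names come from this thread;
# the helper name and the (pid, name) input format are assumptions.
from typing import Iterable

# Updaters observed running during the first test on a fresh EC2 agent.
UPDATER_NAMES = {"softwareupdate.exe", "googleupdate.exe"}


def find_updater_pids(processes: Iterable[tuple[int, str]]) -> list[int]:
    """Return the PIDs whose executable name matches a known updater.

    Matching is case-insensitive because Windows file names are.
    """
    return [pid for pid, name in processes if name.lower() in UPDATER_NAMES]


if __name__ == "__main__":
    snapshot = [
        (4, "System"),
        (1200, "chrome.exe"),
        (2310, "SoftwareUpdate.exe"),
        (2788, "GoogleUpdate.exe"),
    ]
    print(find_updater_pids(snapshot))  # [2310, 2788]
```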

I'll have an updated wptdriver later today that hopefully addresses both.

RE: Unrealistic performance at beginning of agent start (EC2) - pmeenan - 11-19-2016 05:28 AM

I just released 334 which (hopefully) turns off Chrome updates for EC2 instances and kills Apple's SoftwareUpdate.exe. I need to do some testing to make sure the registry keys work as advertised.

RE: Unrealistic performance at beginning of agent start (EC2) - ShaneLabs - 11-29-2016 06:01 AM

Hey Patrick -

Was waiting to collect some solid data, and I can tell you that it doesn't appear to have fixed the issue. I'll log in to the agents again shortly to see which update processes are still running...

Thanks (hope you had a nice thxgiving!)

RE: Unrealistic performance at beginning of agent start (EC2) - pmeenan - 11-30-2016 12:15 AM

Don't know how much it will help, but I just rolled out 336. The agent has always had logic to wait for the CPU to go idle at the start of a test (to wait for the browser to finish initializing, etc.). That logic could be skewed on multi-core systems, though, since it looked at the overall CPU utilization, so I tweaked it to adjust for the number of cores.
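To see why averaging matters: a single-threaded updater that saturates one core of an 8-core instance registers as only 12.5% overall utilization. One plausible way to adjust for core count is to express load in "busy core equivalents", sketched below in Python. The function names and the 0.2-core threshold are assumptions for illustration, not the agent's real logic or settings.

```python
# Hypothetical sketch of a core-aware idle check. The threshold value and
# function names are assumptions, not the agent's real settings.


def busy_core_equivalents(overall_percent: float, num_cores: int) -> float:
    """Convert overall CPU utilization (0-100, averaged over all cores)
    into the number of fully busy cores it represents."""
    return overall_percent / 100.0 * num_cores


def is_cpu_idle(overall_percent: float, num_cores: int,
                max_busy_cores: float = 0.2) -> bool:
    """Treat the machine as idle only if less than `max_busy_cores` worth
    of CPU is in use, regardless of how many cores there are."""
    return busy_core_equivalents(overall_percent, num_cores) < max_busy_cores


if __name__ == "__main__":
    # One core pegged on an 8-core instance shows up as 12.5% overall,
    # which a fixed 20%-overall threshold would wrongly call idle.
    print(12.5 < 20.0)           # True: naive overall check says idle
    print(is_cpu_idle(12.5, 8))  # False: 1.0 busy core, not idle
```

The design choice here is that the idle threshold stays constant in absolute terms (a fraction of one core), so adding cores no longer dilutes a single busy process.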

I also added some logic at agent startup to wait up to 10 minutes for the machine to go idle before starting. With EC2, this means the old agent on the image will install the browsers and update to the new agent, and then the new agent will start and wait for the machine to go idle.
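A startup wait with a timeout, as described above, might be structured like this minimal Python sketch. The `wait_for_idle` name, the 5-second poll interval, and the injectable `clock`/`sleep` parameters (added so the loop can be tested without real waiting) are assumptions; only the 10-minute cap comes from the post.

```python
# Hypothetical sketch: block at startup until the machine is idle, giving up
# after a timeout. The sampler, poll interval, and helper name are assumptions.
import time
from typing import Callable


def wait_for_idle(is_idle: Callable[[], bool],
                  timeout_s: float = 600.0,  # "up to 10 minutes"
                  poll_s: float = 5.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `is_idle` until it returns True or `timeout_s` elapses.

    Returns True if the machine went idle, False on timeout (in which case
    the caller would start testing anyway).
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_idle():
            return True
        sleep(poll_s)
    return False
```

In practice `is_idle` would take a fresh CPU utilization sample on each call. A single idle sample is a weak signal; a more cautious variant could require several consecutive idle samples before returning, which is one reason an updater that pauses for a while can still slip through.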

It still won't help if the updater goes idle for a while, runs for longer than 30 seconds after the browser starts, or starts in the middle of a test.

Separately, Chrome 55 should be coming out any day now, in which case the initial browser install will be the latest and won't have any available updates (at least until a post-release update is pushed).

RE: Unrealistic performance at beginning of agent start (EC2) - ShaneLabs - 12-08-2016 06:24 AM

Hey Patrick -

Thanks again for your work on this. I logged in today to watch it run some tests (since even after the 336 release I didn't see any difference), and I'm seeing the agent install all the software like before, then wait for idle CPU, then it kicks me off of remote desktop (I'm assuming it's restarting?). I'll log back in, and I see WPT Driver running, waiting for idle CPU, then I'm logged off again. It's done this 5 times in a row already...

I also noticed that my WPT server hasn't completed any work in the last 36 hours even though there is stuff in the queue and agents are starting via ec2 autoscale.

Any idea what is going on?

RE: Unrealistic performance at beginning of agent start (EC2) - pmeenan - 12-08-2016 06:48 AM

Looks like something may have gone sideways with the 341 agent push: https://github.com/WPO-Foundation/webpagetest/issues/777

I just pushed 342 so hopefully that fixes whatever went wrong.

RE: Unrealistic performance at beginning of agent start (EC2) - ShaneLabs - 12-08-2016 07:05 AM

Thanks for the quick response. That fix does seem to be working; at least the agents are staying up and running tests now.

OK, with that fixed, I logged in to watch the tests being run, and I'm still seeing GoogleUpdate.exe running during a test, causing this falsely slow performance.

Here's the test: http://wpt.machmetrics.com/result/161206_MG_c8d6ef752199508df08e42c2c3f1f9f3/
Here's a screenshot showing update processes during test: http://i.imgur.com/zcnv3M8.png
And again, here's the same url minutes later showing you the problem is not the site: http://wpt.machmetrics.com/result/161207_4A_7d1e42890d00f05615a47bf5484f2437/

Are there any other things left to try? If not, I may either abandon autoscale and manage my own AMIs with updates turned off, or pad the beginning of every queue with 10 or so tests that get thrown away...

RE: Unrealistic performance at beginning of agent start (EC2) - ShaneLabs - 12-22-2016 09:37 AM

For those curious, I've worked around this by padding the beginning of every queue with some 'warm up' tests that I ignore. Although I hate that this is wasting resources/time, I realize this is a tricky problem to solve.
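The warm-up padding workaround amounts to prepending throwaway jobs to each batch and discarding their results. A minimal Python sketch follows; the `Job` type, the helper names, and the default count of 10 are hypothetical and not part of any WebPagetest API.

```python
# Hypothetical sketch of the "warm-up padding" workaround: prepend throwaway
# jobs to a batch and drop their results afterwards. Job, the helper names,
# and the default count of 10 are assumptions, not a WebPagetest API.
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Job:
    url: str
    warmup: bool = False  # True for throwaway jobs whose results are ignored


def pad_with_warmups(jobs: Iterable[Job], warmup_url: str,
                     count: int = 10) -> list[Job]:
    """Return `count` warm-up jobs followed by the real jobs, in order."""
    return [Job(warmup_url, warmup=True) for _ in range(count)] + list(jobs)


def keep_real_results(pairs: Iterable[tuple[Job, float]]) -> list[tuple[Job, float]]:
    """Drop (job, metric) pairs that belong to warm-up jobs."""
    return [(job, value) for job, value in pairs if not job.warmup]


if __name__ == "__main__":
    batch = pad_with_warmups([Job("https://example.com/")],
                             warmup_url="https://example.com/warmup", count=3)
    print(len(batch))                 # 4
    print([j.warmup for j in batch])  # [True, True, True, False]
```

One caveat with padding at the queue level: if autoscaling launches several agents at once, the warm-up jobs may be spread across them, so a new instance could still pick up a real test before it has settled.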

Let me know if anyone else comes up with some better alternatives!