
Unrealistic performance at beginning of agent start (EC2)
11-17-2016, 07:34 AM
Post: #11
RE: Unrealistic performance at beginning of agent start (EC2)
The default security group should generally allow no access from anywhere (all ports blocked), AFAIK.

Both SpeedCurve and my own testing use separate scripts to do the scaling "manually" (starting instances before a batch of tests and terminating them when done), but my code, at least, looks just like the auto-scaling code.
11-18-2016, 03:33 PM
Post: #12
RE: Unrealistic performance at beginning of agent start (EC2)
Alright - was able to recreate the issue while logged into the agent and I'm pretty sure I know what's going on, yay!

For the record, the default security group had all ports open, but was restricted to connections from those in the same security group. Once I changed 'source' to my IP address, I could connect. So glad to say that's calmed my security concerns.

Here's a timeline of what happened after I saw the agent auto-launch due to a new test, in minutes:
0:09 - was finally able to connect via remote desktop. Immediately saw an error in the WPT driver window that said "Problem loading settings, trying again..."
0:11 - Installing Chrome
0:14 - Installing Firefox
0:18 - Installing Flash
0:20 - Installing Python and running python script
0:22 - Started first WPT test

During the first test, it was obvious the box was doing something other than just running the test.
Here's a screenshot of a blank page during the test that lasted for almost 30 seconds, with a list of processes in the background.
Next, here's a screenshot of the completed test: there is still CPU and network usage, although it is processing the test.
This screenshot shows a few minutes past the test completion, yet there is still some disk and network activity.
Finally here you can see things calm, as they should be.
I decided to run a second test after this calm activity, and you can see the normal expected resource usage of a test.

To prove that the issue happened during my spying, here's the first test, and here's the second. Same URL, minutes apart, huge performance difference.

Do you see what I see?

I see a "SoftwareUpdate.exe" from Apple and a "Setup.exe" from Google Chrome during the first test. Actually, there's also a "GoogleUpdate.exe". I wasn't able to capture them using much CPU, but the screenshots definitely show them eating up network (and something is obviously eating up disk even while the agent waits for work after the first run).
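For anyone who wants to check for this without watching the screen, here is a minimal Python sketch that flags the updater processes named above in a list of running process names. The helper name `find_updaters` and the `KNOWN_UPDATERS` set are my own for illustration, not part of WebPageTest, and "Setup.exe" is a generic name that may match unrelated installers.

```python
# Hypothetical helper, not part of wptdriver. The names come from the
# processes observed in this post; matching is case-insensitive.
KNOWN_UPDATERS = {"softwareupdate.exe", "googleupdate.exe", "setup.exe"}


def find_updaters(process_names):
    """Return the distinct process names that look like known updaters, sorted."""
    return sorted({name for name in process_names if name.lower() in KNOWN_UPDATERS})


if __name__ == "__main__":
    running = ["chrome.exe", "GoogleUpdate.exe", "wptdriver.exe", "SoftwareUpdate.exe"]
    print(find_updaters(running))
```

The list of names could come from `tasklist` output or a library such as psutil; that part is left out here so the sketch stays self-contained.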

Could it be that even though Chrome and Safari have just been installed they're still downloading (or checking for) updates?

How do you interpret this?
11-18-2016, 11:45 PM
Post: #13
RE: Unrealistic performance at beginning of agent start (EC2)
It's possible. Safari won't have any updates because the browser is ancient and hasn't been supported in years. I should be able to add logic to wptdriver to automatically kill it.

For Chrome, we install 54, but it may not be the absolute latest sub-version, so it's entirely possible that it is doing patch updates in the background. Since we install the desired Chrome build ourselves, I should be able to disable automatic updates for Chrome as well.

I'll have an updated wptdriver later today that hopefully addresses both.
11-19-2016, 05:28 AM
Post: #14
RE: Unrealistic performance at beginning of agent start (EC2)
I just released 334 which (hopefully) turns off Chrome updates for EC2 instances and kills Apple's SoftwareUpdate.exe. I need to do some testing to make sure the registry keys work as advertised.
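The post does not say which registry keys 334 sets, so the following is only a sketch of one commonly documented approach, not the wptdriver implementation. It assumes the Google Update policy values `AutoUpdateCheckPeriodMinutes` and `UpdateDefault` under `HKLM\SOFTWARE\Policies\Google\Update`, and builds the `reg add` command lines without running them. The function name `build_reg_commands` is hypothetical.

```python
# Assumption: these are Google Update policy values, chosen for illustration.
# Which keys wptdriver 334 actually writes is not stated in the thread.
POLICY_KEY = r"HKLM\SOFTWARE\Policies\Google\Update"

POLICY_VALUES = {
    "AutoUpdateCheckPeriodMinutes": 0,  # 0 = no periodic update checks
    "UpdateDefault": 0,                 # 0 = updates disabled by default
}


def build_reg_commands(key=POLICY_KEY, values=POLICY_VALUES):
    """Build one `reg add` argument list per policy value (does not execute them)."""
    return [
        ["reg", "add", key, "/v", name, "/t", "REG_DWORD", "/d", str(data), "/f"]
        for name, data in values.items()
    ]


if __name__ == "__main__":
    for command in build_reg_commands():
        print(" ".join(command))
```

Each list could be passed to `subprocess.run` from an elevated prompt on the agent. Whether the updater honors these values may depend on its version and the machine configuration, so it is worth verifying on a live instance.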
11-29-2016, 06:01 AM
Post: #15
RE: Unrealistic performance at beginning of agent start (EC2)
Hey Patrick -

I was waiting to collect some solid data, and I can tell you that it doesn't appear to have fixed the issue. I'll log in to the agents again shortly to see which update processes are still running...

Thanks (hope you had a nice thxgiving!)
11-30-2016, 12:15 AM
Post: #16
RE: Unrealistic performance at beginning of agent start (EC2)
Don't know how much it will help, but I just rolled out 336. The agent has always had logic to wait for the CPU to go idle at the start of a test (to wait for the browser to finish initializing, etc.). That logic could be skewed on multi-core systems, though, since it looked at the overall CPU utilization, so I tweaked it to adjust for the number of cores.

I also added some logic at agent startup to wait up to 10 minutes for the machine to go idle before starting. With EC2 this means the old agent on the image will install the browsers, update to the new agent, and then the new agent will start and wait for the machine to go idle.

It still won't help if the update goes idle for a while, runs for longer than 30 seconds after starting the browser or starts in the middle of a test.
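The idle wait described above can be sketched roughly as follows. This is not the wptdriver code: the 20% single-core threshold, the three consecutive idle samples, and the names `is_idle` and `wait_for_idle` are assumptions for illustration. Only the 10-minute cap and the idea of adjusting for the core count come from the post. The CPU sampler is passed in so the sketch runs without any third-party library.

```python
import time


def is_idle(overall_percent, cores, single_core_threshold=20.0):
    """Scale overall utilization by core count so one busy core is not hidden.

    On an 8-core box, one pegged core reads as 12.5% overall, which looks idle
    unless the reading is converted back to "percent of one core".
    """
    return overall_percent * cores < single_core_threshold


def wait_for_idle(sample, cores, timeout_s=600.0, interval_s=1.0,
                  needed=3, clock=time.monotonic, sleep=time.sleep):
    """Wait until `needed` consecutive samples are idle, or give up at the timeout.

    `sample` is any callable returning overall CPU utilization in percent.
    Returns True if the machine went idle, False if the timeout expired.
    """
    deadline = clock() + timeout_s
    streak = 0
    while clock() < deadline:
        if is_idle(sample(), cores):
            streak += 1
            if streak >= needed:
                return True
        else:
            streak = 0  # any busy sample resets the count
        sleep(interval_s)
    return False
```

As the post notes, a check like this cannot catch an updater that goes quiet for a while and resumes mid-test; it only delays the start until the machine looks calm.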

Separately, Chrome 55 should be coming out any day now, in which case the initial browser install will be the latest and not have any available updates (at least until a post-release update is pushed).
12-08-2016, 06:24 AM
Post: #17
RE: Unrealistic performance at beginning of agent start (EC2)
Hey Patrick -

Thanks again for your work on this. I logged in today to watch it run some tests (since even after the 336 release I didn't see any difference), and I'm seeing the agent install all the software like before, then wait for idle CPU, then it kicks me off of remote desktop (I'm assuming it's restarting?). I'll log back in, and I see WPT Driver running, waiting for idle CPU, then I'm logged off again. It's done this 5 times in a row already...

I also noticed that my WPT server hasn't completed any work in the last 36 hours even though there is stuff in the queue and agents are starting via ec2 autoscale.

Any idea what is going on?
12-08-2016, 06:48 AM
Post: #18
RE: Unrealistic performance at beginning of agent start (EC2)
Looks like something may have gone sideways with the 341 agent push: https://github.com/WPO-Foundation/webpag...issues/777

I just pushed 342 so hopefully that fixes whatever went wrong.
12-08-2016, 07:05 AM (This post was last modified: 12-08-2016 07:30 AM by ShaneLabs.)
Post: #19
RE: Unrealistic performance at beginning of agent start (EC2)
Thanks for the quick response - that fix does seem to be working now, at least the agents are staying up and running tests now.

Ok, with that fixed I logged in to watch the tests being run, and I'm still seeing GoogleUpdate.exe running while a test is running, causing this false slow performance.

Here's the test: http://wpt.machmetrics.com/result/161206...2c3f1f9f3/
Here's a screenshot showing update processes during test: http://i.imgur.com/zcnv3M8.png
And again, here's the same url minutes later showing you the problem is not the site: http://wpt.machmetrics.com/result/161207...5484f2437/

Are there any other things left to try? If not, I may either abandon autoscale and manage my own AMIs with updates off, or pad the beginning of every queue with 10 or so tests that get thrown away...
12-22-2016, 09:37 AM
Post: #20
RE: Unrealistic performance at beginning of agent start (EC2)
For those curious, I've worked around this by padding the beginning of every queue with some 'warm up' tests that I ignore. Although I hate that this is wasting resources/time, I realize this is a tricky problem to solve.
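In case it helps, here is a minimal Python sketch of the padding idea. It is an illustration under assumptions, not my production code: the job dictionaries, the `warmup` flag, and the function names `pad_queue` and `keep_real_results` are all hypothetical, and the default of 10 warm-up tests is just the rough figure mentioned earlier in the thread.

```python
def pad_queue(urls, warmup_url, warmup_count=10):
    """Put `warmup_count` throwaway jobs in front of the real test jobs."""
    warmups = [{"url": warmup_url, "warmup": True} for _ in range(warmup_count)]
    real = [{"url": url, "warmup": False} for url in urls]
    return warmups + real


def keep_real_results(results):
    """Drop anything tagged as a warm-up so it never reaches the reports."""
    return [result for result in results if not result["warmup"]]


if __name__ == "__main__":
    queue = pad_queue(["https://example.com/a", "https://example.com/b"],
                      warmup_url="https://example.com/", warmup_count=3)
    print(len(queue), [job["url"] for job in keep_real_results(queue)])
```

Tagging the jobs instead of dropping the first N results by position keeps the filter correct even if results come back out of order.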

Let me know if anyone else comes up with some better alternatives!

Thanks