Unrealistic performance at beginning of agent start (EC2)
11-10-2016, 03:29 AM
Post: #1
Hey Patrick -

I'm using the EC2 autoscale feature to keep costs down (since I only run tests a few hours a day), and I've noticed an unwelcome trend - it seems that the first few tests that run on any agent after it boots are extremely inaccurate. Load times can be as much as 4x slower, and if you look at the waterfall it appears the browser simply halts loading the page for several seconds.

Here's an example of the same URL, run first at the beginning of the queue and again at the end of the queue (only a few minutes apart, with all settings and content the same). I have confirmed this is consistent behavior: it has happened every day for weeks since I set it up, and on different URLs too. If you want more examples, I have plenty.

I'm guessing that the agent is doing something in the background after it starts up the browser. Could it be that the EC2 images are so old that they're churning trying to update themselves to the latest code and browser?

Any suggestions other than keeping the agents always running, or throwing away the first X minutes of testing?

Thanks so much,
Shane
11-12-2016, 09:08 AM
Post: #2
What size instances?

The OS should not auto-update and the browsers are installed when the instance starts up (and testing doesn't start until they have finished installing).

That said, it's clear that SOMETHING is eating the CPU time. Any chance you can enable tcpdump capture? That might help identify whether Chrome is downloading something it shouldn't be.
11-12-2016, 09:17 AM
Post: #3
Thanks Patrick -

I'm using c4.large instances (I'm the nerd who ran a bunch of tests on instance sizes, so I'm sure we have enough power here ;) )

By 'enabling tcpdump capture', do you mean logging into the machine and monitoring Wireshark while it runs? Or is there an easier setting I'm not aware of?
11-12-2016, 09:32 AM
Post: #4
There's a per-test setting: on the Advanced tab of the advanced settings, in the middle, check "Capture network packet trace (tcpdump)".
11-12-2016, 09:33 AM
Post: #5
The tcpdump will show up as a download to the left of the waterfall.
11-12-2016, 09:38 AM
Post: #6
Awesome, glad I checked. Updated my API calls to enable that and will post back in a few after we have some data. Thanks, and have a great weekend!
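For anyone scripting the same change, here is a minimal sketch of such an API call. It assumes the standard WebPageTest `runtest.php` endpoint with its `url`, `k` (API key), `f` (response format) and `tcpdump` parameters; the server host, API key and location value are placeholders, and `build_runtest_url` is a hypothetical helper that only builds the request URL rather than sending it.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def build_runtest_url(server: str, test_url: str, api_key: str,
                      location: Optional[str] = None) -> str:
    """Build a runtest.php request URL with packet capture enabled (hypothetical helper)."""
    params = {
        "url": test_url,   # page to test
        "k": api_key,      # API key (placeholder)
        "f": "json",       # ask for a JSON response
        "tcpdump": 1,      # capture a network packet trace for each run
    }
    if location:
        params["location"] = location  # test location ID (placeholder)
    return f"{server.rstrip('/')}/runtest.php?{urlencode(params)}"


if __name__ == "__main__":
    request_url = build_runtest_url("http://wpt.example.com", "https://www.google.com/", "YOUR_API_KEY")
    print(request_url)
    print(parse_qs(urlsplit(request_url).query)["tcpdump"])  # ['1']
```

The resulting trace then appears as a download next to the waterfall, the same as when the checkbox is ticked in the UI.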
11-15-2016, 08:36 AM
Post: #7
Hey Patrick -

I have a few test runs that have the network packet trace data. Here's one for reference.

I opened it up in Wireshark and saw a huge gap in time between 1.9 and 9 seconds, where it seems to be doing nothing. Here's a screenshot; am I reading this correctly?

[Image: SISXjOI.png]
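One way to double-check a gap like this without eyeballing the Wireshark timeline is to export the packet timestamps (for example with `tshark -r capture.pcap -T fields -e frame.time_relative`, where `capture.pcap` stands in for the downloaded trace) and scan them for long silences. A minimal sketch, assuming the timestamps are seconds relative to the start of the capture; `find_idle_gaps` and the 1-second threshold are illustrative choices, and the sample values are made up rather than taken from the capture above.

```python
from typing import List, Tuple


def find_idle_gaps(timestamps: List[float], min_gap: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive packets are more than min_gap seconds apart."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > min_gap:
            gaps.append((prev, curr))
    return gaps


if __name__ == "__main__":
    sample = [0.0, 0.4, 1.2, 1.9, 9.0, 9.3]  # illustrative values only
    print(find_idle_gaps(sample))  # [(1.9, 9.0)]
```

A gap that shows up here with no packets at all points away from a background download and toward something local (CPU, DNS or connection setup stalls) holding the browser up.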

If this verifies that nothing is being downloaded in the background, what else could the agent be doing?

Thanks,
Shane
11-16-2016, 07:14 AM
Post: #8
I wonder if there is some screwy IPv6 stuff going on: https://technet.microsoft.com/en-us/libr...s.10).aspx

If you're feeling adventurous, you can connect to a launched instance (the administrator password is 2dialit), go into the interface settings and make sure IPv6 is disabled. Then create an image from the instance and see if that changes anything (and/or set the registry keys from the above article).

I finally have a dummynet replacement that works on Server 2012 R2 or later and should work in EC2 so right after Thanksgiving I should be able to build new AMIs with Server 2016 and IPv6 disabled and see if that helps.
11-16-2016, 07:15 AM
Post: #9
It's really bizarre that I don't see the issue, and neither does SpeedCurve, and we both launch and destroy hundreds of instances weekly :(

Is it always at the same point in the waterfall and always for the same page? I wonder if there is something that is only IPv6 reachable that is causing it to try to bring up a tunnel.
11-17-2016, 06:55 AM
Post: #10
I tried to log in to an agent while it was running and wasn't able to. I was using Remote Desktop Connection to the public IP address shown in the running-instance list of my EC2 account, and I verified through the getTesters.php screen that the agent was actually up, yet I still couldn't connect. It was launched in the 'default' security group, which had all ports open. (Side note: I realize this is a huge security risk; how do I specify the security group for the autoscaled instances?) What am I missing? I know I was able to connect years ago, but those instances were started manually.

It is not always at the same point within a page, and it is not always the same page, but it does seem to happen only on the first 2-3 tests run after an instance starts up. Here's a test of Google.com, which was 3rd in the queue, showing a page load of 5s, which we know isn't realistic. The waterfalls of all affected tests show periods of simple inactivity rather than waiting in a particular phase.
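To check the "only the first 2-3 tests after boot" pattern across many runs rather than by spot-checking, one could group results per agent instance and compare early and later load times. A minimal sketch, assuming each result has already been reduced to a hypothetical `(instance_id, start_time, load_time_ms)` tuple (how to extract those fields from the WebPageTest results is left open); `split_by_boot_position` and the sample numbers are illustrative only.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical record shape: (instance_id, test_start_seconds, load_time_ms)
Record = Tuple[str, float, float]


def split_by_boot_position(records: List[Record], first_n: int = 3) -> Dict[str, float]:
    """Median load time of each instance's first `first_n` tests vs. all later tests."""
    by_instance = defaultdict(list)
    for instance_id, started, load_ms in records:
        by_instance[instance_id].append((started, load_ms))

    early, later = [], []
    for runs in by_instance.values():
        runs.sort()  # order each instance's tests by start time
        for position, (_, load_ms) in enumerate(runs):
            (early if position < first_n else later).append(load_ms)

    result = {}
    if early:
        result["early_median_ms"] = median(early)
    if later:
        result["later_median_ms"] = median(later)
    return result


if __name__ == "__main__":
    sample = [
        ("a", 1, 5000.0), ("a", 2, 4800.0), ("a", 3, 4000.0), ("a", 4, 1200.0), ("a", 5, 1300.0),
        ("b", 10, 4500.0), ("b", 11, 1100.0), ("b", 12, 1000.0), ("b", 13, 1150.0),
    ]
    print(split_by_boot_position(sample))  # {'early_median_ms': 4250.0, 'later_median_ms': 1200.0}
```

Using the median rather than the mean keeps a single very slow outlier from dominating either group.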

I am equally puzzled why no one else has noticed this. I'm using an EC2 image for the wpt server, so there shouldn't be anything out of the ordinary. Are you and SpeedCurve using autoscale to launch your new weekly instances?