honor robots.txt
03-16-2011, 05:44 AM
Post: #3
RE: honor robots.txt
Hey Pat,

No worries, I'm the one who's confused. I thought WPT was traversing the sites on the lists. We can definitely do the spidering in httparchive.

I'll follow up with Steve. Thanks!


(03-15-2011 11:03 PM) pmeenan Wrote: Sorry, I'm a little confused because WPT doesn't spider. It only knows how to load individual pages as they are requested. I wouldn't want WPT to read robots.txt as if it were a bot, because a lot of pages would become untestable. I'd expect you'd want to put the robots.txt logic wherever the spidering is being done.

If you're talking about the project I think you are, it didn't spider either the last time I checked; it worked off a list of pages from various "top X" lists. If spidering is a new capability being added, then that's probably where the logic belongs (though there are things WPT can do to help with just a little work, for example dumping a list of links as part of the data returned about a page).
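As a rough illustration of keeping the robots.txt logic in the spidering layer, here is a minimal sketch that filters a list of links (such as one a page test might return) before they are queued for crawling. It uses Python's standard `urllib.robotparser`; the `allowed_urls` helper, the `example-spider` user agent, and the example.com URLs are assumptions made for this example and are not part of WPT or httparchive.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that robots_txt permits user_agent to fetch.

    Hypothetical helper: a real spider would download robots.txt once per
    host and cache it, rather than receive the text as an argument.
    """
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Made-up robots.txt content and link list, for illustration only.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

links = [
    "https://example.com/",
    "https://example.com/private/report.html",
    "https://example.com/about.html",
]

print(allowed_urls(ROBOTS, "example-spider", links))
```

With this split, the page tester stays unaware of robots.txt and can still load any individual page on request, while the spider alone decides which discovered links it is allowed to follow.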



Messages In This Thread
honor robots.txt - jared - 03-15-2011, 01:56 PM
RE: honor robots.txt - pmeenan - 03-15-2011, 11:03 PM
RE: honor robots.txt - jared - 03-16-2011, 05:44 AM
