By Their Robots.txt Ye Shall Know Them?
Is this a sign (a geeky one) of a more transparent White House?
An interesting catch by Jason Kottke: Almost every web site has a file called "robots.txt." It tells search engines which parts of the site to include in their indexes when they spider it, and which to leave out.
Here's the start of the robots.txt file from whitehouse.gov on Jan 19:
User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /query.html
Disallow: /omb/search
Disallow: /omb/query.html
Disallow: /expectmore/search
Disallow: /expectmore/query.html
Disallow: /results/search
Disallow: /results/query.html
Disallow: /earmarks/search
Disallow: /earmarks/query.html
Disallow: /help
Disallow: /360pics/text
Disallow: /911/911day/text
Disallow: /911/heroes/text
As Jason Kottke notes, "And it goes on like that for almost 2400 lines!" Click here to see the entire thing.
Here's the entire robots.txt file on whitehouse.gov today:
User-agent: *
Disallow: /includes/
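
For the curious, here is a rough sketch of how a well-behaved crawler might read the two files, using Python's standard urllib.robotparser module. The old rules are only a handful of the lines quoted above, not the full file, and the "ExampleBot" user agent, the allowed() helper, and the /news/ path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A few of the Jan 19 rules quoted above (an excerpt, not the full file).
OLD_RULES = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /search
Disallow: /help
Disallow: /911/heroes/text
"""

# The entire file as it stands today.
NEW_RULES = """\
User-agent: *
Disallow: /includes/
"""


def allowed(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a crawler named `agent` may fetch `path` under `rules`.

    `agent` is a hypothetical crawler name; both files apply to every
    crawler via "User-agent: *".
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # /news/ is a hypothetical path that neither rule set mentions.
    for path in ("/help", "/911/heroes/text", "/includes/header.html", "/news/"):
        print(f"{path:25} old: {allowed(OLD_RULES, path)!s:6} new: {allowed(NEW_RULES, path)}")
```

Each Disallow line is a path prefix, so the shorter the list, the more of the site a search engine is invited to index.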






