I have a website that is behind a content delivery network (CDN). I want to protect it from being crawled by any robots. I want all access to go through the CDN for reasons. There may be errant links to the source; I don’t care if they continue to work.
.htaccess and basic auth are perfect for this situation.
I added an .htaccess file that looks like this:
AuthType Basic
AuthName "Secure Content"
AuthUserFile /path/to/.htpasswd
require valid-user
I needed to make sure the path and the file are readable by the web server user.
Then, I added a .htpasswd entry that looks like this:
user:passwdvalue
If you don’t have access to htpasswd, the typical program used to generate the password value, this site will generate one for you.
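If you’d rather not paste a password into a website, you can generate the entry locally. This is a sketch assuming the `openssl` CLI is available; `user` and `passwdvalue` are the placeholder credentials from the example above.

```shell
# Generate an .htpasswd line locally; -apr1 emits the Apache MD5
# format ($apr1$...) that mod_auth_basic understands.
echo "user:$(openssl passwd -apr1 passwdvalue)"
```

Append the output line to the .htpasswd file. If Apache’s htpasswd is installed, `htpasswd -nb user passwdvalue` prints the same kind of entry.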
Then, I had to configure my CDN to send the appropriate header to the origin. Use the Authorization header, and make sure to pass the username and the password. This site will generate the appropriately base64-encoded value.
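The header value is just `user:password` base64-encoded, so you can compute it yourself instead of trusting a generator site. The credentials here are the placeholder ones from the .htpasswd example:

```shell
# Base64-encode "user:password" (printf, not echo, so no trailing
# newline sneaks into the encoded value) to build the header the
# CDN should send to the origin.
token=$(printf 'user:passwdvalue' | base64)
echo "Authorization: Basic $token"
```

To check the setup, a request to the origin with that header (e.g. via curl’s `-H` flag) should succeed, while a request without it should come back 401.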
Voila. Only the CDN has access.
Now, the flaws:
- Depending on how the CDN accesses the site, it may be possible to snoop the username and password: if the CDN fetches the origin over plain HTTP, the credentials are only base64-encoded, not encrypted
- If you ever want to get the origin site over HTTP yourself, you’ll need the username/password