I remember writing a spidering program to verify URL correctness about six years ago. I used LWP, wrote threads, and all kinds of good stuff. It marked me. It used to be that whenever I wanted to grab a chunk of HTML from a server, I scratched out a 30-line Perl script. Now I have an alternative. wget (or should it be GNU Wget?) is a fantastic way to spider sites. In fact, I just grabbed all the MP3s available here with this command:

wget -r -w 5 --random-wait http://www.turtleserviceslimited.org/jukebox.htm

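The wait options are in there because I didn’t want to overwhelm their servers or get locked out due to repeated, obviously nonhuman resource requests: -w 5 asks for a five-second pause between requests, and --random-wait varies that pause so the timing looks less mechanical. It’s a pretty cool little tool that can do a lot, as you can see from the options list.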
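
For anyone who wants to adapt the command, here is a rough sketch of the same idea with the flags spelled out in long form. The --accept=mp3 and --no-parent options are my own additions, not part of the command I actually ran; they assume you only want the .mp3 files and don't want wget wandering above the starting directory. The spider_cmd function is just a hypothetical helper that prints the command so you can review it before running anything.

```shell
#!/bin/sh
# Long-form version of the wget invocation above, plus two assumed extras.
#   --recursive    (-r)    follow links found in the fetched pages
#   --wait=5       (-w 5)  base delay of 5 seconds between requests
#   --random-wait          vary each delay between 0.5x and 1.5x the base
#   --accept=mp3   (-A)    ASSUMPTION: keep only files ending in .mp3
#   --no-parent    (-np)   ASSUMPTION: never ascend above the start directory
spider_cmd() {
    # Hypothetical helper: prints the command instead of running it.
    printf 'wget --recursive --wait=5 --random-wait --accept=mp3 --no-parent %s\n' "$1"
}

# Dry run: show the command for the page used in this post.
spider_cmd "http://www.turtleserviceslimited.org/jukebox.htm"
```

If the printed command looks right, piping the output to sh (spider_cmd URL | sh) would run it. Printing first is a cheap way to avoid pointing a recursive download at the wrong site.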


© Moore Consulting, 2003-2017