Random image from the Library of Congress

Alec Jacobson

May 09, 2011


Today I whipped up a set of programs that crawl the Library of Congress site and find all image links of a certain type (JPEGs whose names end in r.jpg), plus another program that serves those images up to the web. Here's the bash one-liner that recursively grabs all the index pages in the part of the Library of Congress site that holds the public domain images. Note that it does not download the linked files (images, etc.), so it is relatively quick.
wget -N -r -H -t1 -Aindex.html -erobots=off  http://memory.loc.gov/service/pnp/
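For reference, here is the same crawl spelled with GNU wget's long option names, each flag commented. Putting the arguments in a bash array is just a convenience for printing or reusing the command. Apart from the start URL, nothing here is specific to the Library of Congress.

```shell
#!/usr/bin/env bash
# The crawl command above, spelled with GNU wget's long options.
crawl_args=(
  --timestamping        # -N: re-download a file only if the remote copy is newer
  --recursive           # -r: follow links (default depth limit is 5)
  --span-hosts          # -H: allow following links to other hosts
  --tries=1             # -t1: give up after one failed attempt
  --accept=index.html   # -A: keep only files named index.html
  -e robots=off         # -erobots=off: ignore robots.txt
  http://memory.loc.gov/service/pnp/
)
# Print the command; to actually crawl, run: wget "${crawl_args[@]}"
echo wget "${crawl_args[@]}"
```

Because -H lets the crawl follow links to other hosts, adding --domains=loc.gov (-D) may be worth it to keep it on Library of Congress servers.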
The above builds a local mirror of the site, folder structure and all. First, let's clean up by removing the index files that don't contain links to images:
grep -Lr "[^>\"]*r\.jpg<" * | xargs -I{} rm {}
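Here is a throwaway dry run of the cleanup step. It assumes the index pages link images as `<a href="...r.jpg">...r.jpg</a>`, and the folder and file names are made up. It uses `grep -Z` with `xargs -0` so that file names containing spaces or quotes reach `rm` intact.

```shell
#!/usr/bin/env bash
# Build a tiny fake mirror (hypothetical paths) and run the cleanup on it.
demo=$(mktemp -d)
pnp=$demo/memory.loc.gov/service/pnp
mkdir -p "$pnp/a" "$pnp/b"
# Hypothetical index page that links to an r.jpg image.
printf '<a href="0001r.jpg">0001r.jpg</a>\n' > "$pnp/a/index.html"
# Hypothetical index page that only links to a sub-folder.
printf '<a href="c/">c/</a>\n' > "$pnp/b/index.html"

cd "$demo" || exit 1
# -L lists files with NO match; -Z / -0 pass names NUL-separated to rm.
grep -LrZ 'r\.jpg<' memory.loc.gov | xargs -0 rm -f --
# Only .../a/index.html should be left.
find memory.loc.gov -name index.html
```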
Since the archive has the same folder structure as the website, the Unix paths are almost the URLs we need. We just need to glue each matching image link onto its parent folder's URL path. This one-liner does that and saves all the image URLs into image-links.txt:
grep -or "[^>\"]*r\.jpg<" * | sed -e 's|^|http://|' -e 's|index\.html:||' -e 's|<$||' > image-links.txt
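To show what each sed expression does, here is the sed stage applied to a single hypothetical line of `grep -or` output. `grep -r` prefixes every match with the file it came from, so stripping `index.html:` leaves the folder path followed by the image name. Using `|` as the sed delimiter avoids escaping the slashes in `http://`.

```shell
#!/usr/bin/env bash
# One hypothetical line of grep output: "<mirror path>/index.html:<match>"
sample='memory.loc.gov/service/pnp/a/index.html:0001r.jpg<'
url=$(printf '%s\n' "$sample" |
  sed -e 's|^|http://|' \
      -e 's|index\.html:||' \
      -e 's|<$||')
# prepend scheme, drop "index.html:", drop the trailing "<" left by the regex
echo "$url"   # http://memory.loc.gov/service/pnp/a/0001r.jpg
```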
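To grab a random link (a random line from image-links.txt), here's another handy one-liner. It jumps to a random byte offset in the file, reads 200 bytes from there, and prints the first complete line after that offset: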
dd if=image-links.txt skip=$(expr $(date +%N) \% $(stat -c "%s" image-links.txt)) ibs=1 count=200 2>/dev/null|sed -n '2{p;q;}'
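The seek trick has a few quirks. Each line's chance of being picked is proportional to the length of the line before it, so it is only roughly uniform when the URLs have similar lengths. An offset inside the last line produces no output at all, and a URL that doesn't fit in the 200-byte window gets truncated. The sketch below keeps the same idea but changes two things, both my own assumptions rather than the original: it uses bash's `$RANDOM` instead of `date +%N`, and it wraps around to the first line when the offset lands in the last one. It also reads to the end of the line instead of a fixed 200 bytes. It assumes GNU `stat` and `tail`, and a file with no blank lines.

```shell
#!/usr/bin/env bash
# random_line FILE: print a (roughly) random line by seeking to a random byte.
random_line() {
  local file=$1 size offset line
  size=$(stat -c %s "$file")                        # GNU stat: size in bytes
  [ "$size" -gt 0 ] || return 1                     # empty file: nothing to pick
  offset=$(( (RANDOM * 32768 + RANDOM) % size ))    # ~30 random bits
  # Start at that byte, skip the partial line, print the next full line.
  line=$(tail -c +"$(( offset + 1 ))" "$file" | sed -n '2{p;q;}')
  # The offset fell inside the last line: there is no next line, so wrap around.
  if [ -z "$line" ]; then
    line=$(head -n 1 "$file")
  fi
  printf '%s\n' "$line"
}
```

Usage: `random_line image-links.txt`. If exact uniformity matters more than speed, `shuf -n 1 image-links.txt` (GNU coreutils) is a simpler option where it is available.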
Hilary Mason lists a bunch of ways to get a random line from a file with bash. I use the last one because I have a large number of lines and it benchmarked as the fastest (also, I can't run sort -R). Since I'll be calling this from PHP, I could have used:
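function RandomLine($filename) {
    // Reads the whole file into memory: one array element per line, newlines stripped.
    $lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines[array_rand($lines)];
}
$url = RandomLine("image-links.txt");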
But the bash solution above is actually faster, so I'll call that from PHP. Here's a small PHP program that gives me a static link that always returns the contents of a random image, so I can use the URL of this PHP script in an <img src="XXX"> tag:
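<?php
// Pick a random line with the dd one-liner above; trim() drops the trailing newline.
$url = trim((string) `dd if=image-links.txt skip=$(expr $(date +%N) \% $(stat -c "%s" image-links.txt)) ibs=1 count=200 2>/dev/null|sed -n '2{p;q;}'`);
// Fetch the image from the Library of Congress.
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, $url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_handle, CURLOPT_FAILONERROR, true); // treat HTTP errors (404 etc.) as failures
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'random loc image');
$im = curl_exec($curl_handle);
curl_close($curl_handle);
// Alternative without curl (requires allow_url_fopen): $im = file_get_contents($url);
if ($url === '' || $im === false) {
    // The dd trick yields an empty line when its offset lands in the last line.
    http_response_code(502);
    exit;
}
header('Content-Type: image/jpeg');
echo $im;
?>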
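Be sure not to have any whitespace before the opening PHP tag: any output sent before header() is called stops PHP from setting the Content-Type header, and the browser won't treat the response as an image.
Demos: a page that displays a random image (without redirecting data), a random URL server, a random image file server (redirector), and an example of using the random image file server.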
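The post doesn't show the redirector's code. As a hypothetical sketch (a shell CGI script, not the author's implementation), a redirector can answer with a 302 so the browser downloads the image straight from loc.gov instead of through the server. It also swaps in a different random-line technique, awk reservoir sampling, which is exactly uniform but reads the whole file on every request. The function names `pick_uniform` and `redirect_response` are made up for illustration.

```shell
#!/usr/bin/env bash
# pick_uniform FILE: reservoir sampling; line N replaces the pick with probability 1/N,
# so every line ends up equally likely after one pass over the file.
pick_uniform() {
  awk -v seed="$RANDOM" 'BEGIN { srand(seed) } rand() * NR < 1 { pick = $0 } END { print pick }' "$1"
}

# redirect_response FILE: print CGI headers that redirect to a random URL from FILE.
redirect_response() {
  local url
  url=$(pick_uniform "$1")
  if [ -n "$url" ]; then
    printf 'Status: 302 Found\nLocation: %s\n\n' "$url"
  else
    # No links available: report an error instead of an empty redirect.
    printf 'Status: 503 Service Unavailable\nContent-Type: text/plain\n\nno image links\n'
  fi
}

# As a CGI script this would end with: redirect_response image-links.txt
```

The tradeoff versus the proxying PHP script is that the server never touches the image bytes, at the cost of exposing the loc.gov URL to the browser.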