Posts Tagged ‘php’

Scrape google search by image (query by example) results

Sunday, August 18th, 2013

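Here’s a php script that takes a path to a directory of jpgs as its argument and, for each image, saves the first “Visually similar” result from Google’s search by image. The script uploads each image to a web server over SSH (using php’s ssh2 functions) so that Google can fetch it by URL, so change the host, user name and paths to your own. First save this list of common user agents in a file called common-user-agents.php: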


<?php
// http://searchnewscentral.com/20110928186/General-SEO/how-to-scrape-search-engines-without-pissing-them-off.html
$COMMON_USER_AGENTS = array(
  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36",
  "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (Windows NT 5.1; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36",
  "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)",
  "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B329 Safari/8536.25",
  "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B329 Safari/8536.25",
  "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36",
  "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
  "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_4 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B350 Safari/8536.25",
  "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.59.8 (KHTML, like Gecko) Version/5.1.9 Safari/534.59.8",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
  "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/28.0.1500.71 Chrome/28.0.1500.71 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36",
  "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0",
  "Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20100101 Firefox/17.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0",
  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36",
  "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36",
  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
  "Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3",
  "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.28.10 (KHTML, like Gecko) Version/6.0.3 Safari/536.28.10",
  "Mozilla/5.0 (Windows NT 6.0; rv:22.0) Gecko/20100101 Firefox/22.0",
  "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
  "Mozilla/5.0 (iPod; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B329 Safari/8536.25",
  "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)",
  "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0",
  "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/28.0.1500.52 Chrome/28.0.1500.52 Safari/537.36",
  "Opera/9.80 (Windows NT 6.1; WOW64) Presto/2.12.388 Version/12.16",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.32 Safari/537.36",
  "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0",
  "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0",
  "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0",
  "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:23.0) Gecko/20100101 Firefox/23.0",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (Windows NT 5.1; rv:21.0) Gecko/20100101 Firefox/21.0",
  "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31");
?>

Then save this in a file called scrape-sbi.php:


<?php

include "common-user-agents.php";

// http://stackoverflow.com/a/10635186/148668
function fetch_google(
  $terms="sample search",
  $numpages=1,
  $user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')  
{
    $searched="";
    // Why was this $i<=$numpages ?
    for($i=0;$i<$numpages;$i++)
    {
        $ch = curl_init();
        $url="http://www.google.com/searchbyimage?hl=en&image_url=".urlencode($terms);
        echo "$url\n";
        curl_setopt ($ch, CURLOPT_URL, $url);
        curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt ($ch, CURLOPT_HEADER, 0);
        curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
        curl_setopt ($ch,CURLOPT_CONNECTTIMEOUT,120);
        curl_setopt ($ch,CURLOPT_TIMEOUT,120);
        curl_setopt ($ch,CURLOPT_MAXREDIRS,10);
        curl_setopt ($ch,CURLOPT_COOKIEFILE,"cookie.txt");
        curl_setopt ($ch,CURLOPT_COOKIEJAR,"cookie.txt");
        $searched=$searched.curl_exec ($ch);
        curl_close ($ch);
    }

    return $searched;
}

if(sizeof($argv)<2)
{
  die("Usage:\n  scrape-sbi.php path/to/image/dir/\n");
}

$dir = $argv[1];
//  Create output directories
if (!file_exists('match')) {
  mkdir('match', 0777, true);
}
if (!file_exists('match/html')) {
  mkdir('match/html', 0777, true);
}
if (!file_exists('match/images')) {
  mkdir('match/images', 0777, true);
}
$connection = ssh2_connect('alecjacobson.com', 22,array('hostkey'=>'ssh-rsa'));
if(!ssh2_auth_pubkey_file($connection, 'alecjaco',
    '~/.ssh/id_rsa.pub',
    '~/.ssh/id_rsa', 'secret'))
{
  die("Error: SSH authentication failed.");
}
foreach(glob("$dir/*.jpg") as $file)
{
  echo $file."\n";
  if(sizeof(glob("match/images/".pathinfo("$file",PATHINFO_FILENAME).".*"))>0)
  {
    echo "  skipped.\n";
    continue;
  }
  $scp_path = "public_html/drop/closest/".
    uniqid().".jpg";
    //pathinfo($file, PATHINFO_BASENAME);
  if(!ssh2_scp_send($connection,$file,$scp_path,0777))
  {
    die("Error: SSH send failed.");
  }
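  // Map the upload path to its public URL (anchored, and consuming the
  // trailing slash so the URL does not end up with "//")
  $up_url = preg_replace("/^public_html\//","http://alecjacobson.com/",
    $scp_path);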
  echo $up_url."\n";
  $agent = $COMMON_USER_AGENTS[array_rand($COMMON_USER_AGENTS)];
  $res = "";
  $res = fetch_google($up_url,1,$agent);
  //$res =
  //  file_get_contents("test.html");
  // $res is a string, so test its length (sizeof() is for arrays)
  if(strlen($res)==0)
  {
    die("Error: google returned nothing.");
  }
  $matches = array();
  preg_match("/Visually similar.*?imgurl=([^&]*)&/",$res,$matches);
  if(sizeof($matches) < 2)
  {
    // Second type of results ( I think this is because of iPhone user agent)
    preg_match(
      '/Visually similar.*?data-largeimageurl="([^"]*)"/',$res,$matches);
    if(sizeof($matches) < 2)
    {
      file_put_contents("no-match.html",$res);
      // no match found: no visually similar image found?
      print_r($res);
      die("Error: No match found (check no-match.html).");
    }
    // http:\/\/... --> http://
    $matches[1] = trim(str_replace("\\/","/",$matches[1]));
  }
  file_put_contents(
    'match/html/'.
    pathinfo($file, PATHINFO_FILENAME).".html",
    $res);
  $match_url = $matches[1];
  echo "$match_url\n";
  $img = 'match/images/'.
    pathinfo($file, PATHINFO_FILENAME).".".
    pathinfo($match_url, PATHINFO_EXTENSION);
  echo "$img\n";
  file_put_contents($img, file_get_contents($match_url));
  // http://searchnewscentral.com/20110928186/General-SEO/how-to-scrape-search-engines-without-pissing-them-off.html
  sleep(rand(15,60));
} 
return 0;

?>

Then call it with something like:

php scrape-sbi.php src/

where src is full of jpgs. This will create match/images/ and fill it with the first “Visually similar” result of each search by image; the raw result pages are saved in match/html/. Source 1 Source 2
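
The matching step of the script can be pulled out and tried on its own. The following is only a sketch: the function name first_similar_image_url() is made up here, the two patterns are the ones used in scrape-sbi.php, and the sample markup is invented for illustration (real result pages differ and change over time).

```php
<?php
// Hypothetical helper (not part of the original script): returns the first
// "Visually similar" image URL found in a results page, or null if none.
function first_similar_image_url($html)
{
    // First kind of results: the URL appears as an imgurl= query parameter.
    if (preg_match('/Visually similar.*?imgurl=([^&]*)&/', $html, $m)) {
        return $m[1];
    }
    // Second kind of results: a data-largeimageurl attribute whose slashes
    // are escaped as \/ and need to be unescaped.
    if (preg_match('/Visually similar.*?data-largeimageurl="([^"]*)"/', $html, $m)) {
        return trim(str_replace('\\/', '/', $m[1]));
    }
    return null;
}

// Invented fragment for illustration only.
$sample = 'Visually similar <a href="/imgres?imgurl=http://example.com/a.jpg&imgrefurl=x">';
echo first_similar_image_url($sample), "\n";
```

Keeping the parsing in one function like this makes it easier to test against a saved page such as no-match.html without hitting Google again.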

Flushing php output on Safari with php scripts (hosted by Bluehost)

Thursday, June 13th, 2013

There are many, many posts about getting php to flush its output while a script is executing. No one solution worked for me, but I finally found a combination of things that did.

First of all my default php.ini shows these two relevant lines:


output_buffering = On
zlib.output_compression = Off

Then I put an .htaccess file in the directory containing the script whose output I’d like flushed, with the following line:


SetEnv no-gzip dont-vary

Finally here was my small test php file:


<?php  // not in table tags for IE                         
    @ini_set('zlib.output_compression', 0);
    @ini_set('implicit_flush', 1);
    //for ($i = 0; $i < ob_get_level(); $i++) { ob_end_flush(); }
    ob_end_flush();
    ob_implicit_flush(1);
?>
<html>
<body>
<?php
echo str_pad('',1024);  // minimum start for Safari
for ($i=10; $i>0; $i--) {
  echo str_pad("$i<br>\n",8);
  // tag after text for Safari & Firefox
  // 8 char minimum for Firefox
  usleep(100000);
}
?>
</body>
</html>
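
The test file calls ob_end_flush() once, which is enough when only one output buffer is active. If several are nested, they all need closing. The commented-out for loop compares a growing counter against a shrinking ob_get_level(), so it would stop early. Here is a rough sketch of a while-loop variant; the helper name flush_all_buffers() is made up, and I have only reasoned about it for the command line and a plain php.ini:

```php
<?php
// Hypothetical helper: close every active output buffer, sending its
// contents onward, then turn on implicit flushing.
function flush_all_buffers()
{
    // ob_get_level() drops by one on each pass, so loop until it hits zero
    // instead of counting up against the changing level.
    while (ob_get_level() > 0) {
        if (!@ob_end_flush()) {
            break; // a buffer that cannot be removed: stop rather than loop forever
        }
    }
    ob_implicit_flush(true);
}

ob_start();
ob_start();               // two nested buffers
echo "buffered\n";
flush_all_buffers();
echo ob_get_level(), "\n"; // number of buffers still active
```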

High resolution images from rijksmuseum

Monday, June 3rd, 2013

Here’s a php script to download and stitch together high resolution images from the Rijksmuseum (it calls ImageMagick’s montage command to do the stitching):


<?php

# http://www.php.net/manual/en/function.json-decode.php#107107
function prepareJSON($input) {
    
    //This will convert ASCII/ISO-8859-1 to UTF-8.
    //Be careful with the third parameter (encoding detect list), because
    //if set wrong, some input encodings will get garbled (including UTF-8!)
    $input = mb_convert_encoding($input, 'UTF-8', 'ASCII,UTF-8,ISO-8859-1');
    
    //Remove UTF-8 BOM if present, json_decode() does not like it.
    if(substr($input, 0, 3) == pack("CCC", 0xEF, 0xBB, 0xBF)) $input = substr($input, 3);
    
    return $input;
}

$url = $argv[1];
$url = preg_replace("/^https/","http",$url);

echo "Getting title...";
if(preg_match("/\/en\/collection\//",$url))
{
  $contents = file_get_contents($url);
  preg_match('/objectNumber : "([^"]*)"/',$contents,$matches);
  $id = $matches[1];
  preg_match('/objectTitle : "([^"]*)"/',$contents,$matches);
  $title = $matches[1];
}else
{
  $offset = preg_replace("/^.*,([0-9]*)$/","\\1",$url);
  # extract id
  $id = preg_replace("/^.*\//","",$url);
  $id = preg_replace("/,.*$/","",$id);
  #$id="SK-A-147";
  $title_url = preg_replace("/search\/objecten\?/",
    "api/search/browse/items?offset=".$offset."&count=1&",$url);
  $title_url = preg_replace("/#\//", "&objectNumber=",$title_url);
  $title_url = preg_replace("/,[0-9]*$/", "",$title_url);
  $contents = file_get_contents($title_url);
  #$contents = file_get_contents("objecten.js");
  $items = json_decode(prepareJSON($contents), true);
  $title = $items["setItems"][0]["ObjectTitle"];
  $title = preg_replace("/^.*f.principalMaker.sort=([^#]*)#.*$/","\\1",$url).
    "-".$title;
}
# $title was set by whichever branch ran above; clean it up for use as a filename
$title = html_entity_decode($title, ENT_COMPAT, 'utf-8');
$title = iconv("utf-8","ascii//TRANSLIT",$title);
$title = preg_replace("/[^A-Za-z0-9]+/","-",$title);
$final = strtolower($title);
echo "\n";

echo "Getting images...";
$contents = file_get_contents(
  "http://q42imageserver.appspot.com/api/getTilesInfo?object_id=".$id);
#$contents = file_get_contents("levels.js");


$levels = json_decode(prepareJSON($contents), true);
$levels = $levels["levels"];

$list="";
foreach( $levels as $level)
{
  if($level["name"] == "z0")
  {
    $tiles = $level["tiles"];
    // Obtain lists of the column (x) and row (y) index of each tile
    foreach ($tiles as $key => $row) {
      $xs[$key]  = $row['x'];
      $ys[$key] =  $row['y'];
    }

    // Sort the tiles by row (y), then by column (x), both ascending.
    // Passing $tiles as the last parameter reorders it by the same keys
    array_multisort($ys, SORT_ASC, $xs, SORT_ASC, $tiles);

    $tile_x = 0;
    $tile_y = 0;
    foreach( $tiles as $tile)
    {
      $x = $tile["x"];
      $y = $tile["y"];
      $tile_x = max($tile_x,intval($x)+1);
      $tile_y = max($tile_y,intval($y)+1);
      $img = "z0-$x-$y.jpg";
      $url = $tile["url"];
      echo "(".$x.",".$y.") ";
      file_put_contents($img, file_get_contents($url));
      $list .= " ".$img;
    }
    break;
  }
}
echo "\n";
echo "Composing images...";
`montage $list -tile {$tile_x}x{$tile_y} -geometry +0+0 -quality 100 $final.jpg`;
echo "\n";
echo $final.".jpg\n";

echo "Clean up...";
`rm -f $list`;
echo "\n";
?>

Then you can call the script from the command line with something like:


php rijksmuseum.php "https://www.rijksmuseum.nl/en/collection/NG-2011-6-24"

Buried inside of that script is also a nice way to clean up strings for use as filenames:


$title = html_entity_decode($title, ENT_COMPAT, 'utf-8');
$title = iconv("utf-8","ascii//TRANSLIT",$title);
$title = preg_replace("/[^A-Za-z0-9]+/","-",$title);
$final = strtolower($title);
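
Wrapped up as a function, the clean-up might look like the sketch below. The name title_to_filename() is made up, and it differs from the lines above in three ways: it uses the stricter character class A-Za-z0-9, it trims dashes at either end, and it falls back to the untransliterated string where iconv’s //TRANSLIT is not supported.

```php
<?php
// Hypothetical wrapper around the clean-up steps; assumes the iconv
// extension is available (it is enabled by default).
function title_to_filename($raw)
{
    $title = html_entity_decode($raw, ENT_COMPAT, 'utf-8');
    // //TRANSLIT support varies between iconv implementations, so keep the
    // original string if the conversion fails.
    $ascii = @iconv('utf-8', 'ascii//TRANSLIT', $title);
    if ($ascii === false) {
        $ascii = $title;
    }
    // Collapse every run of non-alphanumeric characters into one dash.
    $ascii = preg_replace('/[^A-Za-z0-9]+/', '-', $ascii);
    // Extra step, not in the original: drop leading and trailing dashes.
    return strtolower(trim($ascii, '-'));
}

echo title_to_filename('The Night Watch &amp; other works (1642)'), "\n";
// prints: the-night-watch-other-works-1642
```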

DOM Exception in XMLHttpRequest

Thursday, April 18th, 2013

I was getting this error when trying to set up an XMLHttpRequest using javascript:


INVALID_STATE_ERR: DOM Exception 11: An attempt was made to use an object that is not, or is no longer, usable

The problem was that I was calling req.setRequestHeader(...) before calling req.open(...). Reversing the order fixed the problem.

Compile and run mesa on bluehost web server

Sunday, October 7th, 2012

I want to use the off-screen renderer of Mesa in a php script on my Bluehost-served website. Compiling Mesa on my mac was dead simple (sudo port install mesa), but doing it on the linux server without root access or repositories was a bit tricky. Here’s how I finally got it to work.

Download and compile llvm, if it’s not around already. I found that version 3.1 didn’t play nicely with Mesa but 3.0 did. LLVM installed smoothly.


./configure --prefix=[INSTALL_PREFIX]
make -j5
make install

Next, grab the latest glproto headers. As far as I can tell, there is nothing to compile as only headers are needed.

Download mesa, unzip and compile using the following:


# Set up glproto headers
export GLPROTO_LIBS=../glproto-1.4.16/;
export GLPROTO_CFLAGS=../glproto-1.4.16/;
# configure, disabling DRI support (i.e. graphics card support)
./configure --prefix=[INSTALL_PREFIX] --disable-driglx-direct --enable-xlib-glx --enable-osmesa --disable-dri
make -j5
make install

Then I got the Mesa demos and made sure I could compile src/osdemos/osdemo.c:


gcc -o osdemo osdemo.c -I[INSTALL_PREFIX]/include -L[INSTALL_PREFIX]/lib -lOSMesa -lGLU

Upon running osdemo, you might see:


./osdemo: error while loading shared libraries: libOSMesa.so.8: cannot open shared object file: No such file or directory

But this is fixed by adding your library install path to the LD_LIBRARY_PATH variable:


export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:[INSTALL_PREFIX]/lib/

or in php:


putenv("LD_LIBRARY_PATH=".getenv("LD_LIBRARY_PATH").":[INSTALL_PREFIX]/lib/");

If it works then you can run the program with:


./osdemo foo.tga

and produce an image like:
[Image: output of osdemo from the Mesa demos running on the web server]

PHP cURL command line app not waiting for response

Monday, September 17th, 2012

I had written a PHP command line app that took a URL as a command line argument:


php foo.php http://www.nytimes.com/2012/09/16/arts/shock-me-if-you-can.html?ref=arts&pagewanted=all 

The foo.php was simply calling curl to grab the page in question, do a little parsing and then spit it back out. When I issued the command above though, the program seemed to finish immediately and then later spit out the results. I was left thinking that the curl commands were asynchronous.
It turned out to be much simpler. I had an unquoted ampersand (&) in the url argument, which bash treats as a control operator: it ran everything before the & as a background job, returned to the prompt immediately, and took pagewanted=all as a separate command. All I needed to do was put quotes around the argument:


php foo.php "http://www.nytimes.com/2012/09/16/arts/shock-me-if-you-can.html?ref=arts&pagewanted=all"
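
The same problem comes up if the command is built inside another php script and passed to the shell. As a small sketch (assuming foo.php is launched this way, which the setup above does not actually do), escapeshellarg() adds the quoting:

```php
<?php
// Assumption for illustration: foo.php is started from another php script.
$url = 'http://www.nytimes.com/2012/09/16/arts/shock-me-if-you-can.html?ref=arts&pagewanted=all';

// escapeshellarg() quotes the value so the shell passes it to foo.php as a
// single argument instead of treating & as a control operator.
$command = 'php foo.php ' . escapeshellarg($url);
echo $command, "\n";
```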

Setting up a local website with server-side scripts on Mac OS X

Sunday, May 13th, 2012

Try to go to http://localhost/. If you don’t find anything there then turn on System Preferences > Sharing > Web Sharing.

On my 10.7 machine the default site location is at /Library/WebServer/Documents/index.html.en. You should be able to edit this file and see that http://localhost/ changes.

Now, we’d like to have an arbitrary folder contain our website, so I’ll use ~/Documents/IGL-website. Copy your website’s source or put some source in this folder.

Now we set up the redirect necessary to send some local domain name (we’ll use localigl) to our website at ~/Documents/IGL-website. Open up the file /private/etc/hosts and add the line:


127.0.0.1       localigl

Now you should see that going to http://localigl/ also directs to /Library/WebServer/Documents/index.html.en, since we haven’t yet told apache about the new folder.

Now edit the file
/private/etc/apache2/users/YOURUSERNAME.conf
and add the following:


NameVirtualHost *:80

<Directory "/Users/YOURUSERNAME/Sites/">
    Options Indexes MultiViews Includes
    AllowOverride All
    Order allow,deny
    Allow from all
</Directory>

<VirtualHost *:80>
    ServerName localhost
    DocumentRoot /Users/YOURUSERNAME/Sites/
</VirtualHost>

<Directory "/Users/YOURUSERNAME/Documents/IGL-website/">
    Options Indexes MultiViews Includes
    AllowOverride All
    Order allow,deny
    Allow from all
</Directory>

<VirtualHost *:80>
    ServerName localigl
    DocumentRoot /Users/YOURUSERNAME/Documents/IGL-website/
</VirtualHost>

You’ll need to add executable permissions to the path preceding your website, so for my example this means:


chmod +x ~/Documents

To get php scripts working correctly, in /private/etc/apache2/httpd.conf uncomment the following line to look like this:


LoadModule php5_module libexec/apache2/libphp5.so

Now, you must restart your apache server. The easiest way is to toggle System Preferences > Sharing > Web Sharing.

With all this, you should now be able to go to http://localigl/ and see your (possibly php-based) webpage.

Php generate html select options from array

Tuesday, November 8th, 2011

Here’s a super obvious php snippet that generates a list of options for a “select” html form tag on the fly:


<select id="tf_select" name="tf_select"
onchange="update(this,document.getElementById('tf'));">
  <option value="">Select...</option>
  <?php
$a = array("New Haven","Rochester","New York","Washington D.C.","Seattle","Zurich");
foreach($a as $e)
{
  echo "<option value='".$e."'>".$e."</option>";
}
  ?>
  <option value="Other...">Other...</option>
</select>

Combined with the javascript from an earlier post, it should be very easy to generate selectors for php based forms.
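
If the values might contain quotes or other special characters, it is safer to escape them. Here is a sketch of the same idea as a function; the name select_options() and the optional $selected parameter are additions of mine, not part of the snippet above:

```php
<?php
// Hypothetical helper: build <option> tags from an array of values,
// optionally marking one of them as selected.
function select_options(array $values, $selected = null)
{
    $html = '';
    foreach ($values as $value) {
        // ENT_QUOTES also escapes single quotes, which matters because the
        // value attribute below is single-quoted.
        $safe = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        $attr = ($value === $selected) ? ' selected' : '';
        $html .= "<option value='$safe'$attr>$safe</option>\n";
    }
    return $html;
}

echo select_options(array("New Haven", "Zurich", "O'Hare"), "Zurich");
```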

List all movies ever made (as determined by wikipedia)

Thursday, September 15th, 2011

Here’s a php script that grabs a list of all films ever made (according to en.wikipedia.org):


<?php                                                                                                    
  // Returns an array containing the name of every movie (known to wikipedia),
  // one title per element
  function all_movies()                                                                                  
  {                                                                                                      
    // List of wiki urls containing lists of movies for each "letter" of alphabet                        
    $urls = array(                                                                                       
      "http://en.wikipedia.org/wiki/List_of_films:_numbers",                                             
      "http://en.wikipedia.org/wiki/List_of_films:_A",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_B",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_C",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_D",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_E",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_F",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_G",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_H",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_I",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_J-K",                                                 
      "http://en.wikipedia.org/wiki/List_of_films:_L",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_M",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_N-O",                                                 
      "http://en.wikipedia.org/wiki/List_of_films:_P",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_Q-R",                                                 
      "http://en.wikipedia.org/wiki/List_of_films:_S",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_T",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_U-V-W",                                               
      "http://en.wikipedia.org/wiki/List_of_films:_X-Y-Z");                                              
    // output array
    $titles = array();                                                                                   
    // Loop over urls                                                                                    
    foreach($urls as $url)                                                                               
    {                                                                                                    
      $curl = curl_init();                                                                               
      curl_setopt($curl,CURLOPT_URL,$url);                                                               
      curl_setopt($curl,CURLOPT_RETURNTRANSFER,1);                                                       
      curl_setopt($curl,CURLOPT_TIMEOUT,2);                                                              
      $buffer = curl_exec($curl);                                                                        
      if (curl_errno($curl))                                                                             
      {                                                                                                  
        die ("An error occurred:".curl_error($curl));
      }                                                                                                  
      preg_match_all("/<li><i>(.*)<\/i>.*/", $buffer, $matches);                                         
      foreach ($matches[1] as $title)                                                                    
      {                                                                                                  
        $title = html_entity_decode(strip_tags($title));                                                 
        array_push($titles,$title);                                                                      
      }                                                                                                  
    }                                                                                                    
    return $titles;                                                                                      
  }                                                                                                      
  if(__FILE__ == $_SERVER['SCRIPT_FILENAME'])
  {
    header ('Content-type: text/plain; charset=utf-8');
    echo implode("\n",all_movies());
  }
?>

This defines the function all_movies(), and when the file above is requested directly it prints every title on its own line as plain text.
Try it here

Related project page
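
To see what the pattern does without fetching anything, the parsing step can be run on a small string. This is only a sketch: titles_from_html() is a made-up name, and the sample is an invented fragment in the shape the pattern expects, not real wikipedia markup.

```php
<?php
// Hypothetical helper isolating the parsing step of all_movies().
function titles_from_html($html)
{
    $titles = array();
    // Same pattern as in all_movies(): an italicized entry at the start of
    // a list item. The dot does not match newlines, so each line is
    // matched separately.
    preg_match_all("/<li><i>(.*)<\/i>.*/", $html, $matches);
    foreach ($matches[1] as $title) {
        // Drop any link markup, then decode entities such as &amp;
        $titles[] = html_entity_decode(strip_tags($title));
    }
    return $titles;
}

// Invented fragment for illustration only.
$sample = "<li><i><a href=\"/wiki/Alien_(film)\">Alien</a></i> (1979)</li>\n"
        . "<li><i>Fast &amp; Furious</i> (2009)</li>\n"
        . "<li>Not a film entry</li>\n";
print_r(titles_from_html($sample));
```

The third line has no italic tags, so it is skipped, which is also how the script ignores list items that are not film entries.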

Decode/unencode/uncode html entities using php as bash one-liner

Thursday, September 15th, 2011

Here’s a very simple one-liner that takes input from standard input, one line at a time, and decodes html entities:


php -R 'echo html_entity_decode($argn)."\n";'

So this example


echo "&quot;Watson &amp; Crick&quot;" | php -R 'echo html_entity_decode($argn)."\n";'

produces:


"Watson & Crick"