Scrape all torrent titles from tehconnection.eu

Monday, March 24th, 2014

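Here’s a bash script that logs into <tehconnection.eu>, then scrapes every movie title from its list of available torrents, decoding HTML entities along the way. It expects an executable html_entity_decode.php helper (not shown here) in the current directory.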

#!/bin/bash

LOGIN_URL="https://tehconnection.eu/login.php"
USERNAME=myusername
PASSWORD=mypassword
# log in to website and save cookies
wget --post-data \
  "username=$USERNAME&password=$PASSWORD" \
  --save-cookies=cookies.txt --keep-session-cookies \
  -O /dev/null \
  -q \
  "$LOGIN_URL" \
  &>/dev/null

FILMS_URL="https://tehconnection.eu/torrents.php?order_by=s1&order_way=ASC"
# download the first page to determine the number of the last page
# (URLs are quoted so the shell never treats "?" as a glob character)
RES=$(wget --load-cookies=cookies.txt "$FILMS_URL" -q -O -)
LAST=$(echo "$RES" | grep -m 1 -o "page[^\/]* Last" | \
  sed -e "s/page=\([0-9][0-9]*\).*/\1/g")
#LAST=363

# fetch every page of results in turn
for p in $(seq 1 "$LAST")
do
  URL="$FILMS_URL&page=$p"
  RES=$(wget --load-cookies=cookies.txt "$URL" -q -O -)
  echo "$RES"| grep "torrent_title\"" | \
    sed -e "s/.*View Torrent\">\([^<]*\).*/\1/g" | ./html_entity_decode.php
  sleep 3
done

# get rid of cookies
rm cookies.txt
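
The two extraction patterns above are easy to get wrong, so here is a minimal sketch for checking them offline. The HTML fragments are made up for illustration and the real site markup may differ; the function names last_page, extract_titles and decode_basic_entities are mine, and the last one is only a rough stand-in for the PHP helper, covering a handful of common entities.

```shell
#!/bin/bash
# Offline check of the extraction patterns used in the scraper.
# The HTML below is a made-up fragment, not captured from the real site.

sample_pagination='<a href="torrents.php?page=2&amp;order_by=s1"><strong>Next &gt;</strong></a> | <a href="torrents.php?page=363&amp;order_by=s1"><strong> Last &gt;&gt;</strong></a>'

sample_rows='<a href="torrents.php?id=1" class="torrent_title" title="View Torrent">Harold &amp; Maude</a>
<a href="user.php?id=7" class="username">someone</a>
<a href="torrents.php?id=2" class="torrent_title" title="View Torrent">Brazil</a>'

# Print the page number of the "Last" pagination link read from stdin.
last_page() {
  grep -m 1 -o 'page[^/]* Last' | sed -e 's/page=\([0-9][0-9]*\).*/\1/'
}

# Print the link text of every torrent title row read from stdin.
extract_titles() {
  grep 'torrent_title"' | sed -e 's/.*View Torrent">\([^<]*\).*/\1/'
}

# Decode a few common HTML entities (assumed stand-in for the PHP helper).
# &amp; is decoded last so that text like "&amp;lt;" is not decoded twice.
decode_basic_entities() {
  sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g' \
      -e "s/&#039;/'/g" -e 's/&amp;/\&/g'
}

printf '%s\n' "$sample_pagination" | last_page
printf '%s\n' "$sample_rows" | extract_titles | decode_basic_entities
```

If the site changes its markup, the sample fragments are the only thing that needs updating before re-checking the patterns.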

List all movies ever made (as determined by Wikipedia)

Thursday, September 15th, 2011

Here’s a PHP script that grabs a list of all films ever made (according to en.wikipedia.org).


<?php                                                                                                    
  // Returns an array containing the title of every movie known to Wikipedia,
  // one title per element
  function all_movies()                                                                                  
  {                                                                                                      
    // List of wiki urls containing lists of movies for each "letter" of alphabet                        
    $urls = array(                                                                                       
      "http://en.wikipedia.org/wiki/List_of_films:_numbers",                                             
      "http://en.wikipedia.org/wiki/List_of_films:_A",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_B",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_C",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_D",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_E",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_F",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_G",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_H",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_I",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_J-K",                                                 
      "http://en.wikipedia.org/wiki/List_of_films:_L",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_M",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_N-O",                                                 
      "http://en.wikipedia.org/wiki/List_of_films:_P",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_Q-R",                                                 
      "http://en.wikipedia.org/wiki/List_of_films:_S",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_T",                                                   
      "http://en.wikipedia.org/wiki/List_of_films:_U-V-W",                                               
      "http://en.wikipedia.org/wiki/List_of_films:_X-Y-Z");                                              
    // collected titles
    $titles = array();                                                                                   
    // Loop over urls                                                                                    
    foreach($urls as $url)                                                                               
    {                                                                                                    
      $curl = curl_init();                                                                               
      curl_setopt($curl,CURLOPT_URL,$url);                                                               
      curl_setopt($curl,CURLOPT_RETURNTRANSFER,1);                                                       
      curl_setopt($curl,CURLOPT_TIMEOUT,2);                                                              
      $buffer = curl_exec($curl);                                                                        
      if (curl_errno($curl))                                                                             
      {                                                                                                  
        // curl_error() needs the handle to report that handle's last error
        die("An error occurred: " . curl_error($curl));
      }                                                                                                  
      preg_match_all("/<li><i>(.*)<\/i>.*/", $buffer, $matches);                                         
      foreach ($matches[1] as $title)                                                                    
      {                                                                                                  
        $title = html_entity_decode(strip_tags($title));                                                 
        array_push($titles,$title);                                                                      
      }                                                                                                  
    }                                                                                                    
    return $titles;                                                                                      
  }                                                                                                      
  if(__FILE__ == $_SERVER['SCRIPT_FILENAME'])
  {
    header ('Content-type: text/plain; charset=utf-8');
    echo implode("\n",all_movies());
  }
?>

This defines the function all_movies(). When the script is called directly, it prints every title as plain text, one per line.
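
To see what the `<li><i>…</i>` pattern actually picks up, here is a rough shell equivalent run against a made-up fragment. The sample markup and the function name extract_wiki_titles are assumptions for illustration, real Wikipedia markup may differ, and unlike the PHP version this sketch does not decode HTML entities.

```shell
#!/bin/bash
# Made-up fragment in the shape the PHP regex expects;
# real Wikipedia markup may differ.
sample_list='<li><i><a href="/wiki/Alien_(film)" title="Alien (film)">Alien</a></i> (1979)</li>
<li><a href="/wiki/Lists_of_films">Lists of films</a></li>
<li><i><a href="/wiki/Brazil_(1985_film)" title="Brazil (1985 film)">Brazil</a></i> (1985)</li>'

# Keep the italic part of each matching list item, then strip remaining tags.
# Lines without "<li><i>" (such as navigation links) are dropped.
extract_wiki_titles() {
  sed -n 's|.*<li><i>\(.*\)</i>.*|\1|p' | sed -e 's/<[^>]*>//g'
}

printf '%s\n' "$sample_list" | extract_wiki_titles
```

Only list items that open with an italic title are kept, which is why the plain navigation link in the middle of the sample is skipped.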