Scrape all torrent titles from tehconnection.eu

Alec Jacobson

March 24, 2014


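Here's a bash script that logs into <tehconnection.eu> and then scrapes all movie titles from their list of available torrents. Each title is piped through a helper script, ./html_entity_decode.php (expected in the working directory), which decodes HTML entities such as &amp;amp; back into plain symbols.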

#!/bin/bash

LOGIN_URL="https://tehconnection.eu/login.php"
USERNAME=myusername
PASSWORD=mypassword
# log in to the website and save the session cookies
wget --post-data \
  "username=$USERNAME&password=$PASSWORD" \
  --save-cookies=cookies.txt --keep-session-cookies \
  -O /dev/null \
  -q \
  "$LOGIN_URL" \
  &>/dev/null

FILMS_URL="https://tehconnection.eu/torrents.php?order_by=s1&order_way=ASC"
## download the first page to determine the number of the last page
RES=$(wget --load-cookies=cookies.txt "$FILMS_URL" -q -O -)
LAST=$(echo "$RES" | grep -m 1 -o "page[^\/]* Last" | \
  sed -e "s/page=\([0-9][0-9]*\).*/\1/g")
#LAST=363

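for p in $(seq 1 $LAST);
do
  URL="$FILMS_URL&page=$p"
  RES=$(wget --load-cookies=cookies.txt "$URL" -q -O -)
  # keep only the title links, strip the markup, then decode HTML entities
  echo "$RES" | grep "torrent_title\"" | \
    sed -e "s/.*View Torrent\">\([^<]*\).*/\1/g" | ./html_entity_decode.php
  # pause between requests to go easy on the server
  sleep 3
done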

# get rid of cookies
rm cookies.txt
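
The grep/sed patterns above can be tried offline before pointing the script at the site. Here's a minimal sketch of that: the two HTML lines in sample_html are made up to resemble what the patterns expect (they are not copied from the real pages), and last_page, extract_titles and decode_entities are names I'm introducing just for this example. decode_entities is only a rough sed stand-in that handles a handful of common entities; it is not the ./html_entity_decode.php helper used in the script.

```shell
#!/bin/bash

# Hypothetical sample markup, invented for this test (not real site output)
sample_html() {
  cat <<'EOF'
<a href="torrents.php?id=1" class="torrent_title" title="View Torrent">Harold &amp; Maude</a>
<a href="torrents.php?page=363&amp;order_by=s1"><strong> Last &gt;&gt;</strong></a>
EOF
}

# Same patterns as the script: find the "Last" link and pull out its page number
last_page() {
  grep -m 1 -o "page[^\/]* Last" | sed -e "s/page=\([0-9][0-9]*\).*/\1/g"
}

# Same patterns as the script: keep title links and strip the surrounding markup
extract_titles() {
  grep 'torrent_title"' | sed -e 's/.*View Torrent">\([^<]*\).*/\1/g'
}

# Rough stand-in for the PHP helper: decodes only a few common entities.
# &amp; is handled last so that "&amp;lt;" does not get decoded twice.
decode_entities() {
  sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g' \
      -e "s/&#0*39;/'/g" -e 's/&amp;/\&/g'
}

sample_html | last_page                          # prints 363
sample_html | extract_titles | decode_entities   # prints Harold & Maude
```

If the site's markup changes, these small functions make it quick to see which pattern stopped matching, without logging in or waiting on the 3 second sleep.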