
Scrape all torrent titles from tehconnection.eu

Monday, March 24th, 2014

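Here’s a bash script that logs into tehconnection.eu and then scrapes all movie titles from its list of available torrents. It also decodes HTML entities in the titles (such as &amp;) by piping them through a separate helper, html_entity_decode.php, which is not shown here.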

#!/bin/bash

LOGIN_URL="https://tehconnection.eu/login.php"
USERNAME=myusername
PASSWORD=mypassword
# log in to the website and save the session cookies
# (special characters in USERNAME/PASSWORD must be URL-encoded)
wget --post-data \
  "username=$USERNAME&password=$PASSWORD" \
  --save-cookies=cookies.txt --keep-session-cookies \
  -O /dev/null \
  -q \
  "$LOGIN_URL"

FILMS_URL="https://tehconnection.eu/torrents.php?order_by=s1&order_way=ASC"
## download the first page and read the last page number from its "Last" link
RES=$(wget --load-cookies=cookies.txt "$FILMS_URL" -q -O -)
LAST=$(echo "$RES" | grep -m 1 -o "page[^\/]* Last" | \
  sed -e "s/page=\([0-9][0-9]*\).*/\1/g")
#LAST=363

# stop if no page count was found (for example, because the login failed)
if [ -z "$LAST" ]; then
  echo "could not determine the number of pages" >&2
  rm -f cookies.txt
  exit 1
fi

for p in $(seq 1 "$LAST"); do
  URL="$FILMS_URL&page=$p"
  RES=$(wget --load-cookies=cookies.txt "$URL" -q -O -)
  # keep only the title lines, strip the markup, then decode HTML entities
  echo "$RES" | grep "torrent_title\"" | \
    sed -e "s/.*View Torrent\">\([^<]*\).*/\1/g" | ./html_entity_decode.php
  # pause between page requests
  sleep 3
done

# get rid of cookies
rm cookies.txt
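The titles above go through ./html_entity_decode.php, which isn't included in this post. If you don't have PHP around, a rough sed-only stand-in might look like the sketch below. The decode_entities name is hypothetical, and it only handles a handful of common entities (&lt;, &gt;, &quot;, &#39;/&#039; and &amp;), not the full set PHP knows about.

```shell
#!/bin/bash
# Hypothetical, minimal stand-in for ./html_entity_decode.php:
# reads titles on stdin, writes them to stdout with a few common
# HTML entities decoded. Anything else is passed through unchanged.
decode_entities() {
  sed -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&quot;/"/g' \
      -e "s/&#0*39;/'/g" \
      -e 's/&amp;/\&/g'   # &amp; last, so "&amp;lt;" ends up as "&lt;", not "<"
}

printf '%s\n' 'Fast &amp; Furious' 'Harry Potter &amp;lt;3' | decode_entities
# Fast & Furious
# Harry Potter &lt;3
```

Decoding &amp; last matters: doing it first would turn an escaped "&amp;lt;" into "&lt;" and then into "<", decoding twice. In the loop above you would replace ./html_entity_decode.php with decode_entities.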

Script to log in to nytimes.com, download the crossword puzzle and convert it to PDF

Tuesday, January 12th, 2010

I gave my girlfriend a subscription to the New York Times crossword puzzle, which she (graciously?) allows me to use. With the help of the decode_crossword.pl Perl script, I made a bash script that logs in to nytimes.com, grabs today’s puzzle in .puz format and converts it to PDF so I can easily view and print it. To get a different day’s puzzle, pass its date in the same form the puzzle URL uses (e.g. Jan1210). Here’s the script (replace the USERID and PASSWORD values with your own):


#!/bin/bash

# puzzle date in the form used by the URL, e.g. Jan1210; defaults to today
if [ -z "$1" ]; then
  date=$(date +%b%d%y)
else
  date="$1"
fi

#
# log in to nytimes and save cookies
wget --post-data \
  "USERID=youremail%40gmail.com&PASSWORD=yourpassword&is_continue=true" \
  --save-cookies=cookies.txt --keep-session-cookies \
  -O /dev/null \
  http://www.nytimes.com/auth/login &>/dev/null

# download puzzle
wget --load-cookies=cookies.txt \
  "http://select.nytimes.com/premium/xword/$date.puz" &>/dev/null

# get rid of cookies
rm cookies.txt

# convert to pdf (decode_crossword -P writes PostScript, which ps2pdf reads from stdin)
./decode_crossword -P "$date.puz" | ps2pdf - "$date.pdf" &>/dev/null

# get rid of .puz version
rm "$date.puz"
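The script’s optional argument has to be in the %b%d%y form that appears in the puzzle URL (e.g. Jan1210 for January 12, 2010). If you’d rather pass an ordinary date, a small helper like the hypothetical puzzle_name below could do the conversion. It assumes GNU coreutils date, since -d is not available in BSD/macOS date, and it forces the C locale so %b always gives the English month abbreviation the URL expects.

```shell
#!/bin/bash
# Hypothetical helper: turn any date GNU date understands into the
# %b%d%y stem used in the puzzle URL (e.g. 2010-01-12 -> Jan1210).
# Assumes GNU date (-d); with no argument it uses today's date.
puzzle_name() {
  # LC_ALL=C keeps %b as "Jan", "Feb", ... regardless of the user's locale
  LC_ALL=C date -d "${1:-today}" +%b%d%y
}

puzzle_name 2010-01-12   # Jan1210
puzzle_name yesterday    # e.g. for catching up on a missed puzzle
```

With that in place, the if/else at the top of the script could shrink to date=$(puzzle_name "$1").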

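Update: NYTimes changed its login routine. The login form now includes hidden token and expires fields that have to be posted back along with your credentials, so now you should use something like: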


#!/bin/bash

# puzzle date in the form used by the URL, e.g. Jan1210; defaults to today
if [ -z "$1" ]; then
  date=$(date +%b%d%y)
else
  date="$1"
fi

# fetch the login form, which carries the hidden token and expires fields
wget \
  --no-check-certificate \
  https://myaccount.nytimes.com/auth/login \
  -O login.html \
  &>/dev/null

# pull the alphanumeric value="..." out of the token and expires lines
token=$(grep token login.html | sed -e "s/^.*value=\"\([A-Za-z0-9]*\)\".*$/\\1/g")
expires=$(grep expires login.html | sed -e "s/^.*value=\"\([A-Za-z0-9]*\)\".*$/\\1/g")

# log in (replace the e-mail address and password with your own),
# posting the token and expires values back along with the credentials
wget --post-data \
  "userid=youremail%40gmail.com&password=yourpassword&is_continue=false&remember=true&token=$token&expires=$expires" \
  --save-cookies=cookies.txt --keep-session-cookies \
  --no-check-certificate \
  -O /dev/null \
  https://myaccount.nytimes.com/auth/login \
  &>/dev/null

# download puzzle
wget --load-cookies=cookies.txt \
  "http://select.nytimes.com/premium/xword/$date.puz" &>/dev/null

# get rid of cookies
rm cookies.txt
# get rid of login cache
rm login.html

# convert to pdf (decode_crossword -P writes PostScript, which ps2pdf reads from stdin)
./decode_crossword -P "$date.puz" | ps2pdf - "$date.pdf" &>/dev/null

# get rid of .puz version
rm "$date.puz"
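The updated login works by fetching the login form first and copying its hidden token and expires values into the POST. The grep/sed pair above grabs a value from any line that merely contains the word "token", so a slightly more targeted extraction that matches on the input’s name attribute might hold up better if the page changes. The form_value function below is hypothetical, and it assumes each <input> tag sits on a single line with name= appearing before value=; the sample form is made up just to exercise it.

```shell
#!/bin/bash
# Hypothetical helper: print the value="..." of the first tag whose
# name attribute matches $1, searching the HTML file $2.
# Assumes the tag is on one line and name= comes before value=.
form_value() {
  local name="$1" file="$2"
  grep -o "name=\"$name\"[^>]*value=\"[^\"]*\"" "$file" \
    | head -n 1 \
    | sed -e 's/.*value="\([^"]*\)"/\1/'
}

# try it on a made-up login form
sample=$(mktemp)
cat > "$sample" <<'EOF'
<form action="/auth/login" method="post">
  <input type="hidden" name="token" value="a1B2c3">
  <input type="hidden" name="expires" value="1263340800">
</form>
EOF

token=$(form_value token "$sample")
expires=$(form_value expires "$sample")
echo "token=$token expires=$expires"   # token=a1B2c3 expires=1263340800
rm -f "$sample"
```

In the script itself, the two grep/sed lines would then become token=$(form_value token login.html) and expires=$(form_value expires login.html).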