Log in to nytimes.com download crossword puzzle and convert to pdf script

I gave my girlfriend a subscription to the New York Times crossword puzzle, which she (graciously?) allows me to use. With the help of the decode_crossword.pl Perl script, I wrote a bash script that logs in to nytimes.com, grabs today’s puzzle in .puz format, and converts it to PDF so I can easily view and print it. Here’s the script (replace userid and password with your own):


#!/bin/bash

# use the date given as the first argument (e.g. Dec0910), else today's date
if test -z "$1" ; then
  date=$(date +%b%d%y)
else
  date="$1"
fi

#
# log in to nytimes and save cookies
wget --post-data \
  "USERID=youremail%40gmail.com&PASSWORD=yourpassword&is_continue=true" \
  --save-cookies=cookies.txt --keep-session-cookies \
  -O /dev/null \
  http://www.nytimes.com/auth/login &>/dev/null

# download puzzle
wget --load-cookies=cookies.txt \
  "http://select.nytimes.com/premium/xword/$date.puz" &>/dev/null

# get rid of cookies
rm cookies.txt

# convert to pdf
./decode_crossword -P "$date.puz" | ps2pdf - "$date.pdf" &>/dev/null

# get rid of .puz version
rm "$date.puz"
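
The `%40` in the post data above is the percent-encoded form of `@`. If your email or password contains other special characters, they need the same treatment. Here is a minimal sketch of a helper that does the encoding; `urlencode` is a hypothetical name, not part of the original script, and the sketch assumes plain ASCII input:

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): percent-encode
# a string so it can be placed in wget's --post-data.
# Assumes ASCII input; multi-byte characters are not handled.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      # letters, digits and . ~ _ - pass through unchanged
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      # everything else becomes %XX (hex value of the character)
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'youremail@gmail.com'   # prints youremail%40gmail.com
```

The encoded values could then be substituted into the `USERID=...&PASSWORD=...` string instead of being typed in by hand.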

Update: NYTimes changed their login routine so now you should use something like:


#!/bin/bash

# use the date given as the first argument (e.g. Dec0910), else today's date
if test -z "$1" ; then
  date=$(date +%b%d%y)
else
  date="$1"
fi

wget \
  --no-check-certificate \
  https://myaccount.nytimes.com/auth/login \
  -O login.html \
  &>/dev/null
 
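# pull the hidden token and expires form values out of the login page
# ([A-Za-z0-9] avoids the "Invalid range end" error some seds give for [A-z])
token=$(grep token login.html | sed -e 's/^.*value="\([A-Za-z0-9]*\)".*$/\1/')
expires=$(grep expires login.html | sed -e 's/^.*value="\([A-Za-z0-9]*\)".*$/\1/')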

# log in with the token and expires values and save cookies
wget --post-data \
  "userid=youremail%40gmail.com&password=yourpassword&is_continue=false&remember=true&token=$token&expires=$expires" \
  --save-cookies=cookies.txt --keep-session-cookies \
  --no-check-certificate \
  -O /dev/null \
  https://myaccount.nytimes.com/auth/login \
  &>/dev/null

# download puzzle
wget --load-cookies=cookies.txt \
  "http://select.nytimes.com/premium/xword/$date.puz" &>/dev/null

# get rid of cookies
rm cookies.txt
# get rid of login cache
rm login.html

# convert to pdf
./decode_crossword -P "$date.puz" | ps2pdf - "$date.pdf" &>/dev/null

# get rid of .puz version
rm "$date.puz"
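
To check the token extraction on its own, without logging in, you can feed the grep/sed pipeline a sample line. This is only a sketch: the `sample` markup and its value are made up for illustration, the real login page may look different, and `extract_value` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical hidden form field; the real login page's markup is an
# assumption here and may differ.
sample='<input type="hidden" name="token" value="3f9a0c1e" />'

extract_value() {
  # Print whatever sits between value=" and the next double quote.
  # Single quotes keep the shell from touching the backslashes.
  sed -e 's/^.*value="\([A-Za-z0-9]*\)".*$/\1/'
}

token=$(printf '%s\n' "$sample" | grep token | extract_value)
echo "token=$token"   # prints token=3f9a0c1e
```

If the echoed value is empty or still contains the whole HTML line, the sed expression did not match and the login request will be sent with a bad token.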


11 Responses to “Log in to nytimes.com download crossword puzzle and convert to pdf script”

  1. rod says:

    Hey thanks I’ve been using this for a long time and it is GREAT!

    Now, as of Dec 6, it is no longer working. I get a download of the HTML with a very, very long filename. Any thoughts?

    Typical filename:
    index.html?URI=http:%2F%2Fselect.nytimes.com%2Fpremium%2Fxword%2FDec0910.puz&OQ=_rQ3D1&OP=3e23905fQ2FQ2Apc1Q2AW4Kir44xCQ2ABrcYQ5EQ25YQ2Akp4rWQ2AvcKQ7BQ3CbQ7BwBQ257

  2. ajx says:

    Yes I noticed that it stopped working, too :-(. NYtimes changed how the login is handled so wget was no longer saving valid cookies. I have fixed the script. Now you should use something like the new version above.

  3. Rod says:

    Hey thanks! I was working on it also when I saw the new post. Looks like good work. Did work on the first try.
    sed reports:
    >grep token login.html | sed -e "s/^.*value=\"\([A-z0-9]*\)\".*$/\\1/g"
    sed: -e expression #1, char 34: Invalid range end

    >grep token login.html

    It is too late for me to figure out what the global sub in sed is trying to do.
    I’ll look more closely in the morn.
    Thanks much though!!

  4. Rod says:

    I had a typo: it did NOT work. sed has a problem.

  5. ajx says:

    Hmmm. It’s still working on mine. Perhaps we have different seds? All that line does is strip out the number that comes after value= on the line that contains the word token. Maybe it’s not too hard to fix yourself? If you’re not used to grep/sed, let me know and we can try to find out what’s up. In any case, let me know if you find a fix :-)

  6. Rod says:

    I settled on
    sed -e 's/^.*value="\([0-9a-f]\+\)".*$/\1/'

    Thank you very much!

  7. Fred says:

    I am having the same problem as Rod. I am just going through the 3 steps manually, without a script. I am wondering if my cookie file came out OK. This is what it looks like:

    # HTTP cookie file.
    # Generated by Wget on 2011-02-07 22:28:58.
    # Edit at your own risk.

    I manually retrieved the token and expires values and typed them in for the 2nd wget.

    Thanks,

  8. ajx says:

    @Fred, are you sure you’re changing the wget line to have your correct email and password? Certain characters have to be percent-encoded, as in a URL (the @ in the email address becomes %40, for example). I get a cookies file like yours if I don’t have the correct username or password.
    -A

  9. Fred says:

    ajx,
    Thanks for the insight – Can you believe I am still working on this? I think I may have tracked down my problem. I wasn’t able to get the sed working, so I had to edit the line to get the right value:
    token= grep token login.html | sed -e s/^.*value=\\\"\\\([a-f0-9]*\\\).*$/\\1/

    I think I may still have a problem. If I take out “&token=$token&expires=$expires” and instead type in the token and expires values, then everything works fine. I inserted echo $token and echo $expires into the script just to see that the values were being retrieved correctly, and they do seem to be.

    Do I need quotes around these values or something to make this work? Based on the sed line, I wonder if something is weird in my bash interpreter. I am using Ubuntu 10.

    Thanks – I think I am close.

  10. ajx says:

    It could be that your Unix tools are slightly different, but it seems like something else is up. If you have the script echo the token and expires values like you say, and then run the wget line with those values typed in manually, does it work? It’s important that you use the values that are echoed.

    I don’t think you should need quotes, since the variables are already used in quotes. Maybe try leaving out the -O /dev/null and &>/dev/null so you can see better what’s going wrong.

  11. Fred says:

    ajx,
    Thanks again for staying with me. I had already tried what you suggested, and manually it seems to work. I finally got things to work, and it was in fact a quoting issue. It turns out I wasn’t echoing the token and expires, but instead I was seeing the output from the token=’grep… So somehow my mis-quoting was preventing the token value from being assigned.

    This is the final line that made it all work for me:

    token=`grep token login.html | sed -e "s/^.*value=\"\([a-f0-9]*\).*$/\1/"`

    Thanks for your help and the great original work!!
