Posts Tagged ‘wget’

Download video as mp4 from somevid.com

Thursday, February 5th, 2015

Given a URL to a somevid.com video like http://somevid.com/YvEkX82q2zH9Mbo6y8VX, here's how you can rearrange it to download the video as an mp4:

wget $(echo "http://somevid.com/YvEkX82q2zH9Mbo6y8VX" | sed -e "s/com\/\(.*\)/com\/transcodes\/\1?format=1\&chunk=0/") -O output.mp4

If it's a long video, you'll probably need to increment chunk=0 to chunk=1 and so on, and then stitch the pieces back together.
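Here's a rough sketch of that, assuming the chunk parameter just keeps counting up and that the pieces are plain segments ffmpeg's concat demuxer can join (I haven't verified either of those for long videos):

#!/bin/bash
BASE="http://somevid.com/transcodes/YvEkX82q2zH9Mbo6y8VX?format=1"
i=0
# keep fetching chunks until wget reports a failure (e.g. a 404)
while wget -q "$BASE&chunk=$i" -O "chunk$i.mp4"; do
  i=$((i+1))
done
rm -f "chunk$i.mp4"   # drop the empty file left by the failed request
# list the chunks in order and stitch them together
for ((j=0; j<i; j++)); do echo "file 'chunk$j.mp4'"; done > list.txt
ffmpeg -f concat -i list.txt -c copy output.mp4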

Scrape all torrent titles from tehconnection.eu

Monday, March 24th, 2014

Here's a bash script to log in to tehconnection.eu and then scrape all movie titles from its list of available torrents, doing some minor cleanup of the symbols along the way.

#!/bin/bash

LOGIN_URL="https://tehconnection.eu/login.php"
USERNAME=myusername
PASSWORD=mypassword
# log in to website and save cookies
wget --post-data \
  "username=$USERNAME&password=$PASSWORD" \
  --save-cookies=cookies.txt --keep-session-cookies \
  -O /dev/null \
  -q \
  $LOGIN_URL \
  &>/dev/null

FILMS_URL="https://tehconnection.eu/torrents.php?order_by=s1&order_way=ASC"
## download first page to determine the last page
RES=`wget --load-cookies=cookies.txt $FILMS_URL -q -O -`
LAST=`echo "$RES" | grep -m 1 -o "page[^\/]* Last" | \
  sed -e "s/page=\([0-9][0-9]*\).*/\1/g"`
#LAST=363

for p in $(seq 1 $LAST);
do 
  URL="$FILMS_URL&page=$p"
  RES=`wget --load-cookies=cookies.txt $URL -q -O -`
  echo "$RES"| grep "torrent_title\"" | \
    sed -e "s/.*View Torrent\">\([^<]*\).*/\1/g" | ./html_entity_decode.php
  sleep 3
done

# get rid of cookies
rm cookies.txt
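The last step of that pipeline runs the titles through a small helper, html_entity_decode.php, which isn't shown here. If you don't have it, a minimal stand-in (assuming PHP's CLI is installed) is to swap ./html_entity_decode.php for a php -r one-liner that decodes entities on stdin:

# drop-in replacement for ./html_entity_decode.php in the pipeline above
php -r 'while ($line = fgets(STDIN)) echo html_entity_decode($line, ENT_QUOTES, "UTF-8");'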

Grab today’s Astronomy Picture of the Day and set as background

Wednesday, September 11th, 2013

I wrote this for my brother a while back. First, set your desktop to use a photo at /path/to/today.jpg as its background.

Next, save this in a file called apod.sh:

#!/bin/bash
# scrape today's APOD page for the link to the full-size image
URL=`wget -q http://apod.nasa.gov/apod/ -O - | grep "<a href=.image" | sed -e 's/<a href=.\([^"]*\).*$/http:\/\/apod.nasa.gov\/apod\/\1/'`
# download it over the top of the background image
wget -q "$URL" -O /path/to/today.jpg

Note: be sure to change /path/to/today.jpg above.

Then set up a daily cronjob to execute this script:

@daily /path/to/apod.sh
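If you haven't set up a cron job before, one way to add that entry from the shell (after making the script executable) is:

chmod +x /path/to/apod.sh
# append the @daily entry to your crontab (crontab -l may be empty the first time)
(crontab -l 2>/dev/null; echo "@daily /path/to/apod.sh") | crontab -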

High resolution images from rijksmuseum

Monday, June 3rd, 2013

Here's a PHP script to download and stitch together high-resolution images from the Rijksmuseum:


<?php

# http://www.php.net/manual/en/function.json-decode.php#107107
function prepareJSON($input) {
    
    //This will convert ASCII/ISO-8859-1 to UTF-8.
    //Be careful with the third parameter (encoding detect list), because
    //if set wrong, some input encodings will get garbled (including UTF-8!)
    $input = mb_convert_encoding($input, 'UTF-8', 'ASCII,UTF-8,ISO-8859-1');
    
    //Remove UTF-8 BOM if present, json_decode() does not like it.
    if(substr($input, 0, 3) == pack("CCC", 0xEF, 0xBB, 0xBF)) $input = substr($input, 3);
    
    return $input;
}

$url = $argv[1];
$url = preg_replace("/^https/","http",$url);

echo "Getting title...";
if(preg_match("/\/en\/collection\//",$url))
{
  $contents = file_get_contents($url);
  preg_match('/objectNumber : "([^"]*)"/',$contents,$matches);
  $id = $matches[1];
  preg_match('/objectTitle : "([^"]*)"/',$contents,$matches);
  $title = $matches[1];
}else
{
  $offset = preg_replace("/^.*,([0-9]*)$/","\\1",$url);
  # extract id
  $id = preg_replace("/^.*\//","",$url);
  $id = preg_replace("/,.*$/","",$id);
  #$id="SK-A-147";
  $title_url = preg_replace("/search\/objecten\?/",
    "api/search/browse/items?offset=".$offset."&count=1&",$url);
  $title_url = preg_replace("/#\//", "&objectNumber=",$title_url);
  $title_url = preg_replace("/,[0-9]*$/", "",$title_url);
  $contents = file_get_contents($title_url);
  #$contents = file_get_contents("objecten.js");
  $items = json_decode(prepareJSON($contents), true);
  $title = $items["setItems"][0]["ObjectTitle"];
  $title = preg_replace("/^.*f.principalMaker.sort=([^#]*)#.*$/","\\1",$url).
    "-".$title;
}
$title = html_entity_decode($title, ENT_COMPAT, 'utf-8');
$title = iconv("utf-8","ascii//TRANSLIT",$title);
$title = preg_replace("/[^A-Za-z0-9]+/","-",$title);
$final = strtolower($title);
echo "\n";

echo "Getting images...";
$contents = file_get_contents(
  "http://q42imageserver.appspot.com/api/getTilesInfo?object_id=".$id);
#$contents = file_get_contents("levels.js");


$levels = json_decode(prepareJSON($contents), true);
$levels = $levels{"levels"};

$list="";
foreach( $levels as $level)
{
  if($level{"name"} == "z0")
  {
    $tiles = $level{"tiles"};
    // Obtain a list of columns
    foreach ($tiles as $key => $row) {
      $xs[$key]  = $row['x'];
      $ys[$key] =  $row['y'];
    }

    // Sort the tiles by row (y) then column (x), both ascending,
    // passing $tiles last so it is reordered by the same keys
    array_multisort($ys, SORT_ASC, $xs, SORT_ASC, $tiles);

    $tile_x = 0;
    $tile_y = 0;
    foreach( $tiles as $tile)
    {
      $x = $tile{"x"};
      $y = $tile{"y"};
      $tile_x = max($tile_x,intval($x)+1);
      $tile_y = max($tile_y,intval($y)+1);
      $img = "z0-$x-$y.jpg";
      $url = $tile{"url"};
      echo "(".$x.",".$y.") ";
      file_put_contents($img, file_get_contents($url));
      $list .= " ".$img;
    }
    break;
  }
}
echo "\n";
echo "Composing images...";
`montage $list -tile ${tile_x}x${tile_y} -geometry +0+0 -quality 100 $final.jpg`;
echo "\n";
echo $final.".jpg\n";

echo "Clean up...";
`rm -f $list`;
echo "\n";
?>

Then you can call the script from the command line with something like:


php rijksmuseum.php "https://www.rijksmuseum.nl/en/collection/NG-2011-6-24"

Buried inside of that script is also a nice way to clean up strings for use as filenames:


$title = html_entity_decode($title, ENT_COMPAT, 'utf-8');
$title = iconv("utf-8","ascii//TRANSLIT",$title);
$title = preg_replace("/[^A-Za-z0-9]+/","-",$title);
$final = strtolower($title);
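For example, running the same steps from the command line on a made-up title (assuming PHP's CLI is available; the exact transliteration depends on your iconv/locale):

php -r '$t = html_entity_decode("Caf&eacute; &amp; Restaurant", ENT_COMPAT, "utf-8");
  $t = iconv("utf-8", "ascii//TRANSLIT", $t);
  echo strtolower(preg_replace("/[^A-Za-z0-9]+/", "-", $t)), "\n";'
# prints something like: cafe-restaurant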

Download web file onto a server from client

Tuesday, March 19th, 2013

If I want to download a file from a URL, I can use wget and then scp the file to my web server. Or I can ssh to the server and call wget from there. Here's a one-liner to do just that:


ssh SERVER 'wget URL -O LOCAL_SERVER_OUTPUT_FILE'
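For example, with placeholder host and paths:

ssh me@myserver.example.com 'wget "http://example.com/some-file.tar.gz" -O /var/www/downloads/some-file.tar.gz'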

Download high definition images from christies

Saturday, July 14th, 2012

Place this in christies.sh:


#!/bin/bash

#id=D5391252
id=`echo "$1" | sed -e "s/.*ObjectID=\([0-9]*\)&.*/d\1/g"`
final=`echo "$1" | sed -e "s/.*lotfinder\\/[^\\/]*\\/\(.*\)-[0-9]*-details.*/\1/g"`
echo $id
echo $final


URL=http://www.christies.com/lotfinderimages/${id:0:6}/$id/TileGroup0/
echo $URL

# Explicitly declare integers
typeset -i i j k max_i max_j max_k tile_x tile_y

if [ $2 ]; then
  let max_i=$2
else
  #Try to find biggest zoom size
  let i=0
  while true; do
    if wget $URL/$i-0-0.jpg --spider -v -O/dev/null 2>&1 | grep "Remote file exists\." 1> /dev/null
    then
      echo "$i-0-0.jpg exists"
      max_i=$i
    else
      break
    fi
    let i++
  done
fi

#let max_i=4

#Try to find biggest first index
let j=0 max_j=0
while true; do
  if wget $URL/$max_i-$j-0.jpg -v 2>&1 | grep "image/jpeg" 1> /dev/null
  then
    echo "$max_i-$j-0.jpg exists"
    max_j=$j
  else
    # clean up
    rm -f $max_i-$j-0.jpg
    break
  fi
  let j++
done

#let max_j=12

#Try to find biggest second index
let k=1 max_k=0
while true; do
  if wget $URL/$max_i-0-$k.jpg -v 2>&1 | grep "image/jpeg" 1> /dev/null
  then
    echo "$max_i-0-$k.jpg exists"
    max_k=$k
  else
    # clean up
    rm -f $max_i-0-$k.jpg
    break
  fi
  let k++
done

#let max_k=8

let j=1
while ((j<=max_j)); do
  let k=1
  while ((k<=max_k)); do
    if wget $URL/$max_i-$j-$k.jpg -v 2>&1 | grep "image/jpeg" 1> /dev/null
    then
      echo "$max_i-$j-$k.jpg exists"
    else
      echo "$max_i-$j-$k.jpg not found!"
    fi
    let k++
  done
  let j++
done

# get list of images in an order that montage will understand
list=""
let k=0
while ((k<=max_k)); do
  let j=0
  while ((j<=max_j)); do
    list="$list $max_i-$j-$k.jpg"
    let j++
  done
  let k++
done

let tile_y=max_j+1
let tile_x=max_k+1

montage $list -tile ${tile_y}x${tile_x} -geometry +0+0 -quality 100 $final.jpg
# clean up
rm -f $list

Then you can issue:


./christies "http://www.christies.com/lotfinder/sculptures-statues-figures/jeff-koons-winter-bears-5408907-details.aspx?from=searchresults&intObjectID=5408907&sid=275b24b4-22e5-4277-8e72-36c4708279d8"

Just be sure to put quotes around the URL.
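Also remember to make the script executable. There's an optional second argument, too, which forces the zoom level and skips the probing step:

chmod +x christies.sh
# let the script probe for the largest zoom level
./christies.sh "[lotfinder url]"
# or force zoom level 4 and skip the probe
./christies.sh "[lotfinder url]" 4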

Get largest image from webpage using php, wget and imagemagick

Thursday, March 10th, 2011

Here's a script I wrote to make autoblog a little more interesting. Now, any time a spammer leaves a comment, it checks the URL they provide for a large image and appends that image to the comment when the comment becomes a post.

I use a little PHP script to get the largest image from the URL and return an image tag (as long as the image is big enough). Here it is:


<?php

function isValidURL($url)
{
  return preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url);
}

function image_from_URL($URL)
{
  $tempdir = "temp_images";
  if(isValidURL($URL))
  {
    `wget -r -l1 -H -t1 -nd -N -np -A jpg,gif,png -P $tempdir -erobots=off $URL`;
    $handle = opendir($tempdir); 
    $max_size = 0;
    $biggest = "";
    while (false !== ($file = readdir($handle)))
    {
      $extension = strtolower(substr(strrchr($file, '.'), 1)); 
      if($extension == 'jpg' || $extension == 'gif' || $extension == 'png')
      {
        // identify from imagemagick can return the w and h as a string "w*h"
        // which then bc can compute as a multiplication giving the area in pixels
        $size = (int) exec(
          "identify -format \"%[fx:w]*%[fx:h]\" \"$tempdir/$file\" | bc", $ret);
        if($size > $max_size)
        {
          $max_size = $size;
          $biggest = $file;
        }
      } 
    } 

   // HERE YOU CAN ADD CODE TO DELETE THE TEMP FILES ETC

    if($max_size >= 80000)
    {
      return "<img src='$tempdir/$biggest' class='center'>";
    }
  }
  return "";
}
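That identify-into-bc trick is handy on its own from the shell, too (photo.jpg is just a placeholder; requires ImageMagick and bc):

# %[fx:w] and %[fx:h] print the width and height joined by "*",
# so bc evaluates the expression to the image's area in pixels
identify -format "%[fx:w]*%[fx:h]" photo.jpg | bc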

The isValidURL function and the wget one-liner are adapted from external sources.

Download all files of certain extension from website using wget

Monday, May 17th, 2010

Issue this command in a terminal to download all MP3s linked to on a page using wget:

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off [url of website]

Or, if you want to download all linked MP3s from multiple pages, make a text file containing each URL on a separate line, then issue:

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -i ~/mp3blogs.txt

If the site is behind basic HTTP authentication, you can use something like:

wget --http-user [username] --http-password [passwd] -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off "[url]"
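For reference, the flags used in these commands are all standard wget options:

# -r            recurse into links found on the page
# -l1           ...but only one level deep
# -H            span hosts, so files served from other domains are fetched too
# -t1           try each file only once
# -nd           don't recreate the remote directory structure locally
# -N            skip files that are already up to date locally
# -np           never ascend to the parent directory
# -A.mp3        accept only files ending in .mp3
# -erobots=off  ignore robots.txt (shorthand for -e robots=off)
wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off "[url of website]"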

Log in to nytimes.com download crossword puzzle and convert to pdf script

Tuesday, January 12th, 2010

I have given my girlfriend a subscription to the New York Times crossword puzzle, which she (graciously?) allows me to use. With the help of the decode_crossword.pl Perl script, I have made a bash script to log in to nytimes.com, grab today's puzzle in .puz format, and convert it to PDF so I can easily print and view it. Here's the script (replace the userid and password with your own):


#!/bin/bash

if test -z $1 ; then
  date=`date +%b%d%y`
else
  date="$1"
fi

#
# log in to nytimes and save cookies
wget --post-data \
  "USERID=youremail%40gmail.com&PASSWORD=yourpassword&is_continue=true" \
  --save-cookies=cookies.txt --keep-session-cookies \
  -O /dev/null\
  http://www.nytimes.com/auth/login &>/dev/null

# download puzzle
wget --load-cookies=cookies.txt \
  http://select.nytimes.com/premium/xword/$date.puz &>/dev/null

# get rid of cookies
rm cookies.txt

# convert to pdf
./decode_crossword -P $date.puz | ps2pdf - $date.pdf &>/dev/null

# get rid of .puz version
rm $date.puz

Update: NYTimes changed their login routine so now you should use something like:


#!/bin/bash

if test -z $1 ; then
  date=`date +%b%d%y`
else
  date="$1"
fi

wget \
  --no-check-certificate \
  https://myaccount.nytimes.com/auth/login \
  -O login.html \
  &>/dev/null
 
token=`grep token login.html | sed -e "s/^.*value=\"\([A-z0-9]*\)\".*$/\\1/g"`
expires=`grep expires login.html | sed -e "s/^.*value=\"\([A-z0-9]*\)\".*$/\\1/g"`

# log in to nytimes.com and save session cookies (replace the credentials below with your own)
wget --post-data \
  "userid=youremail%40gmail.com&password=yourpassword&is_continue=false&remember=true&token=$token&expires=$expires" \
  --save-cookies=cookies.txt --keep-session-cookies \
  --no-check-certificate \
  -O /dev/null \
  https://myaccount.nytimes.com/auth/login \
  &>/dev/null

# download puzzle
wget --load-cookies=cookies.txt \
  http://select.nytimes.com/premium/xword/$date.puz &>/dev/null

# get rid of cookies
rm cookies.txt
# get rid of login cache
rm login.html

# convert to pdf
./decode_crossword -P $date.puz | ps2pdf - $date.pdf &>/dev/null

# get rid of .puz version
rm $date.puz
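Either version can be run with no arguments to grab today's puzzle, or with a date argument in the same format date +%b%d%y produces (the script name below is just whatever you saved it as):

# today's puzzle
./nytimes_crossword.sh
# the puzzle for January 12th, 2010
./nytimes_crossword.sh Jan1210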

Whereami, find out your physical location via command line

Saturday, December 5th, 2009

Using a wget and sed combo I found on go2linux and a little web scraping, I've come up with a little command-line bash script to find your public IP address and then determine your physical location. Save this in a file called whereami.sh:


#!/bin/bash

# get public ip address from checkip.dyndns.org
public_ip=`wget -q -O - checkip.dyndns.org | \
sed -e 's/.*Current IP Address: //' -e 's/<.*$//'`

echo $public_ip

# get physical address from ip address from melissadata.com
wget -q -O - \
http://www.melissadata.com/Lookups/iplocation.asp?ipaddress=$public_ip | \
grep "\(\(City\)\|\(State or Region\)\|\(Country\)\)<\/td>" | \
sed "s/.*<b>\([^<]*\)<\/b>.*/\1/"

Note: if you know of a more stable IP-to-physical-location site to scrape, leave it below.