Posts Tagged ‘mogrify’

Convert two-page color scan of book into monochrome single pdf

Friday, November 25th, 2016

Two days in a row now I’ve had to visit the physical library to retrieve an old paper. Makes me feel very authentic as an academic. Our library has free scanning facilities, but the resulting PDF will have a couple of problems. If I’m scanning a book, then each page of the pdf actually contains 2 pages of the book. Depending on the scanner settings, I might also accidentally have my 2 pages running vertically instead of horizontally. Finally, if I forget to set the color settings on the scanner, then I get a low-contrast color image instead of a high-contrast monochrome scan.

Here’s a preview of a pdf of an article from a book I scanned that has all these problems:

[Image: scanned low-contrast color pdf]

If this pdf is in input.pdf then I call the following commands to create output.pdf:

pdfimages input.pdf .scan
mogrify -format png -monochrome -rotate 90 -crop 50%x100% .scan*
convert +repage .scan*png output.pdf
rm .scan*

[Image: output monochrome pdf]

I’m pretty happy with the output. There are some speckles, but the simple -monochrome flag does a fairly good job.

I use Adobe Acrobat Pro to run OCR so that the text is selectable (haven’t found a good command line solution for that, yet).

Note: I think the -rotate 90 is needed because the images are stored rotated by -90 degrees but the input.pdf is compositing them after rotation. This hints that this script won’t generalize to complicated pdfs. But we’re safe here because a scanner will probably apply the same transformation to each page.
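The four commands above can be wrapped in a function so the intermediate images land in a temporary directory rather than as .scan* files next to the pdf. This is only a sketch: the name scan2mono and the optional rotation argument are made up for illustration, and it places +repage after the input images, which is where ImageMagick operators normally go.

```shell
#!/bin/bash
# Hypothetical wrapper around the pdfimages/mogrify/convert pipeline above.
# scan2mono and the rotation argument are illustrative names, not from the post.
scan2mono() {
  if [ "$#" -lt 2 ]; then
    echo "Usage: scan2mono input.pdf output.pdf [rotation-degrees]" >&2
    return 1
  fi
  local input="$1" output="$2" rotation="${3:-90}"
  local tmp
  tmp=$(mktemp -d) || return 1

  # Extract the embedded scans, then threshold, rotate and split each one
  # into left and right halves; finally join the halves into a single pdf.
  pdfimages "$input" "$tmp/scan" &&
    mogrify -format png -monochrome -rotate "$rotation" -crop 50%x100% "$tmp"/scan-* &&
    convert "$tmp"/scan-*.png +repage "$output"
  local status=$?

  # Clean up the intermediate images whether or not the pipeline succeeded.
  rm -rf "$tmp"
  return "$status"
}
```

If a scanner stores its images unrotated, calling scan2mono input.pdf output.pdf 0 would skip the rotation.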

Composite thumbnails of multipage pdf into stacked image

Monday, April 22nd, 2013

Here’s a bash script that takes a multipage pdf and produces a stack of thumbnails with nice shadows:

[Image: multipage pdf thumbnail stack]

Save this in multipagethumb.sh:

#!/bin/bash

# montage -gravity center null: null: 'supplemental_opt.pdf' null: null:
# -thumbnail 128x128 -sharpen 10 -bordercolor white -border 0 -background none
# +polaroid -set label '' -background Transparent -tile x1 -geometry -0+64
# -reverse -flop png:- | convert png:- -flop -trim output.png
if [ $# -lt 2 ]
  then
  echo "Usage:"
  echo "  ./multipagethumb.sh input.pdf output.png"
  exit 1
fi

output="${2%.*}"
## identify occasionally returns the page count concatenated with itself once
## per page (e.g. 10101010101010101010 for a 10-page pdf), so ask pdftk instead
#n=`identify -format %n $1`
n=`pdftk "$1" dump_data | grep NumberOfPages | sed 's/[^0-9]*//'`

# 88+12+30*16 = 580
w="88"
x="30"
y="3"
for p in $(seq 1 $n)
do
  p=$((p-1))  # convert indexes pdf pages from zero
  echo "convert $1[$p] -flatten -thumbnail ${w}x -bordercolor none -border 0 \( +clone \
    -background none -shadow 80x3+2+2 \) +swap -background none -layers \
    merge +repage  $output-$p.png"
  convert $1[$p] -flatten -thumbnail ${w}x -bordercolor none -border 0 \( +clone \
    -background none -shadow 80x3+2+2 \) +swap -background none -layers \
    merge +repage  $output-$p.png
  if [[ $p == "0" ]]
  then
    echo "convert $output-$p.png $2"
    convert $output-$p.png $2
  else
    echo "convert $2 -gravity SouthEast -background none -splice ${x}x${y} $2"
    convert $2 -gravity SouthEast -background none -splice ${x}x${y} $2
    echo "composite -compose dst-over $output-$p.png $2 -gravity SouthEast $2"
    composite -compose dst-over $output-$p.png $2 -gravity SouthEast $2
  fi
  rm $output-$p.png
done

Then issue:

./multipagethumb.sh input.pdf output.png

Note: You can achieve something similar with montage and its +polaroid option, but I found it difficult to get diagonal stacking and the correct page order that way.
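The "# 88+12+30*16 = 580" comment in the script reads like the final image width for a 17-page pdf: one 88-pixel thumbnail, 12 pixels that I assume come from the shadow padding, and a 30-pixel splice for each of the 16 later pages. Under that assumption the arithmetic can be sketched as below. The function names are hypothetical, and the defaults mirror w, x and y from the script.

```shell
#!/bin/bash
# Hypothetical helpers reproducing the size arithmetic hinted at in the script.
# Assumption: the shadow adds 12 pixels to the thumbnail width, as the
# "88+12" in the script's comment suggests.

# stack_width PAGES [THUMB_WIDTH] [X_STEP]
stack_width() {
  local pages="$1" thumb="${2:-88}" step="${3:-30}" shadow_pad=12
  echo $(( thumb + shadow_pad + step * (pages - 1) ))
}

# stack_extra_height PAGES [Y_STEP]
# Height added on top of a single thumbnail by the per-page vertical splice.
stack_extra_height() {
  local pages="$1" step="${2:-3}"
  echo $(( step * (pages - 1) ))
}

stack_width 17          # prints 580, matching the comment in the script
stack_extra_height 17   # prints 48
```

The total height also depends on the page aspect ratio, which is why only the extra height is computed here.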

High quality desktop backgrounds script

Tuesday, July 10th, 2012

Here’s a script I use to create 2560x1440 high quality backgrounds from images:


mkdir -p output
mogrify -fuzz 25% -trim -resize 2560x1440 -background white -gravity center -extent 2560x1440 -format jpg -quality 100 -path output *.jpg

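It first creates the output directory, then converts every jpg in the current directory into a 2560 by 1440 image: borders are trimmed, the image is resized to fit inside 2560 by 1440, and the remainder is padded with white. The results are written to output, so the originals are left untouched.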
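Since -resize 2560x1440 keeps the aspect ratio and only fits the image inside that box, it is -extent that brings every image to exactly 2560 by 1440. Here is a rough sketch of the fitted size for a given (already trimmed) image, using integer arithmetic only. The function name fit_size is made up, and ImageMagick’s own rounding may differ by a pixel for sizes that do not divide evenly.

```shell
#!/bin/bash
# Hypothetical helper: size of a WxH image after fitting it inside a box
# while keeping its aspect ratio (what -resize 2560x1440 does by default).
# fit_size WIDTH HEIGHT [BOX_WIDTH] [BOX_HEIGHT]
fit_size() {
  local w="$1" h="$2" box_w="${3:-2560}" box_h="${4:-1440}"
  # Compare w/h against box_w/box_h by cross-multiplying to stay in integers.
  if [ $(( w * box_h )) -ge $(( h * box_w )) ]; then
    # Relatively wider than the box: the width is the limiting side.
    echo "${box_w}x$(( (h * box_w + w / 2) / w ))"
  else
    # Relatively taller than the box: the height is the limiting side.
    echo "$(( (w * box_h + h / 2) / h ))x${box_h}"
  fi
}

fit_size 4000 3000   # prints 1920x1440
```

So a 4000 by 3000 photo ends up 1920 pixels wide, and -gravity center -extent 2560x1440 leaves (2560-1920)/2 = 320 white pixels on each side.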

Extract all images from a pdf as png files (at full resolution)

Friday, August 26th, 2011

Here’s a two-liner to extract all the embedded color images in a pdf and convert them to png files. pdfimages extracts the images as ppm files. But I couldn’t open these immediately on my Mac with my favorite image editing tools, so I convert them to png files with mogrify from the ImageMagick suite.


pdfimages original.pdf ./extracted-images
mogrify -format png ./extracted-images*.ppm

To get rid of the ppm files afterwards:


rm ./extracted-images*.ppm
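pdfimages writes monochrome images as pbm rather than ppm, so the *.ppm glob above will skip those. A sketch that handles both and removes each intermediate file once it has been converted could look like this. The function name is hypothetical, and the file naming (prefix-NNN.ppm or prefix-NNN.pbm) is assumed to follow pdfimages’ usual pattern.

```shell
#!/bin/bash
# Hypothetical helper combining the extract, convert and clean-up steps.
# extract_images_as_png input.pdf prefix
extract_images_as_png() {
  if [ "$#" -lt 2 ]; then
    echo "Usage: extract_images_as_png input.pdf prefix" >&2
    return 1
  fi
  local pdf="$1" prefix="$2" f
  pdfimages "$pdf" "$prefix" || return 1
  for f in "$prefix"-*.ppm "$prefix"-*.pbm; do
    [ -e "$f" ] || continue   # skip a pattern that matched nothing
    # Remove the intermediate file only if the png was written successfully.
    mogrify -format png "$f" && rm "$f"
  done
}
```

Recent poppler builds of pdfimages also accept a -png flag that writes png files directly, which would make the mogrify step unnecessary where it is available.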