Posts Tagged ‘convert’

Convert two-page color scan of book into monochrome single pdf

Friday, November 25th, 2016

Two days in a row now I’ve had to visit the physical library to retrieve an old paper. Makes me feel very authentic as an academic. Our library has free scanning facilities, but the resulting PDF will have a couple of problems. If I’m scanning a book, then each page of the PDF actually contains two pages of the book. Depending on the scanner settings, I might also accidentally have my two pages running vertically instead of horizontally. Finally, if I forget to set the color settings on the scanner, I get a low-contrast color image instead of a high-contrast monochrome scan.

Here’s a preview of a PDF of an article from a book I scanned that has all of these problems:

[scanned low-contrast color pdf]

If this PDF is in input.pdf, then I call the following commands to create output.pdf:

# extract the embedded page images (written as .scan-000.ppm, .scan-001.ppm, …)
pdfimages input.pdf .scan
# binarize, rotate upright, and split each scan into left and right halves
mogrify -format png -monochrome -rotate 90 -crop 50%x100% .scan*
# reassemble the page images into a single PDF
convert +repage .scan*png output.pdf
# clean up the intermediate files
rm .scan*

[output monochrome pdf]

I’m pretty happy with the output. There are some speckles, but the simple -monochrome flag does a fairly good job.

I use Adobe Acrobat Pro to run OCR so that the text is selectable (I haven’t found a good command-line solution for that yet).

Note: I think the -rotate 90 is needed because the images are stored rotated by -90 degrees and input.pdf composites them after rotation. This hints that this script won’t generalize to complicated PDFs. But we’re safe here because a scanner will probably apply the same transformation to each page.
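Since the rotation is the fragile part, it’s easy to wrap the pipeline in a tiny script that takes the angle as a parameter. A minimal sketch (book2pdf.sh is just my name for it):

#!/bin/bash
# Usage: ./book2pdf.sh input.pdf output.pdf [angle]
# angle defaults to 90; pass 0 if the scans come out already upright
input="$1"
output="$2"
angle="${3:-90}"
pdfimages "$input" .scan
mogrify -format png -monochrome -rotate "$angle" -crop 50%x100% .scan*
convert +repage .scan*png "$output"
rm .scan*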

Rasterize everything in pdf except text

Wednesday, October 19th, 2016

I had an issue including a PDF with transparency as a subfigure in another PDF. This led me down a dark path of trying to rasterize everything in a PDF except for the text. I tried rasterizing everything and just running OCR on top of the text, but OCR-ized selection is weird and the text recognition wasn’t perfect. Not to mention that would have been a really roundabout way to solve this.

Here’s the insane pipeline I settled on:

  • open the PDF in Illustrator
  • save as input.svg, choosing “use system fonts” under options
  • run ./rasterize-everything-but-text.sh input.svg output.svg (see below)
  • open output.svg in Illustrator, then save as raster-but-text.pdf

The bash script ./rasterize-everything-but-text.sh is itself an absurd, likely very fragile bit of text manipulation and rasterization of the .svg files:

#!/bin/bash
#
# Usage:
#
#     rasterize-everything-but-text.sh input.svg output.svg
#
input="$1"
output="$2"
# suck out header from svg file
header=`dos2unix < "$input" | tr '\n' '\00' | sed 's/\(.*<svg[^<]*>\).*/\1/' | tr '\00' '\n'`
# grab all text tags
text=`grep "<text" "$input"`
# create svg file without text tags
notextsvg="no-text.svg"
notextpng="no-text.png"
grep -v "<text" "$input" > "$notextsvg"
# convert to png
rsvg-convert -h 1000 "$notextsvg" > "$notextpng"
# convert back to svg (containing just <image> tag)
rastersvg="raster.svg"
convert "$notextpng" "$rastersvg"
# extract body (image tag)
body=`dos2unix < "$rastersvg" | tr '\n' '\00' | sed 's/\(.*<svg[^<]*>\)\(.*\)<\/svg>/\2/' | tr '\00' '\n'`
# piece together original header, image tag, and text
echo "$header
$body
$text
</svg>" > "$output"
# Fix image tag to have same size as document
dim=`echo "$header" | grep -o 'width=".*" height="[^"]*"' | tr '"' "'"`
sed -i '' "s/\(image id=\"image0\" \)width=\".*\" height=\"[^\"]*\"/\1$dim/" "$output"
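The script also assumes dos2unix, rsvg-convert (from librsvg), and ImageMagick’s convert are on your PATH. A quick sanity check before running it might look like this (a minimal sketch):

for tool in dos2unix rsvg-convert convert; do
  command -v "$tool" >/dev/null || { echo "missing: $tool" >&2; exit 1; }
done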

Get values of solve variables as list in Maple

Wednesday, August 21st, 2013

I issue a solve command in Maple like this one:

s:= solve({2*x1-x2=0,x1+x2=1},{x1,x2});

and I see as output:

s := {x1 = 1/3, x2 = 2/3}

I’d like to get the results as an ordered list. I’ve already carefully chosen my variable names so that they’re ordered lexicographically, which is respected by the solve output. I tried to use convert, table, entries, indices to no avail. What I came up with is:

convert(map(rhs,s),list);

which produces:

[1/3, 2/3]

This can be easily dumped into MATLAB for processing.

Note: If you use square brackets in your solve:

s:= solve({2*x1-x2=0,x1+x2=1},[x1,x2]);

Then the map above will give you an error like this one:

Error, invalid input: rhs received [x1 = 1/3, x2 = 2/3], which is not valid for its 1st argument, expr

You need to get the first index of the solve output:

map(rhs,s[1]);
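which again gives:

[1/3, 2/3]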

High resolution images from the Rijksmuseum

Monday, June 3rd, 2013

Here’s a PHP script to download and stitch together high resolution images from the Rijksmuseum:


<?php

# http://www.php.net/manual/en/function.json-decode.php#107107
function prepareJSON($input) {
    
    //This will convert ASCII/ISO-8859-1 to UTF-8.
    //Be careful with the third parameter (encoding detect list), because
    //if set wrong, some input encodings will get garbled (including UTF-8!)
    $input = mb_convert_encoding($input, 'UTF-8', 'ASCII,UTF-8,ISO-8859-1');
    
    //Remove UTF-8 BOM if present, json_decode() does not like it.
    if(substr($input, 0, 3) == pack("CCC", 0xEF, 0xBB, 0xBF)) $input = substr($input, 3);
    
    return $input;
}

$url = $argv[1];
$url = preg_replace("/^https/","http",$url);

echo "Getting title...";
if(preg_match("/\/en\/collection\//",$url))
{
  $contents = file_get_contents($url);
  preg_match('/objectNumber : "([^"]*)"/',$contents,$matches);
  $id = $matches[1];
  preg_match('/objectTitle : "([^"]*)"/',$contents,$matches);
  $title = $matches[1];
}else
{
  $offset = preg_replace("/^.*,([0-9]*)$/","\\1",$url);
  # extract id
  $id = preg_replace("/^.*\//","",$url);
  $id = preg_replace("/,.*$/","",$id);
  #$id="SK-A-147";
  $title_url = preg_replace("/search\/objecten\?/",
    "api/search/browse/items?offset=".$offset."&count=1&",$url);
  $title_url = preg_replace("/#\//", "&objectNumber=",$title_url);
  $title_url = preg_replace("/,[0-9]*$/", "",$title_url);
  $contents = file_get_contents($title_url);
  #$contents = file_get_contents("objecten.js");
  $items = json_decode(prepareJSON($contents), true);
  $title = $items["setItems"][0]["ObjectTitle"];
  $title = preg_replace("/^.*f.principalMaker.sort=([^#]*)#.*$/","\\1",$url).
    "-".$title;
}
$title = html_entity_decode($title, ENT_COMPAT, 'utf-8');
$title = iconv("utf-8","ascii//TRANSLIT",$title);
$title = preg_replace("/[^A-Za-z0-9]+/","-",$title);
$final = strtolower($title);
echo "\n";

echo "Getting images...";
$contents = file_get_contents(
  "http://q42imageserver.appspot.com/api/getTilesInfo?object_id=".$id);
#$contents = file_get_contents("levels.js");


$levels = json_decode(prepareJSON($contents), true);
$levels = $levels["levels"];

$list="";
foreach( $levels as $level)
{
  if($level["name"] == "z0")
  {
    $tiles = $level["tiles"];
    // Obtain a list of columns
    foreach ($tiles as $key => $row) {
      $xs[$key]  = $row['x'];
      $ys[$key] =  $row['y'];
    }

    // Sort the tiles by y (row) ascending, then x (column) ascending
    // Pass $tiles as the last parameter so it is sorted by the same keys
    array_multisort($ys, SORT_ASC, $xs, SORT_ASC, $tiles);

    $tile_x = 0;
    $tile_y = 0;
    foreach( $tiles as $tile)
    {
      $x = $tile["x"];
      $y = $tile["y"];
      $tile_x = max($tile_x,intval($x)+1);
      $tile_y = max($tile_y,intval($y)+1);
      $img = "z0-$x-$y.jpg";
      $url = $tile["url"];
      echo "(".$x.",".$y.") ";
      file_put_contents($img, file_get_contents($url));
      $list .= " ".$img;
    }
    break;
  }
}
echo "\n";
echo "Composing images...";
`montage $list -tile ${tile_x}x${tile_y} -geometry +0+0 -quality 100 $final.jpg`;
echo "\n";
echo $final.".jpg\n";

echo "Clean up...";
`rm -f $list`;
echo "\n";
?>

Then you can call the script from the command line with something like:


php rijksmuseum.php "https://www.rijksmuseum.nl/en/collection/NG-2011-6-24"

Buried inside of that script is also a nice way to clean up strings for use as filenames:


$title = html_entity_decode($title, ENT_COMPAT, 'utf-8');
$title = iconv("utf-8","ascii//TRANSLIT",$title);
$title = preg_replace("/[^A-Za-z0-9]+/","-",$title);
$final = strtolower($title);
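For example, a title like “Self-portrait, 1887” (a hypothetical input) would come out as self-portrait-1887: entities are decoded, non-ASCII characters are transliterated to plain ASCII, runs of any other characters collapse into single hyphens, and the result is lowercased.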

Composite thumbnails of multipage pdf into stacked image

Monday, April 22nd, 2013

Here’s a bash script that takes a multipage PDF and produces a stack of thumbnails with nice shadows:

[multipage pdf thumbnail stack]

Save this in multipagethumb.sh:

#!/bin/bash

# montage -gravity center null: null: 'supplemental_opt.pdf' null: null:
# -thumbnail 128x128 -sharpen 10 -bordercolor white -border 0 -background none
# +polaroid -set label '' -background Transparent -tile x1 -geometry -0+64
# -reverse -flop png:- | convert png:- -flop -trim output.png
if [ $# -lt 2 ]
  then
  echo "Usage:"
  echo "  ./multipagethumb input.pdf output.png"
  exit 1
fi

output="${2%.*}"
## identify -format %n occasionally gives the number of pages concatenated
## once per page, e.g. for 10 pages: 10101010101010101010
#n=`identify -format %n $1`
n=`pdftk $1 dump_data | grep NumberOfPages | sed 's/[^0-9]*//'`

# 88+12+30*16 = 580
w="88"
x="30"
y="3"
for p in $(seq 1 $n)
do
  p=`echo "$p-1"|bc`
  echo "convert $1[$p] -flatten -thumbnail ${w}x -bordercolor none -border 0 \( +clone \
    -background none -shadow 80x3+2+2 \) +swap -background none -layers \
    merge +repage  $output-$p.png"
  convert $1[$p] -flatten -thumbnail ${w}x -bordercolor none -border 0 \( +clone \
    -background none -shadow 80x3+2+2 \) +swap -background none -layers \
    merge +repage  $output-$p.png
  if [[ $p == "0" ]]
  then
    echo "convert $output-$p.png $2"
    convert $output-$p.png $2
  else
    echo "convert $output.png -gravity SouthEast -background none -splice ${x}x${y} $output.png"
    convert $output.png -gravity SouthEast -background none -splice ${x}x${y} $output.png
    echo "composite -compose dst-over $output-$p.png $output.png -gravity SouthEast $output.png"
    composite -compose dst-over $output-$p.png $output.png -gravity SouthEast $output.png
  fi
  rm $output-$p.png
done

Then issue:

./multipagethumb.sh input.pdf output.png

Note: You can achieve something similar with montage and its +polaroid option (see the comment at the top of the script), but I found it difficult to achieve diagonal stacking and the correct order.

Round images to widths and heights divisible by 2, by cropping

Saturday, November 10th, 2012

To make an h264 movie from a bunch of images using ffmpeg, I need all the images to have dimensions divisible by two. Here’s a little one-liner to crop a bunch of images to the nearest size divisible by 2:


for file in *.png; do convert -crop `identify -format "(%[fx:w]/2)*2" $file | bc`x`identify -format "(%[fx:h]/2)*2" $file | bc`+0+0 $file cropped_$file; done

I suppose using mogrify would potentially be faster, but I’m not sure how to introduce the rounding.
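In the meantime, here’s the same idea written as an explicit loop that does the rounding with shell arithmetic instead of bc (a sketch; bash’s integer division rounds each dimension down to the nearest even size):

for file in *.png; do
  w=$(identify -format "%w" "$file")
  h=$(identify -format "%h" "$file")
  convert "$file" -crop "$((w/2*2))x$((h/2*2))+0+0" +repage "cropped_$file"
done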

High quality desktop backgrounds script

Tuesday, July 10th, 2012

Here’s a script I use to create 2560x1440 high quality backgrounds from images:


mkdir -p output
mogrify -fuzz 25% -trim -resize 2560x1440 -background white -gravity center -extent 2560x1440 -format jpg -quality 100 -path output *.jpg

It first creates the output directory, then converts all of the jpgs in the current directory into 2560 by 1440 images, trimmed of their borders and padded with white.
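To target a different resolution, change both the -resize and -extent geometry. For example, for 1080p backgrounds:

mogrify -fuzz 25% -trim -resize 1920x1080 -background white -gravity center -extent 1920x1080 -format jpg -quality 100 -path output *.jpg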

ImageMagick animated gif layers showing through transparency

Tuesday, May 1st, 2012

Today I finally got around to supporting screen dumps from my OpenGL apps with transparency. I’ve been wanting to do this for a while, so that I can easily make animated gifs that can be overlaid on a background image or video. I ran into a weird problem using ImageMagick’s convert tool. I was dumping every frame to a file called screencapture-01.tga, screencapture-02.tga, … and then calling:

convert screencapture-*.tga screencapture.gif
This made an animated gif with transparency, but each frame showed the previous frames behind it, resulting in something like this:

[worm animation with wrong transparency]

The magic keyword seems to be “dispose”, and calling the following fixed my problem:

convert -dispose 2 screencapture-*.tga screencapture.gif
which results in:

[worm animation with correct transparency]

Then I can underlay a background image and get something like:

[worm animation with correct transparency over clouds]

Update: To automatically trim the image in the same command, use:

convert -dispose 2 screencapture-*.tga -coalesce -repage 0x0 -trim +repage screencapture.gif
convert -dispose 2 screencapture-00*.tga -coalesce -trim -layers TrimBounds screencapture.gif
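As I understand it, -dispose 2 selects the GIF “background” disposal method: each frame is cleared to the (transparent) background before the next one is drawn, so earlier frames can’t show through. In the update commands, -coalesce first flattens each frame into its fully composited form, and the -layers TrimBounds variant computes a common canvas just large enough to cover all of the trimmed frames.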

Resizing animated gifs scaling issue

Monday, February 27th, 2012

When trying to resize animated gifs into thumbnails using ImageMagick’s convert, I noticed that with certain gifs the first frame of the animation would resize correctly, but subsequent frames would remain unscaled or only partially scaled. The command I was using was:


convert input.gif -thumbnail x200 -resize '200x<' -resize 50% -gravity center -crop 100x100+0+0 +repage output.gif

For non-animated gifs, this correctly makes a 100 by 100 thumbnail. For animated gifs where each frame is sufficiently different, this creates an animated thumbnail. But for animations whose frames differ only slightly, it seems to resize each "difference frame", which may not have the same bounding box as the full image. To correct this, just add the "-coalesce" option at the beginning of the ImageMagick command sequence, like this:


convert input.gif -coalesce -thumbnail x200 -resize '200x<' -resize 50% -gravity center -crop 100x100+0+0 +repage output.gif

TexMapPreview: simple texture mapping utility

Wednesday, September 21st, 2011

[TexMapPreview simple texture mapping utility working on woody]

I posted the source and binary of TexMapPreview. It’s a little texture mapping utility I’ve been using to visualize texture maps on the meshes I deform. It takes as input a mesh (with texture coordinates) and a (texture) image. Then it can either write the visualization of the texture-mapped mesh to an output file or display it in a GLUT window. GLUT is the only dependency. TexMapPreview itself can only read and write .tga image files, but I include a bash script wrapper which uses ImageMagick’s convert tool to enable reading and writing of all sorts of file formats (.png, .jpg, .tiff, whatever convert can read/write).
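The wrapper isn’t reproduced here, but the idea is just to convert to and from .tga around the call. A hypothetical sketch (texmappreview and its argument order are stand-ins for the real binary’s interface):

#!/bin/bash
# Hypothetical wrapper: the real TexMapPreview arguments may differ.
# Usage: ./texmappreview-any.sh mesh.obj texture.png output.jpg
mesh="$1"
tex="$2"
out="$3"
tmptex="/tmp/texmappreview-in.$$.tga"
tmpout="/tmp/texmappreview-out.$$.tga"
convert "$tex" "$tmptex"                  # any input format -> .tga
texmappreview "$mesh" "$tmptex" "$tmpout" # hypothetical invocation
convert "$tmpout" "$out"                  # .tga -> requested output format
rm -f "$tmptex" "$tmpout"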

We thank Scott Schaefer for providing the wooden gingerbread man image from “Image Deformation Using Moving Least Squares”.