Posts Tagged ‘image’

Convert two-page color scan of book into monochrome single pdf

Friday, November 25th, 2016

Two days in a row now I’ve had to visit the physical library to retrieve an old paper. Makes me feel very authentic as an academic. Our library has free scanning facilities, but the resulting PDF will have a couple problems. If I’m scanning a book then each page of the pdf actually contains 2 pages of the book. Depending on the scanner settings, I might also accidentally have my 2 pages running vertically instead of horizontally. Finally, if I forgot to set the color settings on the scanner, then I get a low-contrast color image instead of a high-contrast monochrome scan.

Here’s a preview of pdf of an article from a book I scanned that has all these problems: scanned low contrast color pdf

If this pdf is in input.pdf then I call the following commands to create output.pdf:

pdfimages input.pdf .scan
mogrify -format png -monochrome -rotate 90 -crop 50%x100% .scan*
convert +repage .scan*png output.pdf
rm .scan*

output monochrome pdf

I’m pretty happy with the output. There are some speckles, but the simple -monochrome flag does a fairly good job.

I use Adobe Acrobat Pro to run OCR so that the text is selectable (haven’t found a good command line solution for that, yet).

Note: I think the -rotate 90 is needed because the images are stored rotated by -90 degrees but the input.pdf is compositing them after rotation. This hints that this script won’t generalize to complicated pdfs. But we’re safe here because a scanner will probably apply the same transformation to each page.

Closed mesh of piece-wise constant height field surface from an image

Wednesday, May 6th, 2015

I pushed a little function box_height_field.m to gptoolbox. This function creates a height field from an image where each pixel becomes an extruded square (rather than just a point/vertex). There are two modes, one that’s fast, vectorized version which doesn’t add vertices so that the mesh is closed (though the underlying surface will still be “water-tight”). And a slower version which really creates a perfectly closed height field. Heres’ the result of the second one on the red-channel of the Hans Hass image:

im = im2double(imresize(rgb2gray(imread('hans-hass.jpg')),0.1));
[V,F] = box_height_field(im);

box height field

Here’s the same result but computed after quantizing the colors:

im = round(im/0.25)*0.25;
[V,F] = box_height_field(im);

box height field

You can clearly see the piecewise-constant regions. Using remesh_planar_patches you can reduce the number of version with losing any information:

[W,G] = remesh_planar_patches(V,F);

box height field

Old-style GPGPU reduction, average pixel color

Tuesday, April 28th, 2015

Here’s a little demo which computes the average pixel value of an OpenGL rendering. As a sanity check I compute the average value on the cpu-side using glReadPixels and then compare to computing the average value using a “mip-map” style, ping-pong, texture-buffer GPGPU reduction. Finally I render the buffers for fun.

I’m using yimg just to read in a .png file to render as an example.

On my weak little laptop, the GPU code is about 30 times faster. Can’t shrug at that! Rookie mistake. Made timings in debug mode. In release there’s hardly a speed up : – (

I was careful to compute the average in a coherent way (the images get progressively blurred out, rather than averaging the image quadrants recursively). This would be useful if, say, computing the average of 100 renderings. You could render them into a 10s x 10s texture and run the GPU reduction until the result is just a 10×10 image containing the 100 results. That’d only require a single final call to glReadPixels (rather than calling glReadPixels to read the final single pixel result of each reduction).

#include <YImage.hpp>
#include <YImage.cpp>
#include <GLUT/glut.h>
#include <iostream>
#include <string>
#include <iomanip>

// Size of image rounded up to next power of 2
size_t s;
// Shader program for doing reduction
GLuint reduction_prog_id;
// Need two textures and buffers to ping-pong
GLuint tex_id[] = {0,0};
GLuint fbo_id[] = {0,0};
GLuint dfbo_id[] = {0,0};
// Image (something to render in the first place)
std::string filename("hockney-512.png");
YImage yimg;

void init_render_to_texture(
  const size_t width,
  const size_t height,
  GLuint & tex_id,
  GLuint & fbo_id,
  GLuint & dfbo_id)
  using namespace std;
  // Delete if already exists
  glGenTextures(1, &tex_id);
  glBindTexture(GL_TEXTURE_2D, tex_id);
  //NULL means reserve texture memory, but texels are undefined
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, width, height, 0, GL_BGRA, GL_FLOAT, NULL);
  glBindTexture(GL_TEXTURE_2D, 0);
  glGenFramebuffersEXT(1, &fbo_id);
  glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo_id);
  //Attach 2D texture to this FBO
  glGenRenderbuffersEXT(1, &dfbo_id);
  glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, dfbo_id);
  glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, width, height);
  //Attach depth buffer to FBO (for this example it's not really needed, but if
  //drawing a 3D scene it would be necessary to attach something)
  //Does the GPU support current FBO configuration?
  GLenum status;
  status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT);
  assert(status == GL_FRAMEBUFFER_COMPLETE_EXT);
  // Unbind to clean up
  glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, 0);
  glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);

int main(int argc, char * argv[])
    filename = argv[1];
  s = std::max(yimg.width(),yimg.height());
  s |= s >> 1;
  s |= s >> 2;
  s |= s >> 4;
  s |= s >> 8;
  s |= s >> 16;

  glutInitDisplayString("rgba depth double stencil");
      using namespace std;
      // Initialize **both** buffers and set to render into first
      for(int i = 1;i>=0;i--)
        glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo_id[i]);
        glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, dfbo_id[i]);

      // Render something
      glOrtho(0,s,0,s, -10000,10000);
      glDrawPixels(yimg.width(), yimg.height(), GL_RGBA, GL_UNSIGNED_BYTE,;

      // Even the cpu code should use a buffer (rather than the screen)
      GLfloat * rgb = new GLfloat[yimg.width() * yimg.height() * 3];
      glReadPixels(0, 0, yimg.width(), yimg.height(), GL_RGB, GL_FLOAT, rgb);
      // Gather into double: sequential add is prone to numerical error
      double avg[] = {0,0,0};
      for(size_t i = 0;i<yimg.width()*yimg.height()*3;i+=3)
        avg[0] += rgb[i + 0];
        avg[1] += rgb[i + 1];
        avg[2] += rgb[i + 2];
      for_each(avg,avg+3,[](double & c){c/=yimg.width()*yimg.height();});
      delete[] rgb;

      // Size of square being rendered
      assert(((s != 0) && ((s & (~s + 1)) == s)) && "s should be power of 2");
      size_t h = s/2;
      // odd or even ping-pong iteration
      int odd = 1;
      // Tell shader about texel step size
        // Select which texture to draw/compute into
        glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo_id[odd]);
        glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, dfbo_id[odd]);
        // Select which texture to draw/compute from
        // Scale to smaller square
        const float f = 2.*(float)h/(float)s;
        // Draw quad filling viewport with shrinking texture coordinates
        glVertex2f  (-1,-1);
        glVertex2f  (1,-1);
        glVertex2f  (1,1);
        glVertex2f  (-1,1);

        // ping-pong
        odd = 1-odd;
        h = h/2;
      // Read corner pixel of last render
      float px[3];
      glReadPixels(0, 0, 1, 1, GL_RGB, GL_FLOAT, px);
      // Correct for size not power of 2
      for_each(px,px+3,[](float& c){c*=(float)s*s/yimg.width()/yimg.height();});
      cout<<" gpu: "<< px[0]<<" "<< px[1]<<" "<< px[2]<<" "<<endl;
      cout<<" cpu: "<<avg[0]<<" "<<avg[1]<<" "<<avg[2]<<" "<<endl;

      // Purely for vanity, draw the buffers
      for(odd = 0;odd<2;odd++)
        glVertex2f  (-1,-1);
        glVertex2f  (1,-1);
        glVertex2f  (1,1);
        glVertex2f  (-1,1);

  // Vertex shader is a **true** pass-through
  const std::string vertex_shader = R"(
#version 120
void main()
  gl_Position = gl_Vertex;
  gl_TexCoord[0] = gl_MultiTexCoord0;
  // fragment shader sums texture of this pixel and left/bottom neighbors
  const std::string fragment_shader = R"(
#version 120
// size of a half-texel in 1/pixels units: 0.5/(full size)
uniform float dt;
uniform sampler2D texture;
void main()
  gl_FragColor = 0.25*(
    texture2D(texture, gl_TexCoord[0].st - vec2( 0, 0)) +
    texture2D(texture, gl_TexCoord[0].st - vec2(dt, 0)) +
    texture2D(texture, gl_TexCoord[0].st - vec2(dt,dt)) +
    texture2D(texture, gl_TexCoord[0].st - vec2( 0,dt)));

  // Compile and link reduction shaders into program
  const auto & compile_shader = [](const GLint type, const char * str)->GLuint
    GLuint id = glCreateShader(type);
    return id;
  GLuint vid = compile_shader(GL_VERTEX_SHADER,vertex_shader.c_str());
  GLuint fid = compile_shader(GL_FRAGMENT_SHADER,fragment_shader.c_str());
  reduction_prog_id = glCreateProgram();


Extract full resolution (original) gif image (or other media) from a power point file

Sunday, March 29th, 2015

I’d lost the original to an animated gif that I’d embedded in a previous talk’s powerpoint slide. I tried clicking “Save image as…” but this gave me a lower resolution, scaled version without the animation. Seems there is a well known trick to finding original media in modern Microsoft office files. The .*x files are actually zipped directories. So unzip them to a folder using something like:

unzip myfile.pptx -d myfile/

Then you should find your media files somewhere in this directory. I found mine in: myfile/ppt/media/.


“Blockwise”-Matrix multiply each 2d slice in 3d array

Monday, January 5th, 2015

I’m sure I’ve done this before and pretty sure I’ve also posted it here before, but I couldn’t find it.

Suppose you have to 3d-arrays:

A an M by S by K array formed for example by A = cat(3,A1,A2,A3, ... ,AK)
B an S by N by K array formed for example by B = cat(3,B1,B2,B3, ... ,BK)

and you’d like to compute a new 3d array C such that

C an M by N by K array as if formed by C = cat(3, A1 * B1, A2 * B2, ..., AK * BK)

where X*Y is the usual 2d matrix multiply. In my case, K >> M,S,N.

In matlab, a first attempt might be to write a single for loop over the last dimension of size K:

C = zeros(m,n,k);
for k = 1:K
  C(:,:,k) = A(:,:,k) * B(:,:,k);

Matlab’s notorious for loops have gotten better in the last couple years, but this is still slow.

Better is to unroll the 2D loop and take advantage of vectorized, elementwise vector-vector multiplication:

C = zeros(m,n,k);
for i = 1:M
  for j = 1:S
    for k = 1:N
      C(i,k,:) = C(i,k,:) + A(i,j,:).*B(j,k,:);

Slightly better still is to use bsxfun to compute vectorize outer-products in a single for loop

C = zeros(m,n,k);
for j = 1:size(A,2)
  C = C + bsxfun(@times,A(:,j,:),B(j,:,:));

This solution works especially well if S is small compared to the other dimensions.

For (M=S=N=3, K=10000000), these take 60 secs, 13 secs, and 9 secs respectively.

Triangle mesh for image as surface

Sunday, October 26th, 2014

Here’s a little chuck of matlab code I use to make a surface mesh out of an image:

% load image as grayscale
im = rgb2gray(imresize(im2double(imread('hans-hass.jpg')),0.75));
% create triangle mesh grid
[V,F] = create_regular_grid(size(im,2),size(im,1),0,0);
% Scale (X,Y) to fit image
V(:,1) = V(:,1)*size(im,2);
V(:,2) = (1-V(:,2))*size(im,1);
V(:,3) = im(:);

Paste directly from clipboard into matlab image array

Sunday, October 12th, 2014

I find myself often wanting to experiment with images from papers I’m reading. To load the image into matlab properly I should extract it from the pdf, save it in a file and load the file via matlab. Often I skip the first step and just take a screengrab. But I still need to create a dummy file just to load it into matlab.

Here’s a mex function to load images directly from the clipboard. It even maintains all color channels (RGB, CMYK, RGBA, etc.)

Save this in a file paste.h:

#include <cstddef>
// Get image from clipboard as an RGB image
// Outputs:
//   IM  h*w*c list of rgb pixel color values as uint8, running over height,
//     then width, then rgb channels
//   h  height
//   w  width
//   c  channels
bool paste(unsigned char *& IM,size_t & h, size_t & w, size_t & c);

and in

#import "paste.h"
#import <Foundation/Foundation.h>
#import <Cocoa/Cocoa.h>
#import <unistd.h>

bool paste(unsigned char *& IM,size_t & h, size_t & w, size_t & c)
  NSPasteboard *pasteboard = [NSPasteboard generalPasteboard];
  NSArray *classArray = [NSArray arrayWithObject:[NSImage class]];
  NSDictionary *options = [NSDictionary dictionary];
  BOOL ok = [pasteboard canReadObjectForClasses:classArray options:options]; 
    //printf("Error: clipboard doesn't seem to contain image...");
    return false;
  NSArray *objectsToPaste = [pasteboard readObjectsForClasses:classArray options:options];
  NSImage *image = [objectsToPaste objectAtIndex:0];
  NSBitmapImageRep* bitmap = [NSBitmapImageRep imageRepWithData:[image TIFFRepresentation]];
  w = [bitmap pixelsWide];
  h = [bitmap pixelsHigh];
  size_t rowBytes = [bitmap bytesPerRow];
  c = rowBytes/w;
  unsigned char* pixels = [bitmap bitmapData];
  IM = new unsigned char[w*h*c];
  for(size_t y = 0; y < h ; y++)
    for(size_t x = 0; x < w; x++)
      for(size_t d = 0;d<c;d++)
        // For some reason colors are invertex
        IM[y+h*(x+d*w)] = pixels[y*rowBytes + x*c + d];
  [image release];
  return true;

and in impaste.cpp

#ifdef MEX
#include <mex.h>
#include "paste.h"
#include <iostream>

void mexFunction(
  int nlhs, mxArray *plhs[], 
  int nrhs, const mxArray *prhs[])
  unsigned char * IM;
  size_t h,w,c;
    mexErrMsgTxt("Clipboard doesn't contain image.");
      mexErrMsgTxt("Too many output parameters.");
    case 1:
      size_t dims[] = {h,w,c};
      plhs[0] = mxCreateNumericArray(3,dims,mxUINT8_CLASS,mxREAL);
      unsigned char * IMp = (unsigned char *)mxGetData(plhs[0]);
      // Fall through
    case 0: break;


Then compile on Mac OS X using something like:

mex -v -largeArrayDims -DMEX CXX='/usr/bin/clang++' LD='/usr/bin/clang++' LDOPTIMFLAGS='-O ' LDFLAGS='\$LDFLAGS -framework Foundation -framework AppKit' -o impaste impaste.cpp

Notice, I’m forcing the mex compiler to use clang and include the relevant frameworks.

Then you can call from matlab using something like:

IM = impaste();



Interactive image comparison with sliding splitter in MATLAB

Wednesday, April 16th, 2014

Wojciech Jarosz has nice interactive comparisons on his website. I wanted to have a similar feature in matlab. Here’s a little function to do just that:

function [ph,ih] = sliding_comparison(A,B)
  % SLIDING_COMPARISON Create a figure which allows the user to mouse over and
  % move a sliding splitter to compare two images interactively.
  % Inputs:
  %   A  h by w by c left image (should be double)
  %   B  h by w by c right image (should be double)
  % Outputs:
  %   ph  handle to plot of separator line
  %   ih  handle to image line
  function C = split(A,B,x)
    x = max(min(x,size(A,2)),1);
    C = [A(:,1:round(x)-1,:) B(:,round(x):end,:)];

  function onmove(src,ev)
    % get current mouse position, and remember old one
    x = pos(1,1,1);
    if ishandle(ih) && ishandle(ph)
      % Clean up

  w = size(A,2);
  h = size(A,1);
  % `imshow` will clamp but `set(...,'CData',...)` will not
  clamp = @(X) max(min(X,1),0);
  A = clamp(A);
  B = clamp(B);

  x = w/2;
  ih = imshow(split(A,B,x));
  hold on;
    ph = plot([x;x],[0;h],'LineWidth',2);
  hold off;

  old_onmove = get(gcf,'windowbuttonmotionfcn');

So, say you have your image in im and your depth image in D then:

sliding_comparison(im,repmat(D,[1 1 3]))

Then you’ll see something like:


Photoshop-style checkboard background in MATLAB

Monday, February 10th, 2014

Here’s a gnarly one-liner to produce a checkerboard image ch with the same dimensions as you image im in matlab:

ch = repmat(1-0.2*xor((mod(repmat(0:size(im,2)-1,size(im,1),1),8*2)>7),(mod(repmat((0:size(im,1)-1)',1,size(im,2)),8*2)>7)),[1 1 3]);

This is useful for alpha-compositing. So suppose you had an alpha-map A to extract the star from this colorful image im. Then you could composite the star over ch:

imshow([ ...
  ch; ...
  repmat(A,[1 1 3]); ...
  im; ...

checkboard background matlab compositing

Update: Here’s an even gnarlier anonymous function which takes an image im and an alpha map A and produces an image of the composite over the checkboard:

checker = @(im,A) bsxfun(@times,1-A,repmat(1-0.2xor((mod(repmat(0:size(im,2)-1,size(im,1),1),82)>7),(mod(repmat((0:size(im,1)-1)’,1,size(im,2)),8*2)>7)),[1 1 3])) + bsxfun(@times,A,im);

Then you can quickly display your checkboard background images with:


checkboard image matlab

Copy image file to clipboard

Saturday, January 25th, 2014

Here’s a small objective-c program to take an image file and copy it to the clipboard on Mac OS X. Dump this in a file called impbcopy.m:

#import <Foundation/Foundation.h>
#import <Cocoa/Cocoa.h>
#import <unistd.h>
BOOL copy_to_clipboard(NSString *path)
  NSImage * image;
  if([path isEqualToString:@"-"])
    NSFileHandle *input = [NSFileHandle fileHandleWithStandardInput];
    image = [[NSImage alloc] initWithData:[input readDataToEndOfFile]];
    image =  [[NSImage alloc] initWithContentsOfFile:path];
  BOOL copied = false;
  if (image != nil)
    NSPasteboard *pasteboard = [NSPasteboard generalPasteboard];
    [pasteboard clearContents];
    NSArray *copiedObjects = [NSArray arrayWithObject:image];
    copied = [pasteboard writeObjects:copiedObjects];
    [pasteboard release];
  [image release];
  return copied;

int main(int argc, char * const argv[])
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
      "Copy file to clipboard:\n    ./impbcopy path/to/file\n\n"
      "Copy stdin to clipboard:\n    cat /path/to/file | ./impbcopy -");
    return EXIT_FAILURE;
  NSString *path= [NSString stringWithUTF8String:argv[1]];
  BOOL success = copy_to_clipboard(path);
  [pool release];
  return (success?EXIT_SUCCESS:EXIT_FAILURE);

Then compile with:

gcc -Wall -g -O3 -ObjC -framework Foundation -framework AppKit -o impbcopy impbcopy.m

And run with something like:

./impbcopy path/to/file.png

or from stdin use:

cat path/to/file.jpg | ./impbcopy -

Update: For completeness, here’s a small program to paste the clipboard image to a file:

#import <Foundation/Foundation.h>
#import <Cocoa/Cocoa.h>
#import <unistd.h>

@interface NSImage(saveAsPNGWithName)
- (void) saveAsPNGWithName:(NSString*) fileName;

@implementation NSImage(saveAsPNGWithName)

- (void) saveAsPNGWithName:(NSString*) fileName
    // Cache the reduced image
    NSData *imageData = [self TIFFRepresentation];
    NSBitmapImageRep *imageRep = [NSBitmapImageRep imageRepWithData:imageData];
    NSDictionary *imageProps = [NSDictionary dictionaryWithObject:[NSNumber numberWithFloat:1.0] forKey:NSImageCompressionFactor];
    imageData = [imageRep representationUsingType:NSPNGFileType properties:imageProps];
    [imageData writeToFile:fileName atomically:NO];        


BOOL paste_from_clipboard(NSString *path)
  NSPasteboard *pasteboard = [NSPasteboard generalPasteboard];
  NSArray *classArray = [NSArray arrayWithObject:[NSImage class]];
  NSDictionary *options = [NSDictionary dictionary];
  BOOL ok = [pasteboard canReadObjectForClasses:classArray options:options]; 
    NSArray *objectsToPaste = [pasteboard readObjectsForClasses:classArray options:options];
    NSImage *image = [objectsToPaste objectAtIndex:0];
    [image release];
    printf("Error: Clipboard doesn't seem to contain an image.\n");
  return ok;

int main(int argc, char * const argv[])
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
      "Paste clipboard to file:\n    ./impbcopy path/to/file\n\n");
    return EXIT_FAILURE;
  NSString *path= [NSString stringWithUTF8String:argv[1]];
  BOOL success = paste_from_clipboard(path);
  [pool release];
  return (success?EXIT_SUCCESS:EXIT_FAILURE);

Compile with:

clang -Wall -g -O3 -ObjC -framework Foundation -framework AppKit -o impbpaste impbpaste.m

and run:

./impbpaste foo.png

Update: could also try the command line program pngpaste which has more features.