Posts Tagged ‘bash’

Unwrap hard-wrapped text via command line

Monday, October 24th, 2016

I searched for a bash/sed/tr combination to unwrap text that is hard-wrapped at 80 characters per line, like:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed imperdiet felis
suscipit odio fringilla, pharetra ullamcorper felis interdum. Aenean ut mollis
est. Maecenas mattis convallis enim. Nullam eget maximus mi. Vivamus nec risus
suscipit, facilisis nunc at, eleifend massa. Aliquam erat volutpat. Aenean
malesuada velit vel libero cursus, et aliquam nibh imperdiet. Maecenas
ultrices, orci eu posuere commodo, leo diam ultricies velit, sed hendrerit odio
leo sed erat.

Pellentesque at enim id lacus tristique blandit. Duis at suscipit odio, eu
ullamcorper lorem. Interdum et malesuada fames ac ante ipsum primis in
faucibus. Sed non massa urna. Cum sociis natoque penatibus et magnis dis
parturient montes, nascetur ridiculus mus. Etiam blandit metus eget sem
consequat tincidunt. Vivamus auctor pharetra sapien non iaculis. Curabitur quis
fermentum est. Mauris laoreet augue finibus, rhoncus enim et, finibus nibh.
Praesent varius neque mi, id tempor massa facilisis eget. Nulla consectetur,
massa sed tempus laoreet, nisl purus posuere ipsum, eu gravida purus arcu nec
ante.

Pellentesque dapibus ultrices purus, et accumsan sapien ultrices a. Nulla
ultricies odio sit amet tellus tempus, et gravida dui feugiat. Aenean pretium
in lectus vitae molestie. Proin in rhoncus eros. Donec in ultricies nisi,
volutpat ultrices lacus. Suspendisse gravida hendrerit ipsum vitae feugiat.
Phasellus pharetra malesuada orci et euismod. Proin luctus nunc sit amet
gravida pulvinar. Nam quis dapibus mauris. Nulla accumsan nisl vel turpis
lobortis vulputate. Integer sem orci, lobortis ut blandit quis, consequat eget
purus. Fusce accumsan magna eu mi placerat rhoncus.

Into single lines per paragraph, like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed imperdiet felis suscipit odio fringilla, pharetra ullamcorper felis interdum. Aenean ut mollis est. Maecenas mattis convallis enim. Nullam eget maximus mi. Vivamus nec risus suscipit, facilisis nunc at, eleifend massa. Aliquam erat volutpat. Aenean malesuada velit vel libero cursus, et aliquam nibh imperdiet. Maecenas ultrices, orci eu posuere commodo, leo diam ultricies velit, sed hendrerit odio leo sed erat.

Pellentesque at enim id lacus tristique blandit. Duis at suscipit odio, eu ullamcorper lorem. Interdum et malesuada fames ac ante ipsum primis in faucibus. Sed non massa urna. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Etiam blandit metus eget sem consequat tincidunt. Vivamus auctor pharetra sapien non iaculis. Curabitur quis fermentum est. Mauris laoreet augue finibus, rhoncus enim et, finibus nibh. Praesent varius neque mi, id tempor massa facilisis eget. Nulla consectetur, massa sed tempus laoreet, nisl purus posuere ipsum, eu gravida purus arcu nec ante.

Pellentesque dapibus ultrices purus, et accumsan sapien ultrices a. Nulla ultricies odio sit amet tellus tempus, et gravida dui feugiat. Aenean pretium in lectus vitae molestie. Proin in rhoncus eros. Donec in ultricies nisi, volutpat ultrices lacus. Suspendisse gravida hendrerit ipsum vitae feugiat. Phasellus pharetra malesuada orci et euismod. Proin luctus nunc sit amet gravida pulvinar. Nam quis dapibus mauris. Nulla accumsan nisl vel turpis lobortis vulputate. Integer sem orci, lobortis ut blandit quis, consequat eget purus. Fusce accumsan magna eu mi placerat rhoncus.

This is useful, for example, when editing a plain text entry with vi that is ultimately pasted into a web form.

I couldn’t find a good Unix-tools solution, so I settled on a Python script I found. Here’s the slightly edited version I saved as unwrap:

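#!/usr/bin/env python
# Join the lines of each blank-line-separated paragraph on stdin into one line.
import sys

paragraph = []
for line in sys.stdin:
    line = line.strip()
    if line:
        paragraph.append(line)
    else:
        # blank line: emit the finished paragraph and keep the blank separator
        if paragraph:
            print(' '.join(paragraph).replace('  ', ' '))
        print('')
        paragraph = []
# emit the last paragraph if the input did not end with a blank line
if paragraph:
    print(' '.join(paragraph).replace('  ', ' '))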

Then I call it with

unwrap < my-text-file.txt
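
The same idea can also be written as a small function, which is easier to test than a stdin filter. This is only a sketch: the function name unwrap and the choice to treat any whitespace-only line as a paragraph break are assumptions for illustration.

```python
import re


def unwrap(text):
    """Join the lines of each paragraph; paragraphs are separated by blank lines."""
    # Split on blank (or whitespace-only) lines, ignoring leading/trailing whitespace.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # str.split() with no arguments also collapses runs of spaces inside a paragraph.
    return "\n\n".join(" ".join(paragraph.split()) for paragraph in paragraphs)
```

Calling unwrap(open("my-text-file.txt").read()) should give one line per paragraph, with a single blank line between paragraphs.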

Rasterize everything in pdf except text

Wednesday, October 19th, 2016

I had an issue including a PDF with transparency as a subfigure in another PDF. This led me down a dark path of trying to rasterize everything in a PDF except for the text. I tried rasterizing everything and just running OCR on top of the text, but selecting OCR-ized text is weird and the text recognition wasn’t perfect. Not to mention that it would have been a really roundabout way to solve this.

Here’s the insane pipeline I settled on:

  • open the PDF in Illustrator
  • save as input.svg, with the option “use system fonts”
  • run ./rasterize-everything-but-text.sh input.svg output.svg (see below)
  • open output.svg in Illustrator, save as raster-but-text.pdf

The bash script ./rasterize-everything-but-text.sh is itself an absurd, likely very fragile text manipulation and rasterization of the .svg files:

#!/bin/bash
#
# Usage:
#
#     rasterize-everything-but-text.sh input.svg output.svg
#
input="$1"
output="$2"
# suck out header from svg file
header=$(dos2unix < "$input" | tr '\n' '\00' | sed 's/\(.*<svg[^<]*>\).*/\1/' | tr '\00' '\n')
# grab all text tags
text=$(grep "<text.*" "$input")
# create svg file without text tags
notextsvg="no-text.svg"
notextpng="no-text.png"
grep -v "<text.*" "$input" > "$notextsvg"
# convert to png
rsvg-convert -h 1000 "$notextsvg" > "$notextpng"
# convert back to svg (containing just <image> tag)
rastersvg="raster.svg"
convert "$notextpng" "$rastersvg"
# extract body (image tag)
body=$(dos2unix < "$rastersvg" | tr '\n' '\00' | sed 's/\(.*<svg[^<]*>\)\(.*\)<\/svg>/\2/' | tr '\00' '\n')
# piece together original header, image tag, and text
echo "$header
$body
$text
</svg>" > "$output"
# Fix image tag to have same size as document
dim=$(echo "$header" | grep -o 'width=".*" height="[^"]*"' | tr '"' "'")
# (sed -i '' is the BSD/macOS form; GNU sed takes -i without the empty argument)
sed -i '' "s/\(image id=\"image0\" \)width=\".*\" height=\"[^\"]*\"/\1$dim/" "$output"

MAC Address Spoofing on Mac OS X for unlimited free hour passes on xfinitywifi and CableWiFi networks

Friday, July 8th, 2016

From what I gather, xfinity charges people to “rent” wifi routers and then uses that hardware to host pay-per-use public wifi networks. These networks are usually named xfinitywifi or CableWiFi. Every 24 hours each MAC Address is granted a “$0.00 Complimentary Free Pass”:

  1. CLICK I am not an XFINITY customer
  2. CLICK Sign Up
  3. CHOOSE $0.00 for a Complimentary Hour Pass
  4. CLICK Start Session

To “spoof” a new wifi MAC address on Mac OS X, first look up your current address by issuing:

ifconfig en0 | grep ether

This will spit out a number like: 70:51:81:c1:3f:6e. Record this number. To set your MAC address to a random yet valid address use:

sudo ifconfig en0 ether `openssl rand -hex 6 | sed 's/\(..\)/\1:/g; s/.$//'`

Then, later, if you want to return to your old address, issue:

sudo ifconfig en0 ether 70:51:81:c1:3f:6e

It seems that System Preferences > Network > Advanced > Hardware will reveal your original MAC address in case you forget it.

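You can also place these commands as aliases in your ~/.profile. The trailing space in the sudo alias makes bash expand the alias that follows it, so that sudo random_mac works: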

alias random_mac="ifconfig en0 ether \`openssl rand -hex 6 | sed 's/\(..\)/\1:/g; s/.$//'\`"
alias reset_mac="ifconfig en0 ether 70:51:81:c1:3f:6e"
alias sudo='sudo '

This all assumes en0 is your wifi interface. It might be en1 on other Macs.

Linker error on freshly brewed python install

Wednesday, May 18th, 2016

After a fresh brew install python, importing PIL in the python interpreter failed with:

from PIL import Image
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/PIL/Image.py", line 119, in <module>
    import io
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/io.py", line 51, in <module>
    import _io
ImportError: dlopen(/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so, 2): Symbol not found: __PyCodecInfo_GetIncrementalDecoder
  Referenced from: /usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so
  Expected in: flat namespace
 in /usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so

Apparently this happened because bash still remembered the location of the old python in its command hash table after brew install python. I issued:

hash -r python

to fix the problem. Simply opening a new shell would also work.

Make the most recent tex document in the current directory and open it

Wednesday, January 13th, 2016

Here’s a little bash script to compile (pdflatex, bibtex, 2*pdflatex, etc.) the most recent .tex file in your current directory that contains \begin{document} (i.e. the main document):

#!/bin/bash
if [ -z "$LMAKEFILE" ]; then
  echo "Error: didn't find LMAKEFILE environment variable"
  exit 1
fi
TEX=$( \
  grep -Il1dskip "begin{document}" *.tex | \
  xargs stat -f "%m %N" | \
  sort -rn | \
  head -n 1 | \
  sed -e "s/^[^ ]* //")
BASE="${TEX%.*}"
if [ -z "$TEX" ]; then
  echo "Error: Didn't find begin{document} in any .tex files"
  exit 1
fi
make -f "$LMAKEFILE" "$BASE" && open "$BASE.pdf"

Simply use it:

texmake
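
The file-selection step (newest .tex file that contains begin{document}) can be sketched in Python as well. This is a rough equivalent of the grep/stat/sort pipeline, not a replacement for the script; the function name newest_main_tex is made up for illustration.

```python
import os


def newest_main_tex(directory="."):
    """Return the most recently modified .tex file containing 'begin{document}', or None."""
    candidates = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not (name.endswith(".tex") and os.path.isfile(path)):
            continue
        # errors="ignore" keeps odd encodings from aborting the search
        with open(path, errors="ignore") as handle:
            if "begin{document}" in handle.read():
                candidates.append((os.path.getmtime(path), path))
    # max() compares modification times first, so the newest file wins
    return max(candidates)[1] if candidates else None
```

Included files without begin{document} are skipped even if they were edited last, which is the whole point of the grep step.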

Save As Optimized PDF using Acrobat Pro via the command line

Saturday, January 2nd, 2016

Here’s a tremendously hacky way to automate the procedure of optimizing a PDF using Acrobat Pro (with default settings) from the command line. It’s an AppleScript sending mouse clicks and keyboard signals, so don’t get too excited.

However, I do this all the time, so it will hopefully save me a lot of clicking through menus.

#!/usr/bin/osascript
on run argv
    if (count of argv) < 2 then
        do shell script "echo " & "\"optimizepdf path/to/input.pdf simple-output-name\""
    else
        set p to item 1 of argv
        set out_name to item 2 of argv
        set abs to do shell script "[[ \"" & p & "\" = /* ]] && echo \"" & p & "\" || echo \"$PWD/\"" & p & "\"\""
        set a to POSIX file abs
        tell application "Adobe Acrobat Pro"
            activate
            open a
            tell application "System Events"
                click menu item "Optimized PDF..." of ((process "Acrobat")'s (menu bar 1)'s ¬
                    (menu bar item "File")'s (menu "File")'s ¬
                    (menu item "Save As")'s (menu "Save As"))
                tell process "Acrobat"
                    keystroke return
                    keystroke out_name
                    keystroke return
                    keystroke "r" using {command down}
                end tell
            end tell
            close document 1
        end tell
    end if
end run

Then you can run this with something like:

optimizepdf path/to/input.pdf simple-output-name

Overwrite warning: this will overwrite the output file (and potentially similarly named files if the keystrokes fail or get garbled).

Oddly, it seems to work fastest if the input document is not already open in Acrobat Pro.

The code above is written for Acrobat Pro Version 10.1.16.

Update: Here’s a legacy version for Acrobat Pro Version 9.5.1

#!/usr/bin/osascript
on run argv
    if (count of argv) < 2 then
        do shell script "echo " & "\"optimizepdf path/to/input.pdf simple-output-name\""
    else
        set p to item 1 of argv
        set out_name to item 2 of argv
        set abs to do shell script "[[ \"" & p & "\" = /* ]] && echo \"" & p & "\" || echo \"$PWD/\"" & p & "\"\""
        set a to POSIX file abs
        tell application "Adobe Acrobat Pro"
            activate
            open a
            tell application "System Events"
                click menu item "PDF Optimizer..." of ((process "Acrobat")'s (menu bar 1)'s ¬
                    (menu bar item "Advanced")'s (menu "Advanced"))
                tell process "Acrobat"
                    keystroke return
                    keystroke out_name
                    keystroke return
                    keystroke "r" using {command down}
                end tell
            end tell
            close document 1
        end tell
    end if
end run

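Note: You may have to allow scripts to send keystrokes, i.e. enable GUI scripting for System Events (on recent OS X versions this is under System Preferences > Security & Privacy > Privacy > Accessibility).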

Determine how much space is used by .git/.svn/.hg in a directory

Thursday, November 12th, 2015

Here’s a nasty little bash one-liner to determine how much space is being “wasted” by .svn/ or .git/ or .hg/ repos in your current directory:

du -k | sed -nE 's/^([0-9]*).*\.(svn|git|hg)$/\1/p' | awk '{s+=$1*1024} END {print s}' | awk '{ sum=$1 ; hum[1024**3]="GB";hum[1024**2]="MB";hum[1024]="KB"; for (x=1024**3; x>=1024; x/=1024){ if (sum>=x) { printf "%.2f %s\n",sum/x,hum[x];break } }}'
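
If the one-liner is too nasty, here is a rough Python sketch of the same bookkeeping. Note the assumptions: it sums apparent file sizes rather than the disk blocks that du reports, so the numbers will differ slightly, and the names vcs_usage, dir_size and human are made up for illustration.

```python
import os

VCS_DIRS = {".git", ".svn", ".hg"}


def dir_size(path):
    """Total apparent size in bytes of the files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def vcs_usage(top="."):
    """Sum the sizes of every .git/.svn/.hg directory found below top."""
    total = 0
    for root, dirs, _files in os.walk(top):
        for name in list(dirs):
            if name in VCS_DIRS:
                total += dir_size(os.path.join(root, name))
                dirs.remove(name)  # already counted, so do not descend into it
    return total


def human(nbytes):
    """Format a byte count with two decimals, using 1024-based units up to GB."""
    for unit in ("B", "KB", "MB", "GB"):
        if nbytes < 1024 or unit == "GB":
            return "%.2f %s" % (nbytes, unit)
        nbytes /= 1024.0
```

Something like print(human(vcs_usage("."))) then plays the role of the one-liner.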

Re-order id3 track numbers of multi-disc audiobook

Thursday, September 3rd, 2015

Yesterday I was floundering trying to get iTunes and the iPhone iBooks app to iFunctionCorrectly. I have an audiobook composed of multiple mp3s ripped from multiple CDs. The track names look like:

1-01-madame-bovary-1a.mp3
1-02-madame-bovary-1b.mp3
1-03-madame-bovary-1c.mp3
...
2-01-madame-bovary-2a.mp3
2-02-madame-bovary-2b.mp3
2-03-madame-bovary-2c.mp3
...
11-01-madame-bovary-11a.mp3
11-02-madame-bovary-11b.mp3
11-03-madame-bovary-11c.mp3
...

This is already unfortunate because lexicographically they sort to:

1-01-madame-bovary-1a.mp3
1-02-madame-bovary-1b.mp3
1-03-madame-bovary-1c.mp3
...
11-01-madame-bovary-11a.mp3
11-02-madame-bovary-11b.mp3
11-03-madame-bovary-11c.mp3
...
2-01-madame-bovary-2a.mp3
2-02-madame-bovary-2b.mp3
2-03-madame-bovary-2c.mp3
...
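
To make the difference concrete, here is a small Python sketch contrasting plain lexicographic order with a “natural” order that compares the digit runs as numbers. The helper name natural_key is an assumption for illustration; it is not something iTunes or iBooks exposes.

```python
import re


def natural_key(name):
    """Split a name into text and integer runs so that disc 2 sorts before disc 11."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so items at the same position always have the same type.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


names = [
    "1-01-madame-bovary-1a.mp3",
    "11-01-madame-bovary-11a.mp3",
    "2-01-madame-bovary-2a.mp3",
]
lexicographic = sorted(names)               # disc 11 lands before disc 2
natural = sorted(names, key=natural_key)    # discs in the order 1, 2, 11
```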

iTunes deals with this reasonably well. The bigger problem was the id3 tags of these files. All files had the same artist and album. For example, the files of the first disc,

1-01-madame-bovary-1a.mp3
1-02-madame-bovary-1b.mp3
1-03-madame-bovary-1c.mp3
...

had track numbers 1/30, 2/30, 3/30 etc. and all had part number 1/11 (part 1 of an 11 part set). However, iBooks refused to sort these by part number and then track number. Instead, it sorted only by track number, giving this:

1-01-madame-bovary-1a.mp3
2-01-madame-bovary-2a.mp3
...
11-01-madame-bovary-11a.mp3
1-02-madame-bovary-1b.mp3
2-02-madame-bovary-2b.mp3
11-02-madame-bovary-11b.mp3
...
1-03-madame-bovary-1c.mp3
2-03-madame-bovary-2c.mp3
11-03-madame-bovary-11c.mp3
...

I tried converting everything into a single giant mp3 file using ffmpeg:

ls *.mp3 | sort -n | sed -e "s/^\(.*\)$/file '\1'/" > concat.txt
ffmpeg -f concat -i concat.txt -y -vn -acodec copy -threads 3 madame-bovary.mp3

or a single m4b file:

ffmpeg -f concat -i concat.txt -y -vn -acodec libfaac -ab 64k -ar 44100 -threads 3 -f mp4 madame-bovary.m4b

but these files were 900MB and 350MB and iBooks seems to choke on that size.

Finally, my solution was to re-order all of the track numbers so that they increment across parts. I achieved this with a little bash script, which I saved in track_number_explode.sh:

#!/bin/bash
USAGE="track_number_explode [path to directory of mp3s]"
if [ -z "$1" ]; then
    echo "Usage: $USAGE"
    exit 1
fi
OLD_DIR=$(pwd)
cd "$1"
MP3S=$(ls *.mp3 | sort -n)
N=$(echo -e "$MP3S" | wc -l)
# cheap way to clear leading spaces
N=$((N+0))
T="1"
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for mp3 in $MP3S; do
  id3v2 --id3v2-only -T "$T/$N" "$mp3"
  # strip id3v1 tags, as they can't handle large track numbers
  id3v2 -s "$mp3" &>/dev/null
  T=$((T+1))
done
IFS=$SAVEIFS
cd "$OLD_DIR"

Then I run this on the directory containing my mp3s:

track_number_explode.sh madame-bovary/
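
As a dry run, the renumbering can be previewed without touching any tags. The sketch below only computes the new “track/total” strings; it assumes file names start with disc-track- as in the listing above, and the function name renumber is made up for illustration.

```python
import re


def renumber(names):
    """Pair each file name with a 'track/total' string that increments across discs."""
    def disc_and_track(name):
        # assumes names such as 11-02-madame-bovary-11b.mp3
        disc, track = re.match(r"(\d+)-(\d+)-", name).groups()
        return int(disc), int(track)

    ordered = sorted(names, key=disc_and_track)
    total = len(ordered)
    return [(name, "%d/%d" % (index, total)) for index, name in enumerate(ordered, start=1)]
```

Each pair corresponds to one id3v2 -T call in the script.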

Finally I load these onto the iPhone, select all of them, right-click and choose Get Info > Options and set media kind to Audiobook.

Split a long mp3 audio file into 3 min files

Wednesday, July 29th, 2015

After some frustration trying to get mp3splt to work, I caved in and wrote a script to split apart a large audio file into many 3min chunks. Save this in mp3split.sh:

#!/bin/bash
big="$1"
duration_stamp=$(ffmpeg -i "$big" 2>&1 | grep Duration | sed 's/^.*Duration: *\([^ ,]*\),.*/\1/g')
title=$(ffmpeg -i "$big" 2>&1  | grep "title *:" | sed 's/^.*title *: *\(.*\)/\1/g')
prefix=$(basename "$big" .mp3)
echo "$duration_stamp"
# get minutes as a raw integer number (rounded up)
mins=$(echo "$duration_stamp" | sed 's/\([0-9]*\):\([0-9]*\):\([0-9]*\)\.\([0-9]*\)/\1*60+\2+\3\/60+\4\/60\/100/g' | bc -l | python -c "import math, sys; print(int(math.ceil(float(sys.stdin.readline()))))")
ss="0"
count="1"
# number of 3-minute chunks (rounded up)
total_count=$(( (mins + 2) / 3 ))
while [ "$ss" -lt "$mins" ]
do
  zcount=$(printf "%05d" "$count")
  ss_hours=$((ss / 60))
  ss_mins=$((ss % 60))
  ss_stamp=$(printf "%02d:%02d:00" "$ss_hours" "$ss_mins")
  ffmpeg -i "$big" -acodec copy -t 00:03:00 -ss "$ss_stamp" -metadata track="$count/$total_count" -metadata title="$title $zcount" "$prefix-$zcount.mp3"
  ss=$((ss + 3))
  count=$((count + 1))
done

Then execute mp3split.sh my-long-file.mp3. This will output a sequence of files:

my-long-file-00001.mp3
my-long-file-00002.mp3
my-long-file-00003.mp3
my-long-file-00004.mp3
...

Each will retain the metadata from the original file, except that the file number will be appended to the track name and the track number will be set accordingly (i.e. this will work well for splitting enormous audiobook files into file lists that play in the correct sequence on an iPhone).
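
The arithmetic behind the chunking is easy to check in isolation. Here is a Python sketch that turns an ffmpeg-style duration stamp into the list of start stamps that would be passed to -ss; the function name chunk_starts and its chunk_minutes parameter are assumptions for illustration.

```python
import math


def chunk_starts(duration_stamp, chunk_minutes=3):
    """Return 'HH:MM:00' start stamps covering a duration such as '01:07:30.50'."""
    hours, minutes, seconds = duration_stamp.split(":")
    # total length in whole minutes, rounded up so the last partial minute is covered
    total_minutes = math.ceil(int(hours) * 60 + int(minutes) + float(seconds) / 60)
    return [
        "%02d:%02d:00" % divmod(start, 60)
        for start in range(0, total_minutes, chunk_minutes)
    ]
```

The length of the returned list is the total number of output files, i.e. the denominator of the track number.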

Note: mp3splt really seems like the right tool for this. It supposedly has fancy features like silence detection and presumably won’t reload the file for each new split.

Extract full resolution (original) gif image (or other media) from a PowerPoint file

Sunday, March 29th, 2015

I’d lost the original of an animated gif that I’d embedded in a slide of a previous talk’s PowerPoint. I tried clicking “Save image as…” but this gave me a lower resolution, scaled version without the animation. It seems there is a well-known trick for finding original media in modern Microsoft Office files: the .*x files (.pptx, .docx, .xlsx) are actually zip archives. So unzip them to a folder using something like:

unzip myfile.pptx -d myfile/

Then you should find your media files somewhere in this directory. I found mine in: myfile/ppt/media/.
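
If you only want the media and not the rest of the archive, Python’s standard zipfile module can pull out just those members. This is a sketch under the assumption that the media lives in a folder named media/ inside the archive (as it did for my .pptx); the function name extract_media is made up for illustration.

```python
import zipfile


def extract_media(office_file, dest="."):
    """Extract every member stored under a media/ folder and return their names."""
    with zipfile.ZipFile(office_file) as archive:
        media = [name for name in archive.namelist()
                 if "/media/" in name and not name.endswith("/")]
        # extractall recreates the internal folder structure below dest
        archive.extractall(dest, members=media)
    return media
```

For example, extract_media("myfile.pptx", "myfile") should leave the files in myfile/ppt/media/.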
