Posts Tagged ‘string’

Parse floating point numbers from a string in ruby

Monday, April 11th, 2016

Given a string like "v 1.2 4.2342 8.2 -1.0e14" (e.g., from a .obj file), you could use the following ruby line to extract an ordered list of the contained floating point numbers:

line.scan(/[+-]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?/).collect{|s| s.to_f}

which would produce

=> [1.2, 4.2342, 8.2, -100000000000000.0]
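For comparison, here is a minimal sketch of the same extraction in C++ using std::regex. The helper name extract_floats is made up for this illustration and is not from the original post; the pattern is the one from the Ruby line, which also happens to be valid ECMAScript regex syntax.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): collect every
// floating point number found in `line`, in order of appearance.
std::vector<double> extract_floats(const std::string& line)
{
  // Same pattern as the Ruby one-liner, written as a raw string literal
  static const std::regex number(
    R"([+-]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)");
  std::vector<double> values;
  for (std::sregex_iterator it(line.begin(), line.end(), number), end;
       it != end; ++it)
  {
    // Convert the matched text to a double
    values.push_back(std::stod(it->str()));
  }
  return values;
}
```

Calling extract_floats("v 1.2 4.2342 8.2 -1.0e14") should give four values, the last one negative.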

Parse a list of numbers without hard coding size

Thursday, March 19th, 2015

I’ve been used to parsing numbers from strings with the c-style sscanf function. So if I need to read in 3 floats into a vector I’d use something like:

vector<float> v(3);
sscanf(string_to_parse,"%f %f %f",&v[0],&v[1],&v[2]);

This is fine if you know at compile time that the vector will be size 3. But if the number isn’t known at compile time, the c-style gets messy with loops.

Here’s a c++-style solution. Admittedly there are also some messy-looking loops, but I still prefer it. It also makes it easy to have a richer delimiter set.

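#include <vector>
#include <string>
#include <sstream>
  using namespace std;
  vector<double> vect;
  string delimiters = ", ";
  stringstream ss(string_to_parse);
  double v;
  // read numbers until extraction fails
  while(ss >> v)
  {
    vect.push_back(v);
    // skip over any run of delimiter characters
    while(string::npos != delimiters.find(ss.peek()))
    {
      ss.ignore();
    }
  }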
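To make the richer delimiter set concrete, here is a sketch that wraps the same loop in a function. The name parse_numbers and the default delimiter argument are assumptions made for this example. Unlike the snippet above, it checks for end of stream explicitly before looking up the peeked character.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse every number in `input`, skipping any
// run of characters found in `delimiters` between values.
std::vector<double> parse_numbers(
  const std::string& input, const std::string& delimiters = ", ")
{
  std::vector<double> values;
  std::stringstream ss(input);
  double v;
  while (ss >> v)
  {
    values.push_back(v);
    // peek() returns an int; stop at end of stream before converting it
    while (ss.peek() != std::stringstream::traits_type::eof() &&
           delimiters.find(static_cast<char>(ss.peek())) != std::string::npos)
    {
      ss.ignore();
    }
  }
  return values;
}
```

With this, parse_numbers("1.5, 2,3 , 4") and parse_numbers("1;2;3", ";") go through the same code path, and the number of values never has to be known ahead of time.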

Using embree compiled with gcc from a program compiled with clang

Friday, September 12th, 2014

I compiled Embree using my standard compiler gcc 4.7 (I’m holding out on clang for openmp support), but for a specific project I need to use clang. When I try to link against libembree.a and libsys.a I get this sort of error:

  "std::string::find(char, unsigned long) const", referenced from:
      embree::AccelRegistry::create(std::string, embree::RTCGeometry*) in libembree.a(registry_accel.o)
      embree::BuilderRegistry::build(std::string, embree::TaskScheduler::Event*, embree::Accel*) in libembree.a(registry_builder.o)
      embree::IntersectorRegistry<embree::Intersector1>::get(std::string, std::string, embree::Accel*) in libembree.a(registry_intersector.o)
      embree::IntersectorRegistry<embree::Intersector4>::get(std::string, std::string, embree::Accel*) in libembree.a(registry_intersector.o)
  "std::string::compare(char const*) const", referenced from:
      embree::AccelRegistry::create(std::string, embree::RTCGeometry*) in libembree.a(registry_accel.o)
      embree::BuilderRegistry::build(std::string, embree::TaskScheduler::Event*, embree::Accel*) in libembree.a(registry_builder.o)
      embree::IntersectorRegistry<embree::Intersector1>::get(std::string, std::string, embree::Accel*) in libembree.a(registry_intersector.o)
      embree::IntersectorRegistry<embree::Intersector4>::get(std::string, std::string, embree::Accel*) in libembree.a(registry_intersector.o)

I tried using the flag -stdlib=libc++ but this doesn’t fix the problem. Using -stdlib=libstdc++ most errors go away except for:

  "std::ctype<char>::_M_widen_init() const", referenced from:
      embree::rtcBuildAccel(embree::RTCGeometry*, char const*) in libembree.a(embree.o)
      embree::BVHStatisticsT<embree::BVH2>::print() in libembree.a(bvh2.o)
      embree::BVHStatisticsT<embree::BVH4>::print() in libembree.a(bvh4.o)
      embree::BVHBuilderT<embree::BVH4, embree::HeuristicBinning<0> >::createLeafNode(unsigned long, embree::atomic_set<embree::PrimRefBlock>&, embree::HeuristicBinning<0>::PrimInfo const&) in libembree.a(bvh4.o)
      embree::BVHBuilderT<embree::BVH4, embree::HeuristicBinning<2> >::createLeafNode(unsigned long, embree::atomic_set<embree::PrimRefBlock>&, embree::HeuristicBinning<2>::PrimInfo const&) in libembree.a(bvh4.o)
      embree::BVHBuilderT<embree::BVH4, embree::HeuristicSpatial<0> >::createLeafNode(unsigned long, embree::atomic_set<embree::PrimRefBlock>&, embree::HeuristicSpatial<0>::PrimInfo const&) in libembree.a(bvh4.o)
      embree::BVHBuilderT<embree::BVH4, embree::HeuristicSpatial<2> >::createLeafNode(unsigned long, embree::atomic_set<embree::PrimRefBlock>&, embree::HeuristicSpatial<2>::PrimInfo const&) in libembree.a(bvh4.o)

This error seems trickier. To remove it I explicitly link against the libstdc++.a found in my gcc’s libraries: add the linker argument /opt/local/lib/gcc47/libstdc++.a. This gets me close, but there’s a new error:

  "___emutls_get_address", referenced from:
      ___cxa_get_globals_fast in libstdc++.a(eh_globals.o)
      ___cxa_get_globals in libstdc++.a(eh_globals.o)

Turns out that I need another library from gcc, making the arguments now: /opt/local/lib/gcc47/libstdc++.a /opt/local/lib/gcc47/libgcc_s.1.dylib. And this finally works.

Update: Oddly, it seems only the static libstdc++.a has the final problem. Linking to the dynamic library libstdc++.dylib in the same directory works fine (maybe because I’m just avoiding the issue with this example).

Default and implicit values/arguments using boost program options

Wednesday, December 4th, 2013

I struggled to find good documentation for implementing a command line program option using boost which supports the following behaviors:

./test
./test -t
./test -t explicit

That is, a default value, an “implicit value” and an explicit value. The ./test -t case was giving me trouble and I was getting runtime errors like:

.main(65770) malloc: *** error for object 0x7fff77545860: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
#4  0x00000001002a4313 in std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::_M_erase (this=0x7fff5fbfe398, __x=0x10705f650) at basic_string.h:246

Turns out this is possible using the implicit_value() function. Here’s a small example:

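#include <iostream>
#include <string>
#include <boost/program_options.hpp>

int main(int argc, char * argv[])
{
  using namespace std;
  string test;
  namespace po = boost::program_options;
  po::options_description desc("Options");
  // default_value is used when -t is absent, implicit_value when -t has no argument
  desc.add_options()
    ("test,t",
    po::value<string>(&test)->default_value("default")->implicit_value("implicit"),
    "test option, followed by string");
  try
  {
    po::variables_map vm;
    po::store(po::parse_command_line(argc, argv,desc), vm);
    // notify writes the parsed value into test
    po::notify(vm);
  }catch(const exception & e)
  {
    cerr << e.what() << endl << endl;
    cout << desc << endl;
    return 1;
  }
  cout<<"test: "<<test<<endl;
}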

Running ./test produces test: default, ./test -t or ./test --test produces test: implicit, and ./test -texplicit, ./test -t explicit, ./test --test=explicit, or ./test --test explicit produces test: explicit.

Bug fix in AntTweakBar TwDefineEnumFromString

Wednesday, November 20th, 2013

After experiencing some odd behavior, I tracked down a bug in AntTweakBar’s TwDefineEnumFromString function. The problem seems to be that the labels in the label-value pairs stored for custom enum types are stored as const char *. The original code sets these pointers to the internal buffers of std::strings via the c_str function. This is trouble because the strings live in a local scope: once they are destroyed the pointers dangle, and the memory can be arbitrarily written over.

Without changing too much I fixed this by first copying the strings into dynamic memory. This causes a leak, but I think to fix the leak I’d have to change the implementation of the struct that stores the label-value pair and refactor.

So just change the code in TwMgr.cpp to look like:

TwType TW_CALL TwDefineEnumFromString(const char *_Name, const char *_EnumString)
{
    if (_EnumString == NULL) 
        return TwDefineEnum(_Name, NULL, 0);

    // split enumString
    stringstream EnumStream(_EnumString);
    string Label;
    vector<string> Labels;
    while( getline(EnumStream, Label, ',') ) {
        // trim Label
        size_t Start = Label.find_first_not_of(" \n\r\t");
        size_t End = Label.find_last_not_of(" \n\r\t");
        if( Start==string::npos || End==string::npos )
            Label = "";
        else
            Label = Label.substr(Start, (End-Start)+1);
        // store Label
        Labels.push_back(Label);
    }
    // create TwEnumVal array
    vector<TwEnumVal> Vals(Labels.size());
    for( int i=0; i<(int)Labels.size(); i++ )
    {
        Vals[i].Value = i;
        // CHANGE HERE:
        //Vals[i].Label = Labels[i].c_str();
        // copy the label into memory that outlives Labels (never freed)
        char * c_label = new char[Labels[i].length()+1];
        std::strcpy(c_label, Labels[i].c_str());
        Vals[i].Label = c_label;
    }

    return TwDefineEnum(_Name, Vals.empty() ? NULL : &(Vals[0]), (unsigned int)Vals.size());
}
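The copying step can be tried in isolation, without AntTweakBar. In this sketch EnumVal is a stand-in struct with the same two fields as the label-value pair, and make_enum_vals is a made-up name; neither is part of the library. It only shows that the copied labels stay readable after the source strings are gone.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Stand-in for the library's label-value pair so the sketch compiles alone
struct EnumVal
{
  int Value;
  const char* Label;
};

// Hypothetical helper: build label-value pairs whose Label pointers
// stay valid after `labels` is destroyed. Each copy is deliberately
// never freed here, mirroring the leak described above.
std::vector<EnumVal> make_enum_vals(const std::vector<std::string>& labels)
{
  std::vector<EnumVal> vals(labels.size());
  for (std::size_t i = 0; i < labels.size(); i++)
  {
    // +1 leaves room for the terminating null character
    char* c_label = new char[labels[i].length() + 1];
    std::strcpy(c_label, labels[i].c_str());
    vals[i].Value = static_cast<int>(i);
    vals[i].Label = c_label;
  }
  return vals;
}
```

Had Label been set to labels[i].c_str() directly, every pointer would dangle as soon as the caller's vector of strings went out of scope.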

Parsing optional input parameters/arguments in MATLAB function

Thursday, August 29th, 2013

Here’s some boilerplate code I use for parsing additional, optional input parameters when I write a matlab function. Usually I structure my matlab function prototypes as follows:

function C = function_name(A,B,varargin)

I comment this prototype with a message that’s very useful when issuing help function_name:

% FUNCTION_NAME This is a high-level description of what function_name takes
% as input, what it does and what it produces as output
% C = function_name(A,B)
% C = function_name(A,B,'ParameterName',ParameterValue)
% Inputs:
%   A  num_cols by num_rows matrix of blah blah blah
%   B  num_cols by num_rows matrix of blah blah blah
%   Optional:
%     'ParameterName1' followed by an integer blah blah {1}
%     'ParameterName2' followed by one of 'foo','bar',{'oof'} blah blah
% Output:
%   C  num_cols by num_rows matrix of blah blah blah

I parse the optional inputs with a while loop and switch like this:

% defaults for optional parameters
parameter_value1 = 1;
parameter_value2 = 'oof';

% parse optional input parameters
v = 1;
while v <= numel(varargin)
  switch varargin{v}
  case 'ParameterName1'
    % the value follows the parameter name
    v = v+1;
    parameter_value1 = varargin{v};
  case 'ParameterName2'
    v = v+1;
    parameter_value2 = varargin{v};
  otherwise
    error('Unsupported parameter: %s',varargin{v});
  end
  v = v+1;
end

Update: I should probably be using matlab’s built-in inputParser class, but it means that parameter names have to match variable names in my code. And variables are stored as fields in the inputParser instance: e.g. parser.Results.variable_name1. The advantage though is that it can easily handle input type validation.

Here’s an updated custom optional input parser. So far without validation, though it seems clear how to add support via function handles, e.g. @isnumeric, @islogical, @() customthing... .

  % default values
  variable_name1 = false;
  variable_name2 = [1,2,3];
  % Map of parameter names to variable names
  params_to_variables = containers.Map( ...
    {'ParamName1','ParamName2'}, ...
    {'variable_name1','variable_name2'});
  v = 1;
  while v <= numel(varargin)
    param_name = varargin{v};
    if isKey(params_to_variables,param_name)
      v = v+1;
      % Trick: use feval on anonymous function to use assignin to this workspace 
      feval(@()assignin('caller',params_to_variables(param_name),varargin{v}));
    else
      error('Unsupported parameter: %s',varargin{v});
    end
    v = v+1;
  end

Oneliner for dealing with strings in C++

Friday, October 5th, 2012

I always end up clawing at my face whenever I have to deal with strings in C++. Here are two handy macros to make life a bit easier:

// Suppose you have a function:
//   void func(const char * c);
// Then you can write:
//   func(C_STR("foo"<<1<<"bar"));
#define C_STR(X) static_cast<std::ostringstream&>(std::ostringstream().seekp(0) << X).str().c_str()
// Suppose you have a function:
//   void func(std::string c);
// Then you can write:
//   func(STR("foo"<<1<<"bar"));
#define STR(X) static_cast<std::ostringstream&>(std::ostringstream().seekp(0) << X).str()
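Here is a small sketch of the two macros in use. The consumer functions and the "frame-7.png" example are made up for illustration; the macro definitions are repeated so the snippet compiles on its own.

```cpp
#include <sstream>
#include <string>

#define C_STR(X) static_cast<std::ostringstream&>(std::ostringstream().seekp(0) << X).str().c_str()
#define STR(X) static_cast<std::ostringstream&>(std::ostringstream().seekp(0) << X).str()

// Hypothetical consumers, named only for this sketch
std::string label_from_c_string(const char* c) { return std::string(c); }
std::string label_from_string(std::string s) { return s; }

// Build e.g. "frame-7.png" through the const char * interface
std::string via_c_str(int i)
{
  return label_from_c_string(C_STR("frame-" << i << ".png"));
}

// Build the same label through the std::string interface
std::string via_str(int i)
{
  return label_from_string(STR("frame-" << i << ".png"));
}
```

One caveat with C_STR: the pointer refers to a temporary string that is destroyed at the end of the full expression. Passing it straight into a function call as above is fine, but storing it first (const char * p = C_STR(...);) leaves p dangling.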

“Free” two-line font

Thursday, March 8th, 2012

Writing SIGGRAPH rebuttals, we’re all paranoid about word count (we get 1000 words of ASCII text to refute the reviews). The usual word count tricks are often employed: contracting “do not” to “don’t”, dropping articles, dropping “et al.”s from citations. I’ve even heard of riskier tricks like combining “Reviewer #01” into “Rev01”. Most of these tricks come at the expense of clarity and professionalism, but it got me thinking about the word count game.

The word count algorithm employed by SIGGRAPH’s SIS system follows a few rules revealed through their source. The gist is that “non-alphanumerics” get replaced by spaces, then the count of words is just the number of tokens separated by whitespace. In reg-ex form non-alphanumerics are defined here to be:


Note: The only interesting thing here is that the apostrophe is OK, meaning that whereas “light-blue” counts as two words, “don’t” counts as one.
Note: There’s another slight subtlety: if an apostrophe occurs alone it is OK. This just means " ' " is 0 words and " -'- " is also 0 words.

But if non-alphanumerics show up alone, that is, never neighboring an alphanumeric character, then you get 0 words.
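The counting rule can be sketched in a few lines of C++. This is a guess at the behavior described above, not the SIS code: the exact regular expression is not reproduced, so the sketch assumes that everything except ASCII letters, digits and the straight apostrophe becomes a space, and that a token only counts if it contains at least one letter or digit. The function name count_words is made up.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Sketch of the counting rule under the assumptions stated above
int count_words(const std::string& text)
{
  // Replace everything except letters, digits and ' with a space
  std::string cleaned = text;
  for (char& c : cleaned)
  {
    if (!std::isalnum(static_cast<unsigned char>(c)) && c != '\'')
    {
      c = ' ';
    }
  }
  // Count whitespace-separated tokens containing a letter or digit
  std::istringstream tokens(cleaned);
  std::string token;
  int count = 0;
  while (tokens >> token)
  {
    for (char c : token)
    {
      if (std::isalnum(static_cast<unsigned char>(c)))
      {
        count++;
        break;
      }
    }
  }
  return count;
}
```

Under these assumptions "light-blue" counts as two words, "don't" as one, and any text built purely from punctuation counts as zero.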

This leads to a ridiculous “trick” for gaming the word count system by employing a “two-line” font composed entirely of non-alphanumeric words. Here’s a quick prototype:

/'`  /_\  |\ |   \ / /'\ | |   |_) /_  /_\  |'\
\_. /   \ | \|    |  \_/ |_|   | \ \_ /   \ |_/

''|'' |_| |  ('   [' /'\ |\ | ''|''   )
  |   | | |  _)   |  \_/ | \|   |     .

Note: The obvious danger (besides being embarrassed by actually submitting something written in this font) is that the reviewers may not see it in a correctly line-separated or monospaced, original font. In that case it just looks like garbage.

New lines in matlab, figure legends and elsewhere

Wednesday, November 16th, 2011

I tried naively to use \n in a matlab string hoping that it would come out as the newline character:

a = 'first\nsecond'

But you just get literally the 2 characters, \ and n in your string.

a =

first\nsecond

You could filter your string through sprintf:

b = sprintf('first\nsecond')

which gives you the newline:

b =

first
second

But this would cause problems if your string happened to contain any of the other sprintf symbols, like %, resulting in missing characters or errors:


Puts in the newline but removes the % sign:

ans =


Instead, it’s best to use the character code for the new line directly:

c = ['first' char(10) 'second'];

Concatenating strings in matlab is clunky but the newline shows up correctly without any side effects:

c =

first
second

You could easily make a simple replacement function that finds any \n sequences and replaces them with char(10)s with the following:

nl = @(s) strrep(s,'\n',char(10));

Then use your new function nl on your original string with \n to get the newline without worrying about concatenating:

d = nl('first\nsecond');


d =

first
second

Matlab fprintf function returning cell array of strings

Monday, June 20th, 2011

I always forget how to fprintf the output of a function that returns a cell array. For example:


Produces the error:

??? Error using ==> fprintf
Function is not defined for 'cell' inputs.

The solution for passing each string in the cell array to fprintf is simple if you can first save the output to a variable:

A = depfun('magic','-quiet');
fprintf('%s\n',A{:});

Correctly produces