A common pattern I find myself writing is something like the following, which divides $n$ elements into $m$ components and then processes each component:

```
% Some big array of data
X = rand(n,3);
% Labels assigning each entry to a component in [1,m] (random here)
C = randi(m,n,1);
% Loop over each component
for c = 1:max(C)
  % Get the indices of the c-th component
  Ic = find(C == c);
  % Do something with just the c-th component
  Xc = X(Ic,:);
  ...
end
```

This code is $O(n\cdot m)$ for $n$ entries in my big array and $m$ components because the `find(C == c)` line is $O(n)$.
Since for loops can be slow in MATLAB and the `% Do something` part might be pretty heavy, the cost of this `find(C == c)` often isn't felt, especially if $m$ is small.

When $m$ is big, this can be slow. One way to deal with it is to sort `C` and
process the sorted chunks. That'd be $O(n\cdot \log(n))$ for the sort, and then every
element is processed only once during the for loop:

```
% Sort the indices of C
[~,I] = sort(C);
% Cumulative lengths of each component
% (note: this assumes every label in 1:max(C) appears at least once)
cumlens = [0;find(C(I(1:end-1))~=C(I(2:end)));numel(C)];
% Loop over each component
for c = 1:max(C)
  Ic = I(cumlens(c)+1:cumlens(c+1));
  ...
end
```
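Since this chunking is exactly the kind of thing I get wrong, here's a throwaway sanity check against the naive `find` version (a sketch; it assumes every label in `1:m` actually appears in `C`, which is nearly certain for these sizes):

```matlab
% Check that the sorted-chunk indices match the naive find(C==c)
n = 1000; m = 50;
C = randi(m,n,1);
[~,I] = sort(C);
cumlens = [0;find(C(I(1:end-1))~=C(I(2:end)));numel(C)];
for c = 1:max(C)
  Ic = I(cumlens(c)+1:cumlens(c+1));
  assert(isequal(sort(Ic), find(C == c)));
end
```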

That's not too difficult once it's written out, but it's not something that I can brainlessly write without bugs 100% of the time.

An alternative is to use sparse matrices to do the sorting for us and continue using `find` to get the indices:

```
% Sparse matrix with 1's at the indices of C
S = sparse(1:n,C,1,n,m);
% Loop over each component
for c = 1:max(C)
  % Get the indices of the c-th component
  Ic = find(S(:,c));
  % Do something with just the c-th component
  Xc = X(Ic,:);
  ...
end
```

This works because MATLAB sparse matrices are stored in compressed sparse
column format. This costs us $O(n\cdot \log(n))$ when we create the sparse
matrix, but then the `find` is linear in the size of the component (i.e.,
it has output-bound complexity; just what we would like).

The pitfall of this method is remembering that the columns are compressed, not
the rows. If you try to do `S = sparse(C,1:n,1,m,n);` and `Ic = find(S(c,:));`
you'll get burned and be back to $O(n)$ inside each loop.
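A rough way to see the difference (a quick timing sketch, not a careful benchmark; shrink `n` and `m` if it crawls on your machine):

```matlab
n = 1e5; m = 1e3;
C = randi(m,n,1);
% Column-compressed: slicing a column touches only that column's entries
Scol = sparse(1:n,C,1,n,m);
% Same data transposed: slicing a row must scan across all columns
Srow = sparse(C,1:n,1,m,n);
tic; for c = 1:m, Ic = find(Scol(:,c)); end; toc
tic; for c = 1:m, Ic = find(Srow(c,:)); end; toc
```

The second loop should get dramatically slower relative to the first as `m` grows.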

A super common place I run into this is when using gptoolbox to process submeshes. I might have code like:

```
% Some mesh with O(n) vertices and faces
[V,F] = ...
% Connected component labels per face
[~,C] = connected_components(F);
% Process each component
for c = 1:max(C)
  Ic = find(C == c);
  Fc = F(Ic,:);
  % Extract submesh. (warning: this next line is O(n))
  [Vc,~,~,Fc] = remove_unreferenced(V,Fc);
  % Now Vc and Fc are both O(|Ic|). Do something with them.
  ...
end
```

There's another pitfall here; this time it's gptoolbox's fault. For legacy
reasons I've forgotten, the second output of `remove_unreferenced(V,Fc)` is `#V`-long
regardless of how few unique elements are passed in for `Fc`. This causes the whole loop to be $O(n \cdot m)$ again.

I've finally added a third, boolean argument to `remove_unreferenced` so that
you can indicate that you don't care about the second output argument and you'd
like the complexity to be $O(|Fc|)$. So the adjusted loop looks like:

```
% Sparse matrix with 1's at the indices of C
S = sparse(1:size(F,1),C,1,size(F,1),max(C));
for c = 1:max(C)
  Ic = find(S(:,c));
  Fc = F(Ic,:);
  % Extract submesh. This next line is now O(|Ic|)
  [Vc,~,~,Fc] = remove_unreferenced(V,Fc,true);
  % Now Vc and Fc are both O(|Ic|). Do something with them.
  ...
end
```