Splitting a String in C#

When I was at university programming in C, I spent a lot of time writing command-line (shell-based) programs. One of the things you often need to do in such programs is parse the command line and split it into its various options (e.g. a.out -cd -sstring would often generate an array of -c, -d and -sstring).

There was a library to do that of course (getopt), but I sometimes enjoyed writing my own, figuring I would learn more that way. Anyway, thinking back to that and to a challenge I recently read on Daily Programmer, I wondered how you could partially achieve that in C# with its excellent built-in class libraries. That led me to:

string[] words = line.Split(new char[] { ' ' });

The Split method takes a character array of delimiters, and returns an array of strings. This works perfectly on strings like:

This is a wonderful world.

But it falls down when you have sentences (strings) like this:

This    is     still   a   wonderful world.

The reason is that Split, in the form above, creates an empty string array element between each pair of adjacent delimiters (in this case, spaces). What do I mean? You basically end up with:

{ "This", "", "", "", "is", ...

and so on. To fix this, an overload of Split takes a second parameter (StringSplitOptions), an enumerated type that includes the value RemoveEmptyEntries. Adding that in, we reach a final solution of:

string[] words = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
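To see the difference between the two calls side by side, here is a small sketch. The class and method names (SplitDemo, SplitKeepingEmpties, SplitDroppingEmpties) are made up for illustration; only Split and StringSplitOptions come from the framework. The test string is built with new string(' ', 4) so the number of spaces is unambiguous.

```csharp
using System;

static class SplitDemo
{
    // Hypothetical helper: split on spaces and keep the empty entries
    // that appear between adjacent delimiters.
    public static string[] SplitKeepingEmpties(string line)
    {
        return line.Split(new char[] { ' ' });
    }

    // Hypothetical helper: split on spaces and drop the empty entries.
    public static string[] SplitDroppingEmpties(string line)
    {
        return line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // "This", four spaces, "is"
        string line = "This" + new string(' ', 4) + "is";

        // Four adjacent spaces leave three empty strings between the words.
        Console.WriteLine(SplitKeepingEmpties(line).Length);  // prints 5

        // With RemoveEmptyEntries only the two words remain.
        Console.WriteLine(SplitDroppingEmpties(line).Length); // prints 2

        // Leading and trailing delimiters also produce empty entries,
        // and those are dropped as well.
        Console.WriteLine(string.Join("|", SplitDroppingEmpties("  leading and trailing  ")));
        // prints leading|and|trailing
    }
}
```

Joining the result with a visible separator such as "|" is a handy way to check where the empty entries are (or are not).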

That’s pretty easy, but what if you wanted to roll your own? Here is my solution which doesn’t cater for multiple delimiters, nor is it overly careful about inputs. It’s merely a simple attempt at replicating the functionality in C#. Hope it’s useful.

// Requires: using System.Collections.Generic; (for List<string>)

/// <summary>
/// Split a string into a number of delimited entries, returned as a string array
/// </summary>
/// <param name="line">String representing line of text</param>
/// <param name="delimiter">Character used to delimit entries (e.g. space)</param>
/// <returns>Array of strings containing delimited entries</returns>
static string[] splitWords(string line, char delimiter)
{
    List<string> myList = new List<string>();

    // if a null object is passed in, return the empty List as a string array
    if (line == null)
        return myList.ToArray();

    int i = 0, start;
    do
    {
        // skip delimiters
        while (i < line.Length && line[i] == delimiter)
            i++;

        // provided we aren't at the end of the string, ...
        if (i < line.Length)
        {
            // make a note of where the word begins
            start = i;

            // iterate over the non-delimiters
            while (i < line.Length && line[i] != delimiter)
                i++;

            // extract the substring and add it to the List
            myList.Add(line.Substring(start, i - start));
        }
    } while (i < line.Length);

    return myList.ToArray();
}
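As noted, the version above handles a single delimiter only. If you did want several, one possible way to extend the same skip-then-collect loop is sketched below. This is just an illustration under my own assumptions: the class MultiSplit and the method SplitWordsMulti are hypothetical names, and I have chosen to return an empty array when either argument is null.

```csharp
using System;
using System.Collections.Generic;

static class MultiSplit
{
    // Hypothetical variant of splitWords that accepts several delimiters.
    public static string[] SplitWordsMulti(string line, params char[] delimiters)
    {
        List<string> words = new List<string>();

        // Assumption: a null line or null delimiter array yields an empty result.
        if (line == null || delimiters == null)
            return words.ToArray();

        int i = 0;
        while (i < line.Length)
        {
            // skip any run of delimiter characters
            while (i < line.Length && Array.IndexOf(delimiters, line[i]) >= 0)
                i++;

            // remember where the word starts, then walk over the non-delimiters
            int start = i;
            while (i < line.Length && Array.IndexOf(delimiters, line[i]) < 0)
                i++;

            // only add a word if we actually moved past at least one character
            if (i > start)
                words.Add(line.Substring(start, i - start));
        }

        return words.ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join("|", SplitWordsMulti("a, b,,c", ' ', ',')));
        // prints a|b|c
    }
}
```

Array.IndexOf does a linear scan of the delimiter array for every character, which is fine for a handful of delimiters; for a large set something like a HashSet<char> might be a better fit.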

Hi! Did you find this useful or interesting? I have an email list coming soon, but in the meantime, if you read anything you fancy chatting about, I would love to hear from you. You can contact me here or at stephen ‘at’ logicalmoon.com