
How to Trim Whitespace in C#

Posted by chco, Aug 23 2010 10:50 AM

The following excerpt from Programming C# 4.0, Sixth Edition shows you how to trim extra whitespace from strings of text.
You often (but not always) want to trim whitespace from the beginning and/or end of a piece of text, especially user-provided text. When storing data in a SQL database, for example, it is frequently desirable to trim this whitespace.

With that in mind, the framework provides us with the Trim, TrimStart, and TrimEnd methods. Example 10-77 uses Trim to remove the whitespace at the start and end of every line.

Example 10-77. Trimming whitespace

foreach (string line in strings)
{
    if (line != null)
    {
        string trimmedLine = line.Trim();
        if (trimmedLine.Length != 0)
        {
            output.AppendLine(trimmedLine);
        }
        else
        {
            System.Diagnostics.Debug.WriteLine(
                "Found a blank line (after trimming)");
        }
    }
    else
    {
        System.Diagnostics.Debug.WriteLine("Found a null line");
    }
}


Notice how we’re trimming the line once, and storing a reference to the result in a variable, then using that trimmed string in our subsequent tests. Because we’re calling a method on our string instance, we need to test it for nullness before we do that, or we’ll get a null reference exception. This means that we don’t need to call IsNullOrEmpty in our later test. We know that it cannot be null. Instead, we do a quick test for emptiness. It turns out that the most efficient way to do this is not to compare against String.Empty but to check the Length of our string.

If we build and run this, we see the following output:

To be, or not to be--that is the question:
Whether 'tis nobler in the mind to suffer,
The slings and arrows of outrageous fortune ,
Or to take arms against a sea of troubles,
And by opposing end them.


And in the output window:

Found a blank line (after trimming)
Found a null line
Found a blank line (after trimming)
Found a blank line (after trimming)
Found a blank line (after trimming)
Found a blank line (after trimming)
Found a blank line (after trimming)
Found a blank line (after trimming)
Found a blank line (after trimming)


You’ll notice that Trim has successfully removed all the whitespace at the beginning and end of each line, both spaces and tab characters, but left the whitespace in the middle of the line alone.
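TrimStart and TrimEnd were mentioned earlier but not shown. As a minimal sketch (the class name, the Bracket helper, and the sample string are ours, not from the book), the following compares the three methods on one line of text. Each result is wrapped in brackets so any remaining whitespace is visible:

```csharp
using System;

public static class TrimVariants
{
    // Hypothetical helper: wraps text in brackets so that leading
    // and trailing whitespace shows up in console output.
    public static string Bracket(string text)
    {
        return "[" + text + "]";
    }

    public static void Main()
    {
        string line = "\t  To be, or not to be  \t";

        // Both ends trimmed
        Console.WriteLine(Bracket(line.Trim()));      // prints "[To be, or not to be]"

        // Only the leading whitespace removed; the trailing
        // spaces and tab are still present
        Console.WriteLine(Bracket(line.TrimStart()));

        // Only the trailing whitespace removed; the leading
        // tab and spaces are still present
        Console.WriteLine(Bracket(line.TrimEnd()));
    }
}
```

In all three cases the spaces between the words are left untouched.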

Trim isn’t limited to removing whitespace characters, though. Another overload allows us to specify the array of characters we want to trim from the beginning or end of the line. We could use this to get rid of those spurious commas, too, using the code in Example 10-78.

Example 10-78. Trimming specific characters

string trimmedLine = line.Trim(' ', '\t', ',');


This overload of Trim uses the parameter array syntax, so we can specify the characters we want to trim as a simple parameter list. In this case, we tell it to trim spaces, tabs, and commas.

Our output, then, looks like this:

To be, or not to be--that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune
Or to take arms against a sea of troubles
And by opposing end them.


Of course, although the output is correct for this particular input, it isn’t quite the same as the original Trim function—it isn’t removing all possible whitespace characters, just the ones we happened to remember to list. There are a surprising number of different characters that represent whitespace—as well as your basic ordinary space, .NET recognizes a character for an en space (one the same width as the letter N), an em space (the same width as M), a thin space, and a hair space, to name just a few. There are more than 20 of the things!
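To make the gap concrete, here is a small sketch (the class and method names and the sample string are illustrative assumptions) in which a line is padded with an en space and an em space. The explicit character list leaves both in place, whereas the parameterless Trim removes them:

```csharp
using System;

public static class WhitespaceGap
{
    // Trims only the characters we remembered to list
    public static string TrimListed(string line)
    {
        return line.Trim(' ', '\t', ',');
    }

    public static void Main()
    {
        // \u2002 is an en space, \u2003 is an em space
        string line = "\u2002 And by opposing end them.\u2003";

        string listed = TrimListed(line);
        string all = line.Trim();

        // The en space is the first character and is not in our list,
        // so trimming stops immediately at both ends
        Console.WriteLine(listed == line);                       // True

        // The parameterless overload treats them as whitespace
        Console.WriteLine(all == "And by opposing end them.");   // True
    }
}
```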

Example 10-79 shows a function that will trim all whitespace, plus any additional characters we specify.

Example 10-79. Trimming any whitespace and specific additional characters

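// Note: characters.Contains is a LINQ extension method, so the
// containing file needs a "using System.Linq;" directive.
private static string TrimWhitespaceAnd(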
    string inputString,
    params char[] characters)
{
    int start = 0;
    while (start < inputString.Length)
    {
        // If it is neither whitespace nor a character from our list
        // then we've hit the first non-trimmable character, so we can stop
        if (!char.IsWhiteSpace(inputString[start]) &&
            !characters.Contains(inputString[start]))
        {
            break;
        }
        // Work forward a character
        start++;
    }
    // Work backwards from the end
    int end = inputString.Length - 1;
    while (end >= start)
    {
        // If it is neither whitespace nor a character from our list
        // then we've hit the first non-trimmable character
        if (!char.IsWhiteSpace(inputString[end]) &&
            !characters.Contains(inputString[end]))
        {
            break;
        }
        // Work back a character
        end--;
    }
    // Work out how long our string is for the
    // substring function
    int length = (end - start) + 1;
    if (length == inputString.Length)
    {
        // If we didn't trim anything, just return the
        // input string (don't create a new one)
        return inputString;
    }
    // If the length is zero, then return the empty string
    if (length == 0)
    {
        return string.Empty;
    }
    return inputString.Substring(start, length);
}



This method works by iterating through our string, examining each character and checking to see whether it should be trimmed. If so, then we increment the start position by one character, and check the next one, until we hit a character that should not be trimmed, or the end of the string. We then do the same thing starting from the end of the string, working backward character by character until we hit a character that should not be trimmed, or pass the start point.


Note: If you wanted to write the equivalent of TrimStart or TrimEnd, you would simply leave out the end or start checking, respectively.
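As a rough sketch of that idea, here is what a TrimStart-style variant might look like. The name TrimStartWhitespaceAnd and the enclosing class are hypothetical; only the forward scan from Example 10-79 is kept, with its test folded into the loop condition:

```csharp
using System;
using System.Linq;

public static class TrimStartSketch
{
    // Hypothetical TrimStart counterpart to Example 10-79:
    // the backward scan from the end is simply left out.
    public static string TrimStartWhitespaceAnd(
        string inputString,
        params char[] characters)
    {
        int start = 0;
        // Keep moving forward while the current character is
        // whitespace or one of the extra characters to trim
        while (start < inputString.Length &&
               (char.IsWhiteSpace(inputString[start]) ||
                characters.Contains(inputString[start])))
        {
            start++;
        }
        if (start == 0)
        {
            // Nothing to trim, so don't create a new string
            return inputString;
        }
        // Substring(start) returns string.Empty when start == Length
        return inputString.Substring(start);
    }

    public static void Main()
    {
        Console.WriteLine(
            "[" + TrimStartWhitespaceAnd(" ,\tOr to take arms, ", ',') + "]");
        // prints "[Or to take arms, ]"
    }
}
```

The trailing comma and space survive because nothing examines the end of the string.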



Finally, we create our new output string by using the Substring method we looked at earlier. Notice how we’ve avoided creating strings unnecessarily; we don’t build up the results as we go along, and we don’t create new strings in the “no change” and “empty” cases. (We could have written a much shorter function if we weren’t worried about this: inputString.Trim().Trim(characters) comes close, although it is not an exact equivalent. If whitespace sits between the text and one of the extra characters, as in "fortune ,", the second call removes the comma and leaves the space behind. With two calls to Trim, we can also end up generating two new strings instead of one. You’d need to measure your code’s performance in realistic test scenarios to find out whether the more complex code in Example 10-79 is worth the effort. We’re showing it mainly to illustrate how to dig around inside a string.)
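For comparison, here is a quick sketch of the shorter two-call approach (TrimTwice and its class are hypothetical names). It also shows one behavior worth knowing about: because whitespace is trimmed first, any whitespace uncovered by the second call stays in the result.

```csharp
using System;

public static class ChainedTrim
{
    // Hypothetical helper: trim whitespace, then the listed characters.
    // Each call may allocate a new string.
    public static string TrimTwice(string input, params char[] characters)
    {
        return input.Trim().Trim(characters);
    }

    public static void Main()
    {
        // A space sits before the trailing comma
        string result = TrimTwice("  outrageous fortune ,  ", ',');

        // The comma is gone, but the space before it remains
        Console.WriteLine("[" + result + "]"); // prints "[outrageous fortune ]"
    }
}
```

The loop in Example 10-79 tests each character against both conditions, so it would keep going past that space.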

The interesting new bit of code, though, is that char.IsWhiteSpace method.
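As a small illustration (the class and method names are ours), char.IsWhiteSpace can be called on individual characters, and scanning every char value gives a rough count of how many the runtime treats as whitespace:

```csharp
using System;

public static class WhitespaceSurvey
{
    // Counts the char values that char.IsWhiteSpace accepts.
    // An int loop variable avoids overflow at char.MaxValue.
    public static int CountWhitespaceChars()
    {
        int count = 0;
        for (int code = char.MinValue; code <= char.MaxValue; code++)
        {
            if (char.IsWhiteSpace((char)code))
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(char.IsWhiteSpace(' '));      // True
        Console.WriteLine(char.IsWhiteSpace('\t'));     // True
        Console.WriteLine(char.IsWhiteSpace('\u2003')); // True (em space)
        Console.WriteLine(char.IsWhiteSpace(','));      // False

        // The exact total depends on the runtime's Unicode data
        Console.WriteLine(CountWhitespaceChars());
    }
}
```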
