How to manipulate numbers with thousand separators using regular expressions

Jan Goyvaerts
Posted Sep 18 2009 03:55 PM

If you want to match numbers that use the comma as the thousand separator and the dot as the decimal separator, refer to the following examples:


Mandatory integer and fraction:

^[0-9]{1,3}(,[0-9]{3})*\.[0-9]+$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Mandatory integer and optional fraction. Decimal dot must be omitted if the fraction is omitted.

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Optional integer and optional fraction. Decimal dot must be omitted if the fraction is omitted.

^([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\.[0-9]+)$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
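
As a quick check of the three anchored regexes above, here is a minimal Python sketch. The regexes are copied verbatim; the constant names, the `accepts` helper, and the sample strings are hypothetical and only serve the illustration.

```python
import re

# The three anchored patterns from the examples above, copied verbatim.
MANDATORY_BOTH = re.compile(r"^[0-9]{1,3}(,[0-9]{3})*\.[0-9]+$")
OPTIONAL_FRACTION = re.compile(r"^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$")
OPTIONAL_BOTH = re.compile(r"^([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\.[0-9]+)$")


def accepts(pattern, text):
    """Return True if the anchored pattern matches the whole string."""
    return pattern.search(text) is not None


# Hypothetical sample inputs; the last two are malformed on purpose.
samples = ["1,234.56", "1,234", ".56", "1234", ",123"]

for s in samples:
    print(
        repr(s),
        accepts(MANDATORY_BOTH, s),
        accepts(OPTIONAL_FRACTION, s),
        accepts(OPTIONAL_BOTH, s),
    )
```

Note that in Python `$` also matches just before a trailing newline, so input read from a file may need to be stripped first.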

The preceding regex, edited to find the number in a larger body of text:

\b[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?\b|\.[0-9]+\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
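
To see the word-boundary version at work, the following Python sketch pulls every number out of a sentence. The regex is the one shown above; `find_numbers` and the sample sentence are made up for the example. It uses `finditer` with `group(0)` because `findall` would return the contents of the capturing groups instead of the whole match.

```python
import re

# The word-boundary regex from above, copied verbatim.
NUMBER_IN_TEXT = re.compile(r"\b[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?\b|\.[0-9]+\b")


def find_numbers(text):
    """Return every overall match, ignoring the inner capturing groups."""
    return [m.group(0) for m in NUMBER_IN_TEXT.finditer(text)]


# Hypothetical sample sentence.
sentence = "Revenue rose to 1,234,567.89 from 987,654 last year."
print(find_numbers(sentence))
```

A run of more than three digits without commas, such as 12345, is not matched at all, because no word boundary falls inside it.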

Since these are all regular expressions for matching floating-point numbers, they use the same techniques as the previous recipe. The only difference is that instead of simply matching the integer part with [0-9]+, we now use [0-9]{1,3}(,[0-9]{3})*. This regular expression matches between 1 and 3 digits, followed by zero or more groups that consist of a comma and 3 digits.
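
The integer part can be tried on its own with a short Python sketch. The name `is_grouped_integer` is hypothetical, and `fullmatch` stands in for the `^` and `$` anchors used in the full regexes.

```python
import re

# Integer part only: 1 to 3 digits, then zero or more groups of a comma and 3 digits.
INTEGER_PART = re.compile(r"[0-9]{1,3}(,[0-9]{3})*")


def is_grouped_integer(text):
    """Return True if the entire string is a correctly grouped integer."""
    return INTEGER_PART.fullmatch(text) is not None


# Hypothetical sample inputs.
for s in ["7", "12,345", "1,234,567", "1234", "12,34", "1,2345"]:
    print(repr(s), is_grouped_integer(s))
```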

We cannot use [0-9]{0,3}(,[0-9]{3})* to make the integer part optional, because that would match numbers with a leading comma, e.g., ,123. It’s the same trap of making everything optional, explained in the previous recipe. To make the integer part optional, we leave the integer part of the regex unchanged and instead make it optional as a whole. The last two regexes in the solution do this using alternation. The regex for a mandatory integer and optional fraction is alternated with a regex that matches the fraction without the integer. That yields a regex where either the integer or the fraction may be omitted, but not both.
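
The trap is easy to reproduce. In this Python sketch, `TOO_OPTIONAL` is a hypothetical variant built for the demonstration: it wraps the rejected [0-9]{0,3} integer part in anchors and adds the optional fraction. `WITH_ALTERNATION` is the anchored regex from the solution.

```python
import re

# Hypothetical "everything optional" variant, shown only to demonstrate the trap.
TOO_OPTIONAL = re.compile(r"^[0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?$")

# The alternation-based regex from the solution, copied verbatim.
WITH_ALTERNATION = re.compile(r"^([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\.[0-9]+)$")


def accepts(pattern, text):
    """Return True if the anchored pattern matches the whole string."""
    return pattern.search(text) is not None


# Hypothetical inputs: a leading comma, the empty string, and two valid numbers.
for s in [",123", "", ".5", "1,234"]:
    print(repr(s), accepts(TOO_OPTIONAL, s), accepts(WITH_ALTERNATION, s))
```

The too-optional variant accepts the leading comma and even the empty string, while the alternation rejects both and still accepts .5 and 1,234.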

Learn more about this topic from Regular Expressions Cookbook. 

This cookbook provides more than 100 recipes to help you crunch data and manipulate text with regular expressions. With recipes for popular programming languages such as C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET, Regular Expressions Cookbook will help you learn powerful new tricks, avoid language-specific gotchas, and save valuable time with this library of proven solutions to difficult, real-world problems.

1 Reply

bustrofedico
Posted Yesterday, 12:05 PM

I would like to get rid of the commas when used as a thousand separator. I have adapted your regex as follows:

(^|\s)([0-9]{1,3})(,([0-9]{3}))*(\.[0-9]+)?(\s|$)


My hope was that by saying replace with $2$4$5 I would get rid of the comma.

Unfortunately, $4 only outputs the last iteration of the repeated group, not all of them.

For example, given

1,213,233,213,121,232.32

I get

1232.32

Any brilliant solution to this?
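
One possible approach, sketched in Python (an assumption, since the thread names no language): in Python's `re` module a repeated capturing group keeps only its last iteration, which is why a fixed replacement string loses the middle groups. The sketch captures the whole integer part in a single group and strips the commas in a replacement function. The names `NUMBER` and `strip_thousand_separators` are hypothetical, and the trailing (\s|$) is changed to a lookahead so that the whitespace between two adjacent numbers is not consumed.

```python
import re

# The regex from the post above; group 4 sits inside a repeated group.
ORIGINAL = r"(^|\s)([0-9]{1,3})(,([0-9]{3}))*(\.[0-9]+)?(\s|$)"

# Assumed rewrite: group 2 now holds the whole integer part, commas included,
# and the trailing whitespace check is a lookahead instead of a capture.
NUMBER = re.compile(r"(^|\s)([0-9]{1,3}(?:,[0-9]{3})*)(\.[0-9]+)?(?=\s|$)")


def strip_thousand_separators(text):
    """Remove the commas from every number the pattern finds in text."""

    def rebuild(m):
        # Leading whitespace + integer part without commas + optional fraction.
        return m.group(1) + m.group(2).replace(",", "") + (m.group(3) or "")

    return NUMBER.sub(rebuild, text)


value = "1,213,233,213,121,232.32"
# Reproduces the problem: only the last iteration of group 4 survives.
print(re.sub(ORIGINAL, r"\2\4\5", value))
# The function-based replacement keeps every digit group.
print(strip_thousand_separators(value))
```

In flavors without replacement callbacks, a similar effect usually takes two steps: match the number first, then remove the commas from the matched text.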