
How to validate traditional date formats with regular expressions

Jan Goyvaerts
Posted Sep 18 2009 12:45 PM

If you want to validate dates in the traditional formats mm/dd/yy, mm/dd/yyyy, dd/mm/yy, and dd/mm/yyyy with a simple regex that only checks whether the input looks like a date, without trying to weed out impossible dates such as February 31st, use one of the following:

Match any of these date formats, allowing leading zeros to be omitted:

^[0-3]?[0-9]/[0-3]?[0-9]/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, Javascript, PCRE, Perl, Python, Ruby

Match any of these date formats, requiring leading zeros:

^[0-3][0-9]/[0-3][0-9]/(?:[0-9][0-9])?[0-9][0-9]$
Regex options: None
Regex flavors: .NET, Java, Javascript, PCRE, Perl, Python, Ruby

Match m/d/yy and mm/dd/yyyy, allowing any combination of one or two digits for the day and month, and two or four digits for the year:

^(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, Javascript, PCRE, Perl, Python, Ruby

Match mm/dd/yyyy, requiring leading zeros:

^(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, Javascript, PCRE, Perl, Python, Ruby

Match d/m/yy and dd/mm/yyyy, allowing any combination of one or two digits for the day and month, and two or four digits for the year:

^(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, Javascript, PCRE, Perl, Python, Ruby

Match dd/mm/yyyy, requiring leading zeros:

^(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])/[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, Javascript, PCRE, Perl, Python, Ruby

Match any of these date formats with greater accuracy, allowing leading zeros to be omitted:

^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$
Regex options: None
Regex flavors: .NET, Java, Javascript, PCRE, Perl, Python, Ruby

Match any of these date formats with greater accuracy, requiring leading zeros:

^(?:(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])|(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9]))/[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, Javascript, PCRE, Perl, Python, Ruby

The free-spacing option makes these last two a bit more readable:

^(?:
  # m/d or mm/dd
  (1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
|
  # d/m or dd/mm
  (3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
)
# /yy or /yyyy
/(?:[0-9]{2})?[0-9]{2}$
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(?:
  # mm/dd
  (1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])
|
  # dd/mm
  (3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])
)
# /yyyy
/[0-9]{4}$
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

You might think that something as conceptually trivial as a date should be an easy job for a regular expression. But it isn’t, for two reasons. The first is that, because dates are such an everyday thing, humans are very sloppy with them. 4/1 may be April Fools’ Day to you. To somebody else, it may be the first working day of the year, if New Year’s Day is on a Friday. The solutions shown match some of the most common date formats.

The other issue is that regular expressions don’t deal directly with numbers. You can’t tell a regular expression to “match a number between 1 and 31”, for instance. Regular expressions work character by character. We use 3[01]|[12][0-9]|0?[1-9] to match 3 followed by 0 or 1, or to match 1 or 2 followed by any digit, or to match an optional 0 followed by 1 to 9. In character classes, we can use ranges for single digits, such as [1-9]. That’s because the characters for the digits 0 through 9 occupy consecutive positions in the ASCII and Unicode character tables.
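As a quick check of the reasoning above, the following Python sketch runs the day pattern against every one- and two-digit string and collects the numbers it accepts in full. The names DAY, candidates, and accepted are illustrative only and do not come from the recipe.

```python
import re

# Day pattern from the text: 30-31, 10-29, or 1-9 with an optional leading zero.
DAY = re.compile(r"3[01]|[12][0-9]|0?[1-9]")

# Every number from 0 to 99 without padding, plus the zero-padded forms 00-09.
candidates = [str(n) for n in range(100)] + [f"{n:02d}" for n in range(10)]

# Keep the numeric value of each string the pattern matches in its entirety.
accepted = sorted({int(s) for s in candidates if DAY.fullmatch(s)})

# True if the accepted values are exactly 1 through 31.
print(accepted == list(range(1, 32)))
```

The pattern never compares numbers. It only lists which character sequences are allowed, which is why the three alternatives are needed.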

Because of this, you have to choose how simple or how accurate you want your regular expression to be. If you already know your subject text doesn’t contain any invalid dates, you could use a trivial regex such as \d{2}/\d{2}/\d{4}. The fact that this matches things like 99/99/9999 is irrelevant if those don’t occur in the subject text. You can quickly type in this simple regex, and it will be quickly executed.
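To illustrate the trade-off, here is a minimal Python sketch of the trivial regex. The helper name looks_like_date is hypothetical, and the re.ASCII flag is an added assumption so that \d only matches the digits 0 through 9.

```python
import re

# Trivial pattern from the text. re.ASCII (an assumption added here)
# keeps \d from matching non-ASCII digits in Python 3 strings.
TRIVIAL = re.compile(r"\d{2}/\d{2}/\d{4}", re.ASCII)

def looks_like_date(text):
    """Return True if the whole string has the shape nn/nn/nnnn."""
    return TRIVIAL.fullmatch(text) is not None

for sample in ["12/31/2008", "99/99/9999", "1/2/2008"]:
    print(sample, looks_like_date(sample))
```

The second sample is accepted even though it is not a date, which is only acceptable when such input cannot occur.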

The first two solutions for this recipe are quick and simple, too, and they also match invalid dates, such as 0/0/00 and 31/31/2008. They use only literal characters for the date delimiters, character classes for the digits, and the question mark to make certain digits optional. (?:[0-9]{2})?[0-9]{2} allows the year to consist of two or four digits. [0-9]{2} matches exactly two digits. (?:[0-9]{2})? matches zero or two digits. The noncapturing group is required, because the question mark needs to apply to the character class and the quantifier {2} combined. Without the group, the question mark makes the quantifier lazy, which has no effect because {2} cannot repeat more than two times or fewer than two times. So [0-9]{2}? matches exactly two digits, just like [0-9]{2}.
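The effect of the noncapturing group can be seen by comparing the two year patterns side by side. This is a small Python sketch, and the names YEAR_GROUPED and YEAR_LAZY are made up for the illustration.

```python
import re

# With the group, the question mark makes the first two digits optional.
YEAR_GROUPED = re.compile(r"(?:[0-9]{2})?[0-9]{2}")

# Without the group, the question mark only makes {2} lazy,
# so the pattern still demands four digits.
YEAR_LAZY = re.compile(r"[0-9]{2}?[0-9]{2}")

for year in ["08", "2008", "008"]:
    grouped = YEAR_GROUPED.fullmatch(year) is not None
    lazy = YEAR_LAZY.fullmatch(year) is not None
    print(year, grouped, lazy)
```

Only the grouped form accepts a two-digit year. Neither form accepts three digits.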

Solutions 3 through 6 restrict the month to numbers between 1 and 12, and the day to numbers between 1 and 31. We use alternation inside a group to match various pairs of digits to form a range of two-digit numbers. We use capturing groups here because you’ll probably want to capture the day and month numbers anyway.
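As a sketch of how the capturing groups might be used, the following Python function applies the third solution and returns the month and day as integers. The function name month_and_day is hypothetical.

```python
import re

# Third solution from the recipe: m/d/yy through mm/dd/yyyy.
MDY = re.compile(r"^(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$")

def month_and_day(text):
    """Return (month, day) as integers, or None if the text does not match."""
    match = MDY.match(text)
    if match is None:
        return None
    # Group 1 is the month and group 2 is the day in this pattern.
    return int(match.group(1)), int(match.group(2))

for sample in ["2/29/08", "02/31/2008", "13/1/2008"]:
    print(sample, month_and_day(sample))
```

Note that 02/31/2008 still matches. The regex restricts the ranges of the month and day but does not check them against the calendar.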

The final two solutions are a little more complex, so we’re presenting these in both condensed and free-spacing form. The only difference between the two forms is readability. JavaScript does not support free-spacing. The final solutions allow all of the date formats, just like the first two examples. The difference is that the last two use an extra level of alternation to restrict month/day dates to at most 12/31 and day/month dates to at most 31/12, disallowing invalid months, such as 31/31.
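In Python, the free-spacing option corresponds to the re.VERBOSE flag. The sketch below compiles both forms of the regex that allows leading zeros to be omitted and compares their results on a few inputs. The names DATE_VERBOSE, DATE_CONDENSED, and samples are illustrative only.

```python
import re

# Free-spacing form: whitespace and # comments are ignored under re.VERBOSE.
DATE_VERBOSE = re.compile(r"""
    ^(?:
      # m/d or mm/dd
      (1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
    |
      # d/m or dd/mm
      (3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
    )
    # /yy or /yyyy
    /(?:[0-9]{2})?[0-9]{2}$
""", re.VERBOSE)

# Condensed form of the same regex on a single line.
DATE_CONDENSED = re.compile(
    r"^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|"
    r"(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$"
)

samples = ["12/31/2008", "31/12/08", "31/31/2008", "0/0/00"]
for sample in samples:
    verbose = DATE_VERBOSE.match(sample) is not None
    condensed = DATE_CONDENSED.match(sample) is not None
    print(sample, verbose, condensed)
```

Both forms should agree on every input, since the flag only changes how the pattern is written.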


Variations


If you want to search for dates in larger bodies of text instead of checking whether the input as a whole is a date, you cannot use the anchors ^ and $. Merely removing the anchors from the regular expression is not the right solution. That would allow any of these regexes to match 12/12/2001 within 9912/12/200199, for example. Instead of anchoring the regex match to the start and end of the subject, you have to specify that the date cannot be part of longer sequences of digits.

This is easily done with a pair of word boundaries. In regular expressions, digits are treated as characters that can be part of words. Replace both ^ and $ with \b. As an example:

\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}\b
Regex options: None
Regex flavors: .NET, Java, Javascript, PCRE, Perl, Python, Ruby
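The difference between dropping the anchors and replacing them with word boundaries can be sketched in Python as follows. The helper find_dates and the sample sentence are made up for the illustration.

```python
import re

# Fourth solution with the anchors simply removed.
UNANCHORED = re.compile(r"(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}")

# Same regex with word boundaries in place of the anchors.
BOUNDED = re.compile(r"\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}\b")

def find_dates(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [match.group(0) for match in pattern.finditer(text)]

# The unanchored regex finds a date inside a longer run of digits.
print(find_dates(UNANCHORED, "9912/12/200199"))
# The word boundaries reject it.
print(find_dates(BOUNDED, "9912/12/200199"))
# Dates surrounded by spaces or punctuation are still found.
print(find_dates(BOUNDED, "Due 12/31/2008, shipped 01/15/2009."))
```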

Learn more about this topic from Regular Expressions Cookbook.

This cookbook provides more than 100 recipes to help you crunch data and manipulate text with regular expressions. With recipes for popular programming languages such as C#, Java, Javascript, Perl, PHP, Python, Ruby, and VB.NET, Regular Expressions Cookbook will help you learn powerful new tricks, avoid language-specific gotchas, and save valuable time with this library of proven solutions to difficult, real-world problems.
