
How to Match IPv4 Addresses with Regular Expressions

Jan Goyvaerts
Posted Sep 30 2009 01:11 PM

If you want to check whether a certain string represents a valid IPv4 address in 255.255.255.255 notation, try one of these examples from Regular Expressions Cookbook:

Simple regex to check for an IP address:

^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Accurate regex to check for an IP address:

^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Simple regex to extract IP addresses from longer text:

\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Accurate regex to extract IP addresses from longer text:

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Simple regex that captures the four parts of the IP address:

^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Accurate regex that captures the four parts of the IP address:

^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Perl

# Capture the four numbers, then pack them into one 32-bit integer.
if ($subject =~ m/^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$/)
{
    $ip = $1 << 24 | $2 << 16 | $3 << 8 | $4;
}

A version 4 IP address is usually written in the form 255.255.255.255, where each of the four numbers must be between 0 and 255. Matching such IP addresses with a regular expression is very straightforward.

In the solution we present six regular expressions. Three of them are billed as “simple,” while the other three are marked “accurate.”

The simple regexes use [0-9]{1,3} to match each of the four blocks of digits in the IP address. These actually allow numbers from 0 to 999 rather than 0 to 255. The simple regexes are more efficient when you already know your input will contain only valid IP addresses, and you only need to separate the IP addresses from the other stuff.
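To make that trade-off concrete, here is a minimal sketch in Python, one of the flavors listed above. The variable name simple is only for illustration. It shows the simple regex accepting blocks that are well outside the 0 to 255 range.

```python
import re

# Simple pattern from the solution: each block is one to three digits.
simple = re.compile(r"^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$")

print(bool(simple.match("192.168.1.1")))      # True: a well-formed address
print(bool(simple.match("999.999.999.999")))  # True: blocks are not range-checked
print(bool(simple.match("1.2.3")))            # False: only three blocks
```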

The accurate regexes use 25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]? to match each of the four numbers in the IP address. This regex accurately matches a number in the range 0 to 255, with one optional leading zero for numbers between 10 and 99, and two optional leading zeros for numbers between 0 and 9. 25[0-5] matches 250 through 255, 2[0-4][0-9] matches 200 to 249, and [01]?[0-9][0-9]? takes care of 0 to 199, including the optional leading zeros.
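One way to convince yourself of that range is to test the number pattern on its own. The sketch below, in Python, anchors the pattern with ^ and $ (an addition made here so that it must match the whole string) and tries every value from 0 to 999. The name octet is just a label chosen for this example.

```python
import re

# Number pattern from the accurate regexes, anchored to match a whole string.
octet = re.compile(r"^(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$")

# Try every value from 0 to 999, written without padding.
accepted = [n for n in range(1000) if octet.match(str(n))]
print(accepted == list(range(256)))  # True: exactly 0 through 255

# Leading zeros are accepted as long as the total width stays at three digits or fewer.
samples = ["7", "07", "007", "42", "042", "0042", "256", "300"]
print([s for s in samples if octet.match(s)])  # ['7', '07', '007', '42', '042']
```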

If you want to check whether a string is a valid IP address in its entirety, use one of the regexes that begin with a caret and end with a dollar. These are the start-of-string and end-of-string anchors. If you want to find IP addresses within longer text, use one of the regexes that begin and end with the word boundaries \b.
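The difference between the two kinds of anchoring can be sketched in Python as follows. The sample log line and the names validate and extract are made up for this illustration.

```python
import re

octet = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Anchored version: the whole string must be an IP address.
validate = re.compile(r"^(?:" + octet + r"\.){3}" + octet + r"$")
# Word-boundary version: finds addresses embedded in longer text.
extract = re.compile(r"\b(?:" + octet + r"\.){3}" + octet + r"\b")

log_line = "Accepted connection from 10.0.0.7, forwarded to 192.168.218.201:8080"

print(bool(validate.match(log_line)))      # False: the line is more than an address
print(bool(validate.match("10.0.0.7")))    # True
print(extract.findall(log_line))           # ['10.0.0.7', '192.168.218.201']
```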

The first four regular expressions use the form (?:number\.){3}number. The first three numbers in the IP address are matched by a noncapturing group that is repeated three times. The group matches a number and a literal dot, of which there are three in an IP address. The last part of the regex matches the final number in the IP address. Using the noncapturing group and repeating it three times makes our regular expression shorter and more efficient.

To convert the textual representation of the IP address into an integer, we need to capture the four numbers separately. The last two regexes in the solution do this. Instead of using the trick of repeating a group three times, they have four capturing groups, one for each number. Spelling things out this way is the only way we can separately capture all four numbers in the IP address.
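A short Python sketch shows why the repeated group is not enough here. In Python's re module, a capturing group inside a repeated group keeps only the text from its last iteration, so the first two numbers are lost. The names repeated and spelled_out are illustrative only.

```python
import re

# A capturing group inside a repeated group: group 1 keeps only its final iteration.
repeated = re.compile(r"^(?:([0-9]{1,3})\.){3}([0-9]{1,3})$")
# Four explicit capturing groups, as in the solution.
spelled_out = re.compile(r"^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$")

address = "192.168.218.201"
print(repeated.match(address).groups())     # ('218', '201')
print(spelled_out.match(address).groups())  # ('192', '168', '218', '201')
```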

Once we’ve captured the numbers, combining them into a 32-bit number is easy. In Perl, the special variables $1, $2, $3, and $4 hold the text matched by the four capturing groups in the regular expression, and these string variables are automatically coerced into numbers when we apply the bitwise left shift operator (<<) to them. In other languages, you may have to call String.toInteger() or something similar before you can shift the numbers and combine them with a bitwise or.
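As an example of the "other languages" case, here is a rough Python equivalent of the Perl snippet, using the accurate capturing regex. Python does not coerce strings to numbers, so each captured group goes through int() first. The helper name ip_to_int is hypothetical, and returning None for non-matching input is a choice made for this sketch.

```python
import re

# Accurate regex with four capturing groups, split across lines for readability.
ip_parts = re.compile(
    r"^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\."
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\."
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\."
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

def ip_to_int(text):
    """Return the address as a 32-bit integer, or None if text does not match."""
    match = ip_parts.match(text)
    if match is None:
        return None
    # int() parses each captured string as a decimal number.
    a, b, c, d = (int(part) for part in match.groups())
    return a << 24 | b << 16 | c << 8 | d

print(ip_to_int("192.168.218.201"))  # 3232291529
print(ip_to_int("256.1.1.1"))        # None
```

Note that int() reads a block such as 010 as decimal 10, which matches how the regexes in this recipe treat leading zeros.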

Regular Expressions Cookbook

Learn more about this topic from Regular Expressions Cookbook.

This cookbook provides more than 100 recipes to help you crunch data and manipulate text with regular expressions. With recipes for popular programming languages such as C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET, Regular Expressions Cookbook will help you learn powerful new tricks, avoid language-specific gotchas, and save valuable time with this library of proven solutions to difficult, real-world problems.



1 Reply

Posted Sep 06 2013 08:28 AM
The solutions that you present seem to be the standard reply to this question on the Web, but they do not take into account that a leading 0 in a byte signifies that the byte is expressed as an octal number, not a decimal number. For example, in Windows 7:

>ping 0300.168.218.201

Pinging 192.168.218.201 with 32 bytes of data:
Reply from 192.168.218.201: bytes=32 time=1ms TTL=252
Reply from 192.168.218.201: bytes=32 time=3ms TTL=252
Reply from 192.168.218.201: bytes=32 time=1ms TTL=252
Reply from 192.168.218.201: bytes=32 time=2ms TTL=252

Similarly, on Solaris 10:

> ping -s 0300.168.218.201
PING 0300.168.218.201: 56 data bytes
64 bytes from mephisto (192.168.218.201): icmp_seq=0. time=1.03 ms
64 bytes from mephisto (192.168.218.201): icmp_seq=1. time=7.30 ms
64 bytes from mephisto (192.168.218.201): icmp_seq=2. time=4.97 ms

Certainly, the use of octal numbers in IP addresses is rare, but it is allowed.

A corollary to this is that 192.168.214.095 is not a valid IP address, because the leading 0 marks 095 as octal and 9 is not an octal digit. Since it is not valid, Windows 7 and Solaris 10 seem to react to it differently.

On Windows 7, it seems to be taken as a host name, which cannot be resolved:
>ping 192.168.214.95
Pinging 192.168.214.95 with 32 bytes of data:
Reply from 192.168.214.95: bytes=32 time=1ms TTL=61
Reply from 192.168.214.95: bytes=32 time=1ms TTL=61
...
>ping 192.168.214.095
Ping request could not find host 192.168.214.095. Please check the name and try again.

On Solaris 10, it appears that Solaris resolved the IP address to something and sent it packets:
> ping 192.168.214.95
192.168.214.95 is alive
> ping 192.168.214.095
no answer from 192.168.214.095
> ping -s 192.168.214.095
PING 192.168.214.095: 56 data bytes
^C
----192.168.214.095 PING Statistics----
451 packets transmitted, 0 packets received, 100% packet loss

Regards,
Jim