A few commands are useful when investigating a cluster issue. The administrator first needs to identify the cluster status and determine whether the nodes are communicating.
The show chassis cluster status command, although simple, shows the administrator the status of the cluster. It shows which node is primary for each redundancy group and the status of each node, and it gives insight into which node should be passing traffic in the network. Here’s a sample:
{primary:node1}
root@SRX210-B> show chassis cluster status
Cluster ID: 1
Node      Priority  Status     Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
node0     254       secondary  no       no
node1     1         primary    no       no
Redundancy group: 1 , Failover count: 2
node0     254       primary    no       no
node1     1         secondary  no       no
{primary:node1}
root@SRX210-B>
Things to look for here: both nodes show as up, both have a priority greater than zero, both have a status of primary, secondary, or secondary-hold, and exactly one node is primary for each redundancy group. Generally, if those conditions are met, the cluster should be healthy. If one of the nodes does not show up in this output, communication with that node has been lost. The administrator should then connect to the other node and verify that it can communicate.
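The checks listed above can be sketched as a small script that reads saved command output. This is a minimal illustration, not a Juniper tool: the function name check_cluster_status and the expected node names are assumptions made for the example, and the parsing assumes the text layout shown in the sample.

```python
import re

# Statuses the text treats as healthy for a cluster member.
VALID_STATUSES = {"primary", "secondary", "secondary-hold"}

def check_cluster_status(output, expected_nodes=("node0", "node1")):
    """Return a list of problems found in 'show chassis cluster status' text.

    Hypothetical helper: it assumes the layout shown in the sample output.
    """
    problems = []
    groups = {}      # redundancy group number -> {node: (priority, status)}
    current = None
    for line in output.splitlines():
        line = line.strip()
        m = re.match(r"Redundancy group:\s*(\d+)", line)
        if m:
            current = int(m.group(1))
            groups[current] = {}
            continue
        m = re.match(r"(node\d+)\s+(\d+)\s+(\S+)", line)
        if m and current is not None:
            groups[current][m.group(1)] = (int(m.group(2)), m.group(3))

    if not groups:
        problems.append("no redundancy groups found in output")
    for rg, nodes in sorted(groups.items()):
        for name in expected_nodes:
            if name not in nodes:
                problems.append(f"RG{rg}: {name} missing from output")
        for name, (priority, status) in sorted(nodes.items()):
            if priority <= 0:
                problems.append(f"RG{rg}: {name} has priority {priority}")
            if status not in VALID_STATUSES:
                problems.append(f"RG{rg}: {name} has status {status}")
        primaries = [n for n, (_, s) in nodes.items() if s == "primary"]
        if len(primaries) != 1:
            problems.append(
                f"RG{rg}: expected one primary, found {len(primaries)}")
    return problems

# The sample output from the text, pasted in as a string.
SAMPLE = """\
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 1
node0 254 secondary no no
node1 1 primary no no
Redundancy group: 1 , Failover count: 2
node0 254 primary no no
node1 1 secondary no no
"""

if __name__ == "__main__":
    for problem in check_cluster_status(SAMPLE) or ["cluster looks OK"]:
        print(problem)
```

An empty list means every condition was met. Anything else is only a pointer to where to look next, not a diagnosis.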
To validate that the two nodes can communicate, use the show chassis cluster control-plane statistics command, which shows the messages being sent between the two members. The sent and received counters should be incrementing on both nodes. If they are not, something may be wrong with the control link, the fabric link, or both. Here is an example:
{primary:node0}
root@SRX210-A> show chassis cluster control-plane statistics
Control link statistics:
Control link 0:
Heartbeat packets sent: 124
Heartbeat packets received: 95
Heartbeat packet errors: 0
Fabric link statistics:
Probes sent: 122
Probes received: 56
Probe errors: 0
{primary:node0}
root@SRX210-A>
Again, this command should be familiar, as it has been used earlier in this chapter. If the sent and received counters are not increasing, check the fabric and control link interfaces. The method for checking the fabric interfaces is the same across all SRX products.
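Because the useful signal is whether the counters move, one way to check is to capture the command output twice, a few seconds apart, and compare the two captures. The sketch below assumes the output layout shown above. The helper names are hypothetical, and the second capture is made-up data for illustration, not output from a real device.

```python
import re

# Counters that the text says should keep incrementing.
COUNTERS = (
    "Heartbeat packets sent",
    "Heartbeat packets received",
    "Probes sent",
    "Probes received",
)

def parse_counters(output):
    """Map each 'name: value' line of the statistics output to an int."""
    stats = {}
    for line in output.splitlines():
        m = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

def stalled_counters(first, second):
    """Return the counters that are missing or did not increase."""
    a, b = parse_counters(first), parse_counters(second)
    return [name for name in COUNTERS if b.get(name, 0) <= a.get(name, 0)]

# First capture: the sample output from the text.
FIRST = """\
Control link statistics:
Control link 0:
Heartbeat packets sent: 124
Heartbeat packets received: 95
Heartbeat packet errors: 0
Fabric link statistics:
Probes sent: 122
Probes received: 56
Probe errors: 0
"""

# Second capture: invented values in which fabric probes stop arriving.
SECOND = """\
Control link statistics:
Control link 0:
Heartbeat packets sent: 134
Heartbeat packets received: 105
Heartbeat packet errors: 0
Fabric link statistics:
Probes sent: 132
Probes received: 56
Probe errors: 0
"""

if __name__ == "__main__":
    print(stalled_counters(FIRST, SECOND))
```

With these two captures the only stalled counter is "Probes received", which would point toward the fabric link rather than the control link.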
Next, let’s check the fabric links. It’s important to verify that the fabric link and its child links are in an up state:
{primary:node0}
root@SRX210-A> show interfaces terse
Interface Admin Link Proto Local Remote
--snip--
fe-0/0/4.0 up up aenet --> fab0.0
fe-0/0/5 up up
fe-0/0/5.0 up up aenet --> fab0.0
--snip--
fe-2/0/4.0 up up aenet --> fab1.0
fe-2/0/5 up up
fe-2/0/5.0 up up aenet --> fab1.0
--snip--
fab0 up up
fab0.0 up up inet 30.17.0.200/24
fab1 up up
fab1.0 up up inet 30.18.0.200/24
--snip--
{primary:node0}
root@SRX210-A>
If any of the child links of the fabric link, fabX, show a down state, that identifies the interface that is physically down on the node. It must be restored to enable communications.
The control link is the most critical to verify, and the method varies per SRX platform type. On the branch devices, the interface that is configured as the control link must be checked. The procedure is the same as for any physical interface. Here is an example from an SRX210, showing that the specified interfaces are up:
{primary:node0}
root@SRX210-A> show interfaces terse
Interface Admin Link Proto Local Remote
--snip--
fe-0/0/7 up up
--snip--
fe-2/0/7 up up
--snip--
{primary:node0}
root@SRX210-A>
On the data center SRXs, there is no direct way to check the state of the control ports. The ports are dedicated ports off switches inside the SRX, not typical interfaces, so they cannot be checked like one. It is possible, however, to check the switch on the SCB to ensure that packets are being received from that card. Generally, if the port is up and configured correctly, there should be no reason why it won’t communicate, but checking the internal switch should show that packets are passing from the SPC to the RE. Other traffic from the card is counted as well, but the statistics at least provide insight into the communication. To check, you must know the node and the FPC that holds the control port. In the following command, the specified port coincides with the FPC number of the SPC with the control port:
{primary:node0}
root@SRX5800-1> show chassis ethernet-switch statistics 1 node 0
node0:
------------------------------------------------------------------
Displaying port statistics for switch 0
Statistics for port 1 connected to device FPC1:
TX Packets 64 Octets 7636786
TX Packets 65-127 Octets 989668
TX Packets 128-255 Octets 37108
TX Packets 256-511 Octets 35685
TX Packets 512-1023 Octets 233238
TX Packets 1024-1518 Octets 374077
TX Packets 1519-2047 Octets 0
TX Packets 2048-4095 Octets 0
TX Packets 4096-9216 Octets 0
TX 1519-1522 Good Vlan frms 0
TX Octets 9306562
TX Multicast Packets 24723
TX Broadcast Packets 219029
TX Single Collision frames 0
TX Mult. Collision frames 0
TX Late Collisions 0
TX Excessive Collisions 0
TX Collision frames 0
TX PAUSEMAC Ctrl Frames 0
TX MAC ctrl frames 0
TX Frame deferred Xmns 0
TX Frame excessive deferl 0
TX Oversize Packets 0
TX Jabbers 0
TX FCS Error Counter 0
TX Fragment Counter 0
TX Byte Counter 1335951885
RX Packets 64 Octets 6672950
RX Packets 65-127 Octets 2226967
RX Packets 128-255 Octets 39459
RX Packets 256-511 Octets 34332
RX Packets 512-1023 Octets 523505
RX Packets 1024-1518 Octets 51945
RX Packets 1519-2047 Octets 0
RX Packets 2048-4095 Octets 0
RX Packets 4096-9216 Octets 0
RX Octets 9549158
RX Multicast Packets 24674
RX Broadcast Packets 364537
RX FCS Errors 0
RX Align Errors 0
RX Fragments 0
RX Symbol errors 0
RX Unsupported opcodes 0
RX Out of Range Length 0
RX False Carrier Errors 0
RX Undersize Packets 0
RX Oversize Packets 0
RX Jabbers 0
RX 1519-1522 Good Vlan frms 0
RX MTU Exceed Counter 0
RX Control Frame Counter 0
RX Pause Frame Counter 0
RX Byte Counter 999614473
{primary:node0}
root@SRX5800-1>
The output looks like standard port statistics from a switch. Looking here validates that packets are coming from the SPC. Because the SRX3000 has its control ports on the SFB, and there is nothing to configure for the control ports, there is little to look at on the interface. It is best to focus on the result from the show chassis cluster control-plane statistics command.
If checking the interfaces yields mixed results, where they seem to be up but are not passing traffic, it’s possible to reboot the node in the degraded state. The risk is that the node may come up in split brain. Because that is a possibility, it’s best to disable its interfaces, or physically disconnect all of them except the control and data links. The ports can even be disabled on the switch they are connected to. This way, if the node decides on boot that it is primary, it will not interrupt traffic. A correctly operating node using the minimal control port and fabric port configuration should be able to communicate with its peer. If, after a reboot, it still cannot communicate with the other node, verify the configuration and cabling. If those check out, the device itself or its cluster interfaces may be faulty.
Learn more about this topic from Junos Security.
