
Ryu's Tech

Automatic Switch Error Detection and Recovery (Switch Errdisable & Auto Recovery)

While operating an L2 switch, I had a port drop into the link-flap err-disabled state because its link state kept changing, as the log below shows.


Oct 11 14:13:52.119 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:13:55.942 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:14:00.597 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:14:03.544 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:14:13.317 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:14:16.798 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:14:21.412 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:14:26.147 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:14:30.792 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:14:36.947 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:14:41.519 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:14:47.816 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:14:52.430 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:14:55.854 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:15:00.488 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:15:06.214 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:15:10.807 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:15:14.021 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:15:18.572 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:15:22.876 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:15:27.495 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:15:38.422 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:15:42.988 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:16:00.159 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:16:04.731 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:16:16.627 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:16:17.634 KST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:16:22.211 KST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:16:24.219 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:16:25.782 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:16:33.169 KST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:16:36.609 KST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:16:38.701 KST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:16:45.574 KST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/20, changed state to up
Oct 11 14:16:47.608 KST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/20, changed state to down
Oct 11 14:16:52.563 KST: %PM-4-ERR_DISABLE: link-flap error detected on Gi1/0/20, putting Gi1/0/20 in err-disable state

 

 

When a link keeps going up and down like this, the switch automatically takes the port down by putting it into the link-flap err-disabled state.
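To get a quick picture of how noisy a port was before it got err-disabled, the syslog lines can be tallied with a short script. This is only a sketch of my own: the regular expressions are written against the message formats visible in the log above, and the function name `summarize` is made up for illustration.

```python
import re
from collections import Counter

# Matches the %LINK-3-UPDOWN and %LINEPROTO-5-UPDOWN lines seen in the log above.
UPDOWN_RE = re.compile(
    r"%(LINK-3|LINEPROTO-5)-UPDOWN: (?:Line protocol on )?"
    r"Interface (\S+), changed state to (up|down)"
)
# Matches the %PM-4-ERR_DISABLE line that reports the err-disable cause and port.
ERRDISABLE_RE = re.compile(r"%PM-4-ERR_DISABLE: (\S+) error detected on (\S+),")


def summarize(lines):
    """Count state changes per (message type, interface) and collect err-disable events."""
    changes = Counter()
    errdisabled = []
    for line in lines:
        m = UPDOWN_RE.search(line)
        if m:
            changes[(m.group(1), m.group(2))] += 1
            continue
        m = ERRDISABLE_RE.search(line)
        if m:
            errdisabled.append((m.group(2), m.group(1)))  # (port, cause)
    return changes, errdisabled


# A few lines copied from the log above.
sample = [
    "Oct 11 14:16:24.219 KST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/20, changed state to up",
    "Oct 11 14:16:45.574 KST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/20, changed state to up",
    "Oct 11 14:16:47.608 KST: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/20, changed state to down",
    "Oct 11 14:16:52.563 KST: %PM-4-ERR_DISABLE: link-flap error detected on Gi1/0/20, putting Gi1/0/20 in err-disable state",
]

changes, errdisabled = summarize(sample)
for (msg_type, interface), count in sorted(changes.items()):
    print(f"{interface} {msg_type}: {count} state change(s)")
for port, cause in errdisabled:
    print(f"{port} err-disabled, cause: {cause}")
```

Feeding it the full log instead of the four sample lines gives the per-port totals in one pass.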

 


Switch#show errdisable flap-values
ErrDisable Reason    Flaps    Time (sec)
-----------------    ------   ----------
pagp-flap              3       30
dtp-flap               3       30
link-flap              5       10

As the link-flap row shows, the switch takes the port down like this once the port state changes 5 times within 10 seconds.
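The "5 flaps in 10 seconds" rule can be pictured as a sliding-window counter. The sketch below is only my mental model of it: how the switch actually counts flaps internally is not something the output above tells us, and the function name `first_trip` and the timestamps are hypothetical.

```python
from collections import deque


def first_trip(event_times, max_flaps=5, window_sec=10.0):
    """Return the time of the event that completes max_flaps events within
    window_sec, or None if the threshold is never reached.

    Assumption: reaching max_flaps (not exceeding it) is what trips the port.
    """
    recent = deque()
    for t in sorted(event_times):
        recent.append(t)
        # Drop events that fall outside the window ending at t.
        while t - recent[0] > window_sec:
            recent.popleft()
        if len(recent) >= max_flaps:
            return t
    return None


# Hypothetical flap timestamps in seconds (not taken from the log above).
slow = [0, 4, 9, 14, 19, 24]   # never more than 3 flaps in any 10 s window
fast = [0, 2, 4, 6, 8]         # 5 flaps inside 8 s

print(first_trip(slow))  # None
print(first_trip(fast))  # 8
```

The defaults mirror the link-flap row of show errdisable flap-values (5 flaps, 10 seconds); passing other values models a tuned threshold.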

 


Switch#show errdisable detect ?
  |  Output modifiers
  <cr>

Switch#show errdisable detect
ErrDisable Reason            Detection    Mode
-----------------            ---------    ----
arp-inspection               Enabled      port
bpduguard                    Enabled      port
channel-misconfig (STP)      Enabled      port
community-limit              Enabled      port
dhcp-rate-limit              Enabled      port
dtp-flap                     Enabled      port
gbic-invalid                 Enabled      port
inline-power                 Enabled      port
invalid-policy               Enabled      port
link-flap                    Enabled      port
loopback                     Enabled      port
lsgroup                      Enabled      port
mac-limit                    Enabled      port
pagp-flap                    Enabled      port
port-mode-failure            Enabled      port
pppoe-ia-rate-limit          Enabled      port
psecure-violation            Enabled      port/vlan
security-violation           Enabled      port
sfp-config-mismatch          Enabled      port
small-frame                  Enabled      port
storm-control                Enabled      port
udld                         Enabled      port
vmps                         Enabled      port
Switch#

 

This output shows which causes the switch detects on its own.

 


Switch(config)#errdisable detect cause ?  
  all                  Enable error detection on all cases
  arp-inspection       Enable error detection for arp inspection
  bpduguard            Enable error detection on bpdu-guard
  dhcp-rate-limit      Enable error detection on dhcp-rate-limit
  dtp-flap             Enable error detection on dtp-flapping
  gbic-invalid         Enable error detection on gbic-invalid
  inline-power         Enable error detection for inline-power
  link-flap            Enable error detection on linkstate-flapping
  loopback             Enable error detection on loopback
  pagp-flap            Enable error detection on pagp-flapping
  pppoe-ia-rate-limit  Enable error detection on PPPoE IA rate-limit
  security-violation   Enable error detection on 802.1x-guard
  sfp-config-mismatch  Enable error detection on SFP config mismatch
  small-frame          Enable error detection on small_frame             
Switch(config)#no errdisable ?
  detect        Error disable detection
  flap-setting  Error disable flap detection setting
  recovery      Error disable recovery

Switch(config)#          

 

Detection for a cause can also be turned off with the no form of the command, for example no errdisable detect cause link-flap.

 

 

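A port that has been err-disabled this way can be brought back manually by running shutdown / no shutdown on the interface, but if the underlying problem is still there, the port will of course go straight back into the err-disabled state.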

 

There is also a feature that recovers these err-disabled ports automatically.

 


Switch(config)#do sh err recovery
ErrDisable Reason            Timer Status
-----------------            --------------
arp-inspection               Disabled
bpduguard                    Disabled
channel-misconfig (STP)      Disabled
dhcp-rate-limit              Disabled
dtp-flap                     Disabled
gbic-invalid                 Disabled
inline-power                 Disabled
link-flap                    Enabled
mac-limit                    Disabled
loopback                     Disabled
pagp-flap                    Disabled
port-mode-failure            Disabled
pppoe-ia-rate-limit          Disabled
psecure-violation            Disabled
security-violation           Disabled
sfp-config-mismatch          Disabled
small-frame                  Disabled
storm-control                Disabled
udld                         Disabled
vmps                         Disabled

Timer interval: 60 seconds

Interfaces that will be enabled at the next timeout:

 

 

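In this way the switch can bring ports back automatically, per error cause.

The recovery interval can be set in seconds, from 30 seconds to 24 hours (30 to 86400). The default is 300 seconds, and recovery is disabled for every cause by default.

In my case, I enabled recovery only for link-flap and set the timer to 60 seconds.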
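If you collect this output from many switches, it is handy to pull out just the causes with recovery enabled and the timer value. Here is a rough sketch that parses text shaped like the show errdisable recovery output above; the function name `parse_recovery` is my own, and the parsing assumes the column layout shown in this post.

```python
import re

# "<cause>   Enabled|Disabled" rows; the cause may contain spaces, e.g. "channel-misconfig (STP)".
ROW_RE = re.compile(r"^(.+?)\s{2,}(Enabled|Disabled)\s*$")
TIMER_RE = re.compile(r"^Timer interval:\s*(\d+)\s*seconds")


def parse_recovery(output):
    """Return (list of causes with recovery enabled, timer interval in seconds or None)."""
    enabled = []
    interval = None
    for line in output.splitlines():
        m = ROW_RE.match(line)
        if m:
            if m.group(2) == "Enabled":
                enabled.append(m.group(1).strip())
            continue
        m = TIMER_RE.match(line)
        if m:
            interval = int(m.group(1))
    return enabled, interval


# Shortened copy of the output shown above.
sample = """\
ErrDisable Reason            Timer Status
-----------------            --------------
bpduguard                    Disabled
channel-misconfig (STP)      Disabled
link-flap                    Enabled
udld                         Disabled

Timer interval: 60 seconds
"""

enabled, interval = parse_recovery(sample)
print(enabled, interval)  # ['link-flap'] 60
```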

 

 


Switch(config)#errdisable recovery cause ?
  all                      Enable timer to recover from all error causes
  arp-inspection           Enable timer to recover from arp inspection error disable state
  bpduguard                Enable timer to recover from BPDU Guard error
  channel-misconfig (STP)  Enable timer to recover from channel misconfig error
  dhcp-rate-limit          Enable timer to recover from dhcp-rate-limit error
  dtp-flap                 Enable timer to recover from dtp-flap error
  gbic-invalid             Enable timer to recover from invalid GBIC error
  inline-power             Enable timer to recover from inline-power error
  link-flap                Enable timer to recover from link-flap error
  loopback                 Enable timer to recover from loopback error
  mac-limit                Enable timer to recover from mac limit disable state
  pagp-flap                Enable timer to recover from pagp-flap error
  port-mode-failure        Enable timer to recover from port mode change failure
  pppoe-ia-rate-limit      Enable timer to recover from PPPoE IA rate-limit error
  psecure-violation        Enable timer to recover from psecure violation error
  security-violation       Enable timer to recover from 802.1x violation error
  sfp-config-mismatch      Enable timer to recover from SFP config mismatch error
  small-frame              Enable timer to recover from small frame error
  storm-control            Enable timer to recover from storm-control error
  udld                     Enable timer to recover from udld error
  vmps                     Enable timer to recover from vmps shutdown error

Switch(config)#errdisable recovery cause

 


Switch(config)#errdisable recovery interval ?
  <30-86400>  timer-interval(sec)

Switch(config)#errdisable recovery interval

 

These are the commands used to configure recovery.
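When pushing the same recovery settings to several switches, I find it safer to generate the config lines from a small helper that checks the interval against the <30-86400> range shown in the help output. The sketch below only builds the command strings; the function name `recovery_commands` is hypothetical, and it does not check cause names against what a given platform accepts.

```python
def recovery_commands(causes, interval_sec):
    """Build 'errdisable recovery' config lines for the given causes and interval.

    interval_sec must fall in the 30-86400 range shown by
    'errdisable recovery interval ?'. Cause names are passed through as given.
    """
    if not 30 <= interval_sec <= 86400:
        raise ValueError(f"interval must be 30-86400 seconds, got {interval_sec}")
    commands = [f"errdisable recovery cause {cause}" for cause in causes]
    commands.append(f"errdisable recovery interval {interval_sec}")
    return commands


# The setup described in this post: link-flap only, 60-second timer.
for line in recovery_commands(["link-flap"], 60):
    print(line)
# errdisable recovery cause link-flap
# errdisable recovery interval 60
```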

 

Apart from recovery, you can also set the threshold used for link-flap detection.

 


WR_SS_5F-B_C2960_2(config)#errdisable ?                
  detect        Error disable detection
  flap-setting  Error disable flap detection setting
  recovery      Error disable recovery
WR_SS_5F-B_C2960_2(config)#errdisable flap-setting ?
  cause  Set error disable flap parameters for application

WR_SS_5F-B_C2960_2(config)#errdisable flap-setting cause
WR_SS_5F-B_C2960_2(config)#errdisable flap-setting cause ?
  dtp-flap   Set the variables related to detection of dtp flaps
  link-flap  Set the variables related to detection of link flaps
  pagp-flap  Set the variables related to detection of pagp flaps

WR_SS_5F-B_C2960_2(config)#errdisable flap-setting cause link
WR_SS_5F-B_C2960_2(config)#errdisable flap-setting cause link-flap ?
  max-flaps  maximum flaps allowed before setting to errdisable

WR_SS_5F-B_C2960_2(config)#errdisable flap-setting cause link-flap max
WR_SS_5F-B_C2960_2(config)#errdisable flap-setting cause link-flap max-flaps ?
  <1-100>  flap count

WR_SS_5F-B_C2960_2(config)#errdisable flap-setting cause link-flap max-flaps 10 time ?
  <1-120>  flap count time

 

WR_SS_5F-B_C2960_2(config)#errdisable flap-setting cause link-flap max-flaps 10 time 100

 

With this setting, the port is put into the err-disabled state after 10 flaps within 100 seconds.
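The order of the two numbers is easy to mix up: per the help text, max-flaps is the flap count (1-100) and time is the counting window in seconds (1-120). A small helper that validates both and spells out the meaning can guard against that. This is a sketch with names of my own choosing (`flap_setting_command`, `describe`), and the ranges are simply the ones printed in the session above.

```python
FLAP_CAUSES = ("dtp-flap", "link-flap", "pagp-flap")  # from 'errdisable flap-setting cause ?'


def flap_setting_command(cause, max_flaps, time_sec):
    """Build an 'errdisable flap-setting' config line after range-checking the values."""
    if cause not in FLAP_CAUSES:
        raise ValueError(f"unknown flap cause: {cause}")
    if not 1 <= max_flaps <= 100:
        raise ValueError(f"max-flaps must be 1-100, got {max_flaps}")
    if not 1 <= time_sec <= 120:
        raise ValueError(f"time must be 1-120 seconds, got {time_sec}")
    return f"errdisable flap-setting cause {cause} max-flaps {max_flaps} time {time_sec}"


def describe(max_flaps, time_sec):
    """Spell out what the two numbers mean."""
    return f"err-disable after {max_flaps} flaps within {time_sec} seconds"


print(flap_setting_command("link-flap", 10, 100))
# errdisable flap-setting cause link-flap max-flaps 10 time 100
print(describe(10, 100))
# err-disable after 10 flaps within 100 seconds
```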

 

 

 

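-------- One thing I forgot to capture: you can list err-disabled ports with the show interface status err command (short for show interfaces status err-disabled).

A plain show ip int brief only shows the port as down, which makes it hard to tell whether it is actually err-disabled, so checking with show int status is the better choice.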