Data Center Issues

Report 20:32 GMT: We appear to be having a strange network issue in the Los Angeles Datacenter. We are working on it and will update as soon as we know something.

Report 22:42 GMT: Data center fully functional.

Report 23:13 GMT: At approximately 12:55PM PST (GMT -8:00) we noticed a routing abnormality. One of our data center floors was fully operational while the other was partially inaccessible. On the floor that was having issues (the 11th floor space), many clients could connect just fine, while others, including myself, could not. We immediately had people working to identify the issue. Fifteen minutes later we posted on the forum, since the growing number of tickets made it clear the issue was widespread. We monitor all parts of our network, both internally and externally, as well as our servers, so we know when things happen.
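
As an illustration only, the kind of internal/external monitoring mentioned above can be as simple as an outside probe that repeatedly tries to reach each public-facing host and alerts when a connection fails. The hosts, ports, and thresholds below are hypothetical; the post does not describe the actual monitoring setup.

    import socket
    import time

    HOSTS = [("server1.example.com", 80), ("server2.example.com", 80)]  # hypothetical targets
    TIMEOUT = 5      # seconds before a connection attempt counts as a failure
    INTERVAL = 60    # seconds between polling rounds

    def reachable(host, port):
        """Return True if a TCP connection to host:port succeeds within TIMEOUT."""
        try:
            with socket.create_connection((host, port), timeout=TIMEOUT):
                return True
        except OSError:
            return False

    while True:
        for host, port in HOSTS:
            if not reachable(host, port):
                # In a real deployment this would page an on-call engineer or open a ticket.
                print(f"{time.strftime('%H:%M:%S')} ALERT: {host}:{port} unreachable")
        time.sleep(INTERVAL)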

We then began a two-pronged approach to determine what the issue was: we looked into network changes (i.e., config changes on the switches) as well as possible hardware problems. At 1:30PM we concluded this was a hardware issue; we believed that a distribution switch (one that feeds the switches customers are connected to) was dying. Rich was on site, and I asked him to run a battery of tests. After he ran the tests, which included consoling into the distribution switches, we determined that the switch was operating correctly and began checking for any code changes. At 2:15PM we grabbed a standby distribution switch (which we keep on hand for these cases). We were then checking the code and routing tables of the distribution switches and the core network switches.
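
A rough sketch of the kind of cross-switch comparison this stage involves: given routing-table summaries captured from each switch, flag any switch whose route count diverges sharply from its peers, which points at a device-specific fault rather than a shared configuration problem. The file names and the "Total routes: N" output format below are hypothetical stand-ins for whatever the actual switches report.

    import re

    SNAPSHOTS = {                      # hypothetical capture files
        "dist-switch-1": "dist1_route_summary.txt",
        "dist-switch-2": "dist2_route_summary.txt",
        "core-switch-1": "core1_route_summary.txt",
    }

    def route_count(path):
        """Extract a 'Total routes: N' style line from a captured summary (assumed format)."""
        with open(path) as f:
            match = re.search(r"Total routes:\s*(\d+)", f.read())
        return int(match.group(1)) if match else 0

    counts = {name: route_count(path) for name, path in SNAPSHOTS.items()}
    average = sum(counts.values()) / len(counts)

    for name, count in counts.items():
        # A count far below the peer average suggests the switch is not learning
        # or installing routes, i.e. a hardware or device-local problem.
        if count < 0.5 * average:
            print(f"{name}: {count} routes (peers average {average:.0f}) -- investigate hardware")
        else:
            print(f"{name}: {count} routes -- looks consistent with peers")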

At 3:00PM, Ryan (our main network guru) logged into our core switches and determined that the hardware routing table was full, so the switch couldn't install the 11th floor routes, including the ARP entries, into its memory. He then filtered out all routes, and 5 minutes later everything came back online. Once that was done, we waited 5 more minutes, then rebooted the core network switch and installed a limit of 239k routes on the table to prevent the same issue from ever happening again.
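
The 239k figure comes from the post; a minimal sketch of the preventive check it implies would be to poll hardware forwarding-table usage and alert well before it reaches that limit. The polling function here is a stub standing in for whatever SNMP or CLI query the platform actually provides.

    ROUTE_LIMIT = 239_000          # route limit installed on the core switch (from the post)
    WARN_THRESHOLD = 0.80          # alert at 80% of the configured limit

    def current_route_count():
        """Hypothetical stub: return the number of routes installed in hardware."""
        return 195_000             # placeholder value for illustration

    used = current_route_count()
    utilization = used / ROUTE_LIMIT

    if utilization >= WARN_THRESHOLD:
        print(f"WARNING: hardware routing table at {utilization:.0%} of the {ROUTE_LIMIT} route limit")
    else:
        print(f"Hardware routing table at {utilization:.0%} of the configured limit")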

Note: The outage is not reflected in the Alertra report because not all access to the data center was cut off. Alertra's server was able to reach our server during the episode, which means that not all web visitors were cut off.

captainccs

April 13, 2007