Data Center Issues
Report 20:32 GMT: We appear to be having a strange network issue in the Los Angeles data center. We are working on it and will update as soon as we know something.
Report 22:42 GMT: Data center fully functional.
Report 23:13 GMT: At approximately 12:55PM PST (GMT -8:00) we noticed a routing abnormality. One of our data center floors was fully operational while the other was partially inaccessible. On the floor that was having issues (the 11th floor space), many clients could connect just fine, while others, including myself, could not. We immediately had people working to identify the issue. Fifteen minutes later we posted on the forum, since the issue looked widespread and more and more tickets were coming in. We monitor all parts of our network, both internally and externally, as well as our servers, so we know when things happen.
We then began a two-pronged approach to determine what the issue was. We were looking into network changes (i.e. config changes on the switches) as well as any possible hardware problems. At 1:30PM we concluded this was a hardware issue. We felt that a distribution switch (one that feeds the switches customers are connected to) was dying. Rich was there and I asked him to run a battery of tests. After he ran the tests, which included consoling into the distribution switches, we determined that the switch was operating correctly and began checking for any code changes. At 2:15PM we grabbed a standby distribution switch (which we keep for these cases). We were then checking the code and routing tables of the distribution switches and the core network switches.
At 3:00PM, Ryan (our main network guru) logged into our core switches and determined that the hardware routing table was full, so the 11th floor routes, including the ARP routes, could not be installed into its memory. He then filtered out all routes, and 5 minutes later everything came back online. Once that was done, we waited 5 more minutes, then rebooted the core network switch and installed a route table limit of 239k routes to prevent the same issue from happening again.
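The failure mode described above can be illustrated with a toy model. This is only a sketch, not the switch vendor's behavior or the data center's actual configuration: the class `HardwareRouteTable`, the helper `accept_with_limit`, the prefix names, and the tiny capacity are all hypothetical, and the assumption that externally learned routes were what filled the table is mine, not the report's.

```python
class HardwareRouteTable:
    """Toy model of a fixed-capacity forwarding table (hypothetical names)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.routes = {}

    def install(self, prefix, next_hop):
        # Updating an existing prefix never needs a new slot.
        if prefix not in self.routes and len(self.routes) >= self.capacity:
            return False  # table full: the route is not installed
        self.routes[prefix] = next_hop
        return True


def accept_with_limit(offered, limit):
    """Keep at most `limit` routes from a feed and drop the rest."""
    return offered[:limit]


# Scaled-down numbers for illustration; the report mentions a 239k limit.
external = [(f"ext-{i}", "upstream") for i in range(6)]
internal = [("floor11-a", "dist-switch"), ("floor11-b", "dist-switch")]

# Without a limit, the external routes fill the table first,
# so the internal routes cannot be installed.
unlimited = HardwareRouteTable(capacity=5)
for prefix, hop in external:
    unlimited.install(prefix, hop)
unlimited_ok = [unlimited.install(p, h) for p, h in internal]

# With a limit below the capacity, room remains for the internal routes.
limited = HardwareRouteTable(capacity=5)
for prefix, hop in accept_with_limit(external, limit=3):
    limited.install(prefix, hop)
limited_ok = [limited.install(p, h) for p, h in internal]

print(unlimited_ok, limited_ok)
```

The point of the limit is to keep the number of accepted routes below what the hardware can hold, so that locally important routes always find a free slot.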
Note: The outage is not reflected in the Alertra report because not all access to the data center was cut off. Alertra's server was able to reach our server during the episode, which means that not all web visitors were cut off.
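To show why a single external monitor can miss this kind of partial outage, here is a minimal sketch that classifies availability from several vantage points. It does not reflect how Alertra works; the function `classify_outage` and the vantage-point names are hypothetical, and the probe results are hard-coded rather than measured.

```python
def classify_outage(probe_results):
    """Classify availability from a mapping of vantage point -> reachable (bool)."""
    if not probe_results:
        raise ValueError("need at least one probe result")
    reachable = sum(1 for ok in probe_results.values() if ok)
    if reachable == len(probe_results):
        return "up"
    if reachable == 0:
        return "down"
    return "partial"


# One monitor that happens to get through reports everything as fine.
single = classify_outage({"monitor": True})

# Adding more vantage points exposes the partial outage.
several = classify_outage({"monitor": True, "client-a": False, "client-b": True})

print(single, several)
```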
captainccs
April 13, 2007