Last Sunday’s extended service outage… Logging!

I promised to blog about last Sunday’s extended service outage and what caused it. First, here’s what you KNOW not to do, right?

1. Do not make a configuration change to a production service without first doing it in a development or preproduction environment. No matter HOW small. I mean it.

2. Never EVER make more than one change at a time. One line. One setting. One clustered application server. ONE. Only ONE. (Okay…CAVEAT: Unless you do it first in an identical non-production environment and document exactly what you did and in what order, so that you can do it exactly that way in production.)

So, breaking these two rules (because I didn’t do EXACTLY this in a non-production environment first), here’s what I did:

1. I swapped out a config.xml file correcting an SSL reference and removing double references to one of our Nodes. No big deal by itself.

2. I swapped out the original log4j.properties file written in SP3 with my edited version, which changed the logging for the webct.log from size-based to time-based. Here’s the log4j.properties snippet:

# A2 is set to be a file appender
# Original line commented out:
# log4j.appender.A2=org.apache.log4j.RollingFileAppender
log4j.appender.A2=org.apache.log4j.DailyRollingFileAppender
log4j.appender.A2.File=logs/webct.log
# Next line rolls the log at midnight and noon everyday
log4j.appender.R.DatePattern='.'yyyy-MM-dd-a
# log4j.appender.A2.MaxFileSize=5MB
log4j.appender.A2.MaxBackupIndex=20
log4j.appender.A2.Append=true
#log4j.appender.A2.layout=org.apache.log4j.PatternLayout
#log4j.appender.A2.layout.ConversionPattern=%d %-5p [%t] [%3x] %-17c{2} - %m\r\n
log4j.appender.A2.layout=org.apache.log4j.xml.XMLLayout

3. I also changed the webserver.log and WebCTManagedNodeN.log files in the WebLogic console from “BySize” to “ByTime” rotation.

Results:

The cluster won’t start; it throws an error about this line:
log4j.appender.A2.MaxBackupIndex=20
MaxBackupIndex is a RollingFileAppender setting, and DailyRollingFileAppender does not support it. …so, don’t do that. (And Randal Dalhoff had hinted at this in an email to me earlier in the week!)
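For comparison, here is a minimal sketch of what a time-based A2 section might look like with the size-based settings taken out. I have not run this against a Vista cluster, so treat it as a starting point to try in a non-production environment first. It assumes the DatePattern was meant for A2 (the file I deployed set it on an appender named R), and it keeps the XMLLayout from my edited version.

```properties
# A2 writes logs/webct.log and rolls by time instead of by size
log4j.appender.A2=org.apache.log4j.DailyRollingFileAppender
log4j.appender.A2.File=logs/webct.log
# Assumption: DatePattern belongs on A2 itself, not on an appender named R.
# The trailing "a" is the AM/PM marker, so the log rolls at midnight and noon.
log4j.appender.A2.DatePattern='.'yyyy-MM-dd-a
log4j.appender.A2.Append=true
# No MaxFileSize or MaxBackupIndex here: both are RollingFileAppender settings.
log4j.appender.A2.layout=org.apache.log4j.xml.XMLLayout
```

One consequence of dropping MaxBackupIndex: DailyRollingFileAppender never deletes old rolled files, so any cleanup of old webct.log copies has to happen outside log4j.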

And BEA 9.2 has a known issue with #3 (CR287029), in the HTTP logging part (webserver.log). It shows up when the extensions field (under Advanced options) is in use with a custom HTTP log file implemented in a .jar (i.e., the Bb Vista app). Changing the logging through the UI blanks out the extensions field, which causes Java null pointer exceptions, and those will also prevent your Nodes from starting. …so, don’t do that either.

Thanks to Joel Diamant-Helpern of Bb support for finding the BEA bug here.

If you do want to change your webserver.log and your WebCTManagedNodeN.log to rotate based on time rather than size, OPEN the Advanced options for HTTP and insert your ELF fields there.

These would be supported by the .jar:

date time time-taken c-ip x-weblogic.servlet.logging.ELFWebCTSession sc-status cs-method cs-uri-stem cs-uri-query bytes x-weblogic.servlet.logging.ELFWebCTExtras

One response to “Last Sunday’s extended service outage… Logging!”

  1. Wow… Thanks for #3. Our logs are read by AWStats. Apparently, the rolling by size creates a problem that rolling by date would help solve. Not being able to start up the nodes would be really bad.
