ORACLE board: June 2013

07 June 2013

How to find which process caused cluster node reboot

Determining Which Process Caused Reboot
Log File Locations for Processes Causing Reboots.
• oclskd
– /log//client/oclskd.log
• ocssd
– /var/log/messages
– /log//cssd/ocssd.log
• cssdagent
– /log//agent/ohasd/oracssdagent_root
• cssdmonitor
– /log//agent/ohasd/oracssdmonitor_root
• hangcheck-timer
– /var/log/messages

First, determine the time of the node reboot by using the uptime command and subtracting the
up time from the current system time. The reboot time will be used when examining log files.
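
The subtraction described above can be sketched in Python. This is a minimal illustration, not part of the original material: the function names are hypothetical, and it assumes a Linux node where /proc/uptime reports the seconds since boot as its first field.

```python
from datetime import datetime, timedelta


def estimate_reboot_time(now: datetime, uptime_seconds: float) -> datetime:
    """Return the boot time: the current time minus the time the node has been up."""
    return now - timedelta(seconds=uptime_seconds)


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Read the uptime in seconds from the first field of /proc/uptime (Linux only)."""
    with open(path) as f:
        return float(f.read().split()[0])
```

Calling estimate_reboot_time(datetime.now(), read_uptime_seconds()) on the rebooted node gives the approximate time to look for in the log files.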
When the OCSSD daemon is responsible for rebooting a node, a message similar to "Oracle
CSSD failure. Rebooting for cluster integrity" is written into the system messages log at
/var/log/messages. The CSSD daemon log file, located at
/log//cssd/ocssd.log, may also contain messages similar to
"Begin Dump" or "End Dump" just before the reboot.
If hangcheck-timer is being used, it will provide message logging to the system messages log
when a node restart is initiated by the module. To verify whether this process was responsible
for the node reboot, examine the /var/log/messages file and look for an error message
similar to: "Hangcheck: hangcheck is restarting the machine."
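
As a rough sketch, the search for these two messages can be automated. The function and dictionary names below are hypothetical, and because the text only says the messages are "similar to" the quoted strings, the code matches a case-insensitive substring rather than an exact line.

```python
# Message fragments quoted in the text above, keyed by the process that logs them.
SIGNATURES = {
    "ocssd": "Oracle CSSD failure",
    "hangcheck-timer": "Hangcheck: hangcheck is restarting the machine",
}


def find_reboot_culprits(lines):
    """Return (process, line) pairs for every line containing a known reboot message."""
    hits = []
    for line in lines:
        lowered = line.lower()
        for process, fragment in SIGNATURES.items():
            if fragment.lower() in lowered:
                hits.append((process, line.rstrip("\n")))
    return hits
```

Passing the lines of /var/log/messages to find_reboot_culprits, for example from an open file object, returns the matching entries in file order, which can then be compared against the reboot time.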
Other useful log files include the Clusterware alert log in /log/
and the lastgasp log in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp.
If no indication of which process caused the reboot can be determined from these files,
additional debugging and tracing may need to be enabled.
Note: The oclsomon and the oprocd background processes have been eliminated in Oracle
Database 11g Release 2.

Standby redo logs

Quote from documentation:

The standby redo logs are populated with redo information as fast as the primary redo logs, rather than waiting for the redo log to be archived and shipped to the standby database. This means that the standby redo log has more current information than the log apply mechanism because it took a "shortcut" and was written to the standby, bypassing the traditional archiving and FTP to the standby database.