vardhan.vishnu: how to inspect the cause of automatic rebooting of a Debian GNU/Linux Lenny server

source : debian-user-digest # 1380,1382.

[ A ] : You have several options :

[ 1 ] : attach a serial console and configure your kernel to send console messages to it. This is somewhat of a pain, but when the kernel panics, some things ( like a stack backtrace ) go ONLY to the console. There's a serial console howto on tldp.org that's a good start. Note that you'll need a console that saves what it receives ( e.g., a PC running a terminal program ). For my servers, I purchased a rather nice box from Lantronix that turns a serial port into something you can ssh into via the net ( server -> serial port -> Lantronix serial-to-ethernet box -> the net -> ssh terminal program )

note : getting this all to work is a little tricky, so be prepared for a night in the server room, and be sure to have a rescue disk. Getting the right incantations into /boot/grub/menu.lst is the hard part ( more so in my case, as I'm running Xen, which adds another level of complexity and configuration to the boot process )

you can set your machine to reboot, or not ( see [ 2 ], below ). Either way, you'll capture something on the serial terminal
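As a rough sketch of what the serial console setup above amounts to, the script below checks whether the running kernel was booted with a serial console. The helper name `has_serial_console` is made up for illustration, and the port ( ttyS0 ) and speed ( 115200 ) in the comments are assumptions; use whatever your hardware and terminal actually need.

```shell
#!/bin/sh
# Sketch only: has_serial_console is a hypothetical helper, not a standard tool.
# It returns success if a kernel command line contains a console=ttyS... entry.
has_serial_console() {
    case " $1 " in
        *" console=ttyS"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the running kernel, if /proc is available.
if [ -r /proc/cmdline ]; then
    if has_serial_console "$(cat /proc/cmdline)"; then
        echo "kernel is sending console messages to a serial port"
    else
        echo "no console=ttyS... found on the kernel command line"
    fi
fi

# Typical GRUB legacy entries in /boot/grub/menu.lst (assumed port and speed):
#   serial --unit=0 --speed=115200
#   terminal --timeout=10 serial console
# and, appended to the existing kernel line:
#   console=tty0 console=ttyS0,115200n8
```

With more than one console= entry, the last one listed becomes /dev/console, so putting the serial port last keeps panic output going there while the local screen still shows kernel messages.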

[ 2 ] : set your machine to NOT reboot on crash. That way a crash will leave you at a point where you might be able to nose around the system state with a debugger ( note : you'll need an attached console for this, preferably a serial terminal )

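take a look in /etc/sysctl.conf for lines like the ones below. kernel.panic is the number of seconds the kernel waits before rebooting after a panic ( 0 means it never reboots on its own ), and kernel.panic_on_oops = 1 turns a kernel oops into a full panic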
[sourcecode language="bash" gutter="false" autolinks="true"]
kernel.panic = 20
kernel.panic_on_oops = 1
[/sourcecode]

or nose around in /proc/sys/kernel, where the same settings show up as files ( do a google on /proc/sys/kernel for details )

see http://www.pc-freak.net/blog/how-to-automatically-reboot-restart-debian-gnu-lenny-linux-on-kernel-panic-some-general-cpu-overload-or-system-crash-2/ for some good background on how this all works
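To make the sysctl settings above concrete, here is a minimal sketch that reports how the running kernel will react to a panic. `describe_panic_timeout` is a hypothetical helper written for this example; the two /proc files are the standard ones behind kernel.panic and kernel.panic_on_oops.

```shell
#!/bin/sh
# Sketch only: describe_panic_timeout is a hypothetical helper.
# $1 is the value of kernel.panic (seconds to wait before rebooting).
describe_panic_timeout() {
    case "$1" in
        ''|*[!0-9]*) echo "unrecognised value: $1" ;;
        0) echo "no automatic reboot after a panic" ;;
        *) echo "automatic reboot $1 seconds after a panic" ;;
    esac
}

# Report the live settings, if /proc is available.
if [ -r /proc/sys/kernel/panic ]; then
    describe_panic_timeout "$(cat /proc/sys/kernel/panic)"
fi
if [ -r /proc/sys/kernel/panic_on_oops ]; then
    echo "panic_on_oops = $(cat /proc/sys/kernel/panic_on_oops)"
fi

# To stop the automatic reboot until the next boot (as root):
#   sysctl -w kernel.panic=0
# To make it permanent, set "kernel.panic = 0" in /etc/sysctl.conf and run:
#   sysctl -p
```

With kernel.panic = 0 the machine stays frozen at the panic, which is what you want while debugging but means somebody has to reset it by hand.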

[ 3 ] : install a kernel that supports crash dumps, plus a crash dump utility. Note that this is a bit of a pain in Linux, less so in BSD and Solaris based systems. With this approach you can set your machine to save a crash dump and then reboot, and you can analyze the dump at your leisure. That is sort of important if people depend on your server being up
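The answer does not say which crash dump mechanism to use. Assuming the kexec-based kdump approach ( an assumption on my part, not something the original names ), a quick check of whether a machine is set up for it could look like the sketch below. `crashkernel_reserved` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only, assuming kexec-based kdump. crashkernel_reserved is a
# hypothetical helper: it returns success if a kernel command line
# reserves memory for a capture kernel via crashkernel=...
crashkernel_reserved() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Was the running kernel booted with memory reserved for a capture kernel?
if [ -r /proc/cmdline ] && crashkernel_reserved "$(cat /proc/cmdline)"; then
    echo "crashkernel= is set: memory is reserved for a capture kernel"
else
    echo "no crashkernel= on the kernel command line"
fi

# 1 here means a capture kernel is currently loaded and would run on a panic.
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    echo "kexec_crash_loaded = $(cat /sys/kernel/kexec_crash_loaded)"
fi
```

Both conditions need to hold before a panic will actually produce a dump, so this is worth checking after every kernel upgrade.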

[ B ] : Besides software issues, there are also hardware glitches that can lead to a system reboot. Computer overheating ( beeping included ), bad RAM modules or a faulty power supply can make that "black magic" happen. Can you find a pattern in the restarts ( always at the same time, or while performing particular tasks... ) or do they happen randomly ? Maybe the logs will tell...
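One way to look for such a pattern is sketched below: list recent reboots, and scan the usual log files for messages that often show up before a crash. `scan_crash_hints` is a hypothetical helper, the keyword list is only a starting guess, and the log paths are the common Debian defaults.

```shell
#!/bin/sh
# Sketch only: scan_crash_hints is a hypothetical helper.
# $1 is the path of a log file; matching lines are printed.
scan_crash_hints() {
    grep -i -E 'oops|kernel panic|machine check|temperature|out of memory|i/o error' "$1"
}

# Scan the usual log files, showing the most recent matches from each.
for f in /var/log/kern.log /var/log/syslog /var/log/messages; do
    if [ -r "$f" ]; then
        echo "== $f"
        scan_crash_hints "$f" | tail -n 20
    fi
done

# List recent reboot and shutdown records. A "reboot" entry with no
# "shutdown" entry before it suggests a crash or power loss, not a clean restart.
if command -v last >/dev/null 2>&1; then
    last -x reboot shutdown | head -n 20
fi
```

If the reboot times line up with a cron job or a backup run, suspect load or heat. If they look random and the logs simply stop with nothing suspicious, hardware such as the power supply or RAM becomes more likely.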

This work by maniac.vardhan is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
