Rescuing a Headless Server with python -m http.server

This is the late-night story of the most bizarre troubleshooting solution I've encountered.

The Problem

Late at night, I was doing some routine updates to my headless Media Server / NAS. Since there was a kernel update, a reboot was in order. A couple of seconds later I try to reconnect to the server via ssh to verify all was well, and I get the dreaded message:

ssh: connect to host XXX.XXX.XXX.XXX port 22: Connection refused

Well shit...

Headless no more

I think to myself, great... I should be getting to bed soon, but something went sideways with the update. Thankfully the server itself lives in the guest bedroom, so I pick up a spare keyboard and monitor and drag them there. I find an unused HDMI cable and plug it in...

Nothing.

I think to myself, OK, let's see if rebooting helps, so I hard reboot the server. Not even the UEFI boot logo. Great.

At this point, I have no ssh access, no console access (that I can see) and a keyboard.

First I check the router and DHCP server: the server appears to have networking, and ping seems to confirm this. But nmap shows all ports closed, when there should be a lot of open ports with services on them.
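If nmap isn't handy, a few lines of Python can do a crude version of the same "is anything listening?" check. This is only a sketch: is_port_open is a made-up helper name, and it tests a single TCP port by attempting a connection, which is far less thorough than a real nmap scan.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; a rough stand-in for checking
    one port with nmap, not a replacement for a real scan.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling is_port_open with the server's address and port 22 would return False in the situation described above, since ssh was refusing connections.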

I think, well, worst case I can try to log in to the server and issue commands blindly. First, a clean shutdown. I blindly type in my non-root credentials, issue sudo shutdown -h now, type my password, and a couple of seconds later: clean shutdown!

Success, I thought. All isn't lost.

Driving Blind

So now we can issue some commands, but we're driving blind here.

Next I try to issue the eject command, to see some other kind of physical response to my commands. Do the same song and dance: Log in, issue a command, try to see some response.

It worked!

Now at this point I knew the following:

  • I have network access, but no SSH
  • I am able to issue commands as root thanks to sudo and my login credentials

So my immediate thought was, let's see if we have python!

A long time ago, I learned a neat trick about Python: it has a built-in HTTP server you can use to serve your current directory. In the past it has been a handy tool for easily transferring files between VMs or computers on a local network.

My theory was, if I can start a python HTTP Server I might be able to get some kind of output from the machine. Progress!

Again typing blindly, I enter python -m http.server, then check from another machine with nmap whether any ports opened up. To my surprise, port 8000 (the module's default) is now open.
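For anyone curious what that one-liner amounts to, here is a rough equivalent written out as a script. It is a sketch under a few assumptions: Python 3.7 or newer, and a helper name (serve_directory) invented for this example. Like the command-line version, it serves a directory listing over plain HTTP with no authentication, so it only belongs on a trusted local network.

```python
import functools
import http.server
import threading


def serve_directory(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve `directory` over HTTP in a background thread and return the server.

    Hypothetical helper; it mirrors what `python -m http.server` does for the
    current directory, which listens on port 8000 unless told otherwise.
    """
    # Bind the request handler to the directory we want to expose.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # An empty host string means "listen on all interfaces".
    server = http.server.ThreadingHTTPServer(("", port), handler)
    # Run the request loop in a daemon thread so the caller is not blocked.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call server.shutdown() and then server.server_close() on the returned object to stop serving. On the command line, pressing Ctrl+C does the same job.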

Success!

Not so blind after all

Upon opening my browser and pointing it at the machine, I noticed the first strange thing: there were barely any files listed. My normal user had a full home directory, but this one was bare! Crap, was there a filesystem error? When was the last backup run!?!?

I took a step back and said, OK, how can I get more details out of here? Let's run some commands and redirect their output to files I can read via the HTTP server.

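So I did that: ran whoami and df -h and checked their output. I was root (which would explain the bare directory listing, since I was presumably looking at root's home rather than my own), and df -h showed all partitions as normal, except for a bind mount used as an SFTP jail. (But I didn't notice that it was missing at first.)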
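The post doesn't record the exact commands typed, but plain shell redirection along the lines of whoami > whoami.txt is the obvious way to do it (that form is an assumption). The same capture-to-a-file idea can be sketched in Python, with dump_command as a hypothetical helper name:

```python
import subprocess
from pathlib import Path


def dump_command(args: list, out_path: str) -> int:
    """Run a command and write its combined stdout/stderr to out_path.

    Hypothetical helper for illustration. Returns the command's exit code
    so failures can still be spotted when the screen can't be seen.
    """
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error messages into the same file
        text=True,
        check=False,  # do not raise on a non-zero exit; record it instead
    )
    Path(out_path).write_text(result.stdout)
    return result.returncode
```

For example, dump_command(["df", "-h"], "df.txt") would leave a df.txt in the current directory, ready to be fetched through the HTTP server.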

Coast is semi clear then, no drive failures or data loss so far!

Next up was dumping the system logs to see where we were.

Weird Log Errors

After dumping the last boot's logs into a file and opening them up on my other computer, I saw a weird error towards the end:

XXXXXX XXXXXXXX systemd[474]: emergency.service: Executable /bin/plymouth missing, skipping: No such file or directory
-- Subject: Process /bin/plymouth could not be executed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The process /bin/plymouth could not be executed and failed.
--
-- The error number returned by this process is ERRNO.

plymouth? I don't know what that service is or does, but let's search for it. A quick search revealed a similar set of symptoms on Stack Exchange. It was for a different Linux distro, but it was worth a shot!

At that moment I realized that I was missing an entry in the df -h output from before. The bind mount! I must have never restarted the server after setting that up!

A quick cat /etc/fstab > fstab.txt && python -m http.server confirms this. Now, how to edit a file blind? I don't trust my vim skills enough to live-edit a file that important without seeing it.

A quick search leads me to head -n -2 /etc/fstab > fstab-edit.txt, which (with GNU head) writes out everything except the last two lines. Another start of the HTTP server confirms the edit worked, and a quick sudo mv fstab-edit.txt /etc/fstab gets the system back up and running.
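To make the effect of head -n -2 concrete, here is a small Python sketch of the same "drop the last N lines" operation. The function name and the sample fstab contents are invented for illustration; the real file's entries are not shown in this post.

```python
def drop_last_lines(text: str, count: int) -> str:
    """Return text without its last `count` lines, like GNU `head -n -count`."""
    if count <= 0:
        # Nothing to drop; also avoids the lines[:-0] pitfall (empty slice).
        return text
    lines = text.splitlines(keepends=True)
    return "".join(lines[:-count])


# Hypothetical fstab contents, for illustration only.
sample = (
    "UUID=aaaa / ext4 defaults 0 1\n"
    "UUID=bbbb /home ext4 defaults 0 2\n"
    "# sftp jail\n"
    "/srv/media /jail/media none bind 0 0\n"
)

trimmed = drop_last_lines(sample, 2)
print(trimmed, end="")
```

Writing the result to a separate file and inspecting it before moving it over the original, as described above, is what makes this safe enough to do blind.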

And just in time before I lose all my sleep for the night!

Copyright © Andres Ruiz 2020