
Recovering Corrupted Files in Btrfs


I have not published any posts in the last two weeks, despite having quite a few queued up. The reason was a very unexpected data loss that robbed me of a lot of energy and many hours. Thankfully I was able to recover most of my work, and here is how.

The Incident

As I opened my laptop one Friday afternoon and wanted to check the git repo with my notes, it reported an odd error of corrupted objects. I inspected the situation and soon found that some files in the git object store and some files in the worktree of the repository suddenly reported a size of zero. I was bewildered and soon aghast, realizing that these were the exact files that had uncommitted changes.
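A check for this symptom could be sketched as follows. This is only a minimal illustration: the function name list_empty_files is made up, and it simply lists every zero-sized regular file below a directory, the git object store included.

```shell
# List all zero-sized regular files below a directory (default: the current
# one), including those under .git/objects.
# list_empty_files is a hypothetical helper name, not an existing tool.
list_empty_files() {
  find "${1:-.}" -type f -size 0 -print
}
```

Called as `list_empty_files path/to/repo`, it prints one path per line. Files that are legitimately empty show up as well, so the list needs a manual look. Running `git fsck --full` in the repository additionally reports damaged objects.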

I then realized that just before, I had accidentally crashed my computer by unplugging an external monitor. I suspect git was still doing something in the repo, as I had pushed right before unplugging and had the files open. And apparently btrfs did not act as a failsafe here, allowing an incomplete write through.

Thus the git repository was unusable, I had not enabled btrfs snapshots yet, the last full backup was a few days ago as I was on the go, and Syncthing was out of sync - my home computer left behind, my phone suffering from a longtime issue in syncthing-android.

Discovering Tools

First I tried undelete.sh, a script for restoring old file versions via btrfs. Since btrfs is a copy-on-write filesystem, I expected to easily be able to restore the previous file versions. After letting it search to the deepest level on multiple files, only very few included useful content - most were still of size zero, others completely off, and a few an old version.

For one less frequented file, I was able to restore data from previous Syncthing conflict files. One more option, which I did not investigate beyond an initial glance, is the write backups of logseq, which I now use on my phone.

Since neither testdisk nor extundelete is compatible with btrfs, despite a few threads suggesting otherwise, I ended up mainly with photorec. I scanned the special file of the decrypted partition, restricted to the text file signatures, since photorec has no built-in signature for org-mode, which is the relevant lost filetype. The issue with photorec, of course, is that it produces random filenames, so you have to work out the original names.

After an initial discovery process with grep I ran rmlint on the folder, since photorec frequently restored multiple identical versions. Then I built a little pipeline to find file candidates:
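To illustrate the deduplication step without rmlint, here is a rough checksum-based sketch. It is a stand-in, not how rmlint works internally. The function name list_duplicates is made up, and the options assume GNU coreutils.

```shell
# Print groups of byte-identical files below a directory, one group per
# paragraph, each line as "<sha256>  <path>".
# list_duplicates is a hypothetical helper name; requires GNU coreutils.
list_duplicates() {
  find "$1" -type f -exec sha256sum {} + |
    sort |
    uniq --check-chars=64 --all-repeated=separate
}
```

Only the first 64 characters of each line (the hash) are compared, so files with the same content end up in one group regardless of their name. Unlike rmlint, this only lists the duplicates and leaves the decision which copy to keep to you.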

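# corrupted.txt holds one file path per line
while IFS= read -r file
 do # with -E a bare + is a quantifier, so it is escaped to match a literal "#+title: "
  title="$(grep -iE --no-filename '#\+title: ' "$file")"
  echo "$file"
  if test -z "$title"
  then echo "$file" >>leftover.txt
  else # pass all recovered files containing the same title line, plus the original, to mvconfl
   mvconfl $(command grep --binary-files=without-match --recursive -l "^$title" /mnt/backup/btrfs/tests/photorec-merge) "$file"
  fi
 done <corrupted.txt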

I will not explain that here unless asked, but maybe it helps somebody nonetheless. mvconfl is a helper script I wrote along the way to easily handle the files with my synct-diff script. It can be found in my dotfiles. Lastly I needed a way to quickly compare multiple files, which emacs unfortunately does not support, but I found that good old vimdiff supports comparing up to 8 files, an incredibly useful feature.

A few files were not discovered by photorec anymore, but I stumbled upon a tool called strings which finds readable ASCII sequences on disk. I let it run on my 2 TB drive, and it produced a 30 GB textfile:

sudo strings -n 9 --include-all-whitespace /dev/mapper/luks-7aabd714-bb5a-4035-99bb-0090dab92fc4 >strings-full.txt

Within this file I had to do some more work to extract the useful fragments, but in the end, after hours of work and lessons learned, I was able to restore my data using these tools!
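One way to pull candidate fragments out of such a dump can be sketched like this. It is a minimal sketch, not necessarily the exact procedure used here. The function name extract_fragments and the default of 40 trailing lines are arbitrary choices for illustration, and the long options assume GNU grep.

```shell
# Print every occurrence of a literal string in a large text dump, together
# with the lines that follow it.
# extract_fragments is a hypothetical helper name.
# Arguments: dump file, literal search string, number of trailing lines (default 40).
extract_fragments() {
  grep --text --fixed-strings --line-number --after-context="${3:-40}" -- "$2" "$1"
}
```

For example, `extract_fragments strings-full.txt '#+title: ' 40 | less` shows every title line with the 40 lines after it. The search string is matched literally, so the + needs no escaping, and the line numbers make it possible to cut out a longer range afterwards, for instance with sed.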

Now I can finally bombard you with my ramblings again :)

Lessons
