Linux Ext2fs Undeletion mini-HOWTO: Recovering data blocks

10. Recovering data blocks

This part is either very easy or distinctly less so, depending on whether the file you are trying to recover is more than 12 blocks long.

If the file was no more than 12 blocks long, then the block numbers of all its data are stored in the inode: you can read them directly out of the stat output for the inode. Moreover, debugfs has a command which performs this task automatically. To take the example we had before, repeated here:


debugfs:  stat <148003>
Inode: 148003   Type: regular    Mode:  0644   Flags: 0x0   Version: 1
User:   503   Group:   100   Size: 6065
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 12
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x31a9a574 -- Mon May 27 13:52:04 1996
atime: 0x31a21dd1 -- Tue May 21 20:47:29 1996
mtime: 0x313bf4d7 -- Tue Mar  5 08:01:27 1996
dtime: 0x31a9a574 -- Mon May 27 13:52:04 1996
BLOCKS:
594810 594811 594814 594815 594816 594817 
TOTAL: 6

This file has six blocks. Since this is less than the limit of 12, we get debugfs to write the file into a new location, such as /mnt/recovered.000:


debugfs:  dump <148003> /mnt/recovered.000

Of course, this can also be done with fsgrab; I'll present it here as an example of using it:


# fsgrab -c 2 -s 594810 /dev/hda5 > /mnt/recovered.000
# fsgrab -c 4 -s 594814 /dev/hda5 >> /mnt/recovered.000

With either debugfs or fsgrab, there will be some garbage at the end of /mnt/recovered.000, but that's fairly unimportant. If you want to get rid of it, the simplest method is to take the Size field from the inode, and plug it into the bs option in a dd command line:


# dd count=1 if=/mnt/recovered.000 of=/mnt/resized.000 bs=6065

Of course, it is possible that one or more of the blocks that made up your file has been overwritten. If so, then you're out of luck: that block is gone forever. (But just imagine if you'd unmounted sooner!)

10.2 Longer files

The problems appear when the file has more than 12 data blocks. It pays here to know a little of how UNIX filesystems are structured. The file's data is stored in units called `blocks'. These blocks may be numbered sequentially. A file also has an `inode', which is the place where information such as owner, permissions, and type are kept. Like blocks, inodes are numbered sequentially, although they have a different sequence. A directory entry consists of the name of the file and an inode number.

But with this state of affairs, it is still impossible for the kernel to find the data corresponding to a directory entry. So the inode also stores the location of the file's data blocks, as follows:

The block numbers of the first 12 data blocks are stored directly in the inode; these are sometimes referred to as the direct blocks.
The inode contains the block number of an indirect block. An indirect block contains the block numbers of 256 additional data blocks.
The inode contains the block number of a doubly indirect block. A doubly indirect block contains the block numbers of 256 additional indirect blocks.
The inode contains the block number of a triply indirect block. A triply indirect block contains the block numbers of 256 additional doubly indirect blocks.

Read that again: I know it's complex, but it's also important.

Now, the current kernel implementation (certainly for all versions up to and including 2.0.30) unfortunately zeroes all indirect blocks (and doubly indirect blocks, and so on) when deleting a file. So if your file was longer than 12 blocks, you have no guarantee of being able to find even the numbers of all the blocks you need, let alone their contents.

The only method I have been able to find thus far is to assume that the file was not fragmented: if it was, then you're in trouble. Assuming that the file was not fragmented, there are several layouts of data blocks, according to how many data blocks the file used:

0 to 12

The block numbers are stored in the inode, as described above.

13 to 268

After the direct blocks, count one for the indirect block, and then there are 256 data blocks.

269 to 65804

As before, there are 12 direct blocks, a (useless) indirect block, and 256 blocks. These are followed by one (useless) doubly indirect block, and 256 repetitions of one (useless) indirect block and 256 data blocks.

65805 or more

The layout of the first 65804 blocks is as above. Then follow one (useless) triply indirect block and 256 repetitions of a `doubly indirect sequence'. Each doubly indirect sequence consists of a (useless) doubly indirect block, followed by 256 repetitions of one (useless) indirect block and 256 data blocks.

Of course, even if these assumed data block numbers are correct, there is no guarantee that the data in them is intact. In addition, the longer the file was, the less chance there is that it was written to the filesystem without appreciable fragmentation (except in special circumstances).

You should note that I assume throughout that your blocksize is 1024 bytes, as this is the standard value. If your blocks are bigger, some of the numbers above will change. Specifically: since each block number is 4 bytes long, blocksize/4 is the number of block numbers that can be stored in each indirect block. So every time the number 256 appears in the discussion above, replace it with blocksize/4. The `number of blocks required' boundaries will also have to be changed.

Let's look at an example of recovering a longer file.


debugfs:  stat <1387>
Inode: 148004   Type: regular    Mode:  0644   Flags: 0x0   Version: 1
User:   503   Group:   100   Size: 1851347
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 3616
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x31a9a574 -- Mon May 27 13:52:04 1996
atime: 0x31a21dd1 -- Tue May 21 20:47:29 1996
mtime: 0x313bf4d7 -- Tue Mar  5 08:01:27 1996
dtime: 0x31a9a574 -- Mon May 27 13:52:04 1996
BLOCKS:
8314 8315 8316 8317 8318 8319 8320 8321 8322 8323 8324 8325 8326 8583
TOTAL: 14

There seems to be a reasonable chance that this file is not fragmented: certainly, the first 12 blocks listed in the inode (which are all data blocks) are contiguous. So, we can start by retrieving those blocks:


# fsgrab -c 12 -s 8314 /dev/hda5 > /mnt/recovered.001

Now, the next block listed in the inode, 8326, is an indirect block, which we can ignore. But we trust that it will be followed by 256 data blocks (numbers 8327 through 8582).


# fsgrab -c 256 -s 8327 /dev/hda5 >> /mnt/recovered.001

The final block listed in the inode is 8583. Note that we're still looking good in terms of the file being contiguous: the last data block we wrote out was number 8582, which is 8327 + 255. This block 8583 is a doubly indirect block, which we can ignore. It is followed by up to 256 repetitions of an indirect block (which is ignored) followed by 256 data blocks. So doing the arithmetic quickly, we issue the following commands. Notice that we skip the doubly indirect block 8583, and the indirect block 8584 immediately (we hope) following it, and start at block 8585 for data.


# fsgrab -c 256 -s 8585 /dev/hda5 >> /mnt/recovered.001
# fsgrab -c 256 -s 8842 /dev/hda5 >> /mnt/recovered.001
# fsgrab -c 256 -s 9099 /dev/hda5 >> /mnt/recovered.001
# fsgrab -c 256 -s 9356 /dev/hda5 >> /mnt/recovered.001
# fsgrab -c 256 -s 9613 /dev/hda5 >> /mnt/recovered.001
# fsgrab -c 256 -s 9870 /dev/hda5 >> /mnt/recovered.001

Adding up, we see that so far we've written 12 + (7 * 256) blocks, which is 1804. The `stat' results for the inode gave us a `blockcount' of 3616; unfortunately these blocks are 512 bytes long (as a hangover from UNIX), so we really want 3616/2 = 1808 blocks of 1024 bytes. That means we need only four more blocks. The last data block written was number 10125. As we've been doing so far, we skip an indirect block (number 10126); we can then write those last four blocks.


# fsgrab -c 4 -s 10127 /dev/hda5 >> /mnt/recovered.001

Now, with some luck the entire file has been recovered successfully.

10. Recovering data blocks

10.1 Short files

10.2 Longer files