Sunday, December 02, 2007

Recently, my 30 year old fridge decided it had had enough, and it promptly died.

In the process (I think the motor bearings seized), it overloaded the circuit, and caused the circuit breaker to trip.

Unfortunately, the timing of this caused the power to go out just as a disk write was occurring on one of the disks in my media server, which caused it to get reasonably bad corruption, rendering it unmountable.

I didn't find any of this out until about 10.15, when I woke up, late for work, because my alarm clock hadn't gone off.

During boot, I would get:

[17179594.952000] hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[17179594.952000] hdd: dma_intr: error=0x40 { UncorrectableError }, LBAsect=4349, high=0, low=4349, sector=4343
[17179594.952000] ide: failed opcode was: unknown
[17179594.952000] end_request: I/O error, dev hdd, sector 4343
[17179594.952000] JBD: IO error reading journal superblock
[17179594.952000] EXT3-fs: error loading journal.

I added a "spare" disk I had hanging around (another disk that's on its way out and that I need to send back, but it was the only thing big enough), and attempted to use dd_rescue to get a copy of the contents of the drive.

I figured I could then muck around, and attempt to e2fsck that file, or loop mount it or something. This wasn't very successful: the first 50MB of the drive looked toasted and took about an hour to read through, and the rest of the drive was reading at about 1MB/sec, so after 4 1/2 hours or so I had an 11GB file, which I couldn't do anything with.

I'd previously used Stellar Phoenix for recovering FAT/NTFS partitions in a similar state, and it looked like it was mainly the superblock/journal that was stuffed, not the data. I found they had a Linux tool but, ironically, it runs on Windows.

I had a disk configured for doing recovery a while ago, but I couldn't find it. I had to get another spare disk (a 120gb I used to use in my tivo), wiped it, and installed windows 98 on it, and the Stellar Phoenix Linux program.

I gave up on the dd_rescue, and moved the drive to the windows machine. I ran Stellar across it, it immediately found the drive, logical drive, and listed stacks of files. This was looking somewhat promising.

The eval version of the software doesn't allow you to recover any files though, so I found an "alternate" version of the software, that does, and I rescanned the disk, and was able to recover a handful of files.

Most of them were corrupt, and when I checked the file listing in Stellar, files were either 0 bytes, or massively huge (a 6gb text file?). This wasn't working.

I rescanned the disk, doing an advanced scan, which I was hoping would find the alternate superblocks, and try reading the file listing from one of them, but after 42 hours, it hadn't finished, and I was fed up with waiting, since I didn't think it would make any difference to the file listing.

In the meantime, I had gone and bought some new hard drives. Plan B was to rebuild the data that was on the disk, from a backup from 3 months ago, and the contents of my ipod, but that would be a real pain, and it would require undeleting some data I moved from another disk to the corrupted one recently.

I had done some googling, and found processes to recover disks using debugfs etc.

So I gave up on windows, and moved the disk back to the linux machine.

Just trying to mount it would get me:

[17180880.456000] VFS: Can't find ext3 filesystem on dev hdd1.

What I ended up doing, was using dumpe2fs on the corrupted partition, to get a list of the alternate superblocks, this worked:

Backup superblock at 32768, Group descriptors at 32769-32783
Backup superblock at 98304, Group descriptors at 98305-98319
Backup superblock at 163840, Group descriptors at 163841-163855

(etc, with 11 more).
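
With that many candidates, it can help to pull just the block numbers out of the dumpe2fs output. This is a minimal sketch, assuming the output lines look like the ones above; the function name backup_superblocks is made up for illustration.

```shell
# Read dumpe2fs-style text on stdin and print one backup superblock
# block number per line. Lines that don't describe a backup superblock
# are ignored.
backup_superblocks() {
    sed -n 's/^ *Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}

# Example use against the corrupted partition (needs root):
#   dumpe2fs /dev/hdd1 | backup_superblocks
```

Each number it prints is in filesystem blocks, not in the units mount wants.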

I then had to pick out a backup superblock, and convert its block number into the 1k units that mount expects, since the partition has 4k blocks on it (this is just multiplying it by 4).

819200 x 4 = 3276800
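
The same arithmetic as a small helper, for other block sizes. This is only a sketch: the function name sb_for_mount is invented here, and it assumes the block size is a multiple of 1024 bytes.

```shell
# Convert a backup superblock location, given in filesystem blocks,
# into the 1 KiB units that mount's sb= option expects.
# Usage: sb_for_mount BLOCK_NUMBER BLOCK_SIZE_BYTES
sb_for_mount() {
    echo $(( $1 * $2 / 1024 ))
}

# Backup superblock at block 819200 on a filesystem with 4 KiB blocks.
sb_for_mount 819200 4096
```

For the numbers above this prints 3276800, which is the value that goes after sb=.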

I then passed the sb= option to mount, to use the alternate superblock. The first few times I tried to do this, it looked like it was trying, but then came back telling me things like the magic didn't match, or it was invalid etc.

[17180969.348000] EXT3-fs: Magic mismatch, very weird !

When I found a non corrupt superblock backup, checking dmesg, I saw it was trying to load the journal, which was corrupt:

[17181822.756000] hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[17181822.756000] hdd: dma_intr: error=0x40 { UncorrectableError }, LBAsect=4349, high=0, low=4349, sector=4343
[17181822.756000] ide: failed opcode was: unknown
[17181822.756000] end_request: I/O error, dev hdd, sector 4343
[17181822.756000] JBD: IO error reading journal superblock
[17181822.756000] EXT3-fs: error loading journal.

I tried passing "noload" to skip loading the journal, but that didn't work.

I then tried forcing it to mount as an ext2 partition, and bang, it mounted.

mount -v -t ext2 /dev/hdd1 /disks/mp3 -o sb=3276800

dmesg says:

[17179910.436000] EXT2-fs warning (device hdd1): ext2_fill_super: mounting ext3 filesystem as ext2
[17179910.436000] EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended

I then used rsync to copy selected directories from the corrupted disk to a new disk I had mounted on the machine.

And I'll finish off with a rant, yes, I should backup properly, yes, I should get a UPS (again, the last one caught on fire), but.. CRAPPY SEAGATE DRIVE AGAIN.

Update: Just for fun, I decided to see if I'd get anywhere using dd_rescue, and e2fscking the image on a different drive.

I found and used dd_rhelp, instead of dd_rescue directly, since it works a bit more to my liking.. when it hits a hard bit, it skips it, keeps going, and at the end, comes back to have another go at the dodgy bits.

This got me about 99.99% of the drive in an image. I ran e2fsck on this, and it said the journal was corrupt, so I trashed the journal, converting the filesystem to ext2. It then "fixed" the problems, and recreated the journal, converting back to ext3.

I loop mounted the fixed image, and found that while everything was there, it was all under lost+found, and would have been a bit of a pain to work out what it was, move it out, and rename it, but could have done it, if the alternate superblock mounting hadn't worked.

There is another method again, I found here, using debugfs, but I didn't try it.

Monday, October 08, 2007

Here's how to almost workingly build wolfgl, on Ubuntu Feisty Fawn.

I had been playing wolf3d under dosbox, but the performance was a little bit annoying: it paused every now and then and the music stuttered, and fiddling with the cpu cycles didn't really help. I decided to find a native port.

I found wolfgl, however it hasn't been compiled for years, and there was no source available. I grabbed the binaries I could find, and tried running it, but could only get LIBC errors.

"version GLIBC_2.0 not defined in file with link time reference"

I found this page, where it says you can run it like "LD_ASSUME_KERNEL=2.4 ./wolfgl". That didn't work for me either.

A bit later, I found this page, a gentoo build page.. hmm, how can there be a gentoo build page if there's no source to this? also, it was last updated only a few months ago, that's more promising than the sourceforge page, which hasn't been updated in a few years, and the homepage, not touched since 1999.

I found the port page here, and downloaded the ebuild file.

While trying to work out how ebuilds work, i.e. where gentoo would get the source from, I found a forum thread saying there were working debian binaries here.

I tried downloading and installing from there, however I needed libx or something installed, and there's no such package anymore. I installed the replacement packages, but it wasn't happy with that.

I suppose I could have tried just extracting the binaries from the debs, but I figured I'd just end up with library issues anyway, and dealing with the fact they were compiled in 2003.

I went back to the ebuild file, read the doco here, and worked out where it was getting the source from, checking it out of cvs.

I installed cvs, and then set my cvsroot: "export", followed by "cvs -z5 co wolfgl". That was a good start, I had a source tree.

I tried compiling it, however received errors about "invalid lvalue in increment", and "invalid lvalue in unary ‘&’".

Googling turned up this page, which suggested the code was a bit sloppy, and not written as strictly as gcc4 requires.

I thought about hacking the source, to fix what was defined on the above page, but then I remembered reading the changelog for the wolfgl port to gentoo, and that it had a couple of patches, which were also defined in the ebuild page.

I went about finding the patches, they were attached to bug reports, here, and here.

I downloaded the patches 1, 2, 3, 4, and applied them.

This caused more issues. First, patch 1, for gcc4 compiling, has to be applied with patch -p1, whereas the others are -p0. Then one of them wouldn't apply, because hunks failed on a bunch of files in the common directory.

I had a quick look with vi, to find out what it was trying to change, and realised the files were in dos format, so I tried running dos2unix across all the files in the common directory, and running the patch again, which worked.
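
If dos2unix isn't installed, the same clean-up can be approximated with tr. This is a rough sketch, with strip_cr as a made-up helper name. Note that it deletes every carriage return in each file, not only the ones at line ends, which is fine for C source but would damage binary files.

```shell
# Strip DOS carriage returns from every regular file in a directory,
# as a stand-in for running dos2unix over them.
# Usage: strip_cr DIRECTORY
strip_cr() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # Write to a temporary file first, then replace the original.
        tr -d '\r' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Example: strip_cr common
```

After that, the failing patch can be run again with the same -p level as before.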

I was getting pretty close now I thought. Compilation failed with an "ld: cannot find "-lXext"" error or some such.

I found I needed the "libxext-dev" package installed. After installing that, the compilation completed.

I copied it into the directory with the dos version of the game I had been playing under dosbox, and tried to run it, immediately resulting in a segfault, nice.

Running strace across the binary told me it was trying to open "vswap.wl6", but I had "VSWAP.WL6", ah, case sensitivity.

I copied that across to a lowercase version of the filename, and it got a step further, complaining it couldn't find another wl6 file. I then copied all the uppercase versions of the files to lowercase versions.
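
Copying the files one at a time gets tedious, so here is a minimal sketch of doing the whole directory in one go. The function name lowercase_copies is invented for illustration. It leaves the uppercase originals in place, so dosbox can keep using them.

```shell
# Copy every file in a directory whose name contains uppercase letters
# to a lowercase-named copy alongside it, leaving the originals in place.
# Usage: lowercase_copies DIRECTORY
lowercase_copies() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        base=$(basename "$f")
        lower=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
        # Skip names that are already lowercase, and never overwrite.
        if [ "$base" != "$lower" ] && [ ! -e "$1/$lower" ]; then
            cp -p "$f" "$1/$lower"
        fi
    done
}

# Example: lowercase_copies .
```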

The game started, woo, but then the video was all scrambled, hmm. I noticed during patching that a few hunks were offset by a few lines, maybe that's the issue.

The game kept going though, and it got to the menu, however there was some funkiness with the sprites. Instead of the spinning logo or whatever it is next to the selected item on the menu, it was using a sprite of BJ Blazkowicz, which changed into a gun side on. There was also a gold key at the top of the screen for some reason.

Trying to start a game anyway, (after getting through the select episode/difficulty menus, with same sprite funkiness), resulted in the game segfaulting.

Hmm, it must work on gentoo, why can't I compile it? Maybe it needs to be compiled without the sprite patch being applied.

Monday, August 13, 2007

Built a new machine using an M2N-MX motherboard, tried to boot up Ubuntu, received error about "MP-BIOS bug: 8254 timer not connected to IO-APIC".

Looked in the bios, and was able to disable "ACPI APIC Support", which was able to get the machine to boot.

I found this page, which had instructions for fixing the problem properly, by disabling the HPET table. This was only available after updating the bios.

Since there wasn't windows on the machine, using the winbios updater wasn't an option. I downloaded "dosslack" (here), used dd to write a floppy, and then copied the bios updater and updated rom file from asus to the floppy.

After updating the bios, there was a new option available in the bios, to disable the "MCP61 ACPI HPET TABLE".

After doing that, the machine was working a bit better, but there were sound issues. There was a high pitched whine produced by the sound card.

I googled around, and found references to changing the modprobe configuration to load the intel sound driver with a fix parameter.

I created /etc/modprobe.d/modprobe.conf.dist containing:
options snd-hda-intel position_fix=1 model=3stack

After rebooting, the sound was fixed.

There were also issues with the RAM, it's supposed to support Dual channel, and run at 800MHz, but was only running single channel. It also wouldn't run at 800MHz unless forced in the bios.

According to the motherboard manual, the ram was in the correct slots for dual channel, but it wouldn't do it. After moving the ram into the "wrong" slots for dual channel, it automatically activated.

Installing VMWare gave some issues: it required a patch to be applied before it would compile. Getting the ipod to connect was also a hassle. It required vmware to be shut down, then the ipod connected, then vmware restarted, the vm booted up, and the ipod attached to the vm.

There were also issues with the video in vmware, switching to full screen would result in a horrible interlaced looking display, whereas in a window it was fine. I found a page that detailed editing the vmware preferences file, and changing pref.autoFitFullScreen = "fitHostToGuest" to pref.autoFitFullScreen = "fitGuestToHost" which fixed that.
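
The same edit can be done with sed instead of a text editor. This is a sketch only: fix_fullscreen_pref is a made-up name, and the location of the preferences file (often ~/.vmware/preferences) is an assumption that may differ between versions. VMware should be closed first so it doesn't overwrite the change.

```shell
# Switch the full-screen fit preference in a VMware preferences file
# from fitHostToGuest to fitGuestToHost. Other lines are left alone.
# Usage: fix_fullscreen_pref PREFS_FILE
fix_fullscreen_pref() {
    sed 's/^\(pref\.autoFitFullScreen *= *\)"fitHostToGuest"/\1"fitGuestToHost"/' "$1" > "$1.new" &&
        mv "$1.new" "$1"
}

# Example (path is an assumption): fix_fullscreen_pref ~/.vmware/preferences
```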

While sorting the music to copy onto the ipod, I found some m4a and wma files that needed converting to mp3, so they could be tagged with musicbrainz tagger. I found a script to do that here, which I modified slightly to work recursively, not convert already converted files, remove the temporary wave file fifo, and run lame with a lower priority:

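# Dump m4a to mp3

find . -name "*.m4a" | while read i; do
  if [ -f "$i" ]; then
    # Destination name: same path with the extension swapped
    dest=`echo "$i" | sed -e 's/m4a$/mp3/'`
    # Skip files that have already been converted
    if [ ! -f "$dest" ]; then
      # Decode through a fifo, so no full size wave file is written to disk
      rm -f "$i.wav"
      mkfifo "$i.wav"
      mplayer "$i" -ao pcm:file="$i.wav" -vc null -vo null -quiet &
      # Encode at a lower priority
      nice lame "$i.wav" "$dest"
      rm -f "$i.wav"
    fi
  fi
done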

The above can have all references to m4a changed to wma to convert wma files.
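
Rather than editing the script for each format, the extension can be a parameter. The sketch below covers only the "which files still need converting" part, with hypothetical helper names mp3_name and list_unconverted. The mplayer and lame steps would stay as they are.

```shell
# Print the .mp3 destination name for a source file with a given extension.
# Usage: mp3_name FILE EXT
mp3_name() {
    printf '%s\n' "${1%.$2}.mp3"
}

# List files under DIR with extension EXT that have no .mp3 next to them yet.
# Usage: list_unconverted DIR EXT
list_unconverted() {
    find "$1" -type f -name "*.$2" | while IFS= read -r src; do
        [ -f "$(mp3_name "$src" "$2")" ] || printf '%s\n' "$src"
    done
}

# Example: list_unconverted . wma
```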

Due to vmware only emulating usb1.1 (or appearing to), loading 30gb of mp3s with itunes was going to take too long. I tried using rhythmbox, however this was running really slow, and also tried to get sharepod working in the windows vm.

It turns out that rhythmbox was copying the files so slowly because I'd unloaded ehci_hcd while trying to get the ipod to attach to the vm properly, so once I reloaded this, I was able to use gtkpod to load all the songs quite quickly.

The issues I had with sharepod.. only the latest version is available from the author, and once I installed the .net framework in the vm to get it to actually try to load, I was faced with a scary looking security exception error.

I found ways to overcome this, using the .net framework configuration tool, however this tool did not exist, and the msc file wasn't there either, so I couldn't run it directly.

I upgraded .net to v3 in desperation, and still didn't get the config tool. I then found I could change the security manually, by bringing up a command prompt, changing into the C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727 directory, and running "CasPol.exe -m -ag 1 -url "file:////\\[ubuntu machine]\[share]\*" FullTrust -exclusive on".

Sharepod 3 then worked properly. Sharepod 2, or whatever it was I used to use, just hangs when trying to load (after the 3 or 4 ocx and dll files it requires are installed and registered), sharepod 1.7 only has the ability to copy music out of the ipod, and not in (or so it seems).

Thursday, July 12, 2007

I decided to try migrating one of my real linux machines into a VM. It's a machine that doesn't do much (ssh/imap server), and is an old P166 with 64MB ram, and a 4GB hard drive, that's almost full.

The process I ended up using to migrate the machine, was to create a new VM under vmware, with 64MB ram, and a 4GB disk, same as the real machine.

After that, I used LTSP that I also have installed on the VMware server to boot it up, and I configured LTSP to just boot to a prompt, instead of trying to start X.

(you edit the lts.conf, and change the SCREEN_01 line to = shell).

I then had a VM that booted up a linux kernel, and gave me a root shell. I tried to startup sshd so I could rsync the real machine into the VM, but this failed because I had no host keys for the VM.

I ran ssh-keygen on the LTSP/VMware machine, and generated the host keys under the filesystem that's exported to the LTSP client, and I was able to start sshd.

I then copied the root user's public key from the real machine into the root user's authorized_keys file under the LTSP export, so rsync could ssh to the LTSP client/VM.

Once I had this working, I used the VM to partition the virtual disk, create a filesystem on it, and mount it.

This had a bit of a trap in it.. after I partitioned the disk, the device nodes to represent the new partitions didn't appear. I had to manually go into the /dev directory, and "mknod sda1 b 8 1" then "mknod sda2 b 8 2" (found these here).
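
The numbers aren't arbitrary: the sd driver uses block major 8 and gives each disk 16 minors, so sda is 0-15, sdb is 16-31, and so on. Here is a small sketch that works out the mknod command from a node name. The helper name sd_mknod_cmd is made up, it only prints the command rather than running it, and it only covers the first 16 disks (sda to sdp).

```shell
# Print the mknod command for a SCSI disk or partition node such as sda1.
# Usage: sd_mknod_cmd NAME
sd_mknod_cmd() {
    name=$1
    letter=$(printf '%s' "$name" | cut -c3)
    part=$(printf '%s' "$name" | cut -c4-)
    # Position of the drive letter in the alphabet, counting a as 0.
    index=$(echo abcdefghijklmnop | awk -v l="$letter" '{ print index($0, l) - 1 }')
    # Minor number: 16 per disk, plus the partition number (0 = whole disk).
    echo "mknod $name b 8 $(( index * 16 + ${part:-0} ))"
}

sd_mknod_cmd sda1
sd_mknod_cmd sda2
```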

All of the above is probably caused by using LTSP, instead of knoppix or something, but I don't have a physical cdrom connected, so I would have had to use a different machine to make an iso, and then work out how to make vmware use an iso as a cdrom drive or something.

Once I'd made the nodes, I could "mke2fs /dev/sda1" and "mount /dev/sda1 /mnt".

I was then able to rsync the real machine into the VM's disk..

sudo rsync -av --exclude=/proc --exclude=/sys --exclude=/dev / root@[VM ip]:/mnt

(this was run on the real machine).

After all the files were copied, I went about attempting to "fix" the grub install, so the VM could boot off its virtual disk.

This was a bit tricky, first I chrooted into the mounted virtual disk, and went to run "grub-install /dev/sda", but I realised that wouldn't work because the /dev under the chroot was empty.

I exited out of the chroot, and remounted /dev under the mnt, "mount -o bind /dev /mnt/dev", and chrooted again.

I ran "grub-install /dev/sda", but that told me "/dev/sda does not have any corresponding BIOS drive."

I had to "grub-install --recheck /dev/sda", which told me:
"Probing devices to guess BIOS drives. This may take a long time.
/dev/hda1: Not found or not a block device."

hmm, where's it getting /dev/hda1 from?

I edited /boot/grub/menu.lst, and changed all the hda1 references to sda1, since now the machine had a scsi disk, not a real ide disk, but that didn't help.

Eventually I discovered it was because I'd rsync'd the /etc/mtab file from the real machine across, so it thought /dev/hda1 was mounted. I edited /etc/mtab, and changed it to /dev/sda1 from /dev/hda1, and then edited the /etc/fstab, so it would mount / properly when it booted.
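
The device renames in mtab and fstab are mechanical, so they could be scripted. This is a minimal sketch, with rename_disk as an invented helper name. It only rewrites /dev/hdaN style paths, so it's worth reading the files afterwards rather than trusting it blindly.

```shell
# Rewrite device names in a copied config file, e.g. /dev/hda1 -> /dev/sda1.
# Usage: rename_disk FILE OLD NEW   (OLD and NEW are names like hda and sda)
rename_disk() {
    sed "s|/dev/$2\([0-9]*\)|/dev/$3\1|g" "$1" > "$1.new" && mv "$1.new" "$1"
}

# Example, run from outside the chroot:
#   rename_disk /mnt/etc/mtab hda sda
#   rename_disk /mnt/etc/fstab hda sda
```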

I ran "grub-install /dev/sda", and got:

Searching for GRUB installation directory ... found: /boot/grub
Installation finished. No error reported.
This is the contents of the device map /boot/grub/
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(fd0) /dev/fd0
(hd0) /dev/sda

Ok, looking good. I exited the chroot, and rebooted the VM.

It booted up properly, first go. Only issue was that I had no eth0, yet the kernel was saying it detected the VMware PCnet eth device, as eth0.

ifconfig -a showed me I had an eth1 (but no eth0). I think this is because I compiled the kernel, and compiled support for the Intel e100 NIC directly into the kernel, so it was binding to eth0, even though it wasn't there under the VM.

I reconfigured /etc/network/interfaces, to use eth1 instead, and restarted networking, and it seemed to be ok.

The whole process wasn't too painful, just took a while to wait for the rsync, and there were a couple of head scratchers with the hda1/grub issue.

Saturday, April 28, 2007

here's another quick one.. I posted my blog update, for the first time since I had to get a new shell account, and move all my content, and I found that my blog hadn't updated.

It turns out I used to have an index.html symlink to the real html file that blogger updates, but in the process of downloading and uploading all my content, that symlink got turned into a normal file.

I then had to work out how to create a symlink with cpanel, the software I have to use on my hosting now, because it doesn't provide shell access (very annoying).

Anyway, in the end, I found that I had to create a cron job to create the symlink, and then just delete it.

I found details of it here:

Create symlink without SSH

A trick I used on a site with cpanel is to create the symlink using the cron function (it's clunky, but it works).

In cpanel, you go to the cron jobs page and schedule your command to run pretty much right away. For example if your main site is and you want a second site then the symlink can be created using something like:

ln -s /full_path_to_html_dir/ /full_path_to_html_dir/foo

PS: Don't forget to remove your cron entry after it has run, or it will keep doing it!
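
One way to make a forgotten cron entry harmless is to only create the link when nothing is there yet. This is a sketch, with link_once as a made-up name, and the logic would need to be squeezed onto one line for the cpanel cron form. Without the check, a repeated plain ln -s either fails or, when the link points at a directory, creates another link inside it.

```shell
# Create a symlink only if nothing exists at the link path yet, so a
# cron entry that is left in place does no harm on later runs.
# Usage: link_once TARGET LINK
link_once() {
    [ -e "$2" ] || [ -L "$2" ] || ln -s "$1" "$2"
}

# As a single cron command it would look something like:
#   [ -e /full_path/index.html ] || ln -s /full_path/blog.html /full_path/index.html
# (both paths are placeholders)
```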

This is a fairly quick one.. I decided to scan around for any open wireless, using Kismet. I found an AP with a hidden essid. Locking onto the channel for a little while resulted in finding the essid.

I configured the wireless interface to hook on to it, forced the channel manually, and bang, I was on. They even had dhcp running, too easy.

I wanted to see what AP I was connected to, so I went to the IP of the gateway, and was presented with a login to a Netgear DG834G. I tried logging in as admin/password, and was allowed in.

There's no fun here, it's just too easy.

I then wondered if it was possible to extract the guy's adsl password, just out of interest. I found this blog, which contained details.

All I had to do was click on this, and then wait a few seconds, and click on this, and I had a file containing the guy's login and password.

I then worked out what the first link was doing, saw "grep ppoa_ /tmp/nvram", and wondered if it was possible to do "cat /tmp/nvram", and drop the whole nvram file out.. yep. Not much more in there is useful though.

There's not even MAC filtering on this AP, I was at least thinking I'd have to clone a MAC address before I could connect, it was all just too easy. 1.5Mbit ADSL connection too.

Wednesday, January 03, 2007

I had another go with the SRAM card I bought yonks ago, I still haven't managed to access it in my laptop.

I found this page, with some details on editing the /etc/pcmcia/config file.

So I tried that, I added:

card "SRAM Card"
version "SMART Modular Technologies", " 4MB FLASH Card "
bind "memory_cs"

but had no luck.

I give up. I can't believe it's this hard, I should just be able to plug the card in, and have /dev/mem0c0c or something turn up.

Monday, October 02, 2006

I'm trying to get kismet running on the Toshiba Portege I installed Dapper on recently.

It's not turning out to be too easy.

The problem is that the wireless adapter in the laptop, a Toshiba Wireless MiniPCI card, which is just basically a pcmcia Lucent/Agere wireless card with a cardbus controller on a MiniPCI card, won't run in monitor mode.

Initially the card had firmware 8.10 on it, and attempting to run kismet would result in an error about monitor mode being buggy, and not being enabled.

There's references to the firmware versions here.

I found some windows firmware updaters (eventually, since Agere redesigned their site, and don't make any reference to these cards anymore), here, and I attempted to downgrade the firmware to something earlier than 8.xx, however didn't have any luck there. The firmware loaders are for pcmcia cards, and don't detect the MiniPCI card properly, and refuse to update it.

I managed to find a zip file with firmwares that will load here, however unfortunately they are even higher 8.xx versions.

There seems to be a way to force it, detailed here, fiddling in the registry. It has a link to a generic firmware collection, however that server seems to be gone.

I googled around, and found a useful howto, here, that makes references to earlier orinoco kernel modules here, that have work arounds for the buggy monitor mode in the 8.xx firmware, so I went about trying to use those instead.

I downloaded the version that matched the kernel I was using, dapper's stock 2.6.15 686 kernel, however it wouldn't compile, because there's something funny about the pcmcia stuff in the ubuntu kernel.

The newer versions of the patched orinoco drivers wouldn't compile either.

I edited the make file, and stopped it from trying to compile the pcmcia orinoco driver, however because the MiniPCI is a pcmcia card, this didn't work very well.

Strangely, the hostap driver loaded, in both ap and client modes, I ended up with both eth1 and wlan0, and it seemed to be connected to itself.

Monitor mode still wasn't available, so I didn't muck around with this too much.

I decided that I'd try a vanilla kernel, so I downloaded that, 2.6.17, configured it, and waited several hours for it to compile.

I then built it as a deb package, installed it, which seemed to work ok, except for the video being corrupt while the kernel initially boots.

I then went about compiling the hacked orinoco drivers again, and while there were warnings, they compiled properly.

I installed them, and rebooted, the card came up, and had monitor mode available, finally. Kismet ran, and detected a few aps, so it was all working.

I quit kismet, and then discovered that the card didn't want to work properly. The driver must have been spewing errors, because the kernel logger pegged the CPU, and took the system load to 4.

I took the interface down, and it stopped. I then tried using dhclient to reconfigure it against my AP, and received errors.

I tried ejecting the card, even though it's a MiniPCI card, not a pcmcia card, but the pcmcia driver interacts with it. This worked.

I then inserted it again, and the laptop immediately locked up hard.

I think my option now is to try to find a way to change the firmware in linux, via the pcmcia interface, since the windows firmware loaders refuse to, and try to load firmware 6.xx or 7.xx, where monitor mode works properly, without having to use hacked kernel modules to work around broken firmware.

Or else I could replace the MiniPCI card with a non crap one, or just use a pcmcia card, like the Prism cards I've got, but that requires using an external antenna, and fiddling.