I don’t often blog (to say the least), but I thought I’d write up a little saga that I’m still in the middle of (though I think it’s now sorted). I’ll start by just writing it all down while I remember; I’ll hopefully come back to add links, and later write this up more formally (a FOSS4G paper, maybe!).
Last year I ran a new module on our MSc GIScience for the first time, called Geospatial Information Services (GIServices from here on). The aim of the module is to introduce students to OGC web services, interoperability, Google mashups, etc., and the new, Web-based ways of “doing GIS”. As part of this, the practicals have students populate a spatial database (PostGIS), connect an OWS server (Geoserver) to it to create WMS and WFS services, and then connect desktop (QGIS) and web (OpenLayers) clients to those services. As you can see, it’s all done with an open source OSGeo stack. The demonstration data was all gathered from data.gov.uk and so is Open Data too, making it a fully redistributable practical set.
To support this, I need each student in the class to have access to a machine set up as a web server with PostGIS & Geoserver, plus a way of testing clients. Initially, last year, I decided that a neat way to do this would be to create an Oracle VirtualBox virtual machine (VirtualBox is also open source, and pretty solid) that each student could have a copy of. I managed to set this up in such a way that each lab machine held the original source VM image, which was never modified; the VM differences were written to the student’s directory on the School of Geography’s SAN. In theory, then, students should be able to switch machines and still pick up where they left off. I based the VM on the OSGeo Live 5.0 system, which is fantastic as it comes already configured with the services I needed (and a lot more).
This was only partly successful. Firstly, it works OK for one person in the lab who sticks to the same machine, but it doesn’t scale well to multiple students (network bandwidth to the SAN; I should have seen that coming). There was also a subtler problem: the VM source image on each machine ended up with a different UUID (because of how I installed it, I presume), so swapping machines didn’t work, as VirtualBox didn’t recognise the source images as the same.
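A possible fix for the UUID mismatch, which I didn’t try: VirtualBox identifies a disk image by the UUID stored inside it, so separately installed copies of the “same” base image presumably each get their own. VBoxManage has an `internalcommands sethduuid` command that can stamp an image with a chosen UUID, so every lab machine’s copy could be given one shared UUID. The Python sketch below only builds those commands and runs nothing; the image path and the UUID are hypothetical placeholders, and I haven’t tested this against the SAN setup.

```python
import uuid

def sethduuid_commands(image_paths, shared_uuid):
    """Build VBoxManage commands giving every copy of a base image the same UUID.

    image_paths and shared_uuid are hypothetical placeholders; nothing is executed.
    """
    canonical = str(uuid.UUID(shared_uuid))  # raises ValueError if the UUID is malformed
    return [
        ["VBoxManage", "internalcommands", "sethduuid", path, canonical]
        for path in image_paths
    ]

if __name__ == "__main__":
    shared = str(uuid.uuid4())  # pick one UUID once, then reuse it on every lab machine
    for cmd in sethduuid_commands(["/opt/vms/osgeo-live-base.vdi"], shared):
        print(" ".join(cmd))
```

The point of generating one UUID and reusing it is that the students’ difference images on the SAN would then see an identical parent on whichever lab machine they sat at.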
Another issue I didn’t solve was network access in the VM. I wanted each VM to get its own IP address so the physical host machine could be used as a client to the VM’s web server. However, getting the VM to acquire an address from the university’s locked-down DHCP service was a battle too far, so we stuck to testing the VM services via localhost: a little disappointing, but not the end of the world. (I’ve had some success with this since; ask me if you’re interested.)
I could probably have fixed the UUID problem, but the network bottleneck to the SAN wasn’t easily fixed, and I felt the setup wasn’t reliable enough. So halfway through last year’s course we switched to bootable USB memory sticks. We bought a stack of Kingston DataTraveler 100 G2 16GB memory sticks for the purpose (lots of room for the data; you can squeeze OSGeo Live 5.0 onto a 4GB stick).
So, how to set up the memory sticks? Thankfully there were instructions for creating an OSGeo Live 5.0 bootable USB stick. I had some problems making this work at first (which are now irrelevant, so I won’t go into them here), but eventually I got there. Slightly annoyingly, the “persistence file” that lets the live Xubuntu Linux underlying OSGeo Live save data is capped at 4GB, the maximum file size on the stick’s FAT32 file system, so much of the 16GB stick was left unusable from within OSGeo Live. However, this was enough for the practicals (just, as long as students deleted downloaded ZIP files as they went on with the work). I also had to set the university’s proxy settings in the running OSGeo Live system (unfortunately a bit of a hassle in Xubuntu, as opposed to plain Ubuntu, because it involves editing Linux config files), and I copied some of the data from the earlier VM-based steps into the memory stick system to give the students a leg-up towards where they had already got to in class.
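For the curious, the 4GB cap is a FAT32 limit: file sizes are stored as 32-bit numbers, so no single file can be larger than 2^32 - 1 bytes (one byte short of 4GiB), and the persistence file is a single file. Here’s a rough Python sketch of how much of a stick that strands. The 2GB figure for the live system’s own files is a made-up placeholder, not a measurement, and the sketch ignores file system overhead.

```python
FAT32_MAX_FILE_BYTES = 2**32 - 1  # FAT32 stores each file's size in 32 bits

def stranded_bytes(stick_bytes, system_bytes):
    """Bytes on the stick that neither the live system nor the persistence file can use."""
    free = stick_bytes - system_bytes
    persistence = min(FAT32_MAX_FILE_BYTES, free)  # the persistence file is one FAT32 file
    return free - persistence

if __name__ == "__main__":
    # Placeholder figures: a nominal 16GB stick and ~2GB of live-system files.
    left = stranded_bytes(16_000_000_000, 2_000_000_000)
    print(f"{left / 1e9:.1f} GB unreachable from OSGeo Live")
```

With those placeholder figures it works out at about 9.7GB stranded, which is roughly why so much of the 16GB went unused.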
At the end of this I had a “master” USB drive prepared for the class. Then it was a matter of cloning this to the rest of the drives. I tried Clonezilla but settled on another package, OSFClone, to do the job, as it could do direct drive-to-drive USB cloning while preserving the bootability of the target drive. I spent a day cloning USB drives in the background while doing other work.
And it all worked! There was the odd problem in class when students filled their persistence files by not deleting ZIP files, but overall it was pretty good, and the OSGeo stack all worked well. What really suffered, however, was student confidence (not marks, interestingly: about the usual histogram for such a course). There was too much technology getting in the way of the lessons, between setting up the VM just right and then switching to the USBs. And constructing the USBs mid-semester was a lot of work for me, with quite a number of late nights!
The plan for GIServices this year is to repeat the practical content but to use the USB sticks from the start. Last year the USB sticks were given to the students in exchange for a deposit of roughly the value of the stick (10 pounds!). The students had the choice of returning the stick and getting their money back, or keeping the stick and forfeiting the money. In the end no-one tracked me down to get their money back. I hope this is a good thing: the students go away with a full, bootable “GIS in a box”, with example data too.
This year, therefore, we’ve bought another stack of DataTraveler 100 G2 sticks. Same stick, same process, n’est-ce pas? Non.
For some reason, this year’s batch of sticks are not all exactly the same capacity (possibly I should have complained, but I’m out of time for that). The variation is a fraction of a GB (though I remember when 100MB was a lot of disk space!), but it’s enough that drive-to-drive cloning won’t reliably work, as sometimes the target is smaller than the source!
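One way to spot the problem before a clone fails is to compare the raw capacities first. On Linux, /sys/block/<device>/size gives a drive’s size in 512-byte units regardless of its real sector size, so a minimal sketch of the check might look like this in Python. The device names are supplied on the command line; sdb and sdc in the usage message are only examples, so check the real names with lsblk before trusting anything.

```python
import sys
from pathlib import Path

SYSFS_SECTOR_BYTES = 512  # sysfs reports sizes in 512-byte units, whatever the drive's sector size

def device_bytes(device, sys_block="/sys/block"):
    """Raw capacity of a whole block device such as 'sdb', read from sysfs."""
    sectors = int(Path(sys_block, device, "size").read_text().strip())
    return sectors * SYSFS_SECTOR_BYTES

def can_clone_whole_drive(source_bytes, target_bytes):
    """A drive-to-drive clone only fits if the target is at least as big as the source."""
    return target_bytes >= source_bytes

if __name__ == "__main__":
    if len(sys.argv) == 3:
        source, target = device_bytes(sys.argv[1]), device_bytes(sys.argv[2])
        if can_clone_whole_drive(source, target):
            print("target is big enough for a drive-to-drive clone")
        else:
            print(f"target is {source - target} bytes too small")
    else:
        print("usage: SOURCE_DEVICE TARGET_DEVICE, e.g. sdb sdc")
```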
I also wanted to make the “missing” space on the USB stick usable in the OSGeo Live system, on top of the persistence file.
As a result of all this, I’ve created a new “master” USB stick this year, and while I was at it I’ve upgraded to OSGeo Live 6.0.
After some experimentation, I settled on this partition map for the USB sticks, using an MBR / MSDOS partition table: a ~9GB primary FAT32 partition placed at the start of the drive (for the OSGeo Live system plus the 4GB persistence file, which contains an ext2 file system), and a ~5GB extended partition containing a FAT32 logical partition, placed at the end of the drive. This leaves a small unallocated gap between the primary and extended partitions that absorbs the variation in stick capacity.
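To make the layout concrete, here’s a Python sketch that works out the partition boundaries for a given stick size and emits arguments for parted’s script mode (`parted -s`). It’s an illustrative reconstruction rather than the actual script: the 9GiB and 5GiB sizes, the 1MiB alignment and the boot flag are assumptions chosen to match the description above. The primary partition always ends in the same place, the extended partition is pinned to the end of the disk with “100%”, and whatever is left over becomes the gap.

```python
MIB = 1024 * 1024

def partition_plan(disk_bytes, primary_mib=9 * 1024, logical_mib=5 * 1024):
    """Return (parted_args, gap_mib) for the primary + extended/logical FAT32 layout.

    Sizes default to 9 GiB and 5 GiB (assumed values); boundaries fall on whole MiB.
    """
    disk_mib = disk_bytes // MIB                  # whole MiB on this stick
    primary_start = 1                             # leave the first MiB for the MBR
    primary_end = primary_start + primary_mib
    extended_start = disk_mib - logical_mib - 1   # 1 MiB of room for the logical partition's EBR
    gap_mib = extended_start - primary_end        # unallocated space absorbs size variation
    if gap_mib < 0:
        raise ValueError(f"stick too small: only {disk_mib} MiB")
    args = [
        "mklabel", "msdos",
        "mkpart", "primary", "fat32", f"{primary_start}MiB", f"{primary_end}MiB",
        "mkpart", "extended", f"{extended_start}MiB", "100%",
        "mkpart", "logical", "fat32", f"{extended_start + 1}MiB", "100%",
        "set", "1", "boot", "on",
    ]
    return args, gap_mib

if __name__ == "__main__":
    args, gap = partition_plan(16_000_000_000)   # hypothetical 16GB stick
    # /dev/sdX is a placeholder: pointing parted at the wrong disk destroys its contents.
    print("parted -s /dev/sdX " + " ".join(args))
    print(f"unallocated gap: {gap} MiB")
```

A slightly smaller stick just gets a smaller gap; the partitions themselves come out the same size. Note that recent versions of parted only set the partition type, so the FAT32 file systems would still need creating afterwards (e.g. with `mkfs.vfat -F 32`).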
Here’s the partition map in gparted. (I have to say, I’m not an expert at partitioning and copied the partition flags from a working partitioned disk; I’m not sure whether I need ‘lba’ on the first partition or elsewhere. parted warns about poor partition alignment when you create partitions, and in this case I get no warnings. I used parted rather than gparted to create the partitions because it could be scripted and gave better feedback on the choices I was making; I then checked what had been created in gparted.)
I then installed the OSGeo Live 6.0 system in the first, 9GB partition, following the updated instructions for this version of the Live system (in this case, I did the installation from OSGeo Live 6.0 burnt to a DVD-ROM).
In the OSGeo Live 6.0 system, I’ve made three adjustments this time (by booting the master USB stick and making the changes before cloning): I’ve copied in some source data, I’ve set up the proxies, and I automount the 5GB logical partition under “/giservices” so that it is automatically accessible from OSGeo Live. Another advantage of the 5GB partition is that it can be accessed easily both in OSGeo Live and when the stick is plugged into a Windows machine (the persistence file’s ext2 file system is not easily accessible from Windows). This means that results and data can be moved easily from OSGeo Live to Windows (and back).
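For the automount, one common approach (an assumption here, not necessarily how it’s done on the stick) is an /etc/fstab entry that mounts the FAT32 partition by its volume UUID rather than by a device name like /dev/sdb5, so it doesn’t matter which port or machine the stick is plugged into. The Python sketch below just builds such a line. The UUID and the uid/gid values are hypothetical; read the real ones with `blkid` and `id`. Whether an fstab edit survives a reboot depends on how the live system’s persistence handles /etc, so treat this as a starting point.

```python
import re

FAT_UUID = re.compile(r"^[0-9A-F]{4}-[0-9A-F]{4}$")  # FAT volume serial, as blkid prints it (upper case)

def fstab_line(volume_uuid, uid, gid, mount_point="/giservices"):
    """Build an /etc/fstab entry that mounts a FAT32 partition by its UUID."""
    if not FAT_UUID.match(volume_uuid):
        raise ValueError(f"not a FAT volume UUID: {volume_uuid!r}")
    # vfat has no Unix ownership, so uid/gid/umask decide who can write the files;
    # nofail lets the system carry on booting if the partition is missing.
    options = f"defaults,nofail,uid={uid},gid={gid},umask=022"
    return f"UUID={volume_uuid} {mount_point} vfat {options} 0 0"

if __name__ == "__main__":
    # Hypothetical UUID and IDs, for illustration only.
    print(fstab_line("1A2B-3C4D", uid=999, gid=999))
```

The mount point needs to exist first (`sudo mkdir /giservices`).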
So, that’s the “master” drive. Now I need to clone it to all the others while handling the difference in stick capacities. For this I’m back to a two-step process. First, I used Clonezilla to take images of the two FAT32 partitions (the primary and the logical partitions) and stored them on the PC’s internal hard drive. To create a clone, I boot into an OSGeo Live system (any Ubuntu-derived live system would do) and use parted to set up the same partition structure as on the master stick, with empty FAT32 file systems (the unallocated space varies in size with the target stick’s capacity). I then use Clonezilla to restore the partition images to the target stick. This overwrites the empty FAT32 partitions and also restores the UUIDs of the original partitions (handy for that automount). It’s a little slow, about 30 minutes per stick. It also makes sense to create the partition maps for all the sticks first, then boot into Clonezilla and do all the restoring in one go.
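Because a partition image generally can’t be restored into a smaller partition than the one it was taken from, it may be worth checking each freshly partitioned stick against the master before the Clonezilla run. A minimal Python sketch, assuming the master stick is plugged in alongside the target and that partitions use sdX-style names (on an msdos label, 1 is the primary and 5 the first logical partition); sizes come from /sys/class/block/<partition>/size, in 512-byte units.

```python
import sys
from pathlib import Path

SYSFS_SECTOR_BYTES = 512  # sysfs sizes are counted in 512-byte units

def partition_bytes(partition, sys_class_block="/sys/class/block"):
    """Size of a partition such as 'sdc1', read from sysfs."""
    sectors = int(Path(sys_class_block, partition, "size").read_text().strip())
    return sectors * SYSFS_SECTOR_BYTES

def ready_for_restore(master_sizes, target_sizes):
    """True if every target partition is at least as large as its master counterpart."""
    return len(master_sizes) == len(target_sizes) and all(
        target >= master for master, target in zip(master_sizes, target_sizes)
    )

if __name__ == "__main__":
    if len(sys.argv) == 3:
        master_dev, target_dev = sys.argv[1], sys.argv[2]
        # Primary (1) and first logical (5) partitions; other device types name partitions differently.
        master = [partition_bytes(f"{master_dev}{n}") for n in (1, 5)]
        target = [partition_bytes(f"{target_dev}{n}") for n in (1, 5)]
        print("ready to restore" if ready_for_restore(master, target)
              else "a target partition is too small")
    else:
        print("usage: MASTER_DEVICE TARGET_DEVICE, e.g. sdb sdc")
```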
At the end of it, though, I do have a stack of USB drives with OSGeo Live 6.0, with the extra 5GB partition automatically mounted at boot. For some reason OSGeo Live 6.0 seems to boot a lot more slowly than 5.0 (several minutes, versus about 1 minute), but we can live with that; it seems fine once it’s running.
I’ll add an update when the class has been using them, and when the bugs have crawled out of the woodwork. Now to rewrite the practical documents!…
PS: If anyone wants more details (e.g. of a little script to feed into parted to automate the USB drive partitioning), let me know.
UPDATE (5 Feb)
Well, there’s one small problem: the partitioning scheme doesn’t quite do what I wanted. It’s fine in the OSGeo Live system, where the logical partition automounts without trouble. However, Windows 7 won’t mount the extra partition, only the primary. It’s visible in, and understood by, the Windows 7 Disk Management tool, but it just won’t mount; it seems that Windows 7 doesn’t support more than the first partition on a removable flash drive. Mac OS X 10.7 (Lion) mounts both partitions. I’ll add a note about Windows XP (I expect it will be OK, as XP is less fussy about partitions).
If there’s no way round this in Windows 7 (as seems to be the case), it may actually be better to have a single partition, create a data directory on it, and find a way to mount that directory in the OSGeo Live file system. (Normally, the physical file system on the first partition, as opposed to the persistence file’s virtual file system, is mounted read-only under /cdrom in the OSGeo Live system.)
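One possible shape for that workaround, untested and resting on assumptions: remount /cdrom read-write, then bind-mount a data directory from it onto /giservices so the paths used in the practicals stay the same. The Python sketch below only builds the commands (they would need root); the directory name giservices-data is hypothetical. Remounting read-write the partition that also holds the persistence file is the risky part, so this would want trying on a spare stick first.

```python
import posixpath

def bind_mount_commands(data_dir="giservices-data", medium="/cdrom", target="/giservices"):
    """Commands to expose a directory on the live medium at a fixed path (run as root)."""
    source = posixpath.join(medium, data_dir)
    return [
        ["mount", "-o", "remount,rw", medium],   # /cdrom is normally mounted read-only
        ["mkdir", "-p", target],                 # make sure the mount point exists
        ["mount", "--bind", source, target],     # same files, now visible at /giservices
    ]

if __name__ == "__main__":
    for cmd in bind_mount_commands():
        print(" ".join(cmd))
```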