Monthly Archive: January 2006
Dcache & Work Derek on 19 Jan 2006
Baby sat SC3 rerun:
rebooted gftp0447
discovered what seemed to be two dcache-pool services running on csfnfs63; at least, lots of java processes were still there after a service dcache-pool stop. Killing them all and restarting dcache-pool seems to have got rid of the poolRestarted messages in the PoolManager logs. However, csfnfs63 is still taking in data much faster than any other disk server.
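A rough sketch of how the leftover-process check could be scripted, rather than eyeballing the process list after stopping the service. The function name and the sample process listing are made up for illustration, and the input is assumed to look like the output of `ps -eo pid,args`.

```python
def leftover_java_pids(ps_output: str) -> list[int]:
    """Return PIDs of java processes found in `ps -eo pid,args` style output."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, args = parts
        command = args.split()[0]
        # Match both a bare "java" and a full path such as /usr/bin/java
        if command.rsplit("/", 1)[-1] == "java":
            pids.append(int(pid))
    return pids


# Hypothetical listing, not taken from csfnfs63
sample = """\
  PID COMMAND
 1201 /usr/bin/java -jar example.jar
 1350 sshd: derek
 1422 java -jar example.jar
"""
print(leftover_java_pids(sample))
```

Any PIDs still listed after the service has been stopped are the ones to look at before restarting it.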
Meta-Work Derek on 18 Jan 2006
Contrast
Ha, last night I was moaning to myself about a file downloading at 2.3kb/s being too slow, today I’m moaning to myself about our network traffic only being 800Mb/s.
Dcache & Work Derek on 17 Jan 2006
Ops meeting
SC3 phone conference
Rebooked ops meeting for next 52 weeks
Kept an eye on dCache hosts, nfs39 showing too many open files error so restarted with updated ulimit -n value – must remember to reinstate that after future upgrades.
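Since the ulimit -n setting is easy to lose in an upgrade, a small check along these lines could flag it. This is only a sketch using Python's standard `resource` module; the 16384 threshold is an arbitrary example value, not the one actually set on nfs39.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def needs_raising(soft, wanted):
    """True if the soft limit is finite and lower than the wanted value."""
    return soft != resource.RLIM_INFINITY and soft < wanted


soft, hard = nofile_limits()
# 16384 is a hypothetical target, chosen only for illustration
print(f"soft={soft} hard={hard} needs_raising={needs_raising(soft, 16384)}")
```

Note that this reports the limits of the process running the script, so it would need to run in the same environment as the pool service to say anything about that service.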
Computer Stuff Derek on 16 Jan 2006
Installing Debian testing onto a RAID 1 mirror of an Asus A7V8X with Promise PDC20376 Controller
Got two 250GB SATA hard disks delivered from dabs on Saturday (I’m sure I vowed never to order from them again without getting it delivered to work, but what the heck), this was to replace one in a server that died, probably after the power dropped a bit too often.
Physical installation was painless, software installation less so.
The motherboard has a PDC20376 controller, which has some sort of RAID functionality, so I configure the controller to treat the two drives as a RAID 1 mirror, download and burn the Debian stable netinst image and start installation. Everything going smoothly, both disks detected….oh hang on, both disks? Surely it should only see one “disk”?
Decide to stop and retry with testing netinst image – same result, sees both disks but no mirror.
Regroup at this point by doing a web search on the controller – aha, RAID functionality not supported by the driver, disappointing, but not a death knell – this is only going to be a Linux system so we can live with software RAID.
So go back into controller BIOS and delete array. Reinstall Debian testing, get to partitioning, apparently it’s unable to create partitions on the software RAID meta-device, hum and haw for a bit before deciding I can live with one muckle partition. New problem – no swap, realise that RAID 1 mirroring swap is pointless so repartition giving 1 GB swap on each disk and RAID 1 mirroring rest of disks. Rest of installation painless from then on, until system gets to point it needs to reboot.
System spits out CD, powers down, powers back up, does usual BIOS check stuff and then ….. nothing, not even an insert system disk and press a key prompt. Have moment of inspiration and go into RAID controller setup and create two RAID 0 arrays, each containing one disk, set the first to be bootable and then reboot the system. Success! The system happily boots and does some more setup and finally lands me at a shell. Do a cat /proc/mdstat to check raid is okay – informs me that array is unclean and is resyncing, slightly worrying, but assume a shutdown somewhere wasn’t clean and leave it, resync completes fine. Decide to reboot system as a further check, all okay – array comes back up clean.
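For checking the array without reading /proc/mdstat by eye, a minimal parsing sketch follows. It assumes the usual mdstat layout (a header line per md device, then a member status like [UU], then an optional resync progress line); the sample text is illustrative and is not the output from this machine.

```python
import re


def parse_mdstat(text):
    """Map each md device to its state, member status and resync progress."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(\w+)", line)
        if header:
            current = header.group(1)
            arrays[current] = {"state": header.group(2),
                               "members": None, "resync_pct": None}
            continue
        if current is None:
            continue
        if not line.strip():
            current = None  # a blank line ends the current array's block
            continue
        members = re.search(r"\[([U_]+)\]", line)  # e.g. [UU] or [U_]
        if members:
            arrays[current]["members"] = members.group(1)
        progress = re.search(r"(?:resync|recovery)\s*=\s*([\d.]+)%", line)
        if progress:
            arrays[current]["resync_pct"] = float(progress.group(1))
    return arrays


# Illustrative sample of a mirror that is still resyncing
sample = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      243119296 blocks [2/2] [UU]
      [=>...................]  resync =  8.5% (20700000/243119296) finish=60.1min speed=61000K/sec

unused devices: <none>
"""
print(parse_mdstat(sample))
```

On a live system the text would come from reading /proc/mdstat; an underscore in the member status (for example U_) means a missing or failed member, while a resync percentage just means the mirror is still catching up.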
Dcache & Work Derek on 13 Jan 2006
Friday 13th
Sat in on 2 Tier 2 Deployment meeting sessions: SC4 preparations and What the Tier 1 can do for Tier 2s
csfnfs62’s pools were showing “too many open files” errors on pool usage pages – restarted dcache-pool service
esr, t2k and ilc directories on dCache not correctly set up, deleted and recreated properly – this was sparked by usage from esr testing people
Thursday 12th
Attended GridPP 15
SC3 rerun started, CERN network on end of OPN had expanded and we hadn’t been told/realised, so had to update routing on dCache boxes
Initial rate of 30MB/s – fairly poor
Gridftp doors all fell over when concurrent transfers were raised from 12 to 30, rebooted by GP.
Took decision to add in all possible pools & servers to SC3 activity, on doing so rate leapt up to over 100MB/s
Wednesday 11th
Attended GridPP 15
Built slony rpm
Documented slony building procedure on Gridpp wiki
Dcache & Work Derek on 10 Jan 2006
Cleared out some stores which had got stuck where the file appeared to not have been stored when it actually had
Downloaded Slony and began building it
Asked for GGUS access to see ticket
Had missed a new queue from Grid/Non-Grid stats so had to regenerate.
Dcache & RT & Work Derek on 09 Jan 2006
Reenabled some pools on nfs60 that had gone funny and were holding login slots on gridftp doors open
Cleaned up after our helpdesk and the CA helpdesk decided to spam each other
Ran 2005 Grid vs Non-Grid CPU usage stats
Attended talk on SPEC benchmarks
Did most of work on Laptop as PSU in Desktop was playing up – but is now fixed