Monthly Archive: October 2005
Dcache & Work Derek on 31 Oct 2005
- Team weekly meeting - mailed sc-tech about not getting all files in CASTOR - dCache transfers
- Restarted PinManager 3 times; dropping the pins and pinrequests tables from the db finally seemed to fix the problem
- Restarted various pools to get free disk space
- Spoke to ST and MJB about moving disk SRM to a dedicated box - got okay - applied for pnfs.gridpp.rl.ac.uk, alerted CMS.
Dcache & Work Derek on 28 Oct 2005
Review Meeting with ST
Restarted more pools to free up disk space and clear logfiles.
Set all cms pools to 50 movers
Attempted debug of hung CMS job
Noticed Atlas have not much space for transfers in dCache
Reinstalled scrooge
Dcache & Work Derek on 27 Oct 2005
Checked logs for source of crash on dcache last night - nothing found
Attended meeting about helpdesk workflow - need to investigate auto taking tickets on reply and blocked tickets versus stalled tickets
Monitoring socket buffer overruns on disk servers - they seem very high, wonder if related to transfer failures. Tried increasing read buffer on csfnfs56 but no apparent effect
Found minor bug in ops console, fixed and reinstalled scrooge to test
Dcache & Work Derek on 26 Oct 2005
Checked 6300 odd tape writes to ensure no gaps in pathtape data - none found
Attended meeting on UB stats
Restarted nfs62 and 63 dcache pool repeatedly - seeing logs fill up with delete messages
Fixed few remaining problems with stores on 60 and 62
Reinstalled scrooge to check ssh auth works
Dcache & Work Derek on 25 Oct 2005
Not going mad apparently - pathtape behaviour had changed, has now been changed back, but due to a minor error on the way ~10000 files got a pathtape id but didn’t think they did, making them fail continually. Cue some delicate shell surgery to untangle that mess. Still need to check that all successful writes from before the pathtape crash actually have pathtape entries and that they match up.
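The outstanding check (every successful write has a pathtape entry, and the two match up) boils down to comparing two mappings. A rough sketch, assuming the write log can be reduced to a path -> pathtape id mapping and the pathtape data to an id -> path mapping; both layouts and the function name are made up for illustration and are not the real pathtape interface:

```python
def compare_writes(logged_writes, pathtape_entries):
    """Compare our record of successful writes against pathtape's records.

    logged_writes:    dict of file path -> pathtape id (hypothetical layout)
    pathtape_entries: dict of pathtape id -> file path (hypothetical layout)

    Returns (missing, mismatched): paths whose id has no pathtape entry,
    and paths whose id points at a different path in pathtape.
    """
    missing = []
    mismatched = []
    for path, tape_id in sorted(logged_writes.items()):
        if tape_id not in pathtape_entries:
            missing.append(path)
        elif pathtape_entries[tape_id] != path:
            mismatched.append(path)
    return missing, mismatched
```

Two empty lists back would mean no gaps and nothing pointing at the wrong file.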
Games & General Derek on 24 Oct 2005
Bleurgh
Took Friday off sick. The first day I’ve been off work through illness in over 3 years - i.e. since starting there. A particularly nasty cold it was, spent Friday in bed reading and filling the contents of handkerchiefs, recovered enough on Saturday to play Quake 4 when it popped through the letterbox and was sufficiently recovered to make a supply run to Sainsburys on Sunday.
Quake 4, by the by, is definitely better than Doom 3, flashlights on yer weapons - need I say more. No “monster cupboards” either.
Work is work, was in a meeting last week where the outcome was that we plan to ditch months of (my) work and move to a different system, for the best of reasons, though it’s a very tight timescale, so the status quo may continue. Difficult to pin down how I feel about this - a weird concoction of relief at not having to run the bugbear for ever, frustration that I couldn’t get it to a state where I could leave it and a numb feeling that if I’d twiddled my thumbs continuously for the past months we’d be in some ways in a better situation.
And I note that Caity is worryingly demonstrating traits normally found only in my blood relatives, in particular the continued mocking of myself for a minor linguistic gaffe. Scary.
Also, my letting agents haven’t yet cashed the cheque I sent for the admin costs of my lease renewal, I’ll prod them tomorrow to make sure they got it.
Work Derek on 24 Oct 2005
Friday 21 October
Ill
Monday 24 October
Pathtape broke on Friday but is back now. However, our HSM interface script is now failing when talking to pathtape, though according to the pathtape admin it should never have worked…
Approx 12000 files waiting to flush to tape
Discovered reason for marley’s unstableness - had managed to unclip the heatsink from the processor while attaching the new fan, clipped it back on and it seemed to work again
Dcache & Work Derek on 20 Oct 2005
LHCb transfers failing - looks like FTS timing out transfers and not shutting down transfers properly, seems to have started working again though
CASTOR giving turls that aren’t running gridftp is reported to be fixed, but CMS not doing CERN-RAL transfers so unable to confirm
Trying to get ops consoles to pop up dialog box for ssh passphrase.
Dcache & Work Derek on 19 Oct 2005
GridPP storage phone conference
fire alarm
Attended meeting about use of CASTOR in SC4
Tested poweroff script - discovered unresponsive APC
Began writing spec file for poweroff script
Added lockfile to PNFS permissions changer after finding two running concurrently
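For reference, the lockfile guard on the permissions changer works along these lines. This is only a sketch of the general technique using an advisory `flock` lock on Unix, not the actual script; the function name and lock path are made up for illustration:

```python
import fcntl


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on a lockfile.

    Returns the open file object holding the lock, or None if another
    holder already has it. The lock is released when the file is closed
    or the process exits.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock: a second copy is already running.
        handle.close()
        return None
    return handle
```

A script would call `try_lock("/var/run/pnfs-perms.lock")` (hypothetical path) at startup and exit straight away if it gets `None` back. Because the kernel drops the lock when the process dies, a crashed run cannot leave a stale lock behind.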
Dcache & Work Derek on 18 Oct 2005
Response from SC support person last night - probable CASTOR config problem, it’s been handed over to CASTOR support, no response or change of behaviour yet
Enabled more sites in R-GMA database, bounced one odd one to R-GMA developers
Query came in from Spain about using our SE for CMS Production work, said okay but cc’d UK CMS as it could conflict with SC3 activity, which started an involved discussion. In the course of this, discovered that CMS has 4TB of tape allocated for their use and wants to write 18TB, having written 4TB so far. So worked out how to cut off a VO from tape access, not pretty but it’ll work - change the write preference on the link from the
CMS have got temporary allocation from User Board to have a tape allocation of 25TB.
Considering writing SURE tests for experiment using too much tape
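A tape-usage test of that sort would come down to comparing usage against allocation. A minimal sketch of that comparison only; it is not tied to SURE's actual test interface (not shown here), and the function name, status strings and 90% warning threshold are all assumptions:

```python
def tape_usage_status(used_tb, allocated_tb, warn_fraction=0.9):
    """Classify a VO's tape usage against its allocation.

    used_tb / allocated_tb are in TB; warn_fraction is an assumed
    threshold (fraction of the allocation) at which to start warning.
    """
    if allocated_tb <= 0:
        raise ValueError("allocation must be positive")
    fraction = used_tb / allocated_tb
    if fraction >= 1.0:
        return "CRITICAL"  # at or over the allocation
    if fraction >= warn_fraction:
        return "WARNING"   # getting close to the allocation
    return "OK"
```

With the CMS numbers above, 4TB written against the original 4TB allocation would come out as CRITICAL, while the same 4TB against the temporary 25TB allocation is OK.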