Monthly Archive: October 2005



Dcache &Work Derek on 17 Oct 2005

Replaced CPU fan on marley – ops reported a loud continuous noise from it. It seemed to lock up a couple of times after I finished, though.
Getting TURLs for hosts at CERN that are apparently not running gridftp servers – reported to lcg-support and, when I got no auto-response, to hep-service-level-2.
Noticed that few hosts are logging to logger2 – bring up at next Monday morning meeting.
Attended Tech Talk – XFS
Did some R-GMA site registrations.

Work Derek on 14 Oct 2005

Made toast agenda w3c compliant
Updated gftp boxes to latest kernel
Added two sites to r-gma registry
Restarted tomcat gridice-mds and lcg-archiver on lcgmon01

Dcache &Work Derek on 13 Oct 2005

Reply from dCache developers about gridftp problem – they are going to discuss it and get back to me
Cancelled TOAST meeting
Restarted PinManager twice, restarted srm when it appeared to be stuck.
Got pxe floppy from http://www.rom-o-matic.net for fezzwig – tulip NIC
Adjusted ks configuration for ops consoles to cope with fezzywig’s 10GB disk, reinstalled scrooge to check it still works on it

Dcache &Work Derek on 12 Oct 2005

Reply back from CERN about short Gridftp transfers to the effect that they don’t see any problems.
Main SRM dropped offline several times – created a fresh postgres database and moved both srms over to it, which seems to have alleviated the problem
Reported gridftp transfer problem to dCache developers
Tried reducing transfers on pools to see if it eliminates problem
Tried 10 srm copies from cern to see if I see the same error
Attended GridPP storage phone conference
Restarted CMS pools, as the failed transfers don’t roll back and leave locked files behind; a restart removes them as they don’t exist in the pnfs database
Rebooted logger1 after kernel upgrade.
Changed permissions fixer on dcache.gridpp to cd to actual vo directory to try and reduce load on admin pnfs database, later fixed it to pipe stderr from find command to /dev/null
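
A minimal sketch of the shape of such a permissions check, not the actual fixer script: it runs find from inside the VO directory and throws away find's stderr. The function name, the wanted mode of 775 and the choice of Python are assumptions made for illustration only.

```python
import os
import subprocess


def find_wrong_mode(vo_dir, wanted_mode="775"):
    """List entries under vo_dir whose mode is not exactly wanted_mode.

    vo_dir and wanted_mode are hypothetical; the real fixer's rules are
    not given in the post.
    """
    result = subprocess.run(
        # Run find relative to the VO directory rather than the tree root.
        ["find", ".", "!", "-perm", wanted_mode, "-print"],
        cwd=vo_dir,
        stdout=subprocess.PIPE,
        # Discard error chatter such as "Permission denied".
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=False,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Starting in the VO directory keeps the walk away from the rest of the namespace, and discarding stderr stops unreadable entries from generating noise on every run.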

Dcache &Work Derek on 11 Oct 2005

CERN-RAL link back up – at least CERN had reinstated their routing, so I reinstated ours. Unfortunately hit a local network glitch where only some of the systems could see the UKLight gateway, but networking managed to fix it
Still seeing gridftp truncation – mailed lcg-sc-support
Added two sites to registry

Dcache &Work Derek on 10 Oct 2005

Found job output for duff Babar job for TA
Trying to understand dcap hangs seen by TB – might be down to a non-existent file being accessed. I could access files fine. Checked two directories: out of ~88000 files, 10 weren’t there even though it looked like they were. Yay for virtualisation.
Debugging timeouts with JC and TB of CMS transfers from CERN, getting short reads from one host
UKLight snafu – implemented ping test to CERN, and wrote two scripts to add/remove routing on a whim over 21 systems
Added SJW to Resource group in helpdesk
Caught up on email
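
A rough sketch of what a ping test plus add/remove routing helper might look like, not the scripts actually written. The function names, the use of ssh and of the `ip route` command, and every host, network and gateway value are assumptions.

```python
import subprocess


def link_is_up(target, count=3):
    """Return True if ping gets replies from target (hypothetical host)."""
    result = subprocess.run(
        ["ping", "-c", str(count), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def route_commands(hosts, network, gateway, action):
    """Build one ssh command per host to add or delete a static route.

    The commands are only built here, not executed, so they can be
    reviewed before being run across all systems.
    """
    if action not in ("add", "del"):
        raise ValueError("action must be 'add' or 'del'")
    return [
        ["ssh", host, "ip", "route", action, network, "via", gateway]
        for host in hosts
    ]
```

Each returned command can be passed to `subprocess.run` once `link_is_up` has been checked, so the same host list serves for both adding and removing the route.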

Books &Films &Music Derek on 06 Oct 2005

Leaf on the wind

So Serenity. Good film, go see.

(Yes, even you philistine, sceptical Cambridge-dwelling folk – Go. See.)

It stands reasonably well on its own if you haven’t seen the tv series, I think.

But did they *have* to do *that* and at that point? I mean…

Anyway, also picked up the remaining two books in the Baroque cycle while I was in Reading, and also the Lord of the Rings trilogy soundtrack, and then back home for pizza.

Very nearly bought a set of playing cards of 52 stag night pranks, but didn’t; relax Graeme (you too Annabel).

Work Derek on 06 Oct 2005

Holiday Thursday & Friday

Dcache &Work Derek on 04 Oct 2005

Trying to gauge how much the srm db relates to reality – possibly an srm request can be marked as failed when the user has quite happily gotten their file. This involves more poking around in the srm database – one user, who the database claimed had 140 successful get requests to 3000-odd failed, said he didn’t see this.
While doing so noticed that CMS were doing transfers, and that some at least were failing due to not getting the full hostname in the TURL from castorgridsc’s SRM; mailed lcg-sc-support to report it.

Created example home directory on scrooge and tarred it up to copy on from touch.

Dcache &Work Derek on 03 Oct 2005

Beginnings of a query for srm stats:

select to_char(epoch_to_timestamptz(CAST((creationtime/1000) AS integer)),'YYYYIW'),state from getfilerequests where extract(week from epoch_to_timestamptz(CAST((creationtime/1000) AS integer))) <= extract(week from now());

This uses a function epoch_to_timestamptz() which I’ve added to the database, taken from PostgreSQL General Bits #40, to basically cast unix epochs into pg’s timestamp type. Now if only the column I actually do the selects on were seconds since epoch, instead of the milliseconds since epoch that it actually is.
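
For illustration, the same millisecond-to-week bucketing can be sketched outside the database. This is only a sketch under assumptions: the function names are hypothetical, rows are assumed to arrive as (creationtime, state) pairs, and timestamps are interpreted as UTC. It pairs the ISO week with the ISO year so that a week spanning New Year stays in a single bucket.

```python
from collections import Counter
from datetime import datetime, timezone


def iso_week_label(creationtime_ms):
    """Turn milliseconds since the epoch into an ISO year+week label."""
    # Integer-divide by 1000 first, mirroring creationtime/1000 in the query.
    dt = datetime.fromtimestamp(creationtime_ms // 1000, tz=timezone.utc)
    iso_year, iso_week, _ = dt.isocalendar()
    return "%04d%02d" % (iso_year, iso_week)


def count_states_by_week(rows):
    """Count requests per (week label, state) from (creationtime_ms, state) pairs."""
    return Counter((iso_week_label(ms), state) for ms, state in rows)
```

Feeding it the rows from getfilerequests would give a per-week tally of each request state, which is the kind of summary the query is heading towards.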

Had (second-hand) reports of difficulties transferring between CNAF and RAL. I found an incantation of a transfer command that looked like it should work but didn’t; however, I was unable to confirm with the user as they’re on holiday.

Also attended Technical Talk by GFT on SATA, SAS and FC
