Category Archive: RT
Dcache & RT & Uncategorized Derek on 20 Feb 2006
20/02/2006
Shutdown/startup: vacuumed the postgres DBs on 350 and pnfs, reclaiming lots of disk space, and upgraded pnfs to postgres 8.1.3. Installed slony on pnfs and set up replication to 438; logrotation and startup scripts still need to be done. This would have taken less time had vi not decided to be too clever by half and hidden the fact that the files I was editing had MS-DOS line endings rather than Unix ones. CB was trying to get dcap write access working; I pointed him towards gsidcap, but our system was still in pieces at that point so I couldn’t really help out much. Once the pools came up, altered them to the correct settings for multiple I/O mover queues. Checked that the gftp doors are now using the new gftp queue.
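For the line-ending trap above: a minimal Python sketch (not what was actually used that day) that detects MS-DOS (CRLF) line endings and rewrites a file with Unix (LF) endings. In vim, :set ff? reports the file format vim detected, which would have given the game away. The function names here are illustrative only.

```python
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the content contains any MS-DOS (CRLF) line endings."""
    return b"\r\n" in data


def to_unix(data: bytes) -> bytes:
    """Convert CRLF line endings to Unix LF; lone LFs are left untouched."""
    return data.replace(b"\r\n", b"\n")


def fix_file(path: Path) -> bool:
    """Rewrite a file in place with Unix line endings.

    Returns True if the file was changed, False if it was already Unix-style.
    """
    data = path.read_bytes()
    if not has_crlf(data):
        return False
    path.write_bytes(to_unix(data))
    return True
```

Working on bytes rather than text avoids any encoding guesswork, and fix_file is a no-op on a file that is already clean, so it is safe to run over a whole config directory.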
Deleted 200+ tickets from the helpdesk after a mailstorm caused by batch scheduler weirdness.
Dcache & RT & Work Derek on 16 Feb 2006
26/01/2006 - 16/02/2006
16th February
Built postgres 8.1.3 for SL3, helped OS with dCache PoolManager
Mailed Zeus about 2 zero-length files
Handled various R-GMA requests
15th February
Continued configuring new SL4 postgres server
Built slony 1.5 for postgres 8.1.3
GridPP-Storage phone conf
Shadowed ST doing the relocatable WN upgrade to LCG 2.7.0
14th February
Mailed CERN about the link; it turned out to be a CERN configuration issue
Mailed Lancaster asking for a pingable host on their end of UKLight for more monitoring
Installed new SL4 postgres server
13th February
Monday Morning Ops meeting
Tweaked RT’s web UI so that replies no longer attempt to set the ticket owner to the current owner; this was interfering with autotaking
Restarted gftp servers; all were stuck at max transfers
Noticed UKLight down, mailed Site Networking
Set up multiple I/O queues on disk servers; began restarting the quiet ones, leaving the rest until the powerdown
Configured gftp servers to use gftp queue
10th February
Meeting with ST: reviewed job plan
Set up autotaking of tickets on reply in RT
Reviewed SFT failures for RC report
9th February
TOAST meeting
Mailed TB about the huge number of errors reported in the dCache logs from file access from lcgui02; it looks like files are not being closed properly, but this is still not really resolved.
Added a query for grid vs non-grid usage to the Tier 1 metrics page on the wiki
8th February
266 and 270 couldn’t access yumit; it turned out nscd was still using the IP address of the old system. nscd -i hosts got things working again
Helpdesk fell over - rebooted
7th February
Installed new certificates, but left the keys encrypted, causing gridftp transfers to fail for 4 hours; fixed
Checked GridPP-Storage table’s Tier 1 historical numbers for RAS
Supplied UKLight plots to MJB
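The 7th February certificate slip (keys left encrypted) is the kind of thing a quick pre-install check can catch, since a daemon has nobody to type a passphrase at startup. A minimal Python sketch, offered as a heuristic only: it looks for the two common OpenSSL encryption markers in a PEM file and does not parse the key itself. check_host_key is a hypothetical helper, not part of any grid toolkit.

```python
from pathlib import Path

# Markers OpenSSL writes into passphrase-protected PEM private keys.
ENCRYPTED_MARKERS = (
    "-----BEGIN ENCRYPTED PRIVATE KEY-----",  # PKCS#8 encrypted key
    "Proc-Type: 4,ENCRYPTED",                 # traditional OpenSSL encrypted key
)


def pem_key_is_encrypted(pem_text: str) -> bool:
    """Heuristic: True if the PEM text carries a known encryption marker."""
    return any(marker in pem_text for marker in ENCRYPTED_MARKERS)


def check_host_key(path: Path) -> None:
    """Raise if the key at `path` is still passphrase-protected."""
    if pem_key_is_encrypted(path.read_text()):
        raise RuntimeError(
            f"{path} is encrypted; a daemon cannot prompt for its passphrase"
        )
```

For example, check_host_key(Path("/etc/grid-security/hostkey.pem")) before restarting the doors, assuming the conventional host key location.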
6th February
Bulk requested 8 host certificates, provided feedback on experience to JJ and MV
Supplied gridusage plots to ST
Sent around updated TOAST agenda
3rd February
Holiday
2nd February
Holiday
1st February
Holiday
31st January
Reported 2 problems with yumit to CC
After talking with PS, decided that the RT <-> UKIROC Footprints problem was down to a problematic site mail server; configured the helpdesk not to use that mail server.
30th January
Monday morning ops meeting
Asked the CA people about a bulk cert request script
Mailed CC & ST about yumit not displaying packages in host detail
Updated scarf helpdesk aliases to point to HPCSG’s footprints box
27th January
Supplied RAS with Grid vs Non-Grid CPU time totals
26th January
Mailed GC some questions for CHEP
Dcache & RT & Work Derek on 09 Jan 2006
Re-enabled some pools on nfs60 that had gone funny and were holding gridftp door login slots open
Cleaned up after our helpdesk and the CA helpdesk decided to spam each other
Ran 2005 Grid vs Non-Grid CPU usage stats
Attended talk on SPEC benchmarks
Did most of the work on my laptop as the PSU in my desktop was playing up; it is now fixed
Dcache & RT & Work Derek on 15 Nov 2005
dCache SRM very slow; the script checking 39_2 replication appeared to be hurting the system, so I cancelled it. Instead I grepped the destination pools manually, diffed the result against the list from 39_2, and ran the replication check only on the files that were in 39_2 but not on the destination pools: 60 files rather than 13000. None of those 60 files are in dCache any more, so no problem.
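The shortcut above amounts to a set difference: only files on the 39_2 list that appear on none of the destination pools need the expensive per-file replication check. A minimal Python sketch, assuming both listings have already been reduced to one file ID per line (the real format of the 39_2 list and of the grep output isn’t recorded here); read_ids and needs_check are hypothetical names.

```python
from typing import Iterable, List


def read_ids(lines: Iterable[str]) -> List[str]:
    """Strip whitespace and drop blank lines from a one-ID-per-line listing."""
    return [line.strip() for line in lines if line.strip()]


def needs_check(source_ids: Iterable[str],
                destination_ids: Iterable[str]) -> List[str]:
    """IDs on the source pool list that appear on none of the destination pools.

    Source order is preserved and duplicates are dropped, so the expensive
    replication check runs once per candidate only.
    """
    present = set(destination_ids)  # set lookup keeps this cheap at 13000 entries
    seen = set()
    result = []
    for file_id in source_ids:
        if file_id not in present and file_id not in seen:
            seen.add(file_id)
            result.append(file_id)
    return result
```

Usage would be along the lines of needs_check(read_ids(open(src)), read_ids(open(dst))), where src and dst are the (hypothetical) listing files; only the returned shortlist goes through the slow check.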
Moved nfs39_1 & 2 to lhcb, moved nfs51_1-4 to read-only.
Updated the job count to exclude dteam jobs; it looks like this reduced usage by about 1000 jobs a month.
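Excluding a VO from the monthly count is a simple filter over the job records. A minimal Python sketch, assuming the records have already been reduced to hypothetical (month, VO) pairs; the actual accounting source and its schema aren’t described here.

```python
from collections import Counter
from typing import Iterable, Tuple


def monthly_job_counts(jobs: Iterable[Tuple[str, str]],
                       exclude_vos: Tuple[str, ...] = ("dteam",)) -> Counter:
    """Count jobs per month, skipping jobs from the excluded VOs.

    jobs: (month, vo) pairs, e.g. ("2005-10", "atlas").
    """
    excluded = set(exclude_vos)
    return Counter(month for month, vo in jobs if vo not in excluded)
```

Comparing monthly_job_counts(jobs) with monthly_job_counts(jobs, exclude_vos=()) gives the per-month difference the dteam exclusion makes.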
Added display of stalled jobs to helpdesk index page.
Dcache & RT & Work Derek on 26 Sep 2005
Announced downtime for SRM on Thursday for upgrade
Discussed with RAS and JC the possibility of getting GGUS, Footprints and RT to talk to each other without creating extraneous tickets anywhere.
Began creating plots of srm stats using existing stats creation framework.
Checked logfiles for any discernible reason why a connection from an SE at Glasgow was not getting data when using FTS; verified that RAL-RAL FTS transfers work successfully. I suspect firewall issues somewhere.
Dcache & RT & Work Derek on 17 Aug 2005
Requested new certificates for the two jra1 boxes and the new CMS dCache disk. Phone support to CB, who’s installing/configuring dCache. Helpdesk box issue: /tmp had filled with far too many files, stopping e-mail getting through as RT couldn’t create a tmp file. Cleared it out; need to install tmpwatch or something similar. Jeremy requested testing to the DPM at Glasgow; will do some srmcps tomorrow, but the FTS server is at risk this week and people at Glasgow are more available for monitoring next week, so leaving mass transfers until then.
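Until tmpwatch (or similar) is installed, the clean-up amounts to deleting old files from /tmp. A rough Python sketch of that idea, not tmpwatch itself: it judges age by modification time, only scans the top level of the directory, skips directories and symlinks, and defaults to a dry run. purge_old_files is a hypothetical helper.

```python
import time
from pathlib import Path
from typing import List


def purge_old_files(directory: Path, max_age_hours: float,
                    dry_run: bool = True) -> List[Path]:
    """Remove plain files in `directory` not modified for max_age_hours.

    Only the top level is scanned; directories and symlinks are skipped.
    With dry_run=True nothing is deleted and the candidates are just returned.
    """
    cutoff = time.time() - max_age_hours * 3600
    victims = []
    for entry in sorted(directory.iterdir()):
        if entry.is_symlink() or not entry.is_file():
            continue
        try:
            if entry.stat().st_mtime >= cutoff:
                continue  # recent enough to keep
            if not dry_run:
                entry.unlink()
        except FileNotFoundError:
            # Another process removed it first; nothing left to do.
            continue
        victims.append(entry)
    return victims
```

For example, purge_old_files(Path("/tmp"), 24) previews what would go, and adding dry_run=False actually removes it; the 24-hour threshold is an arbitrary example, and files RT still has open should obviously be left alone.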
The new syslog box is configured with a DHCP address, which is wrong; it should only use DHCP on re-install. Need to look at the installation scripts.
Edited dCache PoolManager to add settings for H1; fixed a small mistake made by ST when creating pools on nfs58
Dcache & RT & Work admin on 03 Aug 2005
Worked with Phillippa to understand why tickets from our helpdesk weren’t getting into Footprints; it turned out replies were going out from a different e-mail address to the one Footprints had sent to. Changed the outgoing e-mail address on the Tier1a-lcg queue.
Attended SC conference call
Mailed Andrew and Jeremy SC3 problem report
Added Chris Kruk to sg CVS
Ran more transfer tests in loopback; it appears that we have problems with 32 transfers in and out, though whether the limit is 32 transfers of one type or 64 in total is unknown so far.
Ran pnfs-poolcheck on the CMS area after the nfs42 issue; 11 files not in pool, but still need to find them and report to CMS