
Category Archive: RT



Dcache & RT & Uncategorized Derek on 20 Feb 2006

20/02/2006/

Shutdown/startup: vacuumed postgres DBs on 350 and pnfs - lots of disk space reclaimed - and upgraded pnfs to 8.1.3. Installed slony on pnfs and set up replication to 438 - still needs log rotation and startup scripts to be done. Would have taken less time than it did, but vi decided to be too clever by half and did not show me that the files I was editing were msdos style rather than unix. CB was trying to get dcap write access working; pointed him towards gsidcap, but our system was still in pieces at that point so couldn’t really help out that much. Altered pools once they came up to the correct settings for multiple io mover queues. Checked gftp doors are now using the new gftp queue.
Deleted 200+ tickets from helpdesk after mailstorm due to batch scheduler weirdness.
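
The msdos-style line endings mentioned above are easy to check for before editing. A minimal sketch, assuming the files are small enough to read whole; the function names are illustrative and not from any tool used here:

```python
from pathlib import Path


def line_ending_style(data: bytes) -> str:
    """Classify line endings in a byte string as 'dos', 'unix', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    # Bare LF count: all LFs minus those that belong to a CRLF pair.
    lf = data.count(b"\n") - crlf
    if crlf and lf:
        return "mixed"
    if crlf:
        return "dos"
    if lf:
        return "unix"
    return "none"


def check_file(path: str) -> str:
    """Read in binary mode so no newline translation hides the carriage returns."""
    return line_ending_style(Path(path).read_bytes())
```

Running check_file over the config files before opening them would flag any that report 'dos' or 'mixed'.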

Dcache & RT & Work Derek on 16 Feb 2006

26/01/2006 - 16/02/2006

16th February

Built postgres 8.1.3 for SL3, helped OS with dCache PoolManager
Mailed Zeus about 2 zero length files
Did various RGMA requests

15th February

Continued configuring new SL4 postgres server
Built slony 1.5 for postgres 8.1.3
GridPP-Storage phone conf
Shadowed ST doing relocatable WN upgrade to LCG 2.7.0

14th February

Mailed CERN about link - turned out to be CERN configuration issue
Mailed Lancaster about pingable host on their end of UKLight for more monitoring
Installed new SL4 postgres server

13th February

Monday Morning Ops meeting
Tweaked RT’s web ui on replies to not attempt to set Ticket owner to current owner - was interacting with autotaking
Restarted gftp servers - all stuck at max transfers
Noticed UKLight down, mailed Site Networking
Set up multiple io queues on disk servers - began restarting quiet ones - leave rest till powerdown
Configured gftp servers to use gftp queue

10th February

Meeting with ST - reviewed Job plan
Set up autotaking of tickets on reply in RT
Reviewed SFT failures for RC report

9th February

TOAST meeting
Mailed TB about huge number of errors reported in dCache logs from file access from lcgui02 - looks like files not being closed properly - but still not really resolved.
Added query for grid v non-grid usage to Tier 1 metrics page on wiki

8th February

266 and 270 couldn’t access yumit - turned out to be nscd still using the ip address of the old system - nscd -i hosts got things working again
Helpdesk fell over - rebooted

7th February

Installed new certificates - but left keys encrypted causing gridftp transfers to fail for 4 hours - fixed
Checked GridPP-Storage table’s Tier 1 historical numbers for RAS
Supplied UKLight plots to MJB
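
The encrypted-key mistake above can be caught before a service restart by looking at the PEM headers. A rough sketch, assuming PEM-format keys; it only inspects the text markers and does not try to parse or decrypt the key, and the function name is made up for illustration:

```python
def pem_key_is_encrypted(pem_text: str) -> bool:
    """Return True if a PEM private key appears to be passphrase-protected."""
    # PKCS#8 encrypted keys say so in the BEGIN line itself.
    if "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text:
        return True
    # Traditional OpenSSL-format keys carry a "Proc-Type: 4,ENCRYPTED" header.
    return any(
        line.strip().startswith("Proc-Type:") and "ENCRYPTED" in line
        for line in pem_text.splitlines()
    )
```

A check like this could run over each installed host key and warn if any returns True.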

6th February

Bulk requested 8 host certificates, provided feedback on experience to JJ and MV
Supplied gridusage plots to ST
Sent around updated TOAST agenda

3rd February

Holiday

2nd February

Holiday

1st February

Holiday

31st January

Reported 2 problems with yumit to CC
Talking with PS, decided that the RT <-> UKIROC Footprints problem was down to a problematic site mail server; configured helpdesk to not use that mail server.

30th January

Monday morning ops meeting
Asked CA people about bulk cert request script
Mailed CC & ST about yumit not displaying packages in host detail
Updated scarf helpdesk aliases to point to HPCSG’s footprints box

27th January

Supplied RAS with Grid vs Non-Grid CPU time totals

26th January

Mailed GC some questions for CHEP

Dcache & RT & Work Derek on 09 Jan 2006

Reenabled some pools on nfs60 that had gone funny and were holding login slots on gridftp doors open
Cleaned up after our helpdesk and the CA helpdesk decided to spam each other
Ran 2005 Grid vs Non-Grid CPU usage stats
Attended talk on SPEC benchmarks
Did most of work on Laptop as PSU in Desktop was playing up - but is now fixed

Dcache & RT & Work Derek on 15 Nov 2005

dCache srm very slow - script checking 39_2 replication appears to be hurting system, so cancelled it. Grepped destination pools manually, diffed against the list from 39_2, and then ran the replication check only on files that were in 39_2 but not on destination pools: 60 files rather than 13000. All 60 files are no longer in dCache, so no problem.
Moved nfs39_1 & 2 to lhcb, moved nfs51_1-4 to read-only.
Updated job count to exclude dteam jobs - looks like it reduced usage by about 1000 jobs a month.
Added display of stalled jobs to helpdesk index page.
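
The manual grep-and-diff for the 39_2 replication check boils down to a set difference. A minimal sketch, assuming each pool listing has already been reduced to a list of file IDs; the IDs and the function name below are placeholders, not real pnfs IDs:

```python
def missing_from_destination(source_ids, destination_ids):
    """Return IDs held on the source pool but absent from the destination pools, sorted."""
    return sorted(set(source_ids) - set(destination_ids))


# Hypothetical listings: one from the source pool, one merged from all destination pools.
source = ["0001", "0002", "0003", "0004"]
destination = ["0002", "0004", "0005"]
to_check = missing_from_destination(source, destination)
```

Only the IDs in to_check need the expensive per-file replication check, which is what cut the work from the full pool listing down to a handful of files.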

Dcache & RT & Work Derek on 26 Sep 2005

Announced downtime for SRM on Thursday for upgrade
Discussed with RAS and JC possibility of getting GGUS, Footprints and RT to talk to each other without creating extraneous tickets anywhere.
Began creating plots of srm stats using existing stats creation framework.
Checked logfiles to see if there was any discernible reason why a connection from an SE at Glasgow was not getting data when using FTS; verified that FTS transfers RAL-RAL work successfully. I suspect firewall issues somewhere.

Dcache & RT & Work Derek on 17 Aug 2005

Requested new certificates for the two jra1 boxes and the new CMS dCache disk. Phone support to CB, who’s installing/configuring dCache. Helpdesk box issue: /tmp filled with far too many files - stopping e-mail getting through as RT couldn’t create tmp file. Cleared out; need to install tmpwatch or something. Request from Jeremy to do testing to dpm at Glasgow: will do some srmcp’s tomorrow, but FTS server is at risk this week and people at Glasgow are more available for monitoring next week, so leave mass transfers until then.
New syslog box is configured with DHCP address, which is wrong, should only access DHCP on re-install, need to look at installation scripts.
Edited dCache PoolManager to add settings for H1, fixed small mistake made by ST when creating pools on nfs58
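
Until something like tmpwatch is installed, the /tmp build-up on the helpdesk box could be watched with a small script. A sketch only, assuming modification time is a good enough staleness measure and that only the top level of the directory matters; the function name and the threshold are illustrative:

```python
import os
import time


def stale_files(directory, max_age_hours, now=None):
    """List regular files in directory whose mtime is older than max_age_hours."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip directories and symlinks; only plain files are candidates.
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)
```

The function only reports candidates; deleting them is left as a separate, deliberate step.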

Dcache & RT & Work admin on 03 Aug 2005

Worked with Phillippa to understand why Tickets from our helpdesk weren’t getting into Footprints; turned out replies were going out with a different e-mail address to the address Footprints had sent to. Changed the outgoing e-mail on the Tier1a-lcg queue.
Attended SC conference call
Mailed Andrew and Jeremy SC3 problem report
Added Chris Kruk to sg CVS
Running more transfer tests in loopback - it appears that we have problems with 32 transfers in and out; whether this is 32 transfers of one type or 64 in total is unknown so far.
Ran pnfs-poolcheck on CMS area after nfs42 issue - 11 files not in pool, but still need to find and report to CMS

RT & Work admin on 01 Nov 2004

Altered ESC NGS approval template to include all correspondence

Dcache & RT & Work admin on 01 Nov 2004

Friday

Began setting up regexp for ParseNewMessagesforCC for helpdesk
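
When Cc addresses are parsed out of incoming mail, the regexp's usual job is to recognise the helpdesk's own addresses so they are not added as watchers and mailed in a loop. RT itself takes a Perl regexp in its config; the sketch below only illustrates the matching idea in Python, and the addresses are hypothetical placeholders:

```python
import re

# Hypothetical helpdesk addresses; a real pattern would list the queue's own aliases.
OWN_ADDRESS = re.compile(r"^(helpdesk|helpdesk-comment)@example\.org$", re.IGNORECASE)


def cc_candidates(addresses):
    """Drop the helpdesk's own addresses from a list of To/Cc addresses."""
    return [a for a in addresses if not OWN_ADDRESS.match(a.strip())]
```

Anchoring the pattern at both ends keeps it from matching a user address that merely contains the helpdesk name.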

RT & Work admin on 29 Oct 2004

Thursday

Set up security helpdesk queue for Tier1a.
