RAL Tier1 weekly operations castor 13/09/2010

From GridPP Wiki

Work previous week

  • Matthew:
    • Debugged gridftp problems on SL5; found a fix by downgrading VDT
    • Testing transfers on preprod
    • 2.1.9 change control document
  • Shaun:
    • Testing firewall bypass rules
    • xrootd testing (with and w/o ALICE security)
    • Moved instances to local nameserver
    • ASGC support
    • Monitoring LHCb production instance
  • Chris:
    • ..
  • Richard:
    • Helped Cheney with quattor issues building head nodes for facilities instance
  • Brian:
    • ..
  • Jens:
    • ..

Operations Issues

  • Transfer problems with SL5 disk servers persist: for ATLAS, lcg-cp works but lcg-cr does not
  • LHCb is still throttled
  • Testing gridftp-internal externally showed a configuration problem in the site firewall that is causing some transfers to fail
  • There are transfer problems to a new batch of disk servers at NDGF affecting only RAL


  • Incorrect firewall settings are preventing a number of new (ssv06-***) disk servers from transferring externally

Blocking issues

  • Any production problems that remain ongoing will jeopardize the timeline for starting the 2.1.9 upgrades at the end of this month.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description                          Start             End               Type     Affected VO(s)
Update LHCb to use local firewall    13/03/2010 10:00  13/03/2010 12:00  At-risk  LHCb
Update Gen to use local firewall     14/03/2010 10:00  14/03/2010 12:00  At-risk  Gen
Update CMS to use local firewall     15/03/2010 10:00  15/03/2010 12:00  At-risk  CMS
Update ATLAS to use local firewall   16/03/2010 10:00  16/03/2010 12:00  At-risk  ATLAS

Advanced Planning

  • Upgrade to 2.1.9 (2010)


  • Castor on-call person: Shaun
  • Staff absences:
    • ..