RAL Tier1 weekly operations Grid 20091214

From GridPP Wiki

Summary of Previous Week

Developments

  • Alastair
    • Deployed two disk servers.
    • Contacted Panda/Ganga developers to improve error information for ATLAS jobs at RAL.
    • Tested poweruser analysis at RAL; found a problem with the CERN WMS.
  • Andrew
    • Completed November accounting
    • Updated trace-job.pl to run on CREAM CE
    • Updated FTS configuration due to a missing endpoint
    • Continued IO testing of CMSSW TTreeCache and read-coalescing patches
    • Tested PhEDEx dev instance on lcgvo0599
    • Deleted CMS data from /store/unmerged; carried out PhEDEx storage consistency check
    • Completed preliminary CMS computing model spreadsheet
  • Catalin
    • Worked on MySQL migration plan
    • Worked on quattorising the LHCb VOBOX
    • Had discussions on FronTier issues
    • Still waiting for feedback from LFC@CERN on recovery and consistency checks
  • Derek
    • Implemented Change Control process on dev helpdesk
  • Matt
    • Prepared Tier-1 review presentation
    • Added caching CIP plugin on site BDIIs
  • Richard
    • Attended Cheney's NRPE training
    • CASTOR activities:
      • Completed the "data configurator" tool for sending config files to quattorised CASTOR servers
      • Continued work on SLC 4.8 templates
      • Wrote a script to complete the post-install setup of CASTOR machines in the new PPS instance
  • Mayo
    • Created admin UI for metric system and wrote system user documentation
    • Created a user account for Sarah Pearce to enable testing of the possible GridPP extension
    • Attended Cheney's NRPE training
    • Worked on the tape robot spreadsheet automation project

Operational Issues and Incidents

Description | Start | End | Affected VO(s) | Severity | Status

Plans for Week(s) Ahead

Plans

  • Alastair
    • Try to fix poweruser issues
    • Look into "slow" FTS rates in the UK cloud
  • Andrew
    • Continue IO testing of CMSSW TTreeCache and read-coalescing patches
    • Attend PPD Christmas lunch
  • Catalin
    • Continue work on MySQL migration
    • Continue quattorising the LHCb VOBOX
    • Decommission old SL4 ALICE VOBOXes
  • Derek
    • Roll out change control process on production helpdesk
    • Test, implement and document proposed disaster mitigation for lcgcenfs
  • Matt
    • Test new production CIP on test site BDII
    • Tier-1 Review
    • GridPP4 input
  • Richard
    • Write detailed plan for proposed BDII changes during January
    • CASTOR activities:
      • Add in configuration data to pps machines built via kickstart scripts
      • Set up database connections on new pps machines
      • Continue activity on SLC 4.8 templates
  • Mayo
    • Work on metric system: add a change-password feature for users and report-printing features
    • Work on possible extension of the system to include GridPP
    • Continue working on automated spreadsheet project

Resource Requests

Downtimes

Description | Hosts | Type | Start | End | Affected VO(s)

Requirements and Blocking Issues

Description | Required By | Priority | Status
LHCb SL5 64-bit VOBOX deployment using Quattor | 25 Nov 2009 | Medium | Quattor recipe not yet available (RT#53392)
Hardware for testing LFC/FTS resilience | | High | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue
Hardware for PPS | | High | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this.
Hardware for Grid Services testbed | | Medium |

OnCall/AoD Cover

  • Primary OnCall: Catalin (Mon, Wed-Sun)
  • Grid OnCall:
  • AoD: