RAL Tier1 weekly operations castor 08/03/2010

Summary of Previous Week

Matthew:
- Worked on 2.1.8/2.1.9 strategy and presented new features at the Liaison meeting
- Planning for CASTOR session at GridPP24
- Database DR
- Kicked off plans for moving forward to new production database hardware
- Installed lcg_utils on castoradm3 for stress testing
- Depmon (and backup CASTOR on Duty) duties
- Wrote presentation for T1 Away Day
Shaun:
- Assisted with disk server deployment problems
- Fixed t2k tape recall problem
- Implemented tweak to address CMS job problems
- CODD (Friday)
Chris:
- Continued testing the number of job slots on a per-protocol basis
- Did some work on Quattor Tape and Disk Server
- Started preparing test infrastructure for castor upgrades
- Implemented fix for Atlas for LFS events
- Castor on Duty person
- Friday off
Cheney:
- ..
Tim:
- Hardware installs
- CS1818 problem investigation
- Pre-prod VDQM (big-id) problems.
- T10KB drives on Pre-prod
Richard:
- Worked on new version of pre-prod benchmarking tool
Brian:
- ..
Jens:
- Expounding on the Correct Interpretation(tm) of information

Developments for This Week

Matthew:
- ..
Shaun:
- COD
- Castor Monitoring prototyping
- Testing distribution of new tnsnames file
Chris:
- Continue testing the number of job slots on a per-protocol basis. Waiting for LHCb to test rootd
- Do some work with polymorphic machines
- Prepare cold stand-by central server
- Do some work on Quattor Tape Server
- Preparing test infrastructure for castor upgrades
Cheney:
- ..
Tim:
- T10KB drive testing on Pre-prod
- Getting new tape servers into operation
Richard:
- Complete new version of pre-prod benchmarking tool and create a Wiki page to document it
Brian:
- ..
Jens:
- Getting preprod and/or cert cipped. Pick up CIP 2.2.0 again.

Operations Issues

A large number of jobs failed because access to a small number of hot files was saturated. A new service class with replica=30, using the same disk pool as cmsFarmRead, was added to deal with this.
One faulty Atlas tape identified (CS1818).
Problems with missing RPMs on redeployed disk servers after they went into production. A final disk server sign-off by CASTOR team members has been introduced for deploying new disk servers to production.
Another BigID occurrence, this time on the Preprod VDQM (first time on this schema).

Entries in/planned to go to GOCDB

none