RAL Tier1 weekly operations Grid 20110228
From GridPP Wiki
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s)
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- Working on ATLAS permission change. [On hold]
- Setting up xrootd for ATLAS at RAL.
- Talking to ALICE
- Looking into upgrading the CASTOR client on all WNs.
- Disk pool merging and DB change.
- Cleaning up dark data
- Writing change control
- Preparing for Beauty 2011 conference.
- Setting up laptop again...
Andrew
- Capacity planning [Ongoing]
- Sorting out CMS file deletion permission problems
- VOBOX proxy renewal checker/restarter Nagios plugin [Done]
- CMS data ops
- Reprocessing at RAL, FNAL, ASGC, IN2P3 (variety of issues)
- Investigating jobs lost in CREAM CE
Catalin
- Two new VOs to be added to the LFC
- Involved with CREAM CE installation and configuration [Ongoing]
- GGUS issue with pheno affecting lcgwms03 [Done]
Derek
-
Matt
- Researching Hadoop (HDFS). [New]
- Prep for Tier-1 Resources meeting. [New]
- Quattorise lcgfts02. [New]
- Contact NFS users. [Ongoing]
- Second phase of migration of FTS agents to Quattorised h/w. [Done]
- Test new MAUI configuration for gridWN queue. [Done]
- Update CRL checks Nagios plugin. [Done]
- Deploy Derek's CA patch for Quattor. [Done]
Richard
- Top level BDIIs now updated to level 21. [Done]
- Moved 2 site BDIIs into UPS room for increased resilience. [Done]
- Moved 1 top BDII into UPS room for increased resilience. Now need to move one more top BDII. [Ongoing]
- Trying out the new hypervisor (hv-10) to see how much performance has improved; an existing VM has been moved across to it. [Ongoing]
- Building an ARGUS server using the new QWG templates [Ongoing]
- Working on the "team status page" being developed as an action from team awayday [Ongoing]
- Reviewing G/S process documentation [Ongoing]
- CASTOR items:
- Developed a script to stress-test FTS transfers in/out of the preprod instance. [Ongoing]
VO Reports
ALICE
ATLAS
CMS
- RAL is back near the top of the CMS site readiness rank for Tier 1s.
- FNAL intend to set up 300 worker nodes to use CVMFS instead of NFS for software areas on Thursday. Note that CMS don't want all Tier 1s to rush out and use CVMFS yet.
LHCb
OnCall/AoD Cover
OnCall Rota
- Primary OnCall: Catalin
- Grid OnCall:
- AoD: