From GridPP Wiki
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status
Job status monitoring from CREAMCE | 2-Feb-2010 | | CMS | medium | [10-Feb-2010] WMS patch available soon; CREAMCE new version available soon. [07-Apr-2010] CMS tests have shown that WMS patches resolve the problem; still waiting for patch to be installed on the production WMSs in Italy
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s)
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status
HW needed to test Dataguard technology for LFC/FTS | 19 May 2010 | 15 June 2010 | Medium | [24-05-2010] HW available; needs to be deployed by Fabric and then handed over to Dataservices
#61658: HW request for CMS Squid VOBOX | 30 June 2010 | | Medium | [30-06-2010] Request made
Developments/Plans
Highlights for Tier-1 Ops Meeting
- CE03 now deployed
- Ongoing work to finalise the closure of the SL4 batch service.
- Working on a failover CMS PhEDEx VOBOX
- Grid Team thin on the ground this week (A/L & WLCG workshop)
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- Working on ATLAS software server on /afs [ongoing]
- Written a script to identify unavailable files when a disk server is taken out of production. [testing]
- Looking into slow LHCb transfers between SARA and RAL. (fix with James T now)
- Working to improve pbsjobs database to allow easier monitoring of production work.
- Working on ATLAS Frontier service, monitoring and backup.
Andrew
- Adjustments to TFC & testing of new service class (cmsTemp) using a backfill workflow [Done]
- Put in H/W request for Fabric team for new CMS VOBOX for Squid / PhEDEx failover [Done]
- Writing call-out documentation for restarting PhEDEx on another VOBOX [Ongoing]
- Updated FTS services.xml; added new domain to RGMA ACL; updated Maui fairshares [Done]
- Accounting
  - June accounting [Ongoing]
  - Investigated CESGA/PBS differences due to dates used in queries [Done]
- CMS data ops
  - Accounting for previous rereco/skims
  - Data rereco at KIT
  - MC rereco at RAL & CNAF
Catalin
- Test LFC deployment using Quattor [ongoing]
- LFC talk for NGS [done]
- Frontier monitoring [ongoing]
- ALICE CASTOR+xrootd issues [ongoing]
Derek
- Testing glexec update [ongoing]
- Setting up NGS UEE on worker nodes
- Deployed lcgce03 [done]
- Implementing the updated change control process on dev helpdesk
- Attending WLCG Workshop Wed-Fri
Matt
Richard
- Planning updates to RAL top-level BDIIs
- Further work on the "team status page" being developed as an action from team awayday
- Reviewing G/S process documentation
- Developed a tool to help with automating the wiki page on grid middleware versions
- Writing a Nagios plugin to check the "deltas" in the number of entries in RAL BDII servers
- CASTOR items:
  - Carried out latest phase in pre-prod upgrade
  - Re-ran 2.1.8 functional tests on latest pre-prod s/w after latest re-config
  - Started running stress tests
- Next Week
  - Finishing off 2.1.7 metrics documentation
  - Continuing to run stress tests on pre-prod
  - 4.5 days A/L
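The BDII "deltas" check mentioned in Richard's report could be sketched roughly as below. This is only an illustration, not the plugin under development: the function name `check_delta` and the percentage thresholds are hypothetical, and how the entry counts are obtained from the BDII servers is deliberately left out. The only fixed convention used is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_delta(previous, current, warn_pct=10.0, crit_pct=25.0):
    """Compare two BDII entry counts and return (exit_code, status_line).

    warn_pct and crit_pct are hypothetical thresholds, expressed as a
    percentage change relative to the previous count.
    """
    if previous is None or current is None or previous <= 0:
        return UNKNOWN, "UNKNOWN - no usable previous/current entry count"

    delta = current - previous
    pct = abs(delta) * 100.0 / previous

    # Text after '|' is Nagios performance data.
    detail = "entries=%d delta=%+d (%.1f%%) | entries=%d" % (
        current, delta, pct, current)

    if pct >= crit_pct:
        return CRITICAL, "CRITICAL - " + detail
    if pct >= warn_pct:
        return WARNING, "WARNING - " + detail
    return OK, "OK - " + detail
```

A wrapper script would print the returned status line and exit with the returned code, so that Nagios picks up both the state and the message.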
Mayo
- Implement David Meredith's feedback into Certificate viewer [Done]
- Integrate certificate viewer module with existing NGS certificate wizard code [Done]
- Create Handover Documentation for finished projects [ongoing]
- Enter job plan into ssc [Done]
- Create Certificate Query class for David Meredith [Done]
VO Reports
ALICE
- Waiting for CREAM-CE 1.6 deployment at RAL
- Cannot roll out new xrootd version (20100510-1509_dbg) on CASTOR 2.1.7
ATLAS
CMS
- Data loss: 877 files were lost from gdss67
LHCb
OnCall/AoD Cover
- Primary OnCall:
- Grid OnCall: Derek
- AoD: