RAL Tier1 weekly operations Grid 20100517
From GridPP Wiki
Revision as of 12:32, 19 May 2010 by Matt hodges
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Job status monitoring from CREAM CE | 2-Feb-2010 | | CMS | medium | [10-Feb-2010] WMS patch available soon; new CREAM CE version available soon. [07-Apr-2010] CMS tests have shown that the WMS patches resolve the problem; still waiting for the patch to be installed on the production WMSs in Italy. |
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status |
---|---|---|---|---|
Developments/Plans
Highlights for Tier-1 Ops Meeting
- Hand over BDII services to Richard
- Disk deployment meeting on Tuesday
- Upcoming meetings
- wLCG T0/T1/T2 workshop (July 7-9, Imperial)
- EGI Technical Forum (September 14-17, Amsterdam)
Highlights for Tier-1 VO Liaison Meeting
- Deployed 35 disk servers into production for ATLAS
- Requested deployment of V08/V09 servers to nonProd to meet 2010 wLCG pledges
- Testing CREAM CE 1.6 (required by ALICE)
- Testing FTS 2.2.4 upgrade
Detailed Individual Reports
Alastair
- Working on ATLAS software server upgrade
- Deploying 35 disk servers into production for ATLAS
- Working on testing ATLASGROUP disk at RAL.
- Looking into ATLAS PFC (Pool File Catalogue) problems.
Andrew
- APR [Ongoing]
- Tidying up APEL problems (replacing missing data for 19-20 March; fixing SpecInt2000 values for April and May)
- April accounting [Done]
- Added a new endpoint to FTS (for T2_EE_Estonia) [Done]
- Installing & setting up PhEDEx on SL5 VOBOX [Ongoing]
- Writing change-control & new service checklist documents for PhEDEx
- Migrating to FTS groups in FTS "cloud" channels [Ongoing]
- CMS data ops
- Backfill at RAL & PIC [Ongoing]
Catalin
- ATLAS Frontier server updates
- Work on CMS PhEDEx Nagios monitoring [ongoing]
- Configure squid on LHCb VOBOX [ongoing]
- gLite updates on LHCb VOBOX [ongoing]
- LFC/FTS replication (w/ Carmine) [ongoing]
- job plans [ongoing]
- WMS reconfiguration (ops/lcgadmin, fusion) [done]
Derek
- Intervention on lcgce06 for glexec [Done]
- Intervention on lcgce07 for glexec
- Sync of templates with QWG for gLite 3.1 and 3.2 [done]
- Testing CREAM CE 1.6
Matt
- Job Plans
- Test FTS 2.2.4 upgrade
- Hand over BDII services to Richard [Done]
- APRs [Done]
- Request disk deployments to meet 2010 wLCG pledges [Done]
- Capacity Planning (meeting with Andrew L) [Done]
- Site BDII performance problems [Done]
- Propose to the UB a schedule for decommissioning SL4 capacity [Done]
Richard
- APR [Done]
- Job Plan [Completing / SSC-ing]
- Looking at the site BDII timeout problem
- Working on a proposal on intra-/inter-team communication, to meet an action from the team away day
- Reviewing G/S process documentation
- Further Nagios items from the to-do list (https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/NagiosTasksToDo)
- CASTOR items:
- Writing up results from p/p stress tests
- Preparing ground for using a per-instance nameserver (rather than the central one)
Mayo
- Implement feedback into TSBN web interface [Done]
- Set up the scripts that update the TSBN interface to run as scheduled jobs on a Windows machine
- Certificate viewer for NGS cert wizard first prototype [Done]
- Implement David Meredith's feedback into Certificate viewer
- Write a script to turn PDU ports on/off [Done]
- Write a script to control ports on multiple PDUs
- Create handover documentation for finished projects
VO Reports
ALICE
ATLAS
CMS
- This week CMS is moving to multiple primary datasets (PDs) because of the recent (and upcoming) increases in luminosity, so each Tier-1 will receive at least one PD. The acquisition era changes from Commissioning10 to Run2010A.
LHCb
OnCall/AoD Cover
- Primary OnCall:
- Grid OnCall: Derek
- AoD: