J 0 Production Metrics   Status Date 01-Apr-08     90 days
Owner: Jeremy Coles
Number Title Due Date Status Links More…
1 0.100 Fraction of UK sites in Production On going OK    
I
Metric  Number of GridPP sites in certified status. Require 70%, 85% and 100% at end of Year-1, 2, 3 85% 100%        
  Data Manual - Information available from GOCDB   07Q3[1]        
  Link http://goc.grid-support.ac.uk/gridsite/gocdb/            
2 0.101 Number of registered users (DNs excluding DTEAM) On going OK I  
I
Metric  Sum of members within supported VOs should increase. 1831 5633      
Data Manual - Information from gridmap file 06Q2 07Q3[2]        
Link https://wiki.gridpp.ac.uk/wiki/RAL_Tier1_Metrics_for_GridPP            
3 0.102 Number of active users (DNs, excl DTEAM) On going OK I  
I
Metric  Number of different DNs used to submit jobs to the Grid should increase 246 282        
Data Analysis of Tier-1 job manager logs[3] 06Q3 07Q3[4]        
Link https://wiki.gridpp.ac.uk/wiki/RAL_Tier1_Metrics_for_GridPP            
4 0.103 Number of supported VOs On going OK I  
I
Metric  Sum of unique VOs supported across GridPP sites. Target  9, 12, 15 at end of Year-1, 2, and 3 respectively 20 31        
Data Manual - Tier-2 quarterly reports and Tier-1   07Q3[5]        
Link http://wiki.gridpp.ac.uk/wiki/GridPP_approved_Vos See this also          
5 0.104 Number of LCG/EGEE Job Slots Published by UK On going OK    
I
Metric  Average number of job slots published in Quarter. Target currently 20% of EGEE Total 5400 8200        
Data Currently Manual - gstat history from Min Tsai   07Q3        
Link http://goc.grid.sinica.edu.tw/gstat//UKI.html            
6 0.105 Fraction of LCG/EGEE Job Slots Used On going NOT OK    
I
Metric  Average percentage of available job slots used during last quarter 70% 51%        
Data Email from JC 31/Jan/07   07Q3        
Link http://goc.grid.sinica.edu.tw/gstat//UKI.html            
7 0.106 GridPP KSI2K Available On going OK    
I
Metric  Total GridPP KSI2K nominally available at the end of the last quarter 5747[6] 9900        
Data Manual sum over entries in quarterly reports   07Q3        
Link PMB-58-Tier-2_v1.0  and Tier1Planning26b.xls            
8 0.107 GridPP KSI2K Available to EGEE/LCG On going OK    
I
Metric  GridPP KSI2K  available to EGEE/LCG. In Tier1Plan27 target is ~50%, 60%, 70% of GridPP Tier-1 KSI2K in 2005,6,7 but Tier-2 may bias this? 5940 9450        
Data Manual sum over entries in quarterly reports   07Q3        
Link https://www.gridpp.ac.uk/deployment/status/reports/reports.html            
9 0.108 GridPP disk storage available On going OK    
I
Metric  Total TB of disk storage nominally available from GridPP at the end of the last quarter 900[7] 1480         Much closer than historically
Data Manual sum over entries in quarterly reports   07Q3        
Link https://www.gridpp.ac.uk/deployment/status/reports/reports.html            
10 0.109 GridPP disk storage available to LCG/EGEE On going OK    
I
Metric  Total TB of disk storage nominally available from GridPP at the end of the last quarter. In Tier1Plan27 target is ~60%, 75%, 75% of GridPP Tier-1 in 2005,6,7 but Tier-2 may bias this? 888 1300        
Data Manual sum over entries in quarterly reports   07Q3        
Link https://www.gridpp.ac.uk/deployment/status/reports/reports.html            
11 0.110 GridPP Tape storage available On going OK    
I
Metric  Total TB of Tape storage nominally available from GridPP at the end of the last quarter 528.275[8] 849        
Data Tier1 quarterly report Tier1Plan.xls     07Q3        
Link                  
12 0.111 GridPP Tape storage available to LCG/EGEE. On going OK    
I
Metric  Total TB of Tape storage nominally available from GridPP at the end of the last quarter. Target is 75%, 75%, 85% in 05/6/7 according to Tier1Plan 636.75 750         Too much used by BaBar
Data         07Q3        
Link                  
13 0.112 Fraction of available T1 KSI2K used in quarter On going OK    
I
Metric  Fraction of available KSI2K reserved by jobs in quarter 70% 80%        
Data Metric 0.106 or 0.107 and GOC accounting for LCG. We need to know CPU time and wall clock time for LCG jobs AND any others!                
Link Tier1 OC report (Table2)       06Q4        
14 0.113 Fraction of available T1 Disk used in quarter On going OK    
I
Metric  Fraction of available disk used in quarter 70% 95%        
Data         06Q4        
Link Tier1 OC report (Table2)                
15 0.114 Fraction of available Tape used in quarter On going OK    
I
Metric  Fraction of available tape used in quarter 70% 100%        
Data Tier-1 OC report (table-1)       06Q4        
Link Data out of date (2004). Need better data                
16 0.115 Number of sites publishing LCG accounting data On going OK    
 
I
Metric  Number of sites publishing LCG accounting data. Target is 50%, 80%, 95% by May 05, Sep 05, Jan 06 19 19        
Data GridPP accounting web-pages       07Q3        
Link                  
17 0.116 Percentage of total jobs run via the Grid On going OK    
 
I
Metric  Number of Grid jobs submitted to resources as a percentage of the total jobs submitted to the resources (for Tier-1 only) 60% 60% I      
Data Tier-1 OC report Need to discuss targets     06Q4        
Link Need better accounting                
18 0.117 Job failure rates   OK    
 
I
Metric  Percentage of total jobs submitted  failing due to the (GridPP) infrastructure            
Data No definitive number yet available  [9]              
Link http://ccjra2.in2p3.fr/EGEE-JRA2/QAmeasurement/showstatsVO.php?type=rb&host=prodglobal                
19 0.118 UK contribution to LHC experiments   OK    
 
I
Metric  Percentage of CPU resources provided to LHC VOs in last quarter 13%[10] 15%        
Data Manual sum over data provided by accounting portal        07Q3        
Link http://goc.grid-support.ac.uk/gridsite/accounting/                
20 0.119 UK contribution to non-LHC experiments   OK    
 
I
Metric  Percentage of CPU resources provided to non-LHC VOs in last quarter 15% 23%        
Data Manual sum over data provided by accounting portal                 
Link http://goc.grid-support.ac.uk/gridsite/accounting/       07Q3        
21 0.120 T1 participation in GOC service challenges On going OK    
 
I
Metric  T1 able to participate in GOC service challenges when required OK OK        
Data Production Manager       07Q3[11]        
Link j.coles@RL.AC.UK                
22 0.121 T2s participation in GOC service challenges On going OK    
 
I
Metric  T2s able to participate in GOC service challenges when required OK OK        
Data Production Manager       07Q3[12]        
Link j.coles@RL.AC.UK                
23 0.122 GridPP participating in EGEE security challenges On going OK    
 
I
Metric  There are GridPP sites actively participating in EGEE security challenges OK OK        
Data Production Manager       07Q3        
Link j.coles@RL.AC.UK                
24 0.123 T1 participating in 3D database phases On going OK    
 
I
Metric  T1 is able to participate in the 3D-database challenges OK OK        
Data Tier-1 Manager       07Q3        
Link R.A.Sansum@rl.ac.uk                
25 0.124 GridPP security audit On going OK    
 
I
Metric  GridPP security challenges carried out successfully OK OK        
Data Production manager/GridPP security officer       06Q4[13]        
Link j.coles@RL.AC.UK                
26 0.125 UB schedule implemented and upheld On going OK    
 
I
Metric  Quarterly review against agreed schedule shows that allocations are being met or deviate in an agreed manner OK OK        
Data Tier-1 quarterly report      07Q3        
Link Tier-1 board Chair                
27 0.126     OK    
 
I
Metric  To be removed from Project Map (see email JC 23/6/05) Too late to change for OC.            
Data                  
Link                  
28 0.127 T1 meeting pre-production service commitments On going OK    
 
I
Metric  Testbed up-to-date with required packages within 1 month of request OK[14] OK        
Data         07Q3        
Link S.Traylen@rl.ac.uk                 
29 0.128 T1 meeting JRA1 commitments On going OK    
 
I
Metric  Testbed machines in use by JRA1 testing team Ok OK        
Data                  
Link                  
30 0.129 T1 meeting "other" user commitments   OK    
 
I
Metric  Agreed "ad-hoc" services are being provided to groups at specified levels[15] OK OK        
Data         07Q3        
Link                  
31 0.130 GridPP LCG middleware testbed operational On going OK    
 
I
Metric  Testbed service nodes and Tier-2 site nodes available for M/w installation Yes Yes        
Data Operational machines are listed in the deployment pages   07Q3[16]        
Link http://www.gridpp.ac.uk/wiki/Category:UKI_Testzone            
32 0.131 Tier-1 service disaster recovery plans up to date On going NOT OK    
 
I
Metric  Plans documented and updated every 6 months Yes No        
Data Last modified date stamp on document   07Q3        
Link              
33 0.132 Production service risks and issues log available and up to date On going OK    
 
I
Metric  Risks and issues log available on web-site and up-to-date Yes Yes        
Data Last modified date stamp on document   07Q3[17]        
Link http://wiki.gridpp.ac.uk/wiki/Deployment_Issues            
34 0.133 Deployment team meetings  On going OK    
 
I
Metric  Deployment team meetings take place on average biweekly 90 90        
Data Manual review of UK agenda page   07Q3        
Link http://agenda.cern.ch/displayLevel.php?fid=338[18]            
35 0.134 UK wide deployment support active On going OK    
 
I
Metric  UK wide (TB-SUPPORT) meetings happen once per month (11 per year). 36 36        
Data Manual review of UK agenda page   08Q1        
Link http://agenda.cern.ch/displayLevel.php?fid=338            
36 0.135 Quarterly operational performance review On going OK    
 
I
Metric  Tier-2 quarterly reports available each quarter OK OK        
Data Review documents published in deployment area   08Q1        
Link https://www.gridpp.ac.uk/deployment/status/reports/reports.html            
37 0.136 Tier-1 delivering to LCG MoU On going OK    
 
I
Metric  Tier-1 responded to all LCG problems covered by the LCG MoU in the agreed times over the last quarter OK OK        
Data Tier-1 quarterly report    06Q4        
Link              
38 0.137 Tier-2s delivering to LCG MoU On going OK    
 
I
Metric  Tier-2 responded to all LCG problems covered by the LCG MoU in the agreed times over the last quarter OK OK        
Data Tier-2 quarterly reports   06Q4[19]        
Link              
39 0.138 Site operating system upgrades On going OK    
 
I but an understandable issue - see email from JC 2/10/07
Metric  Operating system upgrades (at non-shared) sites are carried out at 80% of sites within 2 months of requested move OK OK        
Data Weekly EGEE reports   08Q1[20]        
Link https://cic.in2p3.fr/index.php?id=roc&subid=roc_report            
40 0.139 GridPP deployment web-pages up-to-date On going OK I  
 
I
Metric  Deployment web-pages updated within last 3 months OK Yes        
Data Web-page time stamps indicate updates   07Q3[21]      
Link http://www.gridpp.ac.uk/deployment/introduction.html            
41 0.140 Training needs addressed On going OK    
 
I
Metric  Sysman meetings and Training events held. - 2 per year currently 2 2        
Data Email from JC 31/Jan/07   07Q3[22]        
Link Link1 Link2 Link3            
42 0.141 GridPP helpdesk functioning adequately On going OK    
 
I
Metric  95% CIC on duty tickets dealt with by sites within specified periods  Yes Yes        
Data Email from JC 31/Jan/07 90% Availability in 07Q3        
Link P.J.Strange@rl.ac.uk                
43 0.142 Fraction of Site Functional Tests (NOW SAM) passed over the last quarter by T1 On going OK    
 
I
Metric  Achieve >= Average for all Tier-1s[23] 88% 90%        
Data         07Q3[24]        
Link j.coles@RL.AC.UK                
44 0.143     OK    
 
I
Metric  NOW included implicitly in 0.142            
Data                  
Link j.coles@RL.AC.UK                
45 0.144 Average number of sites per quarter available in VO selections (N/a)   OK    
 
I
Metric  Not yet available          
            07Q3[25]        
                     
46 0.145 Number of GridPP (site) system security incidents in the last quarter On going OK       I
Metric  Number of security incidents logged. Would like zero but must realistically expect some. 4 0        
            07Q3        
                     
47 0.146 Number of EGEE Grid security incidents in the last quarter On going OK    
 
I
Metric  Number of security incidents logged. Would like zero but must realistically expect some. 2 1        
            07Q3        
                     
48 0.147 Sites comply with LCG/EGEE security policy On going OK    
 
I
Metric  Comply with LCG/EGEE security policy updates within one month of release OK OK        
            07Q3        
                     
49 0.148     OK    
 
I
Metric               
                     
                     
50 0.149     OK    
 
I
Metric               
                     
                     
51 0.150     OK    
 
I
Metric               
                     
                     
52 0.151     OK    
 
I
Metric               
                     
                     
53 0.152     OK    
 
I
Metric               
                     
                     
54 0.153     OK    
 
I
Metric               
                     
                     
55 0.154     OK    
 
I
Metric               
                     
                     
56 0.155     OK    
 
I
Metric               
                     
                     
57 0.156     OK       I
Metric               
                     
                     

[1]
David Britton:
05Q4: Manchester was down for installation of new hardware. Not clear whether Bristol-HP should really be included.
[2]
David Britton:
[3]
David Britton:
DN suppression is active, so we need one source, which most logically is the Tier-1. This number should increase if the grid is being adopted and working. It may show only a slight variation for the next 8 months, as only the production managers seem to be active.
[4]
David Britton:
[5]
David Britton:
Email from JC Jan25/06
[6]
David Britton:
From Tier1Plan36 and SL Tier-2 spreadsheet (email 23/Jan/06)
[7]
David Britton:
From Tier1Plan35 and SL Tier-2 spreadsheet (email 23/Jan/06)
[8]
David Britton:
This is 85% of request in Dec 06
[9]
JC:
Agreed we should aim for these sorts of figures (<30%, <15%, <5% at end of years 1, 2 and 3 respectively) but how to measure it? Experiment jobs fail for different reasons than DTEAM jobs. Anyway, by selecting stable sites the experiments are already getting above 90%. Depends on VO. Could set up a standard job?
[10]
David Britton:
Email from JC 8/Feb/06
[11]
David Britton:
05Q4 - there are no GOC service challenges at present so this is OK
[12]
David Britton:
05Q4 - there are no GOC service challenges at present so this is OK
[13]
David Britton:
This is really part of 0.122 now.
[14]
JC:
The service was being defined to be almost the same as the JRA1 testbed cluster, so there was no advantage in having it. Then the service challenge work took off, and it was decided that, with few benefits and a draw on manpower, we would reassess RAL participation later in the year. However, I understand that Imperial is now actively involved anyway!
[15]
JC:
Difficult to quantify. This was referring to groups like the Storage group who need machines running for their own work. Provided they can continue to work, their requirements are met. I was also thinking about the QCDGrid area. I think the Tier-1 has more of a problem in this area. Deployment needs to monitor this, but perhaps the project as a whole is less interested. D0 is not satisfied.
[17]
David Britton:
JC needs to send a new link
[18]
David Britton:
Count Deployment meetings in last 3 months and add to total. Sometimes a TB-Support meeting is actually also a deployment meeting if dates coincide
[19]
David Britton:
MOU not signed yet so by definition there is no problem
[20]
David Britton:
No upgrades this Q so everything fine.
[21]
David Britton:
See Link for example
[22]
David Britton:
Course to be arranged in early 2006
[23]
David Britton:
As tests are refined (become harder), expect the success rate to fluctuate, especially as changes propagate through. However, set a target of 33% right now and will review. We are also developing (see email from TD, mid-June 05) a Tier-2 monitoring sheet based on this, which will help define targets.
[24]
David Britton:
Date not currently available due to changeover of SFTs and CIC web implementation. However, we know it's OK.
[25]
David Britton:
Still not available in 05Q3 but should be available via EGEE by December.