Storage Accounting

GridPP have deployed a storage accounting system that covers all sites in EGEE. The current version can be found here. Comments, feedback and questions should be directed towards User:Greig_cowan.

Methodology

We collect storage information per VO per site by performing an LDAP query of a top-level BDII. The information is inserted directly into a MySQL table according to this schema (a sketch of the query-and-insert step follows the table):

+------------------+--------------+-------------+-------------+
| Column name      | Type         | Primary key | Can be NULL |
+------------------+--------------+-------------+-------------+
| RecordIdentity   | VARCHAR(255) | Yes         | No          |
| ResourceIdentity | VARCHAR(255) | No          | Yes         |
| Grid             | VARCHAR(50)  | No          | Yes         |
| ExecutingSite    | VARCHAR(50)  | No          | Yes         |
| VO               | VARCHAR(50)  | No          | Yes         |
| SpaceUsed        | INTEGER      | No          | Yes         |
| SpaceAvailable   | INTEGER      | No          | Yes         |
| Total            | INTEGER      | No          | Yes         |
| Unit             | VARCHAR(50)  | No          | Yes         |
| SEArchitecture   | VARCHAR(50)  | No          | Yes         |
| Type             | VARCHAR(50)  | No          | Yes         |
| srmType          | VARCHAR(50)  | No          | No          |
| EventDate        | DATE         | No          | No          |
| EventTime        | TIME         | No          | No          |
| MeasurementDate  | DATE         | No          | No          |
| MeasurementTime  | TIME         | No          | No          |
+------------------+--------------+-------------+-------------+
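
To make the collection step concrete, here is a minimal sketch of such a collector, assuming the python-ldap and MySQLdb modules. The BDII endpoint, database credentials and the mapping of LDAP attributes onto the columns above are purely illustrative and not a description of the production script; the space attribute names follow those quoted in the Open issues section below.

  import ldap
  import MySQLdb

  BDII = "ldap://lcg-bdii.cern.ch:2170"   # illustrative top-level BDII endpoint

  con = ldap.initialize(BDII)
  con.simple_bind_s()                     # anonymous bind, as for any BDII query
  results = con.search_s("o=grid", ldap.SCOPE_SUBTREE,
                         "(objectClass=GlueSA)",
                         ["GlueSAStateSpaceUsed", "GlueSAStateSpaceAvailable",
                          "GlueSAAccessControlBaseRule"])

  db = MySQLdb.connect(host="localhost", user="accounting",
                       passwd="secret", db="storage")
  cur = db.cursor()
  for dn, attrs in results:
      used = int(attrs.get("GlueSAStateSpaceUsed", ["0"])[0])
      avail = int(attrs.get("GlueSAStateSpaceAvailable", ["0"])[0])
      # VO taken from the access control rule; attribute name assumed here.
      vo = attrs.get("GlueSAAccessControlBaseRule", [""])[0]
      # RecordIdentity is shown as the LDAP DN purely for illustration; the
      # real records include the site name (see Completed actions below).
      cur.execute(
          "INSERT INTO StorageRecords"
          " (RecordIdentity, VO, SpaceUsed, SpaceAvailable, Total,"
          "  EventDate, EventTime)"
          " VALUES (%s, %s, %s, %s, %s, CURDATE(), CURTIME())",
          (dn, vo, used, avail, used + avail))
  db.commit()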

The accounting front-end dynamically generates historical RRD plots showing how the used storage at a site (or within a ROC) has changed over time. Users can select the VOs they are interested in. This is the first system to provide a way of viewing how storage space is being used by VOs at sites within EGEE.
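
As an illustration of the plotting step (not the front-end's actual code): with the rrdtool Python bindings, an archive for one site/VO pair could be maintained and rendered roughly as below. The file names, the data-source name and the 8-hour step (matching the collection interval mentioned under Completed actions) are assumptions.

  import os
  import rrdtool

  RRD = "UKI-SCOTGRID-GLA_atlas.rrd"            # hypothetical per-site/per-VO archive

  if not os.path.exists(RRD):
      rrdtool.create(RRD,
                     "--step", "28800",          # one sample every 8 hours
                     "DS:used:GAUGE:57600:0:U",  # used space, heartbeat = 2 x step
                     "RRA:AVERAGE:0.5:1:1095")   # roughly one year of 8-hour points

  rrdtool.update(RRD, "N:5400")                  # latest used-space value from the DB

  rrdtool.graph("UKI-SCOTGRID-GLA_atlas.png",
                "--start", "-1month",
                "--title", "Used storage at UKI-SCOTGRID-GLA (atlas)",
                "DEF:used=%s:used:AVERAGE" % RRD,
                "AREA:used#3366CC:Space used")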

History

GridPP had a basic storage monitoring system, started back in spring 2005. This proved useful for following the deployment of storage resources at sites, and particularly the roll-out of SRM middleware (dCache and DPM). Like the current storage accounting system, it extracted the used and available storage data from the Grid information system (querying each site BDII rather than a single top-level BDII, as happens now). However, as sites acquired more storage and we became more familiar with the SRM middleware, sites began to dedicate storage to particular VOs, meaning that looking only at the numbers for dteam was insufficient. In addition, the original system only collected information for GridPP and stored historical information (with a 24-hour resolution) in a flat file. Some of the improvements that the new system has over the original are listed here:

  • Collects used and available space information for all VOs supported at a site.
  • Collects used and available space information for all sites within EGEE.
  • Distinguishes between disk and tape storage.
  • Queries a top level BDII for the information rather than each individual site BDII. This makes the system more robust to site failures and downtimes.
  • Inserts the data into a MySQL database rather than a flat file.
  • Provides a user front end to the database.
  • Historical disk usage can be displayed for individual sites or at ROC level.
  • Historical disk usage can be displayed over a range of time periods.
  • Current disk usage information is provided at ROC level and can be downloaded in CSV format.
  • The interface allows a custom number of VOs to be displayed at any one time.

Open issues

Overview

There are a couple of issues that we need to be aware of.

First of all, I think there is a problem with the vocabulary we are using to describe storage. "Available" seems to mean different things to different people. This is how I see it:

Capacity: this is the total amount of disk that is currently online and managed by the SE for all VOs. This is the number that Steve reports from the QRs.

Used: this is the total amount of disk that actually has data stored on it and is accessible by the experiments.

Available: this is the space which does not contain any data but which is available for use by a VO. For example, at Edinburgh we have allocated Atlas 5.5 TB. Of this total allocation, they have used 5.4 TB, meaning that they now only have 0.1 TB available to them. Maybe this name should be changed to "Free", but that is not what is used in the GLUE schema. That is why in the storage accounting table we report "Allocated" and define it to be the sum of GlueSAStateSpaceUsed and GlueSAStateSpaceAvailable.
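
In terms of the schema columns above, the Allocated number is simply the sum of these two published values. A trivial sketch using the Edinburgh figures quoted above:

  def allocated(space_used, space_available):
      # "Allocated" as reported by the accounting: the sum of the two GLUE
      # numbers (SpaceUsed and SpaceAvailable in the schema above).
      return space_used + space_available

  # Edinburgh Atlas example from the text: 5.4 TB used, 0.1 TB still free.
  print(allocated(5.4, 0.1))   # 5.5 TB, i.e. the full allocation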

Secondly, there are some caveats that you should be aware of.

For DPM sites with the new GIP plugin (i.e. all GridPP DPM sites) the used space is correctly reported per VO. This is not necessarily true for the available space. Take UKI-SCOTGRID-GLA as an example. All VOs can write into one of its DPM disk pools. The current plugin then states that all VOs have 22 TB of available space. In a sense, this is correct: all VOs have the potential of writing 22 TB of data into the DPM. What they can't do is all write 22 TB at the same time. Therefore, it is incorrect to add up these "available" numbers to get the total available space.

For dCache, the situation is slightly different. For sites running the standard dCache GIP plugin that have *dedicated* resources for each VO, the used and available numbers are correct. ScotGRID-Edinburgh is an example where this is the case. If there are no dedicated resources, then the same used and available numbers will be advertised for every VO. For example, UKI-LT2-IC-HEP does this for all VOs apart from atlas, cms and dzero. The remaining VOs all have 0.21 TB used and 4.31 TB available (when in reality only one or two of these VOs have actually used any space).

When looking at the storage accounting page, you should select ALL VOs, not just the LHC VOs.

In general, the used numbers as reported by the accounting should be accurate. In many cases (particularly for DPM sites) the allocated numbers (=used+available) are higher than they should be due to overcounting. One thing that we could implement in the system is a filter that scans the available spaces at a site for values that are identical. This would be a sign that the VOs are sharing resources. This is something that I will discuss with Dave.
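
For what it is worth, here is a minimal sketch of the sort of filter described above; the record layout is a hypothetical (vo, available) pairing, not an agreed design:

  from collections import defaultdict

  def flag_shared_pools(records):
      """records: list of (vo, space_available) pairs for a single site."""
      by_available = defaultdict(list)
      for vo, available in records:
          by_available[available].append(vo)
      # Identical "available" values published by several VOs suggest a shared
      # pool whose allocated numbers should not simply be summed.
      return dict((avail, vos) for avail, vos in by_available.items()
                  if len(vos) > 1)

  # Example inspired by the UKI-SCOTGRID-GLA case: every VO sees the same 22 TB.
  site = [("atlas", 22.0), ("cms", 22.0), ("lhcb", 22.0), ("dteam", 22.0)]
  print(flag_shared_pools(site))   # {22.0: ['atlas', 'cms', 'lhcb', 'dteam']}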

Finally, all of this will change once we move to SRM 2.2, due to the introduction of space reservation for VOs and VOMS roles. This has the potential to introduce hard limits on the amount of space that a VO has available to it (free space), so it may make gathering accounting information easier.

Details

For a full description and discussion of the open issues (and their solutions) see the LCG Savannah project page.

1. Storage display is slow to load up, most notably at the tree-branch level.

2. The total storage numbers for each VO (allocated = used + available) should not be summed to give a total storage figure for each site. Unless a site has dedicated resources for each VO, the VOs share SE disk space. This means that while X TB may be available to a VO, it is not available to all VOs simultaneously. By adding the allocated numbers together, we end up counting the same storage multiple times. This is not something that we have control over, but it is something that we should be aware of.

3. The deployment of SRM 2.2 and the use of GLUE 1.3 to describe the storage spaces is something that will have to be monitored closely, with suitable changes made to the system. With GLUE 1.3 it should be possible to account for storage at the level of a particular space token description (e.g. ATLAS_AOD) rather than just at the VO level. User accounting will not be possible with this schema, nor is it desired: the Grid information system is for resource discovery, not accounting.

Features Wish List

1. [Dave Kant; Feb 16th 2006] Autoupdate the storage trees.

Completed actions

1. Changed the format of RecordIdentity in StorageRecords to include the site name. This prevents collisions when two (or more) sites share the same SE.

2. Moved the script to goc02.grid-support.ac.uk. Stability issues should be significantly reduced since R-GMA is no longer involved; the script now writes directly into the MySQL database.

3. Added CSV output of the current disk usage data table.

4. Implement a fix to prevent multiple counting of an SE that is shared among sites within the same Tier-2 (i.e. IC-HEP and IC-LeSC). This turned out not to be required: IC-HEP and IC-LeSC were incorrectly reporting the same SE as two unique SEs in the BDII.

5. Made the LHC-view the default view when loading the accounting page.

6. [Kors Bos; Jan 9th 2006] I would also like an LHC view for the storage accounting pages. I do not want to see all the VOs the T1s are supporting.

7. [Olivier van der Aa; Feb 21st 2007] Suggested that it would be good if people could download the historical information in CSV format, for example a table with one column for each day and one row for each VO. (I'm not sure how easy this would be to implement and there does not appear to be an obvious use case at the moment, but it is something to consider. It should be noted that R-GMA can be used to access the raw data if required.)

8. Include link to this page or Greig's email address on the storage accounting page so that users can give feedback.

9. In the RRD plots for the last day, there is always a drop in the used storage during the last data interval. This drop disappears once a new data point is available. Need to understand where this artifact comes from. SOLUTION: Artifact was removed when we moved to taking data every 8 hours.

10. It's true that the PPS sites should be in the PPS tree and the production sites in the Production tree. Sites like EFRA-JET need to go somewhere, perhaps another sub-branch of the tree? GridPP should advise us here. SOLUTION: The PPS sites have been moved out of the production part of the tree, while EFRA-JET remains in SouthGrid.

11. Basically, the old sites appear in the tree because they are listed in the GOCDB. We need to check that the tree only picks up production sites that are certified. I think I need to modify the tree code to include only those sites where "status=certified". SOLVED.

12. Data for the ralpp VO is not being correctly graphed. The correct data is in the database, so there must be some problem with the SQL statement that selects the columns during the creation of the RRD plot. SOLVED.

13. Per-site view when looking at the overall Tier-2 or ROC level. It is OK to lose the VO information in this case. SOLVED: Graphs are now available.

[John Gordon; Jan 6th 2007] If I look at a region like France I can see the aggregates but I cannot see which sites contribute to it. During the current data-checking phase this is an issue. At the moment one needs to drill down to each site in turn. Could we have an option like CPU where one can see each site in a different colour? I suppose one would have to lose the display of VO, but still allowing the selection of VO would be good, i.e. allow selection of ATLAS but display each site in the region in the coloured stack.

Dave, one issue that the pre-GDB meeting raised was reporting of installed capacity. The only way I can see to do this is to sum the used and available figures in the Storage UR. I understand that there is an issue of double counting when an SA is shared, but I think we should start reporting this anyway. For the bigger sites and LHC VOs this may not be much of an issue, and if we expose the issue by comparing our numbers with the manual disk reporting then we are more likely to get it fixed.

14. Greig needs to find out from PMB/Jeremy about precisely what information would be required.

15. Should we dump the first 3 months of data from the database now that the system reliability has greatly improved? SOLVED: Initial data now dumped.