Glasgow Accounting Documentation


The Glasgow ScotGrid online accounting webpage provides immediate access to performance statistics for all grid jobs that have successfully run on the Glasgow system.

Whilst the current implementation of the online accounting provides satisfactory results, it is to some extent only a first iteration; a few suggestions to improve usability have recently been put forward but have not yet been implemented. This documentation has been prepared to give an overview of the current version and to outline outstanding work for whoever is asked to review and extend the code already in place. It is assumed that the reader has some background in MySQL and PHP programming but not necessarily a firm grounding in Grid Computing. If this is not the case, there are many excellent MySQL and PHP tutorials available online.

This guide is intended to provide an overview of how the current web interface is implemented and of how the various .php files interact. It supplements the detailed comments embedded in each of the scripts and PHP files.

Overview

Populating the MySQL database

The following cron jobs are run every hour on svr031:

/etc/cron.d/rsync_and_pbsi:

PATH=/sbin:/bin:/usr/sbin:/usr/bin
10 * * * * root /usr/bin/rsync  -av svr016:/var/spool/pbs/server_priv/accounting/* /home/pbslogs > /dev/null
20 * * * * root /usr/local/bin/pbsinsertion.pl > /dev/null

It first rsyncs the PBS log files over from svr016 and stores them in /home/pbslogs on svr031. The script /usr/local/bin/pbsinsertion.pl is then run, which extracts the important information and stores it in the MySQL database, where it can be accessed through the webpage.

pbsinsertion.pl is discussed in much greater detail in the section Glasgow_Accounting_Documentation#Details_of_the_pbsinsertion.pl_script.

PBS logs

The PBS logs contain a large number of records, each starting with a unique identifier for a particular job in the form of a job number followed by the server name. Records come in two types: start and end.

A start record is written by PBS when a job starts to run on the cluster. It records when the job started and on which host it ran.

The end records contain more useful (and interesting) information about what the job actually did and the resources used. This information includes:

  1. Whether the job was successful
  2. The memory used
  3. The virtual memory used
  4. The CPU time (as measured by CPU cycles taken)
  5. The Wallclock time taken (as measured by an external clock)
  6. The VO the user belongs to
  7. The submission and job completion date

From this raw data it is possible to calculate quantities such as efficiency (CPU time divided by wallclock time), which is very useful for measuring how well the Glasgow grid is performing.
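
For example, once the end records are in the database (the pbstable table described below), the per-VO efficiency can be obtained with a query of the following shape (a simplified form of the queries that appear later in this document):

SELECT grp,
       sum(TIME_TO_SEC(resources_used_cput))
     / sum(TIME_TO_SEC(resources_used_walltime)) as efficiency
FROM pbstable
GROUP BY grp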

The PBS logs do not, however, contain the DN of the user who submitted the job; this information is instead inserted by a second script, /usr/local/bin/DNcode/parse-it.py. Unfortunately this script does not yet function correctly, and fixing (or replacing) it remains one of the outstanding tasks for the reader. Only a small subset of this information has been successfully inserted to date.

parse-it.py

From an accounting point of view it is of interest to tie up the information stored in the PBS log records with the DN of the user who submitted the job. The mapping from the DN to the pool account (which actually submits the job) is performed by Globus and is reported in /var/log/messages whenever it is successfully performed.

parse-it.py is a script which greps the /var/log/messages files for key phrases indicating that a mapping between a DN and a Glasgow pool account has been completed, storing this information in an array. A connection is then made to the MySQL database and an attempt is made to populate the pbstable table correctly. However, for an as yet undetermined reason, the script does not work all of the time, and in particular has trouble with mappings that are spread across a number of messages.$j.gz files.
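
In effect the update that parse-it.py attempts is of the following shape (a sketch only: sub_dn appears in the queries later in this document, but the pool account column and the values shown are hypothetical):

-- Hypothetical example only: attach the DN found in /var/log/messages
-- to the job rows submitted by the matching pool account.
UPDATE pbstable
SET sub_dn = '/C=UK/O=eScience/OU=Glasgow/CN=some user'
WHERE user = 'gla001';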

Once properly implemented, parse-it.py may be run nightly from a cron job with something like the following:

# Each pass decompresses two consecutive rotated logs into a single stream,
# so that mappings spanning a file boundary are seen together.
for j in $(seq 3 18); do
        echo Processing /var/log/messages.$j.gz and /var/log/messages.$(($j-1)).gz;
        gunzip -c /var/log/messages.$j.gz /var/log/messages.$(($j-1)).gz | ./parse-it.py;
done

parse-it.py can be found at /usr/local/bin/DNcode/ on svr031.

Local Accounting Webpage

The Glasgow accounting webpage allows the user to specify both a date range across which to query the database and which VOs are to be included in the search. The MySQL database is then accessed through PHP scripts, and information such as efficiency, wallclock time and CPU time is extracted. This information is then presented in the form of pie and bar charts using the JpGraph package.

The PHP files for the webpage are stored in /var/www/html/accounting/localaccounting on svr031, although Subversion is used to manage the various versions created. To check out the latest version stored in Subversion you may use the command:

svn checkout  svn+ssh://username@grid01.ph.gla.ac.uk/SVN/scotgrid/uki-scotgrid-glasgow/accounting .

replacing username with your own.

Details of the pbsinsertion.pl script

pbsinsertion.pl is the interface between the PBS logs and the MySQL database; as such it requires knowledge of how to talk to MySQL from Perl, which it does through the DBI module. pbsinsertion.pl is run hourly and collects information from the PBS logs, storing it in the MySQL database. Information from the end records is held in the pbstable table of the localaccounting database. If this table does not exist it is created and filled entirely, from the first entry to the last; if the table is partially filled, the date of the last update is checked and entries are parsed from that date onwards.
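
For reference, the database handle $dbh used in the snippets below comes from the Perl DBI module; a minimal sketch of the connection is shown here (the host and credentials are placeholders; the database name is taken from the use localaccounting statement shown later in this page):

use DBI;

# Connect to the localaccounting database; host, user and password are placeholders.
my $dbh = DBI->connect("DBI:mysql:database=localaccounting;host=localhost",
                       "user", "password", { RaiseError => 1 })
    or die "Could not connect: $DBI::errstr";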

pbsinsertion.pl also creates the colorstable table, which stores a unique colour for each of the VOs. This table is used when filling the entries in JpGraph, to ensure that colours are consistent across plots.

# Subroutine to construct and execute SQL command to yield all groups/VOs then create color table.
sub create_colorstable
{
  print "     Subroutine create_colorstable called. This gets all groups/VOs and creates color table. \n";
  $colorstableCreate = $dbh->prepare("create table colorstable
  (
  grp varchar(10) NOT NULL,
  PRIMARY KEY(grp),
  color varchar(7)
  )
  ");
  $colorstableCreate->execute();
  insert_colorstable();
}
# Subroutine to construct and execute SQL command to insert groups/colors into colorstable.
sub insert_colorstable
{
  print "          insert_colorstable called. This inserts groups/colors into colorstable. \n";
  $getgrps = $dbh->prepare("select distinct grp from pbstable order by grp");
  $getgrps->execute();

  while (@grps = $getgrps->fetchrow_array)
  {
#    print "          ... inside insert_colorstable \n";
    # take the first six hex digits of the md5 of the group name as its colour
    $md5sum = `echo $grps[0] | /usr/bin/md5sum`;
    $md5 = substr($md5sum, 0, 6);
    $colorsentry = "'$grps[0]', '#$md5'";
    print $colorsentry;
    # construct and execute the insert command that shall inject each group/color pair into the color table
    $colorsinsertion = $dbh->prepare("insert into colorstable (grp, color) values
    (
    $colorsentry
    )");
    $colorsinsertion->execute();
  }

  print "\n";
}

pbsinsertion.pl is also used to create the node table, which stores information on each of the nodes available at Glasgow, specifically the slots available on each node and the kSI2K spec factor used to compare nodes of differing processing power. At present all nodes have 4 slots and a spec factor of 1.533; if more nodes are added this table must be expanded.

# Subroutine to construct and execute SQL command to create kspec scale table
sub create_nodetable
{
    print "     Subroutine create_nodetable called. This creates the node table. \n";
  my $nodetableCreate = $dbh->prepare("CREATE TABLE node(node char(7), spec float, slots int)");
  $nodetableCreate->execute();

# yes i'm sure there's a much nicer way to do this, probably using printf but i'm not sure how exactly and i'm in a rush
    for($i=1; $i<141; $i++){
        # zero-pad the node number to three digits, e.g. node001 ... node140
        $nodeentry = "'node00$i', '1.533', 4";
        if($i>9){
            $nodeentry = "'node0$i', '1.533', 4";
        }
        if($i>99){
            $nodeentry = "'node$i', '1.533', 4";
        }
        $nodetableInsertion = $dbh->prepare("insert into node(node, spec, slots) values ($nodeentry)");
        $nodetableInsertion->execute();
    }

}

Finally, pbsinsertion.pl creates the date table, which lists the total number of slots available on each date.

# Subroutine to construct and execute SQL command to create date table listing the total slots active on any particular day
sub create_datetable
{
    print "     Subroutine create_datetable called. Adding node information for today. \n";

    if(!$datetable_exists){
        my $datetableCreate = $dbh->prepare("CREATE TABLE date(date datetime, total_slots int)");
        $datetableCreate->execute();
    }

    # The caller is assumed to pass a flat list: (day, month, year) of the
    # start date followed by (day, month, year) of today. Note that
    # 'my (@s_date, $today) = @_;' would not work here: the array would
    # slurp the entire argument list, so the two triples are split explicitly.
    my @s_date = splice(@_, 0, 3);
    my @today  = @_;

#parse the start date and todays date
    $start_date = ParseDate("$s_date[1]/$s_date[0]/$s_date[2]"); #note american format
    $this_date  = ParseDate("$today[1]/$today[0]/$today[2]");    #note american format

    $date_counter=$start_date;
    $testcounter=0;

    #loop over these days and at each date insert the corresponding date and the total number of slots available on that day
    while($date_counter le $this_date){

        $date_counter_output=&UnixDate($date_counter,"%Y-%m-%d ");
#       print "$date_counter_output \n";

#the day under consideration runs from $d_morning to $d_night
        $d_morning = $date_counter_output." 00:00:00";
        $d_night   = $date_counter_output." 23:59:59";

#get the total number of slots from the database on this date
        $s = 0;

        $getslots = $dbh->prepare("select slots, count(slots), slots*count(slots) as product from node where CommissionDate <= '$d_morning' and (DecommissionDate >= '$d_night' or DecommissionDate is null) group by slots");
        $getslots->execute();
        while (@slots_array = $getslots->fetchrow_array)
        {
            $s = $s + $slots_array[2];   # accumulate slots*count(slots)
        }
#       print ("slots product total= $s \n");

        $dateentry = "'$d_morning', '$s'";
        $datetableInsertion = $dbh->prepare("insert into date(date, total_slots) values ($dateentry)");
        $datetableInsertion->execute();

        $testcounter=$testcounter+1;
        $date_counter=&DateCalc($date_counter,"+ 1day",\$err);

        $err=$err;   # no-op, presumably to silence a 'used only once' warning
    }
}

Details of the Local Accounting Webpage

Usage of Style Template

The standard style of the Glasgow University accounting pages is achieved easily by including PageTemplateTop.php and PageTemplateBottom.php in the first and last lines respectively of each page. The following code, stored in testindex.php, illustrates how these files are used to create a minimal page:

<?php           include ("PageTemplateTop.php"); ?>
Not much here!
<?php           include ("PageTemplateBottom.php"); ?>

PageTemplateTop.php first sets up the desired stylesheet and the system of table headers and rows necessary to create the standard output. It then includes header.php, which sets up the aesthetics for the standard purple header across the top of each page, calling many of the jpg files stored in /var/www/html/accounting/localaccounting. PageTemplateTop.php also calls the file menucolumn.php, which controls the menu on the left hand side and contains the code:

 <ul>
<li><a href="CPUQuery.php"><font color=6600CC size=4>CPU History Query</font></a></li>
<li><a href="CPUQuery_discrete.php"><font color=6600CC size=4>Historical Efficiency</font></a></li>
</ul>

If you wish to alter or add to the menu it is necessary to augment menucolumn.php.
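
For example, a link to a new page would be added with a further list item in the same style (the target and label here are hypothetical):

<li><a href="NewQuery.php"><font color=6600CC size=4>New Query</font></a></li>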

Finally PageTemplateBottom.php closes all of the table rows that have been set up and provides an acknowledgement to the provider of this template.

Accessing the MySQL database

Pages that require access to the MySQL database contain the following code

include("MySQL_User_Password.php");

# connect to MySQL and use appropriate database
$link = mysql_connect($server, $user, $password);
if (!$link)
{
  die('Could not connect: ' . mysql_error());
}
mysql_query("use localaccounting");

MySQL_User_Password.php sets up the parameters necessary to access the database: the server, the user account and the password. If the global access password for MySQL is ever changed it will be necessary to edit this file accordingly.
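
For illustration, MySQL_User_Password.php presumably looks something like the following (the values shown are placeholders, not the live credentials):

<?php
# Placeholder connection parameters; the real file on svr031 holds the live values.
$server   = "localhost";
$user     = "accounting";
$password = "secret";
?>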

CPUQuery.php

Accessing CPU History Query from the left hand column of the local accounting webpage takes the user to CPUQuery.php. The user is then asked to select:

  1. A start and end date over which the query is to be performed
  2. The VOs they wish to include
  3. The graphs to be displayed

There is also an option to display how the CPU efficiency varies with time, in both bar and line graph format. If this is selected, the user is invited to specify the bin size (day, week, month) in which these graphs are prepared. The page uses the POST method to transmit the user-submitted information to CPUResult.php.
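
In outline the page submits a form of the following shape (the markup here is illustrative only, not copied from CPUQuery.php):

<form action="CPUResult.php" method="post">
  <!-- date range selectors, VO checkboxes and graph options go here -->
  <input type="submit" value="Submit query"/>
</form>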

CPUResult.php

CPUResult.php takes the information from CPUQuery.php and uses it to query the MySQL database with the following code:

SELECT grp, SEC_TO_TIME(SUM(end-start)) as WallClockH,
       SEC_TO_TIME(SUM(TIME_TO_SEC(resources_used_cput))) as TotalCPUHours,
       SEC_TO_TIME(SUM((end-start)*spec)) as kSI2k, COUNT(jobNumber) as Jobs,
       sum(TIME_TO_SEC(resources_used_cput))/sum(TIME_TO_SEC(resources_used_walltime)) as efficiency
FROM pbstable, node
WHERE dateTime >= '$startDate' and dateTime < '$endDate'
and node.node = left(pbstable.exec_host,7)
GROUP BY grp

Once the necessary information has been calculated and displayed in table format, a call is made to pieQuery.php; the start and end dates are passed to it, as well as the list of groups of interest to the user.

if($pie=="on"){
# Call, and pass the parameters to, the pie chart generating script. That shall return a stream.


$mode="pie";
  echo "<div align=\"center\"><img src=\"pieQuery.php?";
  for ($g=0; $g<$numGroups; $g++)
  {
    echo "groupsarray[]=$groups[$g]&";
  }
  echo "mode=$mode&startDay=$startDay&startMon=$startMon&startYea=$startYea&endDay=$endDay&endMon=$endMon
&endYea=$endYea\" name=\"pie chart\"/></d\
iv>";

}

CPUResultDN.php

CPUResultDN.php borrows a lot of the functionality of CPUResult.php but instead of a breakdown by VO it gives a breakdown by the various DNs. This page is accessed by clicking on a VO link in the results table in CPUResult.php.

CPUResultDN.php is not working as intended, because not every entry in the MySQL database has information on the DN that submitted the job. As a result, the jobs-submitted quantity is swamped by 'unknown', meaning that no DN has been ascribed to the job; once parse-it.py has been properly implemented this should resolve itself. September 2007 contains enough DNs to provide a decent sampling of what the intended result should be, so scanning in this range is recommended. I would also consider removing the legend from the plot, since it obscures the final bin; the legend is not strictly needed here because the same colour scheme is used in all of the plots.

pieQuery.php

Whilst it is called pieQuery.php, this file produces both bar and pie charts and so requires the following includes:

include ("src/jpgraph.php");
include ("src/jpgraph_pie.php");
include ("src/jpgraph_pie3d.php");

include ("src/jpgraph_bar.php");

The JpGraph online manual gives useful worked examples of how to use the package to implement the effects required here.
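
As a flavour of the package, a minimal pie chart can be produced along the following lines (the data values are purely illustrative):

include ("src/jpgraph.php");
include ("src/jpgraph_pie.php");

# Minimal JpGraph pie chart; the data values are illustrative only.
$data  = array(40, 21, 17, 14, 8);
$graph = new PieGraph(350, 250, "auto");
$pie   = new PiePlot($data);
$graph->Add($pie);
$graph->Stroke();   # emit the image to the browser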

The output of pieQuery.php is determined by the parameter $mode, which is passed to the page. The following snippet determines which MySQL query is performed for each mode:

if($mode=="pie" || $mode=="bar" )$query = "SELECT grp, SEC_TO_TIME(SUM(end-start)), 
SEC_TO_TIME(SUM(TIME_TO_SEC(resources_used_cput))) as TotalCPUHours FROM pbstable where dateTime >= 
'$startDate' and dateTime < '$endDate' GROUP BY grp";

if($mode=="pie_eff" || $mode=="bar_eff")  $query = "SELECT grp, 
sum(TIME_TO_SEC(resources_used_cput))/sum(TIME_TO_SEC(resources_used_walltime)) as efficiency FROM 
pbstable where dateTime >= '$startDate' and dateTime < '$endDate' GROUP BY grp";

if($mode=="pie_wallclock" || $mode=="bar_wallclock")  $query = "SELECT grp, 
sum(resources_used_walltime) FROM pbstable where dateTime >= '$startDate' and dateTime < '$endDate' 
GROUP BY grp";

if($mode=="pie_cputime" || $mode=="bar_cputime")  $query = "SELECT grp, sum(resources_used_cput) FROM
 pbstable where dateTime >= '$startDate' and dateTime < '$endDate' GROUP BY grp";

if($mode=="pie_jobs" || $mode=="bar_jobs")  $query = "SELECT grp, COUNT(jobNumber) FROM pbstable 
where dateTime >= '$startDate' and dateTime < '$endDate' GROUP BY grp";

If the desired $mode is a pie chart, the following clause is activated:

if($mode=="pie" || $mode=="pie_cputime" || $mode=="pie_eff" || $mode=="pie_wallclock" || $mode=="pie_jobs"){

# Initialise and print the pie chart
#$graph  = new PieGraph (600,400, "auto");
$graph  = new PieGraph ($graphwidth , $graphheight, "auto");


...

Likewise, if the mode is a bar chart type, this clause is activated:

 if($mode=="bar" || $mode=="bar_cputime" || $mode=="bar_wallclock" || $mode=="bar_jobs" || $mode=="bar_eff"){
   $bargraph = new Graph($graphwidth , $graphheight,"auto");

...

The graph titles and axis labels are set using clauses like this:

if($mode=="bar_cputime"){
  $bargraph->title-> Set("Total CPU time per selected VO \n between $startDay $startMon $startYea and
 $endDay $endMon $endYea");
   $bargraph->yaxis->title->Set("CPU time");
}

pieQueryDN.php

pieQueryDN.php is almost identical to pieQuery.php except the $query has been altered to search for DNs within a specified VO.

$query = "SELECT sub_dn, SEC_TO_TIME(SUM(end-start)), 
SEC_TO_TIME(SUM(TIME_TO_SEC(resources_used_cput))) as TotalCPUHours FROM pbstable where dateTime >= 
'$startDate' and dateTime < '$endDate' and grp = '$VO' GROUP BY sub_dn";

Alterations have also been made to the legends and graph titles.

barQuery.php

barQuery.php is similar to pieQuery.php except it only produces a grouped bar plot using the following declarations:

$gbplot = new GroupBarPlot($plots);
$graph->Add($gbplot);
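
Here $plots is an array of BarPlot objects, presumably one per plotted quantity; a minimal construction might look like this (the y-data arrays are illustrative):

$bplot1 = new BarPlot($ydata_cpu);    # illustrative y-data arrays
$bplot2 = new BarPlot($ydata_wall);
$plots  = array($bplot1, $bplot2);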


barQueryDN.php

barQueryDN.php is similar to barQuery.php except that it gives results on a 'per DN' basis for a specified VO; barQueryDN.php is to barQuery.php what pieQueryDN.php is to pieQuery.php.

To properly complete the current implementation of the webpage it will be necessary to alter the query in barQueryDN.php:

  if( $mode=="eff_notmerged")   $query = "SELECT sub_dn, 
sum(TIME_TO_SEC(resources_used_cput))/sum(TIME_TO_SEC(resources_used_walltime)) FROM pbstable where 
dateTime >= '$startDate' and dateTime < '$endDate' and grp = '$VO' GROUP BY sub_dn ORDER BY 
sub_dn";

The results for this appear to be unchanged from month to month, a bug which must be investigated.

CPUQuery_discrete.php

CPUQuery_discrete.php is similar to CPUQuery.php and is accessed by selecting the Historical Efficiency link on the side menu of any page on the accounting website.

Historical Efficiency allows the user to query the performance of the cluster at a particular instant in history, rather than the overall performance between two fixed dates. The user is asked to specify two timestamps, and all jobs started between these two times are histogrammed by efficiency. The user may request that only certain VOs be included, or that jobs under a certain wallclock time be ignored.

Finally, the user is given the option to plot a histogram giving details of cluster efficiency for each of the VOs at repeated instants throughout history. This request is very time consuming, however, and is not recommended if more than 20 bins are needed to complete the task. It may be possible to optimise the MySQL query so that repeated requests are not required; this is listed under future work.

CPUResult_Discrete.php

CPUResult_Discrete.php follows a similar structure to CPUResult.php. The query central to producing the histogram of job efficiency is as follows:

  $cluster_query = "select count(TIME_TO_SEC(resources_used_cput)/TIME_TO_SEC(resources_used_walltime)*100) as 
efficiencycount from pbstable where TIME_TO_SEC(resources_used_cput)/TIME_TO_SEC(resources_used_walltime)*100 
>=$bin_counter and TIME_TO_SEC(resources_used_cput)/TIME_TO_SEC(resources_used_walltime)*100 < 
$bin_counter+$bin_width and '$startDateMysql'< dateTime and '$endDateMysql'> dateTime and 
TIME_TO_SEC(resources_used_walltime)> $min_wallclock and ($grp_condition)"; 

and is repeated for as many bins as are necessary to complete the histogram.
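
A possible optimisation, sketched below but not implemented, is to let MySQL assign each job to a bin using integer division, so that the whole histogram comes back from a single query:

select FLOOR(TIME_TO_SEC(resources_used_cput)/TIME_TO_SEC(resources_used_walltime)*100/$bin_width) as bin,
       count(*) as efficiencycount
from pbstable
where '$startDateMysql' < dateTime and '$endDateMysql' > dateTime
and TIME_TO_SEC(resources_used_walltime) > $min_wallclock and ($grp_condition)
group by bin
order by bin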

If desired, the user may have elected to repeatedly query the efficiency of the farm over a period of months. At present this is implemented by performing a MySQL query as many times as necessary. Each query can take a number of seconds, so the repeated queries not only slow down page generation but also put a heavy strain on the server. An outstanding task is to rework this query so that it can be performed in a single go.

#This will be veeeerrrrryyyyy slow since i'm doing a query once per loop date. Think of a better way to do this.
while ( gmmktime($counter) <= gmmktime($enddate) ){
#echo "loop number $g <br>" ;

  $counterDateMysql = date("Y-m-d H:i:s", $counter);
#echo "mysql date $counterDateMysql <br>";

#get the jobs and efficiency for each date
  $cluster_query = "select COUNT(jobNumber) as Jobs,
                    sum(TIME_TO_SEC(resources_used_cput))/sum(TIME_TO_SEC(resources_used_walltime))*100 as efficiency
                    from pbstable
                    where date_sub(dateTime, interval resources_used_walltime day_second) < '$counterDateMysql'
                    and dateTime > '$counterDateMysql'";
  $cluster_result = mysql_query($cluster_query);

  $grprow = mysql_fetch_array($cluster_result);
  $jobs_timequery[$g]=$grprow[0];
  $eff_timequery[$g]=$grprow[1];
  if($jobs_timequery[$g]==0){
    $eff_timequery[$g]=0;
  }

...

The result of this query is displayed in text format and stored in a CSV file that the user can download:

if (is_writable($filename)) {
#open $filename in write mode, truncating any existing content

  if (!$handle = fopen($filename, 'w')) {
    echo "Cannot open file ($filename)";
    exit;
  }
#write the header line
  if(fwrite($handle, $csv_header) === FALSE){
    echo "Cannot write to file ($filename)";
    exit;
  }
  echo "Wrote: <br><FIELDSET> $csv_header <br> ";
  for($i=0;$i<$g;$i++){
    fwrite($handle,$csv_strings[$i]);
    echo "$csv_strings[$i] <br>";
  }
  echo "</FIELDSET><p>to file <a href='$filename'>$filename</a>";
  fclose($handle);
}
else{
  echo "The file $filename is not writable";
}

Future Work

Change page aesthetics and improve usability

It may be easier for the user to have access to all possible queries on a single page, rather than having to navigate through a number of menus first. A single, easy-to-use access point could be developed to improve the user-friendliness of the site; if this proceeds, the menu on the left hand side will become redundant. Arguably it would also be nice to create an accounting webpage in a generic 'Glasgow-ScotGrid' style, whatever that might eventually be.

Individual DNs

It is necessary to properly implement the tie-up between the PBS log records and the DN of the user who submitted each job. As discussed earlier, /usr/local/bin/DNcode/parse-it.py must be investigated and altered so that the complete DN list is inserted into the database. Once this has been completed it should be set up as a nightly cron job, or perhaps integrated into pbsinsertion.pl.
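
Once working, a nightly entry in the style of /etc/cron.d/rsync_and_pbsi might look like the following (the wrapper script name is hypothetical):

30 2 * * * root /usr/local/bin/DNcode/run-parse-it.sh > /dev/null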

CPUResult_Discrete.php efficiency

Optimise the MySQL query so that repeated $cluster_query requests are not required. This will dramatically improve response times and significantly reduce the load on the server for this query.

Address the issues with CPUResultDN.php

These are as listed in Glasgow_Accounting_Documentation#CPUResultDN.php