OptorSim

OptorSim is a Grid simulator designed to test dynamic replication strategies used in optimising the efficiency of a Grid. A Java Applet version of OptorSim can be seen by clicking on the screenshot below. This demonstration shows what happens to the Grid as jobs are submitted to it and our replication algorithms control where replicas of files are created and deleted.

Screenshot of the OptorSim Java Applet

The GridPP2 (UK Grid for Particle Physics) production plan describes a future Grid infrastructure. Sample jobs based on real high energy physics analysis are simulated. A job consists of a set of files which must be read in sequential order with calculations performed on the data in each file. A single Resource Broker assigns jobs to sites based on a scheduling algorithm which takes into account the cost of accessing data for the job in terms of time and the current workload at each site.
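The scheduling idea described above can be sketched as follows. This is only an illustration, not OptorSim's actual scheduler: the `Site` fields, the `seconds_per_queued_job` weighting and the example numbers are all assumptions chosen to show how a data-access cost and a workload term might be combined into one score.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    queued_jobs: int       # current workload: jobs already waiting at the site
    access_cost_s: float   # hypothetical estimate of time (s) to access the job's files


def estimated_cost(site, seconds_per_queued_job=600.0):
    """Combine data-access time with a queue penalty, both in seconds.

    The 600 s per queued job is an assumed weighting, not a value from OptorSim.
    """
    return site.access_cost_s + site.queued_jobs * seconds_per_queued_job


def choose_site(sites):
    """Return the site with the lowest combined estimated cost."""
    return min(sites, key=estimated_cost)


sites = [
    Site("A", queued_jobs=5, access_cost_s=100.0),  # close to the data, but busy
    Site("B", queued_jobs=0, access_cost_s=900.0),  # far from the data, but idle
    Site("C", queued_jobs=1, access_cost_s=400.0),
]
best = choose_site(sites)
print(best.name)
```

With these made-up numbers the idle site wins even though its data access is slowest, which mirrors the behaviour described later on: as queues build up at well-connected sites, other sites become more favourable.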

As a job runs at a site, our replica optimisation algorithms decide whether it is worthwhile to create a copy, or replica, of each file it processes on local storage so that the file can be accessed faster in future. With limited storage space available, it makes sense to copy only the most popular files, so this decision can be made by considering the history of files accessed at the site. An economic model we have developed for trading files has been shown to be the most effective at optimising Grid resources: compared with more traditional algorithms, it reduces the total time to process all the simulated jobs. Eventually these optimisation algorithms will be incorporated into the Replica Optimisation Service currently being developed by the EU DataGrid, which will soon be deployed on the live Grid.

The Java Applet is a demonstration of OptorSim showing what happens when 500 jobs are submitted to the GridPP testbed while our Economic Model algorithm is in use. All the files (each of size 1 GB) are initially stored at sites outwith the UK (CERN, FNAL and SLAC), and all the other sites start with empty storage of between 50 GB and 300 GB (these numbers are scaled down from the real values, which are between 5 TB and 100 TB). There are 100 files for each experiment and three jobs per experiment, each of which requires between 20 and 50 files. The experiments GridPP is involved in are ATLAS, CMS and LHCb (all at CERN), CDF and DZero (both at FNAL), and BaBar (at SLAC). When a site requests a file for a job, it automatically creates a local replica until its local storage has filled up. From then on, our algorithm decides whether it is worth deleting any files to make room for replicas of new files and, if so, which files to delete.
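The replicate-until-full behaviour can be illustrated with a small sketch. Note that this is a simple access-count stand-in, not the Economic Model itself: the `SiteStorage` class, its method names and the "evict the least-accessed file" rule are assumptions made for illustration. Capacity is counted in files, which is reasonable here only because every file in the demonstration has the same size.

```python
from collections import Counter


class SiteStorage:
    """Hypothetical local storage that replicates freely until it is full."""

    def __init__(self, capacity_files):
        self.capacity_files = capacity_files
        self.replicas = set()
        self.access_history = Counter()  # how often each file was requested here

    def request(self, file_id):
        """Record an access and decide whether to hold a local replica."""
        self.access_history[file_id] += 1
        if file_id in self.replicas:
            return "local hit"
        if len(self.replicas) < self.capacity_files:
            self.replicas.add(file_id)
            return "replicated"
        # Storage is full: find the least-accessed stored file (ties broken by name).
        victim = min(self.replicas, key=lambda f: (self.access_history[f], f))
        # Assumed rule: replace it only if the new file has been requested more often.
        if self.access_history[file_id] > self.access_history[victim]:
            self.replicas.remove(victim)
            self.replicas.add(file_id)
            return f"replicated, evicted {victim}"
        return "read remotely"


storage = SiteStorage(capacity_files=2)
results = [storage.request(f) for f in ["a", "a", "b", "c", "c"]]
print(results)
```

In this run the first request for `c` is served remotely because it is no more popular than the stored files; only the second request makes it popular enough to displace `b`.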

Each section of the applet is explained in detail below:

Simulation Parameters
This table shows some of the input parameters for OptorSim. When "Auction" is on, the Grid uses an auctioning protocol (described in detail in the links below) to find the best copy of a file among the replicas distributed around the Grid. The "Scale Factor" makes the simulation run faster by scaling down the size of the files and storage space. Currently it is set so that twelve seconds of simulation time correspond to one second of real time.
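As a rough illustration of choosing the best replica by auction, the sketch below has each site holding a replica submit a bid and picks the lowest one. The details of the real protocol are not given on this page, so everything here is an assumption: the function names, the use of estimated transfer time as the bid, and the bandwidth figures.

```python
def transfer_time_bid(file_size_gb, bandwidth_gbit_per_s):
    """Hypothetical bid: estimated seconds to ship the file (8 bits per byte)."""
    return file_size_gb * 8 / bandwidth_gbit_per_s


def run_auction(bids):
    """Return (site, price) for the lowest bid, or None if nobody bid.

    Ties are broken alphabetically by site name so the result is deterministic.
    """
    if not bids:
        return None
    return min(bids.items(), key=lambda item: (item[1], item[0]))


# Made-up bids for one 1 GB file held at two sites.
bids = {
    "remote-site": transfer_time_bid(1, 0.1),  # slow link to the requester
    "nearby-site": transfer_time_bid(1, 1.0),  # fast link to the requester
}
winner = run_auction(bids)
print(winner)
```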

Simulation Time
This shows how time elapses in the simulation (hh:mm:ss), effectively the real time divided by the scale factor.
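A minimal sketch of the clock conversion, assuming the ratio quoted under Simulation Parameters (twelve seconds of simulation time per second of real time). The constant and function names are hypothetical.

```python
SIM_SECONDS_PER_REAL_SECOND = 12  # ratio quoted for the applet's current setting


def simulated_clock(real_seconds):
    """Format the simulated time reached after `real_seconds` as hh:mm:ss."""
    sim_seconds = int(real_seconds * SIM_SECONDS_PER_REAL_SECOND)
    hours, remainder = divmod(sim_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(simulated_clock(300))  # five real minutes
```

At this ratio, five minutes of watching the applet corresponds to one hour of simulated time.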

Job Submission
This window gives information on where jobs are submitted by the Resource Broker. Each site has a policy which states which types of jobs it is willing to run, determined by which experimental research groups are working at the site.

Site Information
Clicking on a site on the map displays information about it in this window, such as the size of the storage at the site and the files present. This allows the way the files at the site change over time to be monitored.

Grid Status
The map shows the GridPP testbed together with other UK e-Science sites as green circles. The size of each circle gives an indication of the relative number of CPUs available to process jobs. When a job is submitted to a site, the site flashes white. The bar above each site shows its storage capacity, and the red section represents the fraction of storage currently in use.
The sites are interconnected by a network linked together via routers shown as black circles. The width of the network links is an indication of the capacity of the link, so the JANET backbone links are large and the connections from individual sites are relatively small. The colour on the links shows the amount of traffic, which varies from light blue (light traffic) to dark blue (heavy traffic). (Note: The backbone does not actually lie in the sea around the UK; this is a schematic projection.)
The site depicted in the English Channel is effectively the world outside the UK which for the purposes of this simulation contains CERN, FNAL and SLAC.

As the simulation begins, most of the jobs are scheduled to the sites with the best network connections, since none of the sites in the UK have any files. But as the queues build up at these sites, the other sites become more favourable for running jobs. After some time the heavy traffic to the outside world becomes lighter as replicas of files become distributed throughout the Grid, and eventually the whole network becomes less used as jobs are scheduled to sites which now have the required files stored locally.

More information on OptorSim can be found in:

Any questions? e-mail the OptorSim mailing list: hep-proj-grid-optorsim@listbox.cern.ch


(c) 2003 CERN, ITC-irst, PPARC, on behalf of the EU DataGrid.


Last modified Wed 26 November 2003.