Olivier Dalle's Corner: Grid5000 / Starting with Grid5000

This page is a summary of my first experiment/setup with Grid 5000. It’s mostly for my own use, but I’d be happy if it helps others.

This material is provided as is and subject to change or deletion at any time.

So… Today I finally started experimenting on Grid 5000.

The very first step was to create a new G5k user account from the G5k web-site. Since I am a member of a team that is part of a site that is part of the G5k project, this was easy: I just had to provide the login of the local scientific coordinator of my site (in addition to my personal info) to get approved. (For the Sophia Antipolis site, the current local coordinator is Fabrice Huet.)

Experimenting with the G5k API Tutorial

  • The G5k API includes a great tutorial, but it does not cover the OAR batch scheduling system. So I stopped the tutorial after creating my first job, to investigate OAR. I will come back to the tuto once I have figured it out.
    • The tuto says the OAR doc can be found on G5k wiki. Indeed, on the wiki page, I found a software link in the left navigation pane to a page that lists OAR2 in first position.
    • Reading about OAR, I understand it is a pretty sophisticated system. Great. How do I use it? Paradoxically, I could not find anywhere in the documentation how to install the basic commands, like oarsub and so on. After reading the doc several times, I still could not find where to get the software, so I went back to the software page on the G5k site, where I found a link to the OAR developer’s site, which in turn links to the oar project on github: https://github.com/oar-team/oar. There we go!
    • In my terminal, I run git clone git://github.com/oar-team/oar.git
    • Then I enter the oar dir and start reading the INSTALL file. Same thing! It distinguishes server nodes, computing nodes, visu nodes and so on but nothing about a simple user node. Grumpf. How do I compile these commands?!
    • After trying make I got a usage message that explains how to compile OAR modules, one of which is called tools. Eureka! make tools-build is the command. Unfortunately, make install requires root permissions, and since I am not so sure about the result I’d rather not go for a sudo, but try the commands as is. Where are the newly built commands?? Looking at the compile traces, I found the compilation directory in sources/core/tools/.
    • First experiment with OAR tools: can I see the job I created while doing the tutorial? Coming back to where I stopped in the tuto, I think I need to retrieve my job’s id and export it as follows: export OAR_JOB_ID=455061 (I found the job id in the header of the response to my job submission request). The shell scripts found in the previous directory do not work. Ok, I guess this is enough for today; I will first try to use g5k without OAR commands.
    • Front-end! This is the answer: G5k hosts are not accessed directly from anywhere but from front-end machines that have the OAR commands installed… So to summarize, we can access the API from anywhere using the REST web-service, or we can access the service using OAR commands from one of the front-ends.
  • Back to the tutorial. I created my jobs more than 2 hrs ago so they were probably terminated automatically. Let’s check. Yes, the command curl -kn https://api.grid5000.fr/sid/sites/rennes/jobs/455061?pretty shows that my job has been killed.
  • Let’s start a new one.
  • On this page I can see my job, among all the jobs currently running on G5k. Good.
  • I request my job to be killed using this command:
    curl -kni https://api.grid5000.fr/sid/sites/rennes/jobs/455117 -X DELETE
  • The response includes this line, which seems to confirm that it worked: X-Oar-Info: Deleting the job = 455117 …REGISTERED. The job(s) [ 455117 ] will be deleted in a near future.
  • The tutorial does not explain how to retrieve the output of my jobs. Let’s investigate this.
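
To avoid retyping the two curl calls used above (checking a job and deleting it), they can be wrapped in small shell functions. This is only a sketch: the function names and the G5K_API variable are my own, hypothetical names, not part of any G5k tool. The URLs and curl options are exactly those used in the commands above (-k skips certificate verification, -n reads the credentials from ~/.netrc, -i prints the response headers).

```shell
#!/bin/sh
# Hypothetical helpers wrapping the curl calls from the notes above.
# The names below are illustrative; they are not provided by G5k or OAR.
G5K_API="https://api.grid5000.fr/sid"

# Build the URL of a job resource from a site name and a job id.
g5k_job_url() {
    printf '%s/sites/%s/jobs/%s\n' "$G5K_API" "$1" "$2"
}

# Show the description of a job (same request as the status check above).
g5k_job_show() {
    curl -kn "$(g5k_job_url "$1" "$2")?pretty"
}

# Request the deletion of a job. The -i option prints the response headers,
# which is where the X-Oar-Info confirmation line shows up.
g5k_job_delete() {
    curl -kni -X DELETE "$(g5k_job_url "$1" "$2")"
}
```

After sourcing the file, g5k_job_show rennes 455061 repeats the status check and g5k_job_delete rennes 455117 repeats the deletion request. Nothing is sent to the API until one of these two functions is called.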