I'm guessing it was tobus :p

Sofie


Michiel Van Bel wrote:
Hall of shame!
(not me though, I haven't used the cluster in months)

Frederik Delaere wrote:
  
Hi Cluster people,

As you might have noticed, there were cluster problems over the last two days.
The partition where the grid engine is installed was full. I cleaned it up,
but it was full again within a couple of hours, which made the whole grid
engine fail, and jobs got lost.

It took me a while to figure out what went wrong, but I found the problem:

somebody submitted a couple of thousand jobs, each with 2 megabytes of
parameters on the command line,

e.g.: "qsub somejob.pl <insert 2 megabytes of parameters here>"

Grid engine stores all these commands for as long as the jobs are in the
queue, and it stores them on its own partition, not on a data partition.
That partition went from 21% to 99% usage in a couple of hours. To solve
this, the jobs had to be changed to read their parameters from a file
stored somewhere on the NAS, so only the short file name is passed to qsub.
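A minimal sketch of that change in shell, for whoever writes the submission script. The directory in PARAM_DIR, the function name submit_job and the params_<id>.txt naming are hypothetical; somejob.pl also has to be changed to read its parameters from the file named in its first argument. Setting QSUB=echo gives a dry run that only prints what would be submitted.

```shell
#!/bin/sh
# Sketch: keep large job parameters in a file instead of on the qsub command line.
# PARAM_DIR is a hypothetical location; point it at a directory on the NAS.
PARAM_DIR="${PARAM_DIR:-/tmp/job_params}"
# QSUB can be overridden (e.g. QSUB=echo) for a dry run.
QSUB="${QSUB:-qsub}"

submit_job() {
    job_id="$1"   # any unique label for this job
    params="$2"   # the (possibly megabytes of) parameters
    mkdir -p "$PARAM_DIR"
    param_file="$PARAM_DIR/params_$job_id.txt"
    # Write the parameters to a file rather than passing them as arguments.
    printf '%s\n' "$params" > "$param_file"
    # Only the short file path ends up in the grid engine spool.
    "$QSUB" somejob.pl "$param_file"
}
```

For example, `submit_job 42 "$big_parameter_string"` writes the parameters to $PARAM_DIR/params_42.txt and runs `qsub somejob.pl $PARAM_DIR/params_42.txt`, so the queue only holds a few dozen bytes per job.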

So in the future, please tell new people about this, or it will happen
again some day.

frederik

-- 
Sofie Van Landeghem
PhD Student
VIB Department of Plant Systems Biology, Ghent University
Bioinformatics and Evolutionary Genomics
Technologiepark 927, 9052 Gent, BELGIUM
Tel: +32 (0)9 331 36 95                        Fax: +32 (0)9 331 38 09
Website: http://bioinformatics.psb.ugent.be