I second that motion!

(wasn't me though)

Michiel Van Bel wrote:
Hall of shame!
(Not me though, I haven't used the cluster in months.)

Frederik Delaere wrote:
  
Hi Cluster people,

As you might have noticed, there were cluster problems over the last
two days. The partition where the grid engine is installed was full. I
cleaned it up, but it was full again within a couple of hours, which
made the complete grid engine fail, and jobs got lost.

It took me a while to figure out what went wrong, but I found the
problem:

Somebody submitted a couple of thousand jobs, each with about 2
megabytes of parameters on the command line, e.g.:

"qsub somejob.pl <insert 2 megabytes of parameters here>"

Grid engine stores all these commands while the jobs are in the queue,
and it stores them on its own partition, not on a data partition. As a
result, the partition went from 21% usage to 99% usage in a couple of
hours. To solve this, the jobs had to be changed to read their
parameters from a file stored somewhere on the NAS.
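
For new users, here is a rough sketch of that fix in Python. It is only
an illustration, not the script that was actually used: the helper
names, the ".params" file layout (one parameter per line), and the demo
directory are all made up, and it assumes the job script has been
changed to read its parameters from the file path it receives as its
only argument. The point is that only a short path ends up on the qsub
command line, so the grid engine has very little to store per job.

```python
import tempfile
from pathlib import Path


def write_params_file(params, params_dir, job_name):
    """Write one parameter per line to <params_dir>/<job_name>.params.

    Hypothetical helper: assumes no parameter contains a newline.
    Returns the path of the file that was written.
    """
    params_dir = Path(params_dir)
    params_dir.mkdir(parents=True, exist_ok=True)
    path = params_dir / f"{job_name}.params"
    path.write_text("\n".join(params) + "\n")
    return path


def build_qsub_command(script, params_path):
    """Return the qsub argv; only the file path is passed to the job."""
    return ["qsub", script, str(params_path)]


if __name__ == "__main__":
    # Demo in a temporary directory. On the cluster, params_dir would be
    # a directory on the NAS (the exact location is an assumption).
    with tempfile.TemporaryDirectory() as tmp:
        big_params = [f"param_{i}" for i in range(100000)]
        path = write_params_file(big_params, tmp, "job_0001")
        cmd = build_qsub_command("somejob.pl", path)
        print(" ".join(cmd))
        # To really submit the job: subprocess.run(cmd, check=True)
```

The parameter file has to stay in place until the job has run, so it
should live on storage that the compute nodes can see, and it can be
deleted once the job has finished.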

So in the future, tell new people about this, or it will happen again
some day.

frederik

-- 
==================================================================
Sebastian Proost                                       PhD Student

Tel:+ 32 (0) 9 33 13 822                      fax:+32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
sebastian.proost@psb.vib-ugent.be          http://www.psb.ugent.be
==================================================================
"If I knew what I was doing, it wouldn't be called research."
                                                 --Albert Einstein