
Hi,

I've finally found the several reasons why parsing a 250KB XML file takes about 27 minutes in MOBY. (And this was supposed to be a quick save ;))

The first main bottleneck is the way they cast Perl types to a string:

    sub as_string {
        my $self = shift;
        my $dump_str;
        my $io = IO::Scalar->new (\$dump_str);   # filehandle that writes into $dump_str
        my $oio = select ($io);                  # make it the default output handle
        $DUMPER->dumpValue (\$self);             # the dump is printed into $dump_str
        select ($oio);                           # restore the previous output handle
        return $dump_str;
    }

Here they temporarily redirect the default output (normally STDOUT) into a variable and then dump the value into it. This is only useful for complex structures like hashes, arrays and objects, not for simple types like int and string. So I simply removed it, as it gave me the same results as before. I should actually check whether the datatype is complex or not instead of dropping this function. The speedup is (hold on to your pantyhose): 5500%. (Actually it's larger, but I stopped profiling after this number was reached.) I think someone was really really wasted when coding that. Or I'm really missing something.

The second bottleneck was their way of finding a file in the MOBY cache of datatypes, services, etc. They look in a default place and then traverse the @INC array (Perl's equivalent of $PATH). But they don't cache the results. Every time a file is requested, it is checked for _on disk_. My cache was on a share, so I moved it to a local disk, but (and for once this reflects well on IT) that gained me a mere 5%! Now I cache the results. The speedup is another 50% compared with the result from the first improvement.

The third one is the building up of the whole MOBY environment _every time_ a start_element event is fired from the SAX parser. Me thinks me's gonna write me a MOBY Factory.
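For the first bottleneck, the "check if the datatype is complex" idea could look roughly like the sketch below. It is not the MOBY code: it is written as a plain function rather than a method, it swaps in Data::Dumper (which returns a string directly, so no select juggling is needed) for the original $DUMPER, and turning undef into an empty string is my own assumption.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical replacement for as_string: plain scalars are returned
# as they are; only references (hashes, arrays, objects) get dumped.
sub as_string {
    my ($value) = @_;
    return ''       unless defined $value;   # assumption: undef becomes ''
    return "$value" unless ref $value;       # simple int or string, no dump
    local $Data::Dumper::Sortkeys = 1;       # stable key order in the dump
    local $Data::Dumper::Indent   = 1;       # compact indentation
    return Dumper($value);                   # complex structure
}

print as_string(42), "\n";
print as_string('hello'), "\n";
print as_string({ name => 'MOBY', ids => [1, 2] });
```

The cheap path is the `ref` test: for ints and strings nothing is dumped at all, which is the case that was costing the time.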
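For the second bottleneck, caching the lookups can be sketched as below. The function name find_cached_file, the %path_cache hash and the way the default directory is passed in are all made up for illustration; the real MOBY lookup code is not shown in this mail.

```perl
use strict;
use warnings;
use File::Spec;

my %path_cache;   # file name => resolved path, or undef if not found

# Hypothetical helper: look in $default_dir first, then walk @INC,
# touching the disk only the first time a given name is requested.
sub find_cached_file {
    my ($name, $default_dir) = @_;
    return $path_cache{$name} if exists $path_cache{$name};

    my $found;
    for my $dir ($default_dir, @INC) {
        next if !defined $dir || ref $dir;   # @INC may contain code refs
        my $candidate = File::Spec->catfile($dir, $name);
        if (-e $candidate) {
            $found = $candidate;
            last;
        }
    }
    return $path_cache{$name} = $found;      # misses are cached as well
}

my $path = find_cached_file('strict.pm', '/no/such/dir');
print defined $path ? "found: $path\n" : "not found\n";
```

Note that misses are cached too, so a file that shows up on disk later will not be seen until the process restarts. For a parse run over a fixed cache that seems an acceptable trade, but it is a design choice.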
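For the third bottleneck, the factory idea boils down to building the environment once and handing out the same instance afterwards. A minimal sketch, assuming a package called MobyFactory and a placeholder _build_environment (both hypothetical; the real MOBY setup is far more involved and not shown here):

```perl
package MobyFactory;   # hypothetical name, not part of MOBY
use strict;
use warnings;

my $environment;       # built once, shared by all later calls
my $build_count = 0;   # only here to show how often the build runs

# Stand-in for the expensive setup; it just returns a placeholder hash.
sub _build_environment {
    $build_count++;
    return { datatypes => {}, services => {} };
}

# Return the shared environment, building it on first use only.
sub environment {
    $environment ||= _build_environment();
    return $environment;
}

sub build_count { return $build_count }

package main;

# Simulates the SAX handler that is called for every start_element event.
sub start_element {
    my ($element) = @_;
    return MobyFactory::environment();   # cheap after the first call
}

start_element({ Name => "el$_" }) for 1 .. 1000;
print "environment built ", MobyFactory::build_count(), " time(s)\n";
```

After 1000 simulated events the build has run exactly once, which is the whole point of the factory.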

What a fuckup,
Kenny

--
==================================================================
Kenny Billiau
Web Developer
Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
kenny.billiau@psb.ugent.be
http://bioinformatics.psb.ugent.be
==================================================================