
Hi,

after some call stack checking, I've reduced the parse time step by step:

  1h15min
  -> remove to_string: 37 seconds
  -> cache find_file: 17 seconds
  -> cache generation of datatypes: 1.3 seconds!

Should I send in my patches or not? ;)

-Kenny

On Wed, 5 Nov 2008, Kenny Billiau wrote:
Hi,
I've finally found several reasons why parsing a 250 KB XML file takes about 27 minutes in MOBY. (And this was supposed to be a quick save ;))
The first major bottleneck is the way they convert Perl values to a string:
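    sub as_string {
        my $self = shift;
        my $dump_str;
        # Temporarily redirect the default output handle into $dump_str
        my $io  = IO::Scalar->new(\$dump_str);
        my $oio = select($io);
        # Dump the whole structure, then restore the previous handle
        $DUMPER->dumpValue(\$self);
        select($oio);
        return $dump_str;
    }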
Here they temporarily redirect the default output handle (normally STDOUT) to a variable and then dump the value into it. This is only useful for complex structures like hashes, arrays and objects, not for simple types like ints and strings. So I simply removed it, as it gave me the same results as before. I should actually check whether the datatype is complex instead of dropping this function altogether. The speed-up is (hold on to your pantyhose) 5500%. (Actually it's larger, but I stopped profiling once this number was reached.) I think someone was really, really wasted when coding that. Or I'm really missing something.
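A minimal sketch of the "check if the datatype is complex" idea, assuming plain scalars can simply be stringified. The name as_string_fast is hypothetical, returning 'undef' for undefined values is an arbitrary choice, and Data::Dumper is swapped in for the original Dumpvalue/IO::Scalar/select combination because it returns the dump as a string directly:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical replacement for as_string: only pay for a full dump
# when the value is actually a reference (hash, array, object, ...).
sub as_string_fast {
    my ($value) = @_;
    return 'undef' unless defined $value;   # arbitrary marker for undef
    return "$value" unless ref $value;      # plain int/string: no dump needed
    local $Data::Dumper::Terse  = 1;        # no "$VAR1 = " prefix
    local $Data::Dumper::Indent = 1;        # compact indentation
    return Dumper($value);                  # complex structure: full dump
}
```

Simple values never touch the dumper, so the expensive path only runs for the structures where the dump actually adds information.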
The second bottleneck was their way of finding a file in the MOBY cache of datatypes, services, etc. They look in a default place and then traverse the @INC array (Perl's equivalent of $PATH), but they don't cache the results. Every time a file is requested, it is checked _on disk_. My cache was on a share, so I moved it to a local disk, but (and this is for once good news for IT) the improvement was a mere 5%! Now I cache the results. That gives another 50% speed-up compared with the result after the first improvement.
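A minimal sketch of caching the lookup, assuming the search order is "default directory first, then @INC" as described above. The name find_file_cached and the %FOUND hash are hypothetical stand-ins, not MOBY's actual code:

```perl
use strict;
use warnings;
use File::Spec;

# Cache: search key -> full path, or undef when nothing was found.
my %FOUND;

# Hypothetical cached lookup: default directory first, then @INC.
sub find_file_cached {
    my ($name, $default_dir) = @_;
    my @dirs = ($default_dir, @INC);
    my $key  = join("\0", $name, @dirs);
    return $FOUND{$key} if exists $FOUND{$key};   # no disk access on a hit

    my $hit;
    for my $dir (@dirs) {
        my $path = File::Spec->catfile($dir, $name);
        if (-e $path) { $hit = $path; last; }     # first match wins
    }
    $FOUND{$key} = $hit;   # misses are cached as well
    return $hit;
}
```

The trade-off is staleness: files added or removed during the run are not noticed, which seems acceptable for a cache that is only read while parsing.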
The third one is that the whole MOBY environment is built up _every time_ the SAX parser fires a start_element event. Me thinks me's gonna write me a MOBY Factory.
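A minimal sketch of such a factory, assuming one shared environment per process is enough. MobyEnvFactory, MyHandler and the contents of the environment hash are hypothetical placeholders; the handler only mimics the shape of an XML::SAX start_element callback:

```perl
use strict;
use warnings;

{
    package MobyEnvFactory;   # hypothetical factory
    my $env;                  # the one shared environment
    my $builds = 0;           # how often the expensive build ran

    # Build on first request, then hand out the same object.
    sub environment { $env //= _build(); return $env }
    sub build_count { return $builds }

    sub _build {
        $builds++;
        # Placeholder for the expensive part (datatypes, services, ...)
        return { datatypes => {}, services => {} };
    }
}

{
    package MyHandler;        # hypothetical SAX handler
    sub new { return bless { seen => 0 }, shift }

    sub start_element {
        my ($self, $el) = @_;
        my $env = MobyEnvFactory->environment;   # cheap after first call
        $self->{seen}++;
        return $env;
    }
}

package main;
```

The expensive setup now runs once per process instead of once per element; the cost is that the environment is shared state, so anything that modifies it is visible to every later event.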
What a fuckup,
Kenny
--