thank you for the encouragement.
i ended up solving it using a makefile (thank you for the hint). it feels a bit cheaty: all my script does is transform the input into a makefile, and the makefile does the “heavy lifting” of solving the dependency graph. then again, letting parallel processes sort it out via blocking fifos basically means the linux kernel is solving the dependency graph.
still, using a makefile feels right too, because i am using “the right tool for the job”: one of the main purposes of a makefile is to sort out the dependency graph.
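to give an idea of what the script produces (made-up names and values, not my real input or output): for an expression like d = a + b with two constant leaves, the generated makefile looks roughly like this, and make -j does the rest:

    # every value becomes a target whose recipe writes that value to a file.
    # make works out the evaluation order from the dependencies and, with -j,
    # runs independent branches in parallel. recipe lines must start with a tab.
    d: a b
    	echo $$(( $$(cat a) + $$(cat b) )) > d

    a:
    	echo 5 > a

    b:
    	echo 3 > b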
but the parallel processing with syncing and locking in a bash script had some nerd cred feeling to it. oh well. i guess using a makefile has nerd cred too, because all the script kiddies today do not even know what a makefile is when all they use is npm.
for your question about my implementation: i suspect that the limit i am hitting is not the file descriptor limit.
here is a rough sketch of my parallel process implementation:
a process roughly does the following:
- parse the operation and its arguments
- if both arguments are available then calculate and exit. this is the trivial case.
- if an argument is missing then notify the other process (by placing a file) so it knows that this one is waiting. then read from a fifo which will block (until someone writes to the fifo).
- if the other process is done it will write to the fifo which will allow this process to continue.
- if this process is done collecting all arguments and calculating, it will itself check whether some other process is waiting and write to their fifo to allow them to continue. (a rough sketch of this wait/notify part follows below the list.)
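something along these lines, stripped down to the wait/notify part (the directory, the .value/.waiting/.fifo naming and the one-fifo-per-value assumption are just for this sketch; the race between the existence check and the wait/notify is deliberately ignored here, that is what the flock below is for):

    #!/usr/bin/env bash
    dir=/tmp/calc                             # scratch directory (made up)
    mkdir -p "$dir"

    wait_for() {                              # point 3: block until $1 has a value
        local name=$1
        if [ ! -f "$dir/$name.value" ]; then
            mkfifo "$dir/$name.fifo" 2>/dev/null || true
            touch "$dir/$name.waiting"        # tell the producer we are waiting
            read -r _ < "$dir/$name.fifo"     # blocks until the producer writes
        fi
        cat "$dir/$name.value"
    }

    notify_waiters() {                        # point 5: wake whoever waits on $1
        local name=$1
        if [ -f "$dir/$name.waiting" ]; then
            echo done > "$dir/$name.fifo"     # unblocks the read above
        fi
    }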
file descriptors come into play when reading from and writing to the fifo, and when locking critical sections using flock. the critical sections are points 3 and 5 above (the race condition between checking for a value and writing a value); the locking looks roughly like the sketch below.
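again only a sketch (the lock file name and the helper are made up, the real script differs in details), but it shows the shape of it:

    # guard the "check value, register as waiter" step with an exclusive lock
    # so it cannot interleave with the producer's "write value, check waiters".
    dir=${dir:-/tmp/calc}

    register_if_missing() {                   # returns 0 if we must wait
        local name=$1
        (
            flock -x 9                        # exclusive lock on fd 9
            [ -f "$dir/$name.value" ] && exit 1   # value already there
            touch "$dir/$name.waiting"        # register as waiter under the lock
        ) 9> "$dir/$name.lock"
    }

    # waiting side (point 3):
    #   if register_if_missing b; then read -r _ < "$dir/b.fifo"; fi
    # producing side (point 5): take the same lock around
    #   "write $name.value, then check $name.waiting and notify".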
no file descriptor action happens simultaneously within a given process, so the extra file descriptor usage of a process is at most one at any point in time. at least as far as i understand; i am by no means an expert in linux kernel internals.
one thing i tried was closing stdin, stdout and stderr, since i did not use them (other than for debug prints). it had no effect on the number of processes i was able to start before the error, so i suspected it was not a file descriptor limit.
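(for concreteness, closing them in bash looks something like this:)

    exec 0<&- 1>&- 2>&-              # close stdin, stdout and stderr outright
    # or, less drastic, point them at /dev/null instead:
    # exec </dev/null >/dev/null 2>&1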
the error message: /tmp/Answer.sh: fork: retry: Resource temporarily unavailable
a quick search came up with a handful of internet posts saying it is a file descriptor limit, but it seems it could be any other limit from ulimit. i did not put much effort into figuring out exactly which limit i hit.
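(in case anyone wants to dig: that message is the strerror text for EAGAIN, which for fork usually points at process/thread count limits rather than open files, though i have not confirmed which limit it was here. the relevant knobs can be inspected with:)

    ulimit -a                      # all per-process limits for the current shell
    ulimit -u                      # max user processes
    ulimit -n                      # max open file descriptors
    cat /proc/self/limits          # the same limits as the kernel sees them
    cat /proc/sys/kernel/pid_max   # system-wide pid limit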
i tried to optimize a bit here and there, but it seemed clear to me that no amount of optimizing would reduce the amount of resources i need, short of solving the dependency graph before starting the processes. at which point the parallel processes are no longer doing anything cool.
i am curious whether you will be able to solve it using parallel processing in bash. i hope i did not spoil too much for you.
thank you again for the encouragement. i am happy that my idea and my rambling about my mourning inspired someone.