I’m amazed at the length of this thread.
As other people have said, bash is not meant for implementing algorithms; it is meant to call standard UNIX tools that do the actual work. Writing any kind of tight loop in bash is like using Visual Basic for 3D imaging: the worst possible tool for the job.
The UNIX pipeline mechanism is extremely powerful. The data is read only once, at the head of the chain, and then stays in memory, so disk access overhead is kept to a minimum. The processes in the pipe chain run in parallel, so apart from sort, which cannot produce results until it has read its whole input, the other filters emit results as soon as they have an input line available. If one element of the chain is waiting for disk I/O, the other elements use the available CPU to do their part of the job. There is no bottleneck at all, except for logical data dependencies (you need to wait for the sort to finish before you can process its output).
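A small way to see the parallelism for yourself, as a minimal sketch (the function name `stages_elapsed` is made up for illustration, and the sleeps just stand in for real work): both stages below take two seconds each, but since the shell starts them at the same time, the whole pipeline takes about two seconds rather than four.

```shell
#!/usr/bin/env bash
# Hypothetical demo: each stage sleeps 2 seconds to simulate work.
# Both sides of the pipe are started together, so the elapsed time
# is roughly 2 seconds, not the 4 a sequential run would need.
stages_elapsed() {
  local start=$SECONDS
  { sleep 2; echo data; } | { sleep 2; cat > /dev/null; }
  # Print the whole seconds elapsed for the pipeline.
  echo $(( SECONDS - start ))
}
```

Calling `stages_elapsed` prints the number of whole seconds the pipeline took.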
Most important of all, each UNIX tool is an optimized binary that does its job extremely well. Bash is a general-purpose interpreter that cannot compete with compiled code in terms of efficiency, and it will be running your own code, which is very likely to be less optimized than tools that have been fine-tuned for decades.
I solved it in 7 lines of bash, with a pipeline of:
- sort to get the values sorted
- awk to produce a list of differences (with a 26-character awk script that just subtracts each line from the next)
- another sort to sort the differences
- tail to display the needed result
The processing time of the awk script is basically zero. The CPU goes into the two successive sorts.
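Roughly, a pipeline of that shape could look like the sketch below. It is not the exact 7 lines: the input format (one integer per line on stdin) is assumed, the function name `max_gap` is made up, and the awk script is just one way to write the subtraction. Which end of the sorted differences you want depends on the problem; here the last line after a numeric sort is the largest gap.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one integer per line on stdin (assumed
# format) and prints the largest difference between neighbours once
# the values are sorted.
max_gap() {
  sort -n |                                           # sort the values
    awk 'NR > 1 { print $1 - prev } { prev = $1 }' |  # line minus previous line
    sort -n |                                         # sort the differences
    tail -n 1                                         # keep the largest one
}
```

For example, `printf '%s\n' 10 3 8 1 | max_gap` prints `5`. Use `head -n 1` instead of `tail -n 1` if the smallest difference is the one you need.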
If you try to compute the differences within the bash shell, your code will run inside a single process (the bash interpreter), so you will add a huge bottleneck just to subtract two consecutive values.
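To compare the two approaches yourself, here is a minimal sketch with both versions side by side. The function names `diffs_bash` and `diffs_awk` are made up, and both assume sorted integers, one per line, on stdin. They produce the same output; only the place where the loop runs is different.

```shell
#!/usr/bin/env bash
# Pure-bash version: every line is read and subtracted by the interpreter.
diffs_bash() {
  local prev="" cur
  while read -r cur; do
    if [ -n "$prev" ]; then
      echo $(( cur - prev ))
    fi
    prev=$cur
  done
}

# awk version: the same loop, run inside a single compiled tool.
diffs_awk() {
  awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}
```

Something like `time seq 1 200000 | diffs_bash > /dev/null` against `time seq 1 200000 | diffs_awk > /dev/null` should make the gap visible on your own machine.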
If (God forbid!) you write your own quicksort or any other sorting algorithm as a bash script, you will replace lightning-fast optimized binary code with a ponderous interpreter that needs to think hard each time it accesses a variable or indexes an array.
It can be an interesting challenge to make that work fast enough, but you can spare yourself the hassle really easily.