Two days ago, somebody (let us call her Imogen) asked for resources on threading in the #ruby IRC channel. Others in the channel, people I otherwise hold in great respect, instantly filled her with fear, uncertainty and doubt.
<person A> with ruby, don’t [write threaded code]
<person B> doesn’t really help writing threaded code unless you’re on jruby
<Imogen> but for my heavily I/O-wait jobs, mri threads seems like a good fit?
<person A> for concurrency in MRI use spawn
<person B> 1 GIL and you’re toast
<Imogen> sad panda :(
Now now, Imogen, don’t be a sad panda. While it is true that Ruby MRI will only run one thread at a time (although that is not the whole story), that does not mean threads are useless. To me, it sounds like your jobs are a perfect fit for Ruby threads!
It turned out that Imogen wanted to execute a large number of jobs using system(). She did not care about order; only that scheduled jobs finish eventually, and that she can wait for any pending work to finish before quitting. This is a perfect candidate for threading.
Yes, Ruby runs only one thread of Ruby code at a time because of the GIL, and no, this is not an issue. Ruby is very smart when doing system() calls: even if the call might block indefinitely, other threads will continue to be scheduled by Ruby until the system() call returns. See here:
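A minimal sketch of this behaviour, assuming a POSIX sleep command is available on the PATH (the thread count and the timings are arbitrary choices for illustration):

```ruby
# Start a few threads that each block inside a system() call.
# Assumption: a `sleep` executable exists on this machine.
threads = Array.new(3) do
  Thread.new { system("sleep", "1") }
end

ticks = 0
# The main thread keeps being scheduled while the system() calls block.
while threads.any?(&:alive?)
  puts "waiting..."
  ticks += 1
  sleep 0.1
end

# Thread#value joins the thread and returns the block's result,
# here the true/false/nil returned by system().
results = threads.map(&:value)
```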
The above code will continuously print “waiting…” while it is waiting for the system() calls to finish. One could say that Ruby threads are merely a way of having Ruby do a select() loop for you, and this is good news for Imogen!
Imogen initially spawned one new thread every time she needed some work done, and there’s a problem with this approach. If you start too many threads too often, or have too many threads running at the same time, performance will suffer. Ruby ends up spending more time juggling your threads than running them.
As I am sure you can imagine by now, a thread pool is what we need. Something that waits for us to give it some work, and then we simply forget about it. I explained this to Imogen. However, as she did not know where to start, and knowing it’d take me ~10 minutes, I offered to help.
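What follows is a sketch of such a pool, not necessarily the exact code I gave Imogen. The class name SimplePool, its methods schedule and shutdown, and the :stop sentinel are names picked for this illustration.

```ruby
require "thread" # Queue is built in on modern Rubies; this is harmless there

# Hypothetical minimal thread pool: a fixed set of workers pulling
# jobs (blocks) off a shared, thread-safe queue.
class SimplePool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives; :stop ends the worker.
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  # Hand a block to the pool and forget about it.
  def schedule(&block)
    @jobs << block
  end

  # Let the workers drain the queue, then wait for them to exit.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

pool = SimplePool.new(4)
results = Queue.new
10.times { |i| pool.schedule { results << i * i } }
pool.shutdown
squares = Array.new(results.size) { results.pop }.sort
```

In Imogen's case the scheduled block would wrap a system() call instead of pushing a number. Because the stop sentinels are queued behind the jobs, shutdown only returns once all pending work has run.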
It is simple, and it works. One caveat: if a job raises an exception, it will kill the worker running it… and you don’t want to kill your hard working threads, do you?
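One possible way to keep a worker alive is to rescue inside its loop. This is a sketch under my own assumptions: the errors queue and the :stop sentinel are illustrative names, and collecting exceptions is only one of several reasonable policies.

```ruby
require "thread"

jobs   = Queue.new
errors = Queue.new # hypothetical place to collect failures

worker = Thread.new do
  while (job = jobs.pop) != :stop
    begin
      job.call
    rescue StandardError => e
      # Record the failure and carry on with the next job
      # instead of letting the exception end this thread.
      errors << e
    end
  end
end

done = []
jobs << -> { raise "boom" }            # this job fails...
jobs << -> { done << :second_job_ran } # ...but this one still runs
jobs << :stop
worker.join
```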
Also, here’s the source code.