Python notes - threads/threading



Python's threads are OS-level threads (pthreads, Windows threads, or such), so they rely on OS scheduling.

Python threads will not distribute among cores, due to the semantics of the GIL (in CPython, only one thread executes Python bytecode at a time). C extensions can do better for their private work, but for parallel processing you're often better off using multiprocessing (which tends to be more portable and efficient than multithreading for using multiple cores anyway).

The standard library itself is mostly thread-safe, but not yet fully. The exceptions are mostly in the places you'd expect them, like IO. Python won't crash, but there are some things that aren't quite as atomic as you might expect.

There are two modules: the simple thread (renamed _thread in py3), which provides basic thread objects, and the higher-level threading, which builds on it and provides mutexes, semaphores and such.

Because in practice (and I paraphrase) 11 out of 10 people don't manage to implement a threaded app correctly, you may want to take a look at Stackless Python and/or frameworks like Kamaelia to save yourself headaches in the long run.

Or just be conservative about what you share between threads, and how you lock that. Python makes that a little easier than lower-level languages, but there's still plenty of room for mistakes.
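As a minimal sketch of "being conservative about what you share", the following keeps exactly one shared value and guards every update with a lock. The names counter, counter_lock and add_many are made up for this illustration; without the lock, the read-modify-write in counter += 1 is one of those things that is not as atomic as it looks.

```python
import threading

counter = 0                        # the only state shared between threads
counter_lock = threading.Lock()    # guards every update of counter

def add_many(n):
    global counter
    for _ in range(n):
        with counter_lock:         # one thread at a time does the read-modify-write
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

Holding the lock only around the update itself (not around the whole loop) keeps the locked region short, which is usually what you want.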

Or just use it to separate things that won't block each other (...but in this case, also look at event driven systems - they can be more efficient depending on what you're doing).
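A rough illustration of using threads for things that mostly wait: blocking calls such as time.sleep (standing in here for network or disk IO) release the GIL, so several waits can overlap. The function name wait_for_something and the timings are assumptions made for this sketch.

```python
import threading
import time

def wait_for_something(seconds):
    time.sleep(seconds)    # stands in for a blocking call such as network IO

start = time.time()
threads = [threading.Thread(target=wait_for_something, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Run one after the other, the four waits would take about 0.8 seconds
print("4 waits of 0.2s took %.2fs" % elapsed)
```

This only helps for work that blocks; four threads doing pure-Python computation would not finish any sooner than one.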

thread module

The thread module provides threads, mutexes, and a few other things.

To e.g. fire off a function in a thread:

import thread
def thing():
   pass #do something here
thread.start_new_thread(thing, ())   # args is required, so pass an empty tuple

start_new_thread's signature is (function, args[, kwargs]), where args should be a tuple and kwargs a dict.
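A small sketch of that signature with both args and kwargs. The function greet and its parameters are invented for the example, and the import fallback is there because py3 renamed the module to _thread. Since these threads have no join(), the sketch uses a threading.Event purely so the main thread can tell when the work is done.

```python
try:
    import thread                # Python 2 name
except ImportError:
    import _thread as thread     # Python 3 name for the same module

import threading

done = threading.Event()         # only used to signal completion
results = []

def greet(name, punctuation="!"):
    results.append("hello " + name + punctuation)
    done.set()

# args must be a tuple (note the trailing comma), kwargs a dict
thread.start_new_thread(greet, ("world",), {"punctuation": "?"})

done.wait(5)                     # give up waiting after 5 seconds
print(results)
```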

However, since you cannot wait for such threads, and main thread termination might monkeywrench things (verify), you should probably use the threading module instead.

threading module

The threading module provides some more advanced locking mechanisms, and creates objects that represent threads.

Verbose way

Create objects that represent threads - extending Thread and adding code and state for the work that needs to be done:

from threading import Thread
class MyThread(Thread):
    ''' An object that contains, and effectively is, a thread '''
    def __init__(self):
        Thread.__init__(self)  # the base class constructor has to be called
        self.stuff = None
    def run(self): #the function that indirectly gets run when you start()
        self.stuff = 42        # placeholder for the actual work
thr1 = MyThread() # create the thread object
thr1.start()      # start the thread, which calls run() in it
thr1.join()       # wait for it to finish
# The object stays around after the thread terminated, so you can e.g.
print thr1.stuff  #...retrieve data stuck on it by the thread while it was running

In some cases, e.g. you've got a bunch of bookkeeping to do, this can be a clean way of doing that.

In other cases it only adds questionable complexity. Depends.

Brief way

When you just want to create a thread running a specific function, you can hand that function to the Thread constructor, which creates the object for you and saves you half a dozen boilerplatish lines:

thr1 = threading.Thread(target=func)  # you can also hand along args, kwargs, and name
thr1.start()
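For instance, a sketch that hands along args, kwargs and name. The function fetch, its parameters and the thread name "worker-1" are placeholders chosen for this example.

```python
import threading

results = {}

def fetch(text, repeat=1):
    # current_thread().name is the name passed to the constructor below
    results[threading.current_thread().name] = text * repeat

thr = threading.Thread(target=fetch, args=("ab",), kwargs={"repeat": 2}, name="worker-1")
thr.start()
thr.join()
print(results)
```

Naming threads costs nothing and tends to make log output and debugging easier to follow.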


threading.local() creates an object with data guaranteed to be local to the thread.

More precisely, data stored in or retrieved from it will never be seen by another thread that accesses the same local object.

This is great for temporary bookkeeping.

It also makes it easier to write thread functions you can reuse for concurrent threads, without worrying about collision mixups or races from non-local variables.

For example: (and making the point that the main interpreter is considered its own thread)

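import threading
perthread = threading.local()
perthread.num = 42        # set from the main thread
i = 0
def f():
    global i
    i += 1                # not locked, see the note below
    perthread.num = i
    print perthread.num
#create the thread objects
t1 = threading.Thread(target=f)
t2 = threading.Thread(target=f)
#start the threads, wait for them to finish
t1.start(); t2.start()
t1.join(); t2.join()
print perthread.num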

There are three different states in perthread, one for the main thread, one for the first fired thread, one for the second fired thread.

This will likely output 1, 2 and 42 (or possibly 2, 1 and 42, depending on which thread was scheduled first) because perthread was accessed from the two new threads and the main thread, respectively (...note that its value is set from a non-locked global (i), so you could theoretically get a race and 1, 1, 42 as the result (verify)).

simple-ish pooling

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

In py3 you get help in the form of ThreadPoolExecutor.
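A minimal py3 sketch using concurrent.futures.ThreadPoolExecutor. The function square and the worker count are arbitrary choices for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

# Leaving the with-block waits for all submitted work to finish
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() hands the inputs to the pool and yields results in input order
    results = list(pool.map(square, range(10)))

print(results)
```

An exception raised in a job is re-raised in the calling thread when its result is retrieved, so failures do not pass silently.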

In py2, things are a little more manual.

A decent way is to start as many threads as you want, and have each source jobs safely from the same place, probably via a thread-safe, multi-producer, multi-consumer queue like Queue.Queue (named queue.Queue in py3).

Note that if you don't catch exceptions inside the threads, an uncaught exception will end that individual thread, so the pool quietly shrinks.
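One way this could look, as a sketch rather than a finished pool: a fixed set of workers pulls jobs from a shared queue, catches exceptions per job so that one bad job does not end the worker, and stops on a None sentinel. The names handle, worker, job_queue, results and errors are all made up here, and the failing job is contrived to show the exception path.

```python
import threading
try:
    import queue               # Python 3 name
except ImportError:
    import Queue as queue      # Python 2 name

job_queue = queue.Queue()
results, errors = [], []
collect_lock = threading.Lock()    # guards results and errors

def handle(job):
    # hypothetical job handler; job 3 fails on purpose
    if job == 3:
        raise ValueError("bad job")
    return job * 2

def worker():
    while True:
        job = job_queue.get()
        if job is None:            # sentinel: no more work for this worker
            job_queue.task_done()
            break
        try:
            out = handle(job)
            with collect_lock:
                results.append(out)
        except Exception as e:     # keep the worker alive, record the failure
            with collect_lock:
                errors.append((job, str(e)))
        finally:
            job_queue.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for job in range(6):
    job_queue.put(job)
for _ in workers:
    job_queue.put(None)            # one sentinel per worker
job_queue.join()                   # blocks until every item was marked done
for w in workers:
    w.join()

print(sorted(results), errors)
```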

If you want to start a thread for each job, then there's a quick and dirty solution in something like:

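import threading, time
def some_function(job):   # placeholder for the real work
    time.sleep(0.1)
jobs = list(range(50))    # these represent jobs to be started. You'd use something real
fired = []
target_threads = 5
while len(jobs)>0: #while there are jobs to be worked on
    if threading.active_count() < target_threads+1:  # fewer threads working than we want (+1 for the main thread)
        t = threading.Thread(target=some_function, args=(jobs.pop(),))  # pass the function, don't call it
        t.start()
        fired.append(t)
    else: #enough threads working
        time.sleep(0.05)
#Make sure all threads are done before we exit.
for th in fired:
    th.join()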

daemon threads

Whether a thread is marked as a daemon thread controls what happens when the main thread is done but other threads are still running: the process exits once only daemon threads are left, but while there are still non-daemon threads, it won't.

They are often used in a "these are supporting threads, it makes no sense to keep them running when the rest is done" way.

This isn't necessary, in that you can always do clean shutdown - and sometimes should.

These are also called nonessential threads or service threads. Note that these names refer to their role within the process, not within the system (so have nothing to do with service / daemon processes).
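A short sketch of marking a thread as a daemon. The function background is a placeholder for whatever supporting work the thread does; the daemon flag has to be set before start().

```python
import threading
import time

def background():
    # supporting work that never finishes on its own
    while True:
        time.sleep(0.1)

t = threading.Thread(target=background)
t.daemon = True        # must be set before start()
t.start()

print(t.daemon, t.is_alive())
# When the main thread ends here, the process exits even though
# background() is still looping, because only a daemon thread is left.
```

Daemon threads are stopped abruptly at exit, so they are a poor fit for work that must finish cleanly, such as writing files.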

Timely thread cleanup, getting Ctrl-C to work, and other subtleties


You may want to deal with Ctrl-C.

Ctrl-C will arrive in only one thread. You probably prefer to react to it in the main thread, purely for ease of management.

If the signal module is available, then the signal will go to the main thread - but you can't count on that on every system. The more robust variant is to catch KeyboardInterrupt (with try/except) in all threads, and pass it along by calling thread.interrupt_main() in the other threads' handlers.

You may want to deal with other signals as well. In part because you may care, in part because various Python implementations will try to get the OS scheduler to switch faster to get the signal handled sooner - which means that scheduling is less efficient until you actually do.

Also, the main thread needs to be scheduled at all, and the OS may not have any reason to do so - say, if it's indefinitely join()ed on a child thread, or waiting on a lock. As such, you want your main thread to poll instead of join/block. If you use join() on a thread, this means using it with a short-ish timeout value, probably in a loop.

If you want to be able to cleanly shut down threads, you'll want some way of letting the main thread tell all others to stop their work soon - quite possibly just a shared variable (in the example below a global).

Example dealing with a bunch of that:

import threading
import thread   # named _thread in py3
import time
stop_now = False
# Your thread function might look something like:
def threadfunc():
    try:
        while not stop_now:
            print "child thread doing stuff"
            time.sleep(1)
    except KeyboardInterrupt: # for cases where we see it, send it to the main thread instead
        print "\nCtrl-C signal arrived in thread"
        thread.interrupt_main()
# ...assuming we've started threads we can join on, e.g.
fired_threads = []
f = threading.Thread(target=threadfunc)
f.start()
fired_threads.append( f )
# then your main loop can e.g. do
while len(fired_threads)>0:
    try:
        #wait for each to finish. The main thread will be doing just this.
        for th in list(fired_threads):  # iterate over a copy, since we remove items
            print "parent thread watching"
            th.join(0.2) #timeout quickly - this main thread still gets scheduled relatively rarely
            if not th.is_alive():
                #print "Thread '%s' is done"%th.name  # you may want to name your threads when you make them.
                fired_threads.remove(th)
    except KeyboardInterrupt:
        print "\nCtrl-C signal arrived in main thread, asking threads to stop"
        #some mechanism to get worker threads to stop,
        # such as a global they listen to within reasonable time
        stop_now = True
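As an alternative to the global stop_now flag used above (a swapped-in technique, not what the example itself does), a threading.Event can serve as the stop signal. It has the side benefit that wait() doubles as an interruptible sleep. The names stop_event and worker are chosen for this sketch.

```python
import threading

stop_event = threading.Event()
ticks = []

def worker():
    while not stop_event.is_set():
        ticks.append(1)            # stands in for a unit of real work
        stop_event.wait(0.05)      # sleeps, but returns early once the event is set

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

# In a real program this would happen in the KeyboardInterrupt handler
stop_event.set()

for t in threads:
    t.join(2)                      # short timeout, in the spirit of polling
print([t.is_alive() for t in threads])
```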



The following are factories that return new locks.


threading.Lock()
  • once acquire()d, any thread doing another acquire() will block until the lock is released
  • any thread may release it


threading.RLock()
  • must be release()d by the thread that acquire()d it.
  • a thread may lock it multiple times (without blocking); acts semaphore-like in that multiple acquires should be followed by just as many release()s
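Assuming the two factories described above are threading.Lock and threading.RLock (which matches the behaviours listed), the difference can be seen from a single thread by trying a second, non-blocking acquire. The variable names are only for illustration.

```python
import threading

lock = threading.Lock()
rlock = threading.RLock()

# Plain lock: a second acquire fails (or, if blocking, would wait forever here)
lock.acquire()
second_try = lock.acquire(False)   # False means: don't block, just report
lock.release()

# Reentrant lock: the owning thread may acquire it again
rlock.acquire()
nested = rlock.acquire(False)
rlock.release()                    # one release per acquire
rlock.release()

print(second_try, bool(nested))
```

In practice both are usually used through the with statement, which pairs every acquire with a release even when an exception is raised.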



On python threading, and the GIL


(this is largely information from David Beazley's interesting GIL talk; see e.g. [1])