Python notes - threads/threading

From Helpful
Revision as of 14:17, 4 July 2017 by Helpful (Talk | contribs)

Various things have their own pages, see Category:Python.

Intro

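Python's threads are OS-level threads (pthreads, Windows threads, or such), so they rely on OS scheduling.

Note that Python threads will not spread Python code over multiple cores: due to the GIL, only one thread executes Python bytecode at a time. C extensions can do better for their private work, because they can release the GIL while they are not touching Python objects.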


The standard library is mostly thread-safe, but not yet fully. Python won't crash, but some things aren't quite as atomic as you might expect, which you may want to code around to avoid trouble. Most of the non-safe parts are in predictable spots such as IO.


There are two modules: the simple thread module (renamed _thread in py3), which provides basic thread objects, and the higher-level threading module, which builds on it and provides mutexes, semaphores and such.

Because in practice (and I paraphrase) 11 out of 10 people don't manage to implement a threaded app correctly, you may want to take a look at Stackless Python and/or frameworks like Kamaelia to save you headaches in the long run. (Or just be conservative about what you share between threads, and how you lock that. Python makes that a little easier than lower-level languages, but there's still plenty of room for mistakes.)

thread module

The thread module provides threads, mutexes, and a few other things.

To e.g. fire off a function in a thread:

import thread
 
def thing():
   pass #do something here
 
thread.start_new_thread(thing,())

start_new_thread's signature is (function, args[, kwargs]), where args should be a tuple and kwargs a dict.
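As a small sketch of that signature, the following passes both a tuple and a dict. It is written for py3, where the module is called _thread; the function greet and the names done and results are made up for illustration. Since these threads cannot be joined, the sketch waits on a threading.Event instead.

```python
import _thread      # py3 name of the py2 'thread' module
import threading

done = threading.Event()   # only here so the main thread can wait for the result
results = []

def greet(name, punctuation="!"):
    # runs in the new thread
    results.append("hello " + name + punctuation)
    done.set()

# positional arguments go in a tuple, keyword arguments in a dict
_thread.start_new_thread(greet, ("world",), {"punctuation": "?"})

finished = done.wait(5)    # there is no join() here, so wait on the Event instead
print(results)
```

Having to bring your own signalling just to wait for a thread is a fair illustration of why this module is rarely used directly.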


However, you cannot wait for such threads (there is no join()), and when the main thread terminates, the other threads may be killed abruptly without any cleanup (on most systems they are), so you should probably use the threading module instead.


threading module

The threading module provides some more advanced locking mechanisms, and creates objects that represent threads.

Verbose way

The more verbose way to use it is to model in threads - create objects that represent threads, extending Thread and adding code for the work that needs to be done, like:

import time
from threading import Thread
 
class MyThread(Thread):
    ''' An object that contains, and effectively is, a thread '''
    def __init__(self):
        Thread.__init__(self)
 
    def run(self): #the function that indirectly gets run when you start()
        self.stuff=time.time()
 
 
thr1 = MyThread() # create the thread object
thr1.start()      # start it; run() gets called in the new thread
 
thr1.join()       # wait for it to finish
 
# The object stays around after the thread terminated, so you can e.g.
print(thr1.stuff)  #...retrieve data stuck on it by the thread while it was running

Such a class is handy if you want to attach some further information to a thread, such as a name, or just want to make a class that can be run as a thread (which sometimes makes sense, and sometimes is questionably complex OO).

Brief way

When you just want to create a thread running a specific function, you can hand that function to the Thread constructor directly, which creates the object for you and saves you half a dozen boilerplatish lines:

thr1 = threading.Thread(target=func)  # you can also hand along args, kwargs, and name
thr1.start()
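A slightly fuller sketch of the brief way, handing along args, kwargs and name. It is written for py3; the worker function add and the results dict are invented for the example.

```python
import threading

def add(a, b, results, key="sum"):
    # hypothetical worker: stores its result in a dict shared with the caller
    results[key] = a + b

results = {}
thr1 = threading.Thread(target=add,
                        args=(2, 3, results),
                        kwargs={"key": "five"},
                        name="adder-1")
thr1.start()   # nothing runs until start() is called
thr1.join()    # wait for it to finish

print(thr1.name, results)
```

Since a target function's return value is discarded, handing in something mutable to write into (or a queue) is one common way to get results back.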

local()

threading.local() creates an object that stores separated data for each thread.

That is, data stored/retrieved in here will never be seen by another thread that accesses the same Local object.

This can be a nice way of writing functions you can reuse for multiple threads, without worrying about collision mixups or races from non-local variables.


For example:

import threading
import time

i=0
perthread=threading.local()
perthread.num=42
 
def f():
    global i
    i+=1
    perthread.num = i
    time.sleep(1)
    print(perthread.num)
 
#create the thread objects
thread1=threading.Thread(target=f)
thread2=threading.Thread(target=f)
 
#start the threads, wait for them to finish
thread1.start();thread2.start()
thread1.join();thread2.join()
 
print(perthread.num)

There are three different states in perthread, one for the main thread, one for the first fired thread, one for the second fired thread.

This will likely output 1, 2 and 42 (or possibly 2, 1 and 42, depending on which thread was scheduled first), because perthread was accessed from the two new threads and the main thread, respectively. Note that the per-thread value is set from a non-locked global (i), so you could theoretically get a race and see 1, 1, 42 or 2, 2, 42 as the result.
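To take that race out of the picture, the increment and the copy into the per-thread object can be done under one lock. This is a sketch for py3, and the names counter, counter_lock and seen are made up here.

```python
import threading

counter = 0
counter_lock = threading.Lock()
perthread = threading.local()
seen = []                      # values each thread ended up with

def f():
    global counter
    with counter_lock:         # increment and copy happen as one step
        counter += 1
        perthread.num = counter
        seen.append(perthread.num)

threads = [threading.Thread(target=f) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(seen))
# the main thread never set perthread.num, so it has no such attribute here
print(hasattr(perthread, "num"))
```

Each thread now gets its own distinct number, though which thread gets which still depends on scheduling.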

simple-ish pooling

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

In py3 (3.2 and later) you get help in the form of concurrent.futures.ThreadPoolExecutor.
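A minimal sketch of the py3 route, with a stand-in job function called work (hypothetical, as are the job values):

```python
from concurrent.futures import ThreadPoolExecutor

def work(job):
    # hypothetical job function; stands in for real work
    return job * job

jobs = range(10)

# at most 5 worker threads; leaving the with-block waits for all jobs to finish
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(work, jobs))   # results come back in job order

print(results)
```

An exception raised inside work would be re-raised in the calling thread when that result is retrieved from map(), so errors are not silently lost.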

In py2, things are a little more manual.


The elegant way is to start as many threads as you want, and have them source jobs safely from the same place -- probably a Queue.Queue (a thread-safe, multi-producer, multi-consumer queue; the module is named queue in py3).
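One way such a queue-fed pool could look is sketched below. It uses the py3 module name, and the job function work, the None sentinel convention and the results/errors lists are choices made for this example rather than anything prescribed.

```python
import queue        # named Queue in py2
import threading

def work(job):
    # hypothetical job function; fails on one input to show the error handling
    if job == 3:
        raise ValueError("bad job %r" % job)
    return job * 2

job_queue = queue.Queue()
results = []
errors = []
results_lock = threading.Lock()

def worker():
    while True:
        job = job_queue.get()
        if job is None:            # sentinel: no more work for this worker
            job_queue.task_done()
            return
        try:
            value = work(job)
            with results_lock:
                results.append(value)
        except Exception as e:     # keep the worker alive when one job fails
            with results_lock:
                errors.append((job, e))
        finally:
            job_queue.task_done()

num_workers = 3
threads = [threading.Thread(target=worker) for _ in range(num_workers)]
for t in threads:
    t.start()

for job in range(6):
    job_queue.put(job)
for _ in threads:
    job_queue.put(None)            # one sentinel per worker

job_queue.join()                   # blocks until every put() has had its task_done()
for t in threads:
    t.join()

print(sorted(results), errors)
```

Catching exceptions per job means one bad job costs you that job, not a worker thread.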

You'll want to carefully deal with exceptions in those threads, or they'll stop and you'll still need wrapping code like the quick-n-dirty stuff below:



Arguably the shortest way is to make each job its own thread. There's a bunch of overhead, which makes this an inefficient solution if the jobs are trivial, but I've used quick and dirty code like:

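import threading
import time

def some_function(job):   # placeholder for the real work
    pass

jobs = list(range(50))    # these represent jobs to be started. You'd use something real
fired = []
worker_threads = 5
 
while len(jobs) > 0:  # while there are jobs to be worked on
    # active_count() includes the main thread, hence the +1
    if threading.active_count() < worker_threads + 1:  # fewer threads working than we want
        # hand over function and arguments separately; calling it here would run it in this thread
        t = threading.Thread(target=some_function, args=(jobs.pop(),))
        t.start()
        fired.append(t)
    else:  # enough threads working
        time.sleep(0.2)
 
# Make sure all threads are done before we exit.
for th in fired:
    th.join()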

daemon threads

Whether a thread is marked as a daemon thread controls what happens when the main thread is done but other threads are still running: the process exits once only daemon threads are left, and as long as any non-daemon thread is alive, it won't.

They are often used in a "these are supporting threads, it makes no sense to keep them running when the rest is done" sense.

This isn't strictly necessary, in that you can always do a clean shutdown yourself - and sometimes should, because daemon threads are stopped abruptly at exit, without cleanup.


These are also called nonessential threads, and service threads. Note that these names refer to their role within the process, not within the system (so have nothing to do with service / daemon processes).
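Marking a thread as a daemon is a matter of setting its daemon attribute before starting it. A small sketch for py3, in which the never-ending background function is made up:

```python
import threading
import time

def background():
    # hypothetical supporting task that never finishes on its own
    while True:
        time.sleep(0.1)

helper = threading.Thread(target=background)
helper.daemon = True    # must be set before start()
helper.start()

print(helper.daemon, helper.is_alive())
# The main thread ends here. Because helper is a daemon thread,
# the interpreter exits instead of waiting for it forever.
```

Without the daemon flag, this script would never exit on its own.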



Timely thread cleanup, and getting Ctrl-C to work

You may want to deal with Ctrl-C, because it's useful to exit cleanly with threads around, but also because when there is a signal pending, various Python implementations will try to get the OS scheduler to switch faster to get it handled soon, which means that scheduling is less efficient until it is handled.

Also, the main thread needs to be scheduled at all, and the OS may not have any reason to do so - say, if it's indefinitely join()ed on a thread, or waiting on a lock. As such, you want your main thread to poll instead of block. If you use thread.join() this means using it with a short-ish timeout value, probably in a loop.


Ctrl-C will arrive in only one thread. You'll probably prefer to react to it in the main thread, purely for ease of management.

If the signal module is available, then the signal will go to the main thread - but you can't count on that on every system. The more robust variant is to try/except KeyboardInterrupt in all threads, and pass it along by calling thread.interrupt_main() (_thread.interrupt_main() in py3) in the other threads' handlers.


If you want to be able to cleanly shut down threads, you'll need some way of letting the main thread tell all others to stop their work soon - quite possibly just a shared variable (in the example below a global).

Example:

import threading
import time
try:
    import thread             # py2
except ImportError:
    import _thread as thread  # py3 name of the same module
 
stop_now = False
 
# Your thread function might look something like:
def threadfunc():
    try:
        while not stop_now:
            print("child thread sleeping")
            time.sleep(1)
    except KeyboardInterrupt: # for cases where we see it, send it to the main thread instead
        print("\nCtrl-C signal arrived in thread")
        thread.interrupt_main()
 
 
# ...assuming we've started threads we can join on, e.g.
fired_threads = []
f = threading.Thread(target=threadfunc)
f.start()
fired_threads.append( f )
 
# then your main loop can e.g. do
while len(fired_threads)>0:
    try:
        #wait for each to finish. The main thread will be doing just this.
        for th in fired_threads[:]:  # iterate over a copy, since we remove from the list
            print("parent thread watching")
            th.join(0.2) #timeout quickly - this main thread still gets scheduled relatively rarely
            if not th.is_alive():
                #print("Thread '%s' is done"%th.name)  # you may want to name your threads when you make them.
                fired_threads.remove(th)
    except KeyboardInterrupt:
        print("\nCtrl-C signal arrived in main thread, asking threads to stop")
        #some mechanism to get worker threads to stop,
        # such as a global they listen to within reasonable time
        stop_now = True
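Instead of a global boolean, the "please stop soon" signal can also be a threading.Event, which additionally works as an interruptible sleep. The following is a sketch for py3 under that assumption; the worker body and the timings are invented, and the short time.sleep() merely stands in for the main thread's real work.

```python
import threading
import time

stop_event = threading.Event()

def worker():
    while not stop_event.is_set():
        # wait() doubles as an interruptible sleep: it returns early once the event is set
        stop_event.wait(0.05)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

try:
    time.sleep(0.2)          # stands in for the main thread's real work
except KeyboardInterrupt:
    pass                     # fall through to the shutdown below
finally:
    stop_event.set()         # ask all workers to stop
    for t in threads:
        t.join(2)            # bounded wait, in case a worker is stuck

still_running = [t for t in threads if t.is_alive()]
print(still_running)
```

Compared to polling a flag between time.sleep() calls, workers react to the event as soon as it is set rather than after their current sleep ends.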

Locks

The following are factories that return new locks.

threading.Lock()

  • once acquire()d, any thread doing another acquire() will block until the lock is released
  • any thread may release it

threading.RLock()

  • must be release()d by the thread that acquire()d it.
  • a thread may lock it multiple times (without blocking); acts semaphore-like in that multiple acquires should be followed by just as many release()s

threading.Semaphore([value])

threading.BoundedSemaphore([value])
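A rough sketch of how these tend to be used, written for py3. All of them work as context managers, which takes care of the release. The functions outer, inner and limited, and the limit of 2, are made up for the example.

```python
import threading
import time

# RLock: the same thread can acquire it again without blocking itself
rlock = threading.RLock()

def outer():
    with rlock:
        return inner()      # inner() re-acquires the lock this thread already holds

def inner():
    with rlock:
        return "ok"

nested_result = outer()     # with a plain Lock this call would deadlock

# Semaphore: allow at most 2 threads in the guarded section at once
slots = threading.Semaphore(2)
state_lock = threading.Lock()
active = 0
max_active = 0

def limited():
    global active, max_active
    with slots:
        with state_lock:    # a plain Lock protects the shared counters
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.05)    # pretend to use the limited resource
        with state_lock:
            active -= 1

threads = [threading.Thread(target=limited) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(nested_result, max_active)
```

A BoundedSemaphore behaves the same, except that releasing it more often than it was acquired raises ValueError, which helps catch bookkeeping bugs.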

On python threading, and the GIL

(this is largely information from David Beazley's interesting GIL talk; see e.g. [1])