Python notes - threads/threading

From Helpful

Latest revision as of 17:54, 22 March 2020

Various related topics have their own pages; see Category:Python.

Intro

Python's threads are OS-level threads (pthreads, Windows threads, or such), so they rely on OS scheduling.

In CPython, threads won't run Python code in parallel across cores, because the GIL lets only one thread execute Python bytecode at a time. C extensions can do better for their private work (they can release the GIL while doing it), but for parallel processing you're often better off using multiprocessing (which tends to be more portable and efficient than multithreading for using multiple cores anyway).
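Threads still overlap well when they mostly wait, because CPython releases the GIL around blocking calls such as time.sleep() and most I/O. The following is a minimal Python 3 sketch of that. The 0.2-second sleep is an assumed stand-in for real blocking work such as a network read.

```python
import threading
import time

def blocking_job():
    # Stand-in for I/O: time.sleep() releases the GIL while it waits.
    time.sleep(0.2)

threads = [threading.Thread(target=blocking_job) for _ in range(5)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The five 0.2 s waits overlap, so this is typically close to 0.2 s, not 1.0 s.
print("elapsed: %.2f s" % elapsed)
```

With CPU-bound pure-Python work in place of the sleep, the same setup would typically take about as long as running the jobs one after another.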


The standard library itself is mostly, but not yet fully, thread-safe. The exceptions are mostly where you'd expect them, such as I/O. Python won't crash, but some things aren't quite as atomic as you might expect. For example, x += 1 on a shared variable is a separate read, add and write.
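The following is a minimal Python 3 sketch of that non-atomic increment and the usual fix, which is to do the read-add-write while holding a lock. The function name and counts are made up for illustration.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # counter += 1 is a read, an add and a write; without the lock,
        # another thread could run in between and an update would be lost.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(50000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000: every increment happened while holding the lock
```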


There are two modules: the low-level thread module (renamed _thread in Python 3), which provides basic threads and locks, and the higher-level threading module, which builds on it and provides thread objects, mutexes, semaphores and such.


Because in practice (and I paraphrase) 11 out of 10 people don't manage to implement a threaded app correctly, you may want to take a look at stackless and/or frameworks like Kamaelia to save you headaches in the long run.

Or just be conservative about what you share between threads, and how you lock that. Python makes that a little easier than lower-level languages, but there's still plenty of room for mistakes.

Or just use threads to separate tasks so that one of them blocking doesn't hold up the others (...but in this case, also look at event-driven systems - they can be more efficient depending on what you're doing).
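For comparison, here is a minimal sketch of the event-driven style using Python 3's asyncio. It needs Python 3.7+ for asyncio.run(). asyncio is just one example of such a system, and asyncio.sleep() is an assumed stand-in for real I/O.

```python
import asyncio

async def fetch(name, delay):
    # await hands control back to the event loop while this task waits,
    # so the other tasks run in the meantime, all within a single thread.
    await asyncio.sleep(delay)
    return name

async def main():
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1), fetch("c", 0.1))

results = asyncio.run(main())
print(results)  # ['a', 'b', 'c']: gather() keeps argument order, not finish order
```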

thread module

The thread module provides threads, mutexes, and a few other things.

To e.g. fire off a function in a thread:

import thread
 
def thing():
   pass #do something here
 
thread.start_new_thread(thing,())

start_new_thread's signature is (function, args[, kwargs]), where args should be a tuple and kwargs a dict.
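In Python 3 this module is called _thread. These threads have no join(), so waiting for one means building your own signal. The following is a minimal sketch that hands the thread a lock to release when it's done. worker() and the lock-as-signal trick are illustrative assumptions, not something the module does for you.

```python
import _thread  # Python 3 name of the py2 'thread' module

results = []
done = _thread.allocate_lock()
done.acquire()  # held until the worker releases it

def worker(a, b=0):
    results.append(a + b)
    done.release()  # signal the main thread that we're finished

# args must be a tuple, kwargs a dict
_thread.start_new_thread(worker, (1,), {"b": 2})

done.acquire()  # blocks until worker() has released the lock
print(results)  # [3]
```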


However, since you cannot join such threads (wait for them to finish), and the main thread terminating might monkeywrench things (verify), you should probably use the threading module instead.


threading module

The threading module provides some more advanced locking mechanisms, and creates objects that represent threads.

Verbose way

Create objects that represent threads - extending Thread and adding code and state for the work that needs to be done:

import time
from threading import Thread
 
class MyThread(Thread):
    ''' An object that contains, and effectively is, a thread '''
    def __init__(self):
        Thread.__init__(self)
 
    def run(self): #the function that indirectly gets run when you start()
        self.stuff=time.time()
 
 
thr1 = MyThread() # create the thread object
thr1.start()      
 
thr1.join()       # wait for it to finish
 
# The object stays around after the thread terminated, so you can e.g.
print thr1.stuff  #...retrieve data stuck on it by the thread while it was running

In some cases, e.g. when you've got a bunch of bookkeeping to do, this can be a clean way of organizing it.

In other cases it only adds questionable complexity. Depends.

Brief way

When you just want a thread that runs a specific function, you can hand that function to the Thread constructor directly, which saves you half a dozen boilerplate-ish lines:

thr1 = threading.Thread(target=func)  # you can also hand along args, kwargs, and name
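The following Python 3 sketch passes along args, kwargs and name. greet() and the shared list are hypothetical, for illustration only.

```python
import threading

def greet(greeting, name="world", out=None):
    # Hypothetical work function; appends its result to a shared list.
    out.append("%s, %s (from %s)" % (greeting, name, threading.current_thread().name))

collected = []
thr1 = threading.Thread(
    target=greet,
    args=("hello",),                               # positional arguments, as a tuple
    kwargs={"name": "threads", "out": collected},  # keyword arguments, as a dict
    name="greeter-1",                              # shows up in current_thread().name
)
thr1.start()
thr1.join()
print(collected)  # ['hello, threads (from greeter-1)']
```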

local()

local() creates an object with data guaranteed to be local to the thread.

More precisely, data stored in it by one thread will never be seen by another thread that accesses the same local object.


This is great for temporary per-thread bookkeeping.

It also makes it easier to write thread functions you can reuse in concurrent threads, without worrying about mix-ups or races from shared non-local variables.


For example (which also makes the point that the main interpreter is considered its own thread):

import threading,time
i=0
perthread=threading.local()
perthread.num=42
 
def f():
    global i
    i+=1
    perthread.num = i
    time.sleep(1)
    print perthread.num
 
#create the thread objects
thread1=threading.Thread(target=f)
thread2=threading.Thread(target=f)
 
#start the threads, wait for them to finish
thread1.start();thread2.start()
thread1.join();thread2.join()
 
print perthread.num

There are three different states in perthread, one for the main thread, one for the first fired thread, one for the second fired thread.

This will likely print 1, 2 and 42 (or 2, 1 and 42, depending on which thread gets scheduled first), because perthread was accessed from the two new threads and the main thread, respectively. Note that the values come from a non-locked global (i), so in theory the threads could race and you could get 1, 1, 42 as the result (verify).
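If you want to rule out that race, one option is to take a lock around the shared counter. The following is a Python 3 sketch of the same example under that assumption. It collects each thread's value in a list (an addition for inspection) instead of only printing it.

```python
import threading
import time

i = 0
i_lock = threading.Lock()       # protects the shared counter i
perthread = threading.local()
perthread.num = 42              # this value belongs to the main thread only
seen = []                       # per-thread values, collected for inspection

def f():
    global i
    with i_lock:                # make the read-increment-write of i atomic
        i += 1
        perthread.num = i
    time.sleep(0.1)
    seen.append(perthread.num)  # still this thread's own value after the sleep

threads = [threading.Thread(target=f) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(seen), perthread.num)  # [1, 2] 42
```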

simple-ish pooling

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

In py3 you get help in the form of concurrent.futures.ThreadPoolExecutor.
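The following is a minimal sketch of both common ways to use it. square() is a hypothetical stand-in for a real job.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(x):
    # Hypothetical stand-in for a real job
    return x * x

with ThreadPoolExecutor(max_workers=5) as pool:
    # map() returns results in input order
    squares = list(pool.map(square, range(10)))

    # submit() returns Futures; as_completed() yields them as they finish,
    # and result() re-raises any exception the job raised
    futures = {pool.submit(square, n): n for n in (3, 4)}
    by_input = {futures[fut]: fut.result() for fut in as_completed(futures)}

print(squares)                   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(sorted(by_input.items()))  # [(3, 9), (4, 16)]
```

Leaving the with block waits for all submitted jobs, which takes care of the joining.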


In py2, things are a little more manual.

A decent way is to start as many threads as you want, and have each fetch jobs safely from the same place, probably via a thread-safe, multi-producer, multi-consumer queue like Queue.Queue (queue.Queue in py3).

Note that if you're not careful with exceptions in those threads, an uncaught exception ends the thread it happened in, so individual workers will stop.
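The following is a minimal Python 3 sketch of such a pool. It catches exceptions per job so that one failure doesn't take a worker down, and it uses one sentinel per worker to tell the workers to exit. process(), the STOP sentinel and the worker count are assumptions for illustration.

```python
import queue
import threading

NUM_WORKERS = 4
STOP = object()   # sentinel telling a worker to exit

def process(job):
    # Hypothetical job handler; fails on one input to show error handling.
    if job == 7:
        raise ValueError("bad job %r" % job)
    return job * 10

def worker(jobs, results, errors):
    while True:
        job = jobs.get()
        if job is STOP:
            return
        try:
            results.append(process(job))
        except Exception as exc:
            # Catch per job, so one failure doesn't end this worker.
            errors.append((job, exc))

jobs = queue.Queue()
results, errors = [], []   # list.append is safe to call from several threads in CPython
workers = [threading.Thread(target=worker, args=(jobs, results, errors))
           for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

for job in range(10):
    jobs.put(job)
for _ in workers:
    jobs.put(STOP)   # one sentinel per worker
for w in workers:
    w.join()

print(sorted(results))             # [0, 10, 20, 30, 40, 50, 60, 80, 90]
print([job for job, _ in errors])  # [7]
```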


If you want to start a thread for each job, then there's a quick and dirty solution in something like:

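import threading
import time

def some_function(job):   # hypothetical placeholder for the real work
    time.sleep(0.1)

jobs = list(range(50))    # these represent jobs to be started. You'd use something real
fired = []
target_threads = 5

while len(jobs) > 0:  # while there are jobs to be worked on
    if threading.active_count() < target_threads + 1:  # fewer workers than we want (+1 for the main thread)
        # pass the function and its arguments separately; target=some_function(job)
        # would run the job right here in the main thread
        t = threading.Thread(target=some_function, args=(jobs.pop(),))
        t.start()
        fired.append(t)
    else:  # enough threads working
        time.sleep(0.2)

# Make sure all threads are done before we exit.
for th in fired:
    th.join()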

daemon threads

A thread marked as a daemon thread changes what happens when the main thread is done but other threads are still running: the interpreter exits once only daemon threads are left (they are stopped abruptly), whereas it waits for any non-daemon threads to finish.

They are often used for "these are supporting threads, it makes no sense to keep them running when the rest is done".

This isn't necessary, in that you can always do a clean shutdown instead - and sometimes should.


These are also called nonessential threads, or service threads. Note that these names refer to their role within the process, not within the system (so they have nothing to do with service / daemon processes).
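The following is a minimal Python 3 sketch of a daemon thread. heartbeat() is a hypothetical never-ending helper. The daemon= keyword needs Python 3.3+; in py2 you would set t.daemon = True before calling start().

```python
import threading
import time

def heartbeat():
    # Supporting work that never finishes on its own.
    while True:
        time.sleep(0.5)

t = threading.Thread(target=heartbeat, daemon=True)
t.start()
print(t.daemon, t.is_alive())  # True True
# No join(): when the main thread is done, the interpreter exits
# and this daemon thread is stopped abruptly (no cleanup runs in it).
```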



Timely thread cleanup, getting Ctrl-C to work, and other subtleties

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

You may want to deal with Ctrl-C.

Ctrl-C will arrive in only one thread. You probably prefer to react to it in the main thread, purely for ease of management.

If the signal module is available, then the signal will go to the main thread - but you can't count on that on every system. The more robust variant is to try/except KeyboardInterrupt in all threads, and pass it along by calling thread.interrupt_main() (_thread.interrupt_main() in py3) in the other threads' handlers.


You may want to deal with other signals as well. This is partly because you may care about them, and partly because various Python implementations will try to get the OS scheduler to switch faster so the signal gets handled sooner - which means that scheduling is less efficient until you actually do.
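The following is a minimal Python 3 sketch of handling SIGTERM by setting a threading.Event that worker threads could watch. It assumes a platform where SIGTERM exists, and it needs Python 3.8+ for signal.raise_signal(), which is used here only to simulate the signal arriving.

```python
import signal
import threading

stop_requested = threading.Event()

def on_sigterm(signum, frame):
    # Python runs signal handlers in the main thread; keep them short
    # and just record that the work should wind down.
    stop_requested.set()

# signal.signal() may only be called from the main thread
signal.signal(signal.SIGTERM, on_sigterm)

signal.raise_signal(signal.SIGTERM)  # simulates e.g. 'kill <pid>' from outside
print(stop_requested.is_set())  # True
```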


Also, the main thread needs to be scheduled at all, and the OS may not have any reason to do so - say, if it's indefinitely join()ed on a child thread, or waiting on a lock. As such, you want your main thread to poll instead of join/block. If you use join(), this means using it with a short-ish timeout value, probably in a loop.


If you want to be able to cleanly shut down threads, you'll want some way of letting the main thread tell all others to stop their work soon - quite possibly just a shared variable (in the example below a global).


Example dealing with a bunch of that:

import threading
import thread      # py2; in py3 this module is called _thread
import time
 
stop_now = False
 
# Your thread function might look something like:
def threadfunc():
    try:
        while not stop_now:
            print "child thread doing stuff"
            time.sleep(1)
    except KeyboardInterrupt: # for cases where we see it, send it to the main thread instead
        print "\nCtrl-C signal arrived in thread"
        thread.interrupt_main()
 
 
# ...assuming we've started threads we can join on, e.g.
fired_threads = []
f = threading.Thread(target=threadfunc)
f.start()
fired_threads.append( f )
 
# then your main loop can e.g. do
while len(fired_threads)>0:
    try:
        # wait for each to finish. The main thread will be doing just this.
        for th in list(fired_threads):  # iterate over a copy, since finished threads get removed from the list
            print "parent thread watching"
            th.join(0.2) # timeout quickly - this main thread still gets scheduled relatively rarely
            if not th.is_alive():
                #print "Thread '%s' is done"%th.name  # you may want to name your threads when you make them.
                fired_threads.remove(th)
    except KeyboardInterrupt:
        print "\nCtrl-C signal arrived in main thread, asking threads to stop"
        # some mechanism to get worker threads to stop,
        # such as a global they listen to within reasonable time
        stop_now = True
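In Python 3, a threading.Event is a common alternative to the plain global flag. Workers can wait() on it with a timeout, which doubles as an interruptible sleep, so they notice a stop request without first finishing a full one-second sleep. The following is a minimal sketch under that assumption. The three workers and the 0.3 s of "main work" are placeholders.

```python
import threading
import time

stop_event = threading.Event()   # replaces the global stop_now flag

def threadfunc():
    # wait() returns False on timeout and True as soon as the event is set
    while not stop_event.wait(timeout=0.1):
        pass  # do one unit of work here

fired_threads = [threading.Thread(target=threadfunc, name="worker-%d" % n) for n in range(3)]
for th in fired_threads:
    th.start()

def shutdown():
    stop_event.set()
    # Poll with short join timeouts instead of one long, uninterruptible join.
    while fired_threads:
        for th in list(fired_threads):   # iterate over a copy, since we remove from it
            th.join(0.2)
            if not th.is_alive():
                fired_threads.remove(th)

try:
    time.sleep(0.3)   # stand-in for the main thread's own work; Ctrl-C would land here
except KeyboardInterrupt:
    print("Ctrl-C arrived in main thread, asking threads to stop")
finally:
    shutdown()

print(fired_threads)  # [] once every worker has noticed the event and exited
```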

Locks

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

The following are factories that return new locks.

threading.Lock()

  • once acquire()d, any further acquire() (from any thread, including the one holding it) blocks until the lock is released
  • any thread may release it

threading.RLock()

  • must be release()d by the thread that acquire()d it.
  • the thread holding it may acquire() it again without blocking; it keeps count, so multiple acquire()s must be followed by just as many release()s before another thread can get it
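The following is a minimal Python 3 sketch of that difference. It uses non-blocking acquires so that nothing hangs.

```python
import threading

lock = threading.Lock()
rlock = threading.RLock()

lock.acquire()
# A second blocking acquire() on a plain Lock would hang forever, even in the
# thread that holds it; a non-blocking attempt just reports failure.
lock_again = lock.acquire(False)
lock.release()

rlock.acquire()
rlock_again = rlock.acquire(False)  # the owning thread may re-acquire an RLock
rlock.release()                     # ...and must release it once per acquire()
rlock.release()

print(lock_again, rlock_again)  # False True
```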

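threading.Semaphore([value])

  • keeps a counter that starts at value (default 1): acquire() decrements it, blocking while it is zero, and release() increments it
  • useful for limiting how many threads do something at the same time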

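threading.BoundedSemaphore([value])

  • like Semaphore, but raises ValueError if release() would push the counter above its initial value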
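The following is a minimal Python 3 sketch that uses a BoundedSemaphore to allow at most three jobs at once. It also shows the error on an extra release(). job(), the peak bookkeeping and the sleep are assumptions for illustration.

```python
import threading
import time

MAX_CONCURRENT = 3
slots = threading.BoundedSemaphore(MAX_CONCURRENT)
active = 0
peak = 0
count_lock = threading.Lock()   # protects active and peak

def job():
    global active, peak
    with slots:                 # blocks while 3 other jobs hold a slot
        with count_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)        # stand-in for real work
        with count_lock:
            active -= 1

threads = [threading.Thread(target=job) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never more than 3

# Releasing more often than acquired is an error for a BoundedSemaphore:
over_released = False
try:
    slots.release()
except ValueError:
    over_released = True
print(over_released)  # True
```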

On python threading, and the GIL

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(this is largely information from David Beazley's interesting GIL talk; see e.g. [1])