Python I/O Advanced Study Notes_10. Multithreading in Python


Contents:

1. Python's GIL

2. A simple multithreading example

3. Communication between threads

4. Thread pools

5. Source code analysis of the thread pool Future

===========================

I. Python's GIL

On Python's GIL, one blogger has written an article that I find excellent: clear, well organized and easy to follow. http://cenalulu.github.io/python/gil-in-python/

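I won't repeat it here, but note that the experiments in that article are based on Python 2. The GIL was reworked in Python 3 (most notably the new GIL introduced in 3.2), so the examples in the article may not reproduce the same conclusions.

Still, the article is very helpful for understanding the GIL.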

So, to sum up, what is the GIL?

global interpreter lock

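Early on, to keep multithreaded programming simple, Python (CPython) introduced the GIL. The GIL ensures that only one thread executes bytecode at any given moment, so multiple threads cannot be mapped onto multiple CPUs at the same time. As a result, Python threads cannot run Python code truly in parallel: CPU-bound code gains little from threads, while I/O-bound code still benefits because threads can wait on I/O concurrently.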

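Does that mean the GIL makes our code absolutely safe, so we never have to think about thread safety when writing code?

No. The GIL can be released at a point where a thread has not yet finished its operation, so a multi-step update to shared data can still be interrupted halfway through.

The GIL is released at appropriate points: in Python 2 after a fixed number of bytecode instructions ("ticks"), in Python 3.2+ after a time slice (the "switch interval"), and it is also released voluntarily whenever a thread performs blocking I/O.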
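A minimal sketch of the I/O case, assuming CPython 3.2+: `sys.getswitchinterval()` reports the time slice after which a running thread is asked to give up the GIL, and `time.sleep()` stands in for blocking I/O because it releases the GIL while waiting. The helper names `io_task` and `run_in_threads` are illustrative only.

```python
import sys
import threading
import time

def io_task(seconds):
    # time.sleep() releases the GIL while waiting, just like blocking I/O
    time.sleep(seconds)

def run_in_threads(func, arg, n_threads):
    # start n_threads threads running func(arg) and wait for all of them
    threads = [threading.Thread(target=func, args=(arg,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    # CPython 3.2+: how long a thread may hold the GIL before being asked to release it
    print("switch interval:", sys.getswitchinterval())  # 0.005 (5 ms) by default
    elapsed = run_in_threads(io_task, 0.5, 4)
    # the four 0.5 s sleeps overlap, so this is close to 0.5 s rather than 2 s
    print("4 sleeping threads took {:.2f}s".format(elapsed))
```

Replacing `io_task` with a pure-Python loop would show the opposite effect: the threads take turns holding the GIL, so the total time does not shrink.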

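II. Simple multithreading examples

There are two ways to start a thread that runs a task: instantiate Thread directly, or write your own subclass of Thread.

1. Instantiating the Thread class directly

This approach suits cases where the code is short and the logic is simple.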

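import time
import threading

def get_detail_html(url):
    # simulate fetching a detail page (sleep stands in for network I/O)
    print("get detail html start")
    time.sleep(2)
    print("get detail html stop")

def get_detail_url(url):
    # simulate collecting detail-page URLs
    print("url start")
    time.sleep(2)
    print("url end")

if __name__ == "__main__":
    thread1 = threading.Thread(target=get_detail_html, args=("",))
    thread2 = threading.Thread(target=get_detail_url, args=("",))
    start_time = time.time()

    # optional: daemon threads are killed when the main thread exits
    # thread1.daemon = True
    # thread2.daemon = True

    thread1.start()
    thread2.start()

    # join() blocks the main thread until each worker has finished
    thread1.join()
    thread2.join()

    # about 2 s rather than 4 s, because the two sleeps overlap
    print("lasttime :{}".format(time.time() - start_time))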
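The commented-out daemon lines above can be illustrated with a small sketch. It assumes Python 3.3+ for the `daemon=` keyword of `threading.Thread` (the older `setDaemon()` method is deprecated since 3.10); `background_task` is a hypothetical endless worker.

```python
import threading
import time

def background_task():
    # an endless loop that would keep a non-daemon thread (and the program) alive
    while True:
        time.sleep(0.1)

if __name__ == "__main__":
    # daemon=True replaces the deprecated setDaemon(True) call
    worker = threading.Thread(target=background_task, daemon=True)
    worker.start()
    # join(timeout) waits at most 0.3 s instead of forever
    worker.join(timeout=0.3)
    print("worker still alive:", worker.is_alive())  # True
    # when the main thread returns here, the daemon thread is stopped automatically
```

Without `daemon=True`, the same program would never exit, because the interpreter waits for all non-daemon threads before shutting down.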

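2. Subclassing Thread (writing your own Thread subclass)

With this approach we need to override the run method. (Note: override run, not start. start() launches the new thread, which then calls run().)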

import time
import threading

class thread_get_detail_html(threading.Thread):
    # override run(): this is what the new thread executes after start()
    def run(self):
        print("get detail html start")
        time.sleep(2)
        print("get detail html stop")

class thread_get_detail_url(threading.Thread):
    def run(self):
        print("url start")
        time.sleep(2)
        print("url end")

if __name__ == "__main__":
    # instantiate the subclasses instead of passing target= to threading.Thread
    thread1 = thread_get_detail_html()
    thread2 = thread_get_detail_url()
    start_time = time.time()

    # thread1.daemon = True
    # thread2.daemon = True

    thread1.start()
    thread2.start()

    thread1.join()
    thread2.join()

    print("lasttime :{}".format(time.time() - start_time))

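Running this shows that two threads were started, executing thread_get_detail_html and thread_get_detail_url respectively, and the elapsed time is about 2 seconds rather than 4.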
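The subclasses above take no arguments. A common extension, sketched here with a hypothetical `DetailHtmlThread` class, overrides `__init__` to receive the URL and stores the result on the instance so the main thread can read it after `join()`. The sleep is a stand-in for a real download.

```python
import threading
import time

class DetailHtmlThread(threading.Thread):
    # hypothetical subclass: receives its URL in __init__ and keeps its result
    def __init__(self, url):
        super().__init__()      # Thread.__init__ must run before start()
        self.url = url
        self.result = None

    def run(self):
        # executed in the new thread once start() is called
        time.sleep(0.1)         # stand-in for downloading the page
        self.result = "html of {}".format(self.url)

if __name__ == "__main__":
    threads = [DetailHtmlThread("http://example.com/{}".format(i)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # after join(), result is safe to read
    print([t.result for t in threads])
```

Reading `result` only after `join()` guarantees that `run()` has finished writing it.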

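III. Communication between threads

The code in section II was actually simulating the flow of a simple crawler: first collect all the URLs we want to crawl, then fetch the HTML content of each URL. This raises a problem: thread_get_detail_url has to produce a list of URLs, and thread_get_detail_html has to be able to read that same list to do its work.

This is where inter-thread communication comes in.

Common ways to handle this kind of communication between threads in Python are:

- Global variables
- Queue (message queue)

Suppose we now continue and complete the normal logic of this crawler.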

1. Passing variables between threads

1.1 Global variables

import time
import threading

detail_url_list = []

def get_detail_html():
    # global is only strictly required when rebinding the name;
    # append()/pop() mutate the shared list in place
    global detail_url_list
    # if the producer has not filled the list yet, this consumer just gives up
    if len(detail_url_list) == 0:
        return
    url = detail_url_list.pop()
    print("get detail html start :{}".format(url))
    time.sleep(2)
    print("get detail html stop :{}".format(url))

def get_detail_url():
    global detail_url_list
    print("url start")
    for i in range(20):
        detail_url_list.append("http://www.baidu.com/{id}".format(id=i))
    time.sleep(2)
    print("url end")

if __name__ == "__main__":
    start_time = time.time()
    thread1 = threading.Thread(target=get_detail_url)
    thread1.start()
    for i in range(10):
        thread_2 = threading.Thread(target=get_detail_html)
        thread_2.start()
    # no join() here, so this only measures the time taken to start the threads
    print("lasttime :{}".format(time.time() - start_time))

In fact, it can be even simpler: pass the variable as an argument, and the functions no longer need global.

import time
import threading

detail_url_list = []

def get_detail_html(detail_url_list):
    # the list arrives as a parameter, so no global statement is needed
    if len(detail_url_list) == 0:
        return
    url = detail_url_list.pop()
    print("get detail html start :{}".format(url))
    time.sleep(2)
    print("get detail html stop :{}".format(url))

def get_detail_url(detail_url_list):
    print("url start")
    for i in range(20):
        detail_url_list.append("http://www.baidu.com/{id}".format(id=i))
    time.sleep(2)
    print("url end")

if __name__ == "__main__":
    start_time = time.time()
    thread1 = threading.Thread(target=get_detail_url, args=(detail_url_list,))
    thread1.start()
    for i in range(10):
        thread_2 = threading.Thread(target=get_detail_html, args=(detail_url_list,))
        thread_2.start()

    print("lasttime :{}".format(time.time() - start_time))

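However, this does not work with multiple processes: each process has its own memory, so a list handed to another process is a copy rather than shared data.

You can also create a variables.py file and import it. When there are many shared variables, this approach is more convenient.

However, if we import the variable name directly (from variables import detail_url_list), we will not see another thread rebinding that variable (variables.detail_url_list = ...); only in-place changes such as append() remain visible. Importing the module itself and accessing variables.detail_url_list avoids this problem.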
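The binding issue can be sketched without creating a real file: here a stand-in for the hypothetical variables.py module is built with `types.ModuleType` and registered in `sys.modules`, and the "other thread" is simulated by plain assignments so the result is deterministic.

```python
import sys
import types

# stand-in for a hypothetical variables.py, so the sketch is self-contained
variables = types.ModuleType("variables")
variables.detail_url_list = []
variables.counter = 0
sys.modules["variables"] = variables

from variables import detail_url_list, counter  # copies the current bindings

# another thread mutating the list in place: visible through both names
variables.detail_url_list.append("http://example.com/1")
# another thread rebinding the module attribute: NOT visible via the imported name
variables.counter = 10

print(detail_url_list)     # ['http://example.com/1']
print(counter)             # 0
print(variables.counter)   # 10
```

`from ... import` creates a new local name bound to the same object, so it follows mutations of that object but not later rebindings of the module attribute.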

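More importantly, none of the approaches above are thread-safe. To get the behavior we want, we have to add locks. So unless you understand locks well and know exactly what you are doing, sharing variables is not a recommended way to communicate between threads.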
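A minimal lock-based version of the shared-list crawler might look like the sketch below. The single `threading.Lock` named `list_lock` and the `fetched` result list are illustrative additions; the key point is that the emptiness check and the `pop()` happen under the same lock, while the slow download runs outside it.

```python
import threading
import time

detail_url_list = []
fetched = []
list_lock = threading.Lock()   # guards every access to the shared lists

def get_detail_url():
    for i in range(20):
        with list_lock:
            detail_url_list.append("http://www.baidu.com/{id}".format(id=i))

def get_detail_html():
    while True:
        with list_lock:
            # check-then-pop must be atomic: otherwise two threads can both see
            # one remaining item, and the second pop() raises IndexError
            if not detail_url_list:
                return
            url = detail_url_list.pop()
        time.sleep(0.01)       # simulated download, done without holding the lock
        with list_lock:
            fetched.append(url)

if __name__ == "__main__":
    producer = threading.Thread(target=get_detail_url)
    producer.start()
    producer.join()            # fill the list first so consumers don't quit early
    consumers = [threading.Thread(target=get_detail_html) for _ in range(5)]
    for t in consumers:
        t.start()
    for t in consumers:
        t.join()
    print(len(fetched))        # 20
```

Even this version still needs the producer to finish before the consumers start; coordinating "wait until there is work" by hand is exactly what a queue provides.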

1.2 The queue message queue

a. Implementing the above with a queue

import time
import threading
from queue import Queue

def get_detail_html(queue):
    # get() blocks until the producer has put at least one URL
    url = queue.get()
    print("get detail html start :{}".format(url))
    time.sleep(2)
    print("get detail html stop :{}".format(url))

def get_detail_url(queue):
    print("url start")
    for i in range(20):
        queue.put("http://www.baidu.com/{id}".format(id=i))
    time.sleep(2)
    print("url end")

if __name__ == "__main__":
    start_time = time.time()
    url_queue = Queue()
    thread1 = threading.Thread(target=get_detail_url, args=(url_queue,))
    thread1.start()
    for i in range(10):
        # each of these 10 consumers takes exactly one of the 20 URLs
        thread_2 = threading.Thread(target=get_detail_html, args=(url_queue,))
        thread_2.start()
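In the example above each consumer handles only one URL. A common variant, sketched here under the assumption that `None` never appears as a real URL, lets each worker loop over the queue and stop when it receives a sentinel value; the producer puts one sentinel per worker. `NUM_WORKERS` and `SENTINEL` are illustrative names.

```python
import threading
import time
from queue import Queue

NUM_WORKERS = 4
SENTINEL = None   # hypothetical "no more work" marker

def get_detail_url(queue):
    for i in range(20):
        queue.put("http://www.baidu.com/{id}".format(id=i))
    # one sentinel per worker so that every worker eventually stops
    for _ in range(NUM_WORKERS):
        queue.put(SENTINEL)

def get_detail_html(queue, results):
    while True:
        url = queue.get()       # blocks until an item is available
        if url is SENTINEL:
            break
        time.sleep(0.01)        # simulated download
        results.append(url)     # list.append is atomic in CPython

if __name__ == "__main__":
    url_queue = Queue()
    results = []
    workers = [threading.Thread(target=get_detail_html, args=(url_queue, results))
               for _ in range(NUM_WORKERS)]
    for t in workers:
        t.start()
    get_detail_url(url_queue)
    for t in workers:
        t.join()
    print(len(results))         # 20
```

Because `get()` blocks, the workers can safely be started before the producer has put anything into the queue.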

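b. How does Queue achieve thread safety?

We don't recommend the global-variable approach from 1.1 because it requires us to spend effort maintaining the locks ourselves to achieve thread safety. Python's Queue handles thread safety internally for us.

Internally, queue.Queue stores its items in a collections.deque and guards every put() and get() with a threading.Lock plus threading.Condition objects (not_empty, not_full, all_tasks_done). deque's append() and popleft() are themselves atomic in CPython, but it is Queue's lock and conditions that make blocking put()/get() safe.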
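To make that design concrete, here is a stripped-down imitation of the same idea: a deque for storage, one mutex, and a condition variable signalling "not empty". `TinyQueue` is a hypothetical teaching class, not the standard library's code, and it omits maxsize, timeouts and task tracking.

```python
import threading
from collections import deque

class TinyQueue:
    # hypothetical minimal blocking FIFO modelled on queue.Queue's design
    def __init__(self):
        self._items = deque()
        self._mutex = threading.Lock()
        self._not_empty = threading.Condition(self._mutex)

    def put(self, item):
        with self._not_empty:              # acquires the shared mutex
            self._items.append(item)
            self._not_empty.notify()       # wake up one waiting get()

    def get(self):
        with self._not_empty:
            while not self._items:         # loop guards against spurious wakeups
                self._not_empty.wait()     # releases the mutex while waiting
            return self._items.popleft()

if __name__ == "__main__":
    q = TinyQueue()
    out = []
    consumer = threading.Thread(target=lambda: out.extend([q.get() for _ in range(3)]))
    consumer.start()                       # blocks inside get() until items arrive
    for i in range(3):
        q.put(i)
    consumer.join()
    print(out)                             # [0, 1, 2]
```

The condition variable shares the mutex with the data, so checking for items and waiting happen atomically; that is the part a bare deque cannot provide.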

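c. Other Queue methods

get_nowait(): takes an item immediately without waiting (non-blocking, same as get(block=False)); raises queue.Empty if the queue is empty.

put_nowait(item): puts an item immediately without waiting (non-blocking, same as put(item, block=False)); raises queue.Full if the queue is full.

join(): blocks the calling thread until every item put into the queue has been processed, that is, until task_done() has been called once for each item.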
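A short sketch of these three methods together, assuming a bounded queue (`maxsize=2`) so that `put_nowait()` can fail. The worker here exits as soon as the queue is empty, which is only suitable when all items are queued up front.

```python
import queue
import threading

q = queue.Queue(maxsize=2)

# put_nowait / get_nowait never block: they raise instead
q.put_nowait("a")
q.put_nowait("b")
try:
    q.put_nowait("c")              # the queue is already full (maxsize=2)
except queue.Full:
    print("queue.Full raised")

processed = []

def worker():
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:        # nothing left: stop
            return
        processed.append(item)
        q.task_done()              # mark this item as finished

threading.Thread(target=worker).start()
q.join()                           # returns once task_done() ran for "a" and "b"
print(processed)                   # ['a', 'b']
```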

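2. Synchronization between threads

2.1 Why do threads need synchronization, and what does "synchronization" actually mean?

This is a problem that every multithreaded program has to face.

Example: we have a shared variable total. One function repeatedly adds to total, and another repeatedly subtracts from it.

When the loops run a large number of times, you can clearly see that each run may produce a different total. (On CPython 3.10 and later this particular loop may often print 0, because the interpreter checks for thread switches at fewer points, but the code is still not thread-safe.)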

import threading

total = 0

def add():
    global total
    for i in range(100000000):
        total += 1

def desc():
    global total
    for i in range(100000000):
        total = total - 1

if __name__ == "__main__":
    add_total = threading.Thread(target=add)
    desc_total = threading.Thread(target=desc)
    add_total.start()
    desc_total.start()
    add_total.join()
    desc_total.join()

    print(total)
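The usual fix is to protect each read-modify-write of `total` with a `threading.Lock`, so the other thread can never run between the read and the write. This sketch keeps the original function names but, as an assumption for speed, passes the loop count as a parameter and uses a smaller default.

```python
import threading

total = 0
lock = threading.Lock()

def add(n):
    global total
    for _ in range(n):
        with lock:          # the read-modify-write now happens as one unit
            total += 1

def desc(n):
    global total
    for _ in range(n):
        with lock:
            total = total - 1

if __name__ == "__main__":
    n = 100000
    add_total = threading.Thread(target=add, args=(n,))
    desc_total = threading.Thread(target=desc, args=(n,))
    add_total.start()
    desc_total.start()
    add_total.join()
    desc_total.join()
    print(total)            # always 0
```

Acquiring the lock on every iteration is slow; in real code you would keep the critical section as small as possible and avoid sharing hot counters between threads at all.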

Why isn't the final total 0, as we would expect?

Let's look at it from the bytecode perspective, using simplified versions of add and desc.

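#input
import dis

def add1(a):
    a += 1

def desc1(a):
    a -= 1

# dis.dis() prints the disassembly itself and returns None,
# so there is no need to wrap it in print()
dis.dis(add1)
dis.dis(desc1)

#output (Python 3.6 to 3.10; from 3.11 on, INPLACE_ADD is replaced by BINARY_OP)
22           0 LOAD_FAST                0 (a)
              2 LOAD_CONST               1 (1)
              4 INPLACE_ADD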
