Study Notes on "Working With Ruby Threads"

Introduction

why care?

The promise of multi-threading

Chapter 1: You’re Always in a Thread

$ irb
> Thread.main
=> #<Thread:0x007fdc830677c0 run>
> Thread.current == Thread.main
=> true

Chapter 2: Threads of Execution

Shared address space

$ top -l1 -pid 8409 -stats pid,th

The command above shows the number of threads in the process with pid 8409.

Non-deterministic context switching

In order to provide fair access, the thread scheduler can ‘pause’ a thread at any time, suspending its current state.

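The ||= statement is not thread-safe, because a thread can be paused at any time. If thread A evaluates ||=, sees nil, and is paused before it assigns, thread B may assign a value that thread A then overwrites when it resumes, so thread B's assignment is lost.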

# This statement
@results ||= Queue.new

# when broken down, becomes something like
if @results.nil? 
 temp = Queue.new 
 @results = temp
end

A race condition involves two threads racing to perform an operation on some shared state.

Two strategies follow from this key principle:

1) don’t allow concurrent modification
2) protect concurrent modification
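A minimal sketch of the second strategy, applied to the ||= example above. The ResultStore class and its method names are hypothetical, made up for illustration; the idea is only that a Mutex makes the check-then-assign sequence run in one thread at a time.

```ruby
require 'thread' # no-op on modern Rubies; kept to match the other examples

# Hypothetical class for illustration; not taken from the book.
class ResultStore
  def initialize
    @lock = Mutex.new
    @results = nil
  end

  # Only one thread at a time can run the check-then-assign sequence,
  # so every caller ends up with the same Queue instance.
  def results
    @lock.synchronize { @results ||= Queue.new }
  end
end

store = ResultStore.new
queues = 10.times.map { Thread.new { store.results } }.map(&:value)
puts queues.uniq.size # number of distinct Queue objects handed out
```

Without the lock, two threads could both see nil and each create its own Queue; with it, the count of distinct queues stays at one.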

Chapter 3: Lifecycle of a Thread

Thread.new

Thread.new { ... }
Thread.fork { ... } 
Thread.start(1, 2) { |x, y| x + y }

Thread.new and its alias methods

Thread#join

Thread#status

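Possible values of Thread#status: "run", "sleep", "aborting", false (terminated normally), nil (terminated with an exception)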
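A small sketch that observes several Thread#status values. The variable names are made up for illustration, and report_on_exception is switched off only to keep stderr quiet (it assumes Ruby 2.4 or newer).

```ruby
# A thread blocked in sleep reports "sleep".
sleeper = Thread.new { sleep }
Thread.pass until sleeper.status == 'sleep'
sleeping_status = sleeper.status

# A thread that finished normally reports false.
finished = Thread.new { :done }
finished.join
finished_status = finished.status

# A thread that died from an unhandled exception reports nil.
failed = Thread.new do
  Thread.current.report_on_exception = false # assumption: Ruby 2.4+
  raise 'boom'
end
begin
  failed.join
rescue RuntimeError
  # expected: join re-raises the thread's exception in the caller
end
failed_status = failed.status

p Thread.current.status # the running thread itself reports "run"
p sleeping_status, finished_status, failed_status

# Let the sleeping thread finish instead of killing it.
sleeper.wakeup
sleeper.join
```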

Thread.stop

require 'thread'

thread = Thread.new do
  Thread.stop
  puts "Hello there"
end

# wait for the thread to trigger its stop
puts "----" until thread.status == 'sleep'

thread.wakeup
thread.join

# Output ------
# ----
# ----
# ...
# ----
# ----
# ----
# Hello there
# [Finished in 1.6s]

Thread.pass

Thread.pass is similar to Thread.stop, but it only asks the thread scheduler to schedule another thread; it does not put the current thread to sleep.

Avoid Thread#raise

Avoid Thread#kill

Chapter 4: Concurrent != Parallel

You can’t guarantee anything will be parallel

Further reading

Chapter 5: The GIL and MRI

The global lock

Other names for the GIL: Global Interpreter Lock, GVL (Global VM Lock), Global Lock

The special case: blocking IO

Code: chapter05/block_io_demo1.rb

require 'open-uri'
3.times.map do 
  Thread.new do
    URI.open('http://zombo.com') # plain open() no longer handles URLs on Ruby 3.0+
  end
end.each(&:value)

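When this code runs, suppose all the threads have already been spawned and each one tries to acquire the GIL to execute code. Thread A acquires the GIL, creates a socket, and tries to open a connection to zombo.com. While thread A waits for the response, it releases the GIL, so thread B can acquire the GIL and go through the same steps.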
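A rough sketch of the same effect without touching the network. The fake_request helper is hypothetical: it uses sleep as a stand-in for waiting on a socket, on the assumption that a sleeping thread gives up the GIL just as a thread blocked on IO does.

```ruby
require 'benchmark'

# Hypothetical stand-in for a network request: sleep blocks the thread
# without holding the GIL, much like waiting on a socket.
def fake_request
  sleep 0.2
  :ok
end

# Three requests one after another.
serial = Benchmark.realtime { 3.times { fake_request } }

# Three requests in three threads; the waits overlap.
threaded = Benchmark.realtime do
  3.times.map { Thread.new { fake_request } }.each(&:value)
end

puts format('serial:   %.2fs', serial)
puts format('threaded: %.2fs', threaded)
```

The threaded run should take roughly as long as one request rather than three, because the waiting happens concurrently even on MRI.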

Why?

There are three reasons that the GIL exists:

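  1. To protect MRI internals from race conditions. Race conditions cause many problems, and the same problems can arise in MRI's C internals. The simplest way to reduce the number of race conditions is to prevent multiple threads from running at the same time.
  2. To facilitate the C extension API. Whenever code calls into the C extension API, the GIL blocks other code from running. C extensions may not be thread-safe, so the GIL keeps them safe.
  3. To reduce the likelihood of race conditions in your Ruby code.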

Misconceptions

Myth 1: the GIL guarantees your code will be thread-safe.

Code: chapter04/unsafe_counter.rb

@counter = 0
5.times.map do
  Thread.new do
    temp = @counter

    # Uncommenting the next line leads to a wrong result: the thread releases the GIL while blocked on IO, so two threads can read the same @counter value
    # puts  temp 
    temp = temp + 1
    @counter = temp
  end
end.each(&:join)
puts @counter

Myth 2: the GIL prevents concurrency

Chapter 6: Real Parallel Threading with JRuby and Rubinius

Proof

See chapter06/prime.rb, which computes primes. MRI is not as fast as JRuby and Rubinius.

Using the 1.8.7 version

require 'benchmark'

def prime_sieve_upto(n)
  all_nums = (0..n).to_a
  all_nums[0] = all_nums[1] = nil
  all_nums.each do |p|

    #jump over nils
    next unless p

    #stop if we're too high already
    break if p * p > n

    #kill all multiples of this number
    (p*p).step(n, p){ |m| all_nums[m] = nil }
  end

  #remove unwanted nils
  all_nums.compact
end


primes = 1_000_000
iterations = 10
num_threads = 5
iterations_per_thread = iterations / num_threads

Benchmark.bm(15) do |x|
  x.report('single-threaded') do
    iterations.times do
      prime_sieve_upto(primes)
    end
  end
  x.report('multi-threaded') do
    num_threads.times.map do
      Thread.new do
        iterations_per_thread.times do
          prime_sieve_upto(primes)
        end
      end
    end.each(&:join)
  end
end

ree-1.8.7-2012.02

                     user     system      total        real
single-threaded  5.660000   0.060000   5.720000 (  5.725174)
multi-threaded   6.110000   0.110000   6.220000 (  6.228208)

MRI ruby 1.9.3-p551

                      user     system      total        real
single-threaded   3.450000   0.060000   3.510000 (  3.531772)
multi-threaded    3.660000   0.080000   3.740000 (  3.760532)

MRI ruby 2.0.0-p598

                      user     system      total        real
single-threaded   3.630000   0.080000   3.710000 (  3.726324)
multi-threaded    3.680000   0.090000   3.770000 (  3.808694)

MRI ruby 2.0.0-p648

                      user     system      total        real
single-threaded   3.210000   0.060000   3.270000 (  3.276048)
multi-threaded    3.330000   0.080000   3.410000 (  3.402474)

MRI ruby 2.1.0

                      user     system      total        real
single-threaded   2.360000   0.070000   2.430000 (  2.422242)
multi-threaded    2.390000   0.070000   2.460000 (  2.462325)

MRI ruby 2.2.3:

                      user     system      total        real
single-threaded   2.300000   0.070000   2.370000 (  2.361750)
multi-threaded    2.410000   0.080000   2.490000 (  2.482332)

jruby-9.0.4.0:

                      user     system      total        real
single-threaded   7.740000   0.280000   8.020000 (  2.676519)
multi-threaded   11.760000   0.230000  11.990000 (  3.064823)

MRI Ruby keeps improving. Rubinius was not tested: installing it was painfully slow because of the GFW.

So… how many should you use?

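Real applications are probably less clear-cut: some parts may be IO-bound and others CPU-bound, or the application may be neither and instead memory-bound, or it may not be maxing out any resource anywhere.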

Take a Rails application as an example:

The only way to a surefire answer is to measure: run the code with different numbers of threads, then analyze the measurements. Without measuring, we cannot find the correct answer.
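One possible way to take such a measurement, as a sketch only. The workload here is hypothetical (24 jobs that each sleep briefly to imitate IO), and time_with and run_job are made-up helper names; a real measurement would run the application's own code.

```ruby
require 'benchmark'

# Hypothetical job: blocks briefly, standing in for an IO-bound task.
def run_job
  sleep 0.01
end

# Run `jobs` jobs on `thread_count` threads and return the elapsed seconds.
def time_with(thread_count, jobs = 24)
  queue = Queue.new
  jobs.times { |i| queue << i }

  Benchmark.realtime do
    thread_count.times.map do
      Thread.new do
        loop do
          begin
            queue.pop(true) # non-blocking pop raises ThreadError when empty
          rescue ThreadError
            break
          end
          run_job
        end
      end
    end.each(&:join)
  end
end

timings = [1, 2, 4, 8].map { |n| [n, time_with(n)] }
timings.each { |n, secs| puts format('%2d threads: %.3fs', n, secs) }
```

For this IO-like workload the time should drop as threads are added; a CPU-bound workload on MRI would likely show no such gain, which is exactly why measuring matters.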

Chapter 7: How Many Threads Are Too Many?

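To benefit from concurrency, we must split a problem into smaller tasks that can run at the same time. If a significant part of the problem cannot be divided, concurrency will not bring much additional performance.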
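A tiny sketch of splitting one problem into independent tasks: summing a list in slices, one thread per slice. The data and slice size are arbitrary choices for illustration. On MRI this CPU-bound work will not actually run in parallel because of the GIL; the point is only the shape of the decomposition.

```ruby
numbers = (1..1_000).to_a

# Split the work into independent slices, one thread per slice.
partial_sums = numbers.each_slice(250).map do |slice|
  Thread.new { slice.sum }
end.map(&:value) # Thread#value joins the thread and returns the block's result

# Combine the partial results in the main thread.
total = partial_sums.sum
puts total
```

Each thread touches only its own slice, so no shared state needs protecting until the partial results are combined.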

ALL the threads

1.upto(10_000) do |i|
  Thread.new { sleep }
  puts i
end

The code above outputs:
1
2
...
2043
2044
2045
2046
chapter06/spawning_threads.rb:2:in `initialize': can't create Thread: Resource temporarily unavailable (ThreadError)

Context Switching

IO-bound

For a code example, see ./chapter06/io_bound.rb

CPU-bound

… to be continued

Chapter 8: Thread safety

What’s really at stake?

When your code isn’t thread-safe, the worst that can happen is that your underlying data becomes incorrect.

The computer is oblivious

Is anything thread-safe by default?

Chapter 9: Protecting Data with Mutexes

Mutual exclusion

# The common way to use a mutex
mutex.synchronize do 
  shared_array << nil
end
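A fuller sketch of the same pattern, applied to a shared counter. The counter, thread count, and iteration count are arbitrary choices for illustration.

```ruby
counter = 0
mutex = Mutex.new

threads = 5.times.map do
  Thread.new do
    1_000.times do
      # The read-increment-write sequence runs in one thread at a time.
      mutex.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter
```

Because every increment happens inside synchronize, no update can be lost, whichever thread the scheduler pauses and whenever it does so.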

The contract

Making key operations atomic

Mutexes and memory visibility

Mutex performance

Chapter 10: Signaling Threads with Condition Variables

The API by example

Code: chapter10/xkcd_printer.rb

Broadcast

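ConditionVariable#signal wakes up one thread that is waiting on the condition variable. The woken thread then tries to reacquire the mutex that was passed to ConditionVariable#wait. According to the older Ruby documentation these notes draw on, it returns that thread if one was waiting and nil otherwise; recent Ruby versions return the condition variable itself.

ConditionVariable#broadcast wakes up all threads that are waiting on the condition variable. Each woken thread tries to reacquire the mutex that was passed to ConditionVariable#wait.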
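A minimal producer/consumer sketch using wait and signal. This is not the book's xkcd_printer.rb; the inbox array and variable names are made up for illustration.

```ruby
mutex = Mutex.new
condvar = ConditionVariable.new
inbox = []    # shared state, guarded by mutex
received = [] # written only by the consumer

consumer = Thread.new do
  3.times do
    mutex.synchronize do
      # Re-check the condition in a loop: a wakeup does not guarantee
      # that the inbox is still non-empty.
      condvar.wait(mutex) while inbox.empty?
      received << inbox.shift
    end
  end
end

producer = Thread.new do
  3.times do |i|
    mutex.synchronize do
      inbox << i
      condvar.signal # wake one waiting thread, if any
    end
  end
end

[producer, consumer].each(&:join)
p received
```

If the producer signals before the consumer starts waiting, nothing is lost: the consumer checks inbox.empty? first and skips the wait. With several consumers waiting on the same condition, broadcast would wake all of them instead of just one.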

Chapter 11: Thread-safe Data Structures

Chapter 12: Writing Thread-safe Code

… unfinished