Skip to content

Process Pools in Python Multiprocessing

What is a Process Pool?

  • A ProcessPool is a pool of worker processes used to execute tasks concurrently.
  • It simplifies the management of multiple processes, especially when there are many tasks to distribute.

Why Use a Process Pool?

  • To execute tasks in parallel across multiple processes without manually managing them.
  • To reuse worker processes, reducing the overhead of creating and destroying processes repeatedly.
  • Provides an easy-to-use interface for parallel execution.

Key Methods of Process Pools

  1. apply(): Executes a single task in a process and waits for the result (blocking).
  2. apply_async(): Executes a single task asynchronously and returns a AsyncResult object (non-blocking).
  3. map(): Distributes an iterable of tasks across processes and collects results (blocking).
  4. map_async(): Same as map() but non-blocking.
  5. starmap(): Similar to map() but supports multiple arguments for each task. map doesn't support multiple argument.
  6. imap(): Lazily returns an iterator to results as they become available.

How to Use a Process Pool?

Basic Syntax

from multiprocessing import Pool

with Pool(processes=4) as pool:  # Create a pool with 4 worker processes
    result = pool.map(func, iterable)

Examples

1. Using map()

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4])
    print(results)  # Output: [1, 4, 9, 16]

2. Using apply_async()

from multiprocessing import Pool

def cube(x):
    return x ** 3

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        result = pool.apply_async(cube, (3,))
        print(result.get())  # Output: 27

3. Using starmap()

from multiprocessing import Pool

def add(a, b):
    return a + b

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        results = pool.starmap(add, [(1, 2), (3, 4), (5, 6)])
    print(results)  # Output: [3, 7, 11]

Map Vs iMap vs imap_unordered

The imap() function in Python's multiprocessing.Pool is a variant of map() that processes tasks lazily, meaning it returns an iterator instead of a fully-evaluated list. This can be beneficial when working with large datasets because results are produced and consumed one at a time, avoiding the need to store all results in memory at once.


Key Differences Between map() and imap()

Feature map() imap()
Return Type List (eager evaluation) Iterator (lazy evaluation)
Memory Usage Stores all results in memory Produces results one at a time
Use Case Small-to-medium datasets Large datasets where memory is a concern

Example of imap()

from multiprocessing import Pool
import time

def slow_square(x):
    time.sleep(1)  # Simulate a slow computation
    return x * x

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        results = pool.imap(slow_square, [1, 2, 3, 4, 5])

        # Process results as they are ready
        for result in results:
            print(result)

Output (one result every second):

1
4
9
16
25

Why Use imap()?

  • Memory Efficiency:

Instead of generating and storing all results at once, imap() produces them incrementally. Useful when working with large input datasets or when the task is memory-intensive.

  • Time Efficiency:

If you want to start processing results as they are computed, imap() enables this instead of waiting for all tasks to complete like map().


Variants:

  • imap_unordered():
  • Similar to imap() but does not preserve the order of results.
  • Results are returned as soon as individual tasks are completed, regardless of their order in the input.
  • Useful for maximizing throughput when the order of results doesn't matter.

    Example

with Pool(processes=2) as pool:
    results = pool.imap_unordered(slow_square, [1, 2, 3, 4, 5])
    for result in results:
        print(result)  # Results might appear out of order

Advantages

  • Simplifies parallel execution of tasks.
  • Automatically handles process management (creation, destruction, and task distribution).
  • Efficient for CPU-bound tasks that benefit from parallelism.

Disadvantages

  • Limited to functions (cannot directly use methods tied to objects).
  • Overhead in creating and managing the pool can be significant for very lightweight tasks.

When to Use a Process Pool?

  • Use a ProcessPool for independent, parallelizable tasks that are CPU-bound and can benefit from multiple processes.
  • Avoid using it for I/O-bound tasksβ€”for such tasks, consider using ThreadPoolExecutor or asynchronous programming.