Python’s “yield” Keyword & Generators – Explained


Section 1: Introduction

In Python, the “yield” keyword is used in a special type of function called a generator function. Generator functions are a convenient way to create iterators that enable you to iterate over a sequence of values without having to store the entire sequence in memory. This can be particularly useful when dealing with large data sets, infinite sequences, or data streams.

The “yield” keyword allows you to temporarily suspend the execution of a generator function at a specific point, returning a value to the caller, while maintaining the function’s internal state. When the next value is requested, execution resumes from where it left off, using the preserved state.

In this article, we will explore the concept of generators in Python, the role of the “yield” keyword, and practical examples of how to use “yield” in various scenarios. We will also discuss generator expressions and the use cases for “yield” and generators in real-world applications.

Section 2: Python Generators

Generator functions are a special kind of function that let you create iterators in a memory-efficient manner. Instead of returning a single value or a whole collection at once, a generator produces one value at a time, computing the next value only when it is requested. This is achieved with the “yield” keyword inside the function body. Calling a generator function does not run its body; it returns a generator object, which you can iterate over to obtain the values.
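
To make the one-value-at-a-time behaviour concrete, here is a minimal sketch. The function name countdown is a hypothetical example chosen for illustration; it is not used elsewhere in this article.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, one value per request."""
    current = start
    while current > 0:
        yield current
        current -= 1

gen = countdown(3)          # no code in the function body has run yet
print(type(gen).__name__)   # generator
print(next(gen))            # 3
print(next(gen))            # 2
print(list(gen))            # [1]  (only the remaining values)
```

Each call to next() runs the body just far enough to reach the next “yield”, which is why list(gen) at the end sees only the value that had not been requested yet.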

2.1 Differences between generators and regular functions

There are several key differences between generators and regular functions in Python:

  1. Generators use the “yield” keyword to produce values, while regular functions use the “return” keyword. (A generator function may still contain “return”, which ends the iteration.)
  2. Generators return a generator object when called, whereas regular functions return a single value or a collection of values.
  3. The execution of a generator function can be paused and resumed, so its local variables survive from one requested value to the next. A regular function starts from scratch on every call and keeps no local state once it has returned.
  4. Generators are memory-efficient and can handle large data sets or infinite sequences, while regular functions may require storing the entire sequence in memory.

Table – (Differences between generators and regular functions):

Feature | Generators | Regular Functions
Return type | Generator object | Any value or None
Execution | Paused and resumed at each “yield” statement | Runs until completion
State | Maintains state between calls | State not preserved
Memory efficiency | On-demand item generation | May require large storage
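
The difference in how state is handled can be sketched as follows. Both function names (next_id_regular and id_generator) are hypothetical and exist only for this illustration.

```python
def next_id_regular(last_id):
    """Regular function: the caller has to carry the state."""
    return last_id + 1

def id_generator():
    """Generator: the local variable survives between requests."""
    current = 0
    while True:
        current += 1
        yield current

ids = id_generator()
print(next(ids), next(ids), next(ids))   # 1 2 3

last = 0
last = next_id_regular(last)
last = next_id_regular(last)
print(last)                              # 2
```

With the regular function, the counter has to be passed in and stored by the caller. With the generator, the variable current simply stays alive inside the paused function.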

2.2 Iterables, Iterators, and Generators

Iterable, iterator, and generator are three related concepts in Python:

  • Iterable: An object that can be looped over using a “for” loop. Examples include lists, tuples, dictionaries, sets, and strings. Iterables implement the __iter__() method, which returns an iterator.
  • Iterator: An object that represents a stream of values and implements the __next__() method in Python 3 (or next() in Python 2). Calling this method on an iterator returns the next value in the sequence. When there are no more items to return, the iterator raises the StopIteration exception.
  • Generator: A special type of iterator created using a generator function. Generators use the “yield” keyword to produce values one at a time, and maintain their internal state between calls.
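
One way to check these three categories is with the abstract base classes in the standard collections.abc module. This is only a small sketch; gen_func is a hypothetical name.

```python
from collections.abc import Iterable, Iterator, Generator

numbers = [1, 2, 3]            # a list is iterable, but not an iterator
numbers_iter = iter(numbers)   # iter() calls numbers.__iter__()

def gen_func():
    yield 1

gen = gen_func()

print(isinstance(numbers, Iterable), isinstance(numbers, Iterator))   # True False
print(isinstance(numbers_iter, Iterator))                             # True
print(isinstance(gen, Iterator), isinstance(gen, Generator))          # True True
print(isinstance(numbers_iter, Generator))                            # False
```

Every generator is an iterator, and every iterator is iterable, but the reverse does not hold: the list iterator above is not a generator.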

2.3 The iterator protocol

The iterator protocol is a standard interface in Python that allows objects to be looped over using a “for” loop. The protocol consists of two methods:

  1. __iter__() – This method should be implemented on an iterable object and must return an iterator.
  2. __next__() (or next() in Python 2) – This method should be implemented on an iterator object and must return the next value in the sequence. When there are no more items to return, it should raise the StopIteration exception.

When you use a “for” loop to iterate over an iterable, Python first calls the __iter__() method on the iterable to obtain an iterator. Then, it repeatedly calls the __next__() method on the iterator to get the next value, until the iterator raises the StopIteration exception, signaling that there are no more items to iterate over.
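
The steps a “for” loop performs can be written out by hand. The sketch below is an approximation for illustration, and manual_for is a hypothetical helper name.

```python
def manual_for(iterable):
    """Collect items the way a for loop would fetch them."""
    collected = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # no more items: leave the loop
        collected.append(item)
    return collected

print(manual_for("abc"))        # ['a', 'b', 'c']
print(manual_for([10, 20]))     # [10, 20]
```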

Generators implicitly implement the iterator protocol by defining both the __iter__() and __next__() methods. The __iter__() method returns the generator object itself, and the __next__() method is implemented by the generator function’s logic, which uses the “yield” keyword to produce values.
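
To see what a generator saves you from writing, compare a hand-written iterator class with an equivalent generator function. Both names (CountUpTo and count_up_to) are hypothetical examples.

```python
class CountUpTo:
    """Hand-written iterator: counts from 1 to limit."""
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self                 # an iterator returns itself

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current

def count_up_to(limit):
    """Generator version: Python supplies __iter__ and __next__."""
    for value in range(1, limit + 1):
        yield value

gen = count_up_to(3)
print(iter(gen) is gen)             # True
print(list(CountUpTo(3)))           # [1, 2, 3]
print(list(gen))                    # [1, 2, 3]
```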

Section 3: The “yield” Keyword

3.1 How “yield” works in a generator function

The “yield” keyword is used within a generator function to produce values one at a time. When a generator function is called, it returns a generator object without executing any of the function’s code. The generator object can then be used to iterate over the sequence of values produced by the generator.

When the generator’s __next__() method is called (implicitly by a “for” loop, or explicitly through the built-in next() function), the function’s code runs until it reaches a “yield” statement. The value after the “yield” keyword is returned to the caller, and execution pauses with the function’s local state intact. The next call to __next__() resumes execution immediately after that “yield”. This continues until the function returns or reaches the end of its body, at which point the generator raises the StopIteration exception to signal that it is exhausted.
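
The pause-and-resume behaviour can be observed by recording when each part of the body runs. This is a minimal sketch; the names traced and log are hypothetical.

```python
log = []

def traced():
    log.append("started")
    yield 1
    log.append("resumed")
    yield 2
    log.append("finished")   # falling off the end ends the generator

gen = traced()
print(log)          # []  (calling the function ran none of its body)
print(next(gen))    # 1
print(log)          # ['started']
print(next(gen))    # 2
print(log)          # ['started', 'resumed']

try:
    next(gen)
except StopIteration:
    print("exhausted")
print(log)          # ['started', 'resumed', 'finished']
```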

3.2 Advantages of using “yield” in Python

Using “yield” in Python has several advantages:

  1. Memory efficiency: Generators only produce values as they are requested, so they don’t require storing the entire sequence in memory. This is beneficial when dealing with large data sets or infinite sequences.
  2. Lazy evaluation: Generators only compute the values as needed, which can improve performance in some cases by reducing the number of computations.
  3. Simplified code: Generator functions can simplify code by replacing complex iterator classes with more straightforward and readable code.
  4. Support for asynchronous code: Because a generator can pause and resume, it was the basis for Python’s early coroutines. Asynchronous generators (“async def” functions that contain “yield”) let you produce values while waiting on external events, such as fetching data over the network.
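
The lazy-evaluation point can be illustrated with a sketch in which the consumer stops early. The names expensive, lazy_results, and calls are hypothetical, and the multiplication is only a stand-in for real work.

```python
calls = []

def expensive(x):
    calls.append(x)        # record which inputs were actually processed
    return x * 10

def lazy_results(values):
    for v in values:
        yield expensive(v)

# Find the first result above 25 and stop asking for more.
first_big = next(r for r in lazy_results(range(1_000_000)) if r > 25)
print(first_big)   # 30
print(calls)       # [0, 1, 2, 3]
```

Although the input range has a million elements, only four of them are ever processed, because nothing is computed until the consumer asks for it.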

3.3 Understanding the execution flow with “yield”

Let’s consider the following generator function to better understand the execution flow with “yield”:

def simple_generator():
    yield "First value"
    yield "Second value"
    yield "Third value"

  1. When the function simple_generator() is called, it returns a generator object without executing any code.
  2. When the generator’s __next__() method is called (either explicitly or through a “for” loop), the function starts executing from the beginning, and stops at the first “yield” statement. It returns the value “First value” and pauses the execution.
  3. The next time the generator’s __next__() method is called, the execution resumes from the point where it was paused (right after the first “yield” statement), and continues until it encounters the next “yield” statement. It returns the value “Second value” and pauses the execution again.
  4. The process continues until the function body finishes. At that point the generator is exhausted, and the next call to __next__() raises the StopIteration exception, signaling that there are no more values to return.

In this example, iterating over the generator object returned by simple_generator() would produce the values “First value”, “Second value”, and “Third value” in sequence.
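
The same steps can be followed by hand with the built-in next() function. The sketch below repeats simple_generator so that it runs on its own.

```python
def simple_generator():
    yield "First value"
    yield "Second value"
    yield "Third value"

gen = simple_generator()
print(next(gen))            # First value
print(next(gen))            # Second value
print(next(gen))            # Third value
print(next(gen, "done"))    # done  (default returned instead of StopIteration)

# A generator object is single-use; call the function again to restart.
print(list(simple_generator()))
# ['First value', 'Second value', 'Third value']
```

Passing a second argument to next() is a convenient way to avoid handling StopIteration explicitly when stepping through a generator manually.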

Section 4: Practical Examples of Using “yield”

4.1 Basic example of a generator function with “yield”

Here’s a simple example of a generator function that uses the “yield” keyword to generate a sequence of square numbers:

def square_numbers(n):
    for i in range(1, n+1):
        yield i * i

for square in square_numbers(5):
    print(square)

Output:

1
4
9
16
25

Explanation:

  • The square_numbers generator function yields the square of each number in the range from 1 to n (inclusive).
  • Iterating over the generator object returned by square_numbers(5) produces the square numbers 1, 4, 9, 16, and 25.

4.2 Using “yield” to create infinite sequences

Here’s an example of a generator function that generates an infinite sequence of even numbers using “yield”:

def even_numbers():
    num = 0
    while True:
        yield num
        num += 2

even_gen = even_numbers()
for _ in range(5):
    print(next(even_gen))

Output:

0
2
4
6
8

Explanation:

  • The even_numbers generator function yields even numbers indefinitely in an infinite loop.
  • Using next() to get the first five even numbers from the generator object even_gen results in the output: 0, 2, 4, 6, and 8.
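
As an alternative to calling next() in a loop, the standard library’s itertools.islice can take a bounded slice of an infinite generator. A minimal sketch, repeating even_numbers so that it is self-contained:

```python
from itertools import islice

def even_numbers():
    num = 0
    while True:
        yield num
        num += 2

first_five = list(islice(even_numbers(), 5))
print(first_five)                              # [0, 2, 4, 6, 8]

# islice also accepts start and stop positions.
print(list(islice(even_numbers(), 10, 13)))    # [20, 22, 24]
```

Never pass an infinite generator directly to list() or sum(); without a bound such as islice, the call would not terminate.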

4.3 Generator function with “yield” for reading large files

Here’s an example of a generator function that reads a large file line by line using “yield”:

def read_file(file_path):
    with open(file_path, "r") as file:
        for line in file:
            yield line.strip()

for line in read_file("sample.txt"):
    print(line)

Assuming we have a text file called sample.txt that contains the same three lines shown in the output below:

Output:

This is the first line of the file.
This is the second line of the file.
This is the third line of the file.

Explanation:

  • The read_file generator function reads the file line by line and yields each line after stripping any leading or trailing whitespace.
  • Iterating over the generator object returned by read_file("sample.txt") prints the lines of the file one by one.
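
The same idea carries over to files that are not line-oriented. The sketch below reads fixed-size binary chunks; read_in_chunks and the tiny chunk_size of 4 are hypothetical choices for illustration, and a temporary file is created so the example runs on its own.

```python
import os
import tempfile

def read_in_chunks(file_path, chunk_size=4):
    """Yield successive blocks of up to chunk_size bytes from a file."""
    with open(file_path, "rb") as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:          # empty bytes object means end of file
                break
            yield chunk

# Create a small temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghij")
    path = tmp.name

chunks = list(read_in_chunks(path))
os.remove(path)
print(chunks)    # [b'abcd', b'efgh', b'ij']
```

In practice a much larger chunk size (for example, several kilobytes) would normally be used.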

4.4 Combining multiple generator functions using “yield from”

Here’s an example of using “yield from” to combine two generator functions:

def gen1():
    yield "A"
    yield "B"

def gen2():
    yield "X"
    yield "Y"

def combined_gen():
    yield from gen1()
    yield from gen2()

for value in combined_gen():
    print(value)

Output:

A
B
X
Y

Explanation:

  • The combined_gen generator function uses “yield from” to delegate the yielding of values to gen1() and gen2().
  • Iterating over the generator object returned by combined_gen() produces the combined sequence of values from both gen1() and gen2(): A, B, X, and Y.
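
“yield from” is also handy in recursive generators, and it accepts any iterable, not only generators. The sketch below shows both; flatten and letters_then_digits are hypothetical names.

```python
def flatten(items):
    """Yield the leaves of arbitrarily nested lists, left to right."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to the nested generator
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))   # [1, 2, 3, 4, 5]

# "yield from" works with any iterable, not only generators.
def letters_then_digits():
    yield from "ab"
    yield from range(2)

print(list(letters_then_digits()))          # ['a', 'b', 0, 1]
```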

Section 5: Generator Expressions

Generator expressions are a concise way to create generators, similar to how list comprehensions are used to create lists. They have a similar syntax to list comprehensions, but are enclosed in parentheses rather than brackets.

5.1 Comparing generator expressions with list comprehensions

List comprehensions:

  • Create a new list in memory.
  • Have a syntax that uses square brackets [].

Generator expressions:

  • Create a generator object.
  • Use less memory as they generate values on the fly.
  • Have a syntax that uses parentheses ().

Table – (Comparing generator expressions with list comprehensions):

Feature | Generator Expressions | List Comprehensions
Syntax | Uses parentheses () | Uses square brackets []
Output | Generator object | List
Memory efficiency | On-demand item generation | Creates entire list
Use case | Large datasets, streaming data | Smaller datasets
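
The trade-offs in this comparison can be checked directly. This is a rough sketch: sys.getsizeof reports only the size of the container object itself, but that is enough to show the difference in growth.

```python
import sys

squares_list = [x * x for x in range(10_000)]   # all values stored now
squares_gen = (x * x for x in range(10_000))    # values produced on demand

print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# A list supports indexing and len(); a generator does not.
print(squares_list[3], len(squares_list))       # 9 10000

# A list can be iterated repeatedly; a generator is consumed once.
print(sum(squares_list) == sum(squares_gen))    # True
print(sum(squares_gen))                         # 0  (already exhausted)
```

If you need random access, the length, or several passes over the data, a list is usually the better fit despite its memory cost.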

5.2 Examples of generator expressions and their usage

Here’s a simple example of a generator expression to generate the square of numbers from 1 to 5:

squares_gen = (x * x for x in range(1, 6))

for square in squares_gen:
    print(square)

Output:

1
4
9
16
25

Explanation:

  • The generator expression (x * x for x in range(1, 6)) generates the square of each number in the range from 1 to 5 (inclusive).
  • Iterating over the generator object squares_gen produces the square numbers 1, 4, 9, 16, and 25.

Another example of a generator expression is to calculate the sum of squares without creating a list in memory:

sum_of_squares = sum(x * x for x in range(1, 6))
print(sum_of_squares)

Output:

55

Explanation:

  • The generator expression (x * x for x in range(1, 6)) generates the square of each number in the range from 1 to 5 (inclusive).
  • Using the sum() function with the generator expression calculates the sum of squares (1 + 4 + 9 + 16 + 25) without creating a list in memory, resulting in the output 55.
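
Generator expressions combine well with other built-in functions that consume an iterable. A few more small examples follow; the words list is a hypothetical sample.

```python
words = ["yield", "return", "lambda", "async"]

# When a generator expression is the only argument, the extra
# parentheses can be omitted.
longest = max(len(w) for w in words)
print(longest)                                   # 6

# A filter clause works the same way as in a list comprehension.
print(sum(x * x for x in range(1, 6) if x % 2))  # 35  (1 + 9 + 25)

# any() stops consuming as soon as it sees a true value.
print(any(w.startswith("a") for w in words))     # True
```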

Section 6: Use Cases of “yield” and Generators

6.1 Resource management

Generators are also useful for managing resources, such as file handles or network connections, that must be closed or released when they are no longer needed. By combining “yield” with the contextlib.contextmanager decorator, you can write a context manager as a single generator function and ensure that the cleanup code always runs:

import contextlib

@contextlib.contextmanager
def open_file(filename, mode):
    file = open(filename, mode)
    try:
        yield file
    finally:
        file.close()

with open_file("file.txt", "r") as file:
    for line in file:
        print(line.strip())

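Here, open_file is a generator function that the decorator turns into a context manager. The code before “yield” runs when the “with” block is entered, the yielded file object is bound to the name after “as”, and the “finally” clause closes the file when the block exits, even if an exception is raised inside it.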
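
To see that the cleanup step runs even when the body of the “with” block fails, consider this small sketch. The names tracked and events are hypothetical, and a list of strings stands in for a real resource.

```python
import contextlib

events = []

@contextlib.contextmanager
def tracked(name):
    events.append(f"acquire {name}")
    try:
        yield name                        # the with-block body runs here
    finally:
        events.append(f"release {name}")  # runs even if the body raises

try:
    with tracked("resource") as r:
        events.append(f"use {r}")
        raise ValueError("something went wrong")
except ValueError:
    events.append("error handled")

print(events)
# ['acquire resource', 'use resource', 'release resource', 'error handled']
```

The release step is recorded before the exception reaches the outer handler, which is the ordering you want for files, locks, and connections.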

6.2 Stream processing

Generators are ideal for processing large streams of data, as they allow you to process the data one chunk at a time, without loading the entire dataset into memory. For example, processing a large CSV file line by line:

def read_large_csv(file_name):
    with open(file_name, "r") as file:
        for line in file:
            yield line.strip().split(",")

csv_gen = read_large_csv("large_file.csv")
for row in csv_gen:
    print(row)

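This generator reads a large CSV file line by line, yielding each row as a list of values, without loading the entire file into memory. Note that the plain split(",") treats every comma as a separator, so it does not handle quoted fields that contain commas; Python’s built-in csv module covers those cases.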
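
For real CSV data, a variant built on the standard csv module is usually more robust. The following is a minimal sketch rather than a drop-in replacement: read_csv_rows is a hypothetical name, and io.StringIO stands in for an open file so the example runs on its own.

```python
import csv
import io

def read_csv_rows(lines):
    """Yield each CSV row as a list, parsed by the csv module."""
    yield from csv.reader(lines)

# io.StringIO stands in for an open file so the sketch runs on its own.
sample = io.StringIO('name,comment\nAda,"likes commas, a lot"\n')

rows = list(read_csv_rows(sample))
print(rows)
# [['name', 'comment'], ['Ada', 'likes commas, a lot']]

# The naive split breaks the quoted field into two pieces.
print('Ada,"likes commas, a lot"'.split(","))
# ['Ada', '"likes commas', ' a lot"']
```

When reading from an actual file, the csv documentation recommends opening it with newline="" before passing it to csv.reader.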

6.3 Implementing coroutines

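Generators can also be used to implement coroutines, which let you express concurrency in your code without the overhead of threads or processes. Coroutines are well suited to tasks such as asynchronous I/O and event-driven programming. In modern Python they are written with “async def” and “await”, and an “async def” function that also contains “yield” is an asynchronous generator:

import asyncio

async def some_async_operation():
    return await asyncio.sleep(0.1, "data")  # hypothetical stand-in for real async work

async def my_coroutine():
    for _ in range(3):  # bounded so the example terminates
        value = await some_async_operation()
        if value is not None:
            yield value

async def main():  # "async for" is only valid inside a coroutine
    async for value in my_coroutine():
        print(value)

asyncio.run(main())

In this example, my_coroutine is an asynchronous generator function that uses the “yield” keyword to produce values as they become available from some_async_operation(), which here is a stand-in that simply waits briefly and returns a fixed value. The async for loop, which must itself run inside a coroutine started with asyncio.run(), processes the values as they are generated.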
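
Plain generators can act as simple coroutines too, because a caller can pass values into a paused generator with its send() method. The sketch below keeps a running average; running_average is a hypothetical example name.

```python
def running_average():
    """Coroutine-style generator: receives numbers, yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average      # pause here until send() delivers a value
        total += value
        count += 1
        average = total / count

averager = running_average()
next(averager)                     # advance to the first yield ("priming")
print(averager.send(10))           # 10.0
print(averager.send(20))           # 15.0
print(averager.send(30))           # 20.0
averager.close()                   # raises GeneratorExit inside the generator
```

Here “yield” is used as an expression: it hands the current average to the caller and evaluates to whatever the caller passes to send().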

Section 7: Conclusion

In this article, we’ve explored the “yield” keyword in Python and its significance in the context of generators. We’ve covered the following key concepts:

  • The “yield” keyword is used in generator functions to produce a sequence of values without having to store them all in memory.
  • Generators are a powerful tool in Python for creating and managing sequences, allowing for more efficient and memory-friendly processing of large datasets or infinite sequences.
  • We’ve discussed the differences between generator functions and regular functions, as well as the iterator protocol and how it works with generators.
  • We’ve provided practical examples of using “yield” and generators, such as creating infinite sequences, processing large files, and combining multiple generator functions using “yield from”.
  • We’ve also explored generator expressions, which offer a memory-efficient alternative to list comprehensions with nearly identical syntax.
  • Finally, we’ve looked at various use cases of “yield” and generators in Python, including resource management, stream processing, and implementing coroutines.

In conclusion, understanding the “yield” keyword and generators is essential for any Python developer, as they provide a powerful and efficient way to manage large datasets and complex sequences. With the concepts covered in this article, you should now have a solid understanding of the “yield” keyword and its various applications in Python programming.

Our goal is to make your coding journey easier, and we hope this article has done just that.

Happy programming!

About the Author

This article was authored by Rawnak.