IO & BytesIO in Python
Python's built-in io module provides tools for handling text and binary streams efficiently.
Types of Streams
- Text I/O (TextIOBase) → Handles text files (.txt, .csv, etc.).
- Binary I/O (BufferedIOBase) → Handles binary files (.png, .mp4, .zip, etc.).
- Raw I/O (RawIOBase) → Low-level access to files & devices.
- In-Memory Streams (BytesIO, StringIO) → Simulate file-like objects in RAM.
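Every opened stream is an instance of one of these base classes, which you can check directly. A minimal sketch, assuming a writable working directory (the file names are placeholders):
import io
with open("notes.txt", "w") as text_f, open("blob.bin", "wb") as bin_f:
    print(isinstance(text_f, io.TextIOBase))      # True: text stream
    print(isinstance(bin_f, io.BufferedIOBase))   # True: buffered binary stream
    print(isinstance(bin_f.raw, io.RawIOBase))    # True: the raw layer underneath
print(isinstance(io.BytesIO(), io.BufferedIOBase))  # True: in-memory binary stream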
BytesIO
io.BytesIO is a file-like object that operates in memory instead of on disk.
- Creating a BytesIO Object
import io
# Creating a BytesIO object with binary data
byte_stream = io.BytesIO(b"Hello, this is binary data!")
# Read the data
print(byte_stream.read()) # Output: b'Hello, this is binary data!'
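Reads advance an internal cursor; once it reaches the end, another read() returns b"" until you seek() back:
print(byte_stream.read())   # Output: b'' (cursor is at the end)
byte_stream.seek(0)         # Rewind to the start
print(byte_stream.read(5))  # Output: b'Hello'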
- Writing to BytesIO
byte_stream = io.BytesIO()
byte_stream.write(b"Python BytesIO Example")
# Reset cursor to the beginning before reading
byte_stream.seek(0)
print(byte_stream.read()) # Output: b'Python BytesIO Example'
- Using BytesIO Like a File
with io.BytesIO() as byte_file:
    byte_file.write(b"Hello World!")
    byte_file.seek(0)  # Move back to start
    print(byte_file.read())  # Output: b'Hello World!'
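This is exactly how you hand in-memory data to APIs that expect a real file. A minimal sketch using the standard-library zipfile module to build an archive entirely in RAM:
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("hello.txt", "Hello from memory!")  # Add a file without touching disk
print(buffer.getvalue()[:4])  # Output: b'PK\x03\x04' (zip magic bytes)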
Writing BytesIO to a File
- 1️⃣ Using .getvalue() (For small data)
byte_stream = io.BytesIO(b"Binary Content")
with open("output.bin", "wb") as f:
f.write(byte_stream.getvalue()) # Extract all bytes and write to file
- 2️⃣ Using shutil.copyfileobj() (For large data)
import shutil
byte_stream = io.BytesIO(b"Large binary content...")
with open("output_large.bin", "wb") as f:
shutil.copyfileobj(byte_stream, f) # Efficient copy without memory overhead
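One gotcha: copyfileobj() copies from the current cursor position, so if you wrote into the stream first, rewind it or nothing gets copied:
byte_stream = io.BytesIO()
byte_stream.write(b"written after creation")
byte_stream.seek(0)  # Without this, copyfileobj sees an already-exhausted stream
with open("output_large.bin", "wb") as f:
    shutil.copyfileobj(byte_stream, f)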
Reading & Writing Binary Files with open()
# Writing binary data to a file
with open("data.bin", "wb") as f:
f.write(b"Some binary data")
# Reading binary data from a file
with open("data.bin", "rb") as f:
content = f.read()
print(content) # Output: b'Some binary data'
# Reading a file in 1000-byte chunks
with open("data.bin", "rb") as f:
    while chunk := f.read(1000):
        print(chunk)
# Download from cloud storage and save locally in 4 KB chunks
# (snippet from a class: self.fs is assumed to be an fsspec-style filesystem;
#  _file and temp_path come from the surrounding code)
with self.fs.open(_file["name"], "rb") as cloud_file, open(temp_path, "wb") as local_file:
    # b"" is the sentinel value: iter() stops when read() returns empty bytes
    for chunk in iter(lambda: cloud_file.read(4096), b""):  # Read in 4 KB chunks
        local_file.write(chunk)
Seeking & Telling in Binary Streams
- seek(offset, whence) → Move the file pointer
- whence=0 → Start from the beginning (default)
- whence=1 → Move relative to the current position
- whence=2 → Move relative to the end
- Example: Seeking & Reading in Chunks
with open("data.bin", "rb") as f:
f.seek(5) # Move to the 5th byte
print(f.read(10)) # Read next 10 bytes
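whence=1 and whence=2 take relative offsets; for example, a negative offset from the end reads the tail of a file. A small sketch, assuming data.bin is at least 4 bytes long:
import os

with open("data.bin", "rb") as f:
    f.seek(-4, os.SEEK_END)  # 4 bytes before the end (whence=2)
    print(f.read())          # Output: the last 4 bytes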
- Get Current Position with tell()
with open("data.bin", "rb") as f:
print(f.tell()) # Output: 0 (Start position)
f.read(5)
print(f.tell()) # Output: 5 (After reading 5 bytes)
- Using truncate() to modify the file size
truncate(size) resizes a file (or in-memory stream) to a specific byte length. If size is smaller than the current file size, the file is cut off at that point. If size is larger, the file is extended and the new space reads as null bytes (\x00). It is useful for clearing files or adjusting their length without rewriting them; note that it does not move the file pointer.
with open("example.txt", "wb") as f:
f.write(b"Hello, World!")
f.truncate(5) # File now contains only "Hello"
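The extension case doubles as cheap preallocation, which the next section relies on. A small sketch, with sparse.bin as a placeholder name:
import os

with open("sparse.bin", "wb") as f:
    f.truncate(1024)  # Extend to 1024 null bytes without writing them

print(os.path.getsize("sparse.bin"))  # Output: 1024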
Downloading a very large file very fast 🚀
- To download a large file quickly, first send an HTTP HEAD request to learn the file's size in bytes (the Content-Length header).
- Create the output file and preallocate it to that size (e.g. with truncate()).
- Start a number of threads, each downloading its own byte range in parallel via the HTTP Range header.
- As each thread finishes, it writes its range at the matching offset in the preallocated file; a sketch follows this list.
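A minimal sketch using the third-party requests library; the URL is a placeholder, and it assumes the server supports Range requests:
import concurrent.futures
import requests

URL = "https://example.com/big.bin"  # Hypothetical URL
OUT = "big.bin"
NUM_THREADS = 8

# 1) HEAD request to learn the total size
size = int(requests.head(URL, allow_redirects=True).headers["Content-Length"])

# 2) Preallocate the output file
with open(OUT, "wb") as f:
    f.truncate(size)

def download_range(start, end):
    # 3) Fetch one byte range...
    resp = requests.get(URL, headers={"Range": f"bytes={start}-{end}"})
    # 4) ...and write it at the matching offset
    with open(OUT, "r+b") as f:
        f.seek(start)
        f.write(resp.content)

chunk = size // NUM_THREADS
ranges = [(i * chunk, size - 1 if i == NUM_THREADS - 1 else (i + 1) * chunk - 1)
          for i in range(NUM_THREADS)]

with concurrent.futures.ThreadPoolExecutor(NUM_THREADS) as pool:
    pool.map(lambda r: download_range(*r), ranges)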
Handling Large Files Efficiently
- Using seek() & read() for Large Files
with open("large_file.bin", "rb") as f:
chunk_size = 1024 # Read in 1KB chunks
while chunk := f.read(chunk_size):
process(chunk) # Replace with your processing logic
Using mmap for Efficient File Access
- Memory-mapped File for Large Binary Data
import mmap
with open("large_file.bin", "r+b") as f:
mm = mmap.mmap(f.fileno(), 0)
print(mm[:100]) # Read first 100 bytes
mm.close()
✅ The OS pages data in on demand, so even 100 GB+ files can be read without loading them into RAM!
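mmap objects also behave like both bytes and files, so you can search them in place. A sketch, assuming large_file.bin exists:
import mmap

with open("large_file.bin", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        idx = mm.find(b"needle")  # Search without reading the file into RAM
        print(idx)                # Output: byte offset, or -1 if not found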
When to Use BytesIO?
- ✅ When you don't want to use disk I/O (temporary storage)
- ✅ Handling binary data manipulation in memory
- ✅ Simulating file objects for APIs that expect file-like input