File Reader Tools

The FileReader tool provides multi-format file reading with line numbers, pagination, and safety caps. It supports plain text files, Jupyter notebooks, PDFs, and images.

Class Overview

  • FileReader - Four reading methods for different file formats:
    • read() - Text files with line numbers and pagination
    • read_notebook() - Jupyter notebooks (.ipynb)
    • read_pdf() - PDF files (requires optional dependency)
    • read_image() - Image files with multimodal content blocks (requires optional dependency)

Usage

Reading Text Files

from toolregistry_hub import FileReader

# Read the first 50 lines with line numbers
content = FileReader.read("/path/to/file.py", limit=50)
print(content)
# [/path/to/file.py] lines 1-50 of 200 (use offset=51 to read more)
#  1 | import os
#  2 | import sys
#  3 |
#  4 | def main():
# ...

# Read with pagination: 25 lines starting at line 50
content = FileReader.read("/path/to/file.py", offset=50, limit=25)
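The "use offset=N to read more" hint in the header can drive a paging loop. The following is a minimal sketch, assuming the header format shown above; `iter_pages` is a hypothetical helper, not part of the library, and it takes the reader as a callable so that `FileReader.read` can be passed in directly.

```python
import re
from typing import Callable, Iterator

# Matches the "(use offset=N to read more)" hint in the metadata header.
_NEXT_OFFSET = re.compile(r"\(use offset=(\d+) to read more\)")


def iter_pages(read: Callable[..., str], path: str, limit: int = 500) -> Iterator[str]:
    """Yield successive pages from a FileReader.read-style callable."""
    offset = 1
    while True:
        page = read(path, offset=offset, limit=limit)
        yield page
        header = page.split("\n", 1)[0]  # the hint lives on the first line
        match = _NEXT_OFFSET.search(header)
        if match is None:
            break  # no hint: this was the last page (or a notice)
        next_offset = int(match.group(1))
        if next_offset <= offset:
            break  # defensive: never loop without making progress
        offset = next_offset
```

Usage would look like `for page in iter_pages(FileReader.read, "/path/to/file.py"): ...`.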

Reading Jupyter Notebooks

# Read notebook cells with type markers and outputs
content = FileReader.read_notebook("analysis.ipynb")
# [Notebook: analysis.ipynb]
#
# --- Cell 1 [markdown] ---
# # Data Analysis
#
# --- Cell 2 [code] ---
# ```python
# import pandas as pd
# df = pd.read_csv("data.csv")
# ```
# Output:
# ...

No external dependencies are needed; read_notebook() uses the standard-library json module.
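To try read_notebook() without Jupyter installed, a small notebook can be written by hand with json. This is a sketch under the assumption that the file follows the nbformat 4 layout; the file name and cell contents are invented for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_minimal_notebook(path: Path) -> None:
    """Write a two-cell notebook containing a markdown cell and a code cell."""
    notebook = {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "language": "python",
                "display_name": "Python 3",
            }
        },
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Data Analysis"]},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": 1,
                "source": ["print('hello')"],
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
                ],
            },
        ],
    }
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "demo.ipynb"
    write_minimal_notebook(nb_path)
    # Round-trip through json to confirm the file is well-formed.
    loaded = json.loads(nb_path.read_text(encoding="utf-8"))
    cell_types = [cell["cell_type"] for cell in loaded["cells"]]
```

Passing such a file to `FileReader.read_notebook()` should produce one "--- Cell N [type] ---" section per cell.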

Reading PDFs

# Read all pages (up to the 20-page cap)
content = FileReader.read_pdf("document.pdf")

# Read specific page range
content = FileReader.read_pdf("document.pdf", pages="5-10")

# Read a single page
content = FileReader.read_pdf("document.pdf", pages="3")

Requires pypdf or pdfplumber:

pip install toolregistry-hub[reader]

If both are installed, pdfplumber is preferred for better text quality.
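The pages argument takes either a single page ("3") or an inclusive range ("5-10"). The library parses it with a private helper; the sketch below is a hypothetical stand-in showing how a caller might validate such a string up front. The function name, the constant, and the choice to reject ranges longer than 20 pages (rather than truncate them) are assumptions.

```python
MAX_PAGES_PER_CALL = 20  # mirrors the documented cap; the name is hypothetical


def parse_pages(pages: str) -> tuple[int, int]:
    """Parse "N" or "N-M" into an inclusive, 1-indexed (start, end) pair."""
    first, sep, last = pages.strip().partition("-")
    try:
        start = int(first)
        end = int(last) if sep else start
    except ValueError:
        raise ValueError(f"Invalid page range: {pages!r}") from None
    if start < 1 or end < start:
        raise ValueError(f"Invalid page range: {pages!r}")
    if end - start + 1 > MAX_PAGES_PER_CALL:
        raise ValueError(f"At most {MAX_PAGES_PER_CALL} pages per call: {pages!r}")
    return start, end
```

A long document can then be read in chunks by calling read_pdf() with consecutive ranges such as "1-20", "21-40", and so on.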

Reading Images

# Read an image — returns multimodal content blocks
blocks = FileReader.read_image("screenshot.png")
# [
#   {"type": "text", "text": "[Image: screenshot.png (image/png, 45321 bytes)]"},
#   {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "iVBOR..."}}
# ]

# With custom max size (default 5 MB base64)
blocks = FileReader.read_image("large_photo.jpg", max_size=1_000_000)

Supported formats: .png, .jpg, .jpeg, .gif, .webp.

If the base64-encoded image exceeds max_size, it is downsampled with Pillow, lowering quality adaptively until it fits. Pillow is an optional dependency:

pip install toolregistry-hub[reader_image]

If Pillow is not installed, the original image is returned with a warning logged.
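Because max_size applies to the base64-encoded length rather than the size on disk, the threshold is reached earlier than the raw file size suggests: base64 emits 4 characters for every 3 input bytes. A small sketch of that arithmetic follows; both helper names are made up for illustration.

```python
import base64


def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def needs_downsampling(raw_bytes: int, max_size: int = 5 * 1024 * 1024) -> bool:
    """True if an image of raw_bytes bytes would exceed max_size once encoded."""
    return base64_size(raw_bytes) > max_size


# Sanity check of the formula against the real encoder.
assert base64_size(45321) == len(base64.b64encode(b"\0" * 45321))
```

With the default max_size of 5242880, any file larger than 3932160 bytes (3.75 MiB) on disk crosses the limit.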

Parameters

read()

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | required | Path to text file |
| offset | int | 1 | Starting line number (1-indexed) |
| limit | int \| None | None | Max lines to read; None falls back to the 2000-line default |

read_notebook()

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | required | Path to .ipynb file |

read_pdf()

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | required | Path to PDF file |
| pages | str \| None | None | Page range (e.g. "1-5", "3") |

read_image()

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | required | Path to image file |
| max_size | int | 5242880 | Max base64-encoded size in bytes (5 MB) |

Safety Caps

  • Text files: 10 MB max file size (larger files return a "[File too large: ...]" notice instead of content)
  • Text lines: 2000 lines default per read
  • PDF pages: 20 pages max per call
  • Notebook outputs: 10 KB per cell output
  • Images: 5 MB max base64-encoded size (auto-downsampled if exceeded)
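The text-file cap is worth handling explicitly, because the size guard in read() returns a bracketed notice string rather than raising. A minimal sketch of two checks a caller could use is below. Both helper names are hypothetical, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
import tempfile
from pathlib import Path

ASSUMED_TEXT_CAP_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def exceeds_text_cap(path: str, cap: int = ASSUMED_TEXT_CAP_BYTES) -> bool:
    """Return True if the file on disk is larger than cap bytes."""
    return Path(path).stat().st_size > cap


def is_size_cap_notice(result: str) -> bool:
    """Return True if a read() result is the size-cap notice, not file content."""
    return result.startswith("[File too large:")


# Demo with a tiny cap so that no large file is needed.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("x" * 100)
demo_path = handle.name
over_small_cap = exceeds_text_cap(demo_path, cap=50)
under_default_cap = exceeds_text_cap(demo_path)
Path(demo_path).unlink()  # clean up the temporary file
```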

MCP Server Endpoints

POST /tools/reader/read
POST /tools/reader/read_pdf
POST /tools/reader/read_notebook
POST /tools/reader/read_image

API Reference

toolregistry_hub.file_reader.FileReader

Multi-format file reader with line numbers and pagination.

read staticmethod

read(path: str, offset: int = 1, limit: int | None = None) -> str

Read a text file with line numbers.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to file. | required |
| offset | int | Starting line number (1-indexed). Defaults to 1. | 1 |
| limit | int \| None | Maximum number of lines to read. Defaults to 2000. | None |

Returns:

| Type | Description |
| --- | --- |
| str | File content with line numbers in "N \| content" format. Includes a metadata header with file path, total lines, and the range actually read. |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the file does not exist. |
| IsADirectoryError | If the path is a directory. |
| ValueError | If offset is less than 1. |

Source code in toolregistry_hub/file_reader.py
@staticmethod
def read(
    path: str,
    offset: int = 1,
    limit: int | None = None,
) -> str:
    """Read a text file with line numbers.

    Args:
        path: Path to file.
        offset: Starting line number (1-indexed). Defaults to 1.
        limit: Maximum number of lines to read. Defaults to 2000.

    Returns:
        File content with line numbers in ``"N | content"`` format.
        Includes a metadata header with file path, total lines, and
        the range actually read.

    Raises:
        FileNotFoundError: If the file does not exist.
        IsADirectoryError: If the path is a directory.
        ValueError: If offset is less than 1.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"File not found: {path}")
    if p.is_dir():
        raise IsADirectoryError(f"Path is a directory, not a file: {path}")
    if offset < 1:
        raise ValueError("offset must be >= 1")

    effective_limit = limit if limit is not None else _MAX_LINES_DEFAULT

    # Size guard
    file_size = p.stat().st_size
    if file_size > _MAX_FILE_SIZE_BYTES:
        return (
            f"[File too large: {file_size:,} bytes "
            f"(limit {_MAX_FILE_SIZE_BYTES:,}). "
            f"Use offset/limit to read in segments.]"
        )

    text = p.read_text(encoding="utf-8", errors="replace")
    all_lines = text.splitlines()
    total_lines = len(all_lines)

    start = offset - 1  # convert to 0-indexed
    end = min(start + effective_limit, total_lines)
    selected = all_lines[start:end]

    # Build line-numbered output
    width = len(str(end))
    numbered = [
        f"{i + offset:>{width}} | {line}" for i, line in enumerate(selected)
    ]

    # Metadata header
    range_str = f"{offset}-{start + len(selected)}"
    header = f"[{path}] lines {range_str} of {total_lines}"
    if end < total_lines:
        header += f" (use offset={end + 1} to read more)"

    return header + "\n" + "\n".join(numbered)
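The numbering step right-aligns each line number to the width of the last number on the page, so the "|" separators line up. A standalone sketch of the same formatting idea is shown here; `number_lines` is an illustrative name, not a library function.

```python
def number_lines(lines: list[str], offset: int = 1) -> list[str]:
    """Prefix each line with its right-aligned, 1-indexed line number."""
    if not lines:
        return []
    width = len(str(offset + len(lines) - 1))  # digits in the last line number
    return [f"{offset + i:>{width}} | {line}" for i, line in enumerate(lines)]


numbered = number_lines(["import os", "import sys"], offset=9)
print("\n".join(numbered))
#  9 | import os
# 10 | import sys
```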

read_image staticmethod

read_image(path: str, max_size: int = _MAX_IMAGE_SIZE_BYTES) -> list

Read an image file and return as multimodal content blocks.

Returns a list of content blocks (TextBlock + ImageBlock) that the toolregistry pipeline can expand into format-specific multimodal messages via expand_content_blocks().

If the base64-encoded image exceeds max_size, Pillow is used to downsample it. If Pillow is not installed, the original image is returned with a warning.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to image file (.png, .jpg, .jpeg, .gif, .webp). | required |
| max_size | int | Maximum base64-encoded size in bytes. Defaults to 5 MB. | _MAX_IMAGE_SIZE_BYTES |

Returns:

| Type | Description |
| --- | --- |
| list | A list of two content blocks, a text block followed by an image block: |

[
  {"type": "text", "text": "[Image: name (mime, size)]"},
  {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "iVBOR..."}}
]

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the file does not exist. |
| ValueError | If the file extension is not supported. |

Source code in toolregistry_hub/file_reader.py
@staticmethod
def read_image(
    path: str,
    max_size: int = _MAX_IMAGE_SIZE_BYTES,
) -> list:
    """Read an image file and return as multimodal content blocks.

    Returns a list of content blocks (TextBlock + ImageBlock) that the
    toolregistry pipeline can expand into format-specific multimodal
    messages via ``expand_content_blocks()``.

    If the base64-encoded image exceeds ``max_size``, Pillow is used to
    downsample it. If Pillow is not installed, the original image is
    returned with a warning.

    Args:
        path: Path to image file (.png, .jpg, .jpeg, .gif, .webp).
        max_size: Maximum base64-encoded size in bytes. Defaults to 5 MB.

    Returns:
        A list of two content blocks::

            [
                {"type": "text", "text": "[Image: name (mime, size)]"},
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": "iVBOR..."
                }}
            ]

    Raises:
        FileNotFoundError: If the file does not exist.
        ValueError: If the file extension is not supported.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"File not found: {path}")

    ext = p.suffix.lower()
    if ext not in _SUPPORTED_IMAGE_EXTENSIONS:
        raise ValueError(
            f"Unsupported image format: '{ext}'. "
            f"Supported: {', '.join(sorted(_SUPPORTED_IMAGE_EXTENSIONS))}"
        )

    media_type = _EXTENSION_TO_MIME[ext]
    img_data = p.read_bytes()
    raw_size = len(img_data)

    b64_data = base64.b64encode(img_data).decode("ascii")

    if len(b64_data) > max_size:
        img_data, media_type = FileReader._downsample_image(
            img_data, media_type, max_size
        )
        b64_data = base64.b64encode(img_data).decode("ascii")

    return [
        {
            "type": "text",
            "text": f"[Image: {p.name} ({media_type}, {raw_size} bytes)]",
        },
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": media_type,
                "data": b64_data,
            },
        },
    ]
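On the consuming side, the image block can be turned back into raw bytes with the standard base64 module. The sketch below assumes only the block shape documented above; the helper name and the hand-built sample blocks (which do not contain a real image) are illustrative.

```python
import base64


def image_bytes_from_blocks(blocks: list[dict]) -> tuple[str, bytes]:
    """Return (media_type, decoded bytes) from the first image block."""
    for block in blocks:
        if block.get("type") == "image":
            source = block["source"]
            return source["media_type"], base64.b64decode(source["data"])
    raise ValueError("no image block found")


# Hand-built blocks in the documented shape (placeholder bytes, not a real image).
sample = [
    {"type": "text", "text": "[Image: dot.png (image/png, 4 bytes)]"},
    {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": base64.b64encode(b"\x89PNG").decode("ascii"),
        },
    },
]
media_type, payload = image_bytes_from_blocks(sample)
```

Note that the byte count in the text block comes from the original file, so after downsampling it can differ from `len(payload)`.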

read_notebook staticmethod

read_notebook(path: str) -> str

Read a Jupyter notebook and return formatted cell contents.

Uses stdlib json only — no external dependencies.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to .ipynb file. | required |

Returns:

| Type | Description |
| --- | --- |
| str | All cells with type markers (code/markdown) and outputs. |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the file does not exist. |
| ValueError | If the file is not a valid notebook. |

Source code in toolregistry_hub/file_reader.py
@staticmethod
def read_notebook(path: str) -> str:
    """Read a Jupyter notebook and return formatted cell contents.

    Uses stdlib ``json`` only — no external dependencies.

    Args:
        path: Path to ``.ipynb`` file.

    Returns:
        All cells with type markers (code/markdown) and outputs.

    Raises:
        FileNotFoundError: If the file does not exist.
        ValueError: If the file is not a valid notebook.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"File not found: {path}")

    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid notebook JSON: {e}") from e

    if "cells" not in data:
        raise ValueError(f"Not a valid Jupyter notebook (no 'cells' key): {path}")

    # Detect language from kernel info
    lang = "python"
    kernel_info = data.get("metadata", {}).get("kernelspec", {})
    if kernel_info.get("language"):
        lang = kernel_info["language"]

    lines: list[str] = []
    lines.append(f"[Notebook: {path}]")

    for i, cell in enumerate(data["cells"]):
        cell_type = cell.get("cell_type", "unknown")
        source = "".join(cell.get("source", []))

        lines.append(f"\n--- Cell {i + 1} [{cell_type}] ---")

        if cell_type == "code":
            lines.append(f"```{lang}")
            lines.append(source)
            lines.append("```")

            # Process outputs
            for output in cell.get("outputs", []):
                output_text = FileReader._extract_notebook_output(output)
                if output_text:
                    lines.append(f"Output:\n{output_text}")
        else:
            lines.append(source)

    return "\n".join(lines)

read_pdf staticmethod

read_pdf(path: str, pages: str | None = None) -> str

Read a PDF file and extract text.

Uses pypdf (zero-dependency, BSD) by default. If pdfplumber is installed, uses it for better text quality.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to PDF file. | required |
| pages | str \| None | Page range string (e.g. "1-5", "3", "10-20"). Max 20 pages per call. Defaults to all pages (up to cap). | None |

Returns:

| Type | Description |
| --- | --- |
| str | Extracted text content with page markers. |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If the file does not exist. |
| ImportError | If neither pypdf nor pdfplumber is installed. |
| ValueError | If page range is invalid. |

Source code in toolregistry_hub/file_reader.py
@staticmethod
def read_pdf(
    path: str,
    pages: str | None = None,
) -> str:
    """Read a PDF file and extract text.

    Uses ``pypdf`` (zero-dependency, BSD) by default. If ``pdfplumber``
    is installed, uses it for better text quality.

    Args:
        path: Path to PDF file.
        pages: Page range string (e.g. ``"1-5"``, ``"3"``, ``"10-20"``).
            Max 20 pages per call. Defaults to all pages (up to cap).

    Returns:
        Extracted text content with page markers.

    Raises:
        FileNotFoundError: If the file does not exist.
        ImportError: If neither ``pypdf`` nor ``pdfplumber`` is installed.
        ValueError: If page range is invalid.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"File not found: {path}")

    start_page, end_page = FileReader._parse_page_range(pages)

    # Try pdfplumber first (better quality), fall back to pypdf
    try:
        return FileReader._read_pdf_pdfplumber(p, start_page, end_page)
    except ImportError:
        pass

    try:
        return FileReader._read_pdf_pypdf(p, start_page, end_page)
    except ImportError:
        raise ImportError(
            "PDF reading requires 'pypdf' or 'pdfplumber'. "
            "Install with: pip install toolregistry-hub[reader]"
        ) from None
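The method tries pdfplumber first and falls back to pypdf, so which backend runs depends on what is installed. A caller who wants to know in advance could probe for the packages without importing them. This is a minimal sketch using only the standard library; `available_pdf_backend` is a hypothetical helper.

```python
import importlib.util
from typing import Optional, Sequence


def available_pdf_backend(
    preferred: Sequence[str] = ("pdfplumber", "pypdf"),
) -> Optional[str]:
    """Return the first importable module name in preference order, or None."""
    for name in preferred:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend = available_pdf_backend()
if backend is None:
    print("No PDF backend found; try: pip install toolregistry-hub[reader]")
else:
    print(f"PDF text will be extracted with {backend}")
```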