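File Reader Tool
The FileReader tool reads files in multiple formats, with line-number display, pagination, and safety limits. It supports plain-text files, Jupyter Notebooks, PDFs, and images.
Class Overview
FileReader provides four read methods, one per file format:
- read(): text files, with line numbers and pagination
- read_notebook(): Jupyter Notebooks (.ipynb)
- read_pdf(): PDF files (requires an optional dependency)
- read_image(): image files, returned as multimodal content blocks (requires an optional dependency)
Usage
Reading Text Files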
from toolregistry_hub import FileReader
# Read a file with line numbers
content = FileReader.read("/path/to/file.py")
print(content)
# [/path/to/file.py] lines 1-50 of 200 (use offset=51 to read more)
# 1 | import os
# 2 | import sys
# 3 |
# 4 | def main():
# ...
# Paginated read
content = FileReader.read("/path/to/file.py", offset=50, limit=25)
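The paging behaviour described here can be approximated with a few lines of standard-library Python. This is a minimal sketch, not FileReader's implementation: `numbered_page` is a hypothetical helper, and it assumes the `N | text` line format and the header wording from the sample output in this section.

```python
def numbered_page(lines, offset=1, limit=2000, label="<memory>"):
    """Return a line-numbered page of `lines`, starting at 1-based `offset`."""
    if offset < 1:
        raise ValueError("offset must be >= 1")
    total = len(lines)
    page = lines[offset - 1 : offset - 1 + limit]
    last = offset + len(page) - 1
    # Header format is an assumption modelled on the sample output.
    header = f"[{label}] lines {offset}-{last} of {total}"
    if last < total:
        header += f" (use offset={last + 1} to read more)"
    body = [f"{n} | {text}" for n, text in enumerate(page, start=offset)]
    return "\n".join([header, *body])


sample = [f"line {i}" for i in range(1, 11)]
first_page = numbered_page(sample, offset=1, limit=4)
second_page = numbered_page(sample, offset=5, limit=4)
print(first_page)
```

Passing the `offset` value suggested in one page's header as the `offset` of the next call walks through the file without overlap.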
Reading Jupyter Notebooks
# Read notebook cells, showing type markers and outputs
content = FileReader.read_notebook("analysis.ipynb")
# [Notebook: analysis.ipynb]
#
# --- Cell 1 [markdown] ---
# # Data Analysis
#
# --- Cell 2 [code] ---
# ```python
# import pandas as pd
# df = pd.read_csv("data.csv")
# ```
# Output:
# ...
No external dependencies are required; it uses the standard library json module.
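Because an .ipynb file is plain JSON, the cell listing can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions, not the library's code: `format_notebook` is a hypothetical helper, it copies the `--- Cell N [type] ---` marker style from the sample output, and it leaves out code fences and cell outputs for brevity.

```python
import json
import os
import tempfile


def format_notebook(path):
    """Render notebook cells as text with '--- Cell N [type] ---' markers."""
    with open(path, encoding="utf-8") as fh:
        nb = json.load(fh)
    if "cells" not in nb:
        raise ValueError("not a valid notebook: missing 'cells'")
    parts = [f"[Notebook: {os.path.basename(path)}]"]
    for index, cell in enumerate(nb["cells"], start=1):
        source = cell.get("source", "")
        # nbformat stores source as either a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        parts.append("")
        parts.append(f"--- Cell {index} [{cell.get('cell_type', 'unknown')}] ---")
        parts.append(source)
    return "\n".join(parts)


# Build a tiny two-cell notebook to exercise the helper.
demo = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Data Analysis"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["import pandas as pd\n", "df = pd.read_csv(\"data.csv\")"]},
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    nb_path = os.path.join(tmp, "analysis.ipynb")
    with open(nb_path, "w", encoding="utf-8") as fh:
        json.dump(demo, fh)
    rendered = format_notebook(nb_path)
print(rendered)
```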
Reading PDFs
# Read all pages (up to 20 pages)
content = FileReader.read_pdf("document.pdf")
# Read a specific page range
content = FileReader.read_pdf("document.pdf", pages="5-10")
# Read a single page
content = FileReader.read_pdf("document.pdf", pages="3")
Requires pypdf or pdfplumber to be installed.
If both are installed, pdfplumber is preferred for better text quality.
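The `pages` argument accepts either a range such as "5-10" or a single page such as "3". A minimal sketch of how such a string could be parsed is shown below. `parse_pages` is a hypothetical helper, not part of FileReader, and truncating to the 20-page cap (rather than raising an error) is an assumption.

```python
MAX_PAGES = 20  # per-call page cap described in this document


def parse_pages(pages, total_pages, max_pages=MAX_PAGES):
    """Turn '5-10' or '3' into a list of 1-based page numbers."""
    if pages is None:
        first, last = 1, total_pages
    else:
        start_text, sep, end_text = pages.partition("-")
        try:
            first = int(start_text)
            last = int(end_text) if sep else first
        except ValueError:
            raise ValueError(f"invalid page range: {pages!r}") from None
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {pages!r}")
    # Clamp to the document length, then apply the per-call cap.
    last = min(last, total_pages)
    return list(range(first, last + 1))[:max_pages]


print(parse_pages("5-10", total_pages=30))
print(parse_pages("3", total_pages=30))
print(len(parse_pages(None, total_pages=50)))
```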
Reading Images
# Read an image; returns multimodal content blocks
blocks = FileReader.read_image("screenshot.png")
# [
# {"type": "text", "text": "[Image: screenshot.png (image/png, 45321 bytes)]"},
# {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "iVBOR..."}}
# ]
# Custom maximum size (default: 5 MB base64)
blocks = FileReader.read_image("large_photo.jpg", max_size=1_000_000)
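Supported formats: .png, .jpg, .jpeg, .gif, .webp.
If the base64-encoded image exceeds max_size, it is compressed with Pillow using adaptive quality. This requires Pillow to be installed.
If Pillow is not installed, the original image is returned and a warning is logged.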
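The two-block structure returned for an image can be built with the standard library. The sketch below is illustrative only: `image_blocks` is a hypothetical helper, it skips the Pillow compression step entirely, and reporting the raw file size in the text block is an assumption about the header format.

```python
import base64
import mimetypes
import os
import tempfile


def image_blocks(path):
    """Return [text block, image block] for an image file."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"unsupported image type: {path}")
    with open(path, "rb") as fh:
        raw = fh.read()
    data = base64.b64encode(raw).decode("ascii")
    name = os.path.basename(path)
    return [
        {"type": "text", "text": f"[Image: {name} ({media_type}, {len(raw)} bytes)]"},
        {"type": "image",
         "source": {"type": "base64", "media_type": media_type, "data": data}},
    ]


# Write only the 8-byte PNG signature so the example needs no real image.
with tempfile.TemporaryDirectory() as tmp:
    png_path = os.path.join(tmp, "screenshot.png")
    with open(png_path, "wb") as fh:
        fh.write(b"\x89PNG\r\n\x1a\n")
    blocks = image_blocks(png_path)
print(blocks[0]["text"])
```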
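Parameters
read()
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | required | Path to the text file |
| offset | int | 1 | Starting line number (1-based) |
| limit | int \| None | None | Maximum number of lines to read (defaults to 2000) |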
read_notebook()
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | required | Path to the .ipynb file |
read_pdf()
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | required | Path to the PDF file |
| pages | str \| None | None | Page range (e.g. "1-5", "3") |
read_image()
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | required | Path to the image file |
| max_size | int | 5242880 | Maximum base64-encoded size in bytes (5 MB) |
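Safety Limits
- Text files: 10 MB maximum
- Text lines: 2000 lines per read by default
- PDF pages: at most 20 pages per call
- Notebook output: at most 10 KB of output per cell
- Images: 5 MB maximum after base64 encoding (compressed automatically when exceeded)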
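These limits can be checked on the caller's side before a read is attempted. The following is a rough sketch under assumptions: the constant and function names are hypothetical, and "10 MB" is taken to mean 10 × 1024 × 1024 bytes, by analogy with the 5242880-byte image default.

```python
import os
import tempfile

MAX_TEXT_FILE_BYTES = 10 * 1024 * 1024  # assumed byte value of the 10 MB cap
DEFAULT_LINE_LIMIT = 2000               # default lines per read() call


def within_text_limit(path, max_bytes=MAX_TEXT_FILE_BYTES):
    """Return True if the file is small enough to pass the text-file cap."""
    return os.path.getsize(path) <= max_bytes


def calls_needed(total_lines, limit=DEFAULT_LINE_LIMIT):
    """Number of paginated reads needed to cover `total_lines` lines."""
    return -(-total_lines // limit)  # ceiling division


with tempfile.TemporaryDirectory() as tmp:
    small_path = os.path.join(tmp, "small.txt")
    with open(small_path, "w", encoding="utf-8") as fh:
        fh.write("hello\n")
    small_ok = within_text_limit(small_path)
    tiny_cap_ok = within_text_limit(small_path, max_bytes=3)

print(small_ok, calls_needed(4500))
```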
MCP Server Endpoints
POST /tools/reader/read
POST /tools/reader/read_pdf
POST /tools/reader/read_notebook
POST /tools/reader/read_image
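As a rough illustration of calling one of these endpoints, the sketch below builds a POST request for /tools/reader/read without sending it. Everything beyond the endpoint path is an assumption: the host and port are placeholders, and the request body is assumed to be JSON keyed by the method's parameter names, which this document does not confirm.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical host and port


def build_read_request(path, offset=1, limit=None):
    """Build (but do not send) a POST request for the read endpoint."""
    # Assumed body schema: one JSON key per read() parameter.
    payload = {"path": path, "offset": offset, "limit": limit}
    return Request(
        f"{BASE_URL}/tools/reader/read",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_read_request("/path/to/file.py", offset=51, limit=50)
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` once the real server address and body schema are known.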
API Reference
toolregistry_hub.file_reader.FileReader
Multi-format file reader with line numbers and pagination.
read (staticmethod)
Read a text file with line numbers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to file. | required |
| offset | int | Starting line number (1-indexed). Defaults to 1. | 1 |
| limit | int \| None | Maximum number of lines to read. Defaults to 2000. | None |
Returns:
| Type | Description |
|---|---|
| str | File content with line numbers. Includes a metadata header with file path, total lines, and the range actually read. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| IsADirectoryError | If the path is a directory. |
| ValueError | If offset is less than 1. |
Source code in toolregistry_hub/file_reader.py
read_image (staticmethod)
Read an image file and return as multimodal content blocks.
Returns a list of content blocks (TextBlock + ImageBlock) that the
toolregistry pipeline can expand into format-specific multimodal
messages via expand_content_blocks().
If the base64-encoded image exceeds max_size, Pillow is used to
downsample it. If Pillow is not installed, the original image is
returned with a warning.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to image file (.png, .jpg, .jpeg, .gif, .webp). | required |
| max_size | int | Maximum base64-encoded size in bytes. Defaults to 5 MB. | _MAX_IMAGE_SIZE_BYTES |
Returns:
| Type | Description |
|---|---|
| list | A list of two content blocks: [{"type": "text", "text": "[Image: name (mime, size)]"}, {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "iVBOR..."}}] |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| ValueError | If the file extension is not supported. |
read_notebook (staticmethod)
Read a Jupyter notebook and return formatted cell contents.
Uses stdlib json only — no external dependencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to .ipynb file. | required |
Returns:
| Type | Description |
|---|---|
| str | All cells with type markers (code/markdown) and outputs. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| ValueError | If the file is not a valid notebook. |
read_pdf (staticmethod)
Read a PDF file and extract text.
Uses pypdf (zero-dependency, BSD) by default. If pdfplumber
is installed, uses it for better text quality.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to PDF file. | required |
| pages | str \| None | Page range string (e.g. "1-5", "3"). | None |
Returns:
| Type | Description |
|---|---|
| str | Extracted text content with page markers. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| ImportError | If neither pypdf nor pdfplumber is installed. |
| ValueError | If page range is invalid. |