AgentJet loads training tasks from various data sources through Task Readers. This page covers the Task schema definition and the built-in Task Readers for common scenarios.
## Overview

In agent training, all training data must be represented as tasks following a unified schema.

### Key Concepts

- Unified Schema: All tasks conform to the `Task` structure regardless of source
- Multiple Sources: Load from local files, HuggingFace datasets, interactive environments, or auto-generate new tasks
- Automatic Routing: The framework selects the appropriate reader based on `ajet.task_reader.type`
## Task Schema

All training tasks must be defined according to the following structure:

```python
from typing import List

from pydantic import BaseModel, Field

class Task(BaseModel):
    main_query: str = Field(default="")
    init_messages: List[dict] = Field(default=[])
    task_id: str = Field(default="")
    env_type: str = Field(default="")
    metadata: dict = Field(default_factory=dict)
```
### Field Descriptions

| Field | Type | Description |
|---|---|---|
| `main_query` | `str` | The main instruction or question for the agent to solve |
| `init_messages` | `List[dict]` | Initial conversation messages (e.g., system prompts). Each must have `role` and `content` fields |
| `task_id` | `str` | Unique identifier for the task |
| `env_type` | `str` | Environment type (e.g., `"math"`, `"appworld"`) |
| `metadata` | `dict` | Additional context information (e.g., reference answers for reward calculation) |
### Example Task

```json
{
  "main_query": "What is 15 * 23?",
  "init_messages": [
    {
      "role": "system",
      "content": "You are a helpful math assistant."
    }
  ],
  "task_id": "math_001",
  "env_type": "math",
  "metadata": {
    "answer": "345",
    "difficulty": "easy"
  }
}
```
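The JSON above maps one-to-one onto the schema's fields. As a minimal sketch of deserializing such a record — using a stdlib `dataclass` stand-in for the Pydantic model, so the example is self-contained:

```python
import json
from dataclasses import dataclass, field
from typing import List

# Plain-dataclass stand-in for the Pydantic Task model (illustration only).
@dataclass
class Task:
    main_query: str = ""
    init_messages: List[dict] = field(default_factory=list)
    task_id: str = ""
    env_type: str = ""
    metadata: dict = field(default_factory=dict)

raw = (
    '{"main_query": "What is 15 * 23?", '
    '"init_messages": [{"role": "system", "content": "You are a helpful math assistant."}], '
    '"task_id": "math_001", "env_type": "math", '
    '"metadata": {"answer": "345", "difficulty": "easy"}}'
)

# Unpack the parsed JSON object directly into the schema's fields.
task = Task(**json.loads(raw))
```

Because the schema's field names match the JSON keys exactly, a parsed record can be splatted straight into the constructor.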
### Best Practices

- Use `metadata` to store information needed for reward computation (e.g., reference answers, scoring rubrics)
- Keep `main_query` clear and concise
- Use `init_messages` for system prompts or few-shot examples
## Built-in Task Readers

AgentJet provides multiple built-in Task Readers for different scenarios. The framework automatically routes to the correct reader based on `ajet.task_reader.type`.
### Quick Selection Guide

| Reader | When to use |
|---|---|
| JSONL File | You have prepared task data in JSONL format locally |
| HuggingFace | Load tasks from HuggingFace Hub (e.g., GSM8K, MATH) |
| EnvService | Tasks come from a running environment service |
### 1. JSONL File Reader

When to use: You have prepared training tasks in JSONL format locally.

Each line should be a JSON object conforming to the Task schema:
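For example, a two-task file might look like this (both records are illustrative):

```jsonl
{"main_query": "What is 15 * 23?", "init_messages": [{"role": "system", "content": "You are a helpful math assistant."}], "task_id": "math_001", "env_type": "math", "metadata": {"answer": "345"}}
{"main_query": "What is 7 + 8?", "init_messages": [], "task_id": "math_002", "env_type": "math", "metadata": {"answer": "15"}}
```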
#### How it works

- Reads tasks line-by-line from the specified JSONL files
- Automatically validates against the Task schema
- Supports separate training and validation splits
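A minimal sketch of the line-by-line reading and validation described above. The helper and its checks are illustrative, not the framework's actual API:

```python
import json

# Field names from the Task schema; messages must carry role and content.
TASK_FIELDS = {"main_query", "init_messages", "task_id", "env_type", "metadata"}
REQUIRED_MESSAGE_KEYS = {"role", "content"}

def read_jsonl_tasks(path):
    """Read one task per line and perform light schema validation (sketch)."""
    tasks = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            unknown = set(record) - TASK_FIELDS
            if unknown:
                raise ValueError(f"line {lineno}: unknown fields {unknown}")
            for msg in record.get("init_messages", []):
                if not REQUIRED_MESSAGE_KEYS <= set(msg):
                    raise ValueError(f"line {lineno}: messages need role and content")
            tasks.append(record)
    return tasks
```

Malformed lines fail fast with the offending line number, which keeps data errors easy to locate in large task files.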
### 2. HuggingFace Dataset Reader

When to use: Load tasks from HuggingFace Hub datasets (e.g., GSM8K, MATH).

```yaml
ajet:
  task_reader:
    type: huggingface_dat_repo
    huggingface_dat_repo:
      dataset_path: "gsm8k"      # HF dataset repo name
      dataset_name: "main"       # Optional: dataset subset name
      training_split: "train"    # Training split name
      validation_split: "test"   # Validation split name
```
#### How it works

- Downloads the dataset from HuggingFace Hub using the `datasets` library
- Automatically maps dataset fields to the Task schema
- Caches downloaded data locally for faster subsequent runs
### 3. EnvService Reader

When to use: Tasks are provided by an interactive environment service (e.g., AppWorld, RL gym environments).

```yaml
ajet:
  task_reader:
    type: env_service
    env_service:
      env_type: "appworld"              # Environment type
      env_url: "http://127.0.0.1:8080"  # Service URL
      env_action_preference: code       # Action format: code/text/box
      training_split: train
      validation_split: dev
```
#### How it works

- Connects to a running environment service via HTTP
- Pulls task instances from the environment
- Supports dynamic task generation from interactive environments
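A sketch of the pull step, assuming a hypothetical `/tasks` endpoint that returns a JSON list of raw task payloads; the endpoint path, query parameter, and payload field names are all illustrative assumptions, not AgentJet's actual protocol:

```python
import json
import urllib.request

def fetch_env_tasks(env_url: str, split: str, opener=urllib.request.urlopen):
    """Pull task payloads from a (hypothetical) environment-service endpoint
    and wrap them in the Task schema. The injectable opener makes the
    function testable without a live service."""
    with opener(f"{env_url}/tasks?split={split}") as resp:
        payloads = json.loads(resp.read())
    return [
        {
            "main_query": p.get("query", ""),
            "init_messages": p.get("init_messages", []),
            "task_id": p.get("id", ""),
            "env_type": p.get("env_type", "appworld"),
            "metadata": p.get("metadata", {}),
        }
        for p in payloads
    ]
```

Because the environment owns task generation, the reader only normalizes whatever the service returns into the Task schema; new tasks can appear between training epochs without any local data changes.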
#### Use Cases

- Training agents in simulated environments (e.g., FrozenLake, game environments)
- Complex interactive scenarios where tasks are generated dynamically