This page provides a detailed description of the configuration files for AgentJet.


## Overview

AgentJet uses YAML-format configuration files to set up data, algorithms, rewards, logging, and other runtime behaviors.

## Default Configuration

The default config is located at `ajet/default_config/ajet_default.yaml`.

At a high level, a typical config contains a single root section `ajet`, which is divided into several logical parts:

- **Basic Metadata**: project name, experiment name, experiment directory, and backbone selection
  - `project_name`, `experiment_name`, `experiment_dir`
  - `backbone`: selects the training backend (`debug`, `trinity`, or `verl`)
- **Data & Reward**: how to load data and evaluate agents
  - `task_reader`: loads training/validation samples
  - `task_judge`: evaluates agents and computes rewards
  - `data`: prompt/response lengths and batch sizes
- **Model & Rollout**: model configuration and agent interaction
  - `model`: base model to train
  - `rollout`: agent-environment interaction settings
  - `context_tracker`: conversation/history management
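
Because the whole file is plain YAML, it is easy to inspect programmatically. A minimal sketch using PyYAML (the path is the default location mentioned above; adjust it to your checkout):

```python
import yaml

# Load the default AgentJet config and list its top-level sections.
with open("ajet/default_config/ajet_default.yaml") as f:
    config = yaml.safe_load(f)

ajet_cfg = config["ajet"]
# Expect keys such as: backbone, data, model, rollout, task_reader, ...
print(sorted(ajet_cfg))
```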

## Model Configuration

### Specifying the Model

`config.yaml`

```yaml
ajet:
  model:
    path: path/to/model
```

| Source Type | Example |
| --- | --- |
| Local file | `/mnt/data/models/Qwen2.5-14B-Instruct` |
| HuggingFace repo | `Qwen/Qwen2.5-14B-Instruct` (auto-downloaded) |
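
The "auto-downloaded" behavior is handled by AgentJet itself. Purely for intuition, resolving a Hub repo ID to a local snapshot looks roughly like this sketch using `huggingface_hub` (an illustration, not AgentJet's internal code):

```python
import os
from huggingface_hub import snapshot_download

def resolve_model_path(path: str) -> str:
    """Return a local directory for `path`, downloading it from the
    Hugging Face Hub when it is a repo ID rather than a local dir.
    Illustrative only; AgentJet performs its own resolution."""
    if os.path.isdir(path):
        return path
    return snapshot_download(repo_id=path)

print(resolve_model_path("Qwen/Qwen2.5-14B-Instruct"))
```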

### Environment Variables for LLM-as-Judge

If using LLM-as-a-Judge, configure these environment variables:

```bash
# DashScope API key for remote LLM calling
export DASHSCOPE_API_KEY='sk-xxxxxx|sk-yyyyyy'
export DASHSCOPE_API_KEY_BACKUP='sk-zzzzzz'
```
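
The pipe-separated value suggests that multiple keys can be supplied at once; the exact rotation/failover semantics are AgentJet's. As a hedged illustration of how such a variable can be read:

```python
import os

# Illustrative only: split a pipe-separated key list and fall back to the
# backup variable. AgentJet's actual key-rotation logic may differ.
primary = os.environ.get("DASHSCOPE_API_KEY", "")
backup = os.environ.get("DASHSCOPE_API_KEY_BACKUP", "")
keys = [k for k in primary.split("|") if k] or ([backup] if backup else [])
print(f"{len(keys)} API key(s) available")
```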

## Data Configuration

### Task Reader

`task_reader` defines how to read training and validation data. Three source types are supported:

**Environment service:**

```yaml
ajet:
  task_reader:
    type: env_service
    env_service:
      env_type: "appworld"
      env_url: "http://127.0.0.1:8080"
      env_action_preference: code
      training_split: train
      validation_split: dev
```

**JSONL dataset file:**

```yaml
ajet:
  task_reader:
    type: jsonl_dataset_file
    jsonl_dataset_file:
      training:
        file_path: "data/train.jsonl"
      validation:
        file_path: "data/val.jsonl"
```

**HuggingFace dataset repo:**

```yaml
ajet:
  task_reader:
    type: huggingface_dat_repo
    huggingface_dat_repo:
      dataset_path: "gsm8k"
      training_split: "train"
      validation_split: "validation"
```

### Task Judge

`task_judge` evaluates agent performance and calculates rewards.

`config.yaml`

```yaml
ajet:
  task_judge:
    judge_type: customized_protocol  # or 'rubrics_auto_grader'
    judge_protocol: ajet.task_judge.env_service_as_judge->EnvServiceJudge
    alien_llm_model: qwen3-235b-a22b-instruct-2507
    alien_llm_response_length: 512
```

| Option | Description |
| --- | --- |
| `customized_protocol` | Use a custom Python class for scoring |
| `rubrics_auto_grader` | Use LLM-based automatic grading |
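
With `customized_protocol`, `judge_protocol` points at a Python class using the `module.path->ClassName` notation shown above. The interface such a class must implement is defined by AgentJet and not reproduced on this page; the sketch below only illustrates the general shape, with a made-up method name and signature:

```python
# Purely illustrative: the base class/method AgentJet requires is not
# documented here. `judge(...)` and its signature are hypothetical.
class MyTaskJudge:
    def judge(self, task: dict, trajectory: list[dict]) -> float:
        """Return a scalar reward for one rollout (hypothetical API)."""
        expected = task.get("answer", "")
        final = trajectory[-1].get("content", "") if trajectory else ""
        return 1.0 if expected and expected in final else 0.0
```

It would then be referenced from the config as `my_pkg.my_judges->MyTaskJudge` (a hypothetical module path), mirroring the `->` notation above.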

## Training Configuration

### Backend Selection

AgentJet supports three training backends:

| Backend | Description |
| --- | --- |
| `trinity` | Default. Flexible and scalable framework for RL fine-tuning |
| `verl` | Volcano Engine reinforcement learning for LLMs |
| `debug` | Allows breakpoint debugging in IDEs |

`config.yaml`

```yaml
ajet:
  backbone: trinity  # debug, trinity, or verl
```

### Rollout Configuration

Controls agent behavior during environment interaction:

`config.yaml`

```yaml
ajet:
  rollout:
    user_workflow: tutorial.example_appworld.appworld->ExampleAgentScopeWorkflow
    max_env_worker: 128
    temperature: 0.9
    top_p: 1.0
    name: vllm
    n_vllm_engine: 2
    num_repeat: 4
```

| Parameter | Description |
| --- | --- |
| `user_workflow` | Path to the workflow implementation class |
| `temperature` / `top_p` | Sampling parameters |
| `name` | Inference engine (e.g., `vllm`) |
| `n_vllm_engine` | Number of vLLM engines (Trinity only) |
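
For intuition about the sampling fields, the same values expressed as vLLM sampling parameters would look like the sketch below; this is an analogy for readers familiar with vLLM, not AgentJet's internal plumbing:

```python
from vllm import SamplingParams

# Mirrors the rollout settings above: temperature=0.9, top_p=1.0, and
# num_repeat=4 rollouts per task (vLLM's `n`). Illustrative only.
params = SamplingParams(temperature=0.9, top_p=1.0, n=4)
```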

### Common Training Parameters

`config.yaml`

```yaml
ajet:
  trainer_common:
    total_epochs: 50
    save_freq: 20
    test_freq: 20
    val_before_train: False
    val_pass_n: 4
    nnodes: 1
    n_gpus_per_node: 8
    mini_batch_num: 1
    fsdp_config:
      param_offload: True
      optimizer_offload: True
```

| Parameter | Description |
| --- | --- |
| `total_epochs` | Total training epochs |
| `save_freq` | Checkpoint save frequency (steps) |
| `test_freq` | Validation frequency (steps) |
| `nnodes` / `n_gpus_per_node` | Distributed training setup |
| `fsdp_config` | FSDP memory optimization |

### Optimization Algorithms

`config.yaml`

```yaml
ajet:
  trainer_common:
    algorithm:
      adv_estimator: grpo
      use_kl_in_reward: False
    optim:
      lr: 1e-6
    use_kl_loss: True
    kl_loss_coef: 0.002
    kl_loss_type: low_var_kl
```

| Parameter | Description |
| --- | --- |
| `adv_estimator` | Advantage estimator (e.g., `grpo`) |
| `lr` | Learning rate |
| `use_kl_loss` | Include KL divergence in loss |
| `kl_loss_coef` | KL loss coefficient |
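
For reference, `adv_estimator: grpo` refers to Group Relative Policy Optimization, which in its standard formulation normalizes rewards within each group of `num_repeat` rollouts of the same task (consult the backend's docs for the exact variant used):

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}
$$

where $G$ is the group size (here `num_repeat: 4`) and $r_i$ is the reward of the $i$-th rollout.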

## Debug Mode

When `backbone: debug`, additional settings are available:

`config.yaml`

```yaml
ajet:
  debug:
    debug_max_parallel: 16
    debug_first_n_tasks: 2
    debug_vllm_port: 18000
    debug_vllm_seed: 12345
    debug_tensor_parallel_size: 4
```

### Debug Mode Use Cases

- **Limiting tasks**: quickly verify the pipeline on a few tasks
- **Fixing randomness**: `debug_vllm_seed` helps reproduce issues
- **Reduced parallelism**: easier to debug with smaller concurrency
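
If the debug backend exposes the vLLM server on `debug_vllm_port` as an OpenAI-compatible endpoint (an assumption based on the field name and vLLM's usual serving mode; verify against AgentJet's debug docs), you can poke it with any OpenAI client:

```python
from openai import OpenAI

# Assumption: debug mode serves an OpenAI-compatible vLLM endpoint on
# debug_vllm_port (18000). Verify before relying on this.
client = OpenAI(base_url="http://127.0.0.1:18000/v1", api_key="EMPTY")
print([m.id for m in client.models.list().data])
```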

## Logging & Monitoring

### Logger Selection

`config.yaml`

```yaml
ajet:
  trainer_common:
    logger: swanlab  # console, wandb, or swanlab
```

| Logger | Description |
| --- | --- |
| `console` | Standard output for quick progress checking |
| `wandb` | Weights & Biases experiment tracking |
| `swanlab` | SwanLab logging |

### Output Structure

All experiment outputs are saved in `./launcher_record/{experiment_name}`:

| Directory | Contents |
| --- | --- |
| Logs | Logs and error messages |
| Metrics | Training metrics (depends on logger) |
| Checkpoint | Model checkpoints |

## Full Configuration Example

**Complete Configuration Template** (`config.yaml`):

```yaml
ajet:
  project_name: "ajet_default_project"
  experiment_name: "read_yaml_name"
  experiment_dir: "auto"
  backbone: debug

  model:
    path: /path/to/model/Qwen2.5-14B-Instruct

  data:
    max_prompt_length: 3000
    max_response_length: 15000
    train_batch_size: 32

  rollout:
    user_workflow: tutorial.example_appworld.appworld->ExampleAgentScopeWorkflow
    force_disable_toolcalls: False
    max_env_worker: 128
    gamma: 1.0
    compute_madness_checklist:
      - "nonsense"
    agent_madness_termination: True
    agent_madness_reward: -1.0
    max_response_length_in_one_turn: 4096
    max_model_len: 18000
    multi_turn:
      max_sample_per_task: 30
      max_steps: 30
      expected_steps: 1
    tensor_model_parallel_size: 1
    n_vllm_engine: 2
    max_num_seqs: 10
    name: vllm
    num_repeat: 4
    temperature: 0.9
    top_p: 1.0
    val_kwargs:
      temperature: 0.0
      top_k: -1
      top_p: 1.0
      do_sample: False
      num_repeat: 1

  task_reader:
    type: env_service
    env_service:
      env_type: "appworld"
      env_url: "http://127.0.0.1:8080"
      env_action_preference: code
      training_split: train
      validation_split: dev

  task_judge:
    judge_type: customized_protocol
    judge_protocol: ajet.task_judge.env_service_as_judge->EnvServiceJudge
    alien_llm_model: qwen3-235b-a22b-instruct-2507
    alien_llm_response_length: 512

  debug:
    debug_max_parallel: 16
    debug_first_n_tasks: 2
    debug_vllm_port: 18000
    debug_vllm_seed: 12345
    debug_tensor_parallel_size: 4

  trainer_common:
    val_before_train: False
    val_pass_n: 4
    save_freq: 20
    test_freq: 20
    total_epochs: 50
    nnodes: 1
    n_gpus_per_node: 8
    logger: swanlab
    algorithm:
      adv_estimator: grpo
      use_kl_in_reward: False
    mini_batch_num: 1
    fsdp_config:
      param_offload: True
      optimizer_offload: True
    optim:
      lr: 1e-6
    use_kl_loss: True
    kl_loss_coef: 0.002
    kl_loss_type: low_var_kl
    ulysses_sequence_parallel_size: 1
    checkpoint_base_dir: ./saved_checkpoints

  context_tracker:
    context_tracker_type: "linear"
    alien_llm_model: qwen3-235b-a22b-instruct-2507
    alien_llm_response_length: 512
```

## Next Steps