In this document, we demonstrate how to implement and train, from scratch, an agent that uses Python to perform calculations and solve GSM8K math problems.
- Define agent workflow: Create your agent using AgentScope, LangChain, the OpenAI SDK, or plain HTTP requests, and wrap it in a `Workflow` class.
- Define reward: Configure how the agent's outputs are evaluated and scored.
- Prepare dataset: Set up the dataset and configure the task reader.
- Debug (optional): Test your workflow in debug mode before full training.
- Start training: Launch the training process and track progress.
Step 1: ✨Define Agent Workflow + Reward
First of all, create a directory for this training project:
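For example (the directory name below is only an assumption, inferred from the module path `tutorial.example_math_agent` used in the configuration later in this document):

```bash
mkdir -p tutorial/example_math_agent
cd tutorial/example_math_agent
```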
Next, define your workflow (or convert an existing one). Here we use AgentScope to implement the agent. Two versions are shown below, so you can compare the code before and after conversion; if you prefer LangChain or the OpenAI SDK, please refer to this article. First, the converted, training-ready version:
import re

# AgentScope components (AgentScope v1-style imports; adjust to your installed version)
from agentscope.agent import ReActAgent
from agentscope.formatter import DashScopeChatFormatter
from agentscope.memory import InMemoryMemory
from agentscope.message import Msg
from agentscope.tool import Toolkit, execute_python_code

# `Workflow`, `WorkflowTask`, `WorkflowOutput`, and `AjetTuner` come from the
# AgentJet package; import them from your installation (module path not shown here).

class MathToolWorkflow(Workflow):  # ✨✨ inherit the `Workflow` class
    name: str = "math_agent_workflow"

    async def execute(self, workflow_task: WorkflowTask, tuner: AjetTuner) -> WorkflowOutput:
        # run agentscope
        # (`system_prompt` and `extract_final_answer` are user-defined helpers, not shown here)
        query = workflow_task.task.main_query
        self.toolkit = Toolkit()
        self.toolkit.register_tool_function(execute_python_code)
        self.agent = ReActAgent(
            name="math_react_agent",
            sys_prompt=system_prompt,
            model=tuner.as_agentscope_model(),  # ✨✨ compared with a normal AgentScope agent, this line is the difference!
            formatter=DashScopeChatFormatter(),
            toolkit=self.toolkit,
            memory=InMemoryMemory(),
            max_iters=2,
        )
        self.agent.set_console_output_enabled(False)
        msg = Msg("user", query, role="user")
        result = await self.agent.reply(msg)
        final_answer = extract_final_answer(result)

        # compute reward: the content of `\boxed{...}` must equal the reference
        # answer, which follows the "####" marker in GSM8K's answer field
        reference_answer = workflow_task.task.metadata["answer"].split("####")[-1].strip()
        match = re.search(r"\\boxed\{([^}]*)\}", final_answer)
        is_success = bool(match) and match.group(1) == reference_answer
        return WorkflowOutput(
            reward=1.0 if is_success else 0.0,
            metadata={"final_answer": final_answer},
        )
For comparison, here is the same workflow before conversion: a plain AgentScope agent class that calls a fixed DashScope API model and takes no tuner.

class MathToolWorkflow(object):
    name: str = "math_agent_workflow"

    async def execute(self, workflow_task: WorkflowTask) -> WorkflowOutput:
        # run agentscope
        query = workflow_task.task.main_query
        self.toolkit = Toolkit()
        self.toolkit.register_tool_function(execute_python_code)
        self.agent = ReActAgent(
            name="math_react_agent",
            sys_prompt=system_prompt,
            model=DashScopeChatModel(model='qwen-max'),  # fixed API model instead of the model being trained
            formatter=DashScopeChatFormatter(),
            toolkit=self.toolkit,
            memory=InMemoryMemory(),
            max_iters=2,
        )
        self.agent.set_console_output_enabled(False)
        msg = Msg("user", query, role="user")
        result = await self.agent.reply(msg)
        final_answer = extract_final_answer(result)

        # compute reward
        reference_answer = workflow_task.task.metadata["answer"].split("####")[-1].strip()
        match = re.search(r"\\boxed\{([^}]*)\}", final_answer)
        is_success = bool(match) and match.group(1) == reference_answer
        return WorkflowOutput(
            reward=1.0 if is_success else 0.0,
            metadata={"final_answer": final_answer},
        )
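As a quick sanity check of the reward logic, the standalone snippet below (with hypothetical answer strings) shows how the `\boxed{...}` content is extracted and compared against the reference answer parsed from the `#### <number>` suffix of a GSM8K answer:

```python
import re

# hypothetical model output and GSM8K-style reference answer
final_answer = r"The total is \boxed{42}."
raw_reference = "21 + 21 = 42\n#### 42"

reference_answer = raw_reference.split("####")[-1].strip()  # "42"
match = re.search(r"\\boxed\{([^}]*)\}", final_answer)      # matches a literal \boxed{...}
is_success = bool(match) and match.group(1) == reference_answer
print(is_success)  # True
```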
Step 2: ✨Prepare Dataset
Data Sources
AgentJet provides multiple ways to read data:
- Read from local files on disk
- Read from a Hugging Face repo
- Read from an EnvService
Download the openai/gsm8k dataset:
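One way to do this is with the Hugging Face `datasets` library (a sketch, assuming `datasets` is installed; GSM8K's standard config name is `main`):

```python
from datasets import load_dataset

ds = load_dataset("openai/gsm8k", "main")  # downloads and caches the dataset
print(ds["train"][0]["question"])
print(ds["train"][0]["answer"])  # reference answers end with "#### <number>"
```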
Now we have all the materials required to train the agent.
# ------------------ main configuration ------------------
ajet:
  project_name: example_math_agent
  task_reader:
    type: huggingface_dat_repo  # ✨✨✨✨ `env_service` or `dataset_file` or `huggingface_dat_repo`
    # effective when `type: huggingface_dat_repo`
    huggingface_dat_repo:
      dataset_path: 'openai/gsm8k'
      training_split: "train"
      validation_split: "test"
  task_judge:
    # ✨✨✨✨ null, because in this case the reward function is written together with the workflow
    judge_protocol: null
  model:
    # ✨✨✨✨ set the model to be trained
    path: Qwen/Qwen2.5-7B
  rollout:
    user_workflow: "tutorial.example_math_agent.math_agent->MathToolWorkflow" # ✨✨✨✨ point to the workflow you wrote
    num_repeat: 6                  # GRPO group size `n`
    tensor_model_parallel_size: 1  # vLLM tensor parallel size
    max_response_length_in_one_turn: 1024
    max_model_len: 10000
  data:
    train_batch_size: 100
    max_prompt_length: 3000
    max_response_length: 7000
  debug:
    debug_max_parallel: 1
    debug_first_n_tasks: 1
  trainer_common:
    save_freq: 100
    test_freq: 100
    total_epochs: 100
    logger: swanlab
# ------------------ do not modify ------------------
hydra:
  searchpath:
    - file://ajet/default_config
    - file://ajet/default_config/verl
    - file://ajet/default_config/trinity
# ------------------ do not modify ------------------
defaults:
  - verl_default
  - trinity_default
  - ajet_default
  - _self_
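Note that `num_repeat: 6` is the GRPO group size: each task is rolled out 6 times and rewards are compared within the group, so with `train_batch_size: 100` one training step generates roughly 600 trajectories.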
Step 3: Debug (Optional)
Before full training, you can run a few tasks in debug mode with the raw base model to check whether any bugs exist. We choose VS Code for debugging because it is open-source and fast.
VS Code Debugging
You can create `.vscode/launch.json` for breakpoint debugging:
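A minimal example is sketched below; it debugs whichever Python file is currently open (`"program": "${file}"`), which is an assumption here, so replace the program and arguments with your actual entry point if needed:

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug math agent workflow",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
```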
After `.vscode/launch.json` is created, press F5 to start debugging. (Do not forget to configure the Python venv path in VS Code.)
For more debugging techniques, please refer to the debugging guidelines.
Step 4: Start Training
After debugging, launch the full training:
Output Location
By default, training logs and checkpoints are saved to: