This tutorial introduces how to define a trainable workflow.
AgentJet provides two convenient and mutually compatible ways to wrap your Workflow:
- Simple: Emphasizes simplicity, ease of use, and readability
- Advanced: Emphasizes flexibility, controllability, and extensibility
In this article, we use the AgentScope framework for demonstration.
Simple Practice
Simple Practice Abstract
- Simply set the `model` argument of the AgentScope `ReActAgent` to `tuner.as_agentscope_model()` when initializing your agent.
- Wrap your code with `class MyWorkflow(Workflow)` and your agent is ready to be tuned.
1. When to Use This Simple Practice
Choose Simple Practice If You...
- Know exactly which agents should be trained, or the number of agents is small
- Have already finished basic debugging of your workflow
- Do not need to change which agents are trained on the fly
2. Convert Your Workflow to AgentJet Trainable Workflow
The very first step is to create a class as a container to wrap your code:
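A minimal sketch, assuming the classes used in the examples below (`Workflow`, `WorkflowTask`, `WorkflowOutput`, `AjetTuner`) are importable from the top-level `ajet` package (adjust the import path to your installation):

```python
# Hypothetical import path; adjust to match your AgentJet installation.
from ajet import AjetTuner, Workflow, WorkflowOutput, WorkflowTask


class MyWorkflow(Workflow):
    async def execute(self, workflow_task: WorkflowTask, tuner: AjetTuner) -> WorkflowOutput:
        # Your existing agent logic goes here.
        ...
        return WorkflowOutput(reward=0.0, is_success=False)
```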
Next, use the `tuner` argument passed into `execute`: call its `tuner.as_agentscope_model()` method and pass the result as your agent's `model`:
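For example (a sketch only; the agent name, prompt, and AgentScope import paths are illustrative and may differ across versions):

```python
# Sketch: inside MyWorkflow.execute(), hand the tuner-provided model to your agent.
# AgentScope import paths are illustrative and may differ across versions.
from agentscope.agent import ReActAgent
from agentscope.formatter import DashScopeMultiAgentFormatter

from ajet import AjetTuner, Workflow, WorkflowOutput, WorkflowTask  # hypothetical path


class MyWorkflow(Workflow):
    async def execute(self, workflow_task: WorkflowTask, tuner: AjetTuner) -> WorkflowOutput:
        agent = ReActAgent(
            name="MySolver",                            # placeholder name
            sys_prompt="Solve the task step by step.",  # placeholder prompt
            model=tuner.as_agentscope_model(),          # the only AgentJet-specific change
            formatter=DashScopeMultiAgentFormatter(),
        )
        # ... run your rollout with `agent` and compute a reward ...
        return WorkflowOutput(reward=1.0, is_success=True)
```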
AjetTuner
`AjetTuner` also provides `.as_raw_openai_sdk_client()` and `.as_oai_baseurl_apikey()` methods, but `.as_agentscope_model()` is the most convenient choice for an AgentScope agent workflow.
3. Code Examples
Math Agent
Training a math agent that can write Python code to solve mathematical problems.
Learning to Ask
Learning to ask questions like a doctor for medical consultation scenarios.
Countdown Game
Writing a countdown game using AgentScope and solving it with RL.
Frozen Lake
Solving a frozen lake walking puzzle using AgentJet's reinforcement learning.
Advanced Practice
Advanced Practice Abstract
- The `tuner.as_agentscope_model()` function has hidden (optional) parameters; complete them to tell AgentJet the identity of each agent.
- The `ajet.Workflow` class has a hidden attribute `trainable_targets`; assign it manually to narrow down the agents to be tuned.
1. When to Use Advanced Practice
When designing a multi-agent collaborative workflow where each agent plays a different role, AgentJet provides enhanced training and debugging capabilities.
Multi-Agent Benefits
With a multi-agent setup, you can:
- Precisely control which agents are fine-tuned
- Explicitly define the default model for agents not being trained
- Switch trainable targets on the fly without modifying source code
2. How to Promote to the Advanced Agent Scenario
It is simple: there are only two additional issues to take care of:
i. `.as_agentscope_model()` has three hidden (optional) parameters; complete them for each agent.
| Parameter | Explanation |
|---|---|
| `agent_name` | The name of this agent |
| `target_tag` | A tag that marks the agent's category |
| `debug_model` | The model used when this agent is not being tuned |
model_for_an_agent = tuner.as_agentscope_model(
agent_name="AgentFriday", # the name of this agent
target_tag="Agent_Type_1", # `target_tag in self.trainable_targets` means we train this agent, otherwise we do not train this agent.
debug_model=OpenAIChatModel(
model_name="Qwen/Qwen3-235B-A22B-Instruct-2507",
stream=False,
api_key="api_key",
), # the model used when this agent is not in `self.trainable_targets`
)
ii. `Workflow` has a hidden (optional) attribute called `trainable_targets`; configure it (a minimal sketch follows the tables below).
| `trainable_targets` value | Explanation |
|---|---|
| `trainable_targets = None` | All agents using `as_agentscope_model` will be trained |
| `trainable_targets = ["Agent_Type_1", "Agent_Type_2"]` | Agents with `target_tag="Agent_Type_1"`, `target_tag="Agent_Type_2"`, ... will be trained |
| `trainable_targets = []` | Illegal: no agents would be trained |
| Scenario | Model used |
|---|---|
| `target_tag` in `trainable_targets` | Trainable model |
| `target_tag` NOT in `trainable_targets` | Registered `debug_model` |
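A minimal sketch of setting the attribute (the tag name is a placeholder; the `Field` declaration follows the pydantic style used in the full example below, and the `ajet` import path is an assumption):

```python
from typing import List

from pydantic import Field  # assumes Workflow declares fields pydantic-style, as in the example below

from ajet import Workflow  # hypothetical import path


class MyMultiAgentWorkflow(Workflow):
    # Only agents registered with target_tag="Agent_Type_1" will be trained;
    # every other agent falls back to its registered `debug_model`.
    trainable_targets: List[str] | None = Field(
        default=["Agent_Type_1"],
        description="List of agent tags to be fine-tuned.",
    )
```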
Warning
Regardless of `target_tag` differences, all trainable agents share a single model instance (one set of model weights plays every role; the model simply receives different perceptions when playing different roles).
2. Multi-Agent Example
Here's a complete example with multiple agent roles (Werewolves game):
class ExampleWerewolves(Workflow):
trainable_targets: List[str] | None = Field(default=["werewolf"], description="List of agents to be fine-tuned.")
async def execute(self, workflow_task: WorkflowTask, tuner: AjetTuner) -> WorkflowOutput:
# ensure trainable targets is legal
        assert self.trainable_targets is not None, "trainable_targets cannot be None in ExampleWerewolves (because we want to demonstrate an explicit multi-agent case)."
# bad guys and good guys cannot be trained simultaneously
# (because mix-cooperation-competition MARL needs too many advanced techniques to be displayed here)
if "werewolf" in self.trainable_targets:
assert len(self.trainable_targets) == 1, "Cannot train hostile roles simultaneously."
else:
assert len(self.trainable_targets) != 0, "No trainable targets specified."
# make and shuffle roles (fix random seed for reproducibility)
roles = ["werewolf"] * 3 + ["villager"] * 3 + ["seer", "witch", "hunter"]
task_id = workflow_task.task.metadata["random_number"]
np.random.seed(int(task_id))
np.random.shuffle(roles)
# initialize agents
players = []
for i, role in enumerate(roles):
default_model = OpenAIChatModel(
model_name="/mnt/data_cpfs/model_cache/modelscope/hub/Qwen/Qwen/Qwen3-235B-A22B-Instruct-2507/",
stream=False,
client_args={"base_url": "http://22.17.52.4:2888/v1"},
api_key="no_api_key",
generate_kwargs={"temperature": 0.01},
)
model_for_this_agent = tuner.as_agentscope_model(
agent_name=f"Player{i + 1}", # the name of this agent
target_tag=role, # `target_tag in self.trainable_targets` means we train this agent, otherwise we do not train this agent.
debug_model=default_model, # the model used when this agent is not in `self.trainable_targets`
)
agent = ReActAgent(
name=f"Player{i + 1}",
sys_prompt=get_official_agent_prompt(f"Player{i + 1}"),
model=model_for_this_agent,
formatter=DashScopeMultiAgentFormatter()
if role in self.trainable_targets
else OpenAIMultiAgentFormatter(),
max_iters=3 if role in self.trainable_targets else 5,
)
# agent.set_console_output_enabled(False)
players += [agent]
# reward condition
try:
good_guy_win = await werewolves_game(players, roles)
raw_reward = 0
is_success = False
if (good_guy_win and self.trainable_targets[0] != "werewolf") or (
not good_guy_win and self.trainable_targets[0] == "werewolf"
):
raw_reward = 1
is_success = True
logger.warning(f"Raw reward: {raw_reward}")
logger.warning(f"Is success: {is_success}")
        except Exception:  # covers BadGuyException and any other game failure
            logger.bind(exception=True).exception(
                "Error during game execution. The game cannot continue; whatever the "
                "cause, punish the trainable agents (although they may be innocent)."
            )
            raw_reward = -0.1
            is_success = False
return WorkflowOutput(reward=raw_reward, is_success=is_success)
Configuration Flexibility
In this example:
- `role` describes an agent's in-game identity (werewolf, villager, etc.)
- `default_model` defines the default model used when the role is not being trained
- You can flexibly switch training targets by modifying `trainable_targets`, as sketched below
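For instance, a minimal sketch of switching the trained side (this assumes the workflow subclass is a plain pydantic model whose fields can be overridden at instantiation; in your setup, AgentJet may expose this switch through its training configuration instead):

```python
# Train the good guys instead of the werewolves, without touching the game logic.
# (Assumes pydantic-style field override at instantiation.)
workflow = ExampleWerewolves(trainable_targets=["seer", "witch", "hunter", "villager"])
```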
TinkerJet
Wrapping and training your agent on a machine without a GPU.
Work in progress; coming soon.