📚 Submission Guidelines

Submission Rules

  1. Teams should devise LLM agentic workflows within the predefined modular design space, including:
    • Module Recombination: Designing novel agents by recombining the classic modules (reasoning, planning, tool use, and memory). See the Tutorial.
    • Module Design: Developing or modifying modules to implement new functionalities while ensuring compatibility with the standardized framework.
    • Workflow Construction: Designing the overall structure and interaction flow of the agentic system to effectively integrate the modules and achieve the desired outcomes.
  2. We will use Qwen2.5-72B-Instruct to evaluate the submitted agents. Using external models, training external models, or calling external tools is strictly prohibited.
  3. During the evaluation of submitted agents, all data access must go through the interaction tools provided by the textual web simulator.
  4. External data access is prohibited.
  5. Submission limits:
    • Development Phase: Each team may submit up to one submission per day.
    • Evaluation Phase: Each team may submit up to three submissions per day.
  6. Evaluation will be time- and token-limited.
  7. Any form of dishonest conduct is strictly prohibited and will result in immediate disqualification if detected.

❗️ Important Evaluation Parameters

  • Rate Limits: RPM=300, TPM=180000 (applies to all phases)
  • Time Limits:
    • Development Phase: 60 minutes maximum (600 tasks from different datasets)
    • Final Phase: 120 minutes maximum (600 tasks from different datasets)
  • Task Parallelism Settings:
    | Phase       | Official Evaluation              | With Supported API Key          |
    |-------------|----------------------------------|---------------------------------|
    | Development | 10 tasks (600 tasks, 60 minutes) | 4 tasks (600 tasks, 120 minutes) |
    | Final       | 5 tasks (600 tasks, 120 minutes) | 2 tasks (600 tasks, 240 minutes) |
  • Please make sure your agent can complete the evaluation within these limits.
  • Note: Task parallelism refers to concurrent task handling and is controlled by `max_workers` in `simulator.run_simulation`.
  • Note: Please update your GitHub repository frequently so it always reflects the latest version of your agent and evaluation code.
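To make the parallelism setting concrete, here is a minimal, self-contained sketch of what `max_workers` controls. Note that the real simulator's entry point, task format, and return values are not documented here, so `run_simulation` below is a stand-in; only the `max_workers` knob and the function name come from the note above.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the simulator's run_simulation. The task format and return
# values are assumptions for illustration; only max_workers is documented.
def run_simulation(tasks, max_workers=10):
    """Run tasks concurrently, at most max_workers at a time."""
    def run_task(task):
        # Placeholder for one agent/simulator interaction episode.
        return f"result:{task}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(run_task, tasks))

# Development-phase official evaluation runs 10 tasks in parallel.
results = run_simulation([f"task-{i}" for i in range(12)], max_workers=10)
print(len(results))  # 12
```

With a supported API key, you would lower `max_workers` (4 in Development, 2 in Final) to match the table above; the same total task count simply takes longer, which is why those rows also have larger time limits.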

Submit Your Agent

Submit your solution through the submission button in the top right corner.

Fairness Mechanisms

  1. Public Leaderboard: We will provide a public leaderboard for all teams to monitor and review performance.
  2. Rigorous Review: Submissions will undergo a thorough review to ensure compliance with competition rules and the integrity of results.
  3. Equal Access to Resources: We will ensure all teams have access to sufficient and fair computational resources.