📚 Submission Guidelines

Submission Rules

  1. Teams should devise LLM agentic workflows within the predefined modular design space, including:
    • Module Recombination: Designing novel agents by recombining the classic modules (reasoning, planning, tool use, and memory). See the Tutorial.
    • Module Design: Developing or modifying modules to implement new functionalities while ensuring compatibility with the standardized framework.
    • Workflow Construction: Designing the overall structure and interaction flow of the agentic system to effectively integrate the modules and achieve the desired outcomes.
  2. We will use Qwen2.5-72B-Instruct to evaluate the submitted agents. Using external models, training external models, or calling external tools is strictly prohibited.
  3. During the evaluation of submitted agents, all data access must go through the interaction tools provided by the textual web simulator.
  4. External data access is prohibited.
  5. Submission limits:
    • Development Phase: Each team may submit up to one submission per day.
    • Evaluation Phase: Each team may submit up to three submissions per day.
  6. Evaluation will be time- and token-limited.
  7. Any form of dishonest conduct is strictly prohibited and will result in immediate disqualification if detected.

❗️ Important Evaluation Parameters

  • Rate Limits: RPM=300, TPM=180000 (applies to all phases)
  • Time Limits:
    • Development Phase: 60 minutes maximum (600 tasks from different datasets)
    • Final Phase: 120 minutes maximum (600 tasks from different datasets)
  • Task Parallelism Settings:
    | Phase       | Official Evaluation              | With Supported API Key          |
    |-------------|----------------------------------|---------------------------------|
    | Development | 10 tasks (600 tasks, 60 minutes) | 4 tasks (600 tasks, 120 minutes) |
    | Final       | 5 tasks (600 tasks, 120 minutes) | 2 tasks (600 tasks, 240 minutes) |
  • Please make sure your agent can complete the evaluation within these limits.
  • Note: Task parallelism refers to concurrent task handling and is controlled by `max_workers` in `simulator.run_simulation`.
  • Note: Please update your GitHub repository frequently so it always reflects the latest version of your agent and evaluation code.
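To make the parallelism setting concrete, here is a minimal, self-contained sketch of what `max_workers` controls. Note that the real simulator's entry point, task format, and return values are not documented here, so `run_simulation` below is a stand-in; only the `max_workers` knob and the function name come from the note above.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the simulator's run_simulation. The task format and return
# values are assumptions for illustration; only max_workers is documented.
def run_simulation(tasks, max_workers=10):
    """Run tasks concurrently, at most max_workers at a time."""
    def run_task(task):
        # Placeholder for one agent/simulator interaction episode.
        return f"result:{task}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(run_task, tasks))

# Development-phase official evaluation runs 10 tasks in parallel.
results = run_simulation([f"task-{i}" for i in range(12)], max_workers=10)
print(len(results))  # 12
```

With a supported API key, you would lower `max_workers` (4 in Development, 2 in Final) to match the table above; the same total task count simply takes longer, which is why those rows also have larger time limits.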

Submit Your Agent

Submit your solution through the submission button in the top right corner.

Fairness Mechanisms

  1. Public Leaderboard: We will provide a public leaderboard for all teams to monitor and review performance.
  2. Rigorous Review: Submissions will undergo a thorough review to ensure compliance with competition rules and the integrity of results.
  3. Equal Access to Resources: We will ensure all teams have access to sufficient and fair computational resources.