🌏 AgentSociety Challenge

We are excited to announce the AgentSociety Challenge, a competition on designing large language model (LLM) agents to advance the web experience. As organizers, we recognize the crucial role online review platforms play in today's web society, serving as vital spaces for experience sharing, advice seeking, and social interaction. We have carefully designed two competitive tracks to explore the potential of LLM agents: the User Modeling Track, which focuses on understanding user preferences and simulating online reviews, and the Recommendation Track, dedicated to delivering personalized recommendations across diverse scenarios. The Challenge utilizes large-scale public datasets from prominent online review platforms, complemented by a textual simulator that provides interactive feedback for LLM agents. Through this challenge, we aim to drive innovation and breakthroughs in the integration of LLM technologies with real-world Web applications.

📍 Problem Statement

Participants are tasked with constructing agentic workflows with LLMs that process information, write reviews, and make recommendations by interacting with online review platforms. To facilitate this, we have built an interactive simulation environment based on the open-source Yelp[1], Amazon[2], and Goodreads[3] datasets, complemented by a textual simulator that provides interactive feedback and lets agents engage with these platforms in a controlled setting. Specifically, the challenge consists of two tracks:

- User Modeling Track: understanding user preferences and simulating online reviews.
- Recommendation Track: delivering personalized recommendations across diverse scenarios.
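To make the two tracks concrete, here is a minimal sketch of what an agent for each track might look like. Every class, method, and field name below is an illustrative assumption, not the official challenge API.

```python
from abc import ABC, abstractmethod


class SimulationAgent(ABC):
    """Base class for a challenge agent (illustrative, not the official API)."""

    @abstractmethod
    def workflow(self, task: dict) -> dict:
        """Process one task from the simulator and return the agent's output."""


class UserModelingAgent(SimulationAgent):
    def workflow(self, task: dict) -> dict:
        # Hypothetical task: given a user and a business, produce the review
        # this user would plausibly write, e.g. a star rating plus text.
        return {"stars": 4.0, "review": "Great coffee, but the line was long."}


class RecommendationAgent(SimulationAgent):
    def workflow(self, task: dict) -> dict:
        # Hypothetical task: rank a list of candidate items for a user.
        # A trivial baseline returns the candidates in their given order.
        return {"ranked_items": task["candidate_ids"]}
```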

Based on the concept of modular agent design, participants are encouraged to adopt an actionable modular design space with four key modules: Planning, Reasoning, Tool use, and Memory[4]. These modules have standardized interfaces that allow them to cooperate with each other in a workflow: upon receiving a task query, the agent begins by decomposing it into sub-tasks through the Planning module. The Reasoning module then processes each sub-task and, when necessary, activates the Tool use module to select external tools for problem-solving. Additionally, the Memory module supports the reasoning process by accessing past observations and experiences, ensuring the agent continuously refines its actions based on feedback from the environment. Through this modular design approach, participants can not only create effective solutions for this challenge but also contribute reusable modules to the broader agent community, fostering innovation and collaboration in agent development.
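As a rough illustration of this workflow, the sketch below wires the four modules together in plain Python. The interfaces and placeholder logic are our own assumptions; a real solution would back each module with LLM calls and real tools.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Stores past observations; retrieval here is naive keyword matching."""
    records: list = field(default_factory=list)

    def retrieve(self, query: str) -> list:
        return [r for r in self.records if query.lower() in r.lower()]

    def store(self, observation: str) -> None:
        self.records.append(observation)


class Planning:
    def decompose(self, task: str) -> list:
        # A real module would prompt an LLM to split the task into sub-tasks.
        return [f"analyze: {task}", f"act on: {task}"]


class ToolUse:
    def call(self, sub_task: str) -> str:
        # Dispatch to an external tool (search, calculator, simulator query).
        return f"tool output for {sub_task!r}"


class Reasoning:
    def solve(self, sub_task: str, context: list, tools: ToolUse) -> str:
        # A real module would prompt an LLM with the sub-task and retrieved
        # context, invoking the Tool use module when needed.
        evidence = tools.call(sub_task)
        return f"answer({sub_task!r}, {len(context)} memories, {evidence!r})"


class ModularAgent:
    def __init__(self) -> None:
        self.planning = Planning()
        self.reasoning = Reasoning()
        self.tools = ToolUse()
        self.memory = Memory()

    def run(self, task: str) -> list:
        answers = []
        for sub_task in self.planning.decompose(task):
            context = self.memory.retrieve(sub_task)  # past experience
            answer = self.reasoning.solve(sub_task, context, self.tools)
            self.memory.store(answer)                 # feedback loop
            answers.append(answer)
        return answers


print(ModularAgent().run("write a review for a cafe"))
```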

As a bonus, the top-ranking teams advancing to the Final Phase will receive assistance in deploying their agents on the OpenCity platform[5]. OpenCity is an open-source platform designed to simulate large-scale LLM agents within a digital city environment. Solutions submitted to the Recommendation and User Modeling tracks will be adapted to simulate offline movement and online reviews, respectively, offering a dynamic illustration of the Agent Society vision. Selected teams will be invited to present their work at the WWW Workshop, where they can compete for the Best Demo award.

📅 Timeline

The competition will be held from Jan 1st, 2025 to Feb 14th, 2025, following the timeline outlined below:

Note: All submission deadlines are end-of-day in the Anywhere on Earth (AoE) time zone. The top 20 teams will be selected to participate in the Final Phase.

💪🏻 Token Support

To support participants, we will provide API-key access during the development phase. Access is offered through InfinigenceAI with the following rate limits: 100 requests per minute (RPM), 60,000 tokens per minute (TPM), and 10,000 requests per day (RPD), with the ability to call models such as Qwen-72b-instruct. If you wish to use the API, please send your identification documents via email, including your full name, organization, proof of participation in this competition, track of participation, and any other relevant identification details. Once your verification is complete, we will send you the API key. Please note that the number of API keys is limited. Note: All API keys have been distributed. Teams that have not received an API key should arrange their own API access. Thank you for your cooperation.
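For teams that did receive a key, a small client-side throttle helps stay under the RPM limit. The sketch below assumes an OpenAI-compatible endpoint, which is an assumption to verify against InfinigenceAI's documentation; the base URL and model identifier are placeholders.

```python
import time

from openai import OpenAI  # assumes an OpenAI-compatible endpoint

# Placeholders -- substitute the endpoint and key from your API-key email.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://<provider-endpoint>/v1")

MIN_INTERVAL = 60.0 / 100  # seconds between calls to stay under RPM=100
_last_call = 0.0


def chat(prompt: str, model: str = "qwen-72b-instruct") -> str:
    """One rate-limited chat completion; the model id is illustrative."""
    global _last_call
    wait = MIN_INTERVAL - (time.time() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

A similar guard on daily request counts would cover the RPD limit; the TPM limit depends on prompt sizes and is easiest to respect by keeping prompts short.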

🏆 Award

The total prize pool is 12,000 USD. To encourage and reward excellence, the competition will offer monetary prizes and certificates to the top three winning teams of each track.

In addition, the prizes include an in-person travel grant for the top two winning teams of each track, to be paid after their in-person presentation at our workshop at the Web Conference. We will also recommend the 1st- and 2nd-place teams of each track to submit a competition-related paper for inclusion in the conference's companion proceedings.

🏪 Organizing Committee

Hosting Organizations: Tsinghua University, Infinigence AI, the Web Conference

Committee Members: Yuwei Yan (HKUST-gz); Yu Shang, Qingbing Zeng, Yu Li, Keyu Zhao, Zhiheng Zheng, Xuefei Ning (Tsinghua University); Tianji Wu, Shengen Yan (Infinigence AI)

Committee Chairs: Fengli Xu, Yu Wang, Yong Li (Tsinghua University)

📬 Contact

For any questions or inquiries, please contact us at LLMSociety-Challenge@outlook.com.

📚 References

  1. Nabiha Asghar. Yelp dataset challenge: Review rating prediction. arXiv preprint arXiv:1605.05362, 2016. [arXiv]
  2. Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley. Bridging Language and Items for Retrieval and Recommendation. arXiv preprint arXiv:2403.03952, 2024. [arXiv]
  3. Mengting Wan, Julian J. McAuley. Item recommendation on monotonic behavior chains. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys 2018), Vancouver, BC, Canada, October 2-7, 2018, pp. 86–94. ACM, 2018. [DOI]
  4. Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, Yong Li. AgentSquare: Automatic LLM Agent Search in Modular Design Space. arXiv preprint arXiv:2410.06153, 2024. [arXiv]
  5. Yuwei Yan, Qingbin Zeng, Zhiheng Zheng, Jingzhe Yuan, Jie Feng, Jun Zhang, Fengli Xu, Yong Li. OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents. arXiv preprint arXiv:2410.21286, 2024. [arXiv]