🌏 AgentSociety Challenge
We are excited to announce the AgentSociety Challenge, a competition for designing large language model (LLM) agents to advance the web experience. As organizers, we recognize the crucial role online review platforms play in today's web society, serving as vital spaces for experience sharing, advice seeking, and social interaction. We have carefully designed two competitive tracks to explore the potential of LLM agents: the User Modeling Track, which focuses on understanding user preferences and simulating online reviews, and the Recommendation Track, dedicated to delivering personalized recommendations across diverse scenarios. The Challenge utilizes large-scale public datasets from prominent online review platforms, complemented by a textual simulator that provides interactive feedback for LLM agents. Through this challenge, we aim to drive innovation and breakthroughs in integrating LLM technologies with real-world web applications.
📍 Problem Statement
Participants are tasked with constructing LLM-based agentic workflows that process information, write reviews, and make recommendations by interacting with online review platforms. To facilitate this, we have built an interactive simulation environment that allows agents to engage with these platforms in a controlled setting. The environment is built on the open-source Yelp [1], Amazon [2], and Goodreads [3] datasets, complemented by a textual simulator that provides interactive feedback to agents. Specifically, the challenge consists of two tracks:
- User Modeling Track: In this track, participants design agents that simulate user reviews and star ratings, focusing on reproducing user behavior in specific scenarios by leveraging the users' historical actions and accessible environmental data. By utilizing the text simulation environment, participants develop new solutions to these problems interactively in an agentic mode, enabling a comprehensive and diverse evaluation of user behaviors.
- Recommendation Track: In this track, participants use agents to generate item rankings based on the current user's preferences, with an emphasis on building personalized recommendation assistants that provide suggestions tailored to particular contexts. A hypothetical interface sketch for both tracks follows this list.
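To make the two task formats concrete, here is a minimal Python sketch of what an agent for each track might expose. The class names, method signatures, and placeholder outputs below are our own assumptions for illustration; the actual interfaces are defined by the official challenge toolkit and simulator.

```python
from typing import List, Tuple


class UserModelingAgent:
    """User Modeling Track: predict the review a given user would write."""

    def workflow(self, user_id: str, item_id: str) -> Tuple[float, str]:
        # A real agent would query the simulator for the user's history
        # and prompt an LLM; here we return fixed placeholder values.
        stars = 4.0
        review = "Great atmosphere and friendly staff; slightly pricey."
        return stars, review


class RecommendationAgent:
    """Recommendation Track: rank candidate items for the current user."""

    def workflow(self, user_id: str, candidates: List[str]) -> List[str]:
        # A real agent would score each candidate against the user's
        # inferred preferences; here we keep the input order unchanged.
        return list(candidates)


# Hypothetical usage with made-up IDs:
print(UserModelingAgent().workflow("U123", "B456"))
print(RecommendationAgent().workflow("U123", ["B1", "B2", "B3"]))
```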
Following the concept of modular agent design, participants are encouraged to adopt an actionable modular design space with four key modules: Planning, Reasoning, Tool Use, and Memory [4]. These modules have standardized interfaces that allow them to cooperate in a workflow: upon receiving a task query, the agent first decomposes it into sub-tasks through the Planning module. The Reasoning module then processes each sub-task and, when necessary, activates the Tool Use module to select external tools for problem-solving. The Memory module supports the reasoning process by accessing past observations and experiences, ensuring the agent continuously refines its actions based on feedback from the environment. Through this modular design approach, participants can not only create effective solutions for this challenge but also contribute reusable modules to the broader agent community, fostering innovation and collaboration in agent development.
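As a concrete illustration of this four-module workflow, below is a minimal, self-contained Python sketch. Every name here (`Agent`, `Planning.decompose`, `Reasoning.step`, and so on) is an illustrative placeholder rather than the challenge toolkit's actual API, and the canned string results stand in for real LLM calls and simulator queries.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Stores past observations so later reasoning steps can reuse them."""
    records: list = field(default_factory=list)

    def add(self, observation: str) -> None:
        self.records.append(observation)

    def retrieve(self, query: str) -> list:
        # Naive keyword match; a real agent might use embedding retrieval.
        return [r for r in self.records if query.lower() in r.lower()]


class Planning:
    """Decomposes a task query into sub-tasks."""
    def decompose(self, query: str) -> list:
        # Placeholder: a real module would prompt an LLM for a plan.
        return [f"analyze: {query}", f"answer: {query}"]


class ToolUse:
    """Selects and invokes an external tool when reasoning needs one."""
    def call(self, sub_task: str) -> str:
        # Placeholder: e.g., query the simulator for reviews or item data.
        return f"tool-result({sub_task})"


class Reasoning:
    """Processes one sub-task, consulting tools and memory as needed."""
    def step(self, sub_task: str, memory: Memory, tools: ToolUse) -> str:
        context = memory.retrieve(sub_task)
        evidence = tools.call(sub_task) if "analyze" in sub_task else ""
        result = f"reasoned({sub_task}; context={len(context)}; {evidence})"
        memory.add(result)  # feed the outcome back for later steps
        return result


class Agent:
    """Wires the four modules into the workflow described above."""
    def __init__(self):
        self.planning, self.reasoning = Planning(), Reasoning()
        self.tools, self.memory = ToolUse(), Memory()

    def run(self, query: str) -> str:
        results = [
            self.reasoning.step(sub_task, self.memory, self.tools)
            for sub_task in self.planning.decompose(query)
        ]
        return results[-1]


if __name__ == "__main__":
    print(Agent().run("predict the star rating user U123 gives business B456"))
```

Because each module sits behind a small, fixed interface, a team can swap in a stronger Planning or Memory implementation without touching the rest of the workflow, which is exactly what makes the modules reusable across agents.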
As a bonus, the top-ranking teams advancing to the Final Phase will receive assistance in deploying their agents on the OpenCity platform [5], an open-source platform designed to simulate large-scale LLM agents within a digital city environment. Solutions submitted to the Recommendation and User Modeling tracks will be adapted to simulate offline movement and online reviews, respectively, offering a dynamic illustration of the Agent Society vision. Selected teams will be invited to present their work at the WWW Workshop, where they can compete for the Best Demo award.
📅 Timeline
The competition will be held from Jan 1st, 2025 to Feb 14th, 2025, following the timeline outlined below:
- Start of Development Phase: Jan 1st, 2025 (Final submission deadline: Jan 30th, 2025, AoE)
- Start of Final Phase: Feb 1st, 2025
- Winner Notification and Paper Invitation: Feb 14th, 2025
💪🏻 Token Support
To support participants, we will provide API-key access during the Development Phase through InfinigenceAI, with the following quota: 100 requests per minute (RPM), 60,000 tokens per minute (TPM), and 10,000 requests per day (RPD), and the ability to call models such as Qwen-72b-instruct. To use the API, please email us your identification documents, including your full name, organization, proof of participation in this competition, your track of participation, and any other relevant identification details. Once your verification is complete, we will send you the API key. Please note that the number of API keys is limited. Note: all API keys have been distributed; teams that have not received an API key should arrange their own API access. Thank you for your cooperation.
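Since the quota is enforced per minute and per day, it can help to throttle requests client-side before each LLM call. Below is a minimal, illustrative Python sketch of a sliding-window limiter for the stated RPM and TPM caps; it is our own assumption rather than part of any official toolkit, and in practice the token counts should come from the provider's API responses.

```python
import time
from collections import deque


class RateLimiter:
    """Client-side sliding-window throttle for the stated per-minute caps."""

    def __init__(self, rpm: int = 100, tpm: int = 60_000):
        self.rpm, self.tpm = rpm, tpm
        self.calls = deque()   # timestamps of recent requests
        self.tokens = deque()  # (timestamp, token_count) pairs

    def _prune(self, now: float) -> None:
        # Drop bookkeeping entries older than the 60-second window.
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        while self.tokens and now - self.tokens[0][0] > 60:
            self.tokens.popleft()

    def wait(self, tokens_needed: int) -> None:
        """Block until one more request of `tokens_needed` tokens fits."""
        while True:
            now = time.monotonic()
            self._prune(now)
            used = sum(count for _, count in self.tokens)
            if len(self.calls) < self.rpm and used + tokens_needed <= self.tpm:
                self.calls.append(now)
                self.tokens.append((now, tokens_needed))
                return
            time.sleep(0.5)  # back off briefly, then re-check the window


limiter = RateLimiter()
limiter.wait(tokens_needed=1_500)  # estimated tokens for the next request
# ... issue the actual LLM API request here ...
```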
🏆 Award
The total prize pool is 12,000 USD. To encourage and reward excellence, the competition will offer monetary prizes and certificates to the winning teams of each track:
- 🥇 1st Place: 2,000 USD
- 🥈 2nd Place: 1,000 USD
- 🥉 3rd Place: 500 USD
- 🥇 1st Place: 1,500 USD
- 🥈 2nd Place: 1,000 USD
🏪 Organizing Committee
Hosting Organizations: Tsinghua University, Infinigence AI, and the Web Conference
Committee Members: Yuwei Yan (HKUST-gz); Yu Shang, Qingbin Zeng, Yu Li, Keyu Zhao, Zhiheng Zheng, Xuefei Ning (Tsinghua University); Tianji Wu, Shengen Yan (Infinigence AI)
Committee Chairs: Fengli Xu, Yu Wang, Yong Li (Tsinghua University)
📬 Contact
For any questions or inquiries, please contact us at LLMSociety-Challenge@outlook.com.
📚 References
- [1] Nabiha Asghar. Yelp dataset challenge: Review rating prediction. arXiv preprint arXiv:1605.05362, 2016. [arXiv]
- [2] Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley. Bridging Language and Items for Retrieval and Recommendation. arXiv preprint arXiv:2403.03952, 2024. [arXiv]
- [3] Mengting Wan, Julian J. McAuley. Item recommendation on monotonic behavior chains. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys 2018), Vancouver, BC, Canada, October 2-7, 2018, pp. 86–94. ACM, 2018. [DOI]
- [4] Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, Yong Li. AgentSquare: Automatic LLM Agent Search in Modular Design Space. arXiv preprint arXiv:2410.06153, 2024. [arXiv]
- [5] Yuwei Yan, Qingbin Zeng, Zhiheng Zheng, Jingzhe Yuan, Jie Feng, Jun Zhang, Fengli Xu, Yong Li. OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents. arXiv preprint arXiv:2410.21286, 2024. [arXiv]