Recently, in the Deep Past Challenge held on Kaggle, the world’s leading data science competition platform, a team led by the ShanghaiTech DataTech club delivered an exceptional performance. Drawing on strong technical expertise and a rigorous research methodology, the team finished first among 2,673 teams worldwide to win the global championship, showing what ShanghaiTech students can achieve at the frontier of AI.
Kaggle, a data science and machine learning platform owned by Google, is one of the largest and most widely recognized AI competition platforms in the world. This challenge was hosted by the Deep Past Initiative, which focuses on using AI to help decipher humanity’s ancient cultural heritage. The event attracted 3,311 developers from around the world, who made a total of 68,000 submissions.
The competition addressed a global challenge in cultural heritage preservation: participants had to build an AI system that translates transliterated cuneiform texts in Old Assyrian (a dialect of Akkadian), nearly 4,000 years old, into English. Inscribed on clay tablets to record debts, contracts, and daily affairs, these texts are vital sources for studying ancient civilizations.
Although these clay tablets are key to understanding the ancient world, fewer than ten experts worldwide can decipher them. Many surviving tablets are also damaged, so usable data is extremely scarce, which makes the task especially difficult for AI.
Faced with these difficulties, the ShanghaiTech team showed outstanding organizational and problem-solving skills. The team consisted of three undergraduates from the School of Information Science and Technology (SIST), Hong Mutian ’26, Li Zhengru ’29, and Wang Yueting ’27, together with Gu Guoqin ’26 of Xiamen University.
During the preparation phase, SIST provided essential computing resources, giving the team a solid foundation for training complex models and iterating quickly. Throughout the competition, the team followed the core principle that “data quality determines model performance” and tackled the problem systematically:
Data Extraction: Instead of relying on the poor-quality official raw data, the team used state-of-the-art large vision models to accurately extract high-quality parallel corpora of ancient texts and English translations from large historical archives.
Prompt Engineering: They designed specialized prompts for text formatting, spatial information alignment, and key feature anchoring.
Data Augmentation: To address data scarcity, the team combined dictionaries of the ancient language with large language models (LLMs) to generate synthetic corpora consistent with ancient grammar and historical context.
Finally, the team trained and ensembled 11 extensively optimized models. Using techniques such as model quantization and parallel computing, they sped up inference enough to complete all computation within the competition’s time limit, and topped the leaderboard by a significant margin.
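The article does not say how the outputs of the 11 models were combined. As a purely illustrative sketch, one common way to ensemble translation systems is consensus selection in the spirit of minimum Bayes risk decoding: each model proposes a translation, and the candidate that agrees most with all the others is kept. The function names below are hypothetical, and `difflib.SequenceMatcher` is used only as a simple stand-in for a translation metric such as chrF; this is not the team’s method.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a metric such as chrF."""
    return SequenceMatcher(None, a, b).ratio()


def consensus_translation(candidates: list[str]) -> str:
    """Hypothetical helper: return the candidate with the highest mean
    similarity to every other candidate (MBR-style consensus selection)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]
    best, best_score = candidates[0], float("-inf")
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        score = sum(similarity(cand, o) for o in others) / len(others)
        if score > best_score:  # strict ">" keeps the earliest candidate on ties
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    # Placeholder outputs from three hypothetical models for one tablet line.
    outputs = ["the silver was paid", "the silver was paid", "silver is owed"]
    print(consensus_translation(outputs))
```

With these placeholder inputs, the two identical candidates reinforce each other, so the sketch prints "the silver was paid". Selecting a whole candidate, rather than merging words from several outputs, keeps every ensemble answer a fluent sentence produced by a single model.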
The ShanghaiTech team’s performance in this international competition demonstrates the innovative and practical abilities of the university’s students in data science and AI. The team’s solution offers a practical technical reference for the automated deciphering of cuneiform clay tablets and could significantly support the study and preservation of ancient civilizations.
