🤗 HF Model, 🤗 HF Dataset, 🧑‍💻 Code
<aside>
Core Contributors: Jujie He, Jiacai Liu, Chris Yuhao Liu, Rui Yan, Chaojie Wang, Peng Cheng, Yang Liu
Contributors: Xiaoyu Zhang, Fuxiang Zhang, Jiacheng Xu, Wei Shen, Siyuan Li, Liang Zeng, Tianwen Wei, Cheng Cheng, Bo An, Yahui Zhou
</aside>
<aside> 🔥
We are excited to announce the release of the Skywork-OR1 model series, which includes two general-purpose reasoning models, Skywork-OR1-7B and Skywork-OR1-32B, as well as a math-specialized model, Skywork-OR1-Math-7B.

- Skywork-OR1-32B-Preview delivers performance on par with the 671B-parameter Deepseek-R1 on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).
- Skywork-OR1-7B-Preview outperforms all similarly sized models in both math and coding scenarios.
- Skywork-OR1-Math-7B is specifically optimized for mathematical reasoning, scoring 69.8 on AIME24 and 52.3 on AIME25, well ahead of all models of similar size.

We have open-sourced the model weights, training data, and complete training code. This technical blog details the training process, ensuring full reproducibility for the community. A more comprehensive technical report will be released next week. We believe this initiative will significantly advance reasoning models and support ongoing research within the LLM community.
</aside>
We will update the citation once the technical report is released. In the meantime, please cite the following:
@misc{skywork-or1-2025,
  title={Skywork Open Reasoner Series},
  author={He, Jujie and Liu, Jiacai and Liu, Chris Yuhao and Yan, Rui and Wang, Chaojie and Cheng, Peng and Zhang, Xiaoyu and Zhang, Fuxiang and Xu, Jiacheng and Shen, Wei and Li, Siyuan and Zeng, Liang and Wei, Tianwen and Cheng, Cheng and An, Bo and Liu, Yang and Zhou, Yahui},
  howpublished={\url{https://capricious-hydrogen-41c.notion.site/Skywork-Open-Reaonser-Series-1d0bc9ae823a80459b46c149e4f51680}},
  note={Notion Blog},
  year={2025}
}
Following the release of Skywork-o1, our first reasoning model focusing on Chinese, our Skywork team continues to iterate and improve upon its foundation. Today, we are proud to launch the Skywork-OR1 series, a major upgrade that delivers state-of-the-art reasoning performance at every model size. This release marks a significant leap in logical understanding and complex reasoning tasks. All Skywork-OR1 models are available for free and fully open-sourced, affirming our commitment to open AI development.
The Skywork-OR1 series adopts the most transparent open-source practices in the industry: we have released the model weights, training datasets, and full training code. All resources are publicly accessible on GitHub and Hugging Face 🤗. A companion technical blog on Notion details our data processing pipeline, training methodology, and key technical insights, offering a fully reproducible reference for the community. A more comprehensive technical report will be released next week, sharing deeper lessons from the training of reasoning models. We believe this level of openness will foster shared progress in reasoning capabilities across the AI community.
In our evaluation, the Skywork-OR1 series introduces avg@k as the core metric, measuring a model's average performance across k sampled attempts. Unlike the conventional pass@k, which only checks whether the model produces a correct answer at least once in k attempts, avg@k offers a better view of generation stability and reasoning consistency.
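To make the distinction concrete, below is a minimal sketch, assuming per-attempt correctness is stored as a boolean matrix; the function names and toy data are ours for illustration, not taken from the released evaluation code.

```python
import numpy as np

def avg_at_k(correct: np.ndarray) -> float:
    """avg@k: mean correctness over all k attempts on all problems.

    `correct` is a (num_problems, k) boolean matrix where
    correct[i, j] is True iff attempt j on problem i is correct.
    """
    return float(correct.mean())

def pass_at_k(correct: np.ndarray) -> float:
    """pass@k: fraction of problems solved by at least one of k attempts."""
    return float(correct.any(axis=1).mean())

# Toy example: 3 problems, k = 4 attempts each.
correct = np.array([
    [True,  True,  True,  True ],   # solved consistently
    [True,  False, False, False],   # solved once, unstable
    [False, False, False, False],   # never solved
])
print(f"avg@4  = {avg_at_k(correct):.3f}")   # (4 + 1 + 0) / 12 = 0.417
print(f"pass@4 = {pass_at_k(correct):.3f}")  # 2 / 3 = 0.667
```

Note how pass@4 fully credits the second problem even though only one of its four attempts succeeded, while avg@4 reflects that instability.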
For the math domain, we primarily focus on NuminaMath-1.5, a comprehensive collection of 896K math problems drawn from widely used competition sources and advanced mathematical topics. While the number of problems in NuminaMath-1.5 is sufficient, their quality still requires careful examination before the data can be used.
For the code domain, we find far fewer data sources available, and their difficulty is generally low relative to current models' capabilities. In our pilot studies, we experimented with popular collections such as CODE-RL, TACO, and the Eurus-RL collection in their original mixture, but obtained unsatisfactory results.
To select and curate high-quality data for RL, we adhere to the following general criteria for both data domains: