Publications

You can also find my articles on my ORCID profile.

Journal


SSpMM: Efficiently Scalable SpMM Kernels Across Multiple Generations of Tensor Cores

Published in IEEE Transactions on Parallel and Distributed Systems (TPDS), 2025

Zeyu Xue, Mei Wen, Jianchao Yang, Minjin Tang, Zhongdi Luo, Jing Feng, Yang Shi, Zhaoyun Chen, Junzhong Shen and Johannes Langguth. SSpMM: Efficiently Scalable SpMM Kernels Across Multiple Generations of Tensor Cores[J]. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2025.

Paper | Code

SPSA: Exploring Sparse-Packing Computation on Systolic Arrays From Scratch

Published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025

Minjin Tang, Mei Wen, Jianchao Yang, Zeyu Xue and Junzhong Shen. SPSA: Exploring Sparse-Packing Computation on Systolic Arrays From Scratch[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025.

Paper | Code

Conference


SwiftPrune: Hessian-Free Weight Pruning for Large Language Models

Published in The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

Yuhan Kang, Yang Shi, Mei Wen, Jun He, Jianchao Yang, Zeyu Xue, Jing Feng and Xinwang Liu. SwiftPrune: Hessian-Free Weight Pruning for Large Language Models. The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 2025: 12868–12879.

Paper | Slides | Code

HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs

Published in 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2024

Yang J, Wen M, Chen D, et al. HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs[C]//2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024: 168-185.

Paper | Slides | Code

MSA^2: An Efficient Sparsity-Aware Accelerator for Matrix Multiplication with Multi-core Systolic Arrays

Published in 2024 International Conference on Algorithms and Architectures for Parallel Processing, 2024

Minjin Tang, Mei Wen, Junzhong Shen, Jingkui Yang, Zeyu Xue and Zili Shao. MSA^2: An Efficient Sparsity-Aware Accelerator for Matrix Multiplication with Multi-core Systolic Arrays[C]. International Conference on Algorithms and Architectures for Parallel Processing. Singapore: Springer Nature Singapore, 2024: 263-282.

Paper | Slides | Code

Releasing the Potential of Tensor Core for Unstructured SpMM using Tiled-CSR Format

Published in 2023 IEEE 41st International Conference on Computer Design (ICCD), 2023

Zeyu Xue, Mei Wen, Zhaoyun Chen, Yang Shi, Minjin Tang, Jianchao Yang and Zhongdi Luo. Releasing the Potential of Tensor Core for Unstructured SpMM using Tiled-CSR Format[C]. 2023 IEEE 41st International Conference on Computer Design (ICCD). IEEE, 2023: 457-464.

Paper | Slides | Code