
[Paper Notes] LLM Pruning, Part 1: Survey

news 2025/7/10 17:44:08

Attention Is All You Need But You Don’t Need All Of It For Inference of Large Language Models

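When pruning LLaMA2, skipping the FFN sublayers costs about as much as skipping full layers. Skipping the attention sublayers costs far less than either.

Skipping attention layers: pruning 7B/13B from 100% to 66% of parameters lowers the average score by only 1.7 to 1.8 pp.

Skipping FFN: pruning 7B/13B from 100% to 66% of parameters lowers the average score by 12.2 to 15.1 pp.

Skipping full layers: pruning 7B/13B from 100% to 66% of parameters lowers the average score by 12.2 to 13 pp.

For LLaMA2, whether the last layer's FFN/attention sublayer is also skipped makes little difference.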
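The three skipping variants can be pictured with a toy residual stack. This is a minimal sketch, not the paper's code: `Block`, `forward`, and `deepest_indices` are hypothetical names, the sublayers are stand-in functions on plain lists, and choosing the deepest layers as the ones to skip is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence

Vector = List[float]
SubLayer = Callable[[Vector], Vector]


@dataclass
class Block:
    """One toy transformer block: an attention sublayer and an FFN sublayer."""
    attn: SubLayer
    ffn: SubLayer


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise residual addition."""
    return [x + y for x, y in zip(a, b)]


def forward(
    x: Vector,
    blocks: Sequence[Block],
    skip_attn: FrozenSet[int] = frozenset(),
    skip_ffn: FrozenSet[int] = frozenset(),
) -> Vector:
    """Run the stack; a skipped sublayer leaves the residual stream unchanged."""
    for i, block in enumerate(blocks):
        if i not in skip_attn:
            x = add(x, block.attn(x))
        if i not in skip_ffn:
            x = add(x, block.ffn(x))
    return x


def deepest_indices(n_layers: int, fraction: float, keep_last: bool = False) -> FrozenSet[int]:
    """Indices of the deepest `fraction` of layers (assumed selection rule).

    With keep_last=True the final layer is preserved and the skipped
    window shifts one layer earlier.
    """
    k = round(n_layers * fraction)
    end = n_layers - 1 if keep_last else n_layers
    return frozenset(range(max(0, end - k), end))


# Stand-in sublayers: attention adds 1.0, the FFN adds 10.0 to every element.
toy_blocks = [
    Block(attn=lambda v: [1.0] * len(v), ffn=lambda v: [10.0] * len(v))
    for _ in range(6)
]
skipped = deepest_indices(6, 1 / 3)

full = forward([0.0], toy_blocks)
no_attn = forward([0.0], toy_blocks, skip_attn=skipped)
no_ffn = forward([0.0], toy_blocks, skip_ffn=skipped)
no_layer = forward([0.0], toy_blocks, skip_attn=skipped, skip_ffn=skipped)
```

Passing the same index set to both `skip_attn` and `skip_ffn` gives the full-layer variant, and `keep_last=True` corresponds to the comparison about whether the last layer is skipped as well.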

The Unreasonable Ineffectiveness of the Deeper Layers

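Pruning collapse threshold: the pruning ratio at which performance collapses differs by model. It is 45% for LLaMA2, 35% for Mistral-7B, 20% for Qwen, and 25% for Phi-2.

Mistral and Phi degrade more steadily before the threshold. Qwen is less stable before its threshold and needs QLoRA training to recover.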
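To make the thresholds concrete, the sketch below converts each one into a layer budget. It assumes the percentages are fractions of the layer count, which the notes do not state explicitly, and the depth of 32 layers is an illustrative value rather than a figure from the notes. `max_prunable_layers` is a hypothetical helper.

```python
import math

# Collapse thresholds as listed in the notes above.
COLLAPSE_THRESHOLD = {
    "LLaMA2": 0.45,
    "Mistral-7B": 0.35,
    "Qwen": 0.20,
    "Phi-2": 0.25,
}


def max_prunable_layers(model: str, n_layers: int) -> int:
    """Largest number of layers whose share of the stack does not exceed the threshold."""
    return math.floor(n_layers * COLLAPSE_THRESHOLD[model])


# Assumed depth for illustration only; substitute the real depth of each model.
ASSUMED_N_LAYERS = 32

budget = {
    name: max_prunable_layers(name, ASSUMED_N_LAYERS)
    for name in COLLAPSE_THRESHOLD
}
```

Under these assumptions `budget` maps each model name to the number of layers that could be dropped without passing its threshold, which shows how much less headroom the lower-threshold models have.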
