LD-Pruner, EdgeFusion (On-Device T2I), FreeDiff, TextCenGen, MemLLM

2024-05-03 17:50:48

This article was first published on the WeChat official account 机器感知 (Machine Perception).

https://mp.weixin.qq.com/s/KiyNfwYWU-wBiCO-hE9qkA

The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models

Foundation models, pre-trained on large amounts of data, have demonstrated impressive zero-shot capabilities in various downstream tasks. However, in object detection and instance segmentation, two fundamental computer vision tasks heavily reliant on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, i.e., these foundation models fail to discern boundaries between individual objects. For the first time, we show that CLIP, which has never accessed any instance-level annotations, can provide a highly beneficial and strong instance-level boundary prior in the clustering results of a particular intermediate layer. Following this surprising observation, we propose Zip, which Zips up CLip and SAM in a novel classification-first-then-discovery pipeline, enabling annotation-free, complex-scene-capable, open-vocab......
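
The boundary prior the abstract describes can be probed in a few lines. Below is a minimal sketch, assuming a standard ViT-B/16 CLIP from Hugging Face; the intermediate-layer index, cluster count, and plain k-means are illustrative assumptions, not the paper's actual procedure:

```python
# Sketch: cluster patch features from an intermediate CLIP layer and
# visualize the resulting instance-boundary structure as a label grid.
import torch
from PIL import Image
from sklearn.cluster import KMeans
from transformers import CLIPImageProcessor, CLIPVisionModel

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch16")
model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch16").eval()

image = Image.open("scene.jpg")  # any test image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Hypothetical choice: a middle layer rather than the final one.
feats = outputs.hidden_states[6][0, 1:]          # drop CLS token -> (196, 768)
labels = KMeans(n_clusters=8, n_init=10).fit_predict(feats.numpy())
boundary_prior = labels.reshape(14, 14)          # 224 / 16 = 14 patches per side
print(boundary_prior)
```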

LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights

Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains difficult, with memory consumption and inference speed posing key challenges. To address this, we introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing LDMs. Traditional pruning methods for deep neural networks are not tailored to the unique characteristics of LDMs, such as the high computational cost of training and the absence of a fast, straightforward, and task-agnostic way to evaluate model performance. Our method tackles these challenges by leveraging the latent space during the pruning process, enabling us to quantify the impact of pruning on model performance independently of the task at hand. This targeted pruning of components with minimal impact on the output allows for faster co......
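
To make the latent-space idea concrete, here is a minimal sketch of a task-agnostic importance score in the spirit of LD-Pruner: each candidate component is temporarily ablated and scored by how much the UNet's latent output moves on a fixed probe batch. Ablation by zeroing weights and the mean-squared-distance metric are illustrative assumptions, not the paper's exact operators:

```python
# Sketch: rank prunable components by their latent-space impact,
# assuming a diffusers-style UNet (unet(sample, t, encoder_hidden_states)).
import torch

@torch.no_grad()
def latent_sensitivity(unet, components, probe_latents, probe_t, probe_cond):
    """Return {component_name: impact score}; higher = more important.

    components: dict mapping names to modules that expose a .weight tensor.
    """
    baseline = unet(probe_latents, probe_t, encoder_hidden_states=probe_cond).sample
    scores = {}
    for name, module in components.items():
        saved = module.weight.data.clone()
        module.weight.data.zero_()               # ablate the component
        pruned = unet(probe_latents, probe_t, encoder_hidden_states=probe_cond).sample
        scores[name] = (pruned - baseline).pow(2).mean().item()
        module.weight.data.copy_(saved)          # restore the weights
    return scores
```

Components with the lowest scores perturb the latent output least and are the natural candidates for removal, with no task-specific evaluation needed.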

EdgeFusion: On-Device Text-to-Image Generation

The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle to its practical application. To tackle this challenge, recent research has focused on reducing sampling steps, e.g., with the Latent Consistency Model (LCM), and on architectural optimizations such as pruning and knowledge distillation. Diverging from existing approaches, we start from a compact SD variant, BK-SDM. We observe that directly applying LCM to BK-SDM with commonly used crawled datasets yields unsatisfactory results. This leads us to develop two strategies: (1) leveraging high-quality image-text pairs from leading generative models and (2) designing an advanced distillation process tailored for LCM. Through thorough exploration of quantization, profiling, and on-device deployment, we achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices......
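
For a sense of what two-step generation looks like in code, here is a minimal sketch using the diffusers library with the public BK-SDM checkpoint; pairing it with LCMScheduler as shown is an illustrative assumption, not EdgeFusion's released pipeline (which relies on the custom distillation described above):

```python
# Sketch: few-step inference with an LCM-style scheduler on a compact
# Stable Diffusion backbone ("nota-ai/bk-sdm-small" is the public BK-SDM).
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Two denoising steps, matching the abstract's deployment target.
image = pipe(
    "a photo of a red fox in the snow",
    num_inference_steps=2,
    guidance_scale=1.0,   # LCM-distilled models typically use low/no CFG
).images[0]
image.save("fox.png")
```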

SKIP: Skill-Localized Prompt Tuning for Inference Speed Boost-Up

Prompt-tuning methods have shown performance comparable to parameter-efficient fine-tuning (PEFT) methods on various natural language understanding tasks. However, existing prompt-tuning methods still run the entire model architecture, so they fail to accelerate inference in deployment. In this paper, we propose a novel approach called SKIll-localized Prompt tuning (SKIP), which is extremely efficient at inference time. Our method significantly enhances inference efficiency by identifying and exploiting a skill-localized subnetwork in a language model. Surprisingly, it improves inference speed by up to 160% while pruning 52% of the parameters. Furthermore, we demonstrate that our method is applicable across various transformer-based architectures, confirming its practicality and scalability. ......
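
A minimal sketch of the skill-localization idea, assuming a BERT-style encoder layer and a gradient-times-activation importance score per FFN neuron (SKIP's concrete criterion may differ): low-importance neurons are structurally removed, so the prompt-tuned model actually runs through a smaller network:

```python
# Sketch: score FFN neurons on a task batch, then structurally prune them.
import torch

def ffn_neuron_importance(model, batch, layer):
    """Score each intermediate FFN neuron of one BERT-style layer by |grad * act|."""
    acts = {}
    handle = layer.intermediate.register_forward_hook(
        lambda mod, inp, out: acts.__setitem__("a", out)
    )
    loss = model(**batch).loss                 # batch must include labels
    handle.remove()
    grads = torch.autograd.grad(loss, acts["a"])[0]
    return (grads * acts["a"]).abs().sum(dim=(0, 1)).detach()  # (intermediate_size,)

@torch.no_grad()
def prune_ffn(layer, importance, keep_ratio=0.48):
    """Keep ~48% of neurons (52% pruned, as in the abstract's headline number)."""
    k = int(keep_ratio * importance.numel())
    keep = importance.topk(k).indices.sort().values
    # Shrink both FFN projections to the kept neurons (BERT-style naming);
    # note: module metadata such as out_features is left stale in this sketch.
    layer.intermediate.dense.weight.data = layer.intermediate.dense.weight.data[keep]
    layer.intermediate.dense.bias.data = layer.intermediate.dense.bias.data[keep]
    layer.output.dense.weight.data = layer.output.dense.weight.data[:, keep]
```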

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

As large language models (LLMs) are increasingly deployed for long-content generation, demand for efficient long-sequence inference support has grown. However, the key-value (KV) cache, which is stored to avoid re-computation, has become a critical bottleneck, since it grows linearly with sequence length. Due to the auto-regressive nature of LLMs, the entire KV cache must be loaded for every generated token, resulting in low utilization of computational cores and high latency. While various KV-cache compression methods have been proposed to alleviate this issue, they suffer from degraded generation quality. We introduce TriForce, a hierarchical speculative decoding system that scales to long-sequence generation. This approach leverages the original model weights and a dynamic sparse KV cache, obtained via retrieval, as a draft model, which serves as an intermediate layer in the hierarchy and is itself further speculated by a smaller model to reduce its......
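
The draft-then-verify mechanism that TriForce stacks hierarchically can be illustrated with a single-level, greedy-decoding sketch; the retrieval-based sparse KV cache and the full hierarchy are omitted for clarity, and batch size 1 with HF-style model(input_ids).logits outputs is assumed:

```python
# Sketch: one round of greedy speculative decoding (lossless: the output
# token sequence equals what plain decoding with the target model produces).
import torch

@torch.no_grad()
def speculate_once(target, draft, input_ids, gamma=4):
    n = input_ids.shape[1]
    # 1) Draft gamma tokens cheaply with the small model.
    ids = input_ids
    for _ in range(gamma):
        logits = draft(ids).logits[:, -1]
        ids = torch.cat([ids, logits.argmax(-1, keepdim=True)], dim=-1)
    proposed = ids[:, n:]                                   # (1, gamma)

    # 2) Score all drafted positions with a single target forward pass.
    tgt_pred = target(ids).logits[:, n - 1 :].argmax(-1)    # (1, gamma + 1)

    # 3) Accept the longest agreeing prefix, then take the target's own
    #    token at the first disagreement (or its bonus token if all agree).
    agree = (proposed == tgt_pred[:, :gamma]).cumprod(dim=-1)
    n_accept = int(agree.sum())
    return torch.cat(
        [input_ids, proposed[:, :n_accept], tgt_pred[:, n_accept : n_accept + 1]],
        dim=-1,
    )
```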

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

Precise image editing with text-to-image models has attracted increasing interest due to their remarkable generative capabilities and user-friendly nature. However, such attempts face a pivotal challenge: misalignment between the intended precise editing regions and the broader area actually affected by the guidance. Although excellent attention-based methods have been developed to refine the editing guidance, these approaches require complex architectural modifications and are limited to specific editing tasks. In this work, we re-examine the diffusion process and the misalignment problem from a frequency perspective, revealing that, due to the power law of natural images and the decaying noise schedule, the denoising network primarily recovers low-frequency image components during the earlier timesteps and thus injects excessive low-frequency signal into the editing guidance. Leveraging this insight, we introduce a novel fine-tuning free a......
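
The frequency-truncation idea can be sketched with an FFT high-pass mask applied to a guidance tensor; the paper's progressive, timestep-dependent schedule is simplified here to a single cutoff radius, which is an illustrative assumption:

```python
# Sketch: zero out low-frequency components of a (B, C, H, W) guidance tensor.
import torch

def highpass_truncate(guidance, cutoff_frac=0.15):
    B, C, H, W = guidance.shape
    spec = torch.fft.fftshift(torch.fft.fft2(guidance), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.arange(H) - H // 2, torch.arange(W) - W // 2, indexing="ij"
    )
    radius = torch.sqrt(yy.float() ** 2 + xx.float() ** 2)
    keep = (radius > cutoff_frac * min(H, W)).to(spec.device)  # high-pass mask
    spec = spec * keep
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real
```

Applying this to the guidance term at early timesteps removes exactly the low-frequency signal that the abstract identifies as the source of over-broad edits.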

TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

Recent advances in text-to-image (T2I) generation have seen a shift from adapting text to fixed backgrounds to creating images around text. Traditional approaches are often limited to generating layouts within static images for effective text placement. Our proposed approach, TextCenGen, introduces dynamic adaptation of the blank region for text-friendly image generation, emphasizing text-centric design and visual harmony. Our method employs force-directed attention guidance in T2I models to generate images that strategically reserve whitespace for pre-defined text areas, even for text or icons placed at the golden ratio. Observing how cross-attention maps affect object placement, we detect and repel conflicting objects using a force-directed graph approach, combined with a spatial excluding cross-attention constraint that keeps attention smooth in whitespace areas. On this novel graphic-design task, experiments indicate that TextCenGen outperforms existing methods with......
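
A minimal sketch of the repulsion idea: measure how much of a token's cross-attention mass falls inside the reserved text box and penalize it, so that gradient-based guidance pushes the conflicting object away. Implementing the force as a differentiable loss on the attention map is an illustrative simplification of the paper's force-directed graph formulation:

```python
# Sketch: penalize attention mass that overlaps the reserved text region.
import torch

def repulsion_loss(attn_map, text_box):
    """attn_map: (H, W) cross-attention for one token; text_box: (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = text_box
    overlap = attn_map[y0:y1, x0:x1].sum()      # attention mass inside the text box
    return overlap / (attn_map.sum() + 1e-8)    # fraction to be pushed out
```

At each denoising step, the gradient of this loss with respect to the latents supplies the repulsive "force" that relocates the object and leaves the box as whitespace.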

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we examine the effectiveness of prompting a transformer-decoder with learned visual prompts for the generalized few-shot segmentation (GFSS) task. Our goal is to achieve strong performance not only on novel categories with limited examples, but also to retain performance on base categories. We propose an approach to learn visual prompts with limited examples. These learned visual prompts are used to prompt a multi-scale transformer decoder to facilitate accurate dense predictions. Additionally, we introduce a unidirectional causal attention mechanism between the novel prompts, learned with ......
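
To illustrate what prompting a transformer decoder with learned visual prompts can look like, here is a minimal single-scale sketch; the class-wise prompt tokens, decoder depth, and dot-product classification head are illustrative assumptions rather than the paper's exact architecture:

```python
# Sketch: learnable per-class prompts act as decoder queries; their output
# embeddings classify each pixel feature by dot product.
import torch
import torch.nn as nn

class PromptedDecoder(nn.Module):
    def __init__(self, num_classes, dim=256, depth=3):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_classes, dim) * 0.02)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)

    def forward(self, pixel_feats):              # (B, H*W, dim) encoder features
        B = pixel_feats.shape[0]
        queries = self.prompts.unsqueeze(0).expand(B, -1, -1)
        cls_emb = self.decoder(queries, pixel_feats)         # (B, num_classes, dim)
        return torch.einsum("bnd,bqd->bnq", pixel_feats, cls_emb)  # per-pixel logits
```

Only the prompt parameters need to be learned from the few novel-class examples; the decoder and encoder can stay frozen, which is what preserves base-category performance.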

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Parametric memory pools and model editing are only partial solutions. Retrieval Augmented Generation (RAG) – though non-parametric – has its own limitations: it lacks structure, complicates interpretability and makes it hard to effectively manage stored knowledge. In this paper, we introduce MemLLM, a novel method of enhancing LLMs by integrating a structured and explicit read-and-write memory module. MemLLM tackles the aforementioned challenges by enabling dynamic interaction with the memory and improving the LLM's capabilities in using stored knowledge. Our......
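
A minimal sketch of an explicit read-write memory exposed to an LLM as structured commands; the triple store and the MEM_READ/MEM_WRITE surface syntax are illustrative assumptions, since the exact API format MemLLM finetunes the model to emit may differ:

```python
# Sketch: a structured (subject, relation, object) memory with read/write calls.
from collections import defaultdict

class TripleMemory:
    def __init__(self):
        self.store = defaultdict(set)           # subject -> {(relation, object)}

    def write(self, subj, rel, obj):
        self.store[subj].add((rel, obj))

    def read(self, subj, rel=None):
        facts = self.store.get(subj, set())
        return [(r, o) for r, o in facts if rel is None or r == rel]

memory = TripleMemory()
memory.write("Marie Curie", "born_in", "Warsaw")
# At generation time the model emits e.g. MEM_READ(Marie Curie, born_in)
# (hypothetical syntax); the runtime substitutes the lookup result and
# decoding continues with the retrieved fact in context.
print(memory.read("Marie Curie", "born_in"))    # [('born_in', 'Warsaw')]
```

Because the store is explicit and structured, facts can be inspected, updated, or deleted directly, which is the interpretability advantage the abstract claims over parametric memorization and flat RAG.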

SNP: Structured Neuron-level Pruning to Preserve Attention Scores

Multi-head self-attention (MSA) is a key component of Vision Transformers (ViTs), which have achieved great success in various vision tasks. However, their high computational cost and memory footprint hinder their deployment on resource-constrained devices. Conventional pruning approaches can only compress and accelerate the MSA module using head pruning, although the head is not an atomic unit. To address this issue, we propose a novel graph-aware neuron-level pruning method, Structured Neuron-level Pruning (SNP). SNP prunes neurons with less informative attention scores and eliminates redundancy among heads. Specifically, it prunes graphically connected query and key layers having the least informative attention scores while preserving the overall attention scores. Value layers, which can be pruned independently, are pruned to eliminate inter-head redundancy. Our proposed method effectively compresses and accelerates Transformer-based models for both edge devices and server......
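
A minimal sketch of neuron-level pruning of paired query/key dimensions: because an attention logit is a dot product over the shared inner dimension of Q and K, both projections must be pruned at the same indices, and the kept indices should be those contributing most to the logits. The specific importance proxy below is an illustrative assumption, and per-head reshaping is ignored for brevity:

```python
# Sketch: prune query/key projections at matching inner indices so Q @ K^T
# stays well-defined, keeping the dimensions that dominate attention logits.
import torch

@torch.no_grad()
def prune_qk_pair(q_proj, k_proj, x, keep_ratio=0.5):
    """q_proj, k_proj: nn.Linear sharing an inner dim; x: (B, N, d_model) probe."""
    Q, K = q_proj(x), k_proj(x)                        # (B, N, d_inner)
    # Proxy contribution of inner dim d to logits: sum over tokens of |Q_d||K_d|.
    contrib = torch.einsum("bnd,bmd->d", Q.abs(), K.abs())
    keep = contrib.topk(int(keep_ratio * contrib.numel())).indices.sort().values
    for lin in (q_proj, k_proj):                       # drop the same rows in both
        lin.weight.data = lin.weight.data[keep]
        if lin.bias is not None:
            lin.bias.data = lin.bias.data[keep]
    return keep
```

Value projections have no such pairing constraint, which is why the abstract notes they can be pruned independently to remove inter-head redundancy.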

