<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Thus Spoke Zachary</title>
    <link>https://gongshangzheng.github.io</link>
    <description>Loving the sun</description>
    <language>zh-CN</language>
    <managingEditor>Xinyu ZHENG</managingEditor>
    <webMaster>Xinyu ZHENG</webMaster>
    <lastBuildDate>Sat, 20 Jun 2026 06:04:35 GMT</lastBuildDate>
    <generator>gongshangzheng.github.io</generator>
    <atom:link href="https://gongshangzheng.github.io/feed.xml" rel="self" type="application/rss+xml"/>
  <item>
    <title>Vision Model Paper Deep Read (1): Continuous-tone Simple Points, Continuous-Valued Simple Points and Variational Topology Preservation</title>
    <link>https://gongshangzheng.github.io/continuous-tone-simple-points-2026.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/continuous-tone-simple-points-2026.html</guid>
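    <description>arXiv:2604.28159, April 2026. Generalizes simple points from binary images to continuous-valued images and moves the topology guarantee from the loss level up to the variational inference level.</description>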
    <pubDate>Sat, 20 Jun 2026 13:15:54 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Simple Point</category>
    <category>Digital Topology</category>
    <category>Topology-Aware Segmentation</category>
    <category>Variational Inference</category>
    <category>SAM2</category>
  </item>
  <item>
    <title>Infrared Image Compression Paper Deep Read (8): SPIRE, a Single-Point-Supervision-Guided Encoder for Infrared Small Target Detection</title>
    <link>https://gongshangzheng.github.io/spire-irstd-2026.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/spire-irstd-2026.html</guid>
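    <description>An in-depth look at how SPIRE recasts infrared small target detection from pixel-level segmentation into centroid probability regression, using PRPS physical-prior supervision and an encoder-only architecture to reach competitive detection performance and a very low false-alarm rate with only 0.29M parameters.</description>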
    <pubDate>Sat, 20 Jun 2026 00:32:42 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared</category>
    <category>Object Detection</category>
    <category>Single-Point Supervision</category>
    <category>encoder-only</category>
    <category>IRSTD</category>
  </item>
  <item>
    <title>Infrared Image Compression Paper Deep Read (7): RPCASSM, Infrared Small Target Detection with a Robust PCA State Space Model</title>
    <link>https://gongshangzheng.github.io/rpcassm-2026.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/rpcassm-2026.html</guid>
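    <description>An in-depth look at how RPCASSM fuses the low-rank plus sparse decomposition paradigm of robust PCA with a dual-branch state space model, reaching SOTA infrared small target detection with only 0.45M parameters.</description>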
    <pubDate>Thu, 18 Jun 2026 13:49:10 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared</category>
    <category>Object Detection</category>
    <category>RPCA</category>
    <category>State Space Model</category>
    <category>SSM</category>
  </item>
  <item>
    <title>Infrared Image Compression Paper Deep Read (6): LoHGNet, Lorentzian Geometric Encoding and High-Order Relation Learning</title>
    <link>https://gongshangzheng.github.io/lohgnet-2026.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/lohgnet-2026.html</guid>
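    <description>An in-depth look at arXiv 2605.07213 LoHGNet: it brings Lorentzian hyperbolic space to infrared small target detection, using GA-LRCM for hierarchical geometric encoding and the HORL hypergraph for high-order relation propagation, with SOTA IoU on all three datasets.</description>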
    <pubDate>Thu, 18 Jun 2026 13:43:01 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared Images</category>
    <category>Image Compression</category>
    <category>IRSTD</category>
    <category>Lorentz Space</category>
    <category>Hypergraph Neural Network</category>
    <category>Small Target Detection</category>
  </item>
  <item>
    <title>Digital Human Paper Deep Read (29): MEAD, How a Large-Scale Emotional Audio-Visual Dataset Moves Talking Face from Lip Motion to Emotional Expression</title>
    <link>https://gongshangzheng.github.io/mead-2020.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/mead-2020.html</guid>
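    <description>An in-depth look at the ECCV 2020 paper MEAD: a large-scale, controlled emotional audio-visual dataset with 60 actors, 8 emotions, 3 intensity levels, 7 viewpoints, and 280,000 clips, plus the three-module architecture, eight-term joint loss, and experimental evaluation of its emotion-controllable talking face generation baseline.</description>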
    <pubDate>Thu, 18 Jun 2026 11:38:14 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>MEAD</category>
    <category>Talking Face</category>
    <category>Emotion Generation</category>
    <category>Dataset</category>
    <category>ECCV 2020</category>
    <category>SenseTime</category>
    <category>Audio-Visual</category>
    <category>Expression Control</category>
    <category>Baseline</category>
    <category>FID</category>
  </item>
  <item>
    <title>Digital Human Paper Deep Read (28): UniLS, the First End-to-End Audio-Driven Unified Speaking-Listening Digital Human</title>
    <link>https://gongshangzheng.github.io/unils-2024.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/unils-2024.html</guid>
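    <description>An in-depth look at the CVPR 2026 paper UniLS: a two-stage training paradigm (audio-free prior learning followed by dual-track audio fine-tuning) addresses the listening stiffness problem and achieves, for the first time, end-to-end unified speaking-listening facial animation generation driven purely by audio. The paper reports listener distribution metrics improved by up to 44.1% (F-FID alone drops from 13.143 to 4.304, a 67.3% reduction) and real-time inference at 560.6 FPS.</description>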
    <pubDate>Thu, 18 Jun 2026 11:10:14 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Digital Human</category>
    <category>UniLS</category>
    <category>talking face</category>
    <category>listening stiffness</category>
    <category>CVPR 2026</category>
  </item>
  <item>
    <title>Digital Human Paper Deep Read (27): DSL-FIQA, Dual-Set Degradation Learning and Landmark-Guided Face Image Quality Assessment</title>
    <link>https://gongshangzheng.github.io/dsl-fiqa-2024.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/dsl-fiqa-2024.html</guid>
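    <description>An in-depth look at the CVPR 2024 paper DSL-FIQA: how Dual-Set Degradation Learning decouples degradation from content, how the Landmark-Guided Transformer focuses on salient facial regions, and how the balanced CGFIQA-40k dataset addresses skin-tone and gender bias.</description>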
    <pubDate>Thu, 18 Jun 2026 11:09:41 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>FIQA</category>
    <category>Face Quality Assessment</category>
    <category>FaceIQA</category>
    <category>Degradation Learning</category>
    <category>Transformer</category>
    <category>Digital Human</category>
  </item>
  <item>
    <title>Digital Human Paper Deep Read (26): ARTalk, a Multi-Scale Autoregressive Revolution in Real-Time Speech-Driven 3D Head Animation</title>
    <link>https://gongshangzheng.github.io/artalk-2025.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/artalk-2025.html</guid>
    <description>SIGGRAPH Asia 2025 · The University of Tokyo MI Lab · The first speech-driven 3D facial animation to achieve style + pose + real-time simultaneously</description>
    <pubDate>Thu, 18 Jun 2026 09:53:20 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>3D talking head</category>
    <category>autoregressive</category>
    <category>VQ-VAE</category>
    <category>real-time</category>
    <category>FLAME</category>
  </item>
  <item>
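    <title>Infrared Image Compression Paper Deep Read (4): AnyThermal, Building a General Thermal Infrared Feature Backbone via Cross-Modal Distillation</title>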
    <link>https://gongshangzheng.github.io/paper-anythermal.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/paper-anythermal.html</guid>
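    <description>An in-depth look at the arXiv 2602.06203 AnyThermal paper: CLS-token contrastive distillation from DINOv2 builds a task-agnostic general thermal infrared encoder that reaches SOTA on semantic segmentation, place recognition, and depth estimation, offering infrared compression a perceptual-loss substitute and a feature-coding target.</description>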
    <pubDate>Wed, 17 Jun 2026 21:00:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared Images</category>
    <category>Image Compression</category>
    <category>Foundation Models</category>
    <category>Knowledge Distillation</category>
    <category>DINOv2</category>
    <category>CLS-token</category>
    <category>Thermal Infrared Perception</category>
  </item>
  <item>
    <title>Infrared Image Compression Paper Deep Read (5): CI-ICM, Channel-Importance-Driven Image Coding for Machines</title>
    <link>https://gongshangzheng.github.io/paper-ciicm.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/paper-ciicm.html</guid>
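    <description>An in-depth look at the arXiv 2604.05347 CI-ICM paper: it finds that channel importance is non-uniformly distributed in the latent space of learned compression, proposes FCGS non-uniform grouping and CI-CTX sequential entropy coding, achieves BD-mAP gains of +16.25% and +13.72% on COCO detection and segmentation respectively, and offers a paradigm for identifying key channels for infrared contours.</description>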
    <pubDate>Wed, 17 Jun 2026 21:00:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared Images</category>
    <category>Image Compression</category>
    <category>ICM</category>
    <category>Channel Importance</category>
    <category>FCGS</category>
    <category>Entropy Coding</category>
    <category>Task-Driven Compression</category>
  </item>
  <item>
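    <title>Infrared Image Compression Paper Deep Read (3): FreqKD, Frequency-Decoupled Distillation Reveals the High/Low-Frequency Divergence of Infrared Images</title>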
    <link>https://gongshangzheng.github.io/paper-freqkd.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/paper-freqkd.html</guid>
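    <description>An in-depth look at the arXiv 2606.11572 FreqKD paper: frequency-decoupled knowledge distillation reveals a 2.42× divergence gap between RGB and infrared images in the high-frequency band, motivating an asymmetric bitrate allocation principle and giving infrared contour compression a theoretical basis in frequency-domain decomposition.</description>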
    <pubDate>Wed, 17 Jun 2026 21:00:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared Images</category>
    <category>Image Compression</category>
    <category>Knowledge Distillation</category>
    <category>Frequency-Domain Decomposition</category>
    <category>Asymmetric Bitrate</category>
    <category>DINOv2</category>
    <category>LoRA</category>
  </item>
  <item>
    <title>Infrared Image Compression Series (2): Learned Compression, Multimodal Joint Compression, and Task-Driven Evaluation</title>
    <link>https://gongshangzheng.github.io/infrared-learned-compression.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/infrared-learned-compression.html</guid>
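    <description>Part two of the infrared image compression series: the limits of learned compression in infrared scenes and how to adapt it, RGB-IR joint compression, task-driven compression, a layered evaluation framework, and a research roadmap for small-target-aware compression.</description>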
    <pubDate>Wed, 17 Jun 2026 20:30:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared Images</category>
    <category>Image Compression</category>
    <category>Learned Compression</category>
    <category>Multimodal</category>
    <category>Task-Driven</category>
    <category>Thermal Imaging</category>
  </item>
  <item>
    <title>Infrared Image Compression Series Overview: A Complete Research Roadmap from Thermal Radiation Imaging to Contour Coding</title>
    <link>https://gongshangzheng.github.io/infrared-compression-hub.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/infrared-compression-hub.html</guid>
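    <description>A reader's guide to the infrared image compression series. It covers imaging principles, traditional coding, learned compression, multimodal joint compression, edge/contour compression, and frontier CV methods, and includes the paper deep-read series.</description>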
    <pubDate>Wed, 17 Jun 2026 20:00:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared Images</category>
    <category>Image Compression</category>
    <category>Thermal Imaging</category>
    <category>Learned Compression</category>
    <category>Contour Coding</category>
    <category>survey</category>
  </item>
  <item>
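    <title>Infrared Image Compression Series (4): Lessons from Frontier CV Methods in Foundation Models, Frequency-Domain Decomposition, Sparse Representation, and Diffusion Restoration</title>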
    <link>https://gongshangzheng.github.io/ir-contour-cv-frontiers.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/ir-contour-cv-frontiers.html</guid>
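    <description>Distills five new directions from 15 frontier CV papers (infrared perception foundation models, frequency-domain decomposition, sparse structural representation, post-compression diffusion restoration, and task-driven semantic compression) to bring fresh ideas to infrared contour image compression.</description>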
    <pubDate>Wed, 17 Jun 2026 18:30:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
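    <category>Infrared Images</category>
    <category>Image Compression</category>
    <category>Foundation Models</category>
    <category>Frequency-Domain Decomposition</category>
    <category>Sparse Representation</category>
    <category>Diffusion Models</category>
    <category>Task-Driven Compression</category>
    <category>Infrared Contours</category>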
  </item>
  <item>
    <title>Digital Human Paper Deep Read (25): Flow-Guided One-Shot Talking Face, a Breakthrough in Replacing Sparse Keypoints with Dense Optical Flow</title>
    <link>https://gongshangzheng.github.io/flow-guided-talking-2021.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/flow-guided-talking-2021.html</guid>
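    <description>An in-depth look at the CVPR 2021 paper: it generates dense flow from a 3DMM parameterization and pairs it with a flow-guided video generator to achieve, for the first time, 512×512 high-resolution one-shot talking face generation, and it open-sources the HDTF dataset, which became a standard in the field.</description>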
    <pubDate>Wed, 17 Jun 2026 17:57:14 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>talking face</category>
    <category>flow-guided</category>
    <category>3DMM</category>
    <category>HDTF</category>
    <category>CVPR 2021</category>
  </item>
  <item>
    <title>VFHQ In Depth: The Data Foundation for High-Quality Video Face Super-Resolution</title>
    <link>https://gongshangzheng.github.io/vfhq-2022.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/vfhq-2022.html</guid>
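    <description>CVPR 2022 · Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences + Tencent ARC Lab · VFHQ, the first high-quality video face dataset, and its 5-stage automated construction pipeline</description>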
    <pubDate>Wed, 17 Jun 2026 17:52:45 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Super-Resolution</category>
    <category>Face Restoration</category>
    <category>Video Processing</category>
    <category>Dataset</category>
    <category>GAN</category>
  </item>
  <item>
    <title>Digital Human Engineering Walkthrough (9): HDTF Source Code, Flow-Guided One-Shot High-Resolution Talking Face Generation</title>
    <link>https://gongshangzheng.github.io/hdtf-source-code-analysis.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/hdtf-source-code-analysis.html</guid>
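    <description>An in-depth analysis of the MRzzm/HDTF repository: the CVPR 2021 flow-guided one-shot talking face generation system, covering construction of the F_app approximate dense flow, the feature-level warping + matting blending strategy, foreground matting correction, the full VideoGenerator call chain, and the HDTF high-resolution dataset.</description>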
    <pubDate>Wed, 17 Jun 2026 15:00:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>HDTF</category>
    <category>talking-face</category>
    <category>one-shot</category>
    <category>dense-flow</category>
    <category>CVPR-2021</category>
    <category>Source Code Walkthrough</category>
    <category>digital-human</category>
    <category>foreground-matting</category>
  </item>
  <item>
    <title>Daily arXiv Paper Digest · 2026-06-17</title>
    <link>https://gongshangzheng.github.io/arxiv-digest-2026-06-17.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/arxiv-digest-2026-06-17.html</guid>
    <description>Automatically tracked daily arXiv papers on diffusion, autoregressive models, image compression, 1D visual tokenizers, and diffusion visual encoders.</description>
    <pubDate>Wed, 17 Jun 2026 10:00:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>arXiv</category>
    <category>Papers</category>
    <category>AI</category>
    <category>Visual Encoders</category>
  </item>
  <item>
    <title>Infrared Image Compression Paper Deep Read (1): Huf-RLC, Patching Huffman's Weakness in Infrared Line-Scan Image Compression with Zero Run-Lengths</title>
    <link>https://gongshangzheng.github.io/huf-rlc-2025.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/huf-rlc-2025.html</guid>
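    <description>Thoroughly rewritten edition: from DWT, DUSQ, DPCM, and the wavelet-coefficient probability model to Run-Length-Enhanced Huffman Coding, a formula-by-formula, module-by-module breakdown of how Huf-RLC makes infrared compression more than 3× faster while staying close to JPEG2000 quality.</description>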
    <pubDate>Tue, 16 Jun 2026 22:09:58 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared Images</category>
    <category>Image Compression</category>
    <category>DWT</category>
    <category>Huffman</category>
    <category>JPEG2000</category>
    <category>Run-Length Coding</category>
  </item>
  <item>
    <title>Infrared Image Compression Series (1): Imaging Principles, Datasets, and Traditional Coding Baselines</title>
    <link>https://gongshangzheng.github.io/infrared-image-compression-survey.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/infrared-image-compression-survey.html</guid>
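    <description>Part one of the infrared image compression series: from the physical properties of thermal radiation imaging and typical datasets (FLIR/KAIST/LLVIP) to traditional codecs (JPEG/JPEG-LS/JPEG2000/HEVC) and wavelet-domain infrared statistics.</description>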
    <pubDate>Tue, 16 Jun 2026 21:49:23 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Infrared Images</category>
    <category>Image Compression</category>
    <category>Thermal Imaging</category>
    <category>Traditional Coding</category>
    <category>Wavelets</category>
    <category>JPEG2000</category>
  </item>
  <item>
    <title>Digital Human Engineering Walkthrough (4): Real-Time Digital Human Model Inference Benchmark Roundup</title>
    <link>https://gongshangzheng.github.io/digital-human-engineering-benchmark.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/digital-human-engineering-benchmark.html</guid>
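    <description>A roundup of real-time digital human model inference data measured in the CyberVerse and OpenAvatarChat projects: model name, demo video, inference FPS, GPU memory usage, and the GPU memory consumption percentage reported by nvidia-smi. Continuously updated.</description>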
    <pubDate>Tue, 16 Jun 2026 16:26:19 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Digital Human</category>
    <category>Benchmark</category>
    <category>Inference Speed</category>
    <category>GPU Memory</category>
    <category>LiteAvatar</category>
    <category>FlashHead</category>
    <category>CyberVerse</category>
    <category>OpenAvatarChat</category>
  </item>
  <item>
    <title>Digital Human Engineering Walkthrough (3): OpenAvatarChat Source Code, Modular Design and WebRTC Deployment of the LiteAvatar Real-Time Digital Human</title>
    <link>https://gongshangzheng.github.io/open-avatar-chat-liteavatar.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/open-avatar-chat-liteavatar.html</guid>
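    <description>An in-depth look at the OpenAvatarChat repository: the Handler Pipeline architecture of the LiteAvatar real-time digital human, the RTC streaming data path, coturn TURN server deployment, and a detailed guide to port configuration (3478/5349/49152+).</description>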
    <pubDate>Tue, 16 Jun 2026 14:56:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Digital Human</category>
    <category>OpenAvatarChat</category>
    <category>LiteAvatar</category>
    <category>WebRTC</category>
    <category>TURN</category>
    <category>coturn</category>
    <category>fastrtc</category>
    <category>Digital Human Deployment</category>
    <category>Port Configuration</category>
    <category>Handler Pipeline</category>
  </item>
  <item>
    <title>Daily arXiv Paper Digest · 2026-06-16</title>
    <link>https://gongshangzheng.github.io/arxiv-digest-2026-06-16.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/arxiv-digest-2026-06-16.html</guid>
    <description>Automatically tracked daily arXiv papers on diffusion, autoregressive models, image compression, 1D visual tokenizers, and diffusion visual encoders.</description>
    <pubDate>Tue, 16 Jun 2026 10:00:00 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>arXiv</category>
    <category>Papers</category>
    <category>AI</category>
    <category>Visual Encoders</category>
  </item>
  <item>
    <title>Infrared Image Compression Paper Deep Read (2): SA-ICM, Training an Image Compressor for Machines with SAM Edge Information</title>
    <link>https://gongshangzheng.github.io/sa-icm-2024.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/sa-icm-2024.html</guid>
    <description>A close reading of the ICIP 2024 paper Image Coding for Machines with Edge Information Learning Using Segment Anything: from the three ICM approaches, SAM edge masks, and the SA-ICM/SA-NeRV loss functions to experiments on COCO, VisDrone, Cityscapes, and SFU-HW-Objects.</description>
    <pubDate>Mon, 15 Jun 2026 22:53:14 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>SA-ICM</category>
    <category>Image Coding for Machines</category>
    <category>Edge Image Compression</category>
    <category>Segment Anything</category>
    <category>NeRV</category>
  </item>
  <item>
    <title>Digital Human Paper Deep Read (24): UIKA, a Feed-Forward Universal Head Avatar from Any Number of Pose-Free Images</title>
    <link>https://gongshangzheng.github.io/uika-2026.html</link>
    <guid isPermaLink="true">https://gongshangzheng.github.io/uika-2026.html</guid>
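    <description>An in-depth look at the CVPR 2026 Highlight paper UIKA: how, from any number of images without camera calibration, UV-guided explicit correspondence modeling and dual-stream attention generate a real-time drivable 3D Gaussian head avatar in a single feed-forward pass.</description>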
    <pubDate>Mon, 15 Jun 2026 16:11:09 GMT</pubDate>
    <author>Xinyu ZHENG</author>
    <category>Digital Human</category>
    <category>UIKA</category>
    <category>Paper Deep Read</category>
    <category>3D Gaussian Splatting</category>
    <category>Head Avatar</category>
  </item>
  </channel>
</rss>