<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LiDAR | The personal website of Shuai Zhang</title><link>https://www.shuaizhang-hkust.cn/tags/lidar/</link><atom:link href="https://www.shuaizhang-hkust.cn/tags/lidar/index.xml" rel="self" type="application/rss+xml"/><description>LiDAR</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Fri, 08 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://www.shuaizhang-hkust.cn/media/icon_hu_1e527f1804e84236.png</url><title>LiDAR</title><link>https://www.shuaizhang-hkust.cn/tags/lidar/</link></image><item><title>UniD-Shift: Towards Unified Semantic Segmentation via Interpretable Share-Private Multimodal Decomposition</title><link>https://www.shuaizhang-hkust.cn/publications/preprint/unid-shift/</link><pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate><guid>https://www.shuaizhang-hkust.cn/publications/preprint/unid-shift/</guid><description>&lt;h2 id="highlights"&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unified Multimodal Framework&lt;/strong&gt;: Joint 2D-3D semantic segmentation achieving high single-domain accuracy and strong cross-domain generalization through structured feature interaction.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shared-Private Decomposition&lt;/strong&gt;: Explicit disentanglement of modality-invariant and modality-specific representations, improving semantic alignment and interpretability.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SAM + SPTNet Dual-Branch Design&lt;/strong&gt;: Integrates a SAM-based vision encoder with a sparse convolution-transformer backbone to combine semantic richness and geometric precision (a schematic sketch follows this list).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SOTA Performance&lt;/strong&gt;: 81.0% mIoU on nuScenes validation, 81.2% on nuScenes test, and 71.8% on SemanticKITTI, outperforming 2DPASS, CSFNet, and other multimodal fusion baselines.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cross-Domain Robustness&lt;/strong&gt;: 74.5% mIoU on nuScenes USA→Singapore cross-domain benchmark, demonstrating strong generalization under distribution shifts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Interpretable Fusion&lt;/strong&gt;: Shared Attention Fusion (SAF), combined with Gram alignment and decorrelation regularization, ensures stable optimization and meaningful feature separation.&lt;/li&gt;
&lt;/ul&gt;
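&lt;p&gt;Before the fuller method overview below, the following is a minimal, self-contained PyTorch schematic of this dual-branch layout. The two encoder modules are plain stand-ins for the SAM image encoder and the SPTNet point backbone (their interfaces and dimensions are assumptions of this sketch, not the released code).&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Minimal schematic of the dual-branch layout: a 2D image branch and a 3D
# point branch each produce per-token features that later stages decompose
# and fuse. The two encoders below are toy stand-ins, not SAM or SPTNet.
import torch
import torch.nn as nn


class Toy2DBranch(nn.Module):
    """Stand-in for the SAM-based image encoder (per-pixel features)."""

    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, kernel_size=3, padding=1)

    def forward(self, images):
        feat = self.conv(images)                 # (B, C, H, W)
        return feat.flatten(2).transpose(1, 2)   # (B, H*W, C) token layout


class Toy3DBranch(nn.Module):
    """Stand-in for the SPTNet sparse-conv/transformer point backbone."""

    def __init__(self, dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, points_xyz):
        return self.mlp(points_xyz)              # (B, N_points, C)


images = torch.randn(2, 3, 64, 96)      # toy RGB batch
points = torch.randn(2, 4096, 3)        # toy LiDAR coordinates
feat_2d = Toy2DBranch()(images)         # per-pixel features from the 2D branch
feat_3d = Toy3DBranch()(points)         # per-point features from the 3D branch
&lt;/code&gt;&lt;/pre&gt;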
&lt;h2 id="method-overview"&gt;Method Overview&lt;/h2&gt;
&lt;p&gt;The proposed UniD-Shift framework adopts a dual-branch architecture:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;3D Branch&lt;/strong&gt;: An SPTNet backbone (sparse convolution + transformer) extracts hierarchical geometric features from LiDAR point clouds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;2D Branch&lt;/strong&gt;: A SAM-based vision encoder provides semantically rich visual representations from RGB images.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shared-Private Decomposition&lt;/strong&gt;: Features from both modalities are decomposed into shared (modality-invariant semantics) and private (modality-specific) components.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shared Attention Fusion (SAF)&lt;/strong&gt;: The shared components are fused via cross-attention, with the 3D branch providing the queries (3D→2D), to produce a consistent multimodal representation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Regularized Training&lt;/strong&gt;: Gram matrix alignment encourages consistency between the shared representations, while a decorrelation loss promotes independence between the shared and private subspaces (see the sketch after this list).&lt;/li&gt;
&lt;/ol&gt;
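&lt;p&gt;As a rough illustration of steps 3–5, the sketch below shows one way the shared-private decomposition, Shared Attention Fusion, and the two regularizers could be wired up in PyTorch. The projections, attention configuration, and loss forms are assumptions made for this sketch (for example, channel-wise Gram matrices are used so the 3D and 2D token counts need not match); it is not the authors' implementation.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;# Illustrative sketch of the shared-private decomposition, the Shared
# Attention Fusion (SAF) step, and the Gram-alignment / decorrelation
# regularizers described above. Dimensions, projections, and loss forms are
# assumptions for this sketch, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedPrivateDecomposition(nn.Module):
    """Split one modality's features into shared and private components."""

    def __init__(self, dim):
        super().__init__()
        self.to_shared = nn.Linear(dim, dim)   # modality-invariant subspace
        self.to_private = nn.Linear(dim, dim)  # modality-specific subspace

    def forward(self, x):
        return self.to_shared(x), self.to_private(x)


class SharedAttentionFusion(nn.Module):
    """Cross-attention over shared components (3D features act as queries)."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, shared_3d, shared_2d):
        # shared_3d: (B, N_points, C) queries; shared_2d: (B, N_pixels, C) keys/values.
        fused, _ = self.attn(shared_3d, shared_2d, shared_2d)
        return fused


def gram_alignment_loss(shared_3d, shared_2d):
    # Channel-wise Gram matrices are independent of token count, so the 3D and
    # 2D shared features can be compared even when N_points differs from N_pixels.
    g3 = torch.matmul(shared_3d.transpose(1, 2), shared_3d) / shared_3d.shape[1]
    g2 = torch.matmul(shared_2d.transpose(1, 2), shared_2d) / shared_2d.shape[1]
    return F.mse_loss(g3, g2)


def decorrelation_loss(shared, private):
    # Penalize cross-correlation between the shared and private subspaces of
    # one modality, pushing them toward (approximate) independence.
    s = shared - shared.mean(dim=1, keepdim=True)
    p = private - private.mean(dim=1, keepdim=True)
    cross = torch.matmul(s.transpose(1, 2), p) / s.shape[1]
    return cross.pow(2).mean()


# Toy usage with random stand-ins for the SPTNet (3D) and SAM (2D) features.
B, n_pts, n_pix, C = 2, 4096, 1024, 64
feat_3d = torch.randn(B, n_pts, C)   # stand-in for SPTNet point features
feat_2d = torch.randn(B, n_pix, C)   # stand-in for SAM image features

decomp_3d = SharedPrivateDecomposition(C)
decomp_2d = SharedPrivateDecomposition(C)
saf = SharedAttentionFusion(C)

s3, p3 = decomp_3d(feat_3d)
s2, p2 = decomp_2d(feat_2d)
fused = saf(s3, s2)                  # consistent per-point multimodal features

reg = gram_alignment_loss(s3, s2) + decorrelation_loss(s3, p3) + decorrelation_loss(s2, p2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this sketch only the shared components enter the fusion step, while each modality keeps its private component, mirroring the separation that steps 3 and 4 above describe.&lt;/p&gt;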
&lt;h2 id="key-results"&gt;Key Results&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;UniD-Shift&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;nuScenes Validation&lt;/td&gt;
&lt;td&gt;mIoU (%)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;81.0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nuScenes Test&lt;/td&gt;
&lt;td&gt;mIoU (%)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;81.2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SemanticKITTI Validation&lt;/td&gt;
&lt;td&gt;mIoU (%)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;71.8&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nuScenes USA→Singapore&lt;/td&gt;
&lt;td&gt;mIoU (%)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;74.5&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The framework achieves competitive computational efficiency with a &lt;strong&gt;240 ms&lt;/strong&gt; inference latency on SemanticKITTI, demonstrating a practical balance between accuracy and speed.&lt;/p&gt;
&lt;h2 id="citation"&gt;Citation&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bibtex" data-lang="bibtex"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nc"&gt;@misc&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;zhang2026unidshift&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{UniD-Shift: Towards Unified Semantic Segmentation via Interpretable Share-Private Multimodal Decomposition}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{Shuai Zhang and Zhecheng Shi and Zhuxiao Li and Jing Ou and Tengxi Wang and Yuan Liu and Wufan Zhao}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="na"&gt;year&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{2026}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="na"&gt;eprint&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{2605.07356}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="na"&gt;archivePrefix&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{arXiv}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="na"&gt;primaryClass&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{cs.CV}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{https://arxiv.org/abs/2605.07356}&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description></item></channel></rss>