Artificial Intelligence | ShowMeAI News Daily #2022.06.22
The ShowMeAI Daily series has been fully upgraded! It covers AI topics across Tools & Frameworks | Projects & Code | Blogs & Talks | Data & Resources | Research & Papers. Click the article history list to browse past issues, subscribe to the topic #ShowMeAI资讯日报 inside the official account to receive the daily push, click the topic collections & e-monthly to browse each series in full, or click here and reply with the keyword 日报 to get the free AI e-monthly and resource pack.
1. Tools & Frameworks
Tool: Unclutter - Immersive Reading Mode, a browser extension that strips distractions from web articles so you can focus on reading
‘Unclutter - Immersive Reading Mode - A reader mode browser extension to remove distractions from web articles.’ by lindylearn
GitHub: https://github.com/lindylearn/unclutter
Library: scikit-opt - a swarm intelligence algorithm library in pure Python
It includes many algorithms (differential evolution, genetic algorithm, particle swarm optimization, simulated annealing, ant colony, artificial fish swarm, and immune optimization). It is lightweight, easy to deploy, and supports GPU computation; a minimal usage sketch follows the link below.
GitHub: https://github.com/guofei9987/scikit-opt
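A minimal usage sketch, assuming the `sko.PSO` interface shown in the project README; argument names such as `pop`, `w`, `c1`, `c2` should be checked against the installed version:

```python
# Minimal sketch: minimize a toy objective with scikit-opt's particle swarm optimizer.
# Assumes `pip install scikit-opt` and the sko.PSO interface documented in the repo README.
from sko.PSO import PSO

def sphere(x):
    # Simple convex objective: sum of squares, minimum at the origin.
    return sum(xi ** 2 for xi in x)

pso = PSO(func=sphere, n_dim=3, pop=40, max_iter=150,
          lb=[-1, -1, -1], ub=[1, 1, 1], w=0.8, c1=0.5, c2=0.5)
pso.run()
print('best x:', pso.gbest_x, 'best objective:', pso.gbest_y)
```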
Tool: Hayabusa - a sigma-based Windows event log analysis tool
It helps security analysts find security threats quickly.
GitHub: https://github.com/Yamato-Security/hayabusa
Tool: Gifsicle - a GIF editing tool that runs in the browser
Gifsicle can compress, rotate, and crop GIF images, among other operations.
GitHub: https://github.com/renzhezhilu/gifsicle-wasm-browser
Library: AREkit - a document-level attitude and relation extraction toolkit
‘AREkit - Document level Attitude and Relation Extraction toolkit (AREkit) for mass-media news and analytical articles’ by Nicolay Rusnachenko
GitHub: https://github.com/nicolay-r/AREkit
2. Blogs & Talks
Course: 3D Computer Vision, National University of Singapore
《3D Computer Vision | National University of Singapore - YouTube》
Link: https://www.youtube.com/playlist?list=PLxg0CGqViygP47ERvqHw_v7FVnUovJeaz
Blog: A complete reference for Vim commands, operations, and shortcuts
Link: https://weibo.com/ttarticle/p/show?id=2309404335205144998402
3. Data & Resources
Resource list: an up-to-date list of papers on new trends in 3D vision with deep learning
‘Trending-in-3D-Vision - An on-going paper list on new trends in 3D vision with deep learning’ by Xiaolong
GitHub: https://github.com/dragonlong/Trending-in-3D-Vision
Book: Python Data Science Handbook
A book on data science and its applications. It covers: ① the computing environment data scientists need: IPython and Jupyter; ② NumPy and scientific computing; ③ Pandas and data processing; ④ Matplotlib and data visualization; ⑤ Scikit-Learn and machine learning. A small end-to-end sketch of this stack follows the links below.
English original: https://jakevdp.github.io/PythonDataScienceHandbook/
Unofficial Chinese translation: https://github.com/wangyingsm/Python-Data-Science-Handbook
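As a rough illustration of the stack the book covers, here is a small end-to-end sketch using only standard NumPy / Pandas / Matplotlib / Scikit-Learn calls on synthetic data:

```python
# Sketch of the NumPy / Pandas / Matplotlib / Scikit-Learn workflow the book walks through.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.uniform(0, 10, 200)})
df['y'] = 2.5 * df['x'] + rng.normal(0, 1, 200)        # synthetic linear data with noise

X = df[['x']].values                                   # NumPy view of the Pandas column
model = LinearRegression().fit(X, df['y'])             # Scikit-Learn estimator API: fit, then predict
print('slope:', model.coef_[0], 'intercept:', model.intercept_)

grid = np.linspace(0, 10, 50).reshape(-1, 1)
plt.scatter(df['x'], df['y'], s=8, alpha=0.5)          # Matplotlib for visualization
plt.plot(grid, model.predict(grid), color='red')
plt.xlabel('x'); plt.ylabel('y')
plt.show()
```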
4. Research & Papers
Click here and reply with the keyword 日报 to get the curated June paper collection for free.
Paper: Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Title: Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Date: 16 Jun 2022
Field: Speech
Tasks: Speech Synthesis, Text-To-Speech Synthesis
Paper link: https://arxiv.org/abs/2206.07956
Code: https://github.com/daisyqk/automatic-prosody-annotation
Authors: Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai, Dong Yu
Summary: Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability.
Abstract: Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability. However, the acquisition of prosodic boundary labels relies on manual annotation, which is costly and time-consuming. In this paper, we propose to automatically extract prosodic boundary labels from text-audio data via a neural text-speech model with pre-trained audio encoders. This model is pre-trained on text and speech data separately and jointly fine-tuned on TTS data in a triplet format: {speech, text, prosody}. The experimental results on both automatic evaluation and human evaluation demonstrate that: 1) the proposed text-speech prosody annotation framework significantly outperforms text-only baselines; 2) the quality of automatic prosodic boundary annotations is comparable to human annotations; 3) TTS systems trained with model-annotated boundaries are slightly better than systems that use manual ones.
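For readers who want a concrete picture, here is a hedged sketch of the general idea (a text encoder and an audio encoder fused to tag per-token prosodic boundaries); the module choices, sizes, and the simple cross-attention fusion are illustrative assumptions, not the paper's actual architecture:

```python
# Hedged sketch (not the paper's architecture): fuse stand-in text and audio encoders
# and classify a prosodic-boundary label for each text token.
import torch
import torch.nn as nn

class ProsodyBoundaryTagger(nn.Module):
    def __init__(self, vocab=5000, d=256, n_labels=4):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, d)              # stand-in for a pre-trained text encoder
        self.audio_enc = nn.GRU(80, d, batch_first=True)    # stand-in for a pre-trained audio encoder (80-dim mel frames)
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * d, n_labels)        # per-token boundary label

    def forward(self, token_ids, mel_frames):
        t = self.text_emb(token_ids)                        # (B, T_text, d)
        a, _ = self.audio_enc(mel_frames)                   # (B, T_audio, d)
        ctx, _ = self.attn(query=t, key=a, value=a)         # align each token with audio context
        return self.classifier(torch.cat([t, ctx], dim=-1)) # (B, T_text, n_labels) logits

tokens = torch.randint(0, 5000, (2, 12))    # dummy batch: 2 sentences, 12 tokens each
mels = torch.randn(2, 300, 80)              # dummy batch: 300 mel frames of 80 bins
logits = ProsodyBoundaryTagger()(tokens, mels)
print(logits.shape)                         # torch.Size([2, 12, 4])
```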
Paper: Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot
Title: Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot
Date: 16 Jun 2022
Field: Computer Vision
Tasks: Autonomous Driving
Paper link: https://arxiv.org/abs/2206.08176
Code: https://github.com/openperceptionx/openpilot-deepdive
Authors: Li Chen, Tutian Tang, Zhitian Cai, Yang Li, Penghao Wu, Hongyang Li, Jianping Shi, Junchi Yan, Yu Qiao
Summary: Equipped with a wide span of sensors, predominant autonomous driving solutions are becoming more modular-oriented for safe system design.
Abstract: Equipped with a wide span of sensors, predominant autonomous driving solutions are becoming more modular-oriented for safe system design. Though these sensors have laid a solid foundation, most massive-production solutions up to date still fall into L2 phase. Among these, Comma.ai comes to our sight, claiming one $999 aftermarket device mounted with a single camera and board inside owns the ability to handle L2 scenarios. Together with open-sourced software of the entire system released by Comma.ai, the project is named Openpilot. Is it possible? If so, how is it made possible? With curiosity in mind, we deep-dive into Openpilot and conclude that its key to success is the end-to-end system design instead of a conventional modular framework. The model is briefed as Supercombo, and it can predict the ego vehicle’s future trajectory and other road semantics on the fly from monocular input. Unfortunately, the training process and massive amount of data to make all these work are not publicly available. To achieve an intensive investigation, we try to reimplement the training details and test the pipeline on public benchmarks. The refactored network proposed in this work is referred to as OP-Deepdive. For a fair comparison of our version to the original Supercombo, we introduce a dual-model deployment scheme to test the driving performance in the real world. Experimental results on nuScenes, Comma2k19, CARLA, and in-house realistic scenarios verify that a low-cost device can indeed achieve most L2 functionalities and be on par with the original Supercombo model. In this report, we would like to share our latest findings, shed some light on the new perspective of end-to-end autonomous driving from an industrial product-level side, and potentially inspire the community to continue improving the performance. Our code, benchmarks are at https://github.com/OpenPerceptionX/Openpilot-Deepdive
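As a rough illustration of the end-to-end idea (monocular frames in, future ego waypoints out), here is a hedged sketch; the tiny CNN+GRU backbone and all sizes are assumptions and do not reflect the Supercombo or OP-Deepdive design:

```python
# Hedged sketch of an end-to-end trajectory regressor: video frames in, future ego waypoints out.
import torch
import torch.nn as nn

class TrajectoryNet(nn.Module):
    def __init__(self, n_waypoints=33, hidden=512):
        super().__init__()
        self.backbone = nn.Sequential(                    # tiny CNN stand-in for the vision backbone
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.temporal = nn.GRU(32, hidden, batch_first=True)  # fuse features across frames
        self.head = nn.Linear(hidden, n_waypoints * 2)        # (x, y) per future waypoint

    def forward(self, frames):                    # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        _, h = self.temporal(feats)
        return self.head(h[-1]).view(b, -1, 2)    # (B, n_waypoints, 2)

traj = TrajectoryNet()(torch.randn(2, 8, 3, 128, 256))   # 2 clips of 8 frames each
print(traj.shape)                                        # torch.Size([2, 33, 2])
```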
Paper: Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation
Title: Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation
Date: 15 Jun 2022
Field: Computer Vision
Tasks: Contrastive Learning, Denoising, Image Generation, Music Generation
Paper link: https://arxiv.org/abs/2206.07771
Code: https://github.com/l-yezhu/cdcd
Authors: Ye Zhu, Yu Wu, Kyle Olszewski, Jian Ren, Sergey Tulyakov, Yan Yan
Summary: To this end, we introduce a Conditional Discrete Contrastive Diffusion (CDCD) loss and design two contrastive diffusion mechanisms to effectively incorporate it into the denoising process.
Abstract: Diffusion probabilistic models (DPMs) have become a popular approach to conditional generation, due to their promising results and support for cross-modal synthesis. A key desideratum in conditional synthesis is to achieve high correspondence between the conditioning input and generated output. Most existing methods learn such relationships implicitly, by incorporating the prior into the variational lower bound. In this work, we take a different route – we enhance input-output connections by maximizing their mutual information using contrastive learning. To this end, we introduce a Conditional Discrete Contrastive Diffusion (CDCD) loss and design two contrastive diffusion mechanisms to effectively incorporate it into the denoising process. We formulate CDCD by connecting it with the conventional variational objectives. We demonstrate the efficacy of our approach in evaluations with three diverse, multimodal conditional synthesis tasks: dance-to-music generation, text-to-image synthesis, and class-conditioned image synthesis. On each, we achieve state-of-the-art or higher synthesis quality and improve the input-output correspondence. Furthermore, the proposed approach improves the convergence of diffusion models, reducing the number of required diffusion steps by more than 35% on two benchmarks, significantly increasing the inference speed.
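To illustrate the contrastive ingredient (maximizing mutual information between condition and output), here is a generic InfoNCE-style loss sketch; it is not the paper's CDCD formulation, only a rough analogue:

```python
# Rough illustration of contrastive input-output alignment via a standard InfoNCE loss
# between condition embeddings and sample embeddings. NOT the paper's CDCD loss.
import torch
import torch.nn.functional as F

def infonce(cond_emb, sample_emb, temperature=0.07):
    # cond_emb, sample_emb: (B, D); row i of each is a matched condition/output pair.
    c = F.normalize(cond_emb, dim=-1)
    s = F.normalize(sample_emb, dim=-1)
    logits = c @ s.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(c.size(0))         # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = infonce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```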
Paper: GLIPv2: Unifying Localization and Vision-Language Understanding
Title: GLIPv2: Unifying Localization and Vision-Language Understanding
Date: 12 Jun 2022
Field: Computer Vision, Natural Language Processing
Tasks: Contrastive Learning, Image Captioning, Instance Segmentation, Language Modelling, Masked Language Modeling, Object Detection, Phrase Grounding, Referring Expression Segmentation, Semantic Segmentation, Visual Question Answering (VQA)
Paper link: https://arxiv.org/abs/2206.05836
Code: https://github.com/microsoft/GLIP
Authors: Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao
Summary: We present GLIPv2, a grounded VL understanding model, that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning).
Abstract: We present GLIPv2, a grounded VL understanding model, that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning). GLIPv2 elegantly unifies localization pre-training and Vision-Language Pre-training (VLP) with three pre-training tasks: phrase grounding as a VL reformulation of the detection task, region-word contrastive learning as a novel region-word level contrastive learning task, and the masked language modeling. This unification not only simplifies the previous multi-stage VLP procedure but also achieves mutual benefits between localization and understanding tasks. Experimental results show that a single GLIPv2 model (all model weights are shared) achieves near SoTA performance on various localization and understanding tasks. The model also shows (1) strong zero-shot and few-shot adaption performance on open-vocabulary object detection tasks and (2) superior grounding capability on VL understanding tasks. Code will be released at https://github.com/microsoft/GLIP
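A hedged sketch of the region-word alignment idea behind phrase grounding: score every (region, word) pair by an embedding dot product. The dimensions are illustrative, and this is not GLIPv2's actual fusion or loss:

```python
# Hedged sketch of region-word alignment: dot-product scores between region and word embeddings.
import torch

regions = torch.randn(5, 256)                  # 5 detected region embeddings
words = torch.randn(7, 256)                    # 7 token embeddings from the caption
scores = regions @ words.t()                   # (5, 7) region-word alignment logits
best_region_per_word = scores.argmax(dim=0)    # ground each word to its best-matching region
print(scores.shape, best_region_per_word.tolist())
```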
Paper: Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging
Title: Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging
Date: 20 May 2022
Field: Computer Vision
Tasks: Compressive Sensing, Image Reconstruction, Image Restoration
Paper link: https://arxiv.org/abs/2205.10102
Code: https://github.com/caiyuanhao1998/MST
Authors: Yuanhao Cai, Jing Lin, Haoqian Wang, Xin Yuan, Henghui Ding, Yulun Zhang, Radu Timofte, Luc van Gool
Summary: In coded aperture snapshot spectral compressive imaging (CASSI) systems, hyperspectral image (HSI) reconstruction methods are employed to recover the spatial-spectral signal from a compressed measurement.
Abstract: In coded aperture snapshot spectral compressive imaging (CASSI) systems, hyperspectral image (HSI) reconstruction methods are employed to recover the spatial-spectral signal from a compressed measurement. Among these algorithms, deep unfolding methods demonstrate promising performance but suffer from two issues. Firstly, they do not estimate the degradation patterns and ill-posedness degree from the highly related CASSI to guide the iterative learning. Secondly, they are mainly CNN-based, showing limitations in capturing long-range dependencies. In this paper, we propose a principled Degradation-Aware Unfolding Framework (DAUF) that estimates parameters from the compressed image and physical mask, and then uses these parameters to control each iteration. Moreover, we customize a novel Half-Shuffle Transformer (HST) that simultaneously captures local contents and non-local dependencies. By plugging HST into DAUF, we establish the first Transformer-based deep unfolding method, Degradation-Aware Unfolding Half-Shuffle Transformer (DAUHST), for HSI reconstruction. Experiments show that DAUHST significantly surpasses state-of-the-art methods while requiring cheaper computational and memory costs. Code and models will be released at https://github.com/caiyuanhao1998/MST
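A hedged sketch of a degradation-aware unfolding loop: a small network estimates per-stage step sizes from measurement/mask statistics, and each stage alternates a data-consistency gradient step with a learned denoiser. All modules are toy stand-ins rather than the DAUF/HST design:

```python
# Hedged sketch of degradation-aware deep unfolding: estimated parameters control each iteration.
import torch
import torch.nn as nn

class UnfoldingNet(nn.Module):
    def __init__(self, stages=3, bands=28):
        super().__init__()
        self.param_net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, stages))
        self.denoisers = nn.ModuleList(
            [nn.Conv2d(bands, bands, 3, padding=1) for _ in range(stages)])

    def forward(self, y, mask):
        # y: (B, bands, H, W) back-projected measurement; mask: (B, bands, H, W) sensing mask
        stats = torch.stack([y.mean(dim=(1, 2, 3)), mask.mean(dim=(1, 2, 3))], dim=-1)
        steps = torch.sigmoid(self.param_net(stats))      # per-stage step sizes from degradation stats
        x = y
        for k, denoise in enumerate(self.denoisers):
            grad = mask * (mask * x - y)                  # data-consistency gradient for y ≈ mask * x
            x = x - steps[:, k].view(-1, 1, 1, 1) * grad  # gradient step scaled by the estimated parameter
            x = denoise(x)                                # learned prior / denoising step
        return x

out = UnfoldingNet()(torch.randn(2, 28, 64, 64), torch.rand(2, 28, 64, 64))
print(out.shape)                                          # torch.Size([2, 28, 64, 64])
```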
Paper: HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
Title: HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
Date: CVPR 2022
Field: Computer Vision
Paper link: https://arxiv.org/abs/2201.04127
Code: https://github.com/chungyiweng/humannerf
Authors: Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, Ira Kemelmacher-Shlizerman
Summary: Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps.
Abstract: We introduce a free-viewpoint rendering method – HumanNeRF – that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body, as seen from various camera angles that may not exist in the input video, as well as synthesizing fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging uncontrolled capture scenarios.
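A hedged sketch of the core idea (backward-warp observation-space points into a canonical T-pose volume, then query a canonical NeRF); both MLPs below are toy stand-ins for the paper's skeletal plus non-rigid motion decomposition:

```python
# Hedged sketch: a motion-field MLP warps observation-space points into canonical space,
# and a second MLP returns (density, color) for those canonical points.
import torch
import torch.nn as nn

motion_field = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))   # frame-specific backward warp
canonical_nerf = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4)) # outputs (sigma, r, g, b)

pts_obs = torch.rand(1024, 3)                    # sample points along camera rays (observation space)
pts_canonical = pts_obs + motion_field(pts_obs)  # backward warp into the canonical T-pose volume
sigma_rgb = canonical_nerf(pts_canonical)        # density + color queried in canonical space
print(sigma_rgb.shape)                           # torch.Size([1024, 4])
```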
Paper: SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Title: SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Date: CVPR 2022
Field: Computer Vision
Tasks: Disentanglement, Facial Editing, Image Generation, Transfer Learning
Paper link: https://arxiv.org/abs/2112.02236
Code: https://github.com/seasonSH/SemanticStyleGAN
Authors: Yichun Shi, Xiao Yang, Yangyue Wan, Xiaohui Shen
Summary: When combined with editing methods designed for StyleGANs, it can achieve a more fine-grained control to edit synthesized or real images.
Abstract: Recent studies have shown that StyleGANs provide promising prior models for downstream tasks on image synthesis and editing. However, since the latent codes of StyleGANs are designed to control global styles, it is hard to achieve a fine-grained control over synthesized images. We present SemanticStyleGAN, where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way. The structure and texture of different local parts are controlled by corresponding latent codes. Experimental results demonstrate that our model provides a strong disentanglement between different spatial areas. When combined with editing methods designed for StyleGANs, it can achieve a more fine-grained control to edit synthesized or real images. The model can also be extended to other domains via transfer learning. Thus, as a generic prior model with built-in disentanglement, it could facilitate the development of GAN-based applications and enable more potential downstream tasks.
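A hedged sketch of compositional synthesis with per-part latent codes: each semantic part gets its own generator producing an RGB layer and a mask logit, and the image is a softmax-weighted blend. This is purely illustrative, not SemanticStyleGAN's local generators or renderer:

```python
# Hedged sketch: independent latent codes per semantic part, composed via soft masks.
import torch
import torch.nn as nn

class PartGenerator(nn.Module):
    def __init__(self, z_dim=64, size=32):
        super().__init__()
        self.net = nn.Linear(z_dim, 4 * size * size)   # 3 RGB channels + 1 mask logit per pixel
        self.size = size

    def forward(self, z):
        out = self.net(z).view(-1, 4, self.size, self.size)
        return out[:, :3], out[:, 3:]                  # (rgb layer, mask logit)

parts = nn.ModuleList([PartGenerator() for _ in range(4)])  # e.g. face, hair, eyes, background
z_parts = torch.randn(4, 2, 64)                             # one independent latent code per part (batch of 2)
rgbs, logits = zip(*(g(z) for g, z in zip(parts, z_parts)))
weights = torch.softmax(torch.stack(logits), dim=0)         # soft part masks compete per pixel
image = (torch.stack(rgbs) * weights).sum(dim=0)            # composite image, shape (2, 3, 32, 32)
print(image.shape)
```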
Paper: 3D-aware Image Synthesis via Learning Structural and Textural Representations
Title: 3D-aware Image Synthesis via Learning Structural and Textural Representations
Date: CVPR 2022
Field: Computer Vision
Tasks: 3D-Aware Image Synthesis, Image Generation
Paper link: https://arxiv.org/abs/2112.10759
Code: https://github.com/genforce/volumegan
Authors: Yinghao Xu, Sida Peng, Ceyuan Yang, Yujun Shen, Bolei Zhou
Summary: The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis.
Abstract: Making generative models 3D-aware bridges the 2D image space and the 3D physical world yet remains challenging. Recent attempts equip a Generative Adversarial Network (GAN) with a Neural Radiance Field (NeRF), which maps 3D coordinates to pixel values, as a 3D prior. However, the implicit function in NeRF has a very local receptive field, making the generator hard to become aware of the global structure. Meanwhile, NeRF is built on volume rendering which can be too costly to produce high-resolution results, increasing the optimization difficulty. To alleviate these two problems, we propose a novel framework, termed as VolumeGAN, for high-fidelity 3D-aware image synthesis, through explicitly learning a structural representation and a textural representation. We first learn a feature volume to represent the underlying structure, which is then converted to a feature field using a NeRF-like model. The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis. Such a design enables independent control of the shape and the appearance. Extensive experiments on a wide range of datasets show that our approach achieves sufficiently higher image quality and better 3D control than the previous methods.
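A hedged sketch of the structural/textural split: sample a 3D feature volume along rays, accumulate the samples into a 2D feature map, and decode appearance with a small 2D CNN acting as the neural renderer. The shapes and modules are illustrative assumptions, not VolumeGAN's actual design:

```python
# Hedged sketch: structural 3D feature volume -> ray-accumulated 2D feature map -> 2D neural renderer.
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_volume = torch.randn(1, 16, 32, 32, 32)          # learned structural representation (C, D, H, W)
coords = torch.rand(1, 64, 64, 24, 3) * 2 - 1            # 24 sample points per pixel of a 64x64 image, in [-1, 1]
samples = F.grid_sample(feature_volume, coords, align_corners=True)  # (1, 16, 64, 64, 24)
feature_map = samples.mean(dim=-1)                       # accumulate along each ray -> 2D textural features

renderer = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 3, 3, padding=1))  # neural renderer producing RGB
print(renderer(feature_map).shape)                        # torch.Size([1, 3, 64, 64])
```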
We are ShowMeAI, dedicated to spreading high-quality AI content, sharing industry solutions, and accelerating every technical step forward with knowledge! Click the article history list to browse past issues, subscribe to the topic #ShowMeAI资讯日报 inside the official account to receive the daily push, click the topic collections & e-monthly to browse each series in full, or click here and reply with the keyword 日报 to get the free AI e-monthly and resource pack.
- Author: 韓信子 @ShowMeAI
- Article history list
- Topic collections & e-monthly
- Notice: all rights reserved; to repost, please contact the platform and the author and credit the source
- Replies are welcome; please like this post and leave comments recommending valuable articles, tools, or suggestions, and we will get back to you as soon as we can