<font size=5>MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising</font> </br> Zhiqiang Xia <sup>*</sup>, Zhaokang Chen<sup>*</sup>, Bin Wu<sup>†</sup>, Chao Li, Kwok-Wai Hung, Chao Zhan, Yingjie He, Wenjiang Zhou (<sup>*</sup>Equal contribution, <sup>†</sup>Corresponding author, benbinwu@tencent.com) </br> Lyra Lab, Tencent Music Entertainment

github, huggingface, HuggingfaceSpace, project page, technical report (coming soon)

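We have held the world-simulator vision since March 2023, in the belief that diffusion models can simulate the world. MuseV is a milestone we reached around July 2023. Amazed by the progress of Sora, we decided to open-source MuseV in the hope that it will benefit the community. Next, we will move on to the more promising diffusion+Transformer approach.

We will soon release MuseTalk, a real-time, high-quality lip-sync model that can be used together with MuseV to form a complete virtual human generation solution. Stay tuned!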

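Overview

MuseV is a diffusion-based virtual human video generation framework with the following features:

  1. It uses a novel visual conditioned parallel denoising scheme that supports infinite-length generation.
  2. It provides a virtual human video generation checkpoint trained on a human dataset.
  3. It supports Image2Video, Text2Image2Video, and Video2Video.
  4. It is compatible with the Stable Diffusion ecosystem, including base_model, lora, controlnet, and so on.
  5. It supports multi-reference-image techniques, including IPAdapter, ReferenceOnly, ReferenceNet, and IPAdapterFaceID.
  6. Training code (coming soon).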

News

  • [2024/03/27] Released the MuseV project and the trained models musev, muse_referencenet, and muse_referencenet_pose.

Model

Overview of model structure

[Figure: model structure]

Parallel denoising

[Figure: parallel denoising]

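Cases

All frames were generated directly by the text2video model, without any post-processing.

The cases below can be found in configs/tasks/example.yaml.

Text/Image2Video

Human

<!-- one image and one video per row, followed by the prompt -->

<table class="center"> <tr style="font-weight: bolder;text-align:center;"> <td>image</td> <td>video</td> <td>prompt</td> </tr>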

<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/cTQX49v7GT7GA-NEHj5vK.jpeg width="200"> </td> <td > <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/U4sHVv_fYVbFHveS7Sw7h.mp4"></video> </td> <td>(masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) </td> </tr>

<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/SPSgJpptVM4Qm11nqD07C.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/XMya1FCmRs6USzKp9qrAy.mp4"></video> </td> <td> (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) </td> </tr> <tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/l6chBPhUKeOLbnXnX-ewG.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/pPnU6QXgWuWxw5SWZdl8N.mp4"></video> </td>
<td> (masterpiece, best quality, highres:1), peaceful beautiful sea scene </td> </tr>

<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/VQMEeGc1wTuiATtQLJjer.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/_9ZQEebUlmSNtXMJKiGPu.mp4"></video> </td> <td> (masterpiece, best quality, highres:1), peaceful beautiful sea scene </td> </tr>
<!-- guitar -->
<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/Fk_eec7vqq4NfAYVPNLI-.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/p0gWRwZTDOrbPf8mZOphG.mp4"></video> </td> <td> (masterpiece, best quality, highres:1), playing guitar </td> </tr>
<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/Zcc5xj1-lA_EPS7gvJu99.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/4mT4KL3q4FzyQQKJfgXVG.mp4"></video> </td> <td> (masterpiece, best quality, highres:1), playing guitar </td> </tr>
<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/iT5OsCpRNnntuS0TH1cG5.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/6RWBw73-oE4rJH808FzIK.mp4"></video> </td> <td> (masterpiece, best quality, highres:1), playing guitar </td> </tr>
<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/Ym582ZF-MbYkRW1sAE5r3.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/lYqpeRcIiK7WEqRe4d8dZ.mp4"></video> </td> <td> (masterpiece, best quality, highres:1), playing guitar </td> </tr>
<!-- famous people -->
<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/JsAjbl4AeYz089kWHjjUJ.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/KCjF9HutBo7el15gm3YV3.mp4"></video> </td> <td> (masterpiece, best quality, highres:1),(1man, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) </td> </tr>

<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/v-X3Wrkm14YwLGGloNlMK.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/P_Y5jUO1EJ6n3Z4qd1xh1.mp4"></video> </td>
<td> (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) </td> </tr> <tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/cNe41NV1OfLF5AmMKD6mi.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/L6mA8uuckRJhzhAJHayJT.mp4"></video>
</td> <td> (masterpiece, best quality, highres:1),(1man, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) </td> </tr> <tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/6iHIaa15eBgop7BsE0Nps.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/foPX3iRk2TzjRl_V52T21.mp4"></video> </td> <td> (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) </td> </tr> <tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/R7rp7t4DPkws0dXRxi0bf.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/tGTSNe9i08pvTMNe6SOBg.mp4"></video> </td> <td> (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) </td> </tr> </table >
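Several prompts in the table above contain the placeholder `{eye_blinks_factor}`. This page does not say how it is filled in. The sketch below assumes it is an ordinary Python format field that is replaced by a numeric weight before the prompt reaches the model; the helper name `fill_prompt` and the value `1.8` are made up for illustration and are not taken from MuseV.

```python
# Prompt copied from the table above; only the {eye_blinks_factor} field varies.
PROMPT_TEMPLATE = (
    "(masterpiece, best quality, highres:1),(1girl, solo:1),"
    "(beautiful face, soft skin, costume:1),"
    "(eye blinks:{eye_blinks_factor}),(head wave:1.3)"
)


def fill_prompt(template: str, eye_blinks_factor: float) -> str:
    """Hypothetical helper: substitute the weight into the format field."""
    return template.format(eye_blinks_factor=eye_blinks_factor)


# 1.8 is an arbitrary example weight, not a recommended setting.
prompt = fill_prompt(PROMPT_TEMPLATE, 1.8)
print(prompt)
```

Under this assumption, a larger value would give the "eye blinks" term more weight relative to the other terms, in the same way that `(head wave:1.3)` is weighted.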

Scene

<table class="center"> <tr style="font-weight: bolder;text-align:center;"> <td>image</td> <td>video</td> <td>prompt</td> </tr>

<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/EIembLBwySZTBjFZStFr_.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/jfHV6a186BzAu-Tz0ET1o.mp4"></video> </td> <td> (masterpiece, best quality, highres:1), peaceful beautiful waterfall, an endless waterfall </td> </tr>

<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/u2_mzl5m-Z0nwSYFcTLxs.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/eXrRVejCZs3QaA4-JK6Le.mp4"></video> </td> <td>(masterpiece, best quality, highres:1), peaceful beautiful river </td> </tr>

<tr> <td> <img src=https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/NIHfIi7onyJ5ELetE2f_Z.jpeg width="200"> </td> <td> <video width="400" controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/eaZyhukcfaoY7dIGp2cs6.mp4"></video> </td> <td>(masterpiece, best quality, highres:1), peaceful beautiful sea scene </td> </tr> </table >

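VideoMiddle2Video

pose2video

In the duffy case, the pose of the visual condition frame does not match the pose in the first frame of the control video. The posealign module can resolve this mismatch.

<table class="center"> <tr style="font-weight: bolder;text-align:center;"> <td>image</td> <td>video</td> <td>prompt</td> </tr>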

<tr> <td> <img src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/fX1ND0YqDp1LV0LEh2eFN.png" width="200"> <img src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/pe2aQt5FU66tplNZCOZaB.png" width="200"> </td> <td> <video width="900" src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/IMPIDjR7-w5A_xc6ZHIzT.mp4" controls preload></video> </td> <td> (masterpiece, best quality, highres:1) </td> </tr>

<tr> <td> <img src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/FlLWP8IqM_X2K4hXAOPHO.png" width="200"> </td> <td> <video width="900" src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/OT22TR7e7Lcoxci9aoDBA.mp4" controls preload></video> </td> <td> (masterpiece, best quality, highres:1) </td> </tr> </table >

MuseTalk

<table class="center"> <tr style="font-weight: bolder;"> <td>name</td> <td>video</td> </tr> <tr> <td> talk </td> <td> <video width="350" src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/wUhNS7j5UQ28eXu4JVQfF.mp4" controls preload></video> </td> </tr>

<tr> <td> talk </td> <td> <video width="350" src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/bH-j15douHJcZXIEvMIDa.mp4" controls preload></video> </td> </tr>

<tr> <td> sing </td> <td> <video width="350" src="https://cdn-uploads.huggingface.co/production/uploads/65f9352ed760cfdf5eb80e16/l5ZTbUQ11gK6FUtoaRz4S.mp4" controls preload></video> </td> </tr> </table >

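Quickstart

Please refer to MuseV.

Acknowledgements

  1. MuseV draws heavily on TuneAVideo, diffusers, Moore-AnimateAnyone, animatediff, IP-Adapter, AnimateAnyone, VideoFusion, and insightface.
  2. MuseV is built on the ucf101 and webvid datasets.

Thanks for open-sourcing!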

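Limitations

Many limitations remain, including:

  1. Limited generalization. Some visual condition images work well and others work poorly. Likewise, some pretrained t2i models work well and others work poorly.
  2. Limited types of video generation and limited motion range, partly because the training data covers few types. The released MuseV was trained on about 60K human text-video pairs at a resolution of 512*320. At lower resolution MuseV produces a larger motion range but lower video quality; at high video quality it tends to produce a smaller motion range. Training on a larger, higher-resolution, higher-quality text-video dataset may improve MuseV.
  3. Watermarks may appear because of webvid. A cleaner dataset without watermarks may solve this.
  4. Limited types of long video generation. Visual conditioned parallel denoising can address the accumulated error of video generation, but the current method only suits scenes with a relatively fixed camera.
  5. referencenet and IP-Adapter are undertrained because of limited time and resources.
  6. The code is under-structured. MuseV supports rich and dynamic features, but the code is complex and has not been refactored, so it takes time to become familiar with it.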
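To give a sense of scale for the 512*320 training resolution mentioned in item 2, the short calculation below works out the latent tensor size per frame. It assumes the usual Stable Diffusion 1.5 VAE layout (8x spatial downsampling, 4 latent channels); that is a property of that VAE family rather than something stated on this page, and the function name `latent_shape` is only illustrative.

```python
def latent_shape(width: int, height: int, factor: int = 8, channels: int = 4):
    """Return (channels, latent_height, latent_width) for one frame.

    Assumes an SD 1.5 style VAE: `factor`x downsampling, `channels` latent channels.
    """
    if width % factor or height % factor:
        raise ValueError("width and height must be multiples of the downsampling factor")
    return (channels, height // factor, width // factor)


# Reading 512*320 as width*height; swapping them only swaps the last two entries.
shape = latent_shape(512, 320)
print(shape)
```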

<!-- # Contribution: no need to organize open-source collaboration for now -->

Citation

@article{musev,
  title={MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising},
  author={Xia, Zhiqiang and Chen, Zhaokang and Wu, Bin and Li, Chao and Hung, Kwok-Wai and Zhan, Chao and He, Yingjie and Zhou, Wenjiang},
  journal={arxiv},
  year={2024}
}

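Disclaimer/License

  1. Code: the MuseV code is released under the MIT License. There are no restrictions on academic or commercial use.
  2. Model: the trained models are available for non-commercial research purposes only.
  3. Other open-source models: any other open-source model used must comply with its own license, for example insightface, IP-Adapter, and ft-mse-vae.
  4. The test data was collected from the internet and is available for non-commercial research purposes only.
  5. AIGC: this project aims to have a positive impact on the field of AI-driven video generation. Users are free to create videos with this tool, but they are expected to comply with local laws and use it responsibly. The developers assume no responsibility for potential misuse by users.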

TMElyralab/MuseV

Author: TMElyralab

text-to-video diffusers

Created: 2024-03-19 07:18:42+00:00

Updated: 2024-04-01 15:48:25+00:00

Files (71)

.gitattributes
Dockerfile
IP-Adapter/models/image_encoder/config.json
IP-Adapter/models/image_encoder/model.safetensors
IP-Adapter/models/image_encoder/models/buffalo_l.zip
IP-Adapter/models/image_encoder/models/buffalo_l/1k3d68.onnx
IP-Adapter/models/image_encoder/models/buffalo_l/2d106det.onnx
IP-Adapter/models/image_encoder/models/buffalo_l/det_10g.onnx
IP-Adapter/models/image_encoder/models/buffalo_l/genderage.onnx
IP-Adapter/models/image_encoder/models/buffalo_l/w600k_r50.onnx
IP-Adapter/models/image_encoder/pytorch_model.bin
IP-Adapter/models/ip-adapter-faceid_sd15.bin
IP-Adapter/models/ip-adapter-plus-face_sd15.bin
IP-Adapter/models/ip-adapter-plus_sd15.bin
IP-Adapter/models/ip-adapter_sd15.bin
README.md
data/models/musev_structure.png
data/models/parallel_denoise.png
embedding/EasyNegativeV2.safetensors
embedding/bad_prompt_version2-neg.pt
embedding/badhandv4.pt
embedding/ng_deepnegative_v1_75t.pt
lcm/lcm-lora-sdv1-5/.gitattributes
lcm/lcm-lora-sdv1-5/README.md
lcm/lcm-lora-sdv1-5/image.png
lcm/lcm-lora-sdv1-5/pytorch_lora_weights.safetensors
motion/musev/unet/config.json
motion/musev/unet/diffusion_pytorch_model.bin
motion/musev_referencenet/ip_adapter_image_proj.bin
motion/musev_referencenet/referencenet/config.json
motion/musev_referencenet/referencenet/diffusion_pytorch_model.bin
motion/musev_referencenet/unet/config.json
motion/musev_referencenet/unet/diffusion_pytorch_model.bin
motion/musev_referencenet_pose/ip_adapter_image_proj.bin
motion/musev_referencenet_pose/unet/config.json
motion/musev_referencenet_pose/unet/diffusion_pytorch_model.bin
t2i/sd1.5/fantasticmix_v10/feature_extractor/preprocessor_config.json
t2i/sd1.5/fantasticmix_v10/model_index.json
t2i/sd1.5/fantasticmix_v10/safety_checker/config.json
t2i/sd1.5/fantasticmix_v10/safety_checker/model.safetensors
t2i/sd1.5/fantasticmix_v10/safety_checker/pytorch_model.bin
t2i/sd1.5/fantasticmix_v10/scheduler/scheduler_config.json
t2i/sd1.5/fantasticmix_v10/text_encoder/config.json
t2i/sd1.5/fantasticmix_v10/text_encoder/pytorch_model.bin
t2i/sd1.5/fantasticmix_v10/tokenizer/merges.txt
t2i/sd1.5/fantasticmix_v10/tokenizer/special_tokens_map.json
t2i/sd1.5/fantasticmix_v10/tokenizer/tokenizer_config.json
t2i/sd1.5/fantasticmix_v10/tokenizer/vocab.json
t2i/sd1.5/fantasticmix_v10/unet/config.json
t2i/sd1.5/fantasticmix_v10/unet/diffusion_pytorch_model.bin
t2i/sd1.5/fantasticmix_v10/vae/config.json
t2i/sd1.5/fantasticmix_v10/vae/diffusion_pytorch_model.bin
t2i/sd1.5/majicmixRealv6Fp16/feature_extractor/preprocessor_config.json
t2i/sd1.5/majicmixRealv6Fp16/model_index.json
t2i/sd1.5/majicmixRealv6Fp16/model_index.json.bk
t2i/sd1.5/majicmixRealv6Fp16/scheduler/scheduler_config.json
t2i/sd1.5/majicmixRealv6Fp16/text_encoder/config.json
t2i/sd1.5/majicmixRealv6Fp16/text_encoder/pytorch_model.bin
t2i/sd1.5/majicmixRealv6Fp16/tokenizer/merges.txt
t2i/sd1.5/majicmixRealv6Fp16/tokenizer/special_tokens_map.json
t2i/sd1.5/majicmixRealv6Fp16/tokenizer/tokenizer_config.json
t2i/sd1.5/majicmixRealv6Fp16/tokenizer/vocab.json
t2i/sd1.5/majicmixRealv6Fp16/unet/config.json
t2i/sd1.5/majicmixRealv6Fp16/unet/diffusion_pytorch_model.bin
t2i/sd1.5/majicmixRealv6Fp16/vae/config.json
t2i/sd1.5/majicmixRealv6Fp16/vae/diffusion_pytorch_model.bin
vae/sd-vae-ft-mse/.gitattributes
vae/sd-vae-ft-mse/README.md
vae/sd-vae-ft-mse/config.json
vae/sd-vae-ft-mse/diffusion_pytorch_model.bin
vae/sd-vae-ft-mse/diffusion_pytorch_model.safetensors
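
The repository bundles several independent components (motion modules, t2i base models, IP-Adapter weights, a VAE), so it can be convenient to fetch only part of it. The sketch below uses glob patterns to pick a subset from an excerpt of the file list above. Which files a given task actually needs is not stated on this page, so the patterns and the `checkpoints` directory are purely illustrative. The optional `download_subset` function assumes the `huggingface_hub` package is installed and network access is available; it is defined but not called here.

```python
from fnmatch import fnmatchcase

# Excerpt of the file list above (not the full 71 entries).
FILES = [
    "IP-Adapter/models/image_encoder/models/buffalo_l/det_10g.onnx",
    "IP-Adapter/models/ip-adapter_sd15.bin",
    "motion/musev/unet/config.json",
    "motion/musev/unet/diffusion_pytorch_model.bin",
    "motion/musev_referencenet/unet/config.json",
    "vae/sd-vae-ft-mse/config.json",
]


def select(files, patterns):
    """Keep the files that match at least one glob pattern ('*' also spans '/')."""
    return [f for f in files if any(fnmatchcase(f, p) for p in patterns)]


# Illustrative choice: the plain musev motion module plus the VAE.
PATTERNS = ["motion/musev/*", "vae/sd-vae-ft-mse/*"]
selected = select(FILES, PATTERNS)
print(selected)


def download_subset(patterns, local_dir="checkpoints"):
    """Download only matching files; requires huggingface_hub and network access."""
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="TMElyralab/MuseV",
        allow_patterns=patterns,
        local_dir=local_dir,
    )
```

Note that the trailing slash in `motion/musev/*` keeps the pattern from also matching the `motion/musev_referencenet` directories.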