FaceFusion Audio + Video Lip-Sync Feature: Detailed Steps for Local Installation and Upgrade
Summary
TLDR: This video explores the latest updates to FaceFusion, a tool that now enables video characters to speak by syncing audio with lip movements. It also features an improved face recognition model, YOLOface, which enhances accuracy in low light and for side or down-turned faces. The tutorial guides users through upgrading from older versions, including updating files and dependencies and ensuring GPU acceleration. It demonstrates how to use the new voice-driven lip-syncing feature and provides tips on optimizing settings for better performance. Additionally, it discusses the nuances of using YOLOface for face swapping in various scenarios.
Takeaways
- 😀 FaceFusion now has a new feature that allows video characters to speak by synchronizing voice and lip movements, given a provided audio clip and a front-facing video.
- 😎 The update includes a new YOLOface recognition model, which significantly improves accuracy in poor lighting and for low-angle and side-face shots.
- 🛠️ To upgrade from an older version (2.2.1 or earlier), users need to use Git commands and update dependencies, ensuring they have the correct CUDA version for NVIDIA GPUs.
- 🌐 Users in Mainland China are advised to use a proxy for smoother updates and model downloads due to potential connectivity issues.
- 💻 After upgrading, users can verify the success by checking the version number and ensuring the new features appear on the interface.
- 🎨 The new version supports manual download of models if automatic download is slow, by accessing the official model list and placing them in the models directory.
- 🚀 A one-click run script can be created to simplify the launch process by modifying a text file into a batch file with the necessary run commands.
- 🗣️ The voice-driven lip-sync feature works best with English audio and mid-shot videos; it can also handle songs, though results may vary.
- 🔍 The YOLO model offers better performance in recognizing multiple faces in a scene compared to the previous RetinaFace model, with optimizations for various angles and lighting.
- 🛠️ For complex facial scenes, users may need to experiment with different model and mask combinations to achieve optimal results, as some extreme angles may still pose challenges.
- 🎵 The script concludes with a demonstration of the new model’s effectiveness on a video with many low-angle and side-face shots, inviting viewers to spot any imperfections.
Q & A
What are the new features introduced in the latest update of FaceFusion?
-The latest update of FaceFusion includes the ability to make video characters speak by syncing provided audio with their lip movements, and a new facial recognition model that improves accuracy in low light conditions and for identifying faces in low angles or side profiles.
What are the prerequisites for upgrading FaceFusion from an older version?
-To upgrade FaceFusion, you must have successfully installed version 2.2.1 or earlier using the git tool for manual installation. If you haven't installed any version before, you can refer to the video to install the latest version directly.
What steps are involved in upgrading FaceFusion using git?
-The steps include: 1) Navigating to the FaceFusion installation directory and opening the command window. 2) Running the command 'git pull' to update the program files. 3) Activating the virtual environment. 4) Updating the installed dependencies using a specific command. 5) Verifying the upgrade by running the program and checking the version number.
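For reference, the full sequence on Windows might look like the sketch below. The video does not quote the dependency-update command verbatim, so the `pip install -r requirements.txt` line is an assumption; adjust the install path and venv name to your setup.

```
:: Upgrade sketch (Windows cmd). Assumes the venv created during the
:: original manual install is named "venv"; adjust paths as needed.
cd /d D:\AI\facefusion
:: 1) update the program files
git pull
:: 2) activate the virtual environment
venv\Scripts\activate
:: 3) update dependencies (assumed command; not quoted verbatim in the video)
pip install -r requirements.txt
:: 4) relaunch and check the version number shown in the UI
python run.py
```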
Why is it recommended to use a proxy when updating FaceFusion in Mainland China?
-Using a proxy is recommended to ensure smooth updates, especially when downloading dependencies and models, as it can help bypass potential network restrictions and improve download speeds.
How can users enable GPU acceleration after upgrading FaceFusion?
-To enable GPU acceleration, users need to check the CUDA version used in the original installation (e.g., cu118 for CUDA 11.8) and then run the 'python install.py' command, selecting the appropriate CUDA version during the installation process.
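A minimal sketch of that check, assuming a Windows command window:

```
:: Identify the CUDA build of the existing PyTorch install; the suffix
:: after "+" (e.g. 2.x.x+cu118 -> CUDA 11.8) tells you which option to pick.
pip list | findstr torch
:: Re-run the installer and choose the matching CUDA version at the prompts.
python install.py
```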
What is the purpose of the YOLOface model in FaceFusion?
-The YOLOface model is designed to improve facial recognition accuracy, especially in challenging conditions such as low light, low angles, and side profiles. It is more efficient and accurate compared to the previous RetinaFace model.
How can users manually download and install the new models if the automatic download is slow?
-Users can visit the official project's model list page, download the 5 new models for version 2.3, and then copy them into the 'models' directory of the FaceFusion installation folder.
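If you prefer the command line, a manual download can look like the sketch below. The file name and release URL here are illustrative only; copy the real links from the official model list page, and point the output at your install's actual models folder.

```
:: Illustrative manual download into the models directory (file name and
:: URL are examples; take the real links from the official model list).
cd /d D:\AI\facefusion
curl -L -o models\wav2lip_gan.onnx ^
  https://github.com/facefusion/facefusion-assets/releases/download/models/wav2lip_gan.onnx
```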
What are the recommended settings for using the voice-driven lip-syncing feature in FaceFusion?
-For voice-driven lip-syncing, it is recommended to use the YOLO model for facial recognition, enable facial enhancement to improve lip clarity, and adjust the thread count to optimize processing speed based on GPU memory.
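As a rough illustration, the same settings can also be expressed on the command line. The flag names below follow FaceFusion 2.x conventions but are this writer's assumptions, as are the file names; confirm the exact flags with `python run.py --help`.

```
:: Assumed FaceFusion 2.x-style flags mirroring the recommended UI settings;
:: verify the exact names with: python run.py --help
python run.py --headless ^
  --frame-processors lip_syncer face_enhancer ^
  --execution-providers cuda ^
  --execution-thread-count 16 ^
  -s voice.mp3 -t video.mp4 -o output.mp4
```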
How does the new YOLO model perform compared to the old RetinaFace model in different scenarios?
-The YOLO model is faster and more accurate in general facial recognition tasks, especially for low angles and side profiles. However, it may perform slightly worse with closed masks compared to RetinaFace. Users should choose the model based on the specific requirements of their video content.
What are some limitations of FaceFusion when dealing with complex facial angles?
-FaceFusion may struggle with extremely complex facial angles, such as when the head is turned close to 90 degrees. In such cases, the model may produce deformed results or fail to handle the facial features correctly.
How can users create a one-click script to run FaceFusion easily?
-Users can create a one-click script by creating a new text file in the FaceFusion installation directory, renaming it with a '.bat' extension, and pasting the appropriate command inside the file. This allows them to run FaceFusion directly by double-clicking the script.
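A minimal version of that script, assuming the typical manual-install layout (the venv folder name and `run.py` entry point may differ on your machine):

```
@echo off
:: One-click launcher sketch; assumes the venv is named "venv" and the
:: entry point is run.py, as in a typical FaceFusion 2.x manual install.
cd /d %~dp0
call venv\Scripts\activate
python run.py
pause
```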
Outlines
🚀 Upgrading FaceFusion and Exploring New Features
This paragraph provides a comprehensive guide on upgrading FaceFusion to version 2.3.0 and introduces new features such as voice-driven lip-syncing and the improved YOLOface recognition model. The process involves using Git commands to update the software, activating a virtual environment, and updating dependencies. It also highlights the importance of checking for GPU acceleration and manually downloading models if necessary. Additionally, it explains how to create a one-click script to simplify the launch process. The voice-driven lip-syncing feature requires selecting the appropriate options and adjusting parameters like thread count for optimal performance. The paragraph also discusses the improved YOLOface model's accuracy in low-light conditions and its ability to handle different facial orientations.
🔍 Detailed Usage of New Features and Model Comparisons
This paragraph delves into the detailed usage of the new features in FaceFusion, focusing on voice-driven lip-syncing and the YOLOface model. It explains the importance of adjusting parameters such as thread count to improve video processing speed and notes a potential bug with directly typed input values. The paragraph also highlights the YOLOface model's performance, comparing it with the previous RetinaFace model in terms of speed, memory usage, and accuracy in various scenarios. It provides specific recommendations for using different models and masks based on the video content, such as using YOLO with face enhancement for voice-driven lip-syncing and RetinaFace with closed masks for heavily occluded faces. The paragraph concludes by emphasizing the need for multiple attempts to achieve the best results, especially for complex facial scenes, and notes limitations in handling extreme head rotations.
🎵 Conclusion and Demonstration of Model Effectiveness
This paragraph concludes the video script by inviting viewers to listen to a music segment and observe the effectiveness of the new model in handling a video with numerous low-angle and side-face shots. It challenges viewers to spot any imperfections in the face-swapping results and expresses confidence in the new model's capabilities. The paragraph also includes a promotional statement about FaceFusion being a next-generation face swapper and enhancer, emphasizing its superiority over other tools. It ends with a thank-you note to the viewers and a hint for future content.
Keywords
💡FaceFusion
💡Speech-Driven Lip-Syncing
💡YOLOface
💡CUDA Acceleration
💡Virtual Environment
💡Dependency Packages
💡Git
💡Digital Human Videos
💡Face Enhancement
💡Model Download
💡Batch Script
Highlights
FaceFusion now supports not only face swapping but also lip-syncing with voice input.
The new version includes a YOLOface model, improving face recognition accuracy in poor lighting and for low-angle or side faces.
Upgrading from a previous version requires a git pull, activating the virtual environment, and updating dependencies.
For NVIDIA GPU acceleration, additional steps are needed to confirm the CUDA version and update the matching dependencies.
New models will be automatically downloaded on first use, but manual download is recommended for faster setup in China.
A one-click run script can be created to simplify the program launch process.
Voice-driven lip-syncing works best with English and in mid-shot videos rather than close-ups.
The YOLO model offers faster face replacement speed compared to the previous RetinaFace model.
The new version adds face age and gender prediction features.
Using the YOLO model with a box mask is recommended for videos with unobstructed faces.
For videos with lightly obstructed faces and many low-angle shots, YOLO with a box mask is also suggested.
For heavily obstructed faces, the old RetinaFace model with closed masks is recommended.
The new model struggles with extreme head rotations close to 90 degrees.
The new version optimizes face recognition by adding a 5-point landmark detector alongside the previous 68-point one.
Despite optimizations, YOLO's performance may decrease when using closed masks.
Transcripts
Hello! Hi everyone!
Welcome to AI Exploration and Discovery.
FaceFusion has new features again.
Now it can not only swap faces,
it can also make the person in a video speak.
As long as you provide an audio clip
and a video shot facing the camera, after fusion
the audio and the lip movements will be in sync.
Low-cost digital-human video gains another option.
The other major improvement in this update is
the addition of a new face recognition model,
which greatly improves recognition accuracy
when the lighting is poor,
and is especially more precise on down-turned and side-face shots.
Today's video covers in detail
how to upgrade from an older version,
and demonstrates how the new features work and what they can do.
The prerequisite for upgrading from an old version is
that you have successfully installed 2.2.1 or earlier,
and that it was installed manually by cloning with the git tool.
If you have never installed any version,
you can also follow this video
and install the latest version directly.
First, go to your FaceFusion install directory.
For example, mine is D:\AI\facefusion.
Type CMD in the address bar to open a command window
and enter the first command: git pull.
A reminder for friends in Mainland China:
if you can use a proxy, turn it on before updating.
Once the program files have updated successfully,
enter the second command to activate the virtual environment you created,
then enter the third command
to update the installed dependency packages.
If the dependency update finishes without errors,
the program upgrade is complete.
Finally, let's run the program
to verify the upgrade succeeded.
Enter the run command,
adding the skip-download parameter.
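(For reference, with FaceFusion 2.x the command likely looks like the line below; the flag name is an assumption here, so check `python run.py --help` if it differs on your version.)

```
python run.py --skip-download
```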
Open the local address in your browser:
the main interface opens successfully,
and the version here now shows 2.3.0.
This option is the new voice-driven lip-sync feature.
Scroll to the bottom
and you can also see the new face recognition model, YOLOface.
The whole interface looks
as if the upgrade has succeeded,
but look closely:
the options here are wrong.
There is no GPU acceleration.
So if you have an NVIDIA card,
there is one last upgrade step to do.
Switch back to the command window
and press Ctrl+C to stop the program.
First enter pip list
to check which CUDA version the original install used.
You only need to look at the torch line:
check the identifier after the plus sign.
Here it shows cu118,
meaning my original install used the CUDA 11.8 build of PyTorch.
Once that's confirmed,
enter this command: python install.py.
The first install option appears,
asking us to choose a CUDA version.
We already confirmed CUDA 11.8,
so pick the last one.
Then a second option appears;
again, choose
the same CUDA version as the original install.
Press Enter after choosing,
and the installer automatically updates the related dependency packages.
For this update too,
friends in Mainland China will need a proxy.
The whole process needs no intervention;
above all, don't click inside the command window with the mouse
(clicking can pause the console output).
Just wait patiently for everything to finish.
If no errors appear during the whole process,
the update has succeeded.
Enter the run command
and refresh the page;
now the CUDA option shows up here.
Models added in the new version
are downloaded automatically on first use.
For example, select the voice-driven lip-sync feature now:
if the program can't find the related models locally,
it downloads them first,
and you can watch the progress in the command window.
For friends in Mainland China, though,
even with a proxy on,
the download here is still very slow.
The fix is to download the models manually.
Open the official project's model list page;
the ones marked with red boxes are the models new in version 2.3,
five in total.
After downloading, cut and paste them
into the models directory under the FaceFusion install directory.
With that, the version upgrade and the model update are complete.
Re-run the program and you can use the new version.
If typing the run command every time is too much trouble,
here's a way to create a one-click launch script.
First, go to the FaceFusion install directory
and create a new text file.
Rename it,
say, to One-Click Launch.
Double-click to open it,
copy this command and paste it in,
then save and close.
Finally, change the file's extension from TXT to BAT.
If you can't see the TXT extension on your system,
click View here
and tick File name extensions to show it.
Now double-click One-Click Launch
and FaceFusion runs directly.
Now for the two new features.
First, driving the lip movements of a person in a video with audio.
Tick this option first,
then untick the default face-swap option,
because lip sync and face swapping cannot be used at the same time.
Drag your prepared audio file here;
I'll test with this 8-second clip.
Then drag the video in.
Tick CUDA acceleration
and adjust the thread count.
This parameter matters a lot:
it directly affects video processing speed.
The program has a small bug here, though:
typing a value in directly sometimes has no effect,
so use the up/down buttons next to it to adjust the value.
In general the maximum is about twice your VRAM in gigabytes;
with 8 GB of VRAM, for example, you can go up to 16.
The parameters below can basically stay unchanged;
the defaults are the best settings.
Then the parameters on the right.
You can see the default face recognition model here is YOLO.
This model was introduced in the deepfacelive episode.
Its full name is You Only Look Once,
originally created by Joseph Redmon and collaborators.
It is excellent in both recognition speed and accuracy.
When using the voice-driven lip-sync feature, it's the recommended choice.
All other parameters can stay at their defaults too.
In the preview here, though, you can clearly see
that after driving the lips with audio,
the mouth area becomes very blurry,
so it's recommended to also add face enhancement.
Let's run the replacement and see the result.
The lip sync basically matches,
though the teeth don't look great.
The audio here can also be replaced with a song;
after fusing, it looks like this.
From my testing of this feature,
English audio works noticeably better than Chinese.
And if the video is a mid shot,
the replaced lip movements
look more natural than in a facial close-up.
Finally, some operational details
for face swapping with the new YOLO model.
The settings on the left are the same as in previous versions,
nothing has changed:
add face enhancement,
set CUDA acceleration,
remove the default CPU option,
and increase the thread count to speed up the swap.
For the parameters on the right,
the face selection mode has a subtle change.
If two or more faces
appear in the frame at the same time
and you only want to replace one of them,
it's recommended to use "one" mode to specify it.
With the default YOLO model,
when multiple faces appear in the frame,
using the default first mode, "reference",
sometimes fails to target the specified face.
Let's run a swap and see the result.
The new model's swapping speed
is slightly faster than the previous version's retinaface.
VRAM usage during swapping is also higher:
with 12 threads set,
8 GB of VRAM is nearly maxed out.
The swap is complete.
When no closed mask is used,
the YOLO model's overall results
are better than the old retinaface,
mainly because the new version has many optimizations in face recognition.
We can turn on the debug options
to see these changes,
including the added 5-landmark face detection
(previously it was 68 landmarks).
More landmarks are
more precise when recognizing frontal faces,
but on down-turned or side faces,
or faces in poor lighting,
they are prone to misjudging.
This update also adds face age prediction and gender prediction.
However, the new YOLO paired with a closed mask
feels like an overall drop in performance,
so you can choose the face recognition model flexibly
based on the video content you're replacing.
Here are recommended model-and-mask combinations for several scenarios.
If you're using the voice-driven video feature,
use YOLO plus face enhancement for face recognition.
For face swapping,
if the faces in the video are unobstructed,
YOLO plus a box mask is recommended.
If the faces are heavily obstructed,
the old retinaface plus a closed mask is recommended.
If the faces are only slightly obstructed
but there are many down-turned and side-face shots,
YOLO plus a box mask is recommended.
These recommendations are based on testing the current version
and may change in future versions.
In practice, complex facial shots
still take multiple attempts to get the best result.
Some facial shots are beyond what 2D face recognition
and swap models can handle.
In this video, for example,
once the head turns close to 90 degrees,
the model can no longer cope:
no matter how you adjust the parameters,
the redrawn face is badly deformed,
sometimes even showing multiple eyes.
To finish, let me play you a piece of music.
This video has lots of down-turned and side-face shots,
but the angles stay within what the model can handle.
After swapping with the new model,
see whether you can spot any flaws.
That's it for today.
Thanks for watching, and see you next time.
Your braid is so long
Your eyes are so bright
My heart is pounding
My brain is short of oxygen
FaceFusion is a face swapper and enhancer of the next generation
FaceFusion is the best face swapper out there