FaceFusion Voice + Video Lip-Sync: Detailed Steps for Local Installation and Upgrade

AI Exploration and Discovery
26 Feb 2024 12:00

Summary

TL;DR: This video explores the latest updates to FaceFusion, a tool that now enables video characters to speak by syncing audio with lip movements. It also features an improved face recognition model, YOLOface, which enhances accuracy in low light and for side-profile or downward-tilted faces. The tutorial guides users through upgrading from older versions, covering program-file updates, dependency upgrades, and enabling GPU acceleration. It demonstrates how to use the new voice-driven lip-syncing feature and provides tips on optimizing settings for better performance. Additionally, it discusses the nuances of using YOLOface for face swapping in various scenarios.

Takeaways

  • 😀 FaceFusion now has a new feature that allows video characters to speak by synchronizing voice and lip movements using a provided audio and a frontal video.
  • 😎 The update includes a new YOLOface recognition model, which significantly improves accuracy in poor lighting and for low-angle and side-face shots.
  • 🛠️ To upgrade from an older version (2.2.1 or earlier), users need to use Git commands and update dependencies, ensuring they have the correct CUDA version for NVIDIA GPUs.
  • 🌐 Users in Mainland China are advised to use a proxy for smoother updates and model downloads due to potential connectivity issues.
  • 💻 After upgrading, users can verify the success by checking the version number and ensuring the new features appear on the interface.
  • 🎨 The new version supports manual download of models if automatic download is slow, by accessing the official model list and placing them in the models directory.
  • 🚀 A one-click run script can be created to simplify the launch process by modifying a text file into a batch file with the necessary run commands.
  • 🗣️ The voice-driven lip-sync feature works best with English audio and mid-shot videos; it can also handle songs, though results may vary.
  • 🔍 The YOLO model offers better performance in recognizing multiple faces in a scene compared to the previous RetinaFace model, with optimizations for various angles and lighting.
  • 🛠️ For complex facial scenes, users may need to experiment with different model and mask combinations to achieve optimal results, as some extreme angles may still pose challenges.
  • 🎵 The script concludes with a demonstration of the new model’s effectiveness on a video with many low-angle and side-face shots, inviting viewers to spot any imperfections.

Q & A

  • What are the new features introduced in the latest update of FaceFusion?

    -The latest update of FaceFusion includes the ability to make video characters speak by syncing provided audio with their lip movements, and a new facial recognition model that improves accuracy in low light conditions and for identifying faces in low angles or side profiles.

  • What are the prerequisites for upgrading FaceFusion from an older version?

    -To upgrade FaceFusion, you must have successfully installed version 2.2.1 or earlier using the git tool for manual installation. If you haven't installed any version before, you can refer to the video to install the latest version directly.

  • What steps are involved in upgrading FaceFusion using git?

    -The steps include: 1) Navigating to the FaceFusion installation directory and opening the command window. 2) Running the command 'git pull' to update the program files. 3) Activating the virtual environment. 4) Updating the installed dependencies using a specific command. 5) Verifying the upgrade by running the program and checking the version number.
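
    For reference, here is a minimal command sketch of those steps. It assumes the install directory shown in the video (D:\AI\facefusion), a virtual environment named venv inside it, and dependencies tracked in requirements.txt; the venv name, requirements file, and the exact run-command flag are assumptions, since the video does not spell them out on screen.

      cd /d D:\AI\facefusion
      :: 1) pull the new program files
      git pull
      :: 2) activate the virtual environment (name assumed to be "venv")
      venv\Scripts\activate
      :: 3) update the installed dependencies (assumes a requirements.txt)
      pip install --upgrade -r requirements.txt
      :: 4) run the program and check that the UI reports version 2.3.0
      python run.py --skip-download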

  • Why is it recommended to use a proxy when updating FaceFusion in Mainland China?

    -Using a proxy is recommended to ensure smooth updates, especially when downloading dependencies and models, as it can help bypass potential network restrictions and improve download speeds.

  • How can users enable GPU acceleration after upgrading FaceFusion?

    -To enable GPU acceleration, users need to check the CUDA version used in the original installation (e.g., cu118 for CUDA 11.8) and then run the 'python install.py' command, selecting the appropriate CUDA version during the installation process.
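
    A hedged sketch of that check (the findstr filter is only a convenience; the suffix after the plus sign in the torch line identifies the CUDA build):

      :: show the installed torch build, e.g. "torch 2.x.x+cu118" -> CUDA 11.8
      pip list | findstr torch
      :: re-run the installer and pick the matching CUDA version when prompted
      python install.py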

  • What is the purpose of the YOLOface model in FaceFusion?

    -The YOLOface model is designed to improve facial recognition accuracy, especially in challenging conditions such as low light, low angles, and side profiles. It is more efficient and accurate compared to the previous RetinaFace model.

  • How can users manually download and install the new models if the automatic download is slow?

    -Users can visit the official project's model list page, download the 5 new models for version 2.3, and then copy them into the 'models' directory of the FaceFusion installation folder.
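
    As a sketch of the copy step, assuming the five downloaded files are .onnx models sitting in your Downloads folder and the installation lives at D:\AI\facefusion (both paths are examples; use the models folder named in the video for your version):

      move "%USERPROFILE%\Downloads\*.onnx" "D:\AI\facefusion\models\"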

  • What are the recommended settings for using the voice-driven lip-syncing feature in FaceFusion?

    -For voice-driven lip-syncing, it is recommended to use the YOLO model for facial recognition, enable facial enhancement to improve lip clarity, and adjust the thread count to optimize processing speed based on GPU memory.

  • How does the new YOLO model perform compared to the old RetinaFace model in different scenarios?

    -The YOLO model is faster and more accurate in general facial recognition tasks, especially for low angles and side profiles. However, it may perform slightly worse with closed masks compared to RetinaFace. Users should choose the model based on the specific requirements of their video content.

  • What are some limitations of FaceFusion when dealing with complex facial angles?

    -FaceFusion may struggle with extremely complex facial angles, such as when the head is turned close to 90 degrees. In such cases, the model may produce deformed results or fail to handle the facial features correctly.

  • How can users create a one-click script to run FaceFusion easily?

    -Users can create a one-click script by creating a new text file in the FaceFusion installation directory, renaming it with a '.bat' extension, and pasting the appropriate command inside the file. This allows them to run FaceFusion directly by double-clicking the script.
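
    A minimal example of such a script, saved as, for instance, 一键启动.bat in the install directory. The virtual-environment name (venv) and the run command are assumptions based on the setup described above, not text shown in the video:

      @echo off
      :: switch to the folder this script lives in
      cd /d %~dp0
      :: activate the virtual environment created during installation
      call venv\Scripts\activate
      :: launch FaceFusion, skipping the model re-download check
      python run.py --skip-download
      pause

    Double-clicking the file then starts FaceFusion without any typing.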

Outlines

00:00

🚀 Upgrading FaceFusion and Exploring New Features

This paragraph provides a comprehensive guide to upgrading FaceFusion to version 2.3.0 and introduces the new voice-driven lip-syncing feature and the improved YOLOface recognition model. The process involves using Git commands to update the software, activating a virtual environment, and updating dependencies. It also highlights the importance of checking for GPU acceleration and manually downloading models if necessary. Additionally, it explains how to create a one-click script to simplify the launch process. Voice-driven lip-syncing requires selecting the appropriate options and adjusting parameters like thread count for optimal performance. The paragraph also discusses the improved YOLOface model's accuracy in low-light conditions and its ability to handle different facial orientations.

05:01

🔍 Detailed Usage of New Features and Model Comparisons

This paragraph delves into the detailed usage of the new features in FaceFusion, focusing on voice-driven lip-syncing and the YOLOface model. It explains the importance of adjusting parameters such as thread count to improve video processing speed and notes a potential bug with direct input values. The paragraph also highlights the YOLOface model's performance, comparing it with the previous RetinaFace model in terms of speed, memory usage, and accuracy in various scenarios. It provides specific recommendations for using different models and masks based on the video content, such as using YOLO with face enhancement for voice-driven lip-syncing and RetinaFace with closed masks for heavily occluded faces. The paragraph concludes by emphasizing the need for multiple attempts to achieve the best results, especially for complex facial scenes, and notes limitations in handling extreme head rotations.

10:01

🎵 Conclusion and Demonstration of Model Effectiveness

This paragraph concludes the video script by inviting viewers to listen to a music segment and observe the effectiveness of the new model in handling a video with numerous low-angle and side-face shots. It challenges viewers to spot any imperfections in the face-swapping results and expresses confidence in the new model's capabilities. The paragraph also includes a promotional statement about FaceFusion being a next-generation face swapper and enhancer, emphasizing its superiority over other tools. It ends with a thank-you note to the viewers and a hint for future content.

Mindmap

Voice-Driven Lip Sync
Enhanced Face Recognition Model (YOLOface)
New Features Overview
Cost-Effective Digital Human Creation
Improved Accuracy in Low-Light Conditions
Better Handling of Side Profiles and Downward Faces
Benefits of Updates
Introduction to FaceFusion Updates
Installed Version 2.2.1 or Earlier
Manual Installation via Git
Prerequisites
Navigate to Installation Directory
Execute 'git pull' for Program Update
Activate Virtual Environment
Update Dependencies
Verify Upgrade by Running Program
Upgrade Steps
Check CUDA Version via 'pip list'
Run 'python install.py' to Update Dependencies
Additional Steps for NVIDIA GPU Users
Automatic Model Download on First Use
Manual Download for Slow Network Speeds
Copy Models to 'models' Directory
Model Download and Manual Update
Upgrade Process
Synchronize Voice with Video Lip Movements
Supports Both Speech and Music
Functionality
Disable Default Face Swap Option
Upload Voice and Video Files
Adjust Thread Count for Performance
Optional Face Enhancement for Better Results
Usage Steps
Better Results with English Voice
More Natural Lip Movements in Medium Shots
Performance Observations
New Feature: Voice-Driven Lip Sync
Developed by Google
High Speed and Accuracy
Model Overview
Recommended for Voice-Driven Lip Sync
Better Handling of Multiple Faces in Frame
Usage in FaceFusion
Faster Replacement Speed
Higher GPU Memory Utilization
Performance Improvements
Age and Gender Prediction
Increased Feature Points for Recognition
Additional Features
Potential Performance Drop with Closed-Mask
Limitations
Enhanced Face Recognition Model: YOLOface
YOLO + Face Enhancement for Voice-Driven Lip Sync
YOLO + Box Mask for Unobstructed Faces
Retinaface + Closed Mask for Heavily Obstructed Faces
YOLO + Box Mask for Partially Obstructed Faces with Side Profiles
Model and Mask Combination
Multiple Attempts Required for Optimal Results
Limitations with Extreme Head Angles
Handling Complex Scenarios
Practical Application and Recommendations
Summary of Updates and New Features
Encouragement for Continued Exploration and Testing
Conclusion
AI Exploration and Discovery: FaceFusion Updates

Keywords

💡FaceFusion

FaceFusion is a software tool that allows users to swap faces in videos and now also enables the synchronization of speech with lip movements. It is central to the video's theme as the entire script revolves around its new features and upgrades. The script mentions how FaceFusion has evolved from just face swapping to incorporating speech-driven lip-syncing, making it a more versatile tool for creating digital human videos.

💡Speech-Driven Lip-Syncing

This refers to the new feature in FaceFusion that allows the software to match the lip movements of a person in a video with a provided audio track. It is a significant update discussed in the video, enabling more realistic and engaging digital human videos. The script explains how users can now provide a voice clip and a frontal video, and the software will synchronize the speech with the lip movements, enhancing the capabilities of low-cost digital human video creation.

💡YOLOface

YOLOface is a new facial recognition model introduced in the latest version of FaceFusion. It is designed to improve the accuracy of facial recognition, especially in challenging conditions such as poor lighting or when the face is in a low-angle or side profile. The script highlights how YOLOface enhances the software's ability to identify and replace faces more accurately compared to the previous model, RetinaFace, particularly in complex scenarios.

💡CUDA Acceleration

CUDA is a parallel computing platform and application programming interface (API) model created by NVIDIA. In the context of FaceFusion, CUDA acceleration allows the software to leverage the power of NVIDIA GPUs to speed up processing tasks. The script mentions that users with NVIDIA GPUs need to perform an additional step to enable CUDA acceleration, which significantly improves the performance and speed of the software, especially when processing high-resolution videos.

💡Virtual Environment

A virtual environment is an isolated space where software dependencies can be managed without affecting the system-wide settings. In the script, the virtual environment is used to manage the dependencies required by FaceFusion. The update process involves activating the virtual environment to ensure that all necessary packages are updated correctly, allowing the software to run smoothly with the new features.

💡Dependency Packages

These are software libraries and modules that FaceFusion relies on to function properly. The script explains that after updating the software using 'git pull', users need to update the dependency packages. This step ensures that all the necessary components are compatible with the new version of FaceFusion, allowing it to utilize its new features such as speech-driven lip-syncing and the YOLOface model.

💡Git

Git is a version control system used for tracking changes in source code during software development. In the context of this video, Git is used to manage the installation and updates of FaceFusion. The script instructs users to navigate to the FaceFusion directory and use the 'git pull' command to update the software, highlighting Git's role in keeping the software up-to-date with the latest features and improvements.

💡Digital Human Videos

These are videos that feature digital representations of humans, often created using software like FaceFusion. The script discusses how FaceFusion's new features, such as speech-driven lip-syncing, contribute to the creation of more realistic and engaging digital human videos. This concept is central to the video's theme as it showcases the potential applications of FaceFusion in various fields, including entertainment, education, and social media.

💡Face Enhancement

Face enhancement is a feature in FaceFusion that improves the visual quality of faces in videos. The script mentions that when using the speech-driven lip-syncing feature, the mouth area may appear blurry, so adding face enhancement can improve the overall visual quality. This feature is important for creating more realistic and visually appealing digital human videos, especially when dealing with low-quality source videos.

💡Model Download

In the context of FaceFusion, model download refers to the process of obtaining the necessary pre-trained models required for the software to function properly. The script explains that the new version of FaceFusion will automatically download the required models during the first use. However, due to potential slow download speeds in certain regions, users are advised to manually download and place the models in the appropriate directory. This ensures that the software can access the necessary components to utilize its new features effectively.

💡Batch Script

A batch script is a text file containing a series of commands that can be executed automatically. The script mentions creating a batch script to simplify the process of running FaceFusion. By creating a batch file with the necessary commands, users can easily launch the software without having to manually enter the commands each time, making the usage of FaceFusion more convenient and efficient.

Highlights

FaceFusion now supports not only face swapping but also lip-syncing with voice input.

The new version includes a YOLOface model, improving face recognition accuracy in poor lighting and for low-angle or side faces.

Upgrade from previous versions requires git pull, activating virtual environment, and updating dependencies.

For Nvidia GPU acceleration, additional steps are needed to confirm and update CUDA versions.

New models will be automatically downloaded on first use, but manual download is recommended for faster setup in China.

A one-click run script can be created to simplify the program launch process.

Voice-driven lip-syncing works best with English and in mid-shot videos rather than close-ups.

The YOLO model offers faster face replacement speed compared to the previous RetinaFace model.

The new YOLO model includes age and gender prediction features.

For videos with unobstructed faces, face replacement with the YOLO model and a box mask is recommended.

For videos with partially obstructed faces and many low-angle shots, YOLO with box masks is suggested.

For heavily obstructed faces, the old RetinaFace model with closed masks is recommended.

The new model struggles with extreme head rotations close to 90 degrees.

The new version optimizes face recognition by adding a 5-point facial landmark mode (the previous model used 68 points), which is less error-prone on downward-tilted, side-profile, or poorly lit faces.

Despite optimizations, YOLO's performance may decrease when using closed masks.

Transcripts

00:00

Hello, everyone!

00:00

Welcome to AI Exploration and Discovery

00:02

FaceFusion has new features again

00:04

Now it can not only swap faces

00:05

it can also make the person in a video speak

00:07

As long as you provide an audio clip

00:09

and a video shot facing the camera, after fusion

00:12

the voice and the lip movements can be synchronized

00:14

Low-cost digital human video now has one more option

00:17

Another major improvement in this update

00:19

is the addition of a new face recognition model

00:21

which greatly improves, when lighting is poor,

00:23

the accuracy of face recognition

00:25

It is especially more precise on head-down and side-profile shots

00:28

Today's video explains in detail

00:30

how to upgrade from an older version

00:32

and demonstrates how the new features work and what they produce

00:34

The prerequisite for upgrading from an old version is

00:36

that you have successfully installed version 2.2.1 or earlier

00:39

and that it was installed manually by cloning with the git tool

00:42

If you have never installed any version

00:44

you can also follow this video

00:46

and install the latest version directly

00:48

First, go to your FaceFusion installation directory

00:50

For example, my installation directory is D:\AI\facefusion

00:54

Type CMD in the address bar to open a command window

00:58

Enter the first command: git pull

01:01

A reminder here for friends in Mainland China:

01:03

if you can get around the firewall, turn on your proxy before updating

01:07

Once the program files are updated successfully

01:08

enter the second command to activate the virtual environment you created

01:12

Then enter the third command

01:14

to update the installed dependency packages

01:17

If no errors occur while updating the dependencies

01:19

the program upgrade is complete

01:21

Finally, let's run the program

01:23

to verify that the upgrade succeeded

01:25

Enter the run command

01:26

and add this skip-download parameter

01:29

Enter the local address in your browser

01:32

The main interface opens successfully

01:34

and the version here shows 2.3.0

01:37

This option is the new voice-driven lip-sync feature

01:41

Scroll to the bottom

01:42

and you can see the new face recognition model, YOLOface

01:46

The whole interface looks

01:47

as if the upgrade has succeeded

01:49

But look closely:

01:51

the options here are wrong

01:52

there is no GPU acceleration

01:55

So if you have an NVIDIA GPU

01:57

there is one final upgrade step

02:00

Switch back to the command window

02:02

Press Ctrl+C to stop the program

02:04

First enter: pip list

02:06

to check which CUDA version the original installation used

02:09

You only need to look at the torch line

02:11

Check the characters after the plus sign

02:14

Here it shows cu118

02:16

which means my original installation used the CUDA 11.8 build of PyTorch

02:20

Once that's confirmed

02:21

enter this command: python install.py

02:24

The first installation option appears here

02:26

asking us to choose a CUDA version

02:28

We already confirmed CUDA is 11.8

02:31

so choose the last one

02:33

Then a second option appears

02:35

Again, choose

02:36

the same CUDA version as your previous installation

02:39

After selecting, press Enter

02:40

and the installer will automatically update the related dependency packages

02:43

Likewise, for this update

02:44

friends in Mainland China also need a proxy

02:48

The whole update process needs no intervention

02:50

and don't click inside the command window with the mouse

02:52

Just wait patiently until everything finishes updating

02:56

If no errors appear during the whole process

02:58

the update has succeeded

03:00

Enter the run command

03:02

refresh the web page

03:04

and now the CUDA option shows up here

03:07

The models added in the new version

03:08

are downloaded automatically on first use

03:10

For example, if you now select the voice-driven lip-sync feature

03:13

and the program can't find the relevant models locally

03:15

it will download them first

03:17

You can see the download progress in the command window

03:20

However, for friends in Mainland China

03:21

even with a proxy enabled

03:22

the download here is very slow

03:24

The solution is to download the models manually

03:27

Open the official project's model list page

03:30

The ones marked with red boxes are the models added in version 2.3

03:33

There are 5 in total

03:34

After downloading, cut and paste them into

03:37

the models directory under the FaceFusion installation directory

03:40

At this point the version upgrade and model update are complete

03:44

Re-run the program and you can use the new version

03:47

If typing the run command every time is too much trouble

03:50

here is a way to create a one-click run script

03:53

First, go to the FaceFusion installation directory

03:56

Create a new text file

03:58

then rename it

03:59

for example, 'One-Click Start'

04:02

Double-click to open it

04:03

copy this command and paste it in

04:06

Save and close

04:08

Finally, change the file extension from TXT to BAT

04:13

If you can't see the TXT extension on your system

04:16

click [View] here

04:18

and check [File name extensions] to make it visible

04:21

Now double-click One-Click Start

04:23

and FaceFusion runs directly

04:27

Next, let's look at the two new features

04:29

First, driving a video character's lip movements with audio

04:33

First check this option

04:35

then uncheck the default face-swap option

04:37

because lip-sync and face swapping cannot be used at the same time

04:42

Drag and drop your prepared audio file here

04:45

I'll test with this 8-second audio clip

04:57

Then drag the video in

05:00

Check CUDA acceleration

05:03

Adjust the thread count

05:05

This parameter matters

05:06

it directly affects video processing speed

05:09

However, the program has a small bug here

05:11

sometimes typing a value directly has no effect

05:13

It's better to use the up/down buttons to adjust it

05:16

Generally the maximum can be about twice your VRAM

05:18

for example, with 8GB of VRAM you can go up to 16

05:21

The parameters below basically don't need changing

05:23

the defaults are the best settings

05:26

Then the parameters on the right

05:27

You can see the default face recognition model here is YOLO

05:31

This model was introduced in the deepfacelive episode

05:34

Its full name is You Only Look Once

05:36

made by Google

05:37

Both its recognition speed and its accuracy are excellent

05:40

It's the recommended choice for the voice-driven lip-sync feature

05:44

All other parameters can stay at their defaults

05:47

However, in the preview you can clearly see

05:49

that after driving the lips with audio

05:51

the mouth area becomes very blurry

05:53

so it's recommended to add face enhancement as well

05:56

Let's run the replacement and see the result

06:10

The lips basically match

06:11

though the teeth don't look great

06:14

The audio here can also be a song

06:17

After fusion, this is the result

06:30

From my testing of this feature

06:31

English audio works noticeably better than Chinese

06:39

If the video is a medium shot

06:40

the lip movements after replacement

06:42

look more natural than in a facial close-up

06:51

Finally, some details on using the new YOLO model

06:54

when swapping faces

06:56

The settings on the left are the same as in previous versions

06:58

nothing has changed

07:00

Add face enhancement

07:01

set CUDA acceleration

07:03

remove the default CPU option

07:06

and increase the thread count to speed up face swapping

07:09

This parameter on the right

07:10

the face selection mode, has changed slightly

07:14

If the video frame

07:15

contains two or more faces at the same time

07:18

and you only want to replace one of them

07:20

it's recommended to use [one] mode to specify it

07:22

With the default YOLO model

07:24

when multiple faces appear in the frame

07:26

using the default first option here, reference mode,

07:29

specifying the face to replace sometimes doesn't work

07:32

Let's run the replacement and see the result

07:35

In replacement speed, the new model

07:36

is slightly faster than the previous version's RetinaFace

07:39

Also, VRAM utilization during replacement is higher

07:43

With 12 threads set

07:44

8GB of VRAM is nearly maxed out

07:47

Replacement complete

08:22

When a closed mask is not used

08:24

the overall result of the YOLO model

08:26

is better than the old RetinaFace

08:29

mainly because the new version has many face recognition optimizations

08:33

We can turn on the debug option

08:35

to see these changes

08:37

including the addition of 5-landmark face recognition

08:39

previously it was 68 landmarks

08:42

Although more landmarks

08:43

mean greater precision on frontal faces

08:46

for head-down or side-profile faces

08:47

or faces in poor lighting

08:49

they are prone to errors

08:51

This release also adds age prediction and gender prediction for faces

08:56

However, when the new YOLO is paired with a closed mask

08:59

overall performance seems to drop

09:01

So based on the video content you want to replace

09:04

choose flexibly among the face recognition models

09:06

Here are recommended model and mask combinations for several scenarios

09:10

If you use the voice-driven video feature

09:12

use YOLO plus face enhancement for face recognition

09:16

For face swapping

09:17

if the faces in the video are unobstructed

09:19

YOLO plus a box mask is recommended

09:22

If the faces are heavily obstructed

09:25

the old RetinaFace plus a closed mask is recommended

09:28

If the faces are only slightly obstructed

09:30

but there are many head-down and side-profile shots

09:33

YOLO plus a box mask is recommended

09:36

These recommendations are based on testing the current version

09:38

and may change in future versions

09:40

In practice, for complex facial shots

09:43

you still need multiple attempts to get the best result

09:45

When some facial shots exceed what 2D face recognition

09:49

and the swapping model can handle

09:51

for example, in this video

09:52

when the head turns close to 90 degrees

09:54

the model can no longer cope

09:56

No matter how you adjust the parameters

09:58

the redrawn face is severely deformed

10:01

and may even have multiple eyes

10:04

Finally, please listen to a piece of music

10:06

This video has many head-down and side-profile shots

10:09

but the angles don't exceed what the model can handle

10:11

After replacement with the new model

10:13

see if you can spot any flaws

10:16

OK, that's all for today

10:17

Thanks for watching, see you next time

06:00

Your braid is so long

06:02

Your eyes are so bright

06:04

My heart is all aflutter

06:06

My brain is short of oxygen

06:35

fusion is a face swapper and enhancer of the next generation

06:46

face fusion is the best face swapper out there