Make Small Animations with ModelScope
Introduction: This article walks through using ModelScope to turn a real video into an animation. The basic idea is to decode the video into images, cartoonize the frames one by one with the portrait cartoonization model, and then recombine the generated frames into a video to produce the animation.
Platform overview
The ModelScope platform (https://modelscope.cn/#/models), recently released by Alibaba, aims to be an open-source model-as-a-service sharing platform that provides AI developers with flexible, easy-to-use, and low-cost one-stop model services, making models easier to apply. The model coverage is decent: there are currently 138 models, 55 of which can be tried through an online demo, and 4 of which support finetuning (finetuning support still needs strengthening).
Introduction to cartoon models
Opening the model library, the first thing that catches the eye is the portrait cartoonization model. I have to say the page looks quite polished, but I did not know how well the model actually performs, or whether it is only superficially well done, so I picked this first model to try it out.
The conversion examples on the model page look quite good, and I could not help wondering: if I cartoonize a video with it, can I generate an animation? And if there are no people in the video, wouldn't it simply cartoonize the background nicely? A video also contains frames of varying quality, which makes it a good way to probe the model's corner cases. Without further ado, let's get started.
I will not repeat the basic usage and environment setup of ModelScope; please refer to the documentation:
https://modelscope.cn/#/docs/%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8B
Video cartoonization
First, we need to understand how a video differs from a picture. A video generally has two parts: the video stream and the audio track. The video stream can be seen as a sequence of consecutive pictures. If we extract these pictures, cartoonize them frame by frame with the model, and then assemble the generated images back into a video, we get a small animation.
Therefore, we can decompose the video cartoonization into the following steps:
1. Video decoding
2. Batch image cartoonization
3. Video synthesis
4. Audio track restoration
Below we walk through the implementation step by step; the complete, runnable Python code is given at the end of this article.
Video decoding
First, we use OpenCV to decode the video and store the decoded images in the frames list.
video = cv2.VideoCapture(video_file)
if not video.isOpened():
    print("Error reading video file")

# Read frames one by one
frames = []
i = 0
while video.isOpened():
    i += 1
    # Capture frame-by-frame, logging progress every 10 frames
    if i % 10 == 0:
        print(f'loaded {i} frames')
    ret, frame = video.read()
    if ret:
        frames.append(frame)
    else:
        break

# When everything is done, release the video capture object
video.release()
print('loading video done.')
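One practical caveat I ran into: every decoded frame is kept in memory, which adds up quickly for long or high-resolution videos. Below is a sketch of one way around this, downscaling and optionally keeping only every Nth frame; the MAX_WIDTH and STEP values are illustrative, not part of the original code.
# Optional sketch: shrink the in-memory frame list for long / high-resolution videos
MAX_WIDTH = 1280   # illustrative cap, tune to your hardware
STEP = 1           # set to 2 or 3 to keep every 2nd/3rd frame (trading smoothness for speed)

slim_frames = []
for idx, f in enumerate(frames):
    if idx % STEP:
        continue
    h, w = f.shape[:2]
    if w > MAX_WIDTH:
        scale = MAX_WIDTH / w
        f = cv2.resize(f, (MAX_WIDTH, int(h * scale)))
    slim_frames.append(f)
frames = slim_frames
If you do subsample frames this way, remember to divide the fps passed to VideoWriter later by the same STEP, otherwise the output plays too fast.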
Batch image cartoonization
First, initialize the cartoonization pipeline:
img_cartoon = pipeline('image-portrait-stylization', model='damo/cv_unet_person-image-cartoon_compound-models')
According to the documentation example, the pipeline accepts an image file name as input. I was not sure whether it also accepts image data directly, so I tried it myself, and, together with the official pipeline documentation (https://modelscope.cn/#/docs/%E6%A8%A1%E5%9E%8B%E7%9A%84%E6%8E%A8%E7%90%86Pipeline), concluded that it should also accept a list of images. I was rather pleased with my little experiment, though I have to say the official documentation is still missing such details.
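If you want to verify this yourself before committing a whole video to it, a quick test along these lines works. This is only a sketch: it assumes frames has already been filled by the decoding step above, and that the return format matches the model card (a dict with an 'output_img' entry per image).
# Sketch: probe single-image vs. batch input on a couple of decoded frames
single_result = img_cartoon(frames[0])      # one BGR ndarray
print(type(single_result))                  # expected: a dict containing 'output_img'
batch_result = img_cartoon(frames[:2])      # a small list of frames
print(len(batch_result), batch_result[0]['output_img'].shape)
With batch input confirmed, the whole list of frames can be fed in at once.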
Finally, the following code cartoonizes the batch of images and stores the results in result_frames.
results = img_cartoon(frames)
result_frames = [r['output_img'] for r in results]
As an aside, I have to complain that local inference on many images was really slow; it later turned out that everything was running on the CPU.
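Before blaming the model, it is worth checking whether PyTorch can see a GPU at all. The device argument shown (commented out) below is an assumption on my part; check the documentation of your installed ModelScope version before relying on it.
# Sketch: confirm whether a GPU is visible to PyTorch
import torch
print(torch.cuda.is_available(), torch.cuda.device_count())

# Some ModelScope versions let you pin the pipeline to a device explicitly
# (assumption -- verify against your installed version's documentation):
# img_cartoon = pipeline('image-portrait-stylization',
#                        model='damo/cv_unet_person-image-cartoon_compound-models',
#                        device='gpu')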
Video synthesis
Finally, the images are assembled into a video with OpenCV. One thing needs to be emphasized:
The size of the output images differs from the input images, so the output video size must not be set to the original input video size. This cost me half an hour of debugging.
# The output size comes from the model result, not from the input video
frame_height, frame_width, _ = result_frames[0].shape
size = (frame_width, frame_height)

# Force every frame back to uint8 (some frames come back as float32, see below)
for idx in range(len(result_frames)):
    result_frames[idx] = result_frames[idx].astype(np.uint8)

print(f'saving video to file {out_file}')
out = cv2.VideoWriter(out_file, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)
for f in result_frames:
    out.write(f)
out.release()
print('saving video done')
In addition, another problem surfaced when saving the video. For some frames the conversion prints the log below, and the output image is then no longer uint8 but float32, so logic was added to force every frame back to uint8.
FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
r, _, _, _ = lstsq(X, U)
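A small sanity check before calling VideoWriter catches both problems at once. This is a sketch, assuming frames and result_frames are already populated as in the previous sections.
# Sketch: verify output size and dtype before writing the video
in_h, in_w = frames[0].shape[:2]
out_h, out_w = result_frames[0].shape[:2]
print(f'input {in_w}x{in_h} -> output {out_w}x{out_h}')   # sizes usually differ
print(result_frames[0].dtype)                              # may be float32 instead of uint8
# VideoWriter can silently produce an unplayable file if size or dtype is wrong,
# so take the size from the output frames and cast everything to uint8, as above.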
Audio track restoration
OpenCV cannot extract an audio track or merge audio back into a video, so we use MoviePy (https://zulko.github.io/moviepy/) for that.
First install MoviePy
pip install ffmpeg moviepy
Extract the original audio track with moviepy
import moviepy.editor as mp
audio_file = 'out.mp3'
my_clip = mp.VideoFileClip(video_file)
my_clip.audio.write_audiofile(audio_file)
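One caveat of my own (not mentioned in the original steps): if the source video has no audio track at all, my_clip.audio is None and write_audiofile raises an error, so it is safer to guard that case, for example:
import moviepy.editor as mp

my_clip = mp.VideoFileClip(video_file)
if my_clip.audio is not None:
    my_clip.audio.write_audiofile(audio_file)
else:
    # no audio in the source; skip extraction and the later merge step
    audio_file = None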
Then read the synthesized video and the original audio track to produce the animation with sound:
from moviepy.editor import VideoFileClip, AudioFileClip
# load the cartoonized (silent) video
clip = VideoFileClip(out_tmp_file)
# loading audio file
audioclip = AudioFileClip(audio_file)
# adding audio to the video clip
videoclip = clip.set_audio(audioclip)
videoclip.write_videofile(out_file)
# save to gif
# videoclip.write_gif(out_gif_file)
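On the commented-out GIF export: GIF files grow very quickly, so if you do want one, lowering the frame rate (and optionally the resolution) first keeps it manageable. A sketch, with 'out_small.gif' as a made-up file name:
# Sketch: smaller GIF export -- drop the frame rate, optionally halve the resolution
videoclip.write_gif('out_small.gif', fps=10)
# videoclip.resize(0.5).write_gif('out_small.gif', fps=10)   # needs the moviepy resize fx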
Show results
Trees on Gulangyu Island - Detailed Comparison
I think this tree video turned out best among the several videos I tried. Although there is no person in it, the result looks very good. It seems the background cartoonization works better on scenes with dense textures and varied colors.
https://www.bilibili.com/video/BV1fV4y1W7sp/?vd_source=372c0a968e69a0f0a2113d88112f87eb
The sea
The sea converts reasonably well, and the seagulls are still recognizable. Note that this is background cartoonization without a portrait, and the result is acceptable.
https://www.bilibili.com/video/BV1Ba411P7vB/?vd_source=372c0a968e69a0f0a2113d88112f87eb
Beach
This video did not convert very well. There is no harsh light on the child's face in the original video, and I am not sure why the face looks strange after cartoonization.
https://www.bilibili.com/video/BV1GY4y1c7fi/
Problems encountered
1. After uploading an image, the online demo stayed stuck at the model-loading stage; this was fixed after feedback in the user group.
2. The GPU does not appear to be used: GPU memory is allocated, but CPU utilization is very high during inference, so computation apparently runs on the CPU.
3. The bad case on the child's face in the third (beach) video still needs to be tracked down and fixed.
Full code
Instructions:
1. Modify the video_file variable to point to your input video path.
2. Modify the out_file variable to point to your output video path.
3. Run the following code with Python.
import cv2
import numpy as np
from modelscope.pipelines import pipeline
from moviepy.editor import VideoFileClip, AudioFileClip
import logging

logging.basicConfig(level=logging.INFO)

# Initialize the portrait cartoonization pipeline
img_cartoon = pipeline('image-portrait-stylization', model='damo/cv_unet_person-image-cartoon_compound-models')

video_file = 'apps/gulangyu-tree.mp4'
out_file = 'apps/gulangyu-tree_out.mp4'
out_tmp_file = 'video_tmp.mp4'
audio_file = 'audio_tmp.mp3'

# Extract the original audio track (assumes the source video has one)
my_clip = VideoFileClip(video_file)
my_clip.audio.write_audiofile(audio_file)
logging.info('save audio file done')

# Decode the video into frames
logging.info(f'load video {video_file}')
video = cv2.VideoCapture(video_file)
fps = video.get(cv2.CAP_PROP_FPS)
if not video.isOpened():
    logging.info("Error reading video file")

frames = []
i = 0
while video.isOpened():
    i += 1
    # Capture frame-by-frame, logging progress every 10 frames
    if i % 10 == 0:
        logging.info(f'loaded {i} frames')
    ret, frame = video.read()
    if ret:
        frames.append(frame)
    else:
        break
# When everything is done, release the video capture object
video.release()
logging.info('loading video done.')

# Cartoonize all frames in one batch call
results = img_cartoon(frames)
result_frames = [r['output_img'] for r in results]

# The output resolution differs from the input, so take the size from the output frames,
# and force every frame back to uint8 (some frames come back as float32)
frame_height, frame_width, _ = result_frames[0].shape
size = (frame_width, frame_height)
for idx in range(len(result_frames)):
    result_frames[idx] = result_frames[idx].astype(np.uint8)

# Write the silent cartoonized video
logging.info(f'saving video to file {out_tmp_file}')
out = cv2.VideoWriter(out_tmp_file, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)
for f in result_frames:
    out.write(f)
out.release()
logging.info('saving video done')

# Merge the original audio back into the cartoonized video
logging.info('merging audio and video')
clip = VideoFileClip(out_tmp_file)
audioclip = AudioFileClip(audio_file)
videoclip = clip.set_audio(audioclip)
videoclip.write_videofile(out_file)
# save to gif
# videoclip.write_gif(out_gif_file)
logging.info('finished!')
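If you run this on more than one video, a thin command-line wrapper saves editing the script each time. This is a sketch of my own; process_video is a hypothetical function that you would create by wrapping the body of the script above.
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Cartoonize a video with ModelScope')
    parser.add_argument('video_file', help='path to the input video')
    parser.add_argument('out_file', help='path for the cartoonized output video')
    args = parser.parse_args()
    process_video(args.video_file, args.out_file)   # hypothetical wrapper around the script above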