Non-Mega (v10) vs Mega
Is anyone else finding v10 to be superior to Mega in some cases? For me, v10 just produces more realistic and more varied motion than Mega, at least with no prompt or only simple prompts.
Are you referring to Mega v3? I2V or T2V? What's the use case for "no prompts"? Are you using the recommended samplers? Are you using the dedicated "Mega" workflow?
Currently Mega V2 NSFW with the "long video" workflow that was posted here yesterday, but the same happens with the standard Mega workflow. I think it boils down to the amount of specific prompting that is needed. With non-Mega I can achieve very good results with very simple prompts. With Mega it seems to need a lot of specificity and experimentation with the prompts, and even then the movement is too repetitive and simple compared to what I see in v10. Also, in Mega it seems like people are often continually opening and closing their mouths. I don't see that with v10.
I'll try to create a v10 vs Mega sample as an example.
Wow, so how did your comparison of the two versions turn out?
So it does appear to be a problem specifically with the long video workflow. The motion that is produced is just not good and does not follow the prompt well. Here is an example (the prompt is "the woman puts the popsicle into her mouth and begins sucking on it"):
NSFW V10:
Mega NSFW V3:
Mega NSFW V3 "long video":
For Mega it seems you have to work with more detailed and longer prompts. I haven't done much testing with Mega myself, but it feels like this:
Mega: follows your prompt, and when that is done and there are still frames/seconds left, it doesn't know what to do and produces garbage ^^.
V9-10: follows your "short" prompt and then kind of guesses what to do next when there are frames/seconds left.
I think V10 was definitely better than Mega. Mega V3 doesn't give me the same quality when I run both with the same prompt (I2V and T2V NSFW versions).
There should be further development of AiO v10+.
I agree with matixxx. I almost exclusively do I2V, and V10 is excellent. With Mega I get too much face shifting, strange oversaturation, and poor prompt following, and I also have to turn the resolution down because it uses more resources. Others must be looking for different results. For example, people rave about VACE, but for what I'm doing it just isn't a good fit.
Glad I'm not the only one who has noticed differences; I thought I was doing something wrong. Has anyone found a good workflow for making long videos with v10?
I have a long-running, very well working workflow for V1-V10. It needs some cleaning and fine-tuning before I can release it here. It should be done by Friday; I don't have much time to work on it until then.
As I said in the other thread, there isn't anything more to do with "v10" to make a "v11". v10 is still very capable, and if you get the results you want with it, please continue to use it. There is nothing wrong with using v10. "Mega" just opens up more possibilities with VACE in one model, which is the new pathway worth tuning. It also saves a ton of time to only maintain and tune one base model.
I believe most of the issue is that VACE works with the T2V model, so there isn't a dedicated I2V model, which would likely do better at keeping details from the initial frame.
However, have you guys tried feeding the starting image in as the "reference image" with Mega? That might help improve facial consistency. I only skip it because it requires generating 4 extra "junk" latent frames, and you have to use the "TrimLatents" node to get rid of them after the KSampler.
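For anyone wondering what that trim step is doing conceptually: the reference image adds extra latent frames that aren't part of the video, so they have to be cut out of the sampled latent before decoding. Here's a minimal sketch of that idea in plain PyTorch, not the actual TrimLatents node; the [batch, channels, frames, height, width] layout and the assumption that the junk frames sit at the start of the sequence are mine, purely for illustration.

```python
import torch

def trim_reference_latents(latent: torch.Tensor, junk_frames: int = 4) -> torch.Tensor:
    """Drop the extra "junk" latent frames added when a reference image is used.

    Assumes a [batch, channels, frames, height, width] layout with the junk
    frames at the start; this is an illustrative sketch, not the node's internals.
    """
    # Slice along the temporal axis, keeping only the real video frames.
    return latent[:, :, junk_frames:, :, :]

# Dummy example: 4 junk frames + 16 real frames.
sampled = torch.randn(1, 16, 20, 60, 104)
video_only = trim_reference_latents(sampled, junk_frames=4)
print(video_only.shape)  # torch.Size([1, 16, 16, 60, 104])
```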
@Phr00t Wiring the starting image to the reference image input isn't what that input is intended for. I explained it somewhere already; the VACE interface is very complicated. What happens when you do this is that, yes, the character is more consistent, but the AI is constantly trying to follow your prompt while also keeping every frame looking like the reference frame. So you get animation, but everything keeps "magneting back" to the reference image.
With an additional reference image, things don't get any better.
Immediately after the first frame, the character consistency is lost. It seems the model primarily follows the prompt.
The same input in V10 maintains consistency.
Perhaps in a future Mega version, an additional variable could be used to tell the model that it's an I2V request ...?
I'm at the mercy of the models and their capabilities. If I find something to mix in to improve "Mega" consistency, I'll add it. Also, I presume there may be NSFW LORAs causing some of the facial shifting if that is the merge you are using. Perhaps using the base "Mega" and adding only the LORAs you need might help. Or you can stick with v10 for now, which has a dedicated I2V model.
I'm aware the reference image pulls everything back toward it; I'm just trying to provide options for people who may be OK with that tradeoff to improve consistency.