- We observe that after training for a certain number of steps, the reward continues to increase while the quality of the generated videos does not improve further. The model learns shortcuts (e.g., adding artifacts in the background) to increase the reward, i.e., reward hacking.
- Currently, there is still a lack of suitable preference models for video generation. Directly using image preference models cannot evaluate preferences along the temporal dimension (such as dynamism and consistency). Furthermore, we find that using image preference models reduces the dynamism of the generated videos. Although this can be mitigated by computing the reward on only the first frame of the decoded video (see the sketch below), the impact still persists.
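The following is a minimal sketch of the first-frame mitigation described above: decode the video latents, keep only frame 0, and score it with an image preference model. The names `decode_latents_to_frames` and `image_reward_model` are hypothetical placeholders, not part of this repository's actual API.

```python
import torch

def first_frame_reward(latents: torch.Tensor,
                       prompt: str,
                       decode_latents_to_frames,
                       image_reward_model) -> torch.Tensor:
    """Score only the first decoded frame with an image preference model.

    Restricting the reward to frame 0 lessens (but does not remove) the
    pressure toward static, low-motion videos that arises when an image
    preference model is applied to every frame.
    """
    # Decode latents to pixel-space video: (batch, frames, channels, H, W).
    frames = decode_latents_to_frames(latents)   # assumed decoder interface
    first_frames = frames[:, 0]                  # keep only frame 0 per video
    # Reward is computed from the first frame and the text prompt only;
    # the remaining frames do not contribute to the gradient.
    return image_reward_model(first_frames, prompt)
```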
This model is mirrored from an external source (original address: https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.5-Reward-LoRAs/tree/main). If the original author objects to this transfer, they may file an appeal, and we will edit, delete, or transfer the model to the original author within 24 hours, as requested.