Add three reference images and draw a mask to define the reference area. The result is a single scene that blends all three references.
For better results, give a detailed description of each reference.
For example: white hoodie, bright scene, and similar details~