Research Context

Objective:

To verify whether a diffusion model can generate images that match target features when conditioned on two paired input features (initially simplified by using features extracted from the same source image).

  • Ideal outcome: The model should generate images whose features align with the original image's features.
  • Current setup: In real-world scenarios, the correspondence between features and original images is unknown. This experiment uses paired features as a simplified proxy.

Methodology

  • Input: Two paired features extracted from the same image (e.g., Feature A + Feature B).
  • Control: Constrain the generated image's features to match the target feature (e.g., Feature A) via a similarity metric (an L2 loss or another feature-space distance).
  • Model: Modified DDPM architecture with dual-feature conditioning.
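
One detail of the control term matters in practice: during training the network only sees a noisy image x_t, so a feature-matching term is usually computed on an estimate of the clean image recovered from the predicted noise. The sketch below illustrates that step in plain Python. It is a toy under stated assumptions, not the setup used in this experiment: `toy_extractor` is a hypothetical stand-in for the real feature extractor, and images are flat lists of floats.

```python
import math

def q_sample(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]

def predict_x0(x_t, eps_hat, alpha_bar):
    """Invert q_sample using the model's noise estimate eps_hat."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps_hat)]

def toy_extractor(x):
    """Hypothetical feature extractor (assumption): mean and mean of squares."""
    n = len(x)
    return [sum(x) / n, sum(v * v for v in x) / n]

def feature_l2(x0_hat, target_feat, extractor=toy_extractor):
    """Mean squared distance between re-extracted and target features."""
    feat = extractor(x0_hat)
    return sum((u - v) ** 2 for u, v in zip(feat, target_feat)) / len(feat)

if __name__ == "__main__":
    x0 = [0.5, -0.25, 1.0, 0.0]          # toy "image"
    eps = [1.0, -1.0, 0.5, 2.0]          # noise that was added
    target = toy_extractor(x0)           # plays the role of Feature A
    x_t = q_sample(x0, eps, alpha_bar=0.5)
    # Perfect noise prediction vs. a model that predicts zero noise.
    print(feature_l2(predict_x0(x_t, eps, 0.5), target))
    print(feature_l2(predict_x0(x_t, [0.0] * 4, 0.5), target))
```

If the feature term is instead applied to x_t directly, the extractor sees mostly noise at large timesteps, which may be one reason a feature-matching loss fails to steer generation.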

Observations

  1. Pixel-space mismatch: Generated images show no visual resemblance to the original.
  2. Feature-space discrepancy: Features re-extracted from generated images show low similarity to the targets (e.g., cosine similarity ≈ 0.3).

Attempted Solutions

  • Adjusted loss weights (feature-matching loss vs. diffusion noise-prediction loss).
  • Tested different noise schedules (linear vs. cosine).
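
For reference, the two schedules can be sketched as follows. The hyperparameters are assumptions taken from the commonly cited defaults (linear betas from 1e-4 to 0.02 as in Ho et al., 2020; cosine offset s = 0.008 with betas clipped at 0.999 as in Nichol & Dhariwal, 2021), not necessarily the values used in this experiment.

```python
import math

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear schedule: betas evenly spaced between beta_start and beta_end."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def cosine_betas(T, s=0.008, max_beta=0.999):
    """Cosine schedule: define alpha_bar(t) directly, then derive the betas."""
    def f(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    # beta_t = 1 - alpha_bar(t) / alpha_bar(t - 1), clipped for stability.
    return [min(1.0 - f(i + 1) / f(i), max_beta) for i in range(T)]

def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): the signal fraction kept at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

if __name__ == "__main__":
    T = 1000  # assumed number of diffusion steps
    ab_lin = alpha_bars(linear_betas(T))
    ab_cos = alpha_bars(cosine_betas(T))
    for t in (100, 500, 900):
        print(t, ab_lin[t - 1], ab_cos[t - 1])
```

Printing alpha_bar at a few timesteps shows that the cosine schedule keeps more of the signal through the middle of the trajectory, which changes how informative the noisy input is when a feature-matching term is applied.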

Key Questions

Model Selection

  1. Is this task suitable for diffusion models?
  2. If proceeding with diffusion, are specialized conditioning mechanisms needed (e.g., cross-attention instead of FiLM)?
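
To make the second question concrete, here is a minimal sketch of the two mechanisms in plain Python. FiLM applies one scale and shift per channel, so the condition acts globally. Cross-attention lets each spatial token choose its own mixture of the condition tokens (here, Feature A and Feature B). This is a simplification under assumptions: a real model would obtain gamma, beta, queries, keys, and values from learned projections, which are omitted here.

```python
import math

def film(h, gamma, beta):
    """FiLM: scale and shift every activation in channel c by gamma[c], beta[c].
    h is a list of channels, each a list of activations."""
    return [[g * v + b for v in ch] for ch, g, b in zip(h, gamma, beta)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.
    queries: image tokens; keys/values: condition tokens (projections omitted)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, values))
                    for i in range(len(values[0]))])
    return out

if __name__ == "__main__":
    feat_a, feat_b = [1.0, 0.0], [0.0, 1.0]   # two toy condition tokens
    image_tokens = [[10.0, 0.0], [0.0, 10.0]]  # two toy spatial positions
    # Each position attends mostly to the condition token it aligns with.
    print(cross_attention(image_tokens, [feat_a, feat_b], [feat_a, feat_b]))
    # FiLM modulates whole channels, identically at every position.
    print(film([[1.0, 2.0], [3.0, 4.0]], gamma=[2.0, 0.5], beta=[0.0, 1.0]))
```

With only two global feature vectors as conditions, the attention has just two tokens to choose from, so whether it offers an advantage over FiLM here is an open question rather than a given.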

Feature Representation

  1. Could feature-space disparities (e.g., scale mismatches between EfficientNet and ResNet features) hinder convergence?
    • Would feature normalization/disentanglement help?
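
The scale concern can be illustrated with a small sketch. When two feature vectors of very different magnitude are concatenated, the larger one carries almost all of the squared norm, so the conditioning signal and any L2-based loss are dominated by it. Normalizing each feature separately before fusion balances the two. The vectors below are made-up toy values, and `fuse` and `energy_share` are hypothetical helper names.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Rescale v to unit Euclidean norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

def cosine_similarity(a, b):
    """Cosine similarity: invariant to the scale of either vector."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def fuse(feat_a, feat_b):
    """Normalize each feature on its own, then concatenate."""
    return l2_normalize(feat_a) + l2_normalize(feat_b)

def energy_share(concat, split):
    """Fraction of the squared norm contributed by the first `split` entries."""
    head = sum(x * x for x in concat[:split])
    return head / sum(x * x for x in concat)

if __name__ == "__main__":
    feat_a = [300.0, 400.0]   # toy large-scale feature (assumption)
    feat_b = [0.03, 0.04]     # toy small-scale feature (assumption)
    print(energy_share(feat_a + feat_b, 2))       # raw concatenation
    print(energy_share(fuse(feat_a, feat_b), 2))  # after per-feature normalization
```

Per-feature L2 normalization is only one option. Standardizing each dimension with statistics computed over the training set is another, and which works better here would need to be checked empirically.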

Requests for Advice

  1. Literature: Are there papers on diffusion models that generate images from non-image features (e.g., paired embeddings)?
  2. Technical suggestions:
    • Loss function design (e.g., hybrid pixel/feature losses).
    • Architectural modifications (e.g., cross-attention, feature fusion strategies).
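
To make the hybrid-loss option concrete, one possible form is written out below. It is a sketch under assumptions, not something tested in this experiment: \phi denotes the feature extractor, f_A and f_B the two conditioning features (with f_A as the target), \hat{x}_0 the clean-image estimate recovered from the predicted noise, and \lambda_{pix}, \lambda_{feat} hypothetical weights to be tuned.

```latex
\hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t, t, f_A, f_B)}{\sqrt{\bar{\alpha}_t}}

\mathcal{L} =
  \underbrace{\lVert \epsilon - \epsilon_\theta(x_t, t, f_A, f_B) \rVert_2^2}_{\text{noise prediction}}
  + \lambda_{pix}\,\underbrace{\lVert \hat{x}_0 - x_0 \rVert_1}_{\text{pixel term}}
  + \lambda_{feat}\,\underbrace{\bigl(1 - \cos(\phi(\hat{x}_0), f_A)\bigr)}_{\text{feature term}}
```

The pixel term is only available in the paired setting described above, where x_0 is known. In the real-world case with unknown correspondence it would have to be dropped.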