In Short:
Researchers from MIT and Google have developed Alchemist, a generative AI system that can alter the material properties of objects in images. Alchemist lets users adjust four attributes – roughness, metallicity, albedo, and transparency – in both real and AI-generated images. Built on a denoising diffusion model, the system outperforms other methods at changing material properties, and could benefit video games, visual effects, robotic training data, and image classification. Although Alchemist has limitations, such as sometimes inferring illumination incorrectly, it opens new possibilities for refining 3D assets and inferring material properties from images. The project was supported by Google, Amazon, and the National Science Foundation.
Researchers from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and Google Research have developed a system named Alchemist that can alter the material properties of objects in images. The system allows users to adjust four attributes – roughness, metallicity, albedo, and transparency – in both real and AI-generated pictures. Users set each attribute on a continuous scale from -1 to 1 to generate a new image with the modified properties. The technology could enhance models in video games, advance AI capabilities in visual effects, and improve robotic training data.
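The paper does not publish a programmatic interface, but the slider scheme described above can be sketched as a small data structure. The class and field names here are illustrative assumptions, not the authors' actual API; only the four attributes and the -1 to 1 scale come from the source.

```python
from dataclasses import dataclass


@dataclass
class MaterialEdit:
    """Hypothetical edit request for an Alchemist-like system.

    Each attribute is a slider value on the continuous scale [-1, 1],
    where 0 leaves the property unchanged.
    """

    roughness: float = 0.0
    metallicity: float = 0.0
    albedo: float = 0.0
    transparency: float = 0.0

    def __post_init__(self):
        # Validate that every slider stays within the documented range.
        for name in ("roughness", "metallicity", "albedo", "transparency"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [-1, 1], got {value}")


# Example: make an object noticeably more metallic and slightly smoother.
edit = MaterialEdit(metallicity=0.8, roughness=-0.3)
```

A structure like this would accompany the input image as conditioning for the model; attributes left at 0 are untouched.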
Magic Behind Alchemist
The system is built on Stable Diffusion 1.5, a denoising diffusion model praised for its photorealistic results and editing capabilities. Whereas previous models focused on higher-level changes, Alchemist targets low-level attributes, revising the fine details of an object's material properties. Its slider-based interface sets it apart from similar models.
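To make the "denoising diffusion" idea concrete, here is a toy numeric sketch: a clean signal is noised over many steps, then recovered by iteratively removing predicted noise (a deterministic DDIM-style reverse pass). The "denoiser" below is an oracle that already knows the true noise, purely to show the mechanics; Alchemist's real model instead learns to predict the noise, conditioned on the image and the requested material edit.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.05, T)      # per-step noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # cumulative signal retention

x0 = np.array([1.0, -2.0, 0.5])         # "clean" data (stand-in for an image)
eps = rng.standard_normal(x0.shape)     # noise used in the forward process

# Forward process: jump directly to the fully noised sample x_T.
x = np.sqrt(alpha_bars[-1]) * x0 + np.sqrt(1 - alpha_bars[-1]) * eps

# Reverse process: at each step, estimate the clean signal from the
# predicted noise, then re-noise to the previous (less noisy) level.
for t in reversed(range(T)):
    eps_hat = eps                        # oracle: "predicts" the noise exactly
    x0_hat = (x - np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])
    ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
    x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1 - ab_prev) * eps_hat

# With a perfect noise prediction, the reverse pass recovers x0 exactly.
```

In a trained system, `eps_hat` comes from a neural network; editing works by steering that prediction so the denoised result has, say, higher metallicity.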
Precise Control
According to Prafull Sharma, lead author of the study, Alchemist offers precise control over the material properties of an input image, a significant advance because users can modify specific properties after the initial image is provided. The method could improve robotic training data and image classification, and could help integrate generative models into existing content-creation software.
Key Findings
Comparisons show that Alchemist outperforms other models at faithfully editing only the requested objects with high accuracy. Although it was trained on synthetic rather than real data, the model was preferred in a user study for its photorealistic results. It has limitations, however: it sometimes infers illumination incorrectly and can generate physically implausible transparencies.
Future Implications
The researchers aim to improve Alchemist's ability to enhance 3D assets at the scene level and to infer material properties from images, opening new ways to link the visual and mechanical traits of objects.
This collaborative project involved researchers from MIT and Google Research, supported by grants from the National Science Foundation, Google, and Amazon. The work will be presented at CVPR in June.