3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.