My paper list on deep generative models

I am quite interested in generative models, so I think it's a good idea to summarize what I know so far about them and which papers I think are "must reads" if you want to understand what's going on in the field. I haven't finished all of them though, so this will also be a good chance to create a "checklist" -- I'll go back and tick off those unfinished papers! Also, this list could get very long if I included many recent advances, so I'll just keep the papers with big ideas -- I can do another list for the rest later!

 

Variational Auto-Encoders (VAE)

I think the VAE idea has revolutionized the approximate inference and graphical models field. Since neural networks are so successful as function approximators, why not use them to compute the joint/conditional distributions/factors/potential functions of a directed/undirected graphical model, as well as the approximate posterior/marginal distributions/factors? Hundreds of papers have since built on this idea.
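
To fix notation (my own shorthand, not a quote from any particular paper): with a decoder p_\theta(x|z) and an amortized approximate posterior q_\phi(z|x), both parameterized by neural networks, the quantity being maximized is the variational lower bound

    log p_\theta(x) >= E_{q_\phi(z|x)}[ log p_\theta(x|z) ] - KL( q_\phi(z|x) || p(z) ),

and the reparameterization trick, e.g. z = mu_\phi(x) + sigma_\phi(x) * eps with eps ~ N(0, I) for a Gaussian posterior, is what makes the gradient w.r.t. \phi low-variance enough for plain stochastic gradient training.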

The must-read papers start from Kingma & Welling's Auto-Encoding Variational Bayes paper. At roughly the same time, DeepMind came up with the same idea. The authors of those two papers then teamed up and published a paper on semi-supervised learning. In this context, a recent attempt that adds auxiliary variables also deserves a detailed read. I'm not going to recommend the variational Gaussian process paper to practitioners, but it essentially contains similar ideas to the auxiliary DGM paper.

The original framework uses the variational lower bound to do approximate MLE. IWAE improved on this by proposing a new objective function involving importance sampling, and it has been tested on RNN versions as well. Shameless plug here: I have shown that IWAE does not return the best approximation to the marginal likelihood. You should also read the similar "reweighted wake-sleep" algorithm if you want to know more, and you will probably need to check out the original wake-sleep paper too.
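
For reference, the importance-weighted bound with K samples is (roughly, in my notation)

    L_K(x) = E_{z_1, ..., z_K ~ q_\phi(z|x)}[ log (1/K) sum_{k=1}^K p_\theta(x, z_k) / q_\phi(z_k|x) ],

which recovers the standard variational lower bound at K = 1 and becomes a tighter bound on log p_\theta(x) as K grows.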

VAEs are very flexible, as you can imagine from the summary above, so they can easily be extended to model sequential data. In this setting the variational RNN paper could be interesting, and you might want to check out some of the related papers it cites as well.

RNNs + attention models also work well on images, and the DRAW paper is the one that combined them with the VAE framework. Similar papers on the experimental side include one on image captioning and the convolutional version of DRAW.

There are papers that combine the VAE idea with invertible transformations as well. But I'll list them in later paragraphs just to make sure this part is not too long.

 

Generative Adversarial Networks (GAN)

GANs are even more flexible than VAEs. In short, you train a powerful generative model by fooling another powerful discriminator into believing that "the image I'm showing you is from the real data". I have a post that discusses the maths in detail, and I also recommend reading a series of blog posts by Ferenc Huszár.
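
For completeness, the original minimax game is (in my own notation)

    min_G max_D  E_{x ~ p_data}[ log D(x) ] + E_{z ~ p(z)}[ log(1 - D(G(z))) ],

and at the optimal discriminator the generator ends up minimizing (up to constants) the Jensen-Shannon divergence between the data distribution and the model distribution.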

Of course the must-read is the original paper. But the GAN idea has also been discussed in the statistical inference context, which has been largely ignored -- and I think that shouldn't happen. A closely related idea is Noise Contrastive Estimation (NCE), and Ian Goodfellow has also discussed the links and differences between the two.

GANs are notoriously hard to train. Two experimental papers from Facebook & NYU (i.e. LAPGAN & DCGAN) are must-reads for practitioners. I haven't read the recent proposal from OpenAI in detail though. But you should definitely try to train a GAN yourself to get a feeling for how to tune it -- I did so. I'd say this difficulty is one of the main reasons that biases me towards VAEs, even though GANs have superior performance in the image domain; but see this interesting paper that tries to combine the advantages of both sides. Alternatively, the maximum mean discrepancy (MMD) and the kernel two-sample test have been considered, e.g. see this and this paper.
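
To give a flavour of the MMD route, here is a minimal numpy sketch (my own illustration, not code from those papers) of the biased MMD^2 estimator with an RBF kernel; in the generative-model setting you would plug in a batch of real data and a batch of generated samples and use this quantity as the training criterion:

    import numpy as np

    def rbf_kernel(X, Y, sigma=1.0):
        # Pairwise RBF kernel values between rows of X and rows of Y.
        sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-sq_dists / (2 * sigma**2))

    def mmd2(X, Y, sigma=1.0):
        # Biased estimator of MMD^2 between samples X (data) and Y (model).
        return (rbf_kernel(X, X, sigma).mean()
                + rbf_kernel(Y, Y, sigma).mean()
                - 2 * rbf_kernel(X, Y, sigma).mean())

    # e.g. mmd2(real_batch, generated_batch) is what the generator tries to drive down.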

Some other interesting papers include adversarial auto-encoders (well, this one is actually much easier to train), a DRAW-like version of GAN, and one that incorporates contextual information into the modeling process. I also like this paper that extends the GAN framework to f-divergences via duality. Using GAN principles to learn an inference network (and also the same idea here) seems promising, but my experience playing with them suggests that this approach can be even more unstable than the original framework.

 

Generative models with invertible transformations

If you don't like approximate inference and want something exact, here's the part for you. I remember first seeing this type of method in the NICE paper (nice name). The improved version is also on arXiv.
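
The basic trick, in case it helps: if x = f(z) with f invertible and z drawn from a simple density p(z), then the change-of-variables formula gives the exact log-likelihood

    log p(x) = log p(z) + log | det( dz / dx ) |,   with z = f^{-1}(x),

and these papers design f from layers whose inverse and Jacobian determinant are cheap to compute, so neither sampling nor likelihood evaluation needs any approximation.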

From my perspective the most interesting work in this area is the combination of VAEs and invertible transformations. I know this idea has been around for roughly 2-3 years (e.g. see this paper), but this ICML paper formalized it with lots of discussion. This year's NIPS has two interesting follow-up papers: one uses autoregressive functions and another tries to skip the randomness of the transformations. I should consider writing blog posts on these papers later.
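
The rough recipe, as I understand it: start from a simple q_0(z_0|x), push z_0 through a chain of invertible maps z_K = f_K(...f_1(z_0)...), and plug

    log q_K(z_K|x) = log q_0(z_0|x) - sum_{k=1}^K log | det( df_k / dz_{k-1} ) |

into the usual variational lower bound, so the approximate posterior can be far more flexible than a factorized Gaussian while the bound stays tractable.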

 

Deep Boltzmann machines & deep belief networks

These models get less attention these days, but I still love the beautiful derivations, and I think everyone who wants to work on generative models should get to know this type of model. No need to introduce the seminal 2006 paper that brought neural networks back to the stage, nor to mention that people then tried different variations of RBMs/deep belief networks for image/speech recognition. But after the introduction of deep Boltzmann machines, people found that these models are very hard to train with contrastive divergence-like algorithms. Pre-training as a DBN and then switching to DBM training sort of worked, and this paper provided quite a nice interpretation of that, but in the fine-tuning step we still hit the training difficulty. There's another pre-training technique that approximately performs variational inference, but you need to be careful about the rescaling part. The doubly intractable problem has scared lots of practitioners away from investigating DBMs further.
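
To make "contrastive divergence-like algorithms" concrete, here is a minimal numpy sketch of one CD-1 update for a binary RBM (my own illustration of the standard recipe, not code from any of the papers above); part of what makes DBMs so much harder is that the posterior over the hidden layers is no longer factorized, so you can't get the positive-phase statistics this cheaply:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0, W, b, c, lr=0.01):
        # One CD-1 step for a binary RBM with energy E(v, h) = -v'Wh - b'v - c'h.
        ph0 = sigmoid(v0 @ W + c)                             # positive phase: p(h=1 | v0)
        h0 = (np.random.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + b)                           # "reconstruction"
        v1 = (np.random.random(pv1.shape) < pv1).astype(float)
        ph1 = sigmoid(v1 @ W + c)                             # negative phase
        n = v0.shape[0]
        W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        b += lr * (v0 - v1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
        return W, b, c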

On the deep sigmoid belief network side, in the last two years Lawrence Carin's group has produced quite a few related papers. I haven't read them in detail though.

 

Other attempts

Although not very experimentally appealing, I found this diffusion process paper from Surya Ganguli's group an interesting read.

As Bayesian people have made Gaussian processes deep, here's an attempt from Neil Lawrence's group on using deep GPs to generate images. It uses the VAE idea to derive the inference steps.

David Blei and his students love conjugate models and SVI, so it's not surprising that they've come up with a deep version of exponential families.

If you like the Gibbs sampling procedure of RBMs for data generation, here's another direction, other than DBMs, for going deeper.

I found this paper's approach a bit strange, but those who are interested in wake-sleep should get to know it.

Auto-regressive models are also an important type of generative model. For the neural-network-assisted version, NADE (and subsequent papers) is a must-read. This year's ICML best paper (PixelRNN) also belongs to this class.
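
The common trick behind all of these is the exact factorization

    p(x) = prod_{d=1}^D p(x_d | x_1, ..., x_{d-1}),

with a single neural network sharing parameters across the D conditionals (feed-forward with tied weights in NADE, recurrent layers over pixels in PixelRNN), so the likelihood is exact and both evaluation and sampling just sweep through the dimensions in order.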

Finally, this note on how to evaluate the performance of generative models deserves a detailed read.