This is an incomplete timeline of the appearance of various “likelihood-free” inference methods. Please let me know if there are any mistakes or things I should add.
The methods listed perform statistical inference based on repeated model simulations rather than likelihood evaluations, which are expensive or impossible for some complex models. There are other ways to avoid likelihood evaluations, such as empirical likelihood, which could also be thought of as “likelihood-free”, so perhaps there’s a need for a more precise but equally catchy name in the future!
I’ve also avoided listing papers on selecting summary statistics for use in these methods. See Blum et al, 2013 and Prangle, 2015 for reviews.
1980s

1984 Diggle and Gratton on inference for implicit models. This paper also discusses some precursors from the 1970s which use ad hoc likelihood-free methods for particular applications.

1984 Rubin presents a likelihood-free rejection sampling algorithm as an intuitive explanation of Bayesian methods, but not as a practical method.
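To make the idea concrete, here is a minimal sketch of rejection sampling by simulation, assuming a hypothetical discrete toy model (a success probability with a uniform prior, observed through a binomial count) so that simulated data can match the observed data exactly. The function name and model are illustrative choices, not taken from Rubin’s paper. Parameters are drawn from the prior, data are simulated, and a draw is kept only when the simulated data equal the observed data; no likelihood is ever evaluated.

```python
import random

def rejection_sampler(observed, n_trials, n_samples, rng):
    """Hypothetical toy example of likelihood-free rejection sampling.

    Model (an assumption for illustration): p ~ Uniform(0, 1), and the data
    are the number of successes in n_trials Bernoulli(p) trials.
    """
    accepted = []
    while len(accepted) < n_samples:
        p = rng.random()  # draw a parameter from the prior
        # Run the simulator: count successes in n_trials Bernoulli(p) trials
        simulated = sum(rng.random() < p for _ in range(n_trials))
        if simulated == observed:  # exact match, so keep this parameter
            accepted.append(p)
    return accepted

samples = rejection_sampler(observed=7, n_trials=10, n_samples=2000,
                            rng=random.Random(1))
print(sum(samples) / len(samples))
```

Because the data are discrete and the match is exact, the accepted draws come from the exact posterior, here Beta(8, 4) with mean 2/3. For continuous or high-dimensional data an exact match almost never happens, which is why later ABC methods compare summary statistics up to a tolerance.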

1989 Simulated method of moments, McFadden (econometrics).
1990s

1992 GLUE, Beven and Binley (hydrology).

1993 Indirect inference, Gourieroux et al (econometrics).

1997 Approximate Bayesian computation (ABC) (population genetics). Early papers include those of Tavaré et al and of Fu and Li.
2000s

2003 ABC-MCMC, Marjoram et al.
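A minimal sketch of the ABC-MCMC idea, assuming a hypothetical toy model (Normal(theta, 1) data, a Normal(0, 10^2) prior, the sample mean as summary statistic, and a Gaussian random-walk proposal). A proposed move is accepted only if data simulated at the proposal give a summary within a tolerance `epsilon` of the observed one and the usual Metropolis-Hastings test passes; with a symmetric proposal, that test involves only the prior ratio. All names and settings here are illustrative assumptions.

```python
import math
import random
import statistics

def abc_mcmc(observed_mean, n_obs, epsilon, n_steps, step_size, rng, theta0):
    """Sketch of ABC-MCMC on a hypothetical toy model.

    Toy model (an assumption for illustration): n_obs draws from
    Normal(theta, 1), prior theta ~ Normal(0, 10^2), summary = sample mean.
    theta0 should be a plausible start, e.g. a draw from rejection ABC.
    """
    def log_prior(theta):
        # Normal(0, 10^2) log-density, up to an additive constant
        return -0.5 * (theta / 10.0) ** 2

    theta = theta0
    chain = []
    for _ in range(n_steps):
        # Symmetric random-walk proposal, so proposal densities cancel
        proposal = theta + rng.gauss(0.0, step_size)
        # Simulate a data set at the proposal and compare summaries
        simulated = [rng.gauss(proposal, 1.0) for _ in range(n_obs)]
        close = abs(statistics.fmean(simulated) - observed_mean) <= epsilon
        # Metropolis-Hastings test using only the prior ratio
        log_ratio = log_prior(proposal) - log_prior(theta)
        if close and rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        chain.append(theta)  # a rejected move repeats the current state
    return chain

chain = abc_mcmc(observed_mean=2.0, n_obs=20, epsilon=0.1, n_steps=5000,
                 step_size=0.3, rng=random.Random(0), theta0=2.0)
print(statistics.fmean(chain[500:]))
```

Compared with plain rejection sampling, proposals are made locally around the current state rather than from the prior, so far fewer simulations are wasted when the posterior is much narrower than the prior.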

2006 Convolution filter, Campillo and Rossi.

2006 Iterated filtering, Ionides et al.

2007-2012 ABC-SMC/PMC, Sisson et al, Beaumont et al, Toni et al, Del Moral et al.
2010s

2010 Synthetic likelihood, Wood.
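The core of synthetic likelihood can be sketched briefly: simulate several data sets at a parameter value, fit a multivariate Gaussian to their summary statistics, and evaluate the observed summaries under that Gaussian. The sketch below assumes a hypothetical toy simulator (Normal(theta, 1) data summarised by sample mean and variance) and a simple grid search. Wood’s paper instead embeds the synthetic likelihood in MCMC, and uses more elaborate models and summaries.

```python
import numpy as np

def simulate_summaries(theta, n_obs, rng):
    """Hypothetical toy simulator: n_obs draws from Normal(theta, 1),
    summarised by their sample mean and sample variance."""
    x = rng.normal(theta, 1.0, size=n_obs)
    return np.array([x.mean(), x.var(ddof=1)])

def synthetic_loglik(theta, s_obs, n_obs, n_sims, rng):
    """Gaussian synthetic log-likelihood of observed summaries s_obs at theta."""
    sims = np.array([simulate_summaries(theta, n_obs, rng) for _ in range(n_sims)])
    mu = sims.mean(axis=0)              # estimated mean of the summaries
    sigma = np.cov(sims, rowvar=False)  # estimated covariance of the summaries
    diff = s_obs - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = diff @ np.linalg.solve(sigma, diff)
    k = len(s_obs)
    return -0.5 * (quad + logdet + k * np.log(2 * np.pi))

rng = np.random.default_rng(0)
s_obs = simulate_summaries(1.5, 50, rng)  # stand-in for observed data
grid = np.linspace(0.0, 3.0, 31)
ll = [synthetic_loglik(t, s_obs, 50, 200, rng) for t in grid]
print(grid[int(np.argmax(ll))])
```

The Gaussian assumption is what makes this cheap: only a mean vector and covariance matrix need to be estimated from simulations, at the cost of bias when the summaries are far from normally distributed.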

2010 Coupled ABC, Neal (epidemiology), based on utilising latent variables.

2015 Bayesian indirect likelihood, Drovandi et al.

2015 Classifier-based approaches: the random forest method of Pudlo et al and the likelihood ratio estimation method of Cranmer et al. A related but more expensive approach from 2014 is by Pham et al.

2015 Optimisation Monte Carlo, Meeds and Welling, and the closely related reverse sampler of Forneron and Ng. Both exploit a latent variable formulation.

2016 Graham and Storkey use constrained Hamiltonian Monte Carlo to perform joint updates on parameters and latent variables conditioned on observations.

2016 Automatic variational ABC, Moreno et al, which uses latent variable draws to estimate gradients of the loss function.

2016 Papamakarios and Murray learn a mixture density network to predict the parameter posterior from observations.