Using normal JPEG code streams, but trying to make the encoder & decoder as good as possible, you should do something like:
- RDO-based structure; e.g. the encoder is given lambda and finds the optimal R/D point. Unfortunately this has to be iterative
because of the Huffman codes: decisions in one pass change the Huffman codes used for the next pass.
- A good perceptual metric to target. Maybe SSIM or x264's funny SATD activity thing, or something else.
- Trellis quantization; the JPEG-huff code block structure lends itself to trellis state optimization pretty directly.
- Better chroma subsampling (aware of the up-filter the decoder will use).
- Quant matrix optimization for perceptual metric.
- Deblocking filter, or maybe the "Unblock" histogram non-filter approach.
- Luma-aided chroma upsample.
- Expectation-in-bucket instead of middle-of-bucket dequantization.
- Noise reinjection, perhaps predicting where some of the zeros in the DCT should in fact be small non-zeros.
- Shape-aware deringing; similar to camera denoisers, there's a lot of work on this in the literature.
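
To make the RDO bullet concrete, here is a rough sketch of a per-coefficient decision for a given lambda: try a few candidate quantized levels and keep the one with the lowest D + lambda*R. The names `approx_bits` and `rdo_quantize` are made up for illustration, and the rate model is a crude assumption (a fixed 4-bit symbol cost plus the magnitude bits), not a real Huffman table. A real encoder would measure rate against the current Huffman codes and iterate, as noted above.

```python
def approx_bits(level):
    # Assumed stand-in for the true Huffman cost: zero is treated as free,
    # anything else pays a fixed 4-bit symbol cost plus its magnitude bits.
    return 0 if level == 0 else 4 + abs(level).bit_length()


def rdo_quantize(coef, q, lam, bits=approx_bits):
    """Pick the quantized level for `coef` (step `q`) minimizing D + lam * R."""
    nearest = round(coef / q)
    # One step toward zero is usually cheaper to code, so it is worth testing.
    toward_zero = nearest - (nearest > 0) + (nearest < 0)
    candidates = {nearest, toward_zero, 0}

    def cost(level):
        err = coef - level * q          # distortion is plain squared error here
        return err * err + lam * bits(level)

    # Break exact ties in favor of the smaller magnitude.
    return min(candidates, key=lambda level: (cost(level), abs(level)))


if __name__ == "__main__":
    for lam in (0, 200, 1000):
        print(lam, rdo_quantize(126, 16, lam))
```

With lambda = 0 this is just ordinary rounding; as lambda grows, levels get pulled toward zero wherever that saves more bits than it costs in error. Swapping the squared error for a perceptual metric only changes the `cost` function.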
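
The expectation-in-bucket idea can also be sketched in a few lines. This assumes the AC coefficients follow a Laplacian distribution with scale `b` (a common modeling assumption, not something the JPEG stream gives you; a decoder would have to estimate `b` itself), and that the encoder quantized by rounding to nearest. The function name `dequant_expectation` is hypothetical.

```python
import math


def dequant_expectation(k, q, b):
    """Reconstruct quantized index k (step q) as the conditional mean of a
    Laplacian with scale b over the bucket [(|k|-0.5)q, (|k|+0.5)q]."""
    if k == 0:
        return 0.0                      # the zero bucket is symmetric about 0
    lower = (abs(k) - 0.5) * q          # bucket edge nearest to zero
    # Mean of an exponential density truncated to an interval of width q.
    offset = b - q / math.expm1(q / b)
    return math.copysign(lower + offset, k)


if __name__ == "__main__":
    # The standard decoder would return k*q = 16 here; the Laplacian
    # expectation lands somewhat closer to zero.
    print(dequant_expectation(1, 16, 10.0))
```

Because the density falls off away from zero, the reconstruction point always sits between the inner bucket edge and the bucket center, and it approaches the usual k*q as `b` gets large (that is, as the distribution within the bucket flattens out).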