I'm curious how those little hand held electric guitar tuner things work. I can't really find a description on the net, if anyone knows gimme a shout. They might use oscillators tuned to the notes and do some kind of interference thing? You certainly could make a pretty simple DSP thing using Goertzel response filters; you only need to track 2 filter responses, one on each side of the target frequency.
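Here's a rough sketch of the two-filter Goertzel idea, just to make it concrete. The names (`goertzel_power`, `tuning_balance`, `spread_hz`) and the 5 Hz spacing are my own assumptions for illustration, not how any real tuner is known to work. It measures the power just below and just above the target frequency and reports which side is stronger.

```python
import math

def goertzel_power(samples, freq_hz, sample_rate):
    """Power of `samples` at one frequency, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    # Squared magnitude of the filter output after the last sample.
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def tuning_balance(samples, target_hz, sample_rate, spread_hz=5.0):
    """Hypothetical helper: > 0 means sharp, < 0 means flat.

    Compares the response of two probes placed spread_hz on either
    side of the target frequency.
    """
    lo = goertzel_power(samples, target_hz - spread_hz, sample_rate)
    hi = goertzel_power(samples, target_hz + spread_hz, sample_rate)
    total = lo + hi
    return 0.0 if total == 0.0 else (hi - lo) / total

def sine(freq_hz, sample_rate, count):
    """Test tone generator."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(count)]

# A string that should be at 110 Hz (low A) but is slightly sharp.
sharp_string = sine(112.0, 8000, 800)
print(tuning_balance(sharp_string, 110.0, 8000))  # positive: tune down
```

The window here is only 0.1 seconds on purpose: with a long window each filter gets so narrow that a slightly off-pitch string can fall into a null of both probes, so the window length and the probe spacing have to be chosen together.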
Paper : "A Smarter Way to Find Pitch" : this is basically just autocorrelation. They normalize in a slightly funny way, what they're doing is :
< x * y > / ( ( var(x) + var(y) ) / 2 )
That is, dividing the raw correlation by the average of the two variances. The true mathematical correlation measure is :
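< x * y > / ( sdev(x) * sdev(y) ) = < x * y > / sqrt( var(x) * var(y) )
Which is a geometric average of the variances rather than an arithmetic average. It's unclear to me why you would choose one or the other; both are correlations in -1 to 1, but the latter is better justified as a true measure of correlation. (The arithmetic average is never smaller than the geometric one, so the first form can only be equal or closer to zero; the two agree exactly when the two variances are equal.)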
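To see the difference between the two normalizations, here's a small sketch. The function name `correlations` is mine, and I subtract the mean before correlating, which is an assumption (the formulas above treat the signals as already zero-mean).

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlations(x, y):
    """Return (arithmetic_normalized, geometric_normalized) correlation."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]          # assumption: remove the mean first
    dy = [b - my for b in y]
    cov = mean([a * b for a, b in zip(dx, dy)])
    var_x = mean([a * a for a in dx])
    var_y = mean([b * b for b in dy])
    arithmetic = cov / ((var_x + var_y) / 2.0)   # the paper's style
    geometric = cov / math.sqrt(var_x * var_y)   # standard correlation
    return arithmetic, geometric

x = [math.sin(0.1 * i) for i in range(1000)]
y = [2.0 * a for a in x]              # same shape, twice the amplitude
print(correlations(x, x))             # both are 1 when the variances match
print(correlations(x, y))             # arithmetic drops to 0.8, geometric stays 1
```

So the arithmetic version penalizes a change in amplitude between the two windows, while the geometric version only cares about shape. For a decaying plucked string that might actually be a reason to pick one over the other, but that's a guess on my part.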
I also went ahead and did fractional bilinear bin indexing for autocorrelation, but the fact is it's just too noisy for that to be worth anything (and parabolic interpolation is probably a better way to do the same thing anyway).
Paper : "YIN, a fundamental frequency estimator for speech and music" by de Cheveigne and Kawahara. The "YIN" method is autocorrelation with lots of heuristic improvements. One good idea there is that rather than sampling fractional bins you can just use a parabolic fit of the autocor results to give you sub-bin accuracy. Another good idea is searching around trying different offsets for your autocor (sliding the phase of the autocor window) in addition to searching for different periods. In theory different offsets shouldn't matter, but with noise and transients finding the best offset might help a lot. At the end they talk about noise and various ways to address it but it's all very nasty. This is one of those messes where it's hack piled on top of hack and you just know this can't be the right way to approach the problem.
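The parabolic fit is easy to sketch. This is plain autocorrelation plus a three-point parabola through the best integer lag, not the full YIN method; the function names and the lag search range are my own assumptions.

```python
import math

def autocorrelation(signal, max_lag):
    """Raw autocorrelation for lags 0..max_lag over a fixed-size window."""
    n = len(signal) - max_lag
    return [sum(signal[i] * signal[i + lag] for i in range(n))
            for lag in range(max_lag + 1)]

def parabolic_peak(values, i):
    """Sub-bin peak position from a parabola through values[i-1..i+1]."""
    a, b, c = values[i - 1], values[i], values[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(i)               # flat neighborhood, nothing to refine
    return i + 0.5 * (a - c) / denom

def estimate_period(signal, min_lag, max_lag):
    """Best period in samples (fractional), searched over [min_lag, max_lag]."""
    r = autocorrelation(signal, max_lag + 1)   # one extra lag for the fit
    best = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    return parabolic_peak(r, best)

sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 110.3 * i / sample_rate) for i in range(2000)]
period = estimate_period(tone, 40, 100)
print(sample_rate / period)           # close to 110.3, despite integer lags
```

On a clean sine this gets well under a sample of error from integer-lag data, which is the whole appeal. On a noisy signal the integer peak pick is the part that fails first, not the interpolation.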
Paper : "Accurate and Efficient Fundamental Frequency Determination from Precise Partial Estimates" : goes over mostly the same material I went over and is a pretty well written paper.
Most of the modern papers on the subject are about second-level topics, like finding the fundamental frequency from a harmonic spectrum, or tracking pitches through time. There are lots of these, but the most elegant that I've found is "Maximum a Posteriori Pitch Tracking" which is quite solidly built. For their basic pitch likelihood they use the true autocorrelation.
There are some quite strange things with sound and pitch and human hearing. Especially with voice and some stringed instruments, the fundamental frequency, what we actually hear as "the pitch", can actually be very quiet. e.g. you perceive the sound to be of pitch "F" , but the actual peak at F might be tiny, and instead there are peaks at 2F and 3F. Which is another weird thing, the sound at 3F is actually a different note (unlike 2F and 4F which are the same note at different octaves), but you don't really hear it as a different note, it's perceived as part of the timbre of the instrument. Over time as a string sounds, the intensity of the various harmonics can rise and fall; the fundamental might sound strongly at some point but may be totally quiet at other points; the funny thing is that we just sort of hear this as a pleasant variation of the sound, we don't hear the "pitch" changing at all.
With a guitar the magnitude of the various harmonics can be very strongly affected by where exactly on the string you pluck it. If you pluck right in the middle of the string you will get more of just the fundamental. The sort of normal place to pluck is about 1/4 of the way along the string, which gives you F and 2F very strongly, but quite often 2F is much stronger and we still perceive the pitch as F. I'm way out of my element on this stuff but it's interesting.
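The "missing fundamental" thing is also why autocorrelation is a reasonable fit for this problem. Here's a toy sketch with made-up numbers (F = 100 Hz at an 8000 Hz sample rate): the signal contains only 2F and 3F, with nothing at all at F, and the strongest autocorrelation lag still lands on the period of F.

```python
import math

def autocorrelation(signal, max_lag):
    """Raw autocorrelation for lags 0..max_lag over a fixed-size window."""
    n = len(signal) - max_lag
    return [sum(signal[i] * signal[i + lag] for i in range(n))
            for lag in range(max_lag + 1)]

def strongest_lag(signal, min_lag, max_lag):
    """Integer lag with the largest autocorrelation in [min_lag, max_lag]."""
    r = autocorrelation(signal, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])

SAMPLE_RATE = 8000
F = 100.0                              # the pitch we "hear"

# Only the 2nd and 3rd harmonics are present; there is no energy at F itself.
signal = [math.sin(2.0 * math.pi * 2 * F * i / SAMPLE_RATE) +
          math.sin(2.0 * math.pi * 3 * F * i / SAMPLE_RATE)
          for i in range(1700)]

lag = strongest_lag(signal, 10, 100)
print(lag, SAMPLE_RATE / lag)          # 80 samples, i.e. 100 Hz
```

The 200 Hz and 300 Hz components only line up with themselves simultaneously every 1/100 of a second, so the waveform as a whole repeats at F even though F isn't in it. A naive "find the biggest FFT peak" tuner would report 200 or 300 instead.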
Okay, so 3F is the "Fifth" (strictly, an octave plus a fifth above F) ; Wikipedia has a good page on Equal temperament . So like, when you play a "C" , often the third harmonic (a "G") is very strong at 3F, but if you play a "G", you won't hear a spike at "C"; the fifth from "G" is a "D" , so hearing C+G = C with fifth , while hearing G+D = G with fifth, etc.
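You can check which note a harmonic lands on with a few lines. This sketch assumes 12-tone equal temperament (each semitone is a frequency ratio of 2^(1/12)); `harmonic_note` and the sharps-only note table are my own naming.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def harmonic_note(root, harmonic):
    """Nearest equal-tempered note to `harmonic` times the root frequency.

    Returns (note name, semitones above the root, error in cents).
    """
    semitones = 12.0 * math.log2(harmonic)
    nearest = round(semitones)
    cents_off = 100.0 * (semitones - nearest)
    name = NOTE_NAMES[(NOTE_NAMES.index(root) + nearest) % 12]
    return name, nearest, cents_off

for h in (2, 3, 4):
    print(h, harmonic_note("C", h))
print(3, harmonic_note("G", 3))
```

For a root of C, harmonics 2 and 4 come out as C again (12 and 24 semitones up), while harmonic 3 comes out as G, 19 semitones up and about 2 cents sharp of the equal-tempered note. For a root of G, harmonic 3 is a D.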
Also I found this research app : Tartini which is pretty sweet in theory ; it tracks pitches over time and makes nice graphs and can output musical scores and all that, it's got a nice useable GUI. Unfortunately they use a pretty ass pitch detection method, it's super noisy and not useable on my setup. If you had a professional mic and a sound booth and all that stuff it might be pretty nice.