It's not that hard to get something that's close on average -- what's hard is figuring out the conditions under which it's not close, and when close-on-average isn't good enough.
Just to give you a bit more context, here's a different rider in a road race. First, the raw HR and power data (HR in red, using right hand scale):
It's obvious that there is *some* relationship between HR and power, but it's very, very noisy. Here's a first cut, smoothing the raw data over 60-second spans (not what I'd do if I were really interested in modeling the HR-power relationship, but it'll give you an idea of the degree of difficulty):
Better, eh? Still, even at 60-second smoothing, what's the correlation?
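That correlation coefficient means that the R^2 is still < 70%. (For a straight-line fit like this, R^2 is just the correlation squared, so an R^2 under 70% corresponds to a correlation below about 0.84.)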
Now, this is, of course, a naive model of an inherently bursty power demand, so it's likely that the fit would be better if the demand were closer to steady-state. How much better is the issue. As I said, it's not how close you can get on average -- it's knowing the conditions when you're not close.
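If you want to try the same naive check on your own file, here's a rough sketch of the 60-second smoothing and the correlation calculation in Python. Treat it as an illustration, not the exact procedure behind the plots above: the function names (`rolling_mean`, `hr_power_fit`) are made up for this example, it assumes one sample per second, and the data it runs on is synthetic, not the rider's.

```python
import numpy as np

def rolling_mean(x, window=60):
    """Simple moving average; assumes one sample per second (an assumption)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")

def hr_power_fit(power, hr, window=60):
    """Return (r, r_squared) for smoothed HR against smoothed power."""
    p = rolling_mean(power, window)
    h = rolling_mean(hr, window)
    r = np.corrcoef(p, h)[0, 1]  # Pearson correlation coefficient
    return r, r ** 2             # for a straight-line fit, R^2 = r^2

# Synthetic, made-up data for illustration only (NOT the rider's file).
rng = np.random.default_rng(0)
n = 3600  # one hour at 1 Hz
power = np.clip(200 + 120 * rng.standard_normal(n), 0, None)  # bursty watts

# Crude stand-in for HR: lags slowly toward a power-dependent target.
hr = np.empty(n)
hr[0] = 120.0
for t in range(1, n):
    target = 90 + 0.3 * power[t]
    hr[t] = hr[t - 1] + 0.03 * (target - hr[t - 1])
hr += rng.standard_normal(n)  # sensor noise

r_raw = np.corrcoef(power, hr)[0, 1]
r_smooth, r2_smooth = hr_power_fit(power, hr)
print(f"raw r = {r_raw:.2f}, 60 s smoothed r = {r_smooth:.2f}, R^2 = {r2_smooth:.2f}")
```

Swap your own second-by-second power and HR arrays in for the synthetic ones. The numbers you get from the made-up data mean nothing in themselves; the interesting part is seeing how the fit changes between bursty and steady sections of a real ride.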