Maybe I missed it, but once more, what are the individual data points in this new model?
If each workout is a single data point, then I'd still think that the NN overfits, and quite badly at that.
Even for logistic regression, which is much more stable and, with appropriate controls, much less likely to overfit, you need at least 100 data points just to estimate the model intercept reliably. There's no way around the maths, unfortunately, so while a proposed model might work quite well on a limited sample of athletes (note that in this particular case these are people with VO2 max 65+), it might be much less predictive once generalized to the "mortal" population.
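For anyone wondering where a number like that comes from, here is a rough sketch. It rests on my own assumptions: that "reliably" means pinning down the baseline probability to within +/-0.1 at 95% confidence, and that the worst case p = 0.5 applies. An intercept-only logistic model is just an estimate of a single proportion, so the usual margin-of-error formula can be used. The function name and defaults are made up for illustration.

```python
import math
from statistics import NormalDist


def n_for_intercept(margin=0.1, confidence=0.95, p=0.5):
    """Observations needed to estimate a single proportion (the intercept of
    an intercept-only logistic model) to within +/- margin.

    Assumptions (mine, not from the thread): normal approximation to the
    binomial, worst case p = 0.5 unless stated otherwise.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Solve margin = z * sqrt(p * (1 - p) / n) for n and round up.
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    print(n_for_intercept())             # a bit under 100
    print(n_for_intercept(margin=0.05))  # halving the margin costs about 4x the data
```

And that is before a single predictor enters the model; every extra feature adds to the bill.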
And in the end, an NN is just a bunch of polynomial regressions put into a single model ;) https://arxiv.org/abs/1806.06850
On the other hand, I absolutely applaud the effort to modernize performance prediction models that are by now 30+ years old. As Alan correctly noted, real-life experience clearly shows that performance doesn't depend only on a simplified metric such as TRIMPS/TSS, so incorporating additional parameters ("features") in the model should improve prediction accuracy, provided an individual athlete has a sufficient training sample size (which might be the real issue here).
One should also, however, think carefully about model selection. While the "nonlinearity" of NNs seems attractive at first, logistic regression with penalized splines can model nonlinear effects quite well provided the additivity assumptions still hold, with the big added bonus that each parameter's impact on the final model is clearly explainable. Going a couple of steps further, I'd guess that adopting a Bayesian modelling framework should work even better here, because at the end of the day it would give the full predictive performance distribution for a particular athlete, and credible intervals are just that much easier to explain in practice.
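To make the Bayesian point a bit more concrete, here is a toy sketch, not a proposal for the actual model. It assumes a conjugate normal model with known test-to-test noise, which is far simpler than anything you would fit in practice. All names and numbers (prior, noise, power values in watts) are invented for illustration.

```python
from statistics import NormalDist


def posterior_predictive(obs, prior_mean, prior_sd, noise_sd):
    """Predictive distribution for an athlete's next test result.

    Toy model (my assumption): results ~ Normal(mu, noise_sd) with noise_sd
    known, and a Normal(prior_mean, prior_sd) prior on the athlete's true
    level mu, e.g. taken from the population.
    """
    prior_prec = 1 / prior_sd ** 2
    data_prec = len(obs) / noise_sd ** 2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_mean * prior_prec + sum(obs) / noise_sd ** 2)
    # Predictive spread = remaining uncertainty about mu + test-to-test noise.
    return NormalDist(post_mean, (post_var + noise_sd ** 2) ** 0.5)


def credible_interval(dist, level=0.95):
    """Central interval holding `level` of the predictive probability."""
    tail = (1 - level) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


if __name__ == "__main__":
    tests = [282.0, 290.0, 287.0]  # made-up power test results, watts
    pred = posterior_predictive(tests, prior_mean=250.0, prior_sd=40.0, noise_sd=8.0)
    low, high = credible_interval(pred)
    print(f"next test: {pred.mean:.0f} W, 95% interval {low:.0f} to {high:.0f} W")
```

With few tests the interval stays wide and leans on the prior; with more tests it narrows toward the athlete's own data. That is exactly the kind of statement you can explain to an athlete.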
----------------------------
Need more W/CdA.