WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output?
So I've been having a play around with WKO4 in the off-season and have some questions about the Power-Duration model. I'm not looking to start an argument, just genuinely interested in some other opinions from those that have a bit more experience of it.

As a bit of background, my last race of the season was at the end of August; I then had about 4 weeks of relaxing/easy unstructured training followed by 6 full weeks of structured base training. This weekend I conducted a bit of baseline testing to try to track my winter training progress, with a traditional 20-min FTP test on Saturday and a short-duration test of mostly anaerobic intervals (30s, 1 min, 2 min and 5 min durations) on Tuesday, loosely based on the traditional Hunter Allen & Andrew Coggan power profiling assessment. I synced all of this data into WKO4 and took a look at the Power-Duration model that it had calculated for me; the dataset includes another 60 or so workouts; however, these are mostly eclipsed by the maximal efforts from these tests.

As expected the raw data was rather jagged, with obvious "steps" in the Mean-Max Power profile evident for each discrete test duration (see below). The issue that I have is that the Power-Duration curve that WKO4 has calculated basically takes an averaged fit across this whole profile rather than better accounting for durations of truly maximal effort. As a result, your PD curve is dragged down by any duration period for which you haven't performed an effort reasonably close to maximal. My case is clearly an extreme example of this, being based on a somewhat limited dataset, but it appears to me that even with a (much) larger and more diverse set of data, it's still fundamentally predisposed to underestimate your true capacity at a given duration.

As an example, just to the right-of-centre of my chart are 2 big steps for my 5 min and 20 min testing intervals; the MMP data drops off sharply after these durations and drags the PD-curve down to significantly underestimate my capacity at these intervals. The obvious riposte to this is that the model can only work with the data that it's fed, but I'd argue that it should be calculated to better account for truly maximal efforts and that any MMP data point should represent a minimum value for the corresponding point on the PD-curve. It seems that if you're using the PD-curve to judge recent performance (e.g. the last 90 days) then most athletes won't be conducting enough truly maximal efforts across a broad enough range of durations for the current model to be accurate.
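The effect described above can be reproduced with a toy example. This is only a sketch under loud assumptions: the WKO4 model itself is not specified in this thread, so the simple two-parameter critical power model P(t) = CP + W'/t is used as a stand-in, and the "true" values, durations and the 90% submaximal factor are all made up for illustration.

```python
import numpy as np

def fit_cp_ols(durations_s, powers_w):
    """OLS fit of the stand-in model P(t) = CP + W'/t (linear in 1/t)."""
    x = 1.0 / np.asarray(durations_s, dtype=float)
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, np.asarray(powers_w, dtype=float), rcond=None)
    return float(coef[0]), float(coef[1])  # (CP, W')

# Hypothetical "true" capability of an athlete (assumed values).
TRUE_CP, TRUE_W = 250.0, 20000.0
durations = np.array([60, 120, 300, 600, 900, 1200], dtype=float)
true_power = TRUE_CP + TRUE_W / durations

# Assume only the 2 min, 5 min and 20 min points come from maximal tests;
# every other duration is ridden at 90% of true capability.
maximal = np.isin(durations, [120, 300, 1200])
observed = np.where(maximal, true_power, 0.9 * true_power)

cp_all, w_all = fit_cp_ols(durations, observed)                    # equal weight on all points
cp_max, w_max = fit_cp_ols(durations[maximal], observed[maximal])  # maximal points only

fitted_all = cp_all + w_all / durations
shortfall = true_power - fitted_all  # positive values mean the fit underestimates

print(f"CP from all points:     {cp_all:.1f} W")
print(f"CP from maximal points: {cp_max:.1f} W (true value {TRUE_CP:.0f} W)")
print("Shortfall at each duration (W):", np.round(shortfall, 1))
```

In this noise-free toy case the fit through all points sits below the true curve at every duration, including the tested ones. It says nothing about how the fit behaves once measurement noise is present, which is the point raised in the replies.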

Any thoughts?

Please excuse my woeful numbers here, it's the off-season; that's my excuse and I'm sticking to it! :D


Last edited by: awenborn: Nov 8, 17 3:03
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [awenborn] [ In reply to ]
The model is fit using OLS. As such, it will always faithfully track the raw data. The latter is invariably noisy, including points where your true maximal capability is underestimated due to lack of a maximal effort, *but also including points where your true capabilities are overestimated due to, e.g., power meter error or just a "special", i.e., irreproducible performance.*

As it turns out, for individuals who are training hard and racing regularly, these two sources of error generally cancel each other, such that the parameter estimates are accurate (as I have shown with sample sizes up to ~200). If all you feed the model is submaximal data, though, then yes, the output will be incorrect. IOW, GIGO.

There are alternatives to OLS, and I explored many of them when developing and fitting the model (note that the structure of the model and how it is fit are two different things). Unfortunately, none of these approaches are really valid solutions, as you end up "chasing noise" and routinely overestimating what someone can do. IOW, use of an "envelope fit", i.e., fitting only the extremes of the extremes, results in bias (this I have also demonstrated with large samples).
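The "chasing noise" argument can be illustrated with a small simulation. This is a sketch, not the validation referred to above: it assumes the two-parameter critical power model P(t) = CP + W'/t as a stand-in for the WKO4 model, assumes every data point is a genuinely maximal effort measured with 2.5% random error, and uses a deliberately crude stand-in for an envelope fit (the OLS curve shifted up until no data point lies above it). All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters and test durations (assumed values).
TRUE_CP, TRUE_W = 250.0, 20000.0
durations = np.geomspace(60, 3600, 20)
x = 1.0 / durations
A = np.column_stack([np.ones_like(x), x])
true_power = TRUE_CP + TRUE_W * x

ols_cp, env_cp = [], []
for _ in range(2000):  # 2000 simulated athletes/seasons
    # Every effort is maximal, but measured with symmetric 2.5% noise.
    noisy = true_power * (1.0 + rng.normal(0.0, 0.025, size=x.size))
    coef, *_ = np.linalg.lstsq(A, noisy, rcond=None)
    resid = noisy - A @ coef
    ols_cp.append(coef[0])
    # Crude envelope: raise the whole curve until it touches the highest point.
    env_cp.append(coef[0] + resid.max())

mean_ols_cp = float(np.mean(ols_cp))
mean_env_cp = float(np.mean(env_cp))
print(f"true CP {TRUE_CP:.0f} W, mean OLS CP {mean_ols_cp:.1f} W, "
      f"mean envelope CP {mean_env_cp:.1f} W")
```

Under these assumptions the OLS estimate averages out close to the true value, while the envelope estimate is systematically high because it follows the luckiest measurement error in each dataset.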

Finally, note that years on from its introduction, the WKO4 model is the only such model that has been evaluated/validated using large numbers of individuals (more than all papers in the scientific literature *combined*), and none of its critics have demonstrated any willingness or desire to do anything but sit on the sidelines and snipe. In particular, none has taken the simple step of attempting to repeat my extensive validation, despite readily having the ability to do so.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [Andrew Coggan] [ In reply to ]
For those in the back, 'OLS' is Ordinary Least Squares curve fitting. I'm moderately versed in the subject and still had to do a bit of googling to figure out the acronym.

Just trying to save some other people from the same process.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [awenborn] [ In reply to ]
awenborn wrote:
...
Basically, you are right. Don't expect Andrew to admit it, though.

Also, Andrew tries to fit the data, instead of trying to predict possible performance. Most people expect the second thing. I still don't truly understand the purpose of the first thing. But maybe my mind is too feeble to understand?
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [asgagd] [ In reply to ]
asgagd wrote:
awenborn wrote:
...

Basically, you are right. Don't expect Andrew to admit it, though.

Also, Andrew tries to fit the data, instead of trying to predict possible performance. Most people expect the second thing. I still don't truly understand the purpose of the first thing. But maybe my mind is too feeble to understand?

Fitting the data has always gotta be better than guessing.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [Andrew Coggan] [ In reply to ]
Andrew Coggan wrote:
There are alternatives to OLS, and I explored many of them when developing and fitting the model (note that the structure of the model and how it is fit are two different things). Unfortunately, none of these approaches are really valid solutions, as you end up "chasing noise" and routinely overestimating what someone can do. IOW, use of an "envelope fit", i.e., fitting only the extremes of the extremes, results in bias (this I have also demonstrated with large samples).


Thanks for the reply. I, like many others I'm sure, genuinely appreciate you coming on here to explain, elaborate on and defend your work from armchair critics like me. My post wasn't meant as a snipe as such, more of an observation!

That's an interesting (and somewhat disappointing) point regarding the possible bias of other fit methods. I guess, as you say, with a more comprehensive set of data the errors start to cancel each other out and this is clear just by looking at my own data over longer time-frames.


asgagd wrote:
Also, Andrew tries to fit the data, instead of trying to predict possible performance. Most people expect the second thing. I still don't truly understand the purpose of the first thing. But maybe my mind is too feeble to understand?

That's an interesting way to look at it. I think there's clearly merit and utility in both approaches, the former for analysing and comparing prior performances (e.g. comparing 2017 vs 2016 power profiles) and the latter for prescribing a training structure. How well each one is catered for I guess depends on the quality and diversity of your MMP data input.
Last edited by: awenborn: Nov 8, 17 5:16
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [asgagd] [ In reply to ]
asgagd wrote:
awenborn wrote:
...
Basically, you are right. Don't expect Andrew to admit it, though.

No, he isn't correct, as the model is not "fundamentally predisposed to underestimate power output."

OTOH, the use of OLS to fit the model will do so if the data set in question doesn't contain enough maximal efforts, as I have always emphasized (contrary to asgagd's patently-false statement). However, 1) that's not a limitation of the model, and 2) there are no viable alternatives. In particular, any sort of "envelope fit" results in biased parameter estimates.

asgagd wrote:
Also, Andrew tries to fit the data, instead of trying to predict possible performance. Most people expect the second thing. I still don't truly understand the purpose of the first thing. But maybe my mind is too feeble to understand?

The purposes are 1) to extract accurate (i.e., unbiased) and precise* parameter estimates reflective of the physiological characteristics that determine your performance ability, and 2) to smooth the mean maximal power curve to minimize noise/error in subsequent calculations (e.g., calculation of individually-based training levels).

OTOH, if all you want to do is predict someone's performance, I would suggest following the advice of whoever it was that originally opined "the best predictor of performance is performance itself."

*Note that to this day WKO4 remains the only program on the market that natively provides valid goodness-of-fit statistics. Other programs either leave you in the dark, or lie to you by providing R^2 values for non-linear curve fits (which is not a statistically valid approach).
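For readers wondering what a goodness-of-fit statistic other than R^2 might look like, one commonly used option for non-linear fits is the standard error of the estimate, i.e. the typical size of a residual in the units of the data. The thread does not say which statistics WKO4 actually reports, so the sketch below is only an illustration; the function name and the example numbers are made up.

```python
import math

def standard_error_of_estimate(observed, fitted, n_params):
    """sqrt(SSR / (n - p)): typical residual size, in the units of the data."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    dof = len(observed) - n_params  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    ssr = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(ssr / dof)

# Hypothetical mean-max powers (W) and fitted values from a 2-parameter model.
observed_w = [300.0, 280.0, 260.0, 250.0]
fitted_w = [305.0, 275.0, 262.0, 248.0]
see = standard_error_of_estimate(observed_w, fitted_w, n_params=2)
print(f"standard error of estimate: {see:.2f} W")
```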
Last edited by: Andrew Coggan: Nov 8, 17 7:28
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [awenborn] [ In reply to ]
BTW, it might open a few eyes if you plot the normalized residuals of that curve fit against time (instead of the log of time).

If you do so, you will see that the predicted and measured values are invariably within a few percent of each other.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [Andrew Coggan] [ In reply to ]
Andrew Coggan wrote:
OTOH, the use of OLS *will* do so if the dataset to which you fit the model doesn't contain enough maximal efforts...but 1) that's not a limitation of the *model*, and 2) there are no viable alternatives.
So this is the point I was trying to make in my OP. Please correct me if (or more likely where!) I'm wrong, but the above statement is only true if "enough" = infinite.

It doesn't really need stating, but it's fundamentally impossible for a given dataset to have maximal efforts at all possible durations; any dataset will have some points of maximal effort and some (arguably lots more) points of sub-maximal effort. If all datapoints are given an equal weighting, as in the OLS method, then the PD curve will seemingly always underestimate your true capacity at a given duration.

The margin by which it underestimates will be dependent on the number of maximal efforts that you have at and around that given duration, and the practical significance of that margin of error may well be negligible. Regardless, given the inherently finite and transitory nature of the data being analysed here, is my original statement that it will always underestimate your true capacity not valid?

On your other points, notably the suitability of alternatives, I will happily defer to your substantially greater experience.



Andrew Coggan wrote:
BTW, it might open a few eyes if you plot the normalized residuals of that curve fit against time (instead of the log of time).

If you do so, you will see that the predicted and measured values are invariably within a few percent of each other.
I recall some plots like this in the WKO4 literature/tutorials and indeed, they were quite informative in demonstrating the accuracy of the model. I'll see if I can figure that out when I get back to my computer with WKO4 on it.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [awenborn] [ In reply to ]
awenborn wrote:
Andrew Coggan wrote:
OTOH, the use of OLS *will* do so if the dataset to which you fit the model doesn't contain enough maximal efforts...but 1) that's not a limitation of the *model*, and 2) there are no viable alternatives.
So this is the point I was trying to make in my OP. Please correct me if (or more likely where!) I'm wrong, but the above statement is only true if "enough" = infinite.

It doesn't really need stating, but it's fundamentally impossible for a given dataset to have maximal efforts at all possible durations; any dataset will have some points of maximal effort and some (arguably lots more) points of sub-maximal effort. If all datapoints are given an equal weighting, as in the OLS method, then the PD curve will seemingly always underestimate your true capacity at a given duration.

The margin by which it underestimates will be dependent on the number of maximal efforts that you have at and around that given duration, and the practical significance of that margin of error may well be negligible. Regardless, given the inherently finite and transitory nature of the data being analysed here, is my original statement that it will always underestimate your true capacity not valid?

On your other points, notably the suitability of alternatives, I will happily defer to your substantially greater experience.

You are ignoring biological and technological variability. You are also conflating the map with the territory.

IOW, instead of thinking of your mean maximal power data as a "clean" curve, you should think of it as a blurred/smudged line (with a width of +/- ~5%). Only if a sufficient number of points at critical durations* fall significantly below this zone or region will the model parameters be significantly biased. Furthermore, you need to recognize that just because a model predicts that you can do something, doesn't mean that you actually can. IOW, just because your actual data fall below (or above) the fitted curve doesn't necessarily mean that they aren't a valid measure of your maximal performance at that duration.

Empirically, looking backwards 90 d is generally adequate to avoid issues in racing cyclists, at least throughout most of the year. OTOH, if, e.g., you become a strict trainer drone in winter, or are a triathlete or runner, it is less likely that you will spontaneously generate data robust enough to provide valid estimates of all of the parameters, and some formal "curve maintenance" testing may be required.

*Although all points used in the fitting exert some influence, some have more leverage than others. On average, for example, your maximal 47 s power has the greatest influence on the estimated FRC, but (obviously, I would think) has little to no impact on your estimated stamina.
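The "blurred line" idea above can be sketched as a simple check: treat the fitted curve as a band rather than a line and flag only the durations where the measured data fall clearly below it. The function name, the example numbers and the use of exactly 5% as the band width are assumptions for illustration, not anything taken from WKO4.

```python
import numpy as np

def below_band(mmp_w, fitted_w, band=0.05):
    """True where measured mean-max power is more than `band` below the fitted curve."""
    mmp_w = np.asarray(mmp_w, dtype=float)
    fitted_w = np.asarray(fitted_w, dtype=float)
    return mmp_w < (1.0 - band) * fitted_w

# Hypothetical fitted and measured powers (W) at four durations.
durations_s = [60, 300, 1200, 3600]
fitted = np.array([500.0, 320.0, 270.0, 250.0])
mmp = np.array([495.0, 318.0, 240.0, 200.0])

flags = below_band(mmp, fitted)
for t, flagged in zip(durations_s, flags):
    print(f"{t:>5d} s: {'clearly below the band' if flagged else 'within the band'}")
```

In this made-up example the 60 s and 300 s points sit inside the band, while the 1200 s and 3600 s points are the ones that would suggest a missing maximal effort.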
Last edited by: Andrew Coggan: Nov 8, 17 8:23
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [Andrew Coggan] [ In reply to ]
Andrew Coggan wrote:
looking backwards 90 d is generally adequate to avoid issues in racing cyclists, at least throughout most of the year. OTOH, if, e.g., you become a strict trainer drone in winter, or are a triathlete or runner, it is less likely that you will spontaneously generate data robust enough to provide valid estimates of all of the parameters, and some formal "curve maintenance" testing may be required.

That's an important nuance that the OP is now realizing.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [lyrrad] [ In reply to ]
lyrrad wrote:
asgagd wrote:
awenborn wrote:
...

Basically, you are right. Don't expect Andrew to admit it, though.

Also, Andrew tries to fit the data, instead of trying to predict possible performance. Most people expect the second thing. I still don't truly understand the purpose of the first thing. But maybe my mind is too feeble to understand?


Fitting the data has always gotta be better than guessing.

Fitting to submaximal data is worse than guessing. MMP is categorically submaximal (typically > 90% of the data used)

Fitting to known maximal data is good.

This topic has been done to death and Coggan either doesn't get it or chooses to ignore the issues.

As ever, fitting data to tests is by far and away the most robust approach.

Mark
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [liversedge] [ In reply to ]
liversedge wrote:
lyrrad wrote:
asgagd wrote:
awenborn wrote:
...

Basically, you are right. Don't expect Andrew to admit it, though.

Also, Andrew tries to fit the data, instead of trying to predict possible performance. Most people expect the second thing. I still don't truly understand the purpose of the first thing. But maybe my mind is too feeble to understand?


Fitting the data has always gotta be better than guessing.


Fitting to submaximal data is worse than guessing. MMP is categorically submaximal (typically > 90% of the data used)

Fitting to known maximal data is good.

This topic has been done to death and Coggan either doesn't get it or chooses to ignore the issues.

As ever, fitting data to tests is by far and away the most robust approach.

Mark

Everyone knows that Dr. Coggan does not need someone to come to his defense... but I think if you read the intended use case (i.e. a road racing cyclist), you will see that your assertion that >90% of the data are sub-maximal no longer holds.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [pyrahna] [ In reply to ]
pyrahna wrote:
Everyone knows that Dr. Coggan does not need someone to come to his defense....but I think if you read the intended use case (i.e. a road racing cyclist) that you will see that your assertion that >90% of the data being sub-maximal would become false.

I refer you to the curve in the OP.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [liversedge] [ In reply to ]
liversedge wrote:
pyrahna wrote:

Everyone knows that Dr. Coggan does not need someone to come to his defense....but I think if you read the intended use case (i.e. a road racing cyclist) that you will see that your assertion that >90% of the data being sub-maximal would become false.


I refer you to the curve in the OP.

And I refer you to the part of the OP where he says it's only about 60 or so workouts (that appear to be after his season end, and are therefore not racing, but training workouts).

See, I can sound unpleasant as well.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [pyrahna] [ In reply to ]
pyrahna wrote:
liversedge wrote:
lyrrad wrote:
asgagd wrote:
awenborn wrote:
...

Basically, you are right. Don't expect Andrew to admit it, though.

Also, Andrew tries to fit the data, instead of trying to predict possible performance. Most people expect the second thing. I still don't truly understand the purpose of the first thing. But maybe my mind is too feeble to understand?


Fitting the data has always gotta be better than guessing.


Fitting to submaximal data is worse than guessing. MMP is categorically submaximal (typically > 90% of the data used)

Fitting to known maximal data is good.

This topic has been done to death and Coggan either doesn't get it or chooses to ignore the issues.

As ever, fitting data to tests is by far and away the most robust approach.

Mark



Everyone knows that Dr. Coggan does not need someone to come to his defense....but I think if you read the intended use case (i.e. a road racing cyclist) that you will see that your assertion that >90% of the data being sub-maximal would become false.

In a road race I try to make as few maximal efforts as possible. If I know that I am the strongest sprinter in the group, I try to make exactly one maximal effort of maybe 10 seconds!
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [pyrahna] [ In reply to ]
pyrahna wrote:
liversedge wrote:
pyrahna wrote:

Everyone knows that Dr. Coggan does not need someone to come to his defense....but I think if you read the intended use case (i.e. a road racing cyclist) that you will see that your assertion that >90% of the data being sub-maximal would become false.


I refer you to the curve in the OP.


And I refer you to the part of the OP where he says it's only about 60 or so workouts (that appear to be after his season end, and are therefore not racing, but training workouts).

See, I can sound unpleasant as well.

  • An MMP curve of 0s to 2hrs contains 7,200 data points.
  • It would take an athlete (7200^2 + 7200)/2 seconds to record maximal efforts at each duration - that's 300 days straight without rest.
  • If we add a minimum of 1hr between each effort to recover that would add another 300 days.
  • If you then added 8hrs sleep for each of those 600 days you would then add another 200 days.


If you spent 800 days straight solely doing maximal tests you could produce a MMP curve from 0s to 2hrs comprised 100% of maximal efforts.
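The arithmetic in the list above can be checked in a few lines. This simply follows the post's own assumptions (one effort per whole-second duration from 1 s to 7,200 s, one hour of recovery per effort, 8 hours of sleep for each of the resulting days); the variable names are made up.

```python
SECONDS_PER_DAY = 86_400
n = 7200  # one maximal effort for each duration from 1 s to 7,200 s (2 hrs)

# 1 + 2 + ... + 7200 seconds of riding, i.e. (n^2 + n) / 2.
effort_seconds = (n**2 + n) // 2
effort_days = effort_seconds / SECONDS_PER_DAY

# One hour of recovery per effort, as counted in the post.
recovery_days = n * 3600 / SECONDS_PER_DAY

# 8 hours of sleep for each of the effort + recovery days.
sleep_days = (effort_days + recovery_days) * 8 / 24

total_days = effort_days + recovery_days + sleep_days
print(f"efforts:  {effort_seconds:,} s = {effort_days:.1f} days")
print(f"recovery: {recovery_days:.1f} days")
print(f"sleep:    {sleep_days:.1f} days")
print(f"total:    {total_days:.1f} days")
```

The totals come out at roughly 300, 300 and 200 days, so the 800-day figure holds under those assumptions.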
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [pyrahna] [ In reply to ]
pyrahna wrote:
liversedge wrote:
pyrahna wrote:

Everyone knows that Dr. Coggan does not need someone to come to his defense....but I think if you read the intended use case (i.e. a road racing cyclist) that you will see that your assertion that >90% of the data being sub-maximal would become false.

I refer you to the curve in the OP.

And I refer you to the part of the OP where he says it's only about 60 or so workouts (that appear to be after his season end, and are therefore not racing, but training workouts).

See, I can sound unpleasant as well.
I acknowledged in the OP that this dataset is an extreme example, but one that overtly highlights the issue of the curve fitting method employed.

Sure, a more comprehensive dataset with more maximal efforts at a greater variety of durations would be a lot closer to one's (theoretical) physiological power-duration capabilities, but I'd argue that it would also obfuscate identification of those areas of the PD-curve that are being modelled on, and confounded by, sub-maximal data.


Andrew Coggan wrote:
Furthermore, you need to recognize that just because a model predicts that you can do something, doesn't mean that you actually can. IOW, just because your actual data fall below (or above) the fitted curve doesn't necessarily mean that they aren't a valid measure of your maximal performance at that duration.

I appreciate that technological and biological variability play a role here, but is this not a contradiction of your "performance is the best predictor of performance" mantra?
Last edited by: awenborn: Nov 8, 17 13:24
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [Andrew Coggan] [ In reply to ]
Andrew Coggan wrote:
BTW, it might open a few eyes if you plot the normalized residuals of that curve fit against time (instead of the log of time).


If you do so, you will see that the predicted and measured values are invariably within a few percent of each other.


As requested, albeit on a log-x-axis:

(pdcurve(meanmax(bikepower)) - meanmax(bikepower)) / meanmax(bikepower)
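For anyone without WKO4 to hand, the same quantity can be computed outside the program from exported values. This is a minimal sketch: the function name and the example numbers are made up, and it assumes you already have the fitted curve and the mean-max power sampled at the same durations.

```python
import numpy as np

def normalized_residuals(fitted_w, mmp_w):
    """(fitted - measured) / measured, mirroring the WKO4 expression above."""
    fitted_w = np.asarray(fitted_w, dtype=float)
    mmp_w = np.asarray(mmp_w, dtype=float)
    return (fitted_w - mmp_w) / mmp_w

# Hypothetical fitted and measured powers (W) at three durations.
res = normalized_residuals([510.0, 315.0, 260.0], [500.0, 300.0, 250.0])
print(np.round(100 * res, 1), "% (positive = curve above the measured data)")
```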


Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [awenborn] [ In reply to ]
No, it is confirmation of it.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [liversedge] [ In reply to ]
For starters, you incorrectly ass u me that just because someone produces a maximal performance at X seconds, they can't produce a maximal performance at X+1 seconds within the same effort.

For finishers, your false objection primarily influences the precision of the parameter estimates, and not their actual value.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [BergHugi] [ In reply to ]
And I try my best to turn it into a maximal effort TT for everyone, as that plays to my strengths.

Regardless, your reply is really a non-sequitur, as no one here has suggested attempting to extract reliable parameter estimates from a single file (although in fact you sometimes can).
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [liversedge] [ In reply to ]
Yes, the topic has been "done to death", but I am the only one who has presented actual DATA in support of their position. As such, your comments are essentially as meaningful as a 2 a.m. Trump tweet.
Last edited by: Andrew Coggan: Nov 8, 17 14:50
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [Andrew Coggan] [ In reply to ]
Andrew Coggan wrote:

Finally, note that years on from its introduction, the WKO4 model is the only such model that has been evaluated/validated using large numbers of individuals (more than all papers in the scientific literature *combined*), and none of its critics have demonstrated any willingness or desire to do anything but sit on the sidelines and snipe. In particular, none has taken the simple step of attempting to repeat my extensive validation, despite readily having the ability to do so.

Have you published the "large numbers of individuals" dataset? If the dataset is good then it can be used to validate and compare other models.
Re: WKO4's Power-Duration Curve Model Fundamentally Predisposed to Underestimate Power Output? [awenborn] [ In reply to ]
awenborn wrote:
It seems that if you're using the PD-curve to judge recent performance (e.g. the last 90 days) then most athletes won't be conducting enough truly maximal efforts across a broad enough range of durations for the current model to be accurate.

Any thoughts?
It sucks that these discussions always devolve so quickly.

Here are my thoughts: it seems unlikely that the WKO model is biased (low or high) on average across all athletes. It was developed by testing the model on data for many athletes (hundreds?) and, even though Andy hasn't released all the details of this process, I trust that he developed a model that was as unbiased as he could get it. In one of his responses above he mentions that there are forces leading to overestimates (noise, mostly) and forces leading to underestimates (lack of maximal efforts), and they approximately cancel out across data for all athletes.

What about your particular case? First, you are asking a lot: you want good estimates of FTP without much data to back that up. *You* may know how 231 watts feels, and hence have a better sense of whether that is your FTP or not, but all WKO can see is your data. When I look at just your data, I can't tell whether 231 is too high or too low. It is definitely not obvious that it's too low, again just from the data.

Now, that said, I do believe the WKO model has consistent biases for certain athletes. I.e., even though it may be unbiased across all athletes, there may be some athletes for which WKO consistently overestimates FTP, and others for which it consistently underestimates FTP. For *me*, during a busy race season, when there is lots of data on maximal efforts of different lengths, WKO mFTP is always about 3% low and TTE about 33% low. I'm not sure why this happens; it might be because I'm religious about power meter calibration, so there is less noise in my data, but it could be something else.

I have a ton of data, so I don't need a model to tell my FTP. Where the WKO model really shines is for people like you who do not have much data. In that case I think it does as well or better than any other model, in addition to being easier to use. For example, Andy has extensively tested the WKO model against the critical power model on data for many athletes and found that the WKO model does better. (I believe liversedge was supporting the CP model.)