On the top graph, one increment represents one month for the first four points, then there is a jump of five months to the October point. Kind of misleading…
Conclusion from the data presented: you have had a fairly linear progression over these 8 months, and the effect of product X is unknown. You may have progressed more if you had not used it, or you may have seen a drop in the increase of power; one can’t tell from the presented data.
Obviously the way the data is presented affects perception.
Four data points is pretty weak for linear correlation (second plot), and the variance (R^2) is weak. But that said, even though it looks like you have a much better R^2 with five data points in the bottom graph, the fifth point is well out there, which gives it excessive weight. Some would suggest a correction factor for your linear regression equation. I dislike the first graph on the grounds that precision is lost by simply lumping the data into month-based categories. Hmm, I guess your point about product X is that there is only one data point after the introduction of Product X, as compared with four points before, which may show a weak increasing trend before product X (from the middle graph). The previous post identified the non-linearity of the x-axis when using the months as labels versus a numerical axis. (Damn, I missed that the first time, I should get my butt to bed.)
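To put a rough number on "excessive weight": in a simple linear regression, each point's leverage depends only on how far its x value sits from the mean of the x values. Here is a minimal sketch, assuming the month numbers (Feb = 2 through Oct = 10) stand in for the actual test dates, which the plots don't give us. The function name `leverages` is just for illustration.

```python
def leverages(xs):
    """Leverage of each x in a straight-line fit:
    h_i = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2
    """
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Assumption: month numbers approximate the real test dates.
months = {"Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Oct": 10}

for name, h in zip(months, leverages(list(months.values()))):
    print(f"{name}: leverage = {h:.2f}")
```

With these x values the October point comes out at about 0.90, against 0.20 to 0.40 for the spring points. A leverage close to 1 means the fitted line is nearly forced through that point, whatever the other four say.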
No conclusions could be drawn from any of the charts/plots/graphs.
In order to find relationships (e.g. correlation, variance, regression) you must conduct a statistical test that shows a significant difference (i.e. at a given probability) or similar test as appropriate to the experiment or data collection method used. Changing the scale of a chart as a means to change a line in a graph could only lead readers who have no statistical training to false conclusions. The plots are only used for illustration purposes once a statistical relationship is found.
hmmm…i guess these data show that a person is riding a bike at different speeds at different times. that’s about all i would stake my rep on. yes, their average speed is increasing over time, as predicted by the regression equation. yes, the introduction of product “a” is followed by the attainment of the highest speed by the subject. the downsides: manifold.
1. we’d expect increases in speed over time from training effects irrespective of any product.
2. correlation does not equal causation.
3. a single subject measured at different times violates the assumptions of linear regression (independence of data points, uncorrelated error terms, blah blah woof woof).
4. the r-squared terms are lovely (that’s variance explained by the regression equation, not variance btw), but guess what happens to the r2 if you have one data point and regress through the origin? or fit just two points? perfect correlation! beware the r2 with low sample sizes. a better thing to look at is confidence intervals for individual points (not for the regression line), which are fairly broad.
5. guess what the regression equation predicts the subject’s speed will be in ten years–watch out lance. we need a quadratic fit with a longer time series.
6. this ain’t science! we need multiple subjects! we need real product “a” and control product “a” (the latter being a sham without any effects beyond psychological)! we need replicable testing conditions! we need repeated-measures Analysis of Variance! we need aspirin!
that being said, if product “a” looks cool, where can i buy one?
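The warning in point 4 about r-squared with tiny samples is easy to check. A minimal sketch follows; the numbers are made up for illustration and are not read off the plots, and the helper names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def r_squared(xs, ys):
    """Fraction of the variance in ys explained by the fitted line."""
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical values: any two points with different x and y will do.
print(r_squared([1, 2], [20.0, 17.5]))   # speed going down
print(r_squared([1, 2], [3.0, 40.0]))    # speed going way up
```

Both fits report an r-squared of 1 (up to rounding), because a straight line always passes exactly through two points. The statistic says nothing about whether the relationship is real until there are enough points for the line to miss some of them.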
I think it shows that you froze your butt off in February and were chasing women in October with a short term relationship in April.
It may also show that the solar motor works better on sunny days.
Another thing it could show is that you were eating McDonalds in February and Subway in October
All we can conclude from the graph is that the vehicle was faster later in the year than in the beginning. The assumption that the graph is linear is incorrect, as is the assumption that this is data from a person riding a bicycle.
So what is the punchline? Product X is/are Power Cranks? The data in the graph doesn’t “prove” anything about Product X (for that matter, even if there was a strong positive correlation, that still doesn’t “prove” anything, it only suggests that a real correlation exists). OTOH, you cannot say that the graphs invalidate Product X. At best, it is inconclusive. Why do I get the feeling someone is trying to put words in my mouth? “Daddy, I got cider in my ear!”
“What conclusions could one draw from these data presented? Does the manner in which data is plotted affect one’s perception?”
I guess the top graph would suggest to the eye that ‘X’ is doing something - at least more so than the others - but I agree you need more data to satisfy anyone with more than a credit card and an itchy trigger finger. Aren’t you trying to show something different with all the other analyses?
It is a possibility that product X is doing something, what that is I don’t know, so no, I’m not convinced. For all that we know, it could be limiting progress.
The only thing I’m convinced of is that whomever was collecting the data should be replaced. There’s nowhere near enough data to conclude anything about anything.
I also want to know the punchline. What product have we all been buying and touting based on this (lack of) data?
I suspect that if you had a few more data points you’d realize that a linear model is not the best fit. Your R^2 will start dropping like a rock when your increases in average speed no longer keep up with the model.
Unless you are assuming that there is no limit to average speed…day 730(2yrs)=29.8mph, 1095(3yrs)=33.6mph, 1460(4yrs)=37.1mph. So in four years average speed has gone from average Cat 4 rider to beyond anything that has ever been achieved by a human athlete. Where do I sign up?
Good example of the limits of statistical modeling.
Chris
PS. I won’t even mention the first chart…feb, mar, apr, may, oct…!!!..come on.
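The extrapolation problem shows up using nothing but the predictions quoted above for days 730, 1095 and 1460. A small sketch, assuming only those three figures (the regression equation itself isn't given in the thread); `gains_between` is a hypothetical helper name.

```python
# Predictions quoted in the post above: day number -> average speed (mph).
quoted = {730: 29.8, 1095: 33.6, 1460: 37.1}

def gains_between(points):
    """Speed gained between consecutive quoted days, rounded to 0.1 mph."""
    days = sorted(points)
    return [round(points[b] - points[a], 1) for a, b in zip(days, days[1:])]

print(gains_between(quoted))  # prints [3.8, 3.5]
```

The model is still handing out roughly 3.5 to 3.8 mph per year in years three and four. Nothing in a fit to a few months of data knows there is a physiological ceiling, so the further out you read it, the less it means.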
“The only thing I’m convinced of is that whomever was collecting the data should be replaced. There’s nowhere near enough data to conclude anything about anything.” Whoa, big fella, this “information” is something I would do regarding my own training, and as such it has some value such as tracking/trending/monitoring. It is anecdotal in nature, not a proof. However, to sack the data taker is just as bad as staking a claim as both are done out of context. The forum has not been provided the full context of the information, let alone the nature of any claims associated with Product X based on the data shown. The graph suggests several possibilities, the strongest one is that the subject was improving steadily before Product X. One can use linear regression to predict beyond the data shown, but there is no statistical validation for those predicted values, super major caveat (someone earlier in the thread implied predictive information from the graphs which is not reasonable or accurate). If I was tracking my own results (and I do), I would correlate the graph data with variables such as volume of workouts, intensity, resting HR, anabolic steroid use, err supplements, chocolate intake, etc. It would make a lot more sense to me, it still might not mean much to other people, though.
LSilverman (that’s me) said: “The only thing I’m convinced of is that whomever was collecting the data should be replaced. There’s nowhere near enough data to conclude anything about anything.” Parkito replied: Whoa, big fella, this “information” is something I would do regarding my own training, and as such it has some value such as tracking/trending/monitoring. It is anecdotal in nature, not a proof. However, to sack the data taker is just as bad as staking a claim as both are done out of context.
In the first graph there are data points in Feb, Mar, Apr, May, and Oct. June, July, Aug, and Sept are missing. There is absolutely no way to draw any kind of conclusion from a data set that’s missing almost half the data! Generically, I’d say that’s the fault of the person who was supposed to collect the data.
I’ll concede that it may not be the data collector’s fault. It’s possible that the rider didn’t do any time trials in the summer – but in my mind that invalidates the entire data set, or at least it invalidates the Oct data point as an outlier. It’s also possible that the data was collected but the summer results were not plotted/analyzed – which would be FAR worse than not gathering the data in the first place.
If you just look at the first four data points, you still can’t conclude anything. We don’t need to get into all the information that is missing that would help us know if the data points are comparable – partly because there’s so much missing that any list is going to be incomplete. Are these measured on the same course? Wind speed/direction on each day? Temp on each day? The list could go on forever.
Even if you’re just looking for anecdotal evidence, looking at only part of the data is often worse than having no data at all. That’s the reason why most “anecdotal” reports are disproven by rigorous scientific studies. You have to look at all the data or none at all.
I still want to know what Product X is. The suspense is killing me.
“In the first graph there are data points in Feb, Mar, Apr, May, and Oct. June, July, Aug, and Sept are missing. There is absolutely no way to draw any kind of conclusion from a data set that’s missing almost half the data!”
Agreed; we don’t know if data was even taken then.
“Generically, I’d say that’s the fault of the person who was supposed to collect the data.”
Fault is a judgment term. In a designed, controlled study this would be an obvious fault. But if I was tracking myself and didn’t test for three months, did I do something wrong?
“I’ll concede that it may not be the data collector’s fault. It’s possible that the rider didn’t do any time trials in the summer – but in my mind that invalidates the entire data set, or at least it invalidates the Oct data point as an outlier. It’s also possible that the data was collected but the summer results were not plotted/analyzed – which would be FAR worse than not gathering the data in the first place.”
I think we have already agreed that any correlation of the October point with any variable is questionable. If you want to start hypothesizing, you could assume the individual took the winter off and that the early improvement observed was due to returning to training; the possibilities are endless.
“If you just look at the first four data points, you still can’t conclude anything. We don’t need to get into all the information that is missing that would help us know if the data points are comparable – partly because there’s so much missing that any list is going to be incomplete. Are these measured on the same course? Wind speed/direction on each day? Temp on each day? The list could go on forever.”
Yup.
“Even if you’re just looking for anecdotal evidence, looking at only part of the data is often worse than having no data at all. That’s the reason why most “anecdotal” reports are disproven by rigorous scientific studies. You have to look at all the data or none at all.”
Anecdotal reports are never given the same weight as rigorous studies. They may lead one to conduct a rigorous study to establish or disprove something reported in an anecdotal observation.
We’re on the same page I think, I’m just reading it upside down again.
I think the issue is how the information is presented.
If you print a flyer with just some graphs, a person’s analytical half of their brain lights up, and they start getting critical. We look at those graphs and imagine a hundred reasons for the possible trend.
On the other hand, if you have a cool looking product, that has some kind of reasonably believable mechanism for performance boost (It doesn’t have to be ultra-science, just a pretty good idea that people can look at and say, “Ok, I can see where there might be some truth to that.”), and then back it with plots just like the ones you’ve created, people’s trust threshold is much lower, and they’re more inclined to give it a try. Not to pick on powercranks, but that’s a good example of a product with what seems to be a kernel of truth (e.g. pedalling circles seems to make sense; it would sure seem to recruit more muscle fibers, etc.) and not a lot of hard data.
What frustrates me is that some academic folks like the ones who frequent the topica list don’t set up a nice experimental/control study, preferably with a decent sample size, and sort it out. I know when I was a college student I would have been happy to jump on a trainer once a week or so for maybe 5-10 bucks.
I still want to know what Product X is. The suspense is killing me.
Lee
Oh I used to be disgusted
and now I try to be amused.
But since their wings have got rusted,
you know, the angels wanna wear my red shoes. Red shoes, the angels wanna wear my red shoes…