What about repeating the same 5 bikes on a velodrome for testing?
Why? Field testing tells you about zero yaw only, and the error ranges are larger, aren't they? Help me understand how field testing is superior in any way except for contextualizing your real-world CdA. And honestly, "real-world CdA" is best calculated from a few of your race results, or even a single one, either by plugging a slew of assumptions into the variables of the power equation, which is HP = ((Cd*A*1/2*rho*V^3)/550) + (mass*(Crr+Cbr)*V/550), or by using software such as Best Bike Split. The software route works if you trust that model's (I think proprietary) histograms for a given course, presumably built from the weather API and the "open coast" or "desert plains" input. Frankly, I don't fully trust it, partly because I don't think it's that precise and partly because I can't see how that stuff is cranked behind the scenes. As a model for the real world, though, it does just fine, and it's possibly very good on average.
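For anyone who wants to play with that equation, here is a minimal Python sketch of it. It assumes imperial units: CdA in ft^2, rho in slug/ft^3, V in ft/s, and "mass" entered as weight in lbf, which is what the 550 ft-lbf/s per horsepower divisor implies. It also assumes Cbr is a bearing-resistance coefficient that simply adds to Crr. The function names and sample numbers are mine, purely for illustration.

```python
FT_LBF_PER_SEC_PER_HP = 550.0  # 1 horsepower = 550 ft-lbf/s


def required_horsepower(cda_ft2, rho_slug_ft3, v_ft_s, weight_lbf, crr, cbr=0.0):
    """HP = (CdA * 1/2 * rho * V^3)/550 + (weight * (Crr + Cbr) * V)/550."""
    aero_hp = cda_ft2 * 0.5 * rho_slug_ft3 * v_ft_s ** 3 / FT_LBF_PER_SEC_PER_HP
    rolling_hp = weight_lbf * (crr + cbr) * v_ft_s / FT_LBF_PER_SEC_PER_HP
    return aero_hp + rolling_hp


def cda_from_horsepower(hp, rho_slug_ft3, v_ft_s, weight_lbf, crr, cbr=0.0):
    """Invert the same equation: back out CdA from a known power and speed."""
    rolling_hp = weight_lbf * (crr + cbr) * v_ft_s / FT_LBF_PER_SEC_PER_HP
    aero_hp = hp - rolling_hp
    return aero_hp * FT_LBF_PER_SEC_PER_HP / (0.5 * rho_slug_ft3 * v_ft_s ** 3)


if __name__ == "__main__":
    # Illustrative inputs only (not measured data): roughly 25 mph at sea level.
    hp = required_horsepower(cda_ft2=2.7, rho_slug_ft3=0.002377,
                             v_ft_s=36.67, weight_lbf=180.0, crr=0.004)
    print(f"required power: {hp:.3f} hp")
    print(f"recovered CdA: {cda_from_horsepower(hp, 0.002377, 36.67, 180.0, 0.004):.3f} ft^2")
```

The inversion shows why the "slew of assumptions" matters: any error in the assumed rho, Crr, or weight lands directly in the CdA you back out.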
Besides all that, I'd rather test all these things in the tunnel again and then do a free shootout with the top three, say, using Chung tests/VE (virtual elevation) in Golden Cheetah, because Chung testing is free and relatively easy if you have a place to do it.
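If you haven't seen the idea behind a Chung/VE test, here is a rough Python sketch. This is not Golden Cheetah's actual code; it is the textbook energy balance in SI units, assuming no wind, no drivetrain loss, evenly spaced samples, and hypothetical function and parameter names. For each sample you solve the power equation for the slope the rider "must have" been on, then accumulate that into a virtual elevation profile. On a loop course, the CdA guess that makes the profile close on itself is your estimate.

```python
G = 9.80665  # standard gravity, m/s^2


def virtual_elevation(power_w, speed_ms, dt_s, mass_kg, cda_m2, crr, rho=1.2):
    """Return a virtual elevation profile (m) from power and speed samples.

    Assumes still air and evenly spaced samples dt_s seconds apart.
    """
    elevation = [0.0]
    for i in range(1, len(speed_ms)):
        v = 0.5 * (speed_ms[i] + speed_ms[i - 1])  # mean speed over the step
        if v <= 0.0:
            elevation.append(elevation[-1])  # stopped: no elevation change
            continue
        accel = (speed_ms[i] - speed_ms[i - 1]) / dt_s
        # Solve P = (0.5*rho*CdA*v^2 + m*g*Crr + m*g*slope + m*a) * v for slope.
        slope = (power_w[i] / (mass_kg * G * v)
                 - crr
                 - accel / G
                 - rho * cda_m2 * v * v / (2.0 * mass_kg * G))
        elevation.append(elevation[-1] + slope * v * dt_s)
    return elevation


if __name__ == "__main__":
    # Synthetic flat ride at a steady 10 m/s with a "true" CdA of 0.25 m^2.
    mass, crr, rho, v = 80.0, 0.004, 1.2, 10.0
    p = (0.5 * rho * 0.25 * v ** 2 + mass * G * crr) * v
    power, speed = [p] * 60, [v] * 60
    for guess in (0.20, 0.25, 0.30):
        ve = virtual_elevation(power, speed, 1.0, mass, guess, crr, rho)
        print(f"CdA guess {guess:.2f}: final virtual elevation {ve[-1]:+.2f} m")
```

With the correct CdA the synthetic flat ride stays flat; too low a guess makes the virtual profile climb, and too high a guess makes it descend.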
All these approaches have strengths and weaknesses. If we're trying to tease out differences between these bikes -- which, per the Cervelo P5 white paper and the Trek Speed Concept white paper, are somewhat minor -- then the wind tunnel is the best place to do that with the least range of error. I was once a skeptic, but now that I have seen how low the variability is and how tight the controls are, I'm a believer.