We’ve examined the flawed approach of using past data to predict outcomes for complex operations in dynamic environments, and we’ve described the problem with misapplying ML to these cases. ML attempts to respond to dynamic inputs by compensating in the “learning” phase, repeating that step as changing historical samples are collected. This effort uses prior information to update probability models that attempt to predict the future.
Data scientists and statisticians debate the virtues of this kind of Bayesian analysis versus the rival frequentist approach. Bayesian modeling treats the data as a set of fixed observations and the parameters as unknown quantities to be described probabilistically. Frequentist analysis treats the parameters as fixed and the data as a repeatable random sample with a long-run frequency; it views a study as one repeatable means of gathering many such samples. When applied carelessly to complex, dynamic business operations, this approach is also unsound.
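To make the contrast concrete, here is a toy sketch (not from the text; the conversion-rate scenario and all numbers are illustrative assumptions) of how the two schools would summarize the same 30-in-100 outcome:

```python
import math

# Hypothetical scenario: 30 successes in 100 trials of some business process.
successes, trials = 30, 100

# Frequentist view: the rate is a fixed unknown; the data are one draw from
# a repeatable sampling process. Report a point estimate and a 95% confidence
# interval derived from the repeated-sampling distribution.
p_hat = successes / trials
se = math.sqrt(p_hat * (1 - p_hat) / trials)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Bayesian view: the data are fixed; the rate is uncertain and described by a
# distribution. With a uniform Beta(1, 1) prior, the posterior is
# Beta(1 + successes, 1 + failures); report its mean.
alpha, beta = 1 + successes, 1 + (trials - successes)
posterior_mean = alpha / (alpha + beta)

print(f"frequentist estimate: {p_hat:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Bayesian posterior mean: {posterior_mean:.3f}")
```

Both summaries are close numerically here; the disagreement is about what is fixed and what is random, which is exactly the assumption that gets strained in dynamic operations.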
Once we’ve collected data from the past, we’re limited to those hindsight observations; there’s no re-living the past to capture another data set. So, instead, we come up with a sampling plan: we try to separate simultaneous studies by geography or other criteria, or we use sequential studies to sample data for our frequentist statistical models. Both sampling techniques introduce serious problems linked to the assumption that the measures we’re tallying represent independent and identically distributed variables across studies. In complex systems, simultaneous, segmented data observations may represent quite different sets of operating conditions; they could originate from completely different data populations. Our business may not be homogeneous across the segmented studies (geographies, organizations, equipment variants, etc.). The assumptions of the inferential statistical model break down, and its predictions will be inaccurate.
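A small simulation (hypothetical; the two-region setup and all parameters are assumptions for illustration) shows how pooling segmented studies as one i.i.d. population misleads us when the segments are genuinely heterogeneous:

```python
import random

random.seed(0)

# Hypothetical: two geographic segments whose underlying process levels
# genuinely differ (means of 10 and 20). Treating their observations as one
# identically distributed population yields an estimate that describes neither.
region_a = [random.gauss(10.0, 1.0) for _ in range(500)]  # segment A, mean ~10
region_b = [random.gauss(20.0, 1.0) for _ in range(500)]  # segment B, mean ~20

pooled = region_a + region_b
pooled_mean = sum(pooled) / len(pooled)  # ~15: matches no real segment

mean_a = sum(region_a) / len(region_a)
mean_b = sum(region_b) / len(region_b)

print(f"pooled prediction: {pooled_mean:.1f}")
print(f"actual segment means: A={mean_a:.1f}, B={mean_b:.1f}")
```

The pooled estimate sits between the segments, so a prediction built on it is systematically wrong for every segment at once.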
Collecting separate data sets over different periods of time may likewise yield business observations that are only loosely related or not connected at all. Our business processes, equipment conditions, and operating environment all likely change over time. Once again, the inferential model sits on a shaky foundation of false assumptions and is doomed to fail. So while we haven’t fixed the problem of rear-facing data collection, we’ve introduced a new set of complications: in complex, dynamic, uncertain systems, frequentist data observations are probably not drawn from independent, identically distributed variables, and our predictive results land even further from reality.