
Additional Research
Ensemble Methods Explained
Ensemble Methods are a time-tested, multiple-expert approach designed to improve on the accuracy of single-expert predictive algorithms, or predictive engines.
The key components of Ensemble Methods include:
Multiple independent predictive engines
A mathematical mechanism to identify areas of consensus agreement across the underlying predictive engines.
The idea of seeking a second opinion (or third, or fourth) in matters of importance is second nature to most people. And for good reason. Consider a medical patient trying to decide between different treatment options. Surgery, drug therapy, homeopathic therapy? And then assume that this patient went to 5 medical experts and asked for their diagnosis and recommended treatment. If all 5 doctors provided the exact same diagnosis and recommended treatment, is there any doubt what that patient’s course of action would be? Of course not. In virtually all cases, this type of multiple-expert system is better than a single-expert system, regarding both expected outcome and confidence in that outcome.
In their groundbreaking book Ensemble Methods in Data Mining, Seni and Elder described Ensemble Methods as “the most influential development in Data Mining and Machine Learning in the past decade. They combine multiple [predictive] models into one [that is] usually more accurate than the best of its components.”
The concepts behind Ensemble Methods date back to the late 1970s and are considered a foundational approach for most AI and Machine Learning applications. Ensemble Methods are taught at virtually every university that has a Computer Science department, and have been successfully used in applications as varied as facial recognition, self-driving cars, weather prediction, computer security, medicine, and even wine selection1.
Why Ensemble Methods Work: Statistics
The science of statistics provides a clear explanation of how combining multiple predictive algorithms translates to improved outcomes. Just as the probability of flipping a coin and having heads appear twice in a row is 25% [50% odds per flip, occurring twice in a row: 0.5 x 0.5 = 0.25], statistics can explain the expected improvements in predictive outcomes from Ensemble systems.
Assume someone was attempting to predict a binary outcome (e.g., whether a plane would land at O’Hare airport on time). In this hypothetical, they had built several ‘single expert’ algorithms, each of which was unique with independent errors1, and each predictor hit a glass ceiling at precisely a 60% success rate. Fortunately, a multi-expert system that achieves consensus agreement among partially flawed predictive engines can still deliver highly accurate predictions.
Figure 1. Statistics Behind Ensemble, multi-expert systems
As can be seen in Figure 1, the probability of an accurate prediction increases rapidly as more predictors agree (Note: this example assumes that the predictive algorithms are truly independent of one another).
For example (the arithmetic behind these figures is sketched in the short calculation after this list):
If two predictors agree, the probability of success increases from 60% to 84%.
If five agree, the probability of success increases from 60% to 99%.
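A minimal sketch of the arithmetic behind these figures, assuming (since Figure 1 is not reproduced here) that the chart plots the probability that at least one of n independent 60% predictors is correct, i.e. 1 − 0.4^n:

```python
# Chance that at least one of n independent predictors, each with a 60%
# success rate, is correct: 1 - 0.4**n. (This is an assumed reading of
# Figure 1, which is not reproduced in this text; it matches the 84% and
# 99% values quoted above.)
for n in (1, 2, 5):
    print(f"{n} predictor(s): {1 - 0.4**n:.0%}")  # 60%, 84%, 99%
```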
Of course, multiple predictors rarely reach full consensus. But statistics can still shed light on how predictive success is improved through Ensemble Methods, even when consensus is not reached.
Building on the O’Hare example, assume this time that an Ensemble of 21 independent and unique predictive algorithms was assembled to predict on-time landings, and each still had the same 60% success rate. In this case, a simple majority vote of the 21 algorithms is used to predict the outcome. For this Ensemble prediction to be wrong, 11 or more of the predictors need to be wrong. The probability of such an outcome is only 17.4%, creating a success rate of 82.6%, even though the underlying predictors all have a ‘glass ceiling’ at 60%.
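The 17.4% figure is a standard binomial tail probability. The short sketch below, in plain Python with no external libraries, reproduces it.

```python
# Probability that a majority vote of 21 independent predictors, each with
# a 60% success rate (40% error rate), is wrong, i.e., that 11 or more of
# the 21 predictors are wrong on the same prediction.
from math import comb

n, p_wrong = 21, 0.40
p_majority_wrong = sum(
    comb(n, k) * p_wrong**k * (1 - p_wrong)**(n - k)
    for k in range(11, n + 1)
)
print(f"Ensemble error rate:   {p_majority_wrong:.1%}")      # ~17.4%
print(f"Ensemble success rate: {1 - p_majority_wrong:.1%}")  # ~82.6%
```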
1 Additional sources of information – Ensemble Methods:
G. Seni and J. Elder (2010), Ensemble Methods in Data Mining. Morgan & Claypool Publishers.
A. Theissler (2017), “Detecting Known and Unknown Faults in Automotive Systems Using Ensemble-Based Anomaly Detection,” ScienceDirect.
T. Gneiting and A. E. Raftery (2005), “Weather Forecasting with Ensemble Methods,” Science.
J. Vanerio (2017), “Ensemble-Learning Approaches for Network Security and Anomaly Detection,” Proceedings of the workshop on Big Data Analytics and Machine Learning for Data Communication Networks - Big-DAMA ’17.
E. Rosales (2015), “Predicting Patient Satisfaction with Ensemble Methods,” Project report submitted to the faculty of Worcester Polytechnic Institute.
D. Morrison, R. Wang, L.C. De Silva (2007), “Ensemble Methods for Spoken Emotion Recognition in Call-centres,” ScienceDirect.
F. Schimbinschi, L. Schomaker, M. Wiering (2015), “Ensemble Methods for Robust 3D Face Recognition Using Commodity Depth Sensors,” conference paper presented at IEEE CIBIM, Cape Town, South Africa.
R. Polikar (2009), “Ensemble learning,” Scholarpedia 4(1): 2776.
E. Van Buskirk (2009), “BellKor’s Pragmatic Chaos Wins $1 Million Netflix Prize by Mere Minutes,” Wired.com.
E. Van Buskirk (2009), “How the Netflix Prize was Won,” Wired.com.
The Bias-Variance Conflict Explained
As practitioners of Machine Learning know, the two most common sources of error in a predictive algorithm are ‘Bias’ and ‘Variance’. Bias occurs when the underlying assumptions in the predictive algorithm are flawed.
A ‘High Bias’ predictor will generate results that are consistently off target (Chart 1, left side). Variance, by contrast, refers to how widely a predictor’s results are scattered around their average, that is, how sensitive the predictor is to the particular data it was built on.
A ‘High Variance’ algorithm will deliver results that are widely dispersed, and therefore unreliable on any single prediction (Chart 1, right side).
Chart 1. High Bias (left) vs. High Variance (right)
Unfortunately, all predictive algorithms carry both intentional and unintentional biases. And beyond a certain threshold, efforts to reduce bias will frequently increase variance, while efforts to reduce variance will often increase bias. This is sometimes referred to as the Bias–Variance Conflict or the Bias–Variance Tradeoff, and it is a key contributor to single experts’ glass ceilings.
This trade-off can be seen in Chart 2, where the Total Error (black line, equal to the error from Bias plus the error from Variance) cannot be pushed below a certain floor because:
1) as the Bias error is reduced toward its minimum, Variance error rises sharply; and
2) as the Variance error is reduced toward its minimum, Bias error rises sharply.
A small simulation illustrating this effect appears after Chart 2.
Chart 2: Bias-Variance Tradeoff
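For readers who want to see the tradeoff numerically, below is a minimal simulation sketch. The setup is assumed for illustration and is not taken from the text: a sine-wave ‘true’ signal, noisy training samples, and polynomial predictors of increasing complexity standing in for models of increasing flexibility.

```python
# A minimal bias-variance simulation (illustrative assumptions only: the
# sine-wave signal, noise level, and polynomial model family are not from
# the text). Each model complexity is fit to many resampled training sets;
# bias^2 and variance are then measured on a fixed test grid.
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)     # assumed "true" relationship
x_test = np.linspace(0.0, 1.0, 50)

for degree in (1, 3, 5, 9):                  # model complexity, low to high
    predictions = []
    for _ in range(200):                     # 200 independent training sets
        x_train = rng.uniform(0.0, 1.0, 30)
        y_train = true_f(x_train) + rng.normal(0.0, 0.3, 30)
        coefs = np.polyfit(x_train, y_train, degree)
        predictions.append(np.polyval(coefs, x_test))
    predictions = np.array(predictions)
    bias_sq = np.mean((predictions.mean(axis=0) - true_f(x_test)) ** 2)
    variance = np.mean(predictions.var(axis=0))
    print(f"degree {degree}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

As model complexity increases, the measured bias term shrinks while the variance term grows, mirroring the two curves that make up the Total Error line in Chart 2.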
Because of the Bias–Variance Conflict, other industries learned a long time ago that single predictive engines are structurally sub-optimal for solving complex predictive challenges. This is not conjecture, but settled science.
The solution to the Bias-Variance Conflict is a four-decades-old mathematical concept known as Ensemble Methods.
Building superior predictive engines is an activity at which virtually every industry in the world has become expert. The investment industry can leverage this broad set of insights and expertise by reframing the role of portfolio managers from “picking stocks that are predicted to outperform” to “building security-based predictive forecasts.” The activity is the same, but the latter perspective opens the door to capturing critical lessons from other industries and passing those insights along to the world of investing.
This is where creating a multi-expert system through Ensemble Methods’ tools and techniques changes the dynamic. One of the more digestible concepts is ‘bias diversification’. Ensemble Methods actively link together multiple independent predictors, each with its own set of intentional and unintentional biases. This embedded diversification allows the multiple biases to offset and partially neutralize one another, creating a new solution with smaller bias, and without a reciprocal increase in variance.
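A minimal sketch of this bias-diversification effect, using purely hypothetical numbers (the five per-expert biases and the noise level below are assumptions for illustration, not figures from the text):

```python
# 'Bias diversification' sketch: five independent predictors estimate the
# same quantity, each with its own fixed bias and independent noise; the
# Ensemble simply averages them. (All numbers below are hypothetical.)
import numpy as np

rng = np.random.default_rng(1)
truth = 100.0
biases = np.array([-4.0, -1.5, 0.5, 2.0, 3.5])   # assumed per-expert biases
noise_sd = 3.0
n_trials = 100_000

# predictions[i, j] = expert j's prediction in trial i
predictions = truth + biases + rng.normal(0.0, noise_sd, (n_trials, len(biases)))
ensemble = predictions.mean(axis=1)

print("individual |bias|:  ", np.abs(predictions.mean(axis=0) - truth).round(2))
print("individual variance:", predictions.var(axis=0).round(2))
print(f"ensemble |bias|:     {abs(ensemble.mean() - truth):.2f}")
print(f"ensemble variance:   {ensemble.var():.2f}")
```

Because the assumed biases differ in sign, they largely cancel in the average; and because the experts’ noise is independent, the average also has lower variance than any individual expert, consistent with the ‘smaller bias without a reciprocal increase in variance’ point above.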
2 Additional sources of information – Bias-Variance Conflict:
Kohavi, Ron; Wolpert, David H. (1996). "Bias Plus Variance Decomposition for Zero-One Loss Functions". Semantic Scholar.
Luxburg, Ulrike V.; Schölkopf, B. (2011). "Statistical learning theory: Models, Concepts, and Results". Handbook of the History of Logic. 10: Section 2.4. ScienceDirect.
Vapnik, Vladimir (2000). “The Nature of Statistical Learning Theory”. New York: Springer-Verlag. doi:10.1007/978-1-4757-3264-1. ISBN 978-1-4757-3264-1. S2CID 7138354. Semantic Scholar.
Vijayakumar, Sethu (2007). "The Bias–Variance Tradeoff" (PDF). University of Edinburgh. Retrieved 19 August 2014.
Gagliardi, Francesco (May 2011). "Instance-based classifiers applied to medical databases: Diagnosis and knowledge extraction". Artificial Intelligence in Medicine. 52 (3): 123–139. doi:10.1016/j.artmed.2011.04.002. PMID 21621400.
Fortmann-Roe, Scott (2012). “Understanding the Bias-Variance Tradeoff”.
Brain, Damian; Webb, Geoffrey (2002). “The Need for Low Bias Algorithms in Classification Learning from Large Data Sets”. Proceedings of the Sixth European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2002). SpringerLink
High Conviction Stock Selections
The research regarding manager skill is substantial. While most studies indicate that active managers underperform in aggregate, a powerful body of research indicates that managers’ highest-conviction stock selections relative to the benchmark can generate persistent alpha. In addition to the information provided on the home page, the following documents provide additional supporting information.
3 Additional sources of information – High Conviction Stock Selections:
M. Anton, R. Cohen, C. Polk (2021), “Best Ideas,” SSRN id 1364827.
M. Cremers, A. Petajisto (2006), “How Active Is Your Fund Manager? A New Measure That Predicts Performance,” SSRN id 891719.
A. Panchekha (2019), “The Active Manager Paradox: High Conviction Overweight Positions,” CFA Institute.
S. Engstrom (2004), “Does Active Portfolio Management Create Value? An Evaluation of Fund Managers’ Decisions,” SSE/EFI Working Paper Series in Economics and Finance no. 553.
Alexander, Gordon, Gjergji Cici, and Scott Gibson, 2006, Does Motivation Matter When Assessing Trade Performance? An Analysis of Mutual Funds.
Berk, Jonathan and Jules van Binsbergen, 2014, Measuring Skill in the Mutual Fund Industry.
Carhart, Mark, 1997, On Persistence in Mutual Fund Performance, The Journal of Finance.
Chen, Hsiu-Lang, Narasimhan Jegadeesh, and Russ Wermers, 2000, The Value of Active Mutual Fund Management, Journal of Financial and Quantitative Analysis 35, 343–368.
Pomorski, Lukasz, 2009, Acting on the Most Valuable Information: ‘Best Idea’ Trades of Mutual Fund Managers, University of Toronto. SSRN id 1366067.