
Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces

  • Published: July 2002
  • Volume 48, pages 189–218 (2002)
  • Gunnar Rätsch1,
  • Ayhan Demiriz2 &
  • Kristin P. Bennett3 

Abstract

We examine methods for constructing regression ensembles based on a linear program (LP). The ensemble regression function consists of linear combinations of base hypotheses generated by some boosting-type base learning algorithm. Unlike the classification case, for regression the set of possible hypotheses producible by the base learning algorithm may be infinite. We explicitly tackle the issue of how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program that has an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the primal and dual spaces. Most importantly, we prove there exists an optimal solution to the infinite hypothesis space problem consisting of a finite number of hypotheses. We propose two algorithms for solving the infinite and finite hypothesis problems. One uses a column generation simplex-type algorithm and the other adopts an exponential barrier approach. Furthermore, we give sufficient conditions for the base learning algorithm and the hypothesis set to be used for infinite regression ensembles. Computational results show that these methods are extremely promising.
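To make the column-generation idea in the abstract concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: it restricts the base hypotheses to decision stumps, solves a restricted master LP with an ε-insensitive loss via scipy.optimize.linprog, and hands the LP duals to the base learner as example weights. The stump base learner, the parameter values (C, ε, hypothesis budget), and all function names are illustrative assumptions.

```python
# Minimal column-generation sketch for LP-based regression ensembles.
# Not the authors' code: the stump base learner, the eps-insensitive LP,
# and all parameter values (C, eps, max_hypotheses) are illustrative.
import numpy as np
from scipy.optimize import linprog


def stump_outputs(x, threshold, sign):
    """Decision stump h(x) = +sign if x > threshold, else -sign."""
    return np.where(x > threshold, sign, -sign)


def best_stump(x, weights):
    """Base learner: return the stump maximising the 'edge' sum_i w_i h(x_i)."""
    best, best_edge = None, -np.inf
    for threshold in np.unique(x):
        for sign in (1.0, -1.0):
            edge = float(weights @ stump_outputs(x, threshold, sign))
            if edge > best_edge:
                best, best_edge = (threshold, sign), edge
    return best, best_edge


def solve_restricted_lp(H, y, C=10.0, eps=0.05):
    """Restricted master problem with eps-insensitive loss:
         min  sum_t a_t + C * sum_i (xi_i + xi*_i)
         s.t. (H a)_i - y_i <= eps + xi_i,  y_i - (H a)_i <= eps + xi*_i,
              a, xi, xi* >= 0.
       Returns hypothesis weights a and dual example weights (u* - u)."""
    n, T = H.shape
    c = np.concatenate([np.ones(T), C * np.ones(2 * n)])
    A_ub = np.block([[H, -np.eye(n), np.zeros((n, n))],
                     [-H, np.zeros((n, n)), -np.eye(n)]])
    b_ub = np.concatenate([y + eps, -y + eps])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    duals = -res.ineqlin.marginals          # nonnegative duals of the <= rows
    return res.x[:T], duals[n:] - duals[:n]


def lp_regression_ensemble(x, y, max_hypotheses=50, tol=1e-6):
    """Column generation: add the hypothesis whose dual constraint is most
    violated (reduced cost < 0, i.e. edge > 1), then re-solve the LP."""
    hypotheses, columns = [], []
    weights, a = y.astype(float), np.array([])  # heuristic start: fit y itself
    for _ in range(max_hypotheses):
        (threshold, sign), edge = best_stump(x, weights)
        if hypotheses and edge <= 1.0 + tol:    # no violated dual constraint
            break
        hypotheses.append((threshold, sign))
        columns.append(stump_outputs(x, threshold, sign))
        a, weights = solve_restricted_lp(np.column_stack(columns), y)
    return hypotheses, a


def predict(x, hypotheses, a):
    return sum(a_t * stump_outputs(x, t, s) for a_t, (t, s) in zip(a, hypotheses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(-1.0, 1.0, 80))
    y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(80)
    hyps, a = lp_regression_ensemble(x, y)
    f_hat = predict(x, hyps, a)
    print(f"training RMSE: {np.sqrt(np.mean((f_hat - y) ** 2)):.3f}, "
          f"nonzero weights: {np.count_nonzero(a > 1e-8)} of {len(a)}")
```

In this sketch the edge test plays the role of the dual (reduced-cost) check: as long as some base hypothesis violates its dual constraint (edge greater than 1), it is added as a new column and the restricted LP is re-solved, which mirrors the finite-hypothesis optimal-solution result stated in the abstract.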




Author information

Authors and Affiliations

  1. GMD FIRST, Kekuléstr. 7, 12489, Berlin, Germany

    Gunnar Rätsch

  2. Department of Decision Sciences and Eng. Systems, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA

    Ayhan Demiriz

  3. Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA

    Kristin P. Bennett


About this article

Cite this article

Rätsch, G., Demiriz, A. & Bennett, K.P. Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces. Machine Learning 48, 189–218 (2002). https://doi.org/10.1023/A:1013907905629


  • Issue date: July 2002

  • DOI: https://doi.org/10.1023/A:1013907905629

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • ensemble learning
  • boosting
  • regression
  • sparseness
  • semi-infinite programming
