
Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces

  • Published: July 2002
  • Volume 48, pages 189–218 (2002)
  • Gunnar Rätsch1,
  • Ayhan Demiriz2 &
  • Kristin P. Bennett3 

Abstract

We examine methods for constructing regression ensembles based on a linear program (LP). The ensemble regression function consists of linear combinations of base hypotheses generated by some boosting-type base learning algorithm. Unlike the classification case, for regression the set of possible hypotheses producible by the base learning algorithm may be infinite. We explicitly tackle the issue of how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program that has an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the primal and dual spaces. Most importantly, we prove there exists an optimal solution to the infinite hypothesis space problem consisting of a finite number of hypotheses. We propose two algorithms for solving the infinite and finite hypothesis problems. One uses a column generation simplex-type algorithm and the other adopts an exponential barrier approach. Furthermore, we give sufficient conditions for the base learning algorithm and the hypothesis set to be used for infinite regression ensembles. Computational results show that these methods are extremely promising.
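To make the column-generation idea in the abstract concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: it restricts the base hypotheses to decision stumps, solves a restricted master LP with an ε-insensitive loss via scipy.optimize.linprog, and hands the LP duals to the base learner as example weights. The stump base learner, the parameter values (C, ε, hypothesis budget), and all function names are illustrative assumptions.

```python
# Minimal column-generation sketch for LP-based regression ensembles.
# Not the authors' code: the stump base learner, the eps-insensitive LP,
# and all parameter values (C, eps, max_hypotheses) are illustrative.
import numpy as np
from scipy.optimize import linprog


def stump_outputs(x, threshold, sign):
    """Decision stump h(x) = +sign if x > threshold, else -sign."""
    return np.where(x > threshold, sign, -sign)


def best_stump(x, weights):
    """Base learner: return the stump maximising the 'edge' sum_i w_i h(x_i)."""
    best, best_edge = None, -np.inf
    for threshold in np.unique(x):
        for sign in (1.0, -1.0):
            edge = float(weights @ stump_outputs(x, threshold, sign))
            if edge > best_edge:
                best, best_edge = (threshold, sign), edge
    return best, best_edge


def solve_restricted_lp(H, y, C=10.0, eps=0.05):
    """Restricted master problem with eps-insensitive loss:
         min  sum_t a_t + C * sum_i (xi_i + xi*_i)
         s.t. (H a)_i - y_i <= eps + xi_i,  y_i - (H a)_i <= eps + xi*_i,
              a, xi, xi* >= 0.
       Returns hypothesis weights a and dual example weights (u* - u)."""
    n, T = H.shape
    c = np.concatenate([np.ones(T), C * np.ones(2 * n)])
    A_ub = np.block([[H, -np.eye(n), np.zeros((n, n))],
                     [-H, np.zeros((n, n)), -np.eye(n)]])
    b_ub = np.concatenate([y + eps, -y + eps])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    duals = -res.ineqlin.marginals          # nonnegative duals of the <= rows
    return res.x[:T], duals[n:] - duals[:n]


def lp_regression_ensemble(x, y, max_hypotheses=50, tol=1e-6):
    """Column generation: add the hypothesis whose dual constraint is most
    violated (reduced cost < 0, i.e. edge > 1), then re-solve the LP."""
    hypotheses, columns = [], []
    weights, a = y.astype(float), np.array([])  # heuristic start: fit y itself
    for _ in range(max_hypotheses):
        (threshold, sign), edge = best_stump(x, weights)
        if hypotheses and edge <= 1.0 + tol:    # no violated dual constraint
            break
        hypotheses.append((threshold, sign))
        columns.append(stump_outputs(x, threshold, sign))
        a, weights = solve_restricted_lp(np.column_stack(columns), y)
    return hypotheses, a


def predict(x, hypotheses, a):
    return sum(a_t * stump_outputs(x, t, s) for a_t, (t, s) in zip(a, hypotheses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(-1.0, 1.0, 80))
    y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(80)
    hyps, a = lp_regression_ensemble(x, y)
    f_hat = predict(x, hyps, a)
    print(f"training RMSE: {np.sqrt(np.mean((f_hat - y) ** 2)):.3f}, "
          f"nonzero weights: {np.count_nonzero(a > 1e-8)} of {len(a)}")
```

In this sketch the edge test plays the role of the dual (reduced-cost) check: as long as some base hypothesis violates its dual constraint (edge greater than 1), it is added as a new column and the restricted LP is re-solved, which mirrors the finite-hypothesis optimal-solution result stated in the abstract.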




Author information

Authors and Affiliations

  1. GMD FIRST, Kekuléstr. 7, 12489, Berlin, Germany

    Gunnar Rätsch

  2. Department of Decision Sciences and Eng. Systems, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA

    Ayhan Demiriz

  3. Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA

    Kristin P. Bennett


About this article

Cite this article

Rätsch, G., Demiriz, A. & Bennett, K.P. Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces. Machine Learning 48, 189–218 (2002). https://doi.org/10.1023/A:1013907905629


  • Issue date: July 2002

  • DOI: https://doi.org/10.1023/A:1013907905629

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • ensemble learning
  • boosting
  • regression
  • sparseness
  • semi-infinite programming
