The Art and Science of Choosing the Right Data Science Model

Ariel K
Sep 5, 2023

Selecting the most appropriate machine learning or statistical model for a given data science challenge is far from straightforward. Below, two experienced data scientists outline some of the complexities involved in model selection:

Overview First

"The biggest mistakes happen when people dive into modeling without investing time upfront in understanding the business problem, available data, and metrics for success," says Angela Moss, principal data scientist at InsightForecast.

"You can end up blindly testing every model under the sun without a clear direction. Grounding yourself in the goals, variables, and data realities steers you toward the most promising models from the start."

Mapping Methods to Problems

With so many sophisticated algorithms now available, matching specific models to the problem at hand takes practice.

"Learning when logistic regression is superior to neural networks versus when support vector machines are optimal is a nuanced skill," notes David Chang, analytics lead at FIG Insights.

"It requires a level of discernment to align the strengths and limitations of various techniques to the problem context. There are very few one-size-fits-all answers in model selection."

Avoiding Overcomplication

The temptation to unnecessarily complicate model choices is real, adds Moss:

"Often a well-tuned random forest will outperform deep neural networks for tabular data. Just because a method is new or bleeding-edge doesn't mean it's the best fit. I always ask whether a simpler model can reach the needed accuracy."
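Moss's question, whether a simpler model can reach the needed accuracy, can be written down as a small selection rule. The sketch below is only an illustration: the `Candidate` class, the `simplest_adequate` helper, the complexity ranks, and the accuracy figures are all hypothetical placeholders invented for this example, not measurements from any real project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    complexity: int   # hypothetical rank, 1 = simplest
    accuracy: float   # held-out accuracy (placeholder values below)


def simplest_adequate(candidates: List[Candidate], target: float) -> Optional[Candidate]:
    """Return the least complex candidate whose accuracy meets the target.

    Ties on complexity are broken in favour of the higher accuracy.
    Returns None when no candidate reaches the target.
    """
    adequate = [c for c in candidates if c.accuracy >= target]
    if not adequate:
        return None
    return min(adequate, key=lambda c: (c.complexity, -c.accuracy))


# Placeholder scores, invented purely to show how the rule behaves.
candidates = [
    Candidate("logistic regression", complexity=1, accuracy=0.86),
    Candidate("random forest", complexity=2, accuracy=0.91),
    Candidate("deep neural network", complexity=3, accuracy=0.92),
]

choice = simplest_adequate(candidates, target=0.90)
print(choice.name if choice else "no candidate meets the target")
```

With these made-up numbers and a target of 0.90, the rule picks the random forest: the neural network scores slightly higher, but the extra complexity buys nothing the project needs.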

Trusting Your Intuition

While analytic rigor is mandatory, Chang believes room remains for intuition:

"You develop a sixth sense from experience - a gut feel for which models naturally lend themselves to certain data traits and response variables. Honing this intuitive judgment accelerates the path to an optimal model."

Evaluating Multiple Approaches

Ultimately, identifying the right model requires experimentation. "We take a 'bake-off' approach, assessing a shortlist of 3-4 promising models side-by-side on an evaluation set," says Moss.

"Seeing actual performance on your data overrides assumptions. But you must invest time upfront narrowing the options through analysis before baking off models."
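A bake-off of the kind Moss describes might look roughly like the sketch below. It uses only the Python standard library so it runs anywhere. The synthetic data, the three toy models, and every function name (`make_data`, `fit_majority`, `fit_threshold`, `fit_knn`, `bake_off`) are assumptions made for illustration, not the setup used at InsightForecast.

```python
import random


def make_data(n, rng):
    """Synthetic binary data: label is 1 when x0 + x1 > 1, with 10% label noise."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1.0)
        if rng.random() < 0.1:
            y = 1 - y
        rows.append((x, y))
    return rows


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    label = int(sum(y for _, y in train) * 2 >= len(train))
    return lambda x: label


def fit_threshold(train):
    """One-feature rule: threshold x0 at the cut that best splits the training set."""
    best_t, best_acc = 0.5, -1.0
    for t in (i / 20 for i in range(1, 20)):
        acc = sum(int(x[0] > t) == y for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x[0] > best_t)


def fit_knn(train, k=5):
    """k-nearest neighbours: squared Euclidean distance, majority vote."""
    def predict(x):
        nearest = sorted(
            train,
            key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2,
        )[:k]
        return int(sum(y for _, y in nearest) * 2 > k)
    return predict


def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def bake_off(candidates, train, evaluation):
    """Fit every candidate on the same training set, score on the same evaluation set."""
    results = {name: accuracy(fit(train), evaluation) for name, fit in candidates.items()}
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


rng = random.Random(0)  # fixed seed so the run is repeatable
train, evaluation = make_data(300, rng), make_data(200, rng)

shortlist = {
    "majority baseline": fit_majority,
    "single-feature threshold": fit_threshold,
    "5-nearest neighbours": fit_knn,
}

leaderboard = bake_off(shortlist, train, evaluation)
for name, score in leaderboard:
    print(f"{name}: {score:.3f}")
```

The point is the shape of the comparison rather than these particular models: every candidate sees the same training rows and is scored on the same held-out rows, so the leaderboard reflects the data and not differences in setup. In practice the toy models would be replaced by the real shortlist, typically built with a library such as scikit-learn.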

In the end, choosing the right data science model mirrors data science as a whole

Choosing the right data science model requires blending art, science, experience and business understanding. While complex, model selection is a learnable skill. As Chang summarizes, "the more models you train, the better calibration your model selection muscle develops."
