The relationship between Machine Learning and Operations Research

May. 2018

1. What’s the differences between operations research and machine learning?

My leader asked me this question during the internship interview. My answer is: 'machine learning focuses on mining potential distributions from data to respond to unknown areas, and the key is fitting. However, operations research focuses on the abstraction of the problem and establishing a mathematical model that can accurately describe the problem, then applies relevant algorithms to obtain the optimal solutions. My leader then asked: 'machine learning also needs to establish a mathematical model, why it can only get the approximate solution with errors? Why operations research can obtain the optimal solution? What causes this difference?'
I was not able to answer as I just came into contact with machine learning, more precisely, the algorithm, at that time. I only knew that the classical machine learning models include logistic regression, SVM and decision tree. The optimization algorithm has gradient descent (logistic regression), maximum information gain (tree model), and BP (neural network). In operations research, the classical models include TSP, Knapsack, Bin Packing and Scheduling. The algorithms include heuristic, meta-heuristic, hyper-heuristic, exact algorithm, etc.I was confused by the relationship between the two subjects at the beginning. As I learned more, I think they are similar in some sense. Definition: the key difference between machine learning and operations research is the strength of prior knowledge. Based on the gap, the two have their own methodologies.The typical problem-solving method is building mathematical model (abstracting problems) -> getting data -> designing algorithms (solving problems) -> analyzing results. I am going to explain their relationship in the model and algorithm levels.
1.1 Model level

The real problems can be divided into two categories: discrete optimization and continuous optimization. Both are contained in OR, mostly about the first one with numerous complex constraints. Machine learning prefers the second one, it usually expects to fit a non-constraint distribution.

Machine learning has to use data-driven operations, as its fuzzy understanding about the problem. But in OR, the solution space is limited by several mathematical formulas defined by the problem, which means that it has prior knowledge.They are both focusing on np-hard problems, such as TSP and Bin Packing in operations research, and KNN, Decision Tree Model in machine learning.

Example 1. Assume that we want to predict the price of a house to make a buy-or-not decision. Strategy 1: The approach of traditional machine learning is first collecting data and extract features, such as house area, average area of the surroundings, the balcony orientation, then building a prediction model, whose objective is getting minimal loss according to the training data, for the house price. Finally, the house price could be predicted. Strategy 2: For example, as a buyer, we expect to buy a house at the lowest price. The constraints are 1) Total Price = A*bedroom_area + B*non_bedroom_area, 2) Total Price <= average_price_in_the_neighborhood*1.2*house_area. It is easier to verify the first strategy as it seeks for the global optimum. The second strategy is better when we are able to define the problem very well.

Example 2. The bin packing problem is a classic problems in OR. One problem is to use the minimum number of bins to load all the items with various sizes. This problem looks easy as the corresponding mathematical model and solving algorithm are mature. Considering the inter-structure of the bins and the size of the items, our goal is using as fewer bins as possible using basic model, set cover model, etc. If the size of the bins and items is unknown, we first need history bin packing data, for example, how many item1, item2, item3, …, itemX were packed in N boxes. Then a binary classification model could be built with item and bin features. To obtain the minimum number of required bins for the items, we first convert items to features, which are the input of the model. The feature Number of Required Bins starts from 0 and increased by 1 until the output is 1, which is the minimum number bins needed for these items is obtained.The examples show that when it comes to abstracting real problems, since operations research has more prior knowledge, such as size of items and bins, its solution are more convincing.
1.2. Algorithm level
There are various algorithms in machine learning and OR. In OR, dynamic programming, Branch and Bound, cut-plane are exact algorithms. Tabu Search, GA, Simulated Annealing are approximate algorithms. When it comes to machine learning, there are gradient descent, SMO, information gain (ID3), Lloyd Algorithm (KNN), etc.There are two categories of algorithms: exact algorithm and approximation algorithm. Both machine learning and operations research have these two algorithms.Supplementary: It’s necessary to combine the background when distinguishing various algorithms. For example, the gradient descent belongs to the exact algorithm when the solution space is convex. The non-convex problem belongs to approximate algorithm.
Exact algorithm is the smart Brute-Force search, which guides the proper search direction. The search would be terminated once it enters bad searching space.Exact algorithm is not popular in machine learning. The reason is that under limited computing resources, it’s difficult to obtain global optimum for large-scale problem. Therefore, there are lots of researches about how to improve the approximate algorithm to produce better solutions. The common idea is to increase the randomness of the algorithm and do trade-off between convergence ability and randomness. Randomness can help the search to get out of the local optimum, but too much randomness would deteriorate algorithms to random search.Exact algorithms often make sense in the problems with small or medium size. It is favored by some researchers because the algorithm not only being convenient to do parallelization but also give the upper bound or lower bound of the problem.For large-scale problems, approximate algorithms (Hill Climbing, etc.) are preferred. To avoid trapping into local optimum, the algorithms can increase the randomness, such as the tabu strategy in Tabu Search algorithm, the mutation in GA, etc.
Therefore, the two subjects have the same core in designing approximate algorithm that is balancing the tradeoff between convergence ability and randomness.Supplement: The balance also exists in exact algorithm, such as branching strategy in Branch and Bound.
1.3. Summary

Q1: What’s the main difference between machine learning and operations research?
A1: The strength of prior knowledge.

Q2: Why does machine learning always have error, but OR can get the exact solution?
A2: First, we should know that error does not mean inaccurate solution. Besides, the solution from operations research also has errors. In machine learning, it is difficult to get the accurate solution when the data volume is large. Even we are able to obtain the accurate solution, error still exists because the gap between real situations and the models, which is caused by the lack of prior knowledge. Operations research model has constraints, which limit the solution space. However, the solution might not be the one can really solve the problems. On the other hand, there might be errors in operations research solutions when facing large-scale problems with limited computer resources.

The complexity of machine learning comes from the large training data. In OR, it comes from various constraints and complex objective. They are similar in algorithm designing.In a broad sense, machine learning is an application of operations research. And more broadly broader sense, they both belong to math.
2. Introduction to global optimum and approximate optimum

2.1. Global optimum

It’s difficult to obtain the global optimal solution for the real problems.There are some doubts for exact algorithm, such as:
a. Whether the abstract mathematical model can reflect the reality.
b. Whether we can obtain the global optimal solution of the problem under limited computing resources.

Regarding the first doubt, professor Ke Tang said:Using mathematical programming to get the exact solution of approximate problems. Using evolutionary algorithm to get the approximate solution of exact problems.Professor Tang gave an example of binary classification problem in machine learning. He said:’It is easy to know that the original problem’s loss function is the step function (true or false). We have to generate an approximate problem that replacing step loss with sigmoid, as the step loss function is indifferentiable, which make the model can be solved by gradient descent.’Not only the objectives, but also the constraints are difficult to be defined as mathematical formulas, such as A >= B xor (C and D).For b, algorithm has difficulties to converge when the model is very complex with plenty of decision variables and constraints. So some general solvers will provide API for terminating search anytime such as Time Limit and Max Gap. However, the obtained solution isn’t global optimum
2.2. Approximate optimum

Approximate algorithm: A striver who struggles but lacks a soul.
As mentioned above, the core of approximate algorithm is the trade-off between convergence ability and randomness. It has higher recognition than the exact algorithm in industry, managers care more about the improvement rather than global optimum.
3. Examples about the relationship

For some parts of algorithms that requiring lots of computer resources, it is better to train an evaluation function using learning method to approximate it.

3.1. Exact algorithm

a. For the searching order in Branch and Bound, a good strategy can speed up the process of convergence and pruning. In learning to search in branch and bound Algorithms, the author trains an adaptive strategy to replace the traditional strategy, such as BFS and DFS.
b. Branching strategy is one of the most important components in Branch and Bound, whose key objective is making solution space on both branching sides uniform. For the reason that good branching strategy often requires a lot of computing resources, some research propose learning an evaluation function of branching strategy such as A machine learning -based approximation of strong branching, Learning to branch in mixed integer programming and Towards Learning Integral Strategy of Branch and Bound.
c. Based on the Decomposition (Dantzig-Wolfe), such as Learning where to use a Decomposition.
d. Based on Column Generation, some studies train models to simulate pricing problem and modify it.
e. Using Reinforcement Learning to solve TSP or Dynamic Job Shop Problem, such as Multi-objective scheduling of dynamic job shop using variable neighborhood search.

3.2. In heuristic algorithm

a. In Small Boxes Big Data: A Deep Learning Approach to Optimize Variables Sized Bin Packing, the author trains a classification model by using neural network to find best heuristic algorithm(in 8-existing heuristic algorithms) to solve the given benchmark.