Author: Ed Orlando
Introduction & Problem Statement
We have all been there. You are at the end of the school semester and the only thing left to do is take that dreaded final exam. If you were like me, you would calculate the minimum exam score needed to receive a certain grade. For example, you might be able to score a 70% on your History exam and still get a B in the class. This is valuable information since you might decide to spend more time studying for other exams if you think a 70% is easily achievable.
In this article, we will solve this very straightforward optimization problem programmatically using R’s ROI package. In the first example, we will find the minimum grade needed on the final exam in order to receive a B in our History class.
In the second example, we will find the minimum final exam grade needed to receive a B (80%) for all four of our classes: History, Science, Math, & English. The goal of this article is to show you how to solve similarly structured optimization problems with various data sets all in one function.
Linear v Non-Linear Solvers
Although the problems presented in this article are technically linear optimization problems, we are going to solve them using customized functions and non-linear solvers. The benefit of this approach is that it offers the most flexibility to optimization problems one faces in the wild.
Linear solver solutions are fairly easy to set up and understand. Linear solvers also find the global optimal solution without the risk of finding only local optimal solutions. However, non-linear problem set ups and solutions allow for conditionals (if-then statements) that are not allowed in linear solver solutions.
Conditional statements are very common in many business-type optimization problems. Therefore, the solutions presented below concentrate solely on setting up customized functions and non-linear solver solutions that offer the most flexibility.
R Versus Excel
Excel’s solver works great on single data sets and would work great to solve the minimum final exam grade needed to achieve a B in History. In fact, you could hypothetically set up 4 different solvers on 4 different tabs for all four courses and solve each of the independently fairly quickly as well.
However, consider these questions.
- How efficient is Excel if we had to solve for 100 different data sets?
- How do we update our tabs in Excel if our data changes? For example, what if another assignment is graded?
- Why does Excel return a local solution that is 20% less optimal compared to R?
Excel may get bogged down and become very inefficient if you had to run 100 solvers independently on 100 tabs - even if you were able to program it using VBA. It is also hard to have new data flow into 100 different tabs efficiently. Lastly, Excel’s non-linear solutions are sometimes 10-20% less optimal compared to more mature solvers.
However, there are a couple benefits of using Excel alongside R. For starters, the initial setup of a single solver can be performed much quicker in Excel. In addition, it is much easier to understand the problem visually in Excel. For example, you can color cells, highlight summaries, add labels, and much more.
To summarize, the initial setup for optimization problems should be performed in Excel. However, if you have new data continuously flowing in, or you are solving for multiple data sets, then you should transition your solver into R.
ROI Package versus OMPR Package
There is an ompr package in R that allows for tidy linear programming and is simple to use. However, at this time, the ompr package does not support non-linear functions.
Although the ROI package has a steeper learning curve, once you get comfortable with it, you can set up non-linear functions very easily. The ROI package also provides a great deal of flexibility for most business related optimization problems.
As stated earlier, the problems we are solving in the article are technically linear in nature and we do run the risk of possibly not finding the global optimal. Nevertheless, as mentioned earlier, we will solve the customized functions with non-linear methods to highlight the benefits of this approach.
Gratitude & References
If you want more detailed information about the topics discussed in this article, please check out Matt Dancho’s Learning Labs Pro: 15 & 16. If you are not a Learning Lab Pro member and you love R or data science, you should definitely subscribe. As a disclaimer, I have no affiliation with BSU, I am just an avid student and a huge fan.
Load Libraries
Before we begin the analysis, let’s first load in a few libraries - including the ROI.plugin.alabama package.
# Solver Backend
library(ROI)
library(ROI.plugin.alabama)
# Core
library(tidyverse)
Part 1 | Section 1: Calculate Minimum Final Exam Score for History Class
Remember, our goal is to find the lowest possible final exam score that will still allow us to get a B. To date, our overall class grade sits at 90% (45 pnts out of 50). The weight of the final exam is 50%. Therefore, we can score as low as a 70% on the final exam and still achieve a B in our overall class grade. See summary of the math below.
(90% assignment total score x 0.50 wt) + (70% final exam score x 0.50 wt) = 80% overall grade ~ B
The current grades and summary stats are below. We will use this raw data to set up our functions later on.
- Assignment #1: 12 pnts out of 15 ~ 80%
- Assignment #2: 20 pnts out of 20 ~ 100%
- Assignment #3: 13 pnts out of 15 ~ 87%
Part 1 | Section 2: Set Up History Scores in a Tibble
We will set up a data tibble for our current History course scores for the first 3 assignments. Remember, the total points earned was a total of 45 out of 50 points.
history_current_grade_tbl <- tibble(
Subject = rep(c("History"), times = c(3)),
Assignment = c("Assign_01", "Assign_02", "Assign_03"),
Pnts_Earned = c(12.00, 20.00, 13.00),
Pnts_Possible = c(15.00, 20.00, 15.00))
history_current_grade_tbl
## # A tibble: 3 x 4
## Subject Assignment Pnts_Earned Pnts_Possible
## <chr> <chr> <dbl> <dbl>
## 1 History Assign_01 12 15
## 2 History Assign_02 20 20
## 3 History Assign_03 13 15
We also know that the final exam is worth 50 points. We will set up a variable below and use it later in our customized function.
Pnts_Possible_Final_Exam <- 50
Part 1 | Section 3: Set Up Function to Calculate Points Needed
Next, we will set up a customized function that will allow us to pass through an anonymous value for the points earned on the Final Exam. Once a value is plugged into the function, it will calculate the final grade. Remember, the final exam score needed to achieve our goal is “technically unknown” to this function, but we will test it out manually before plugging it into the non-linear solver.
calc_Final_Exam_Points <- function(Pnts_Earned_Final_Exam) {
history_current_grade_tbl %>%
# Sum up current points earned and sum up all current possible points
summarise(Pnts_Earned_sum = sum(Pnts_Earned),
Pnts_Possible_sum = sum(Pnts_Possible)) %>%
# Add Unknown Final Exam Pnts Earned to Current Pnts Earned
mutate(Total_Pnts_Earned = Pnts_Earned_Final_Exam + Pnts_Earned_sum) %>%
# Add Final Exam Possible Possible Pnts to Current Possible Pnts
mutate(Total_Pnts_Possible = Pnts_Possible_Final_Exam + Pnts_Possible_sum) %>%
# Calculate Total Grade for Class
mutate(Final_Class_Grade = Total_Pnts_Earned/Total_Pnts_Possible) %>%
# Pull out Final Class Grade
pull(Final_Class_Grade)
}
When we plug in 35 final exam pnts earned into the function below, the final grade equals 80%.
In other words, we would need a 70% on the final (35 pnts earned / 50 pnts possible) to get an overall score of 80%. Therefore, we know the function is working properly.
calc_Final_Exam_Points(35)
## [1] 0.8
Part 1 | Section 4: Set Up a Single Data Set Optimization Objective
We still have not technically solved this problem using a solver solution. At this point, we have only created a function that allows us to calculate our scores manually with a customized function.
Therefore, let’s assume we didn’t know the answer and we needed to solve programmatically. The number of outputs we will receive from the solver function equals one (1), so we will set our n variable = 1. We also know we want to achieve an 80% overall grade, so we can set that variable up as well.
n <- 1
final_grade <- 0.80
Next, we will create a function that is described in detail below.
- A single data set will be allowed to pass through the function. In this example, the data set is the history_current_grade_tbl which includes all of our History course scores to date.
- The objective is to calculate the Final Exam points needed to achieve a B (80% overall grade).
- The only constraint in the solver set is that the final grade must be equal to 80%.
- Since this is a minimization problem, we set maximum = FALSE
- Lastly, we can then solve the problem using the ROI_solve function and the alabama solver to return the results in a well formatted tibble using some dplyr functions.
single_data_set_optimization <- function(data = data) {
# Set up solver (think of Excel's Solver Add-in)
final_exam_grade_nlp <- OP(
objective = F_objective(F = calc_Final_Exam_Points, n = n, names = "Pnts_Earned_Final_Exam"),
constraints = rbind(
F_constraint(F = calc_Final_Exam_Points, dir = "==", rhs = final_grade)
),
maximum = FALSE
)
# Solve for optimal solution
sol <- ROI_solve(final_exam_grade_nlp, solver = "alabama", start = rep(1/n, n))
# Return the optimal solution results in a tidy tibble format
return(
bind_cols(
tibble(Final_Grade_Perc = calc_Final_Exam_Points(sol$solution)),
enframe(sol$solution) %>% spread(key = name, value = value),
tibble(Pnts_Possible_Final_Exam = Pnts_Possible_Final_Exam)) %>%
mutate(Final_Exam_Score_Needed = Pnts_Earned_Final_Exam/Pnts_Possible_Final_Exam)
)
}
Now we can pass the grade tibble into the function and view the results.
history_grade_summary <- single_data_set_optimization(history_current_grade_tbl)
glimpse(history_grade_summary)
## Rows: 1
## Columns: 4
## $ Final_Grade_Perc <dbl> 0.8
## $ Pnts_Earned_Final_Exam <dbl> 35
## $ Pnts_Possible_Final_Exam <dbl> 50
## $ Final_Exam_Score_Needed <dbl> 0.7
The formatted tibble returns the appropriate results we expected when calculating the solution manually. In order to receive a minimum of overall grade of 80%, we would need to score a 70% on the final exam. The set up of these functions will now allow us to programmatically figure out what we need to score on all of our final exams in order to get Bs in all of our classes.
Part 2 | Section 1: Calculate Minimum Final Exam Score for All Four Classes
In Part 2, we will now explore how to calculate the minimum final exam scores needed to get Bs in all four courses: History, Science, Math & English. A summary of our current grades to date are listed below.
To keep the examples simple, we assume only been 3 assignments have been given in each of the 4 classes, and all four final exams are worth 50 points.
- Current History Grade: 90% (45.0 pnts out of 50)
- Current Science Grade: 87% (43.5 pnts out of 50)
- Current Math Grade: 76% (38.0 pnts out of 50)
- Current English Grade: 68% (34.0 pnts out of 50)
Part 2 | Section 2: Set Up All Subjects’ Scores in a Tibble
We will set up a data tibble for our all current scores for the first 3 assignments for all 4 subjects.
all_subjects_current_grade_tbl <- tibble(
Subject = rep(c("History", "Science", "Math", "English"), times = c(3, 3, 3, 3)),
Assignment = rep(c("Assign_01", "Assign_02", "Assign_03"), times = c(4)),
Pnts_Earned = c(12, 20, 13, 12, 20, 11.5, 10, 18, 10, 10, 14, 10),
Pnts_Possible = c(15, 20, 15, 15, 20, 15, 15, 20, 15, 15, 20, 15))
all_subjects_current_grade_tbl
## # A tibble: 12 x 4
## Subject Assignment Pnts_Earned Pnts_Possible
## <chr> <chr> <dbl> <dbl>
## 1 History Assign_01 12 15
## 2 History Assign_02 20 20
## 3 History Assign_03 13 15
## 4 Science Assign_01 12 15
## 5 Science Assign_02 20 20
## 6 Science Assign_03 11.5 15
## 7 Math Assign_01 10 15
## 8 Math Assign_02 18 20
## 9 Math Assign_03 10 15
## 10 English Assign_01 10 15
## 11 English Assign_02 14 20
## 12 English Assign_03 10 15
Part 2 | Section 3: Set Up Nested Tibble
One incredible functionality in R is the ability to nest() data sets. Nesting gives you the ability to apply functions across multiple data sets independently. In other words, think of it as a loop that allows you to perform functions on one set of data, print some results, and repeat these steps on other data sets. To account for the various independent data sets, we set up a nested tibble which includes the same information in the tibble above.
all_subjects_current_grade_nested_tbl <- all_subjects_current_grade_tbl %>%
group_by(Subject) %>%
nest()
all_subjects_current_grade_nested_tbl
## # A tibble: 4 x 2
## # Groups: Subject [4]
## Subject data
## <chr> <list>
## 1 History <tibble [3 x 3]>
## 2 Science <tibble [3 x 3]>
## 3 Math <tibble [3 x 3]>
## 4 English <tibble [3 x 3]>
We will apply the optimization function we created in Part 1 to the nested data that will allow us to calculate an optimal score for all four final exams independently.
For more information about nested functions and mapping, please check out R for Data Science.
Part 2 | Section 4: Set Up a Multiple Data Set Optimization Objective
As we did earlier, we will set n = 1 since we are only receiving one (1) output score for each subject. We also know we want to achieve an 80% overall grade for all subjects, so we can set that variable up as well.
n <- 1
final_grade <- 0.80
The only difference in the code below is that we wrap another function around our original calc_Final_Exam_Points() function to allow for the mapping across multiple data sets.
all_subjects_nested_model <- function(nested_tbl) {
calc_Final_Exam_Points <- function(Pnts_Earned_Final_Exam) {
nested_tbl %>%
# Sum up current points earned and sum up all current possible points
summarise(Pnts_Earned_sum = sum(Pnts_Earned),
Pnts_Possible_sum = sum(Pnts_Possible)) %>%
# Add Unknown Final Exam Pnts Earned to Current Pnts Earned
mutate(Total_Pnts_Earned = Pnts_Earned_Final_Exam + Pnts_Earned_sum) %>%
# Add Final Exam Possible Possible Pnts to Current Possible Pnts
mutate(Total_Pnts_Possible = Pnts_Possible_Final_Exam + Pnts_Possible_sum) %>%
# Calculate Total Grade for Class
mutate(Final_Class_Grade = Total_Pnts_Earned/Total_Pnts_Possible) %>%
# Pull out Final Class Grade
pull(Final_Class_Grade)
}
# Set up solver (think of Excel's Solver Add-in)
final_exam_grade_nlp <- OP(
objective = F_objective(F = calc_Final_Exam_Points, n = n, names = "Pnts_Earned_Final_Exam"),
constraints = rbind(
F_constraint(F = calc_Final_Exam_Points, dir = "==", rhs = final_grade)
),
maximum = FALSE
)
# Solve for optimal solution
sol <- ROI_solve(final_exam_grade_nlp, solver = "alabama", start = rep(1/n, n))
# Return the optimal solution results in a tidy tibble format
return(
bind_cols(
tibble(Final_Grade_Perc = calc_Final_Exam_Points(sol$solution)),
enframe(sol$solution) %>% spread(key = name, value = value),
tibble(Pnts_Possible_Final_Exam = Pnts_Possible_Final_Exam)) %>%
mutate(Final_Exam_Score_Needed = Pnts_Earned_Final_Exam/Pnts_Possible_Final_Exam)
)
}
Now all we have to do is map the all_subjects_nested_model() function to the nested data set and unnest() the results. The final results are listed below in a nice tidy format.
receive_all_Bs_summary_tbl <- all_subjects_current_grade_nested_tbl %>%
mutate(model = map(data, all_subjects_nested_model)) %>%
select(Subject, model) %>%
unnest()
Let’s take a look at the final results for all 4 classes.
glimpse(receive_all_Bs_summary_tbl)
## Rows: 4
## Columns: 5
## Groups: Subject [4]
## $ Subject <chr> "History", "Science", "Math", "English"
## $ Final_Grade_Perc <dbl> 0.8, 0.8, 0.8, 0.8
## $ Pnts_Earned_Final_Exam <dbl> 35.0, 36.5, 42.0, 46.0
## $ Pnts_Possible_Final_Exam <dbl> 50, 50, 50, 50
## $ Final_Exam_Score_Needed <dbl> 0.70, 0.73, 0.84, 0.92
In summary, we would need to get the following minimum final exam scores in order to get all Bs on our report card for all four subjects.
- History Final Exam Score ~ 70%
- Science Final Exam Score ~ 73%
- Math Final Exam Score ~ 84%
- English Final Exam Score ~ 92%
Part 2 | Section 5: Calculate Score Needed to Get All As
Since we took the time and effort to set up dynamic functions and variables, we can also see if it is feasible to get a 90% or an A in every course by changing the final_grade variable to 0.90. And then all we need to do is re-run the same function and print the results.
final_grade <- 0.90
receive_all_As_summary_tbl <- all_subjects_current_grade_nested_tbl %>%
mutate(model = map(data, all_subjects_nested_model)) %>%
select(Subject, model) %>%
unnest()
glimpse(receive_all_As_summary_tbl)
## Rows: 4
## Columns: 5
## Groups: Subject [4]
## $ Subject <chr> "History", "Science", "Math", "English"
## $ Final_Grade_Perc <dbl> 0.9, 0.9, 0.9, 0.9
## $ Pnts_Earned_Final_Exam <dbl> 45.0, 46.5, 52.0, 56.0
## $ Pnts_Possible_Final_Exam <dbl> 50, 50, 50, 50
## $ Final_Exam_Score_Needed <dbl> 0.90, 0.93, 1.04, 1.12
We would need to get the following minimum final exam scores in order to get all As on our report card for all four subjects.
- History Final Exam Score ~ 90%
- Science Final Exam Score ~ 93%
- Math Final Exam Score ~ 104%
- English Final Exam Score ~ 112%
Unfortunately, unless extra credit was given on the Math and English exams, we could not get an A in either class, since we would have to score above 100% for each final exam. However, As in History and Science are achievable if we scored a 90% and a 93% on those finals, respectively.
Summary and Final Thoughts
The examples presented are not especially difficult to solve manually or by using Excel’s solver. Both problems presented are technically linear and very simple in nature. Linear problems are often easier to understand compared to non-linear.
However, I personally believe in this non-linear and functional setup approach since non-linear problems are more likely presented in the wild. Non-linear problems allow for conditions (if-then statements) to be set up in the functions that also adds another layer of flexibility to the optimization solutions.
Although the functions and the setup may seem like a lot to absorb, I truly believe the power comes from the ability to pass multiple data sets through one function multiple times to get multiple solutions. I hope you found some value in this article.
For questions related to this analysis, please message me on LinkedIn.
For access to more of my articles, please check out my blog.