Biochemistry is an eminently quantitative science. Studying the relationship between variables, and determining the parameters that govern that relationship, is part of the work of our discipline. Generally, the variables of interest are not linearly related to each other. Although linearizing transformations exist, they are not without risk, since they can introduce significant biases.
Programming languages such as R allow us to address the non-linear fitting of biochemical models easily and reliably. The function dir.MM() from the renz package carries out the non-linear least-squares fitting of kinetic data to the Michaelis-Menten equation.
Regression analysis is a set of statistical processes for estimating the relationship between a dependent variable (in our case, the initial rate) and one or more independent variables (in our case, the substrate concentration). The method of least squares finds the line (or hyperplane or, for a non-linear model such as ours, the curve) that minimizes the sum of squared differences (SSQ) between the observed data and the values predicted by the model (see Figure).
To find the kinetic parameters (Km and Vmax) that minimize the SSQ, we will have to solve the system of equations:
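Written out under the usual definition of SSQ for the Michaelis-Menten model, the system consists of the two partial derivatives set to zero. The symbols $v_i$, $[S]_i$ and $n$ (observed rate, substrate concentration and number of data points) are notation assumed here for illustration, not taken from the package:

```latex
\begin{aligned}
SSQ(K_m, V_{max}) &= \sum_{i=1}^{n} \left( v_i - \frac{V_{max}\,[S]_i}{K_m + [S]_i} \right)^2 \\[6pt]
\frac{\partial SSQ}{\partial V_{max}} &= -2 \sum_{i=1}^{n} \left( v_i - \frac{V_{max}\,[S]_i}{K_m + [S]_i} \right) \frac{[S]_i}{K_m + [S]_i} = 0 \\[6pt]
\frac{\partial SSQ}{\partial K_m} &= 2 \sum_{i=1}^{n} \left( v_i - \frac{V_{max}\,[S]_i}{K_m + [S]_i} \right) \frac{V_{max}\,[S]_i}{\left( K_m + [S]_i \right)^2} = 0
\end{aligned}
```

Because $K_m$ appears in the denominators, this system has no closed-form solution, which is why an iterative numerical method is needed.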
To this end, the function dir.MM() relies on numerical methods already implemented in R.
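To make the idea of numerically minimizing the SSQ concrete, here is a minimal sketch that uses base R's general-purpose optimizer optim(). It is not a description of what dir.MM() does internally; the helper ssq() and the starting guesses are assumptions made for illustration, and the rates are those of group 7 from the table further down.

```r
# Group 7 data, copied from the table below (S in mM, v in uM/min).
S <- c(0.05, 0.10, 0.25, 0.50, 1.00, 2.50, 5.00, 8.00, 20.00, 30.00)
v <- c(1.77, 5.20, 15.04, 28.31, 50.98, 75.42, 112.68, 126.06, 154.93, 168.75)

# Hypothetical helper: sum of squared residuals for a candidate
# parameter vector p = c(Km, Vmax).
ssq <- function(p, S, v) {
  predicted <- p[2] * S / (p[1] + S)
  sum((v - predicted)^2)
}

# Crude starting guesses (an assumption): Vmax near the largest observed
# rate, Km near the concentration whose rate is closest to half of it.
start <- c(Km = S[which.min(abs(v - max(v) / 2))], Vmax = max(v))

# Nelder-Mead simplex search, the default method of optim().
fit <- optim(par = start, fn = ssq, S = S, v = v)

fit$par    # estimated Km (mM) and Vmax (uM/min)
fit$value  # SSQ at the estimated minimum
```

A derivative-free simplex search is slow but forgiving about starting values, which makes it convenient for a two-parameter illustration like this one.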
We start by loading some kinetic data obtained by students during their undergraduate laboratory training. Using β-galactosidase as a model enzyme, the students assessed the effect of the concentration of the substrate o-nitrophenyl-β-D-galactopyranoside (ONPG) on the initial rate (doi: 10.1002/bmb.21522). The data obtained by eight different groups of students can be loaded just by typing:
ONPG | v1 | v2 | v3 | v4 | v5 | v6 | v7 | v8 |
---|---|---|---|---|---|---|---|---|
0.05 | 2.26 | 1.29 | 0.004 | 0.004 | 0.004 | 0.003 | 1.77 | 2.98 |
0.10 | 5.48 | 3.33 | 0.008 | 0.007 | 0.007 | 0.006 | 5.20 | 5.20 |
0.25 | 13.40 | 11.80 | 0.020 | 0.020 | 0.016 | 0.017 | 15.04 | 14.38 |
0.50 | 24.70 | 22.80 | 0.035 | 0.035 | 0.032 | 0.031 | 28.31 | 30.30 |
1.00 | 40.90 | 35.20 | 0.060 | 0.056 | 0.050 | 0.048 | 50.98 | 48.99 |
2.50 | 62.30 | 39.90 | 0.110 | 0.104 | 0.090 | 0.101 | 75.42 | 86.25 |
5.00 | 94.30 | 73.50 | 0.138 | 0.138 | 0.115 | 0.121 | 112.68 | 112.57 |
8.00 | 105.00 | 12.90 | 0.154 | 0.150 | 0.119 | 0.139 | 126.06 | 136.24 |
20.00 | 133.00 | 112.00 | 0.179 | 0.179 | 0.142 | 0.152 | 154.93 | 169.97 |
30.00 | 144.00 | 120.00 | 0.200 | 0.200 | 0.166 | 0.181 | 168.75 | 177.71 |
The first column gives the ONPG concentrations in mM, and the remaining eight columns give the initial rates. Note that while groups 1, 2, 7 and 8 expressed their rates in μM/min, the remaining groups opted for mM/min. This information can be confirmed by checking the attributes of data:
attributes(data)
#> $names
#> [1] "ONPG" "v1" "v2" "v3" "v4" "v5" "v6" "v7" "v8"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10
#>
#> $`[ONPG]`
#> [1] "mM"
#>
#> $`v3, v4, v5, v6`
#> [1] "mM/min"
#>
#> $class
#> [1] "data.frame"
#>
#> $`v1, v2, v7, v8`
#> [1] "uM/min"
Thus, before continuing, we will express all the rates in the same units, μM/min:
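A minimal sketch of that conversion, assuming the data frame is named data as in the attributes() call above. To keep the example self-contained, it rebuilds only the columns it needs from the table, and the vector mM_cols is a name introduced here for illustration:

```r
# Rates copied from the table above: v1 is already in uM/min,
# v3 to v6 were recorded in mM/min.
data <- data.frame(
  ONPG = c(0.05, 0.10, 0.25, 0.50, 1.00, 2.50, 5.00, 8.00, 20.00, 30.00),
  v1 = c(2.26, 5.48, 13.40, 24.70, 40.90, 62.30, 94.30, 105.00, 133.00, 144.00),
  v3 = c(0.004, 0.008, 0.020, 0.035, 0.060, 0.110, 0.138, 0.154, 0.179, 0.200),
  v4 = c(0.004, 0.007, 0.020, 0.035, 0.056, 0.104, 0.138, 0.150, 0.179, 0.200),
  v5 = c(0.004, 0.007, 0.016, 0.032, 0.050, 0.090, 0.115, 0.119, 0.142, 0.166),
  v6 = c(0.003, 0.006, 0.017, 0.031, 0.048, 0.101, 0.121, 0.139, 0.152, 0.181)
)

# Columns holding rates in mM/min, selected by name rather than by position.
mM_cols <- c("v3", "v4", "v5", "v6")

# 1 mM/min = 1000 uM/min
data[mM_cols] <- data[mM_cols] * 1000

head(data, 3)
```

Selecting the columns by name guards against silently converting the wrong group if the column order ever changes.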
I always insist to my students that the first thing to do when analyzing data is to draw a scatter plot, since it gives a first impression of the data and guides how to proceed with the analysis. To lead by example, we will draw such plots.
The first four groups:
# Set a 2 x 2 layout with tight margins, keeping the previous settings
oldpar <- par(mfrow = c(2, 2), mar = c(4, 4, 1, 1))
for (i in 2:5) {
  plot(data$ONPG, data[, i],
       type = 'p', ylab = 'v (uM/min)', xlab = '[ONPG] (mM)')
}
par(oldpar)  # restore the previous layout and margins
The next four groups:
# Set a 2 x 2 layout with tight margins, keeping the previous settings
oldpar <- par(mfrow = c(2, 2), mar = c(4, 4, 1, 1))
for (i in 6:9) {
  plot(data$ONPG, data[, i],
       type = 'p', ylab = 'v (uM/min)', xlab = '[ONPG] (mM)')
}
par(oldpar)  # restore the previous layout and margins
In general, the data hold no surprises. That is, the relationship between the dependent variable (initial rate) and the independent variable ([ONPG]) is what we expect: a hyperbolic curve. An exception is the rate obtained by group 2 at [ONPG] = 8 mM, which is clearly an outlier. No problem! We will remove that point from further analysis to prevent it from introducing artifacts.
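One way to drop that point, sketched here on a small data frame rebuilt from the ONPG and v2 columns of the table (the name group2 is introduced for illustration), is to mark it as missing rather than delete the row, so that the concentrations stay aligned across groups:

```r
# ONPG concentrations (mM) and group 2 rates (uM/min), copied from the table.
group2 <- data.frame(
  ONPG = c(0.05, 0.10, 0.25, 0.50, 1.00, 2.50, 5.00, 8.00, 20.00, 30.00),
  v2 = c(1.29, 3.33, 11.80, 22.80, 35.20, 39.90, 73.50, 12.90, 112.00, 120.00)
)

# Flag the suspicious rate at 8 mM ONPG as missing.
group2$v2[group2$ONPG == 8] <- NA

# Rows with a usable rate, ready for fitting.
clean <- na.omit(group2)
nrow(clean)
```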
We will use the data from group 7 to illustrate the use of the dir.MM() function:
#> $parameters
#> Km Vm
#> 3.115 181.182
#>
#> $data
#> S v fitted_v
#> 1 0.05 1.77 2.862275
#> 2 0.10 5.20 5.635521
#> 3 0.25 15.04 13.460773
#> 4 0.50 28.31 25.059751
#> 5 1.00 50.98 44.029648
#> 6 2.50 75.42 80.668744
#> 7 5.00 112.68 111.634011
#> 8 8.00 126.06 130.405398
#> 9 20.00 154.93 156.765737
#> 10 30.00 168.75 164.138910
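As a cross-check that does not depend on renz, the same kind of fit can be sketched with nls() from base R's stats package. This is not presented as the internals of dir.MM(); the starting values and the names kinetics and fit are assumptions chosen for illustration. The estimates should land close to the Km and Vm reported above.

```r
# Group 7 data from the table (S in mM, v in uM/min).
kinetics <- data.frame(
  S = c(0.05, 0.10, 0.25, 0.50, 1.00, 2.50, 5.00, 8.00, 20.00, 30.00),
  v = c(1.77, 5.20, 15.04, 28.31, 50.98, 75.42, 112.68, 126.06, 154.93, 168.75)
)

# Non-linear least-squares fit of the Michaelis-Menten equation.
# Starting values (an assumption): Km from a visual guess of the
# half-saturation concentration, Vm from the largest observed rate.
fit <- nls(v ~ Vm * S / (Km + S),
           data = kinetics,
           start = list(Km = 2.5, Vm = max(kinetics$v)))

round(coef(fit), 3)                 # estimated Km and Vm
kinetics$fitted_v <- predict(fit)   # rates predicted by the fitted model
sum(residuals(fit)^2)               # SSQ at the solution
```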
As an exercise, we invite the reader to compare these results with those obtained using the data from the remaining groups.