
Non-linear least squares standard error calculation in R

Discussion in 'Education' started by clw, Oct 8, 2018.


    I am using implementations of the Levenberg-Marquardt algorithm for non-linear least squares regression based on MINPACK-1: either the R function nlsLM() from the minpack.lm package, or a C implementation using the mpfit library.

    I have observed that while I get comparable parameter estimates in both cases, the calculated uncertainties are drastically different between the two approaches. Below is the code for both approaches:

    In R:

    library(minpack.lm)

    x <- seq(1,9)
    y <- c(2.93,1.79,1.03,0.749,0.562,0.218,0.068,0.155,0.03)
    table <- data.frame(x,y)

    fit <- nlsLM(y ~ N * exp(-rate * x), data = table, start = list(rate = 0.1, N = 1))

    summary(fit)


    which returns the estimates and standard errors:

    rate 0.47825, std.Err: 0.02117
    N 4.69723, std.Err: 0.18950


    In C:

    #include <math.h>
    #include <stdio.h>
    #include <string.h>

    #include "cmpfit-1.2/mpfit.h"

    // Data structure to keep my data vectors
    typedef struct my_data {
        double *x;   // independent variable
        double *y;   // measured values
        int n;       // number of data points
    } my_data;

    // defining the function that is to be fitted: f(x) = N * exp(-rate * x)
    // mpfit expects the user function to fill dev[i] with the deviates y[i] - f(x[i])
    int my_exp(int m, int n, double *p, double *dev, double **deriv, void *private_data)
    {
        my_data *data = (my_data *) private_data;
        double *x = data->x;
        double *y = data->y;

        for (int i = 0; i < m; i++) {
            double f = p[0] * exp(-p[1] * x[i]);
            dev[i] = y[i] - f;
        }

        return 0;
    }


    int main(void)
    {
        // define test data
        my_data data;
        data.x = (double[]){1, 2, 3, 4, 5, 6, 7, 8, 9};
        data.y = (double[]){2.93, 1.79, 1.03, 0.749, 0.562, 0.218, 0.068, 0.155, 0.032};
        data.n = 9;

        // start values: p[0] = N, p[1] = rate
        double p[2] = {1.0, 0.1};

        mp_result result;
        memset(&result, 0, sizeof(result));

        // storage for the parameter uncertainties reported via result.xerror
        double perr[2];
        result.xerror = perr;

        int ret;

        ret = mpfit(my_exp, data.n, 2, p, 0, 0, &data, &result);

        printf("rate: %f, Error: %f\n", p[1], perr[1]);
        printf("   N: %f, Error: %f\n", p[0], perr[0]);

        return 0;
    }


    which returns:

    rate: 0.479721, Error: 0.270632
    N: 4.707439, Error: 2.422189


    As you can see, the errors returned by mpfit are about 12x higher than those from R. I also implemented the same thing using GSL, which produces results identical to mpfit. Unfortunately I am not very comfortable with the R code base, so it would be great if anyone has any insight into what nlsLM in R does differently than mpfit and GSL when calculating the errors.

    * EDIT 1 *

    So, I've made a few discoveries I'd like to share and hopefully get some input on. The errors returned by mpfit in the C implementation actually correspond to what one would get in R using:

    sqrt(diag(summary(fit)$cov.unscaled))


    i.e. the square root of the diagonal of the unscaled covariance matrix. The standard error estimates that summary(fit) actually reports are apparently scaled by the residual standard error of the regression, so that:

    s <- summary(fit)
    sqrt(diag(s$cov.unscaled) * s$sigma^2)


    reproduces the values shown by summary(fit).

    Why is this scaling done? What is the statistical reason behind it? Any input would be very welcome!
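
    To convince myself that the scale factor really is just the residual standard error, here is a quick check I put together (the rss/dof/sigma objects are my own helpers, not from the package, so take this as a sketch):

    s     <- summary(fit)
    rss   <- sum(residuals(fit)^2)       # residual sum of squares
    dof   <- df.residual(fit)            # n - p = 9 - 2 = 7
    sigma <- sqrt(rss / dof)             # residual standard error

    all.equal(sigma, s$sigma)                 # TRUE
    sqrt(diag(s$cov.unscaled)) * sigma        # matches the std. errors from summary(fit)
    sqrt(diag(vcov(fit)))                     # vcov(fit) should already be the scaled covariance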

    * EDIT 2 *

    So, I think I figured it out now and I'll put it here for future reference. What the C implementation returns as errors (and what is stored in cov.unscaled in R) is just the equivalent of solve(t(J) %*% J), where J is the Jacobian at the solution; this needs to be scaled by sigma^2 to get the variances of the parameter estimates. Thanks @dave for the help!
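
    For anyone who wants to verify that last statement numerically, here is a small sketch in R. Pulling the Jacobian out of fit$m$gradient() is my own shortcut, and the column order follows the start list, so treat it as illustrative rather than definitive:

    J <- fit$m$gradient()                        # Jacobian of the model at the solution
    unscaled <- solve(t(J) %*% J)                # should match summary(fit)$cov.unscaled
    sqrt(diag(unscaled))                         # ~ the "errors" reported by mpfit/GSL
    sqrt(diag(unscaled)) * summary(fit)$sigma    # ~ the std. errors from summary(fit)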
