I want to minimize a simple linear function
Y = x1 + x2 + x3 + x4 + x5 using ordinary least squares, with the constraint that the coefficients must sum to 5. How can I accomplish this in R? All of the packages I've seen seem to allow constraints on individual coefficients, but I can't figure out how to set a single constraint that involves several coefficients at once. I'm not tied to OLS; if this requires an iterative approach, that's fine as well.
5-sum(p[1:4]) ... You could conceivably do the calculus yourself and get a closed-form expression. - Ben Bolker 2012-04-03 19:48
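A minimal sketch of the reparameterization idea in this comment, using base R's optim: only the first four coefficients are free, and the fifth is computed as 5-sum(p[1:4]), so every candidate the optimizer tries satisfies the constraint exactly. The data are simulated, the helper names full_coef and ssq are hypothetical, and no intercept is fitted, matching the question's Y = x1 + ... + x5.

```r
# Sketch: enforce sum(coefficients) == 5 by optimizing only the first four
# coefficients and deriving the fifth as 5 - sum(p[1:4]).
set.seed(1)
n <- 50
X <- matrix(runif(n * 5), ncol = 5)          # simulated predictors (illustrative only)
Y <- drop(X %*% c(1, 2, 0.5, 1, 0.5)) + rnorm(n, sd = 0.1)

# Hypothetical helper: expand the 4 free parameters into all 5 coefficients.
full_coef <- function(p) c(p, 5 - sum(p))

# Sum of squared residuals for the constrained coefficient vector.
ssq <- function(p) sum((Y - X %*% full_coef(p))^2)

fit <- optim(par = rep(1, 4), fn = ssq, method = "BFGS")
b_optim <- full_coef(fit$par)
b_optim
sum(b_optim)   # 5, up to floating-point rounding
```

Because the objective is quadratic, a general-purpose optimizer is more machinery than this problem needs, but the same pattern carries over to nonlinear models or to nlminb.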
Y ~ x1+x2+x3+x4+x5, how do I indicate to the minimizing function that I want to keep the parameter for
5-sum(x[1:4])? I can't just solve for
Y ~ x1+x2+x3+x4, because that appears (to me) to be a completely different optimization problem. - eykanal 2012-04-03 19:55
sum(p)=C. The original linear problem (without constraints) is ill-posed, because we can make
a1*x1+a2*x2+a3*x3 as small as we want by setting the coefficients to large negative numbers if x is positive, and vice versa. Adding the constraint (a1+a2+a3=C) transforms this into a lower-dimensional but still ill-posed problem, i.e. minimizing
a1*(x1-x3)+a2*(x2-x3)+C*x3. Care to clarify the problem ... ? (Perhaps you mean you want to fit a linear least-squares problem?) - Ben Bolker 2012-04-03 20:05
nlminb or something, though, that's fine as well. So yes, that's what I'm looking for, and clarification will be rewarded with cupcakes (actual cupcakes not included). - eykanal 2012-04-03 20:10
The basic math is as follows: we start with
mu = a0 + a1*x1 + a2*x2 + a3*x3 + a4*x4
and we want to find the coefficients a0 through a4 that minimize the sum of squared differences (SSQ) between
mu and our response variable y, subject to a1+a2+a3+a4 = C.
If we replace the last parameter (say
a4) with
C-a1-a2-a3 to honour the constraint, we end up with a new linear model:
mu = a0 + a1*x1 + a2*x2 + a3*x3 + (C-a1-a2-a3)*x4 = a0 + a1*(x1-x4) + a2*(x2-x4) + a3*(x3-x4) + C*x4
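(a4 has disappeared: it is fixed by the constraint, so a0, a1, a2 and a3 can be estimated by ordinary least squares, with C*x4 entering as a known term.)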
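Written out as an objective, the substitution gives an ordinary, unconstrained least-squares problem in the swept-out predictors, with C*x4 acting as a fixed term (an offset) rather than something to be estimated:

```latex
\min_{a_0,\,a_1,\,a_2,\,a_3} \; \sum_{i=1}^{n} \Bigl( y_i - C\,x_{4i} - a_0 - a_1 (x_{1i} - x_{4i}) - a_2 (x_{2i} - x_{4i}) - a_3 (x_{3i} - x_{4i}) \Bigr)^{2},
\qquad a_4 = C - a_1 - a_2 - a_3
```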
Something like this (untested!) implements it in R.
Original data frame:
d <- data.frame(y=runif(20), x1=runif(20), x2=runif(20), x3=runif(20), x4=runif(20))
Create a transformed version where all but the last column have the last column "swept out", e.g.
x1 -> x1-x4; x2 -> x2-x4; ...
dtrans <- data.frame(y=d$y, sweep(d[,2:4], 1, d[,5], "-"), x4=d$x4)
Rename the transformed columns tx1, tx2, ... to minimize confusion:
names(dtrans)[2:4] <- paste("t",names(dtrans[2:4]),sep="")
constr <- 5
Now fit the model with an offset:
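A self-contained sketch of this step, repeating the setup above so it runs on its own. The lm call with offset = constr * x4 is a reconstruction of the step described here, not code taken from the answer; the final lines recover the slope on x4 from the constraint, and the variable names a4 and slopes are illustrative.

```r
# Sketch: constrained least squares via reparameterization plus an offset.
# Uses only base R (stats).
set.seed(101)
d <- data.frame(y = runif(20), x1 = runif(20), x2 = runif(20),
                x3 = runif(20), x4 = runif(20))

# Sweep x4 out of x1..x3: x1 -> x1 - x4, x2 -> x2 - x4, x3 -> x3 - x4.
dtrans <- data.frame(y = d$y, sweep(d[, 2:4], 1, d[, 5], "-"), x4 = d$x4)
names(dtrans)[2:4] <- paste0("t", names(dtrans)[2:4])
constr <- 5

# constr * x4 enters as a known term (offset), so it is not estimated.
m <- lm(y ~ tx1 + tx2 + tx3, offset = constr * x4, data = dtrans)

# Recover the slope on x4 from the constraint a1 + a2 + a3 + a4 = constr.
a <- coef(m)
a4 <- constr - sum(a[c("tx1", "tx2", "tx3")])
slopes <- c(x1 = a[["tx1"]], x2 = a[["tx2"]], x3 = a[["tx3"]], x4 = a4)
slopes
sum(slopes)   # equals constr, up to rounding
```

The fitted values of m already include the offset, so they equal a0 + x1*a1 + x2*a2 + x3*a3 + x4*a4 on the original predictors.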
It wouldn't be too hard to make this more general.
This requires a little more thought and manipulation than simply specifying a constraint to a canned optimization program. On the other hand, (1) it could easily be wrapped in a convenience function; (2) it's much more efficient than calling a general-purpose optimizer, since the problem is still linear (and in fact one dimension smaller than the one you started with). It could even be done with big data (e.g.
biglm). (Actually, it occurs to me that if this is a linear model, you don't even need the offset, although using the offset means you don't have to compute
a0=intercept-C*x4 after you finish.)
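One way the convenience function mentioned above might look, as a sketch only: constrained_lm is a hypothetical name and interface, not from any package. It fits an intercept plus slopes that sum to total, sweeping the last listed predictor out of the others and passing total times that predictor as an offset. The internal column names resp and off are assumptions and would clash with predictors of the same name.

```r
# Hypothetical convenience wrapper (illustrative, not from any package):
# OLS with an intercept, subject to sum(slopes) == total.
constrained_lm <- function(data, response, predictors, total) {
  k <- length(predictors)
  last <- predictors[k]
  others <- predictors[-k]
  # Sweep the last predictor out of the others: x_j -> x_j - x_last.
  tdat <- as.data.frame(lapply(data[others], function(col) col - data[[last]]))
  names(tdat) <- paste0("t", others)
  tdat$resp <- data[[response]]
  tdat$off <- total * data[[last]]   # known term, not estimated
  fml <- as.formula(paste("resp ~", paste(paste0("t", others), collapse = " + ")))
  fit <- lm(fml, data = tdat, offset = off)
  b <- coef(fit)
  free <- unname(b[paste0("t", others)])
  slopes <- c(free, total - sum(free))   # last slope recovered from the constraint
  names(slopes) <- predictors
  list(intercept = unname(b[["(Intercept)"]]), slopes = slopes, fit = fit)
}

# Usage on simulated data with five predictors, as in the question.
set.seed(42)
d <- data.frame(y = runif(30), x1 = runif(30), x2 = runif(30),
                x3 = runif(30), x4 = runif(30), x5 = runif(30))
res <- constrained_lm(d, "y", paste0("x", 1:5), total = 5)
res$slopes
sum(res$slopes)
```

Which predictor gets swept out is arbitrary: the constrained least-squares solution is unique, so eliminating any one coefficient gives the same fit.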
x4 be equal to
5-x1-x2-x3; I'm looking to constrain the coefficients, not the variables themselves. How would I set up the constraint
a4=5-a1-a2-a3? - eykanal 2012-04-04 11:31
y=a0 + (a1-a4)*x1 + ... + (a3-a4)*x3 - eykanal 2012-04-04 12:41
a4 doesn't appear in the equation any more; it has been transformed out. Have you worked through the algebra yourself? - Ben Bolker 2012-04-04 13:18
Since you said you are open to other approaches, this can also be solved as a quadratic programming (QP) problem:
Minimize a quadratic objective: the sum of squared errors,
subject to a linear constraint: the weights must sum to 5.
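In matrix form, with X the n-by-5 predictor matrix, Y the response vector, and no intercept (as in the question's Y = x1 + ... + x5), the problem reads as follows; expanding the norm shows the objective is quadratic in the weights:

```latex
\min_{\beta \in \mathbb{R}^{5}} \; \lVert Y - X\beta \rVert_2^{2}
  = \beta^{\top} X^{\top} X \beta - 2\, Y^{\top} X \beta + Y^{\top} Y
\quad \text{subject to} \quad \mathbf{1}^{\top} \beta = 5
```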
Assuming X is your n-by-5 matrix and Y is a vector of length n, this would solve for your optimal weights:
library(limSolve)
lsei(A = X, B = Y, E = matrix(1, nrow = 1, ncol = 5), F = 5)
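If limSolve isn't available, this particular QP (one equality constraint, no inequalities) also has the closed form Ben Bolker alluded to: setting the gradient of the Lagrangian to zero gives a small linear (KKT) system. The sketch below solves it with base R on simulated data; the weights should agree with the $X component returned by lsei up to numerical tolerance, though that comparison is not shown here.

```r
# Sketch: equality-constrained least squares via the KKT system, base R only.
set.seed(7)
n <- 40
X <- matrix(runif(n * 5), ncol = 5)   # simulated predictors (illustrative only)
Y <- runif(n)
k <- ncol(X)
ones <- rep(1, k)

# Stationarity of ||Y - X b||^2 + 2*mu*(sum(b) - 5) plus the constraint:
# [ X'X  1 ] [ b  ]   [ X'Y ]
# [ 1'   0 ] [ mu ] = [  5  ]
K <- rbind(cbind(crossprod(X), ones), c(ones, 0))
rhs <- c(crossprod(X, Y), 5)
sol <- solve(K, rhs)
b_kkt <- sol[1:k]                     # constrained weights; sol[k + 1] is mu
b_kkt
sum(b_kkt)                            # 5, up to rounding
```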