MC++
McCormick Relaxation Arithmetic for Factorable Functions
Author
Benoît Chachuat

A convex relaxation \(f^{\rm cv}\) of a function \(f\) on the convex domain \(D\) is a function that is (i) convex on \(D\) and (ii) underestimates \(f\) on \(D\). Likewise, a concave relaxation \(f^{\rm cc}\) of a function \(f\) on the convex domain \(D\) is a function that is (i) concave on \(D\) and (ii) overestimates \(f\) on \(D\). McCormick's technique [McCormick, 1976] provides a means for computing pairs of convex/concave relaxations of a multivariate function on interval domains provided that this function is factorable and that the intrinsic univariate functions in its factored form have known convex/concave envelopes or, at least, relaxations.
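As a minimal illustration of the definitions above, the following sketch evaluates the classical McCormick convex and concave relaxations of the bilinear term \(x\,y\) on a box \([x^L,x^U]\times[y^L,y^U]\). It is independent of MC++; the names BilinearRelax and relax_product are hypothetical and chosen for this example only.

```cpp
#include <algorithm>
#include <cassert>

// Values of a convex underestimator (cv) and a concave overestimator (cc).
// Hypothetical helper type for this sketch, not part of MC++.
struct BilinearRelax {
  double cv;
  double cc;
};

// McCormick relaxations of the product x*y on [xL,xU] x [yL,yU],
// evaluated at the point (x,y).
BilinearRelax relax_product(double x, double xL, double xU,
                            double y, double yL, double yU) {
  assert(xL <= x && x <= xU && yL <= y && y <= yU);
  BilinearRelax r;
  // The maximum of two affine underestimators is convex.
  r.cv = std::max(xL * y + yL * x - xL * yL,
                  xU * y + yU * x - xU * yU);
  // The minimum of two affine overestimators is concave.
  r.cc = std::min(xU * y + yL * x - xU * yL,
                  xL * y + yU * x - xL * yU);
  return r;
}
```

For instance, on \([-2,1]\times[-1,2]\) at the point \((0,1)\), the sketch gives a convex relaxation value of \(-1\) and a concave relaxation value of \(2\), which bracket the actual product \(0\).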

McCormick_relax.png

The class mc::McCormick provides an implementation of the McCormick relaxation technique and its recent extensions; see [McCormick, 1976; Scott et al., 2011; Tsoukalas & Mitsos, 2012; Wechsung & Barton, 2013]. mc::McCormick also has the capability to propagate subgradients for these relaxations, which are guaranteed to exist in the interior of the domain of definition of any convex/concave function. This propagation is similar in essence to the forward mode of automatic differentiation; see [Mitsos et al., 2009]. We note that mc::McCormick is not a verified implementation in the sense that rounding errors are not accounted for in computing convex/concave bounds and subgradients.

The implementation of mc::McCormick relies on the operator/function overloading mechanism of C++. This makes the computation of convex/concave relaxations both simple and intuitive, similar to computing function values in real arithmetic or function bounds in interval arithmetic (see Non-Verified Interval Arithmetic for Factorable Functions). Moreover, mc::McCormick can be used as the template parameter of other classes of MC++, for instance mc::TModel and mc::TVar. Likewise, mc::McCormick can be used as the template parameter of the classes fadbad::F, fadbad::B and fadbad::T of FADBAD++ for computing McCormick relaxations and subgradients of the partial derivatives or the Taylor coefficients of a factorable function (see How do I compute McCormick relaxations of the partial derivatives or the Taylor coefficients of a factorable function using FADBAD++?).

mc::McCormick itself is templated in the type used to propagate the supporting interval bounds. By default, mc::McCormick can be used with the non-verified interval type mc::Interval of MC++. For reliability, however, it is strongly recommended to use verified interval arithmetic such as PROFIL or FILIB++. We note that Taylor models as provided by the classes mc::TModel and mc::TVar can also be used as the template parameter (see Taylor Model Arithmetic for Factorable Functions).

Examples of McCormick relaxations constructed with mc::McCormick are shown on the left plot of the figure below for the factorable function \(f(x)=\cos(x^2)\,\sin(x^{-3})\) for \(x\in [\frac{\pi}{6},\frac{\pi}{3}]\). Also shown on the right plot are the affine relaxations constructed from a subgradient at \(\frac{\pi}{4}\) of the McCormick relaxations of \(f\) on \([\frac{\pi}{6},\frac{\pi}{3}]\).

MC-1D_relax.png
MC-1D_linearize.png

How do I compute McCormick relaxations of a factorable function?

Suppose one wants to compute McCormick relaxations of the real-valued function \(f(x,y)=x(\exp(x)-y)^2\) for \((x,y)\in [-2,1]\times[-1,2]\), at the point \((x,y)=(0,1)\). For simplicity, the supporting interval bounds are calculated using the default interval type mc::Interval here:

#include "interval.hpp"  // default interval type mc::Interval
#include "mccormick.hpp" // class mc::McCormick
typedef mc::Interval I;
typedef mc::McCormick<I> MC;

First, the variables \(x\) and \(y\) are defined as follows:

MC X( I( -2., 1. ), 0. );
MC Y( I( -1., 2. ), 1. );

Essentially, the first line means that X is a variable of type mc::McCormick, belonging to the interval \([-2,1]\), and whose current value is \(0\). The same holds for the McCormick variable Y, which belongs to the interval \([-1,2]\) and has a value of \(1\).

Having defined the variables, McCormick relaxations of \(f(x,y)=x(\exp(x)-y)^2\) on \([-2,1]\times[-1,2]\) at \((0,1)\) are simply computed as:

MC F = X*pow(exp(X)-Y,2);

These relaxations can be displayed to the standard output as:

std::cout << "f relaxations at (0,1): " << F << std::endl;

which produces the following output:

f relaxations at (0,1): [ -2.76512e+01 :  1.38256e+01 ] [ -1.38256e+01 :  8.52245e+00 ]

Here, the first pair of bounds corresponds to the supporting interval bounds, as obtained with mc::Interval, which are valid over the entire domain \([-2,1]\times[-1,2]\). The second pair of bounds gives the values of the convex and concave relaxations at the selected point \((0,1)\). In order to describe the convex and concave relaxations on the entire range \([-2,1]\times[-1,2]\), it would be necessary to repeat the computations at different points. The current point can be set/modified by using the method mc::McCormick::c, for instance at \((-1,0)\):

X.c( -1. );
Y.c( 0. );
F = X*pow(exp(X)-Y,2);
std::cout << "f relaxations at (-1,0): " << F << std::endl;

producing the output:

f relaxations at (-1,0): [ -2.76512e+01 :  1.38256e+01 ] [ -1.75603e+01 :  8.78014e+00 ]
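The supporting interval bounds shown in the outputs above do not depend on the current point, and they can be reproduced with a natural interval extension of \(f\). The sketch below does so with a deliberately minimal, non-verified interval type; the type Ival and the iv_* helpers are hypothetical names for this illustration and are not the mc::Interval API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical minimal interval type (illustration only, no outward rounding).
struct Ival {
  double lo, hi;
};

// exp is increasing, so the bounds map to the bounds.
Ival iv_exp(Ival a) { return {std::exp(a.lo), std::exp(a.hi)}; }

Ival iv_sub(Ival a, Ival b) { return {a.lo - b.hi, a.hi - b.lo}; }

// Square: the lower bound is 0 whenever the argument range contains 0.
Ival iv_sqr(Ival a) {
  const double l2 = a.lo * a.lo, h2 = a.hi * a.hi;
  const double lo = (a.lo <= 0. && a.hi >= 0.) ? 0. : std::min(l2, h2);
  return {lo, std::max(l2, h2)};
}

// Product: min and max over the four endpoint products.
Ival iv_mul(Ival a, Ival b) {
  const double p[4] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
  return {*std::min_element(p, p + 4), *std::max_element(p, p + 4)};
}

// Natural interval extension of f(x,y) = x*(exp(x)-y)^2.
Ival f_bounds(Ival x, Ival y) {
  assert(x.lo <= x.hi && y.lo <= y.hi);
  return iv_mul(x, iv_sqr(iv_sub(iv_exp(x), y)));
}
```

Evaluating f_bounds on \([-2,1]\times[-1,2]\) gives approximately \([-27.6512, 13.8256]\): the range of \(\exp(x)-y\) is \([e^{-2}-2,\,e+1]\), its square is bounded by \([0,(e+1)^2]\), and multiplying by \([-2,1]\) yields the interval printed above.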

The values of the McCormick convex and concave relaxations of \(f(x,y)\) can be retrieved, respectively, as:

double Fcv = F.cv();
double Fcc = F.cc();

Likewise, the lower and upper bounds of the supporting interval bounds can be retrieved as:

double Flb = F.l();
double Fub = F.u();

How do I compute a subgradient of the McCormick relaxations?

Computing a subgradient of a McCormick relaxation requires specification of the independent variables via the .sub method, prior to evaluating the function in mc::McCormick type. Continuing the previous example, the function has two independent variables \(x\) and \(y\). Defining \(x\) and \(y\) as the subgradient components \(0\) and \(1\) (indexing in C/C++ starts at 0 by convention!), respectively, is done as follows:

X.sub( 2, 0 );
Y.sub( 2, 1 );

Similar to above, the McCormick convex and concave relaxations of \(f(x,y)\) at \((-1,0)\) along with subgradients of these relaxations are computed as:

F = X*pow(exp(X)-Y,2);
std::cout << "f relaxations and subgradients at (-1,0): " << F << std::endl;

producing the output:

f relaxations and subgradients at (-1,0): [ -2.76512e+01 :  1.38256e+01 ] [ -1.75603e+01 :  8.78014e+00 ] [ (-3.19186e+00, 3.70723e+00) : ( 1.59593e+00,-1.85362e+00) ]

The additional information displayed corresponds to, respectively, a subgradient of the McCormick convex underestimator at (-1,0) and a subgradient of the McCormick concave overestimator at (-1,0). In turn, these subgradients can be used to construct affine relaxations on the current range, or passed to a bundle solver to locate the actual minimum or maximum of the McCormick relaxations.
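To make the construction of an affine relaxation concrete, the sketch below evaluates \(a(p) = v + s^{\sf T}(p - p_0)\), where \(v\) is the value of a relaxation at the reference point \(p_0\) and \(s\) is a subgradient there. With \(v\) and \(s\) taken from a convex relaxation, \(a\) underestimates that relaxation, and hence \(f\), on the whole domain. The function name affine_relaxation is hypothetical and the code does not depend on MC++.

```cpp
#include <cassert>
#include <cstddef>

// Evaluate the affine function val + sub^T (pt - ref) in n dimensions.
// val: relaxation value at the reference point ref
// sub: subgradient of the relaxation at ref
// pt : point at which the affine relaxation is evaluated
double affine_relaxation(double val, const double* sub, const double* ref,
                         const double* pt, std::size_t n) {
  assert(sub != nullptr && ref != nullptr && pt != nullptr);
  double a = val;
  for (std::size_t i = 0; i < n; ++i)
    a += sub[i] * (pt[i] - ref[i]);
  return a;
}
```

With the values printed above for the convex relaxation at \((-1,0)\), namely \(v=-17.5603\) and \(s=(-3.19186, 3.70723)\), the affine underestimator evaluates to about \(-20.7522\) at \((0,0)\), which is indeed below \(f(0,0)=0\).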

The subgradients of the McCormick relaxations of \(f(x,y)\) at the current point can be retrieved as follows:

const double* Fcvsub = F.cvsub();
const double* Fccsub = F.ccsub();

or, alternatively, componentwise as:

double Fcvsub_X = F.cvsub(0);
double Fcvsub_Y = F.cvsub(1);
double Fccsub_X = F.ccsub(0);
double Fccsub_Y = F.ccsub(1);

Directional subgradients can be propagated too. In the case that subgradients are to be computed along the direction (1,-1) for both the convex and concave relaxations, we define:

const double sub_dir[2] = { 1., -1 };
X.sub( 1, &sub_dir[0], &sub_dir[0] );
Y.sub( 1, &sub_dir[1], &sub_dir[1] );
F = X*pow(exp(X)-Y,2);
std::cout << "f relaxations and subgradients along direction (1,-1) at (-1,0): " << F << std::endl;

producing the output:

f relaxations and subgradients along direction (1,-1) at (-1,0): [ -2.76512e+01 :  1.38256e+01 ] [ -1.75603e+01 :  8.78014e+00 ] [ (-6.89910e+00) : ( 3.44955e+00) ]
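As a consistency check on this example, the directional values can be compared with the inner products of the direction \(d=(1,-1)\) and the full subgradients \(s^{\rm cv}=(-3.19186, 3.70723)\) and \(s^{\rm cc}=(1.59593,-1.85362)\) reported earlier at \((-1,0)\):

\[
s^{\rm cv}\cdot d = (-3.19186)(1) + (3.70723)(-1) = -6.89909, \qquad
s^{\rm cc}\cdot d = (1.59593)(1) + (-1.85362)(-1) = 3.44955 .
\]

These agree with the displayed values \(-6.89910\) and \(3.44955\) up to the rounding of the printed digits. This is an observation about the present example rather than a general statement about the propagation rules.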

How do I compute McCormick relaxations of the partial derivatives or the Taylor coefficients of a factorable function using FADBAD++?

Now suppose one wants to compute McCormick relaxations not only for a given factorable function, but also for its partial derivatives. Continuing the previous example, the partial derivatives of \(f(x,y)=x(\exp(x)-y)^2\) with respect to its independent variables \(x\) and \(y\) can be obtained via automatic differentiation (AD), either forward or reverse AD. This can be done for instance using the classes fadbad::F and fadbad::B of FADBAD++.

Considering forward AD first, we include the following header files:

#include "mcfadbad.hpp" // available in MC++
#include "fadiff.h" // available in FADBAD++
typedef fadbad::F<MC> FMC;

The variables are initialized and the derivatives and subgradients are seeded as follows:

FMC FX = X; // initialize FX with McCormick variable X
FX.diff(0,2); // differentiate with respect to x (index 0 of 2)
FX.x().sub(2,0); // seed subgradient of x (index 0 of 2)
FMC FY = Y; // initialize FY with McCormick variable Y
FY.diff(1,2); // differentiate with respect to y (index 1 of 2)
FY.x().sub(2,1); // seed subgradient of y (index 1 of 2)

As previously, the McCormick convex and concave relaxations of \(f\), \(\frac{\partial f}{\partial x}\), and \(\frac{\partial f}{\partial y}\) at \((-1,0)\) on the range \([-2,1]\times[-1,2]\), along with subgradients of these relaxations, are computed as:

FMC FF = FX*pow(exp(FX)-FY,2);
std::cout << "f relaxations and subgradients at (-1,0): " << FF.x() << std::endl;
std::cout << "df/dx relaxations and subgradients at (-1,0): " << FF.d(0) << std::endl;
std::cout << "df/dy relaxations and subgradients at (-1,0): " << FF.d(1) << std::endl;

producing the output:

f relaxations and subgradients at (-1,0): [ -2.76512e+01 :  1.38256e+01 ] [ -1.75603e+01 :  8.78014e+00 ] [ (-3.19186e+00, 3.70723e+00) : ( 1.59593e+00,-1.85362e+00) ]
df/dx relaxations and subgradients at (-1,0): [ -4.04294e+01 :  3.41004e+01 ] [ -2.33469e+01 :  2.90549e+01 ] [ (-2.31383e+01,-1.94418e-01) : ( 1.59593e+00,-1.85362e+00) ]
df/dy relaxations and subgradients at (-1,0): [ -7.45866e+00 :  1.48731e+01 ] [ -5.96505e+00 :  7.71460e+00 ] [ (-5.96505e+00,-4.00000e+00) : ( 7.17326e+00,-4.00000e+00) ]

Relaxations of the partial derivatives can also be computed using the backward mode of AD, which requires the additional header file:

#include "badiff.h" // available in FADBAD++
typedef fadbad::B<MC> BMC;

The new variables are then initialized and seeded, and the function is computed, as follows:

BMC BX = X; // initialize BX with McCormick variable X
BX.x().sub(2,0); // seed subgradient as direction (1,0)
BMC BY = Y; // initialize BY with McCormick variable Y
BY.x().sub(2,1); // seed subgradient as direction (0,1)
BMC BF = BX*pow(exp(BX)-BY,2);
BF.diff(0,1); // differentiate f (index 0 of 1)
std::cout << "f relaxations and subgradients at (-1,0): " << BF.x() << std::endl;
std::cout << "df/dx relaxations and subgradients at (-1,0): " << BX.d(0) << std::endl;
std::cout << "df/dy relaxations and subgradients at (-1,0): " << BY.d(0) << std::endl;

producing the output:

f relaxations and subgradients at (-1,0): [ -2.76512e+01 :  1.38256e+01 ] [ -1.75603e+01 :  8.78014e+00 ] [ (-3.19186e+00, 3.70723e+00) : ( 1.59593e+00,-1.85362e+00) ]
df/dx relaxations and subgradients at (-1,0): [ -4.04294e+01 :  3.41004e+01 ] [ -1.37142e+01 :  1.60092e+01 ] [ (-1.35056e+01,-1.94418e-01) : ( 8.82498e+00,-1.31228e+00) ]
df/dy relaxations and subgradients at (-1,0): [ -7.45866e+00 :  1.48731e+01 ] [ -5.96505e+00 :  7.71460e+00 ] [ (-5.96505e+00,-4.00000e+00) : ( 7.17326e+00,-4.00000e+00) ]

It is noteworthy that the bounds, McCormick relaxations and subgradients of the partial derivatives computed with the forward and backward modes, although all valid, may differ because the two modes involve different sequences of operations, or even different operations. In the previous examples, for instance, forward and backward AD produce identical interval bounds for \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) at \((-1,0)\), yet significantly tighter McCormick relaxations are obtained with backward AD for \(\frac{\partial f}{\partial x}\) at \((-1,0)\).

Another use of FADBAD++ involves computing McCormick relaxations of the Taylor coefficients in the Taylor expansion of a factorable function in a given direction up to a certain order. Suppose we want to compute McCormick relaxations of the first 5 Taylor coefficients of \(f(x,y)=x(\exp(x)-y)^2\) in the direction \((1,0)\), i.e. the direction of \(x\). This information can be computed by using the class fadbad::T, which requires the following header file:

#include "tadiff.h" // available in FADBAD++
typedef fadbad::T<MC> TMC;

The variables are initialized and the derivatives and subgradients are seeded as follows:

TMC TX = X; // initialize TX with McCormick variable X
TX[0].sub(2,0); // seed subgradient as direction (1,0)
TMC TY = Y; // initialize TY with McCormick variable Y
TY[0].sub(2,1); // seed subgradient as direction (0,1)
TX[1] = 1.; // Taylor-expand with respect to x
TMC TF = TX*pow(exp(TX)-TY,2);
TF.eval(5); // Taylor-expand f to degree 5
for( unsigned int i=0; i<=5; i++ )
  std::cout << "d^" << i << "f/dx^" << i << " relaxations and subgradients at (-1,0): " << TF[i] << std::endl;

producing the output:

d^0f/dx^0 relaxations and subgradients at (-1,0): [ -2.76512e+01 :  1.38256e+01 ] [ -1.75603e+01 :  8.78014e+00 ] [ (-3.19186e+00, 3.70723e+00) : ( 1.59593e+00,-1.85362e+00) ]
d^1f/dx^1 relaxations and subgradients at (-1,0): [ -4.04294e+01 :  3.41004e+01 ] [ -2.33469e+01 :  2.90549e+01 ] [ (-2.31383e+01,-1.94418e-01) : ( 1.59593e+00,-1.85362e+00) ]
d^2f/dx^2 relaxations and subgradients at (-1,0): [ -4.51302e+01 :  3.77111e+01 ] [ -1.97846e+01 :  2.25846e+01 ] [ (-1.97113e+01, 0.00000e+00) : ( 7.36023e+00,-4.06006e-01) ]
d^3f/dx^3 relaxations and subgradients at (-1,0): [ -2.65667e+01 :  2.82546e+01 ] [ -1.02662e+01 :  1.27412e+01 ] [ (-1.00820e+01,-4.51118e-02) : ( 7.66644e+00,-1.80447e-01) ]
d^4f/dx^4 relaxations and subgradients at (-1,0): [ -1.19764e+01 :  1.59107e+01 ] [ -4.29280e+00 :  6.13261e+00 ] [ (-4.25007e+00,-2.25559e-02) : ( 4.86086e+00,-5.63897e-02) ]
d^5f/dx^5 relaxations and subgradients at (-1,0): [ -4.44315e+00 :  7.16828e+00 ] [ -1.49744e+00 :  2.55611e+00 ] [ (-1.44773e+00,-6.76676e-03) : ( 2.29932e+00,-1.35335e-02) ]

The zeroth Taylor coefficient corresponds to the function \(f\) itself. It can also be checked that the relaxations of the first Taylor coefficient of \(f\) match those obtained with forward AD for \(\frac{\partial f}{\partial x}\).

Naturally, the classes fadbad::F, fadbad::B and fadbad::T can be nested to produce relaxations of higher-order derivative information.

Which functions are overloaded in McCormick relaxation arithmetic?

As well as overloading the usual functions exp, log, sqr, sqrt, pow, inv, cos, sin, tan, acos, asin, atan, erf, erfc, min, max, fabs, mc::McCormick also defines the following functions:

  • fstep(x) and bstep(x), implementing the forward step function (switching value from 0 to 1 for x>=0) and the backward step function (switching value from 1 to 0 for x>=0). These functions can be used to model a variety of discontinuous functions as first proposed in [Wechsung & Barton, 2012].
  • ltcond(x,y,z) and gtcond(x,y,z), similar in essence to fstep(x) and bstep(x), and implementing disjunctions of the form { y if x<=0; z otherwise } and { y if x>=0; z otherwise }, respectively.
  • inter(x,y,z), computing the intersection \(x = y\cap z\) and returning true/false if the intersection is nonempty/empty.
  • hull(x,y), computing convex/concave relaxations of the union \(x\cup y\).
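For reference, the real-valued semantics described in the list above for the step and conditional functions can be written as follows. This is only a sketch of the underlying point-wise definitions, with hypothetical names ending in _ref; it does not reproduce the mc::McCormick overloads, which propagate relaxations rather than plain values.

```cpp
#include <cassert>

// Forward step: switches from 0 to 1 for x >= 0.
inline double fstep_ref(double x) { return x >= 0. ? 1. : 0.; }

// Backward step: switches from 1 to 0 for x >= 0.
inline double bstep_ref(double x) { return x >= 0. ? 0. : 1.; }

// { y if x <= 0; z otherwise }
inline double ltcond_ref(double x, double y, double z) { return x <= 0. ? y : z; }

// { y if x >= 0; z otherwise }
inline double gtcond_ref(double x, double y, double z) { return x >= 0. ? y : z; }

// Example of a function with a switch modeled with the two step functions:
// |x| = fstep(x)*x + bstep(x)*(-x).
inline double abs_via_steps(double x) {
  return fstep_ref(x) * x + bstep_ref(x) * (-x);
}
```

Exactly one of the two step functions is active at any \(x\), which is what allows a piecewise definition to be written as a single factorable expression.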

What are the options in mc::McCormick and how are they set?

The class mc::McCormick has a public static member called mc::McCormick::options that can be used to set/modify the options; e.g.,

MC::options.ENVEL_USE = true;
MC::options.ENVEL_TOL = 1e-12;
MC::options.ENVEL_MAXIT = 100;
MC::options.MVCOMP_USE = true;

The available options are the following:

Options in mc::McCormick::Options: name, type, default and description
Name Type Default Description
ENVEL_USE bool true Whether to compute convex/concave envelopes for the neither-convex-nor-concave univariate functions such as odd power terms, sin, cos, asin, acos, tan, atan, erf, erfc. This provides tighter McCormick relaxations, but it is more time consuming. Junction points are computed using the Newton or secant method first, then the more robust golden section search method if unsuccessful.
ENVEL_TOL double 1e-10 Termination tolerance for the determination of junction points in convex/concave envelopes of univariate terms.
ENVEL_MAXIT int 100 Maximum number of iterations for the determination of junction points in convex/concave envelopes of univariate terms.
MVCOMP_USE bool false Whether to use Tsoukalas & Mitsos's multivariate composition result for min/max, product, and division terms; see [Tsoukalas & Mitsos, 2012]. This provides tighter McCormick relaxations, but it is more time consuming.
MVCOMP_TOL double 1e1*machprec() Tolerance for equality test in subgradient propagation for product terms with Tsoukalas & Mitsos's multivariate composition result; see [Tsoukalas & Mitsos, 2012].
DISPLAY_DIGITS unsigned int 5 Number of digits in output stream
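To illustrate the roles that a tolerance and an iteration limit such as ENVEL_TOL and ENVEL_MAXIT play, the sketch below locates the junction point of the convex envelope of \(x^3\) on \([x^L,x^U]\) with \(x^L<0<x^U\) using Newton's method. The junction point \(p\) is where the tangent to the curve passes through \((x^L,(x^L)^3)\), i.e. \(3p^2(p-x^L) = p^3-(x^L)^3\), whose positive solution is \(p=-x^L/2\). This is a stand-alone example under these assumptions, not the MC++ implementation; the function name cubic_junction is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Newton iteration on g(p) = 2p^3 - 3*xL*p^2 + xL^3, which is the tangency
// condition 3p^2 (p - xL) - (p^3 - xL^3) = 0 rearranged.
// Returns true if the step size fell below tol within maxit iterations;
// the junction point estimate is written to p.
bool cubic_junction(double xL, double xU, double tol, int maxit, double& p) {
  assert(xL < 0. && xU > 0. && tol > 0. && maxit > 0);
  p = xU;  // initial guess: upper bound of the range
  for (int it = 0; it < maxit; ++it) {
    const double g  = 2. * p * p * p - 3. * xL * p * p + xL * xL * xL;
    const double dg = 6. * p * p - 6. * xL * p;  // positive for p > 0 > xL
    const double step = g / dg;
    p -= step;
    if (std::fabs(step) < tol) return true;  // converged
  }
  return false;  // iteration limit reached
}
```

For \(x^L=-2\) and \(x^U=2\) the iteration converges to \(p=1\). If the computed point exceeds \(x^U\), the convex envelope on the range reduces to the secant through the two end points. A caller would fall back to a more robust method, as the description of ENVEL_USE indicates, when the function returns false.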

What errors can be encountered during the computation of convex/concave bounds?

Errors are managed based on the exception handling mechanism of the C++ language. Each time an error is encountered, a class object of type mc::McCormick::Exceptions is thrown, which contains the type of error. It is the user's responsibility to test whether an exception was thrown during a McCormick relaxation, and then make the appropriate changes. Should an exception be thrown and not caught by the calling program, the execution will stop.

Possible errors encountered during the computation of a McCormick relaxation are:

Errors during Computation of a McCormick relaxation
Number Description
1 Division by zero
2 Inverse with zero in range
3 Log with negative values in range
4 Square-root with nonpositive values in range
5 Inverse sine or cosine with values outside of \([-1,1]\) range
6 Tangent with values outside of \([-\frac{\pi}{2}+k\pi,\frac{\pi}{2}+k\pi]\) range
-1 Inconsistent size of subgradient between two mc::McCormick variables
-2 Failed to compute the convex or concave envelope of a univariate term
-3 Failed to propagate subgradients for a product term with Tsoukalas & Mitsos's multivariate composition result
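The general pattern of throwing and catching an exception object that carries an error number can be sketched as follows. The class RelaxException and the functions inv_range and try_inv_range are hypothetical stand-ins written for this example so that it compiles on its own; they are not the mc::McCormick::Exceptions interface. The error number 2 is borrowed from the table above (inverse with zero in range).

```cpp
#include <cassert>

// Hypothetical stand-in for an exception object carrying an error number.
class RelaxException {
public:
  explicit RelaxException(int ierr) : ierr_(ierr) {}
  int ierr() const { return ierr_; }
private:
  int ierr_;
};

// Range of 1/x for x in [lo,hi]; throws error number 2 if zero is in range.
void inv_range(double lo, double hi, double& rlo, double& rhi) {
  assert(lo <= hi);
  if (lo <= 0. && hi >= 0.) throw RelaxException(2);
  rlo = 1. / hi;  // 1/x is decreasing on a range that excludes zero
  rhi = 1. / lo;
}

// Calling code: returns 0 on success, otherwise the error number caught.
int try_inv_range(double lo, double hi, double& rlo, double& rhi) {
  try {
    inv_range(lo, hi, rlo, rhi);
    return 0;
  } catch (const RelaxException& e) {
    return e.ierr();  // the caller decides how to react, e.g. shrink the range
  }
}
```

Catching the exception by const reference close to the call that may fail keeps the error number available to the caller and prevents the program from terminating on an uncaught exception.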

References