Mixed-Integer Sampling and Surrogate (Continuous Relaxation)¶
SMT provides the mixed_integer module to adapt existing surrogates to deal with categorical (or enumerate) and ordered variables using continuous relaxation.
For ordered variables, the values are rounded to the nearest values from a provided list. If, instead, bounds are provided, the list will consist of all integers between those bounds.
For enum variables, as many x features as enumerated levels are created, each with [0, 1] bounds, and the feature holding the largest float value determines which enum level is chosen.
For instance, for a categorical variable (one feature of x) with three levels [“blue”, “red”, “green”], three continuous float features x0, x1, x2 are created; if x1 is the largest of (x0, x1, x2), the original categorical feature takes the value “red”.
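The relaxation described above can be sketched in plain Python. This is a minimal illustration, not SMT's implementation: the helper names `decode_enum` and `round_to_levels` are hypothetical, and resolving ties between equal values by taking the first one is an assumption.

```python
def decode_enum(relaxed, levels):
    """Return the level whose relaxed feature holds the largest value."""
    if len(relaxed) != len(levels):
        raise ValueError("one relaxed feature is expected per level")
    # max() returns the first maximal index, so ties go to the first level (assumption)
    best = max(range(len(relaxed)), key=lambda i: relaxed[i])
    return levels[best]


def round_to_levels(value, allowed):
    """Round a relaxed ordered value to the nearest value of the allowed list."""
    return min(allowed, key=lambda v: abs(v - value))


levels = ["blue", "red", "green"]
print(decode_enum([0.2, 0.7, 0.1], levels))  # red

# Ordered variable given by bounds [0, 4]: the list is every integer in between
print(round_to_levels(2.4, list(range(0, 5))))  # 2
```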
The user specifies x feature types through a list in which each entry is one of:
- FLOAT: a continuous feature,
- ORD: an ordered valued feature,
- a tuple (ENUM, n), where n is the number of levels of the categorical feature (i.e. an enumerate with n values).
In the case of mixed integer sampling, the bounds of each x feature have to be adapted to take the feature types into account. While FLOAT and ORD features still have an interval [lower bound, upper bound], the bounds of an ENUM feature are defined by giving the enumeration/list of possible values (levels).
For instance, if we have the following xtypes: [FLOAT, ORD, (ENUM, 2), (ENUM, 3)], a compatible xlimits could be [[0., 4], [-10, 10], ["blue", "red"], ["short", "medium", "long"]].
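To make the link between the two specifications concrete, the sketch below derives the relaxed (unfolded) bounds from an xtypes/xlimits pair. It is only an illustration under stated assumptions: `FLOAT`, `ORD` and `ENUM` are local stand-in constants rather than the ones imported from SMT, `unfold_xlimits` is a hypothetical helper, and ORD features are assumed to be given as [lower, upper] bounds.

```python
# Stand-in type markers for this sketch only (not imported from SMT)
FLOAT, ORD, ENUM = "float_type", "ord_type", "enum_type"


def unfold_xlimits(xtypes, xlimits):
    """Build continuous bounds: one [0, 1] dimension per enum level."""
    unfolded = []
    for xtype, limits in zip(xtypes, xlimits):
        if isinstance(xtype, tuple) and xtype[0] == ENUM:
            n_levels = xtype[1]
            if n_levels != len(limits):
                raise ValueError("(ENUM, n) must match the number of listed levels")
            unfolded.extend([[0.0, 1.0] for _ in range(n_levels)])
        else:
            # FLOAT and ORD keep their [lower bound, upper bound] interval
            unfolded.append([float(limits[0]), float(limits[1])])
    return unfolded


xtypes = [FLOAT, ORD, (ENUM, 2), (ENUM, 3)]
xlimits = [[0.0, 4], [-10, 10], ["blue", "red"], ["short", "medium", "long"]]
relaxed = unfold_xlimits(xtypes, xlimits)
print(len(relaxed))  # 7 = 1 (FLOAT) + 1 (ORD) + 2 + 3 enum levels
print(relaxed)
```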
Mixed-Integer Surrogate with Gower Distance¶
Another implemented method uses a basic mixed integer kernel based on the Gower distance between two points. When constructing the correlation kernel, the distance is redefined as \(\Delta= \Delta_{cont} + \Delta_{cat}\), with \(\Delta_{cont}\) the usual continuous distance and \(\Delta_{cat}\) the categorical distance, defined as the number of categorical variables that differ from one point to another.
For example, the Gower distance between [1, 'red', 'medium'] and [1.2, 'red', 'large'] is \(\Delta = 0.2 + (0 + 1) = 1.2\), where the categorical terms are 0 because 'red' \(=\) 'red' and 1 because 'medium' \(\neq\) 'large'.
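The worked example can be reproduced with a few lines of Python. This sketch is not SMT's kernel code: `gower_like_distance` is a hypothetical helper, and taking the continuous part as a sum of absolute differences is an assumption made only to match the example above.

```python
import math


def gower_like_distance(a, b, categorical):
    """Continuous distance plus the number of differing categorical entries.

    `categorical` is the set of column indexes holding categorical values.
    """
    cont = 0.0
    cat = 0
    for i, (u, v) in enumerate(zip(a, b)):
        if i in categorical:
            cat += int(u != v)  # 0 if the levels match, 1 otherwise
        else:
            cont += abs(u - v)  # assumed form of the continuous distance
    return cont + cat


d = gower_like_distance([1, "red", "medium"], [1.2, "red", "large"], categorical={1, 2})
print(round(d, 6))  # 1.2
```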
Example of mixed-integer Gower Distance model¶
from smt.applications.mixed_integer import (
MixedIntegerSurrogateModel,
ENUM,
GOWER,
)
from smt.surrogate_models import KRG
import matplotlib.pyplot as plt
import numpy as np
xt = np.array([0, 2, 4])
yt = np.array([0.0, 1.0, 1.5])
xlimits = [["0.0", "1.0", "2.0", "3.0", "4.0"]]
# Surrogate
sm = MixedIntegerSurrogateModel(
categorical_kernel=GOWER,
xtypes=[(ENUM, 5)],
xlimits=xlimits,
surrogate=KRG(theta0=[1e-2]),
)
sm.set_training_values(xt, yt)
sm.train()
# DOE for validation
x = np.linspace(0, 4, 5)
y = sm.predict_values(x)
plt.plot(xt, yt, "o", label="data")
plt.plot(x, y, "d", color="red", markersize=3, label="pred")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
___________________________________________________________________________
Evaluation
# eval points. : 5
Predicting ...
Predicting - done. Time (sec): 0.0000000
Prediction time/pt. (sec) : 0.0000000
Mixed integer sampling method¶
To use a sampling method with mixed integer typed features, the user instantiates a MixedIntegerSamplingMethod with a given sampling method.
The MixedIntegerSamplingMethod implements the SamplingMethod interface and decorates the original sampling method to provide a DOE while conforming to integer and categorical types.
Example of mixed-integer LHS sampling method¶
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
from smt.sampling_methods import LHS
from smt.applications.mixed_integer import (
FLOAT,
ORD,
ENUM,
MixedIntegerSamplingMethod,
)
xtypes = [FLOAT, (ENUM, 2)]
xlimits = [[0.0, 4.0], ["blue", "red"]]
sampling = MixedIntegerSamplingMethod(xtypes, xlimits, LHS, criterion="ese")
num = 40
x = sampling(num)
cmap = colors.ListedColormap(xlimits[1])
plt.scatter(x[:, 0], np.zeros(num), c=x[:, 1], cmap=cmap)
plt.show()
Mixed integer surrogate¶
To use a surrogate with mixed integer constraints, the user instantiates a MixedIntegerSurrogateModel with the given surrogate.
The MixedIntegerSurrogateModel implements the SurrogateModel interface and decorates the given surrogate while respecting integer and categorical types.
Example of mixed-integer Polynomial (QP) surrogate¶
import numpy as np
import matplotlib.pyplot as plt
from smt.surrogate_models import QP
from smt.applications.mixed_integer import MixedIntegerSurrogateModel, ORD
xt = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yt = np.array([0.0, 1.0, 1.5, 0.5, 1.0])
# xtypes = [FLOAT, ORD, (ENUM, 3), (ENUM, 2)]
# FLOAT means x1 continuous
# ORD means x2 ordered
# (ENUM, 3) means x3, x4 & x5 are 3 levels of the same categorical variable
# (ENUM, 2) means x6 & x7 are 2 levels of the same categorical variable
sm = MixedIntegerSurrogateModel(xtypes=[ORD], xlimits=[[0, 4]], surrogate=QP())
sm.set_training_values(xt, yt)
sm.train()
num = 100
x = np.linspace(0.0, 4.0, num)
y = sm.predict_values(x)
plt.plot(xt, yt, "o")
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.legend(["Training data", "Prediction"])
plt.show()
___________________________________________________________________________
Evaluation
# eval points. : 100
Predicting ...
Predicting - done. Time (sec): 0.0000000
Prediction time/pt. (sec) : 0.0000000
Mixed integer context¶
The MixedIntegerContext class helps the user to use mixed integer sampling methods and surrogate models consistently by acting as a factory for those objects given an x specification: (xtypes, xlimits).
- class smt.applications.mixed_integer.MixedIntegerContext(xtypes, xlimits, work_in_folded_space=True, categorical_kernel=None)[source]¶
Class which acts as sampling method and surrogate model factory to handle integer and categorical variables consistently.
Methods
build_sampling_method(sampling_method_class, ...): Build MixedIntegerSamplingMethod from given SMT sampling method.
build_surrogate_model(surrogate): Build MixedIntegerSurrogateModel from given SMT surrogate model.
cast_to_discrete_values(x): Project continuously relaxed values to their closest assessable values.
cast_to_enum_value(x_col, enum_indexes): Return enumerate levels from indexes for the given x feature specified by x_col.
cast_to_mixed_integer(x): Convert an x point with enum indexes to x point with enum levels.
fold_with_enum_index(x): Reduce categorical inputs from discrete unfolded space to initial x dimension space where categorical x dimensions are valued by the index in the corresponding enumerate list.
get_unfolded_dimension(): Returns x dimension (int) taking into account unfolded categorical features.
get_unfolded_xlimits(): Returns relaxed xlimits. Each level of an enumerate gives a new continuous dimension in [0, 1].
unfold_with_enum_mask(x): Expand categorical inputs from initial x dimension space where categorical x dimensions are valued by the index in the corresponding enumerate list to the discrete unfolded space.
- __init__(xtypes, xlimits, work_in_folded_space=True, categorical_kernel=None)[source]¶
- Parameters
- xtypes: x types list
x type specification: list of either FLOAT, ORD or (ENUM, n) spec.
- xlimits: array-like
bounds of x features
- work_in_folded_space: bool
whether x data are given in folded space (enum indexes) or not (enum masks)
- categorical_kernel: string
the kernel to use for categorical inputs. Only for non continuous Kriging.
- build_sampling_method(sampling_method_class, **kwargs)[source]¶
Build MixedIntegerSamplingMethod from given SMT sampling method.
- build_surrogate_model(surrogate)[source]¶
Build MixedIntegerSurrogateModel from given SMT surrogate model.
- cast_to_discrete_values(x)[source]¶
Project continuously relaxed values to their closest assessable values. Note: categorical (or enum) x dimensions are still expanded, that is, there are still as many columns as possible categorical values for the given x dimension. For instance, if an input dimension is typed [“blue”, “red”, “green”] in xlimits, a sample/row of the input x may contain the values (or mask) […, 0, 0, 1, …] to specify “green” for this original dimension.
- Parameters
- x: np.ndarray [n_evals, dim]
continuous evaluation point input variable values
- Returns
- np.ndarray
feasible evaluation point value in categorical space.
- fold_with_enum_index(x)[source]¶
Reduce categorical inputs from discrete unfolded space to initial x dimension space where categorical x dimensions are valued by the index in the corresponding enumerate list. For instance, if an input dimension is typed [“blue”, “red”, “green”], a sample/row of the input x may contain the mask […, 0, 0, 1, …], which will be contracted to […, 2, …], meaning the “green” value. This function is the opposite of unfold_with_enum_mask().
- Parameters
- x: np.ndarray [n_evals, dim]
continuous evaluation point input variable values
- Returns
- np.ndarray [n_evals, dim]
evaluation point input variable values with enumerate index for categorical variables
- unfold_with_enum_mask(x)[source]¶
Expand categorical inputs from initial x dimension space where categorical x dimensions are valued by the index in the corresponding enumerate list to the discrete unfolded space. For instance, if an input dimension is typed [“blue”, “red”, “green”], a sample/row of the input x may contain […, 2, …], which will be expanded to […, 0, 0, 1, …]. This function is the opposite of fold_with_enum_index().
- Parameters
- x: np.ndarray [n_evals, nx]
continuous evaluation point input variable values
- Returns
- np.ndarray [n_evals, nx continuous]
evaluation point input variable values with enumerate mask for categorical variables
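The folding and unfolding described for fold_with_enum_index() and unfold_with_enum_mask() can be illustrated on a single row with plain Python lists. This is a sketch of the idea only, not the SMT code: `unfold_with_mask` and `fold_with_index` are hypothetical helpers, and `n_levels_by_col` (a mapping from categorical column index to its number of levels) is an assumed way of describing the x specification.

```python
def unfold_with_mask(row, n_levels_by_col):
    """Expand each enum index into a 0/1 mask; other columns pass through."""
    out = []
    for col, value in enumerate(row):
        n = n_levels_by_col.get(col)
        if n is None:
            out.append(value)
        else:
            mask = [0] * n
            mask[int(value)] = 1
            out.extend(mask)
    return out


def fold_with_index(row, n_levels_by_col, n_cols):
    """Contract each mask back to the index of its largest entry."""
    out, pos = [], 0
    for col in range(n_cols):
        n = n_levels_by_col.get(col)
        if n is None:
            out.append(row[pos])
            pos += 1
        else:
            mask = row[pos:pos + n]
            out.append(mask.index(max(mask)))
            pos += n
    return out


# Column 2 is categorical with levels ["blue", "red", "green"]
n_levels = {2: 3}
folded = [1.5, 3, 2]  # index 2 means "green"
unfolded = unfold_with_mask(folded, n_levels)
print(unfolded)  # [1.5, 3, 0, 0, 1]
print(fold_with_index(unfolded, n_levels, n_cols=3))  # [1.5, 3, 2]
```

Using the index of the largest mask entry when folding means the same helper also works on continuously relaxed rows, not only on exact 0/1 masks.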
- cast_to_mixed_integer(x)[source]¶
Convert an x point with enum indexes to x point with enum levels
- Parameters
- x: array-like
point to convert
- Returns
- x as a list with enum levels if any
- cast_to_enum_value(x_col, enum_indexes)[source]¶
Return enumerate levels from indexes for the given x feature specified by x_col.
- Parameters
- x_col: int
index of the feature typed as enum
- enum_indexes: list
list of indexes in the possible values for the enum
- Returns
- list of levels (labels) for the given enum feature
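The two casting helpers above map enum indexes back to their labels. The following sketch shows that mapping with plain lists; `to_enum_levels` and `to_mixed_point` are hypothetical names, and passing the set of enum columns explicitly is an assumption of this example rather than how MixedIntegerContext stores the x specification.

```python
def to_enum_levels(levels, enum_indexes):
    """Return the labels found at the given indexes of one enum feature."""
    return [levels[int(i)] for i in enum_indexes]


def to_mixed_point(x, xlimits, enum_cols):
    """Replace enum indexes by their labels; other values are kept as-is."""
    return [
        xlimits[col][int(value)] if col in enum_cols else value
        for col, value in enumerate(x)
    ]


xlimits = [[0, 5], [0.0, 4.0], ["blue", "red", "green", "yellow"]]
print(to_enum_levels(xlimits[2], [0, 3, 1]))  # ['blue', 'yellow', 'red']
print(to_mixed_point([2, 1.5, 3.0], xlimits, enum_cols={2}))  # [2, 1.5, 'yellow']
```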
Example of mixed-integer context usage¶
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
from mpl_toolkits.mplot3d import Axes3D
from smt.surrogate_models import KRG
from smt.sampling_methods import LHS, Random
from smt.applications.mixed_integer import MixedIntegerContext, FLOAT, ORD, ENUM
xtypes = [ORD, FLOAT, (ENUM, 4)]
xlimits = [[0, 5], [0.0, 4.0], ["blue", "red", "green", "yellow"]]
def ftest(x):
    return (x[:, 0] * x[:, 0] + x[:, 1] * x[:, 1]) * (x[:, 2] + 1)
# context to create consistent DOEs and surrogate
mixint = MixedIntegerContext(xtypes, xlimits)
# DOE for training
lhs = mixint.build_sampling_method(LHS, criterion="ese")
num = mixint.get_unfolded_dimension() * 5
print("DOE point nb = {}".format(num))
xt = lhs(num)
yt = ftest(xt)
# Surrogate
sm = mixint.build_surrogate_model(KRG())
sm.set_training_values(xt, yt)
sm.train()
# DOE for validation
rand = mixint.build_sampling_method(Random)
xv = rand(50)
yv = ftest(xv)
yp = sm.predict_values(xv)
plt.plot(yv, yv)
plt.plot(yv, yp, "o")
plt.xlabel("actual")
plt.ylabel("prediction")
plt.show()
DOE point nb = 30
___________________________________________________________________________
Evaluation
# eval points. : 50
Predicting ...
Predicting - done. Time (sec): 0.0000000
Prediction time/pt. (sec) : 0.0000000