Efficient Global Optimization (EGO)
===================================

Bayesian Optimization
---------------------

Bayesian optimization is defined by Jonas Mockus in [1]_ as an optimization technique 
based upon the minimization of the expected deviation from the extremum of the studied function. 

The objective function is treated as a black-box function. A Bayesian strategy sees the objective 
as a random function and places a prior over it. The prior captures our beliefs about the behavior 
of the function. After gathering the function evaluations, which are treated as data, the prior is 
updated to form the posterior distribution over the objective function. The posterior distribution, 
in turn, is used to construct an acquisition function (often also referred to as infill sampling 
criterion) that determines what the next query point should be.

One of the earliest bodies of work on Bayesian optimisation that we are aware 
of are [2]_ and [3]_. Kushner used Wiener processes for one-dimensional problems.
Kushner’s decision model was based on maximizing the probability of improvement, and included a 
parameter that controlled the trade-off between ‘more global’ and ‘more local’ optimization, in 
the same spirit as the Exploration/Exploitation trade-off.

Meanwhile, in the former Soviet Union, Mockus and colleagues developed a multidimensional 
Bayesian optimization method using linear combinations of Wiener fields, some of which was 
published in English in [1]_. This paper also describes an acquisition function that 
is based on myopic expected improvement of the posterior, which has been widely adopted in 
Bayesian optimization as the Expected Improvement function.

In 1998, Jones used Gaussian processes together with the expected improvement function to 
successfully perform derivative-free optimization and experimental design through an algorithm 
called  Efficient  Global  Optimization, or EGO.

EGO
---

In what follows, we describe the Efficient Global Optimization (EGO) algorithm, 
as published in [4]_.

Let :math:`F` be an expensive black-box function to be minimized. We sample :math:`F` at the 
different locations :math:`X = \{x_1, x_2,\ldots,x_n\}` yielding the responses 
:math:`Y = \{y_1, y_2,\ldots,y_n\}`. We build a Kriging model (also called Gaussian process) 
with a mean function :math:`\mu` and a variance function :math:`\sigma^{2}`.

The next step is to compute the criterion EI. To do this, let us denote:

.. math::
	\begin{equation}
	f_{min} = \min \{y_1, y_2,\ldots,y_n\}.
	\end{equation}

The Expected Improvement function (EI) can be expressed:

.. math::
	\begin{equation}	
	E[I(x)] = E[\max(f_{min}-Y, 0)]
	\end{equation}

where :math:`Y` is the random variable following the distribution :math:`\mathcal{N}(\mu(x), \sigma^{2}(x))`.
By expressing the right-hand side of EI expression as an integral, and applying some tedious 
integration by parts, one can express the expected improvement in closed form: 

.. math::
  \begin{equation}	
  E[I(x)] = (f_{min} - \mu(x))\Phi\left(\frac{f_{min} - \mu(x)}{\sigma(x)}\right) + \sigma(x) \phi\left(\frac{f_{min} - \mu(x)}{\sigma(x)}\right)
  \end{equation}

where :math:`\Phi(\cdot)` and :math:`\phi(\cdot)` are respectively the cumulative and probability 
density functions of :math:`\mathcal{N}(0,1)`.

Next, we determine our next sampling point as :

.. math::
	\begin{equation}
	x_{n+1} = \arg \max_{x} \left(E[I(x)]\right)
	\end{equation}

We then test the response :math:`y_{n+1}` of our black-box function :math:`F` at :math:`x_{n+1}`, 
rebuild the model taking into account the new information gained, and research 
the point of maximum expected improvement again.

We summarize here the EGO algorithm:

EGO(F, :math:`n_{iter}`) \# Find the best minimum of :math:`\operatorname{F}` 
in :math:`n_{iter}` iterations  

For (:math:`i=0:n_{iter}`)  

* :math:`mod = {model}(X, Y)`  \# surrogate model based on sample vectors :math:`X` and :math:`Y`  
* :math:`f_{min} = \min Y`  
* :math:`x_{i+1} = \arg \max {EI}(mod, f_{min})` \# choose :math:`x` that maximizes EI  
* :math:`y_{i+1} = {F}(x_{i+1})` \# Probe the function at most promising point :math:`x_{i+1}`  
* :math:`X = [X,x_{i+1}]`  
* :math:`Y = [Y,y_{i+1}]`   
* :math:`i = i+1`  

:math:`f_{min} = \min Y`  

Return : :math:`f_{min}` \# This is the best known solution after :math:`n_{iter}` iterations.

More details can be found in [4]_.

EGO parallel (EGO with qEI criterion)
-------------------------------------


The goal is to be able to run batch optimization. At each iteration of the algorithm, multiple new sampling points are extracted 
from the know ones. These new sampling points are then evaluated using a parallel computing environment. 

The parallel version of this algorithm has been presented by Ginsbourger et al. [5]_ in 2010. 
The Expected improvement (`EI`) is extended to proposed q new sampling points instead of one, 
they called this criterion the `qEI` criterion. As the exact evaluation is not straightforward 
they proposed different ways to approximate this criterion.

Details of the implementation can be found in [6]_ [7]_.

Differents approximation strategy of the `qEI` criterion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The basic idea is to run :math:`q` iterations of the `EGO` algorithm and to set temporally the response :math:`\hat y_q`  of the :math:`q` new sampling points to a virtual value. 
When the :math:`q` new sampling points are defined the real evaluation of the response :math:`y_q` of these points is done in parallel.
The efficiency of the methods lies in the strategy to set the virtual values. 

Let assume that the new sampling point is at :math:`x_q`. The virtual response :math:`y_q` is set according to one of those strategies: 

The minimum constant liar (`CLmin`) strategy 
""""""""""""""""""""""""""""""""""""""""""""

:math:`\hat y_q = \min (Y)`

The Kriging believer (`KB`) strategy 
""""""""""""""""""""""""""""""""""""

The Kriging model gives a mean function  :math:`\mu` and a variance function :math:`\sigma^2` based on sample vectors :math:`X` and :math:`Y`. 

The virtual values are set according to the model prediction:

:math:`\hat y_q = \mu (x_q)`

Some variants are proposed to introduce an optimistic or pessimistic part :

    * the Kriging Believer Upper Bound (KBUB) : :math:`\hat y_q = \mu (x_q) + 3 \sigma` 
    * the Kriging Believer Lower Bound (KBLB) : :math:`\hat y_q = \mu (x_q) - 3 \sigma`

Tips for an efficient use
^^^^^^^^^^^^^^^^^^^^^^^^^

* the `n_parallel` parameter is set by the user, a real improvement of the efficiency is observed for relatively low values of the parameter (<8) [3]_
* Since the maximization of the `EI` is a highly multimodal optimization problem, it could be necessary to increase the `n_start` parameter of the algorithm. 


Implementation Notes
--------------------

Beside the Expected Improvement, the implementation here offers two other infill criteria:

* SBO (Surrogate Based Optimization): directly using the prediction of the surrogate model (:math:`\mu`)
* LCB (Lower Confidence Bound): using the 99% confidence interval :math:`\mu -3 \times \sigma`

Regarding the parallel execution, one can implement specific multiprocessing by deriving the _Evaluator_ interface
and overriding the default implementation of the _run(fun, x)_ method. The default implementation simply runs _fun(x)_.

Regardless the others parameters, you can specify a mixed surrogate model to make mixed optimization. See [8]_ .
The expected improvement is continuously computed and optimized so that can lead to an infill point that will be projected, in the mixed case, to an already evaluated point.
To avoid the re-evaluation of a point, you can penalize the Expected Improvement via tunneling which decrease the EI in the neighbourhood of the known DOE points.
However, this is not recommanded for high dimensional problems because the re-evaluation is uncommon. Tunneling evaluation can be slow with a lot of point.

When considering a mixed integer optimization, the function to be optimized by EGO
has to handle categorical variables as indexes in the given enumaration type. For instance, 
with a categorical enumeration ``["red", "green", "blue"]``,  passing ``"blue"`` to the 
function should be handled by passing the value ``2`` which is the index of ``"blue"`` 
in the enumeration list. This choice was made to keep on using a numerical ndarray as interface
of the function to be optimized ``f: [n_samples, n_features] -> [n_eval, 1]`` allowing parallel
evaluations.

References
----------

.. [1] Mockus, J. (1975). On Bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference (pp. 400-404). Springer, Berlin, Heidelberg.

.. [2] Kushner, H. J. (1962). A versatile stochastic model of a function of unknown and time varying form. Journal of Mathematical Analysis and Applications, 5(1), 150-167.

.. [3] Kushner, H. J. (1964). A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering, 86(1), 97-106.

.. [4] Jones, D. R., Schonlau, M., & Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global optimization, 13(4), 455-492.

.. [5] Ginsbourger, D., Le Riche, R., & Carraro, L. (2010). Kriging is well-suited to parallelize optimization. In Computational intelligence in expensive optimization problems (pp. 131-162). Springer, Berlin, Heidelberg.

.. [6] Roux, E. , Tillier, Y. , Kraria, S., & Bouchard, P.-O. (2020). An efficient parallel global optimization strategy based on Kriging properties suitable for material parameter identification.In AME, accepted for publication. 

.. [7] Roux, E. (2011). Assemblage mécanique: stratégies d'optimisation des procédés et d'identification des comportements mécaniques des matériaux (Doctoral dissertation).

.. [8] Saves, P. and Diouane, Y. and Bartoli, N. and Lefebvre, T. and Morlier, J., A general square exponential kernel to handle mixed-categorical variables for Gaussian process, AIAA Aviation 2022 Forum, pp. 3870. 


Usage
-----

.. code-block:: python

  import numpy as np
  from smt.applications import EGO
  from smt.surrogate_models import KRG
  from smt.utils.design_space import DesignSpace
  import matplotlib.pyplot as plt
  
  def function_test_1d(x):
      # function xsinx
      import numpy as np
  
      x = np.reshape(x, (-1,))
      y = np.zeros(x.shape)
      y = (x - 3.5) * np.sin((x - 3.5) / (np.pi))
      return y.reshape((-1, 1))
  
  n_iter = 6
  xlimits = np.array([[0.0, 25.0]])
  
  random_state = 42  # for reproducibility
  design_space = DesignSpace(xlimits, random_state=random_state)
  xdoe = np.atleast_2d([0, 7, 25]).T
  n_doe = xdoe.size
  
  criterion = "EI"  #'EI' or 'SBO' or 'LCB'
  
  ego = EGO(
      n_iter=n_iter,
      criterion=criterion,
      xdoe=xdoe,
      surrogate=KRG(design_space=design_space, print_global=False),
      random_state=random_state,
  )
  
  x_opt, y_opt, _, x_data, y_data = ego.optimize(fun=function_test_1d)
  print("Minimum in x={:.1f} with f(x)={:.1f}".format(float(x_opt), float(y_opt)))
  
  x_plot = np.atleast_2d(np.linspace(0, 25, 100)).T
  y_plot = function_test_1d(x_plot)
  
  fig = plt.figure(figsize=[10, 10])
  for i in range(n_iter):
      k = n_doe + i
      x_data_k = x_data[0:k]
      y_data_k = y_data[0:k]
      ego.gpr.set_training_values(x_data_k, y_data_k)
      ego.gpr.train()
  
      y_gp_plot = ego.gpr.predict_values(x_plot)
      y_gp_plot_var = ego.gpr.predict_variances(x_plot)
      y_ei_plot = -ego.EI(x_plot)
  
      ax = fig.add_subplot((n_iter + 1) // 2, 2, i + 1)
      ax1 = ax.twinx()
      (ei,) = ax1.plot(x_plot, y_ei_plot, color="red")
  
      (true_fun,) = ax.plot(x_plot, y_plot)
      (data,) = ax.plot(
          x_data_k, y_data_k, linestyle="", marker="o", color="orange"
      )
      if i < n_iter - 1:
          (opt,) = ax.plot(
              x_data[k], y_data[k], linestyle="", marker="*", color="r"
          )
      (gp,) = ax.plot(x_plot, y_gp_plot, linestyle="--", color="g")
      sig_plus = y_gp_plot + 3 * np.sqrt(y_gp_plot_var)
      sig_moins = y_gp_plot - 3 * np.sqrt(y_gp_plot_var)
      un_gp = ax.fill_between(
          x_plot.T[0], sig_plus.T[0], sig_moins.T[0], alpha=0.3, color="g"
      )
      lines = [true_fun, data, gp, un_gp, opt, ei]
      fig.suptitle("EGO optimization of $f(x) = x \sin{x}$")
      fig.subplots_adjust(hspace=0.4, wspace=0.4, top=0.8)
      ax.set_title("iteration {}".format(i + 1))
      fig.legend(
          lines,
          [
              "f(x)=xsin(x)",
              "Given data points",
              "Kriging prediction",
              "Kriging 99% confidence interval",
              "Next point to evaluate",
              "Expected improvment function",
          ],
      )
  plt.show()
  # Check the optimal point is x_opt=18.9, y_opt =-15.1
  
::

  Minimum in x=18.9 with f(x)=-15.1
  
.. figure:: ego_TestEGO_run_ego_example.png
  :scale: 80 %
  :align: center

Usage with parallel options
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

  import numpy as np
  from smt.applications import EGO
  from smt.applications.ego import Evaluator
  from smt.surrogate_models import KRG, DesignSpace
  
  import matplotlib.pyplot as plt
  
  def function_test_1d(x):
      # function xsinx
      import numpy as np
  
      x = np.reshape(x, (-1,))
      y = np.zeros(x.shape)
      y = (x - 3.5) * np.sin((x - 3.5) / (np.pi))
      return y.reshape((-1, 1))
  
  n_iter = 3
  n_parallel = 3
  n_start = 50
  xlimits = np.array([[0.0, 25.0]])
  
  random_state = 42
  design_space = DesignSpace(xlimits, random_state=random_state)
  xdoe = np.atleast_2d([0, 7, 25]).T
  n_doe = xdoe.size
  
  class ParallelEvaluator(Evaluator):
      """
      Implement Evaluator interface using multiprocessing ThreadPool object (Python 3 only).
      """
  
      def run(self, fun, x):
          n_thread = 5
          # Caveat: import are made here due to SMT documentation building process
          import numpy as np
          from sys import version_info
          from multiprocessing.pool import ThreadPool
  
          if version_info.major == 2:
              return fun(x)
          # Python 3 only
          with ThreadPool(n_thread) as p:
              return np.array(
                  [
                      y[0]
                      for y in p.map(
                          fun, [np.atleast_2d(x[i]) for i in range(len(x))]
                      )
                  ]
              )
  
  criterion = "EI"  #'EI' or 'SBO' or 'LCB'
  qEI = "KBUB"  # "KB", "KBLB", "KBUB", "KBRand"
  ego = EGO(
      n_iter=n_iter,
      criterion=criterion,
      xdoe=xdoe,
      surrogate=KRG(design_space=design_space, print_global=False),
      n_parallel=n_parallel,
      qEI=qEI,
      n_start=n_start,
      evaluator=ParallelEvaluator(),
      random_state=random_state,
  )
  
  x_opt, y_opt, _, x_data, y_data = ego.optimize(fun=function_test_1d)
  print("Minimum in x={:.1f} with f(x)={:.1f}".format(float(x_opt), float(y_opt)))
  
  x_plot = np.atleast_2d(np.linspace(0, 25, 100)).T
  y_plot = function_test_1d(x_plot)
  
  fig = plt.figure(figsize=[10, 10])
  for i in range(n_iter):
      k = n_doe + (i) * (n_parallel)
      x_data_k = x_data[0:k]
      y_data_k = y_data[0:k]
      x_data_sub = x_data_k.copy()
      y_data_sub = y_data_k.copy()
      for p in range(n_parallel):
          ego.gpr.set_training_values(x_data_sub, y_data_sub)
          ego.gpr.train()
  
          y_ei_plot = -ego.EI(x_plot)
          y_gp_plot = ego.gpr.predict_values(x_plot)
          y_gp_plot_var = ego.gpr.predict_variances(x_plot)
  
          x_data_sub = np.append(x_data_sub, x_data[k + p])
          y_KB = ego._get_virtual_point(np.atleast_2d(x_data[k + p]), y_data_sub)
  
          y_data_sub = np.append(y_data_sub, y_KB)
  
          ax = fig.add_subplot(n_iter, n_parallel, i * (n_parallel) + p + 1)
          ax1 = ax.twinx()
          (ei,) = ax1.plot(x_plot, y_ei_plot, color="red")
  
          (true_fun,) = ax.plot(x_plot, y_plot)
          (data,) = ax.plot(
              x_data_sub[: -1 - p],
              y_data_sub[: -1 - p],
              linestyle="",
              marker="o",
              color="orange",
          )
          (virt_data,) = ax.plot(
              x_data_sub[-p - 1 : -1],
              y_data_sub[-p - 1 : -1],
              linestyle="",
              marker="o",
              color="g",
          )
  
          (opt,) = ax.plot(
              x_data_sub[-1], y_data_sub[-1], linestyle="", marker="*", color="r"
          )
          (gp,) = ax.plot(x_plot, y_gp_plot, linestyle="--", color="g")
          sig_plus = y_gp_plot + 3.0 * np.sqrt(y_gp_plot_var)
          sig_moins = y_gp_plot - 3.0 * np.sqrt(y_gp_plot_var)
          un_gp = ax.fill_between(
              x_plot.T[0], sig_plus.T[0], sig_moins.T[0], alpha=0.3, color="g"
          )
          lines = [true_fun, data, gp, un_gp, opt, ei, virt_data]
          fig.suptitle("EGOp optimization of $f(x) = x \sin{x}$")
          fig.subplots_adjust(hspace=0.4, wspace=0.4, top=0.8)
          ax.set_title("iteration {}.{}".format(i, p))
          fig.legend(
              lines,
              [
                  "f(x)=xsin(x)",
                  "Given data points",
                  "Kriging prediction",
                  "Kriging 99% confidence interval",
                  "Next point to evaluate",
                  "Expected improvment function",
                  "Virtula data points",
              ],
          )
  plt.show()
  
::

  Minimum in x=19.0 with f(x)=-15.1
  
.. figure:: ego_TestEGO_run_ego_parallel_example.png
  :scale: 80 %
  :align: center


Usage with mixed variable
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: python

  import numpy as np
  from smt.applications import EGO
  from smt.applications.mixed_integer import MixedIntegerContext
  from smt.surrogate_models import MixIntKernelType
  from smt.utils.design_space import (
      DesignSpace,
      CategoricalVariable,
      FloatVariable,
      IntegerVariable,
  )
  import matplotlib.pyplot as plt
  from smt.surrogate_models import KRG
  
  # Regarding the interface, the function to be optimized should handle
  # categorical values as index values in the enumeration type specification.
  # For instance, here "blue" will be passed to the function as the index value 2.
  # This allows to keep the numpy ndarray X handling numerical values.
  def function_test_mixed_integer(X):
      # float
      x1 = X[:, 0]
      #  enum 1
      c1 = X[:, 1]
      x2 = c1 == 0
      x3 = c1 == 1
      x4 = c1 == 2
      #  enum 2
      c2 = X[:, 2]
      x5 = c2 == 0
      x6 = c2 == 1
      # int
      i = X[:, 3]
  
      y = (
          (x2 + 2 * x3 + 3 * x4) * x5 * x1
          + (x2 + 2 * x3 + 3 * x4) * x6 * 0.95 * x1
          + i
      )
      return y.reshape((-1, 1))
  
  n_iter = 15
  random_state = 42
  design_space = DesignSpace(
      [
          FloatVariable(-5, 5),
          CategoricalVariable(["blue", "red", "green"]),
          CategoricalVariable(["square", "circle"]),
          IntegerVariable(0, 2),
      ],
      random_state=random_state,
  )
  
  criterion = "EI"  #'EI' or 'SBO' or 'LCB'
  qEI = "KBRand"
  sm = KRG(
      design_space=design_space,
      categorical_kernel=MixIntKernelType.GOWER,
      print_global=False,
  )
  mixint = MixedIntegerContext(design_space)
  n_doe = 3
  sampling = mixint.build_sampling_method(random_state=random_state)
  xdoe = sampling(n_doe)
  ydoe = function_test_mixed_integer(xdoe)
  
  ego = EGO(
      n_iter=n_iter,
      criterion=criterion,
      xdoe=xdoe,
      ydoe=ydoe,
      surrogate=sm,
      qEI=qEI,
      n_parallel=2,
      random_state=random_state,
  )
  
  x_opt, y_opt, _, _, y_data = ego.optimize(fun=function_test_mixed_integer)
  print("Minimum in x={} with f(x)={:.1f}".format(x_opt, float(y_opt)))
  # print("Minimum in typed x={}".format(ego.mixint.cast_to_mixed_integer(x_opt)))
  
  min_ref = -15
  mini = np.zeros(n_iter)
  for k in range(n_iter):
      mini[k] = np.log(np.abs(np.min(y_data[0 : k + n_doe - 1]) - min_ref))
  x_plot = np.linspace(1, n_iter + 0.5, n_iter)
  up = max(np.floor(max(mini)) + 1, -100)
  lo = max(np.floor(min(mini)) - 0.2, -10)
  fig = plt.figure()
  axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])
  axes.plot(x_plot, mini, color="r")
  axes.set_ylim([lo, up])
  plt.title("minimum convergence plot", loc="center")
  plt.xlabel("number of iterations")
  plt.ylabel("log of the difference w.r.t the best")
  plt.show()
  
::

  Minimum in x=[-4.88885884  2.          0.          0.        ] with f(x)=-14.7
  
.. figure:: ego_TestEGO_run_ego_mixed_integer_example.png
  :scale: 80 %
  :align: center


Options
-------

.. list-table:: List of options
  :header-rows: 1
  :widths: 15, 10, 20, 20, 30
  :stub-columns: 0

  *  -  Option
     -  Default
     -  Acceptable values
     -  Acceptable types
     -  Description
  *  -  fun
     -  None
     -  None
     -  ['function']
     -  Function to minimize
  *  -  criterion
     -  EI
     -  ['EI', 'SBO', 'LCB']
     -  ['str']
     -  criterion for next evaluation point determination: Expected Improvement,             Surrogate-Based Optimization or Lower Confidence Bound
  *  -  n_iter
     -  None
     -  None
     -  ['int']
     -  Number of optimizer steps
  *  -  n_max_optim
     -  20
     -  None
     -  ['int']
     -  Maximum number of internal optimizations
  *  -  n_start
     -  20
     -  None
     -  ['int']
     -  Number of optimization start points
  *  -  n_parallel
     -  1
     -  None
     -  ['int']
     -  Number of parallel samples to compute using qEI criterion
  *  -  qEI
     -  KBLB
     -  ['KB', 'KBLB', 'KBUB', 'KBRand', 'CLmin']
     -  ['str']
     -  Approximated q-EI maximization strategy
  *  -  evaluator
     -  <smt.applications.ego.Evaluator object at 0x16ad0f070>
     -  None
     -  ['Evaluator']
     -  Object used to run function fun to optimize at x points (nsamples, nxdim)
  *  -  n_doe
     -  None
     -  None
     -  ['int']
     -  Number of points of the initial LHS doe, only used if xdoe is not given
  *  -  xdoe
     -  None
     -  None
     -  ['ndarray']
     -  Initial doe inputs
  *  -  ydoe
     -  None
     -  None
     -  ['ndarray']
     -  Initial doe outputs
  *  -  verbose
     -  False
     -  None
     -  ['bool']
     -  Print computation information
  *  -  enable_tunneling
     -  False
     -  None
     -  ['bool']
     -  Enable the penalization of points that have been already evaluated in EI criterion
  *  -  surrogate
     -  <smt.surrogate_models.krg.KRG object at 0x16e33b550>
     -  None
     -  ['KRG', 'KPLS', 'KPLSK', 'GEKPLS', 'MGP']
     -  SMT kriging-based surrogate model used internaly
  *  -  random_state
     -  None
     -  None
     -  ['NoneType', 'int', 'RandomState']
     -  Numpy RandomState object or seed number which controls random draws