cgrem

To run REM, there are three parts of input that need to be provided:

  1. Reference Model: A Python pickle file, specified by --ref, that stores the derivatives computed from the reference trajectory, e.g., one from an all-atom simulation. Generate this file with the cgderiv command before launching cgrem.

  2. Input for the command “cgderiv”: A text file, specified by --cgderiv-arg, containing the runtime options that the REM engine passes to the cgderiv command, in the format mentioned here. In each iteration, cgderiv parses this file to process the trajectories generated by the CG model.

  3. Shell script for MD: A shell script calling the corresponding MD simulations, specified by --md, that will be used by the REM engine in each iteration. The engine will use a simple subprocess to launch the MD simulations, so users need to make sure this script is executable and the MD simulations can be conducted normally by this script.
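Because the engine launches the MD script through a subprocess, it can help to confirm beforehand that the script is executable. The following is a minimal pre-flight sketch, not part of the REM code: the helper names `is_launchable` and `launch_md` are hypothetical, and the way the engine actually invokes the script may differ.

```python
import os
import subprocess


def is_launchable(path):
    """Return True if `path` is a regular file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def launch_md(path):
    """Run the MD script as a plain subprocess and return its exit code.

    Assumption: the script takes no arguments. The real engine may call
    it differently.
    """
    if not is_launchable(path):
        raise PermissionError(path + " is not an executable file")
    # An absolute path avoids a lookup through PATH.
    result = subprocess.run([os.path.abspath(path)])
    return result.returncode
```

If the check fails, `chmod +x` on the script is usually enough to fix it.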

Table generation: In each iteration, the REM engine generates input files with tabulated potentials and forces, which are then loaded by the MD simulations. The generation of these tables is controlled by the input file specified by --model. In this file, each row defines the table for one targeted model, in the format:

<model-name> <min> <max> <resolution> <padding> [initial parameters ...]

The first segment is the name of the model, matching a model defined in the cgderiv input. The next three segments control the generation of the table: its minimum, its maximum, and the interval between points. The fifth segment is the padding option described below. The remaining segments are the parameter values used to generate the initial tables for the first iteration.

The padding option adjusts the boundaries of the tabulated potentials. Possible choices are L, L2, and H; the value U means no padding. Details can be found in the instructions for CGDUMP.
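To make the row layout concrete, the sketch below splits one row into its named segments. It is illustrative only: the function `parse_model_row`, the model name `Pair_A-A`, and all numbers are hypothetical, and the REM code may parse the file differently.

```python
def parse_model_row(line):
    """Split one row of the --model file into its named segments."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least 5 segments: " + line)

    name, xmin, xmax, resolution, padding = fields[:5]

    # Padding choices listed in the text above.
    if padding not in ("L", "L2", "H", "U"):
        raise ValueError("unknown padding option: " + padding)

    return {
        "name": name,
        "min": float(xmin),
        "max": float(xmax),
        "resolution": float(resolution),
        "padding": padding,
        # Everything after the fifth segment is an initial parameter.
        "init_params": [float(v) for v in fields[5:]],
    }


# Hypothetical row: table from 2.0 to 10.0 in steps of 0.1, L2 padding,
# and three initial parameters.
row = parse_model_row("Pair_A-A 2.0 10.0 0.1 L2 0.5 0.2 0.0")
```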

Optimizer: By default, REM uses a curvature-based scheme for the optimization, which is described in equations 12 and 14 of this paper. An example of defining the parameters for the optimizer is as follows:

--optimizer builtin,chi=0.5,t=298.15

where the key builtin selects the default optimizer, and the keys chi and t set the step mixing ratio and the temperature, which together control the step size. The REM code also allows customized optimizers developed by users in Python. A detailed discussion is found in this section.
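The option string is a comma-separated list: an optimizer name followed by key=value pairs. A minimal sketch of how such a string can be split is shown here. The function `parse_optimizer_option` is hypothetical and is not taken from the REM code, and keeping the values as strings is an assumption.

```python
def parse_optimizer_option(value):
    """Split 'name,key=value,...' into (name, kwargs)."""
    name, *pairs = value.split(",")
    kwargs = {}
    for pair in pairs:
        key, _, val = pair.partition("=")
        # Assumption: values stay as strings and the optimizer converts
        # them to the types it needs.
        kwargs[key] = val
    return name, kwargs


name, kwargs = parse_optimizer_option("builtin,chi=0.5,t=298.15")
```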

Notes

For the first iteration of REM, the input model and tables are generated using the parameters defined in the file specified by --model. After every iteration, a restart file is written that stores the full history of the iterations. If the restart file exists when REM is launched, the code picks up the model from the last completed iteration and resumes from there.

Customized Optimizer

A customized optimizer can be developed in Python, in a file placed in the working directory. This file must define a class named Optimizer, as in the skeleton below:

class Optimizer:

    def __init__(self, **kwargs):

        # code to parse and initialize the parameters
        pass

    def run(self, params, dudl_ref, dudl_mean, dudl_var):

        # code that returns the model parameters for the next iteration
        pass

  • When using a customized optimizer provided in a Python file, e.g., custom.py, the file name without the .py extension should be the first segment of the option --optimizer:

    --optimizer custom,key=value,...
    
  • The other segments in the option --optimizer should be in key=value format, which will be converted to a Python dictionary provided as the kwargs argument for the constructor of the optimizer class.

  • The customized optimizer class must have a member function named run that accepts four arguments. Each argument is a Python dictionary whose keys are the names of the targeted models and whose values are as follows:

    1. params: an array of parameters for the model.

    2. dudl_ref: an array of the <dU/dL> derivatives calculated from the reference trajectory.

    3. dudl_mean: an array of the mean values of the derivatives calculated from the trial trajectories.

    4. dudl_var: an array of the variances of the derivatives calculated from the trial trajectories.

  • The function should return a Python dictionary containing the new parameters for targeted models, in which the keys are names of the models and the values are model parameters in NumPy arrays.
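The interface described in the list above can be exercised with toy data. The sketch below is a do-nothing optimizer that only illustrates the shapes of the inputs and of the return value. The model name `Pair_A-A` and all numbers are hypothetical, and the final call only mimics how an engine might invoke the class.

```python
import numpy as np


class Optimizer:
    """A placeholder optimizer that returns the parameters unchanged."""

    def __init__(self, **kwargs):
        # key=value segments of --optimizer arrive here as keyword arguments.
        self.options = kwargs

    def run(self, params, dudl_ref, dudl_mean, dudl_var):
        # Return a new dictionary: model name -> NumPy array of parameters.
        return {name: np.array(values, dtype=float)
                for name, values in params.items()}


# Toy inputs for a single hypothetical model with three parameters.
params = {"Pair_A-A": np.array([1.0, 2.0, 3.0])}
dudl_ref = {"Pair_A-A": np.array([0.10, 0.20, 0.30])}
dudl_mean = {"Pair_A-A": np.array([0.12, 0.18, 0.33])}
dudl_var = {"Pair_A-A": np.array([0.50, 0.40, 0.60])}

new_params = Optimizer(kappa="0.01").run(params, dudl_ref, dudl_mean, dudl_var)
```

All four inputs share the same keys, so a real optimizer can loop over any one of them and look up the matching arrays in the others.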

The example below demonstrates the following update scheme:

  1. Keep the first and last parameters fixed.

  2. Update the remaining parameters with a constant step-size factor kappa.

The customized script will be:

# file: my_opt.py
class Optimizer:

    def __init__(self, **kwargs):
        # parse and store the step-size factor
        self.kappa = float(kwargs.get('kappa', 0.01))

    def run(self, params, dudl_ref, dudl_mean, dudl_var):
        new_params = {}

        # loop over each model
        for name, dudl_aa in dudl_ref.items():
            # get the relative entropy information
            dudl_cg = dudl_mean[name]
            var = dudl_var[name]

            # compute the update, keeping the first and last parameters fixed
            step = self.kappa * (dudl_cg - dudl_aa) / var
            step[0] = 0.0
            step[-1] = 0.0
            new_params[name] = params[name] + step

        # return the new parameters, keyed by model name
        return new_params

This customized script can then be selected in the cgrem command with:

--optimizer my_opt,kappa=0.01

Examples

cgrem --ref model_ref.p --model model.txt \
      --cgderiv-arg cgderiv.sh --md md.inp \
      --restart restart --table ./ \
      --optimizer builtin,chi=0.5,t=298.15 \
      --maxiter 1000