DEMutilities.doe_samples

Design of experiments tools for setting up simulation studies with Mpacts.

This module requires scipy (https://www.scipy.org/) and pyDOE (https://pythonhosted.org/pyDOE/) to be installed.

Example usage:

import DEMutilities.postprocessing.h5storage as h5s
import DEMutilities.doe_samples as doe

f = h5s.H5Storage( "pstudy.h5", openmode='wa' )
section = f.data_section( "doe", overwrite=False)

sampler = doe.LatinHyperCubeDesign( section )

sampler.add_parameter( "params/p1", doe.Normal(0,0.5)
                     , unit='N', axis_label='$p_1$')
sampler.add_parameter( "params/p2", doe.Uniform( 0.5,1.5 )
                     , unit='kg', axis_label='$p_2$')
sampler.add_parameter( "params/p3", doe.ValueList([1,2,3])
                     , unit='-', axis_label='$p_3$')

sample_section = sampler.generate( 12 )
# At this point, you can open 'pstudy.h5' and verify that the samples were added correctly.

Hint

For more information on how to create a directory structure from an HDF5 data section with samples, see DEMutilities.multisim_layout.

To use this module, import it as follows:

import DEMutilities.doe_samples
# or assign it to a shorter name
import DEMutilities.doe_samples as doe

DesignBase

class DEMutilities.doe_samples.DesignBase(name, ds, dirkey='dir', dirbasename='sample')

Bases: object

Base class for different design of experiments functionalities. This class should not be instantiated directly.

DesignBase(name, parent, **kwargs)
add_parameter(name, sampling, **kwargs)
generate(samples)

DirectSamplingDesign

class DEMutilities.doe_samples.DirectSamplingDesign(ds, dirkey='dir', **kwargs)

Bases: DEMutilities.doe_samples.DesignBase

Sampling design that generates N samples based on the parameters that have been added. For every sample, the value generated for each parameter is simply the next value returned by its sampling distribution.

  • Example 1: If two parameters have been added with a ValueList distribution with v1=[0,1] and v2=[2,3], the sampling for N=2 will be s1=(0,2) and s2=(1,3).
  • Example 2: Independent random samples can be created with a DirectSamplingDesign by giving each parameter the desired random distribution. For example, if p1=Normal(0,1) and p2=Normal(0,2), the random samples will vary p1 and p2 independently, with values drawn from their respective normal distributions.
Parameters:
  • ds – DataSet entry in which the sampling will be saved
  • dirkey (str) – (optional) key that will link the generated samples to their simulation directory.
DirectSamplingDesign(name, parent, **kwargs)
add_parameter(name, sampling, **kwargs)

Add a parameter to the design with given name (full path) and sampling distribution.

Parameters:
  • name (str) – name (full path) of the parameter to add a sampling for.
  • sampling (Any distribution deriving from DesignBase) – Distribution from which the values of name will be chosen.
  • **kwargs – Keyword arguments that will be passed to the add_data() method of the hdf file when adding the parameter to the design section.
generate(N)

Based on the added parameters, make a direct sampling of N samples and add them to the design section of the provided data set entry.

Parameters: N (int) – Number of samples to generate. Warning: some distributions, such as ValueList, will give an error if more samples are requested than they contain.
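The pairing rule from Example 1 can be illustrated without Mpacts. The following is a minimal plain-Python sketch, not the module's implementation: direct_samples is a hypothetical helper, plain lists stand in for ValueList distributions, and raising ValueError for a list that is too short is an assumption that merely mimics the warning above.

```python
def direct_samples(parameters, n):
    """Pair the i-th value of every parameter into the i-th sample.

    `parameters` maps a parameter path to a list of values
    (a hypothetical stand-in for a ValueList distribution).
    """
    for name, values in parameters.items():
        if len(values) < n:
            # Assumption: mimic the documented error for too-short value lists.
            raise ValueError(f"{name!r} holds {len(values)} values, {n} requested")
    return [{name: values[i] for name, values in parameters.items()}
            for i in range(n)]


samples = direct_samples({"params/v1": [0, 1], "params/v2": [2, 3]}, 2)
print(samples)  # first sample pairs 0 with 2, second sample pairs 1 with 3
```

Requesting three samples from these two-element lists would raise the ValueError in this sketch.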

FullFactorialDesign

class DEMutilities.doe_samples.FullFactorialDesign(ds, dirkey='dir', **kwargs)

Bases: DEMutilities.doe_samples.DesignBase

Sampling design that produces the ‘full factorial’ layout, which combines all possible values of every given parameter. Hence, the total number of samples for p parameters will be N=N1*N2*...*Np. See Wikipedia.

Parameters:
  • ds – DataSet entry in which the sampling will be saved
  • dirkey (str) – (optional) key that will link the generated samples to their simulation directory.
FullFactorialDesign(name, parent, **kwargs)
add_full_repeat(name, sampling, **kwargs)
add_parameter(name, sampling, **kwargs)

Add a parameter to the design with given name (full path) and sampling distribution.

Parameters:
  • name (str) – name (full path) of the parameter to add a sampling for.
  • sampling (Any distribution deriving from DesignBase) – Distribution from which the values of name will be chosen. NOTE: for now, only the ValueList distribution is supported for full factorial designs. Other distributions would in theory work as well, as long as a valid N for that parameter is provided somewhere.
  • **kwargs – Keyword arguments that will be passed to the add_data() method of the hdf file when adding the parameter to the design section.
generate(**kwargs)

Based on p parameters, make a full factorial sampling of N=N1*N2*...*Np samples, and add them to the design section of the provided data set entry.

Parameters: **kwargs – Keyword arguments that will be passed to the method make_fullfactorial(), which performs the actual generation of the full factorial sampling.
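The sample count N=N1*N2*...*Np can be checked with the standard library alone. This is only an illustrative sketch built on itertools.product; the parameter paths and level values are made up, and nothing here calls the Mpacts API.

```python
from itertools import product
from math import prod

# Hypothetical parameter levels; each list plays the role of a ValueList.
levels = {
    "params/p1": [0.1, 0.2],          # N1 = 2
    "params/p2": [1, 2, 3],           # N2 = 3
    "params/p3": ["soft", "stiff"],   # N3 = 2
}

names = list(levels)
# Every combination of one level per parameter becomes one sample.
design = [dict(zip(names, combo)) for combo in product(*levels.values())]

expected = prod(len(values) for values in levels.values())
print(len(design), expected)  # both equal 2 * 3 * 2 = 12
```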

LatinHyperCubeDesign

class DEMutilities.doe_samples.LatinHyperCubeDesign(ds, dirkey='dir', **kwargs)

Bases: DEMutilities.doe_samples.DesignBase

Sampling design based on a latin hypercube design (see Wikipedia), wherein N samples are generated for each parameter based on the given distribution. Their multidimensional distribution is arranged in a ‘latin square’ or its higher-dimensional equivalent, the ‘latin hypercube’. This sampling design is powerful for sensitivity analyses and in-depth parameter studies, since it probes the multidimensional sampling space in a near-random way while still guaranteeing good ‘coverage’ of the whole parameter space.

Note

The generation of latin hypercube designs relies on the module pyDOE (https://pythonhosted.org/pyDOE/). On systems with pip (https://pypi.python.org/pypi/pip), it can be installed with:

sudo pip install --upgrade pyDOE
Parameters:
  • ds – DataSet entry in which the sampling will be saved
  • dirkey (str) – (optional) key that will link the generated samples to their simulation directory.
LatinHyperCubeDesign(name, parent, **kwargs)
add_full_repeat(name, sampling, **kwargs)
add_parameter(name, sampling, **kwargs)

Add a parameter to the design with given name (full path) and sampling distribution.

Parameters:
  • name (str) – name (full path) of the parameter to add a sampling for.
  • sampling (Any distribution deriving from DesignBase) – Distribution from which the values of name will be chosen.
  • **kwargs – Keyword arguments that will be passed to the add_data() method of the hdf file when adding the parameter to the design section.
generate(N, **kwargs)

Based on the added parameters, make a latin hypercube sampling of N samples and add them to the design section of the provided data set entry.

Parameters:
  • N (int) – Number of samples to generate. Warning: some distributions, such as ValueList, will give an error if more samples are requested than they contain.
  • **kwargs – Keyword arguments that will be passed to the method make_lhs(), which performs the actual generation of the latin hypercube sampling.
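The ‘coverage’ property of a latin hypercube can be shown with a small sketch on the unit hypercube. This is not the code used by the module, which delegates to make_lhs() and pyDOE; unit_lhs is a hypothetical helper written with the standard library only.

```python
import random


def unit_lhs(n_samples, n_params, rng):
    """Return n_samples points in [0, 1)^n_params, one per stratum per axis."""
    columns = []
    for _ in range(n_params):
        # One random point inside each of the n_samples equal-width strata.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)  # decouple the stratum order between parameters
        columns.append(column)
    # Transpose: one row per sample, one column per parameter.
    return [list(row) for row in zip(*columns)]


rng = random.Random(0)
points = unit_lhs(5, 2, rng)
for axis in range(2):
    strata = sorted(int(p[axis] * 5) for p in points)
    print(strata)  # every stratum index 0..4 appears exactly once per axis
```

Plain random sampling would give no such guarantee: several samples could fall in the same fifth of a parameter's range while another fifth stays empty.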
DEMutilities.doe_samples.add_samples_to_dataset(samples, ds, dirkey)

Add the provided samples to a given data set entry.

DEMutilities.doe_samples.add_values_to_dataset(samples, ds, dirkey, design)

In a given data set entry, add the values of each parameter in samples as a separate data entry.

DEMutilities.doe_samples.make_fullfactorial(cps, dirkey='dir', dirbasename='sample')

Returns a list of dictionaries with the same keys as ‘cps’, whose values form a full factorial design of the values given in cps. Each value in cps can be a list of arbitrary length; its length sets the number of levels for that factor.

Strings as dictionary values will be interpreted as fully specified dictionary keys whose values have to be taken over; e.g. to use the same Young’s modulus in all contact models of the same particle.

Parameters:
  • cps (dict) – dictionary of parameters to be varied.
  • dirkey (str) – This key will be added to the resulting dictionaries with a value of a possible directory name.
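The handling of string values and of dirkey described above might look roughly as follows. This is a hypothetical re-creation for illustration, not the actual make_fullfactorial() code: the function name full_factorial_sketch, the example keys, and the directory naming scheme '<dirbasename>_<index>' are all assumptions.

```python
from itertools import product


def full_factorial_sketch(cps, dirkey="dir", dirbasename="sample"):
    """Hypothetical re-creation of the documented behaviour, not the real code."""
    varied = {k: v for k, v in cps.items() if not isinstance(v, str)}
    linked = {k: v for k, v in cps.items() if isinstance(v, str)}
    design = []
    for index, combo in enumerate(product(*varied.values())):
        sample = dict(zip(varied, combo))
        # A string value names another key whose value is taken over.
        for key, source in linked.items():
            sample[key] = sample[source]
        # Assumption: directory names follow '<dirbasename>_<index>'.
        sample[dirkey] = f"{dirbasename}_{index}"
        design.append(sample)
    return design


cps = {
    "particle/contact_a/E": [1e6, 2e6],
    "particle/contact_b/E": "particle/contact_a/E",  # follows contact_a
    "particle/radius": [0.5, 1.0, 1.5],
}
design = full_factorial_sketch(cps)
print(len(design))  # 2 * 3 = 6 samples; the linked key adds no extra levels
```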
DEMutilities.doe_samples.make_lhs(cps, N, distribution='uniform', dirkey='dir', dirbasename='sample', criterion=None, iterations=None)

Returns a list of dictionaries with the same keys as ‘cps’, whose values form a latin hypercube design of the value ranges given in cps, which must be lists of length 2. The design will contain a maximum total of ‘N’ samples. Currently, uniform and normal distributions are supported; for ‘uniform’ the values of cps will be interpreted as [min,max], for ‘normal’ as [mean,std].

Strings as dictionary values will be interpreted as fully specified dictionary keys whose values have to be taken over; e.g. to use the same Young’s modulus in all contact models of the same particle.

Parameters:
  • cps (dict) – dictionary of parameters to be varied.
  • N (int) – number of samples
  • distribution – Distribution from which the values will be sampled (either ‘normal’ or ‘uniform’, or something deriving from DistributionBase that provides a valid ppf() method).
  • dirkey (str) – this key will be added to the resulting dictionaries with a value of a possible directory name.
  • criterion (str) –

    a string that tells lhs how to sample the points. Options (see also https://pythonhosted.org/pyDOE/randomized.html#latin-hypercube) include:

    • None: (default) simply randomizes the points within the intervals
    • "center" or "c": center the points within the sampling intervals
    • "maximin" or "m": maximize the minimum distance between points, but place the point in a randomized location within its interval
    • "centermaximin" or "cm": same as maximin, but centered within the intervals
    • "correlation" or "corr": minimize the maximum correlation coefficient
  • iterations (int) – number of iterations to improve criterion.
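How a two-element range in cps might be turned into parameter values can be sketched with the standard library. The example assumes that points on the unit interval are mapped through the inverse CDF (the ppf), which is what the mention of a ppf() method suggests; centered_unit_points and map_range are hypothetical helpers, and the per-parameter shuffling of a real latin hypercube is left out for brevity.

```python
from statistics import NormalDist


def centered_unit_points(n):
    """Midpoints of n equal strata in [0, 1], as the "center" criterion places them."""
    return [(i + 0.5) / n for i in range(n)]


def map_range(u, values, distribution="uniform"):
    """Map a unit-interval point through the inverse CDF (ppf)."""
    if distribution == "uniform":
        low, high = values            # interpreted as [min, max]
        return low + u * (high - low)
    if distribution == "normal":
        mean, std = values            # interpreted as [mean, std]
        return NormalDist(mean, std).inv_cdf(u)
    raise ValueError(f"unsupported distribution: {distribution}")


u = centered_unit_points(4)  # [0.125, 0.375, 0.625, 0.875]
uniform_values = [map_range(x, [0.5, 1.5]) for x in u]
normal_values = [map_range(x, [0.0, 0.5], "normal") for x in u]
print(uniform_values)  # [0.625, 0.875, 1.125, 1.375]
print(normal_values)   # symmetric around the mean 0.0
```

With criterion=None the points would instead be placed at random positions inside each stratum, so repeated runs give different values within the same intervals.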