DEMutilities.postprocessing.operators.operators

Module that provides postprocessing functionality through a common container class, to which post-processing ‘operators’ can be added; these operators can perform arbitrary user-defined operations. In general, the operators rely strongly on, and function similarly to, array operations in numpy (http://www.numpy.org/), which must be installed.

Attention

An operator is NOT executed when it is created. Execution happens only when its containing AnalysisContainer runs its loop function, which consecutively executes all its operators in the order in which they were added. An inevitable consequence is that numerical errors are raised with a delay, which can hinder debugging. However, we try to provide the original operator instantiation backtrace when an error is raised. The main benefit of this approach is that the main loop of a postprocessing analysis (which can run over simulation frames, but also over data set entries or any other compatible object) is decoupled from the actual analysis, enabling modular and shared postprocessing operations and preventing unnecessary read/write operations.
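The deferred execution described above can be illustrated with a minimal sketch. ToyContainer below is a hypothetical stand-in written for this illustration only; it is not the AnalysisContainer implementation. It shows that registering an operation computes nothing, and that an error surfaces only once the loop runs.

```python
class ToyContainer:
    """Hypothetical stand-in for a container; not the DEMutilities class."""

    def __init__(self):
        self.operators = []

    def add(self, func):
        # Only register the operation; nothing is computed yet.
        self.operators.append(func)

    def loop(self, frames):
        results = []
        for frame in frames:
            # Operators run here, in the order in which they were added.
            for func in self.operators:
                results.append(func(frame))
        return results


container = ToyContainer()
container.add(lambda frame: frame["time"] * 2.0)
container.add(lambda frame: 1.0 / frame["time"])  # faulty for time == 0

frames = [{"time": 1.0}, {"time": 0.0}]
try:
    container.loop(frames)
    error_raised_in_loop = False
except ZeroDivisionError:
    # The division by zero shows up here, not where the operator was added.
    error_raised_in_loop = True
```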

Hint

To make operators easier to use, we have overloaded some standard Python methods on them, so that manipulating the underlying data becomes more intuitive and less cumbersome. Using these invokes Function operators with the corresponding functions on the underlying data. The following operations are allowed:

  • Mathematical operations: +, -, *, /, **, abs (add, subtract, multiply, divide, power, absolute value)
  • Comparisons: <, <=, >, >=, == (less than, less than or equal, greater than, greater than or equal, equal)
  • Boolean operations: &, |, ^, ~ (and, or, xor, not)
  • Class method size() is equivalent to calling len() on the underlying data

Attention

Standard Python boolean operations (and, or, not) should NOT be used on operators. Python itself requires these to evaluate the truth value of their operands, and they cannot be overloaded, so they cannot produce a new operator. Instead, we use the ‘bit-wise’ alternatives. In other words, do NOT do this:

mask = ( r > 0 ) and ( r < 1)
inverse = (not mask)

but do this:

mask = ( r > 0 ) & ( r < 1)
inverse = ~mask

to prevent unexpected behavior.
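The pitfall can be reproduced without the library. In the sketch below, ToyMask is a hypothetical class that only records which conditions it combines; it is not part of DEMutilities. Because `and` cannot be overloaded, Python tests the truth value of the left operand and returns the right one, silently dropping the first condition.

```python
class ToyMask:
    """Hypothetical mask object that records which conditions it combines."""

    def __init__(self, conditions):
        self.conditions = tuple(conditions)

    def __and__(self, other):
        # '&' can be overloaded, so both conditions are kept.
        return ToyMask(self.conditions + other.conditions)


lower = ToyMask(["r > 0"])
upper = ToyMask(["r < 1"])

# 'and' checks the truth value of 'lower' (True for a plain object)
# and then simply returns 'upper': the first condition is lost.
wrong = lower and upper

# '&' calls ToyMask.__and__ and combines both conditions.
right = lower & upper
```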

Hint

The access operator [] has the same functionality as a Filter. When a mask is used as an argument, it ‘applies’ the mask to the given operator, e.g. r = r[ r < 0.5 ]. This is the same behavior as for np.ndarray arrays. The operator [] can also be used for regular access and slicing, e.g. r = r[0] or r = r[10:100:2].
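The three kinds of access mentioned above (mask, single index, slice) can be sketched on a plain Python list. The helper apply_index is hypothetical and only mimics the numpy-like behavior for illustration; the library itself works on numpy arrays.

```python
def apply_index(values, index):
    """Apply an int, a slice, or a boolean mask to a list (toy helper)."""
    if isinstance(index, (int, slice)):
        # Regular access and slicing.
        return values[index]
    # Otherwise treat 'index' as a boolean mask of the same length.
    if len(index) != len(values):
        raise ValueError("mask and data must have equal length")
    return [value for value, keep in zip(values, index) if keep]


r = [0.1, 0.7, 0.3, 0.9]
mask = [value < 0.5 for value in r]   # analogous to r < 0.5

masked = apply_index(r, mask)             # analogous to r[ r < 0.5 ]
first = apply_index(r, 0)                 # analogous to r[0]
strided = apply_index(r, slice(0, 4, 2))  # analogous to r[0:4:2]
```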

Example usage:

import DEMutilities.postprocessing.operators.operators as ops
import mpacts.io.datasave as ds
import DEMutilities.numeric_functions as nf
import numpy as np

#Make an analysis container
a = ops.AnalysisContainer()

#Any 'reader'-compatible class can be used! Here we use a DataReader that
#reads mpacts simulation frames.
dr = ds.DataReader( "simulation", folder='./' )

#Getter for 'time' from a simulation frame:
ti = a.GetData('time')

#Make a mask, selecting only times > 1.0
time_mask = ti > 1.0

#Recorder that appends the time from 'ti' to a list during a loop over data frames:
time = a.Recorder( ti )

#Print the frame index during the analysis:
a.FrameIndexPrinter(dr[-1].index)

#Magnitude of the spheres' velocity array 'v'
vmag = a.Function( nf.mag, a.GetData('spheres/v'))
#Take the mean and convert the velocity to km/h.
#Note that basic mathematical operations can be performed directly on
#operators, e.g. multiply with 3.6.
vmean = a.Function( np.mean, vmag, 0)*3.6

#Actually perform the analysis, by looping over the data frames in dr
a.loop( dr, mask_function = time_mask)

#The call operator () gives access to the computed data.
#For example, here we computed the mean velocity magnitude as a function of time.
output_data = ( time(), vmean() )

To use this module, import it like this:

import DEMutilities.postprocessing.operators.operators
#or assign it to a shorter name
import DEMutilities.postprocessing.operators.operators as ope

Accumulator

class DEMutilities.postprocessing.operators.operators.Accumulator(anset, data, mask=True, average=False, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that internally keeps a value and, on each execution, adds to it the value obtained by calling () on a given data object.

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • data (OperatorBase) – Operator that returns data using the __call__ method which will be called in the execute method of this operator
  • mask (Either an instance of OperatorBase or a boolean) – (optional). Mask that determines whether the operator adds its result during the current call of execute.
  • average (bool) – When True, the value returned by () will be divided by the number of accumulations.
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
Accumulator(name, parent, **kwargs)
deep_execute(datasource)
execute(datasource)

Internally adds the data obtained by calling the __call__ method of the ‘data’ member of this class.

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader
reset()

Resets the accumulator so that its internal value is set to 0.
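The interplay of mask, average and reset() can be sketched as follows. ToyAccumulator is a hypothetical, simplified class written for this illustration; the real Accumulator obtains its values from another operator, whereas the toy version receives them directly.

```python
class ToyAccumulator:
    """Hypothetical accumulator sketch; not the DEMutilities implementation."""

    def __init__(self, average=False):
        self.average = average
        self.reset()

    def reset(self):
        # Set the internal value back to 0 so a new loop starts fresh.
        self.total = 0.0
        self.count = 0

    def execute(self, value, mask=True):
        # Only accumulate when the mask allows it for this iteration.
        if mask:
            self.total += value
            self.count += 1

    def __call__(self):
        if self.average and self.count > 0:
            return self.total / self.count
        return self.total


acc = ToyAccumulator(average=True)
for value, keep in [(2.0, True), (100.0, False), (4.0, True)]:
    acc.execute(value, mask=keep)
mean_value = acc()   # the masked-out value 100.0 is ignored

acc.reset()
value_after_reset = acc()
```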

Action

class DEMutilities.postprocessing.operators.operators.Action(anset, operator, *args, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that evaluates any given function and ignores the resulting return value. This is the only difference compared to Function.

Arguments:

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • operator (func) – Any Python method
  • *args – Arguments that will be passed to the given operator
Key word arguments specific for Action itself:
  • print_first (bool): If True, the function arguments will be printed before evaluation. This is very useful for debugging if something goes wrong in the operator chain.
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.

Action(name, parent, **kwargs)
deep_execute(datasource)

Deep execution of all this operator’s dependent operators, followed by execution of this operator

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader
execute(datasource)

Evaluates the contained function, passing it the contained arguments

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader

AnalysisContainer

class DEMutilities.postprocessing.operators.operators.AnalysisContainer

Bases: DEMutilities.postprocessing.operators.operators.BaseContainer

Main container which keeps a set of operators that can be executed with execute and provides functionality to loop over this set for different ‘states’ (e.g. a simulation data frame) automatically.

AnalysisContainer(name, parent, **kwargs)
Accumulator(*args, **kwargs)
Action(*args, **kwargs)
Comparison(*args, **kwargs)
Counter(*args, **kwargs)
DummyOperator(*args, **kwargs)
Filter(*args, **kwargs)
ForEach(*args, **kwargs)
FrameIndexPrinter(*args, **kwargs)
Function(*args, **kwargs)
GetData(*args, **kwargs)
HasData(*args, **kwargs)
class PostContainer(p)

Bases: DEMutilities.postprocessing.operators.operators.BaseContainer

add(op)
AnalysisContainer.Printer(*args, **kwargs)
AnalysisContainer.Recorder(*args, **kwargs)
AnalysisContainer.StandardDeviationEstimator(*args, **kwargs)
AnalysisContainer.add(op)
AnalysisContainer.after_loop()
AnalysisContainer.execute(data=None)

Executes the contained list of operators with a given data state (e.g. a simulation data frame)

Parameters:data – default None, a data state instance which will be passed to the underlying operators
AnalysisContainer.has_op_with_name(name)
AnalysisContainer.loop(data, mask_function=<DEMutilities.postprocessing.operators.operators.DummyOperator object>)

Loops over an iterable collection of data states and performs ‘execute’ for each iteration

Parameters:
  • data – An iterable object for which the elements can be passed to the ‘execute’ method. For example, a DataReader instance to analyze one simulation.
  • mask_function (OperatorBase) – An optional operator which will be used to perform steps of the iteration only conditionally. The return value of the object when called with the () operator should be convertible to a bool that determines whether for the given iteration, ‘execute’ should be called. If not specified, every iteration will be executed.
AnalysisContainer.post_execute()
AnalysisContainer.reset()

BaseContainer

class DEMutilities.postprocessing.operators.operators.BaseContainer

Bases: object

BaseContainer(name, parent, **kwargs)

Comparison

class DEMutilities.postprocessing.operators.operators.Comparison(anset, operator, init_value, *args, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that compares the current value with a previously stored value using a custom comparison function. The first value it encounters is stored regardless. We start with a given init_value.

Comparison(name, parent, **kwargs)
deep_execute(datasource)

Deep execution of all this operator’s dependent operators, followed by execution of this operator

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader
execute(datasource)

Evaluates the comparison
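One plausible reading of this operator is a running extreme value, such as a maximum over all frames. The sketch below rests on an assumption that the source does not confirm: the stored value is replaced whenever the comparison function returns True for (current, stored). The function running_extreme is hypothetical.

```python
import operator


def running_extreme(values, compare, init_value):
    """Toy sketch: keep the value that 'wins' the comparison so far.

    Assumption: the stored value is replaced when compare(current, stored)
    is True. The first value encountered is stored regardless.
    """
    stored = init_value
    first = True
    for value in values:
        if first or compare(value, stored):
            stored = value
        first = False
    return stored


largest = running_extreme([3, 7, 5], operator.gt, 0)
# With init_value 0, a naive minimum would return 0; because the first
# value is stored regardless, the result comes from the data instead.
smallest = running_extreme([3, 7, 5], operator.lt, 0)
```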

Counter

class DEMutilities.postprocessing.operators.operators.Counter(anset, start_index=-1, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Simple ‘counter’ that keeps track of the number of times it has been executed.

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • start_index (int) – Value to initialize the count with. The default value, -1, ensures 0-based array indexing and is usually recommended.
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
Counter(name, parent, **kwargs)
execute(datasource)

Increments the count member by one

Parameters:datasource (DataFrameReader) – A DataFrameReader object with a valid ‘DataFrameIndex’ member
reset()

Resets the counter to given start_index
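Why the default start_index of -1 yields 0-based indexing can be seen in a small sketch. ToyCounter is a hypothetical class for illustration only: since the count is incremented on every execution, the first executed iteration sees the value 0.

```python
class ToyCounter:
    """Hypothetical counter sketch; not the DEMutilities implementation."""

    def __init__(self, start_index=-1):
        self.start_index = start_index
        self.reset()

    def reset(self):
        # Return to the given start index.
        self.count = self.start_index

    def execute(self):
        # Increment the count by one on each execution.
        self.count += 1


counter = ToyCounter()
data = ["a", "b", "c"]
seen = []
for _ in data:
    counter.execute()
    # After the first execute() the count is 0, a valid first index.
    seen.append(data[counter.count])

final_count = counter.count
counter.reset()
```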

DummyOperator

class DEMutilities.postprocessing.operators.operators.DummyOperator(anset=None, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Dummy Operator which does nothing in its execute and always returns True when being called.

Parameters:anset – (optional) The analysis container in which the Operator will be added.
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
DummyOperator(name, parent, **kwargs)
deep_execute(datasource)
execute(datasource)

Filter

class DEMutilities.postprocessing.operators.operators.Filter(anset, array, mask, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that filters a given array with a given ‘mask’

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • array (OperatorBase) – Operator on the results of which the ‘mask’ will be applied
  • mask (Either an instance of OperatorBase or any array access object (int/slice/np.ndarray)) – Mask or index that is applied to the results of ‘array’.
Key word arguments specific for Filter itself:
  • print_first (bool): If True, the function arguments will be printed before evaluation. This is very useful for debugging if something goes wrong in the operator chain.
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
Filter(name, parent, **kwargs)
deep_execute(datasource)

Deep execution of all this operator’s dependent operators, followed by execution of this operator

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader
execute(datasource)

Filters the given array with the given mask

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader

ForEach

class DEMutilities.postprocessing.operators.operators.ForEach(anset, operator, arrays, *args, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that loops over given data and for each iteration, calls a given function on the entries of the data

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • operator – Any Python method
  • arrays (list) – A list of arrays (either lists/np.ndarrays or Operators that return arrays) over which the operator will loop and evaluate the function on each argument. The provided arrays should all be of equal length.
  • *args – Arguments that will be passed to the given operator

Key word arguments passed to the base class OperatorBase:

  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
ForEach(name, parent, **kwargs)
deep_execute(datasource)

Deep execution of all this operator’s dependent operators, followed by execution of this operator

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader
execute(datasource)

Loops over the contained arrays, and evaluates the contained function, passing it the contained arguments

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader
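The element-wise looping of ForEach resembles zipping the arrays and calling the function on each tuple of entries. The sketch below is a hypothetical helper, and the argument order (array entries first, then the extra arguments) is an assumption that the source does not state.

```python
def for_each(function, arrays, *args):
    """Toy sketch: call 'function' on the i-th entries of all arrays.

    Assumption: entries are passed first, followed by the extra arguments.
    """
    lengths = {len(array) for array in arrays}
    if len(lengths) > 1:
        # The provided arrays should all be of equal length.
        raise ValueError("all arrays must have equal length")
    return [function(*entries, *args) for entries in zip(*arrays)]


positions = [1.0, 2.0, 3.0]
velocities = [0.5, 0.5, 2.0]
time_step = 2.0

# Advance every position by velocity * time_step.
advanced = for_each(lambda x, v, dt: x + v * dt,
                    [positions, velocities], time_step)
```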

FrameIndexPrinter

class DEMutilities.postprocessing.operators.operators.FrameIndexPrinter(anset, max_index=-1, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that prints the frame index of the current ‘loop iteration’ of the analysis’ main loop

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • max_index (int) – the maximal index of the data which will be looped over
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
FrameIndexPrinter(name, parent, **kwargs)
execute(datasource)

Print the DataFrameIndex member from the given datasource object

Parameters:datasource (DataFrameReader) – A DataFrameReader object with a valid DataFrameIndex member

Function

class DEMutilities.postprocessing.operators.operators.Function(anset, operator, *args, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that evaluates any given function and stores the result as a member

Arguments:

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • operator (func) – Any Python method
  • *args – Arguments that will be passed to the given operator
Key word arguments specific for Function itself:
  • print_first (bool): If True, the function arguments will be printed before evaluation. This is very useful for debugging if something goes wrong in the operator chain.
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
Function(name, parent, **kwargs)
deep_execute(datasource)

Deep execution of all this operator’s dependent operators, followed by execution of this operator

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader
execute(datasource)

Evaluates the contained function, passing it the contained arguments

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader

GetData

class DEMutilities.postprocessing.operators.operators.GetData(anset, path, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that fetches data from a DataFrameReader object or any equivalent interface

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • path (str) – Relative path in the data source object that should be accessed. For example, the simulation time from a DataFrameReader object is in the path './time' whereas positions of a particle container called spheres will be located in 'spheres/x'
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
GetData(name, parent, **kwargs)
execute(datasource)

Fetches the required data from a given data source object

Note

Any data from datasource that is of type list will be converted to a numpy.ndarray before being stored internally.

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader in which the given ‘path’ will be searched.

HasData

class DEMutilities.postprocessing.operators.operators.HasData(anset, path, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that checks whether data exist in a DataFrameReader object or any equivalent interface

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • path (str) – Relative path in the data source object that should be searched.
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
HasData(name, parent, **kwargs)
execute(datasource)

Checks whether the provided path string exists in a given data source object

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader in which the given ‘path’ will be searched.
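The path-based lookup used by GetData and HasData can be sketched on a nested dictionary. This is only an analogy: the helpers get_data and has_data and the dictionary layout are hypothetical, and a real DataFrameReader may resolve paths differently.

```python
def get_data(source, path):
    """Toy sketch: follow a '/'-separated path through nested dicts."""
    node = source
    for key in path.split("/"):
        if key in ("", "."):
            # Skip the leading './' of a relative path.
            continue
        node = node[key]
    return node


def has_data(source, path):
    """Toy sketch: report whether the path can be resolved."""
    try:
        get_data(source, path)
    except (KeyError, TypeError):
        return False
    return True


frame = {"time": 1.5, "spheres": {"x": [0.0, 1.0]}}

time_value = get_data(frame, "./time")
x_values = get_data(frame, "spheres/x")
has_velocity = has_data(frame, "spheres/v")
```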

OperatorBase

class DEMutilities.postprocessing.operators.operators.OperatorBase(anset, kwargs={})

Bases: object

Base class for operators that fit in an analysis container

Parameters:
  • anset – The analysis container in which the Operator will be added.
  • kwargs (dict) – Dictionary containing key word arguments possibly provided by its derived classes

Note

Parameter kwargs should be provided as a real dictionary and not as unpacked key word arguments.

OperatorBase(name, parent, **kwargs)
deep_execute(datasource)
execute(datasource)
get_debug_info()
reset()

Any derived operator that has a persistent state (e.g. a recorder or an accumulator) must override this method so that it re-initializes its state when it is called. This allows subsequent evaluations of loop() (for example with different data frame parameters) that are independent of each other.

size()

Printer

class DEMutilities.postprocessing.operators.operators.Printer(anset, data, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that prints its member data during a loop. Useful to monitor the progress or for debugging purposes.

Printer(name, parent, **kwargs)
deep_execute(datasource)
execute(datasource)

Recorder

class DEMutilities.postprocessing.operators.operators.Recorder(anset, data, mask=True, grow_method='append', **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that internally records the values obtained by calling () on a given data object

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • data (OperatorBase) – Operator that returns data using the __call__() method which will be called in the execute method of this operator
  • mask (Either an instance of OperatorBase or a boolean) – (optional). Mask that determines whether the operator appends its result during the current call of execute.
  • grow_method (str) – (optional). If 'extend', the recorded result will be one list that is extended each iteration; if 'append', the recorded result will be a list to which elements are appended. The former typically produces one list of array elements, the latter typically produces a list of lists. If 'set', results will be added to a set that is returned as a sorted numpy array. Beware that in the case of 'set', the iteration order is not conserved!
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.
Recorder(name, parent, **kwargs)
deep_execute(datasource)
execute(datasource)

Internally records the data returned by calling the __call__ method of the ‘data’ member of this class.

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader
grow_append(data)
grow_extend(data)
grow_set(data)
reset()

Resets the recorder instance so that its recorder list is set to [].
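The difference between the three grow methods can be shown with plain lists. The function record below is a hypothetical sketch, not the Recorder implementation; in particular it returns a sorted list for 'set', whereas the documentation above states that the library returns a sorted numpy array.

```python
def record(frames, grow_method="append"):
    """Toy sketch of the 'append', 'extend' and 'set' grow methods."""
    if grow_method == "set":
        seen = set()
        for frame in frames:
            seen.update(frame)
        # Sorted output: the original iteration order is not conserved.
        return sorted(seen)

    recorded = []
    for frame in frames:
        if grow_method == "append":
            recorded.append(frame)   # list of lists
        elif grow_method == "extend":
            recorded.extend(frame)   # one flat list
        else:
            raise ValueError("unknown grow_method: %r" % grow_method)
    return recorded


frames = [[3, 1], [2, 3]]
appended = record(frames, "append")
extended = record(frames, "extend")
unique = record(frames, "set")
```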

SegmentedAnalysisReader

class DEMutilities.postprocessing.operators.operators.SegmentedAnalysisReader(start_idx, end_idx, segment_size, increment=1)

Bases: object

Special purpose data reader class that treats AnalysisContainer instances as data frames from which data can be accessed and manipulated using standard operators (OperatorBase). SegmentedAnalysisReader itself calls the loop function of each data frame and advances a slice each time it is iterated over. In this way, it can subdivide a large analysis into chunks/batches/segments of limited and manageable size and concatenate the results later in an outer loop, thereby avoiding the large memory consumption of e.g. trajectory analysis of a large number of particles over a large number of steps.

inner_loop() MUST be called first, with the arguments one would normally give to the inner analysis directly (if not, an error will be thrown).

Parameters:
  • start_idx (int) – Start index of the slices that will be set on the inner reader
  • end_idx (int) – End index of the slices that will be set on the inner reader
  • segment_size (int) – Batch/segment size; the length of each fetched data segment by the inner loop
  • increment (int) – (optional) increment of the slicing.

Example usage:

import DEMutilities.postprocessing.operators.operators as ops
import mpacts.io.datasave as ds

#AnalysisContainer of the inner loop:
a_inner = ops.AnalysisContainer()
#Data reader of the inner loop:
dr_inner = ds.DataReader("simulation", folder='./')

#Let's imagine that 'somedata' can be quite big, and 'someanalysis' must record it over many frames.
#The data size would get huge in memory if we just followed the standard procedure.
#('my_custom_function' stands for any user-defined function.)
somedata = ops.GetData(a_inner, 'path/to/some/data')
#Note how the name will be used for access later!
someanalysis = ops.Recorder(a_inner, ops.Function(a_inner, my_custom_function, somedata), name='analysis')

#Instead of calling loop now on a_inner, we make an outer data reader:
a_outer = ops.AnalysisContainer()
#Note that one MUST know the size MY_KNOWN_SIZE in advance, since there is no safe way to derive it.
#This is the user's responsibility. If there is a managing hdf5 file, one can typically extract it
#safely from that file. The inner data frames are not a reliable source for this...
batch_size = 1000 #batches of 1000 particles each
dr_outer = ops.SegmentedAnalysisReader( 0, MY_KNOWN_SIZE, batch_size )
dr_outer.inner_loop( a_inner, dr_inner )

mydata = ops.GetData( a_outer, 'analysis' )
#By using grow_method='extend', we will not even notice the difference
result = ops.Recorder( a_outer, mydata, grow_method='extend' )
a_outer.loop( dr_outer )
print( result() )
SegmentedAnalysisReader(name, parent, **kwargs)
advance_slice()

Advances the current slice with a block of length segment_size

inner_loop(anset, reader, **kwargs)

Set the inner loop with an inner AnalysisContainer and an inner DataFrameReader

Parameters:
  • anset (AnalysisContainer) – Inner analysis container
  • reader (Any reader that offers a set_slice method (typically DataReader)) – Inner DataFrameReader
  • kwargs – Key word argument passed to the loop method of anset upon iteration of the outer reader
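How a range of indices might be cut into segments can be sketched with plain slices. The generator segment_slices is hypothetical, and the way increment is combined with the segment boundaries is an assumption; the source does not specify it.

```python
def segment_slices(start_idx, end_idx, segment_size, increment=1):
    """Toy sketch: yield consecutive slices covering [start_idx, end_idx).

    Assumption: each segment spans 'segment_size' indices and 'increment'
    is used as the slice step within a segment.
    """
    for begin in range(start_idx, end_idx, segment_size):
        stop = min(begin + segment_size, end_idx)
        yield slice(begin, stop, increment)


data = list(range(10))

# Process the data in batches of at most 4 entries each.
chunks = [data[segment] for segment in segment_slices(0, 10, 4)]

# Concatenating the batches recovers the full data set.
recombined = [entry for chunk in chunks for entry in chunk]
```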

StandardDeviationEstimator

class DEMutilities.postprocessing.operators.operators.StandardDeviationEstimator(anset, data, mask=True, weight_data=None, frames_ind_obs=1, **kwargs)

Bases: DEMutilities.postprocessing.operators.operators.OperatorBase

Operator that internally keeps running values and, on each execution, updates them with the values obtained by calling () on a given data object, so that a standard deviation can be estimated.

Parameters:
  • anset (AnalysisContainer) – The analysis container in which this Operator will be added.
  • data (OperatorBase) – Operator that returns data using the __call__ method which will be called in the execute method of this operator
  • mask (Either an instance of OperatorBase or a boolean) – (optional). Mask that determines whether the operator adds its result during the current call of execute.
  • weight_data – (optional). Another data set that will be used to weight the evaluated data. The weighting formulas can be found at https://en.wikipedia.org/wiki/Weighted_arithmetic_mean (Weighted sample covariance - Reliability weights).
  • frames_ind_obs – Number of frames until the next independent observation.
Key word arguments passed to the base class OperatorBase:
  • name (str): If given, this operator will be added to the AnalysisContainer with this name. If not given, the unique Python object id() will be used instead.

StandardDeviationEstimator(name, parent, **kwargs)
deep_execute(datasource)
execute(datasource)

Internally adds the data obtained by calling the __call__ method of the ‘data’ member of this class.

Parameters:datasource (DataFrameReader) – An instance (or any object with an equivalent interface) of a DataFrameReader
reset()

Resets the StandardDeviationEstimator so that all its internal values are set to 0.
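The reliability-weight formula referenced above can be written out for a single batch of data. The function weighted_std is a hypothetical sketch of that textbook formula only; it does not model how the library accumulates values over frames, nor the effect of frames_ind_obs.

```python
import math


def weighted_std(values, weights):
    """Unbiased weighted standard deviation with reliability weights."""
    v1 = sum(weights)                      # sum of weights
    v2 = sum(w * w for w in weights)       # sum of squared weights
    mean = sum(w * x for w, x in zip(weights, values)) / v1
    scatter = sum(w * (x - mean) ** 2 for w, x in zip(weights, values))
    # For equal weights the denominator reduces to the usual (n - 1) factor.
    return math.sqrt(scatter / (v1 - v2 / v1))


values = [1.0, 2.0, 3.0, 4.0]
equal = weighted_std(values, [1.0, 1.0, 1.0, 1.0])
# Scaling all weights by a constant does not change the estimate.
scaled = weighted_std(values, [2.0, 2.0, 2.0, 2.0])
```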