Integrating Multiple Analytic Modules Around the Operational Intelligence Platform (OIP)

Confidential draft – Do not distribute

The case of water distribution

Claude Le Pape, Alfredo Samperio, Gratien Bonvin

Draft, June 2, 2014

The investigation of analytic problems related to water distribution as part of the Arrowhead project and the examination of an exploratory use case presented by a customer of Schneider Electric strongly suggest the need to integrate multiple “analytic” modules around a common data basis.

In parallel, the ongoing development of the OI Platform (to manage in a single, unified manner the types of data often encountered in Schneider Electric, starting with time series, and to gather analytic services) suggests that such a common data basis should be compatible or interfaced with the OI Platform.

In this document, we propose a relational data model inspired by (i) the EPANET water distribution standard, (ii) the OI platform, and (iii) other elements from the Arrowhead project which we believe could be generalized and linked once and for all to the OI platform. We also describe various analytic problems which we believe could be addressed from this basis, using specific software already available within Schneider Electric (e.g., hydraulic simulation and optimization), more generic software under development within Schneider Electric (e.g., for demand prediction), or software that could be available from external partners (e.g., from Artelys for planning and demand-response management). Part of the interest of facilitating the integration of analytic components around a common basis consists in enabling an easier evaluation of external components, in comparison or in complement to our current offer.

Let us note that we focus here on offline analytics, i.e., analytics aimed at planning actions in advance. These shall be complemented with real-time control analytics (e.g., from SpecificEnergy for real-time pump selection), which we will not consider in the present document.

In its current version, this document is intended as a draft aimed at triggering discussion, in order to decide how to go further. Comments and suggestions for improvement are most welcome.

1. Proposed Data Model

We describe a proposed data model in relational form. As much as possible, we have tried to stick to the concepts of the EPANET model. We have allowed ourselves to deviate from EPANET whenever there was a clear advantage in doing it, either to better represent elements of the exploratory use case alluded to above, or to adopt concepts used in the OI Platform and the Arrowhead project.[1]

1.1. Network Nodes: Junctions and Tanks

A network in EPANET is described in terms of Nodes and Links.

We focus on two types of Nodes, i.e., Junctions and Tanks.

For both types, the coordinates of the node are generally useful, for analytic calculations and/or to enable some geometrical display of the network. The coordinates relate to an arbitrary origin and are expressed in meters. At this point, to keep things simple, we ignore the impact of the height of the water in a tank and use the altitude of the tank as the only relevant height parameter.

In both cases, we have allowed a DemandPattern to be attached to the node.[2] The DemandPattern describes either a history or a prediction of how much water leaves the node over time, in general to serve end customers. Predicting the DemandPattern of a given node is one of the main analytic functions we consider.

For optimization concerns, it might also be useful to associate to a source node a function describing the cost of a cubic meter of water at this node. This will be done through the association of a WaterTariff object to the node. When no water can be produced at a given node (or injected from a non-described node) the WaterTariff attribute is null.

Unlike junctions, tanks are places in which water can be stored. The minimal data needed to manage this storage include the maximal volume of water storable in the tank, the minimal volume that shall remain in the tank at all times (by default, 0), and the current (initial) volume.

In the relational model we propose, the JUNCTION table includes the following columns:

A JUNCTION_ID which unambiguously identifies the junction.
The X_COORDINATE of the junction.
The Y_COORDINATE of the junction.
The Z_COORDINATE (altitude) of the junction.
An optional DEMAND_PATTERN_ID identifying a demand pattern for the junction. When no demand pattern is provided (DEMAND_PATTERN_ID = null), it is assumed that the junction is merely an intermediate node in the network, from which water is forwarded to other nodes.
An optional WATER_TARIFF_ID identifying a cost function for the water at the junction if the junction is a source.

The TANK table includes the following columns:

A TANK_ID which unambiguously identifies the tank.
The X_COORDINATE of the tank.
The Y_COORDINATE of the tank.
The Z_COORDINATE (altitude) of the tank.
The minimal volume VOLUME_MIN to be kept at all times in the tank.
The maximal volume VOLUME_MAX that can be kept in the tank.
The INITIAL_VOLUME at the beginning of the overall time period under consideration.
An optional DEMAND_PATTERN_ID identifying a demand pattern for the tank.
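
As an illustration only, the two node tables could be sketched as follows in SQLite, driven from Python. The column names come from the lists above; the SQL types, the NOT NULL and CHECK constraints, the helper name create_node_tables and the sample rows are assumptions of this sketch, not part of the proposal.

```python
import sqlite3

# Hypothetical DDL: column names follow the text, types and constraints are assumed.
NODE_SCHEMA = """
CREATE TABLE JUNCTION (
    JUNCTION_ID       TEXT PRIMARY KEY,
    X_COORDINATE      REAL NOT NULL,      -- meters, arbitrary origin
    Y_COORDINATE      REAL NOT NULL,
    Z_COORDINATE      REAL NOT NULL,      -- altitude
    DEMAND_PATTERN_ID TEXT,               -- NULL: intermediate node
    WATER_TARIFF_ID   TEXT                -- NULL: no water produced here
);
CREATE TABLE TANK (
    TANK_ID           TEXT PRIMARY KEY,
    X_COORDINATE      REAL NOT NULL,
    Y_COORDINATE      REAL NOT NULL,
    Z_COORDINATE      REAL NOT NULL,
    VOLUME_MIN        REAL NOT NULL DEFAULT 0,
    VOLUME_MAX        REAL NOT NULL,
    INITIAL_VOLUME    REAL NOT NULL,
    DEMAND_PATTERN_ID TEXT,
    CHECK (0 <= VOLUME_MIN
           AND VOLUME_MIN <= INITIAL_VOLUME
           AND INITIAL_VOLUME <= VOLUME_MAX)
);
"""

def create_node_tables(conn):
    """Create the JUNCTION and TANK tables on an open SQLite connection."""
    conn.executescript(NODE_SCHEMA)

# Sample rows with made-up values.
conn = sqlite3.connect(":memory:")
create_node_tables(conn)
conn.execute("INSERT INTO JUNCTION VALUES ('J1', 0.0, 0.0, 120.0, NULL, NULL)")
conn.execute("INSERT INTO TANK VALUES ('T1', 500.0, 200.0, 150.0, 50.0, 1000.0, 400.0, NULL)")
```

The CHECK constraint encodes the storage bounds described above, so a tank whose initial volume lies outside [VOLUME_MIN, VOLUME_MAX] is rejected at insertion time.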

1.2. Network Links: Pipes, Pumps, Pumping Stations, Valves

Nodes are connected by several types of Links. At this stage, we assume the network is oriented; hence each Link has a START_NODE and an END_NODE.

A Pipe is a passive Link between its START_NODE and its END_NODE. Energy losses occur in a Pipe depending on various parameters including its LENGTH and DIAMETER, as well as on the MATERIAL constituting the Pipe.

The PIPE table contains the following columns:

A PIPE_ID which unambiguously identifies the pipe.
A START_NODE_ID where multiple links might join before the pipe.
An END_NODE_ID from where multiple links might branch after the pipe.
A MATERIAL_ID identifying the material constituting the pipe.
The LENGTH of the pipe.
The DIAMETER of the pipe.[3]

A Pump is an active Link in which a motor provides electrical power, which is transformed into mechanical power used to pump water that will flow from the START_NODE to the END_NODE. The most important characteristics of a Pump describe the relations between the electrical power, the mechanical power, and the flow. In addition, a Pump can be controlled by a VARIABLE_SPEED drive.[4]

The PUMP table contains the following columns:

A PUMP_ID which unambiguously identifies the pump.
A START_NODE_ID where multiple links might join before the pump.
An END_NODE_ID from where multiple links might branch after the pump.
A Boolean VARIABLE_SPEED indicating whether the pump is controllable or not.
Two limits FLOW_MIN and FLOW_MAX providing the minimal and maximal flow recommended by the pump manufacturer to maintain the health of the pump.
The minimal power POWER_MIN of the pump and the POWER_SLOPE describing the dependency between the mechanical power used by the pump and the water flow enabled by it (when the pump is not controlled by a drive). In practice, these can be determined from the minimal and maximal flows recommended by the pump manufacturer and two curves: the FLOW_TO_HEAD curve providing the relation between the flow enabled by the pump and the pressure (expressed in meters) and the FLOW_TO_EFFICIENCY curve characteristic of the pump.
A MOTOR_EFFICIENCY factor between 0.0 and 1.0.
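
One possible way to obtain POWER_MIN and POWER_SLOPE from the two curves is sketched below. It rests on several assumptions: the mechanical power is taken as rho * g * Q * H / efficiency, the linear model is the straight line through the operating points at FLOW_MIN and FLOW_MAX, and POWER_MIN is read as the intercept of that line (it could instead denote the power at FLOW_MIN). The function names and the sample curves are hypothetical.

```python
RHO = 1000.0  # density of water in kg/m3
G = 9.81      # gravitational acceleration in m/s2

def mechanical_power(flow, head, pump_efficiency):
    """Mechanical power in W for a flow in m3/s and a head in meters."""
    return RHO * G * flow * head / pump_efficiency

def linearize_pump(flow_min, flow_max, flow_to_head, flow_to_efficiency):
    """Return (POWER_MIN, POWER_SLOPE) of the line through the two end points."""
    p_lo = mechanical_power(flow_min, flow_to_head(flow_min), flow_to_efficiency(flow_min))
    p_hi = mechanical_power(flow_max, flow_to_head(flow_max), flow_to_efficiency(flow_max))
    power_slope = (p_hi - p_lo) / (flow_max - flow_min)
    power_min = p_lo - power_slope * flow_min  # intercept at zero flow
    return power_min, power_slope

# Illustrative curves only (not manufacturer data).
head_curve = lambda q: 60.0 - 1000.0 * q ** 2   # FLOW_TO_HEAD, in meters
efficiency_curve = lambda q: 0.8                # FLOW_TO_EFFICIENCY, constant here
power_min, power_slope = linearize_pump(0.05, 0.15, head_curve, efficiency_curve)
```

A least-squares fit over several points of the curves would be an equally reasonable choice; the two-point line is used here only because it is the simplest.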

A PumpingStation consists of one or several pumps in parallel, i.e., with the same START_NODE and END_NODE. The characteristics of a PumpingStation can be inferred from the characteristics of the individual pumps. Hence, the PUMPING_STATION table is optional. When it is provided, it includes:

A PUMPING_STATION_ID which unambiguously identifies the pumping station.
A START_NODE_ID where multiple links might join before the pumping station.
An END_NODE_ID from where multiple links might branch after the pumping station.

A Valve is a Link in which water flow can be limited. At this stage, we associate no specific parameter with a Valve and assume the Valve can be used to set any upper limit on the flow.

The VALVE table contains the following columns:

A VALVE_ID which unambiguously identifies the valve.
A START_NODE where multiple links might join before the valve.
An END_NODE from where multiple links might branch after the valve.[5]

1.3. Materials and ageing models

At this stage, the MATERIAL table contains the following columns:

A MATERIAL_ID which unambiguously identifies the material.
Its DARCY_FRICTION_FACTOR used in classical models for estimating energy losses in a pipe.

According to the Darcy–Weisbach equation, the pressure loss $\Delta p$ in a Pipe can be written as follows:

$$\Delta p = f_D \cdot \frac{L}{D} \cdot \frac{\rho V^2}{2}$$

Where:

$L$ is the LENGTH of the Pipe.
$D$ is the DIAMETER of the Pipe.
$V$ is the velocity of the water flow in the Pipe (in m/s), which can also be written as $Q/S$ where $Q$ is the flow of water in the pipe (in m3/s) and $S = \pi (D/2)^2$ the section of the pipe (in m2).
$\rho$ is the density of the water in kg/m3, hence 1000.
$f_D$ is the dimensionless DARCY_FRICTION_FACTOR. In reality, this factor depends on the relative roughness of the pipe and on the speed of water in the pipe. As a first approximation, however, it can be assumed constant and associated with the pipe material.
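
As a worked example, the Darcy–Weisbach pressure loss can be computed with a few lines of Python. The function name and the numerical values (a 1000 m pipe of 0.3 m diameter, a friction factor of 0.02, a flow of 0.05 m3/s) are chosen for illustration only.

```python
import math

def darcy_weisbach_pressure_loss(flow, length, diameter, friction_factor, density=1000.0):
    """Pressure loss in Pa for a flow in m3/s through a pipe (length, diameter in m)."""
    section = math.pi * (diameter / 2.0) ** 2   # S, in m2
    velocity = flow / section                   # V = Q / S, in m/s
    return friction_factor * (length / diameter) * density * velocity ** 2 / 2.0

# Illustrative values: roughly 16.7 kPa, i.e. about 1.7 m of water column.
loss = darcy_weisbach_pressure_loss(flow=0.05, length=1000.0, diameter=0.3,
                                    friction_factor=0.02)
```

Because the loss grows with the square of the velocity, doubling the flow in the same pipe multiplies the loss by four.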

We expect that Materials will also be used to describe ageing models. At this point, this is still to be explored.

1.4. Demand Patterns

A DemandPattern is a time series defining an expected output flow (to final customers or to another non-represented portion of the network) from a given node. When appropriate, the flow can be defined to be periodic over a given time period and renewed from one year to the other, possibly according to a given ANNUAL_RENEWAL_FACTOR.

In the relational model we propose, the WaterDemands table can be used to specify water demand patterns. It includes the following columns:

A WATER_DEMAND_TIME_SERIES_ID which unambiguously identifies the time series.
A START_TIME.
An END_TIME.
The FLOW between the given START_TIME and the given END_TIME.
An optional PERIODICITY (e.g., “NONE”, “DAY”, “WEEKDAY”, “WEEKEND”) indicating that the given demand element repeats itself periodically. When this column is not used, it is assumed that there is no periodic repetition of the demand.
An optional PERIOD_START_TIME and an optional PERIOD_END_TIME limiting the extent over which the periodical repetition applies.
An optional ANNUAL_RENEWAL Boolean (0 or 1) indicating whether the given demand element repeats itself every year. When this column is not used, it is assumed that there is no annual repetition.
An optional ANNUAL_RENEWAL_FACTOR indicating that the given demand element repeats itself every year, multiplied by the given factor.
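
The periodic repetition can be made concrete with a small sketch that expands one demand element into explicit intervals. The exact semantics of PERIODICITY are not fixed above, so the following is one assumed reading: START_TIME and END_TIME give the first occurrence, later occurrences are shifted by whole days, "WEEKDAY" keeps Monday to Friday, "WEEKEND" keeps Saturday and Sunday, and the annual renewal columns are ignored. The function name expand_demand is hypothetical.

```python
from datetime import datetime, timedelta

def expand_demand(start, end, flow, periodicity="NONE",
                  period_start=None, period_end=None):
    """Expand one demand element into a list of (start, end, flow) occurrences."""
    if periodicity == "NONE":
        return [(start, end, flow)]
    if period_end is None:
        raise ValueError("this sketch needs PERIOD_END_TIME for a periodic element")
    period_start = period_start or start
    occurrences = []
    shift = timedelta(0)
    while start + shift < period_end:
        s, e = start + shift, end + shift
        is_weekday = s.weekday() < 5  # Monday is 0, Friday is 4
        keep = (periodicity == "DAY"
                or (periodicity == "WEEKDAY" and is_weekday)
                or (periodicity == "WEEKEND" and not is_weekday))
        if keep and s >= period_start and e <= period_end:
            occurrences.append((s, e, flow))
        shift += timedelta(days=1)
    return occurrences

# A morning demand of 0.02 m3/s repeated on weekdays during one week.
week = expand_demand(datetime(2014, 6, 2, 6), datetime(2014, 6, 2, 9), 0.02,
                     periodicity="WEEKDAY", period_end=datetime(2014, 6, 9))
```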

1.5. Tariffs

Tariff descriptions can be used both for water costs and electricity costs. A Tariff is described as a time series of curves, enabling the cost to vary with the flow of water or the electrical power that is used. The TARIFFS table includes the following columns:[6]

A TARIFF_TIME_SERIES_ID which unambiguously identifies the time series.
A START_TIME.
An END_TIME.
Six columns describing the curve that applies from the given START_TIME to the given END_TIME: CAPACITY_MIN, CAPACITY_MAX, COST_MIN, COST_MAX, FIXED_COST, and VARIABLE_COST.
When the power or flow equals CAPACITY_MIN, the cost for being at this power or flow level for one unit of time is COST_MIN.
As soon as CAPACITY_MIN is exceeded, i.e., as soon as the power or flow becomes strictly greater than CAPACITY_MIN, a penalty corresponding to the given FIXED_COST is paid. FIXED_COST is often equal to 0. The corresponding column is optional.
Between CAPACITY_MIN and CAPACITY_MAX, the cost grows from (COST_MIN + FIXED_COST) to COST_MAX as a quadratic function of the power or flow with the given VARIABLE_COST as initial slope. In usual cases, COST_MAX – (COST_MIN + FIXED_COST) = VARIABLE_COST * (CAPACITY_MAX – CAPACITY_MIN) and the cost grows linearly with the capacity.
When the power or flow equals CAPACITY_MAX, the cost for being at this power or flow level for one unit of time is COST_MAX.
An optional PERIODICITY (e.g., “NONE”, “DAY”, “WEEKDAY”, “WEEKEND”) indicating that the given tariff element repeats itself periodically. When this column is not used, it is assumed that there is no periodic repetition of the tariff.
An optional PERIOD_START_TIME and an optional PERIOD_END_TIME limiting the extent over which the periodical repetition applies.
An optional ANNUAL_RENEWAL Boolean (0 or 1) indicating whether the given tariff element repeats itself every year. When this column is not used, it is assumed that there is no annual repetition.
An optional ANNUAL_RENEWAL_FACTOR indicating that the given tariff element repeats itself every year, with all costs multiplied by the given factor.
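
The curve defined by the six columns can be evaluated as sketched below. The quadratic is determined by three conditions taken from the description: its value just above CAPACITY_MIN is COST_MIN + FIXED_COST, its initial slope is VARIABLE_COST, and its value at CAPACITY_MAX is COST_MAX. The function name tariff_cost, the error raised outside the capacity range and the sample figures are assumptions of this sketch.

```python
def tariff_cost(level, capacity_min, capacity_max, cost_min, cost_max,
                variable_cost, fixed_cost=0.0):
    """Cost per unit of time of operating at the given power or flow level."""
    if not capacity_min <= level <= capacity_max:
        raise ValueError("level outside [CAPACITY_MIN, CAPACITY_MAX]")
    if level == capacity_min:
        return cost_min  # the fixed cost is only paid once CAPACITY_MIN is exceeded
    span = capacity_max - capacity_min
    base = cost_min + fixed_cost
    # Second-order coefficient chosen so that the curve reaches COST_MAX at CAPACITY_MAX.
    curvature = (cost_max - base - variable_cost * span) / span ** 2
    dx = level - capacity_min
    return base + variable_cost * dx + curvature * dx ** 2

# Usual linear case: COST_MAX - COST_MIN = VARIABLE_COST * (CAPACITY_MAX - CAPACITY_MIN).
linear = tariff_cost(40.0, 0.0, 100.0, cost_min=0.0, cost_max=50.0, variable_cost=0.5)
# Quadratic case: same initial slope, but a higher cost at full capacity.
convex = tariff_cost(50.0, 0.0, 100.0, cost_min=0.0, cost_max=80.0, variable_cost=0.5)
```

In the linear case the curvature term vanishes and the function reduces to a constant price per unit of power or flow.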

2. Analytic Modules

This section presents three analytical components considered at this point.

2.1. Demand Prediction

The demand prediction component aims at extending a given water demand pattern in the future. A prediction model linking demand with other variables (e.g., weather conditions) is first learned. Then the model is used to extend a given demand pattern for a given period of time.

A more precise specification of such an analytic component will be provided in another document in preparation.

2.2. Pumping Plan Optimization / Planning for Demand Response

Multiple options for the optimization of pumping plans and demand response could be considered. In this section, we describe an approximate, deliberately simple model that would make sense in the exploratory use case we are aware of. An open question is whether the approximations we make are reasonable. In particular, we ignore all transient factors and proceed as if a steady-state approach could be applied over a given number of individual time periods.

Given are $H$ time periods $\mathrm{PERIOD}_1, \mathrm{PERIOD}_2, \ldots, \mathrm{PERIOD}_H$

With start time $\mathrm{st}_t$ and end time $\mathrm{et}_t$ ($1 \le t \le H$).
With electricity cost (tariff) over the period. To ease the following description, we will in this section restrict ourselves to linear tariffs and assume that for each period $t$, a cost per kWh $c_t$ is given.

Given are $N$ water towers (tanks) $\mathrm{TOWER}_1, \mathrm{TOWER}_2, \ldots, \mathrm{TOWER}_N$

With minimal and maximal volumes $\mathrm{vmin}_i$ and $\mathrm{vmax}_i$ ($1 \le i \le N$).
The minimum is supposed to be given. However, it would be interesting to study how the energy cost and the non-delivery risks vary with this minimum.
With a (predicted) water consumption profile $\mathrm{PF}_1, \mathrm{PF}_2, \ldots, \mathrm{PF}_N$
$\mathrm{PF}_i$ is a deterministic function specifying a consumption $c_{i,t}$ for all $t$ in $\{1, \ldots, H\}$.
Later we may want to play with a probabilistic function and introduce a notion of robustness of the plan with respect to variability of the demand. We ignore such a potential extension for the moment.

Before each water tower $\mathrm{TOWER}_i$ there is a valve $\mathrm{VALVE}_i$, which makes it possible to limit the flow, and a pipe $\mathrm{PIPE}_i$. The goal is to define for each period $t$ in $\{1, \ldots, H\}$ the flow $F_{i,t}$ between the pumping station and the water tower $\mathrm{TOWER}_i$ in a way that guarantees that the demand will be satisfied (in the deterministic version) and that minimizes cost.

The volume $V_{i,t}$ in the water tower $\mathrm{TOWER}_i$ at the end of period $\mathrm{PERIOD}_t$ is obtained as follows:

$V_{i,t} = V_{i,t-1} + F_{i,t} \cdot (\mathrm{et}_t - \mathrm{st}_t) - c_{i,t}$
We impose $\mathrm{vmin}_i \le V_{i,t} \le \mathrm{vmax}_i$
$V_{i,0}$ is the initial volume at the beginning of the first period. This value is given.
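
The volume balance and its bounds can be checked for a candidate plan with a short routine such as the one below. The function name tank_volumes and the figures are illustrative; flows are assumed to be in m3/s, durations in seconds and consumptions in m3 per period.

```python
def tank_volumes(initial_volume, flows, consumptions, durations, vmin, vmax):
    """Return the volumes at the end of each period, or raise if a bound is violated."""
    volumes = []
    volume = initial_volume
    for t, (flow, consumption, duration) in enumerate(
            zip(flows, consumptions, durations), start=1):
        # Balance for one period: previous volume + inflow - consumption.
        volume = volume + flow * duration - consumption
        if not vmin <= volume <= vmax:
            raise ValueError(f"volume bound violated at end of period {t}: {volume}")
        volumes.append(volume)
    return volumes

# Three one-hour periods for a single tower (made-up values).
trajectory = tank_volumes(initial_volume=400.0,
                          flows=[0.05, 0.0, 0.1],
                          consumptions=[100.0, 150.0, 200.0],
                          durations=[3600.0, 3600.0, 3600.0],
                          vmin=50.0, vmax=1000.0)
```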

The discharge pressure $\mathrm{PR}_t$ at the end node of the pumping station that is needed during period $t$ depends on the flows $F_{i,t}$ as follows:

$\mathrm{PR}_t \ge \mathrm{FORMULA}(F_{i,t})$
We want to vary the formula, using more or less precise models with influence on three factors: (i) the amount of data needed, and hence the cost of the solution implementation; (ii) the computation time; (iii) the precision of the results. The key point is that if approximate models lead to pumping schedules which are close to the pumping schedules that would be obtained with more precise models, then the approximate models are acceptable.
Several elements shall be considered.
Precise physical models are likely to need a lot of data on pipe characteristics: can we avoid this need?
Can dynamics be ignored without degrading the approximation too much?
Theoretically, the needed pressure also depends on the altitudes of the water towers for which the valve is open: can we ignore this?
An interesting option would consist in building a data-driven model (e.g., building from past data a table that approximates the actual function) rather than using a physical model.
If the pressure is never much higher than the minimal hydrostatic pressure needed, an option might be to proceed as if the pressure were constant or a simple linear or piecewise linear function of the total flow.

When the pumping station is directly linked to each water tower, one specific model we may use is the following:

$$\mathrm{PR}_t \ge \rho g h_i + \mathrm{dff}_i \cdot \frac{L_i}{D_i} \cdot \frac{\rho V_{i,t}^2}{2}$$

for each $i$ where

$\rho$ is the density of the water in kg/m3, hence 1000.
$g$ is the gravitational acceleration (9.81 m/s2).
$h_i$ is the difference of altitude between the water tower $\mathrm{TOWER}_i$ and the pumping station.
$\mathrm{dff}_i$ is the DARCY_FRICTION_FACTOR of the material of the pipe $\mathrm{PIPE}_i$.
$L_i$ is the LENGTH of $\mathrm{PIPE}_i$.
$D_i$ is the DIAMETER of $\mathrm{PIPE}_i$.
$V_{i,t}$ is here the velocity of the water flow in the pipe (not the volume in the tower, which uses the same symbol earlier in this section), i.e., $V_{i,t} = F_{i,t} / (\pi (D_i/2)^2)$.

When there are intermediate pipes and junctions, the same formula has to be used iteratively from the tanks to compute the discharge pressure at each junction. At each junction, the application of the inequality for each outgoing pipe guarantees that the most constraining branch is taken into account.
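
For the direct-link case, the smallest discharge pressure satisfying the inequality for every tower is the maximum of the right-hand sides, as in the sketch below. The dictionary keys, the function name required_pressure and the figures are hypothetical. The sketch also assumes that a tower receiving no flow (closed valve) imposes no constraint, which is one possible reading of the remark on open valves.

```python
import math

RHO = 1000.0  # density of water in kg/m3
G = 9.81      # gravitational acceleration in m/s2

def required_pressure(flows, towers):
    """Minimal discharge pressure in Pa for flows F_i,t in m3/s towards each tower."""
    needed = 0.0
    for flow, tower in zip(flows, towers):
        if flow <= 0.0:
            continue  # assumption: a closed valve imposes no pressure requirement
        velocity = flow / (math.pi * (tower["D"] / 2.0) ** 2)
        static = RHO * G * tower["h"]  # hydrostatic term
        friction = tower["dff"] * (tower["L"] / tower["D"]) * RHO * velocity ** 2 / 2.0
        needed = max(needed, static + friction)
    return needed

# Two towers fed through identical pipes, at 50 m and 30 m above the station.
towers = [{"h": 50.0, "dff": 0.02, "L": 1000.0, "D": 0.3},
          {"h": 30.0, "dff": 0.02, "L": 1000.0, "D": 0.3}]
pressure_both = required_pressure([0.05, 0.05], towers)
pressure_low_only = required_pressure([0.0, 0.05], towers)
```

With these values the hydrostatic term dominates the friction term by more than an order of magnitude, which illustrates why a constant or piecewise linear approximation of the pressure might be acceptable in some networks.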

If the pumps had no loss, the power $\mathrm{POWER}_t$ needed over time period $t$ would be $\mathrm{PR}_t \cdot \sum_i F_{i,t}$.[7] Taking into account the efficiency of pumps brings an additional difficulty. In practice, each pump $\mathrm{PUMP}_j$ is contributing a flow $Q_{j,t}$ with $\sum_i F_{i,t} = \sum_j Q_{j,t}$. When there is no drive, the mechanical power deployed by each pump $\mathrm{PUMP}_j$ is roughly in the form:

$$\mathrm{POWER\_MIN}_j + \mathrm{POWER\_SLOPE}_j \cdot Q_{j,t}$$

Taking into account the efficiency of the motor leads to:

$$\mathrm{POWER}_t = \sum_j \frac{1}{\mathrm{MOTOR\_EFFICIENCY}_j} \cdot (\mathrm{POWER\_MIN}_j + \mathrm{POWER\_SLOPE}_j \cdot Q_{j,t})$$

The total energy cost to minimize is equal to $\sum_t \mathrm{POWER}_t \cdot (\mathrm{et}_t - \mathrm{st}_t) \cdot c_t$.
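
The cost of a given pumping plan can be evaluated as sketched below. The sketch only evaluates a plan, it does not optimize it. It assumes powers in W, period times in hours and a cost per kWh, and it adds one assumption that is not in the formula above: a pump with zero flow is considered switched off and draws no power. The function names and the figures are illustrative.

```python
def electrical_power(pump_flows, pumps):
    """Electrical power in W drawn by the station for the given pump flows Q_j,t."""
    total = 0.0
    for flow, pump in zip(pump_flows, pumps):
        if flow > 0.0:  # assumption: a stopped pump draws no power
            mechanical = pump["POWER_MIN"] + pump["POWER_SLOPE"] * flow
            total += mechanical / pump["MOTOR_EFFICIENCY"]
    return total

def energy_cost(plan, pumps, periods):
    """Total cost of a plan; plan[t] lists Q_j,t, periods[t] is (st, et, cost per kWh)."""
    cost = 0.0
    for pump_flows, (start, end, cost_per_kwh) in zip(plan, periods):
        power_kw = electrical_power(pump_flows, pumps) / 1000.0
        cost += power_kw * (end - start) * cost_per_kwh
    return cost

# One pump running during a cheap two-hour period, then stopped during a peak period.
pumps = [{"POWER_MIN": 18000.0, "POWER_SLOPE": 340000.0, "MOTOR_EFFICIENCY": 0.9}]
periods = [(0.0, 2.0, 0.10), (2.0, 4.0, 0.25)]
plan = [[0.1], [0.0]]
total_cost = energy_cost(plan, pumps, periods)
```

Such an evaluation function is also a convenient way to compare the schedules produced by more or less precise pressure models, as discussed above.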

Once a pumping plan is obtained, studying the opportunities of demand-response could be done in multiple ways, e.g., by varying the electricity tariff or using the framework previously developed by Schneider Electric and Artelys.

2.3. Network Simulation

At this point, we hope (though this remains to be checked) that a network description in the proposed relational model can be used as an input to perform simulations using the hydraulic tools available in Schneider Electric. This would enable us to link these tools with the OI Platform and hence with other analytic tools developed on top of the platform (e.g., demand prediction).

A more precise specification of such a link needs to be written in the future.

[1] As we are not EPANET specialists, we may have missed important elements or opportunities to remain closer to the EPANET model, while allowing easy integration with the OI Platform. For this first version, we have also tried to keep things simple, making simplifying assumptions which could be criticized. As already mentioned, the contents of this document are proposed as a starting point, open for debate.

[2] In our understanding, EPANET enables the specification of demands only for junctions and not for tanks. Strictly speaking this would be sufficient, as a “tank” might simply be linked to a “junction”. However, we feel it could be appropriate to simplify the network description and allow a tank to be considered as a consuming node of the network.