A Flow is a model of active elements called Bricks, each with one or more data ports connected by links. Data flows through the links from one Brick to another while being transformed, stored, or used to trigger or control activities. A Flow is created by connecting prefabricated Bricks with links.
This document describes the flow elements and their relation to each other.
Flow by example
To describe Flows and their elements, examples are very useful. The goal is to create a common understanding of Flows in system integration.
Flows handle data sets called Flow Packages (FPs). Passing FPs from one element to the next is what establishes the data flow.
Consider the following example of a flow. In the graphical industry, using data flows is common practice. FPs are usually passed as files between applications. Files are passed from one folder to another, accompanied by additional data sets that hold the meta data for the original file or FP. These data sets, commonly called "job tickets", are moved from one system to another and enriched with additional data at each stage they pass through. Data sets are copied to different places in the system and pass through multiple software tools to be prepared for production. Besides relatively big production files (usually about 5 to 20 MB), other files such as tracking files are generated and passed within the network. Tracking files are small and usually contain only a few records with some identification of a job and the action performed on it. The third type of file I want to mention here is a plan. Plans are moved between systems as well and consumed by one or more stations. Plans are used to synchronize different systems that work on the same set of planning data without the need to create the data twice.
This practice can be easily verified. The graphical industry created standards such as JDF or Prime to build more reliable systems and simplify the exchange of meaningful meta data and plans. They were partially successful with this approach. The standards are complex and leave room for interpretation. The result is a variety of dialects of a single standard.
Integrated systems that use data flow owe their flexibility to the loose coupling of interfaces. Systems based on data flow can be easily extended and scaled, even beyond the borders of an enterprise. However, moving files in complex systems has significant drawbacks:
- It is slow
Moving big files takes time. Having a process pick up files from a folder also takes time, no matter whether the file is big or not. Usually a mechanism called File Stable Time is used to make sure that the file is ready: the receiver waits until the file has stopped changing for a given period before picking it up.
- File system access rights are difficult to manage
In big companies, IT departments manage the access rights in a Domain. Conflicting interests, such as security guidelines, can create friction between IT and production departments. Friction between departments responsible for the same system can make that system less reliable.
- The passing of data between different stations is a blind spot
A data set is written into a folder and hopefully picked up by another station. If this does not happen for any reason, it may take some time until the malfunction becomes obvious and can be solved, even if IT departments monitor these folders. In many cases there is no feedback to the sending system.
- No system internal data standard
A data set entering a flow is probably processed correctly by the first station, but may still cause trouble later in the flow. Even if existing standards define the syntax of a specific data format, the semantics depend on each station. In many cases not even the file names are standardized.
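The File Stable Time mechanism mentioned in the first point can be sketched as follows. This is a minimal illustration, not the behavior of any particular product: the function name is_file_stable and its parameters are assumptions, and a file is treated as ready once its size and modification time have not changed for a given period.

```python
import os
import time


def is_file_stable(path, stable_seconds=2.0, poll_interval=0.5, timeout=30.0):
    """Return True once size and mtime of `path` stay unchanged for `stable_seconds`.

    Hypothetical helper: returns False if the file never settles within `timeout`.
    """
    deadline = time.monotonic() + timeout
    last_state = None
    stable_since = None
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # File not there (yet): reset and keep polling until the timeout.
            last_state, stable_since = None, None
            time.sleep(poll_interval)
            continue
        state = (st.st_size, st.st_mtime_ns)
        now = time.monotonic()
        if state != last_state:
            # File changed (or was seen for the first time): restart the clock.
            last_state, stable_since = state, now
        elif now - stable_since >= stable_seconds:
            return True
        time.sleep(poll_interval)
    return False
```

The sketch also shows where the delay comes from: every file, big or small, has to wait at least the stable period before the receiving process touches it.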
The practices described above are an example of flows that have been used in industrial systems for many years. To create a reactive flow that responds fast and reliably, the file approach is not the solution of choice.
Fast Data Passing
Pushing small data sets between nodes in the network is significantly faster than transferring files. The speed is limited only by network latency, which can be minimized at the protocol level.
Small Data Sets
Data sets passed in the flow have to be small (< 1 MB). Data has to be minimized. This goal is best reached by avoiding redundant data. For example, human readable text that describes data fields can be left out of the data sets. Compressing data before sending it over the network is another way to create small data sets, but it creates a computing overhead at the sender and the receiver.
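The effect of leaving out descriptive field names can be illustrated with a small sketch. The record layout below (job id, station id, action code, timestamp) is a hypothetical tracking record invented for this example, not a format defined by this document. It compares a self-describing JSON encoding with a fixed binary layout and shows where compression would add work on both ends.

```python
import json
import struct
import zlib

# Hypothetical tracking record: job id, station id, action code, Unix timestamp.
# "<IHBI" = little-endian uint32, uint16, uint8, uint32, packed without padding.
RECORD_FORMAT = "<IHBI"


def encode_verbose(job_id, station_id, action, timestamp):
    """Self-describing JSON that repeats human readable field names in every record."""
    return json.dumps({
        "job_identifier": job_id,
        "station_identifier": station_id,
        "action_code": action,
        "timestamp": timestamp,
    }).encode("utf-8")


def encode_compact(job_id, station_id, action, timestamp):
    """Fixed binary layout: the field meaning is agreed on once, not sent each time."""
    return struct.pack(RECORD_FORMAT, job_id, station_id, action, timestamp)


def decode_compact(payload):
    """Inverse of encode_compact; returns (job_id, station_id, action, timestamp)."""
    return struct.unpack(RECORD_FORMAT, payload)


if __name__ == "__main__":
    record = (4711, 12, 3, 1700000000)
    verbose = encode_verbose(*record)
    compact = encode_compact(*record)
    # Compression is the alternative: sender and receiver both pay CPU time for it.
    compressed = zlib.compress(verbose)
    print(len(verbose), len(compact), len(compressed))
```

The compact form only works if all stations share the same layout, which is one reason an internal data standard matters.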
Concurrent processing is required to scale flow processing for high system loads. The processing of data should never become the bottleneck for productivity, and users should not be told to accept waiting times as unavoidable.
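As a rough illustration of concurrent processing, the following sketch hands a batch of data sets to a pool of worker threads. The function process_package stands in for the work of a Brick and is purely hypothetical. Threads suit work that mostly waits on I/O; for CPU-heavy work a process pool would be the closer fit.

```python
from concurrent.futures import ThreadPoolExecutor


def process_package(package):
    """Hypothetical Brick work: here it only tags the package as processed."""
    return {**package, "processed": True}


def process_concurrently(packages, max_workers=4):
    """Process all packages in parallel threads and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # even when the workers finish out of order.
        return list(pool.map(process_package, packages))


if __name__ == "__main__":
    batch = [{"id": i} for i in range(10)]
    for result in process_concurrently(batch):
        print(result)
```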
"Make the software able to carry out its appointed task so rapidly that it responds instantaneously, as far as the user is concerned."; Michael Abrash; Graphics Programming Black Book; Game programmer; Oculus Rift Chief Scientist
Flow processing requires data like any other type of computing task. To allow optimization and avoid excessive data checks, an internal data standard is required. The internal standard has to be flexible enough to allow intuitive modeling of real world data.
Handling Large Files
Flows have to handle large files in real world applications, so there has to be a way to handle this kind of data. If a file has to be processed by a flow, a reference to the file location can be passed with the data set. The Brick that requires access to the file itself can then download and process it. Wherever possible, the first priority should be to process a file without moving it through the network. If this is not possible, the movement should at least be minimized.
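Passing a reference instead of the file can be sketched as follows. The FlowPackage class, its fields, and the example Brick function are assumptions made for illustration only. The data set stays small because it carries only the file location, and the one Brick that needs the content reads the file where it is.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class FlowPackage:
    """Hypothetical small data set; the large file is referenced, not embedded."""
    job_id: str
    metadata: dict = field(default_factory=dict)
    file_ref: Optional[str] = None  # location of the large file (here: a local path)


def count_bytes_in_place(package: FlowPackage, chunk_size: int = 64 * 1024) -> int:
    """Example Brick: reads the referenced file in chunks without copying it elsewhere."""
    if package.file_ref is None:
        raise ValueError("package carries no file reference")
    total = 0
    with Path(package.file_ref).open("rb") as handle:
        while chunk := handle.read(chunk_size):
            total += len(chunk)
    return total
```

Bricks that only need the meta data never touch the file at all, so the large payload moves through the network at most once, and only when a Brick cannot work on it in place.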