Data

Creating a platform general enough to serve multiple purposes in digitization projects, which may grow to an as yet unknown complexity, requires investigating data as well as functionality.

The platform must provide a general data representation and guidelines that avoid dependencies wherever possible. The goal is not to create a storage system for enterprise data. For system integration and automation, however, the platform needs at least the capability to enhance existing data where required and to store temporary data that defines and manages a system state.

Existing database systems are available, but coupling the platform closely to one of them is risky: if the vendor or the community abandons that database, the platform is likely to become useless for mission-critical integration.

One way to reduce this risk is to hide the database system behind an API so that it can be exchanged at any time. Some of the challenges of this approach are:

  • If any vendor-specific modeling language has to be avoided, which standard is available and can be used to define data structures?
  • How can a loose coupling with a database be defined without even knowing which data is about to be stored?
  • Modeling data, especially for relational database systems, requires experienced experts. How can data modeling be simplified to a point where a trained person without an IT background can do the job?
  • How to avoid technical debt in data (dependencies, data trash, etc.)?
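The idea of hiding the database behind an API can be illustrated with a minimal Python sketch. All names here (RecordStore, InMemoryStore, SqliteStore, save_state) are hypothetical and not part of the platform; the sketch only assumes that records are JSON-serializable dictionaries addressed by a string key.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

Record = Dict[str, Any]


class RecordStore(ABC):
    """Hypothetical storage API: callers never see the backend behind it."""

    @abstractmethod
    def put(self, key: str, record: Record) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[Record]: ...


class InMemoryStore(RecordStore):
    """Backend 1: a plain dictionary, e.g. for tests or temporary state."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def put(self, key: str, record: Record) -> None:
        self._records[key] = dict(record)

    def get(self, key: str) -> Optional[Record]:
        record = self._records.get(key)
        return dict(record) if record is not None else None


class SqliteStore(RecordStore):
    """Backend 2: SQLite, storing each record as a JSON document."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(key TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, key: str, record: Record) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO records (key, body) VALUES (?, ?)",
            (key, json.dumps(record)),
        )
        self._db.commit()

    def get(self, key: str) -> Optional[Record]:
        row = self._db.execute(
            "SELECT body FROM records WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def save_state(store: RecordStore, job_id: str, status: str) -> None:
    """Application code depends only on the RecordStore interface."""
    store.put(job_id, {"status": status})
```

Because save_state only knows the interface, either backend can be passed in without changing the calling code. Storing records as JSON documents is one possible design choice for keeping the API independent of a vendor-specific schema; it trades away some of the query capabilities of the underlying database.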

Relational and Hierarchical Data

The two major ways to model data are the hierarchical and the relational model. Hierarchical data is organized in a tree-like structure, with each node representing a specific record type. The structure is easy to understand and highly predictable, but has limitations in querying data [1].
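A small Python sketch may help to picture the hierarchical model and its querying limitation. The record types and names (archive, collection, item) are invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One node of the tree; record_type says what kind of record it is."""
    record_type: str
    name: str
    children: List["Node"] = field(default_factory=list)


def find_by_type(node: Node, record_type: str) -> Iterator[Node]:
    """Depth-first search over the whole tree for one record type."""
    if node.record_type == record_type:
        yield node
    for child in node.children:
        yield from find_by_type(child, record_type)


# Hypothetical example data.
archive = Node("archive", "City Archive", [
    Node("collection", "Maps", [
        Node("item", "Map of 1850"),
        Node("item", "Map of 1900"),
    ]),
    Node("collection", "Letters", [
        Node("item", "Letter A"),
    ]),
])

# Following a known path from the root is direct and predictable.
maps = archive.children[0]

# A query that cuts across the hierarchy has to visit every node.
items = [node.name for node in find_by_type(archive, "item")]
```

Access along the path is simple, while the cross-cutting query walks the entire tree because nothing in the structure itself supports lookups by anything other than position.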

The relational model stores data in tables, each of which is an open-ended list of records. Records in different tables can be connected through keys, which allows fast, index-based queries.
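As a rough illustration of tables, keys, and an index, the following sketch uses the sqlite3 module from the Python standard library. The table and column names are hypothetical and chosen only for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE collection (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE item (
    id            INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    collection_id INTEGER NOT NULL REFERENCES collection(id)
);
-- Index on the foreign key column so lookups by collection can avoid a full scan.
CREATE INDEX idx_item_collection ON item(collection_id);
""")

db.executemany(
    "INSERT INTO collection (id, name) VALUES (?, ?)",
    [(1, "Maps"), (2, "Letters")],
)
db.executemany(
    "INSERT INTO item (title, collection_id) VALUES (?, ?)",
    [("Map of 1850", 1), ("Map of 1900", 1), ("Letter A", 2)],
)

# The key collection_id connects records of the two tables.
rows = db.execute(
    "SELECT item.title FROM item "
    "JOIN collection ON collection.id = item.collection_id "
    "WHERE collection.name = ? ORDER BY item.title",
    ("Maps",),
).fetchall()
titles = [row[0] for row in rows]
```

Here the connection between records is expressed by a key value rather than by position in a tree, so the same tables can be queried from either side.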