Data Stewardship

Actively protect and care for the data a system stores and accesses, giving it at least as much attention as the system’s functional behaviour.

Rationale

Information is a fundamental asset to the business, critical to its success. It may be re-used or adapted to add significant value in new business contexts, well beyond its original intent, outliving the system that first created it. Neglecting data concerns can lead to misuse, security weaknesses and unintentional coupling between systems, which in turn generate widespread complexity and flakiness, as well as significant financial and reputational risk to the business. The resulting problems can be extremely painful and expensive to resolve.

Implications

  • Each bounded context must manage the data it uses, acting as a responsible owner, even if it is not necessarily the “master” of that data.
  • Teams must understand and address any data privacy concerns related to their data, including policy and legal compliance. For example: classification & retention policies, GDPR, PCI-DSS.
  • Where others need to consume an application’s data, the application should expose that data via an interface with an explicit contract.
  • Consumers of data exposed by an application should be able to easily find metadata about its meaning and quality (e.g. definitions, accuracy, freshness, time-to-live) as well as its operational service levels (e.g. RTO, RPO, support hours). It should also be clear whom they should contact with questions about the data.
  • Where there is a potential choice of data sources, there is a risk of accidental misuse, complexity and flakiness. Consumers should first model the business domain to identify the bounded context they want information from, then aim to get as close as possible to the commonly agreed source of truth for that context. This will usually be the origin of the data, or the trusted “golden record” that the business domain in question relies on operationally.
  • Consumers and providers may both need to balance availability and accessibility of the data with other non-functional considerations such as performance, quality, consistency and freshness. These factors are typically in tension and need to be traded off.
  • Data can be transformed or translated as it is shared. Applications should represent their copy of data using schemas designed for the context of their business domain and use cases.
  • When making significant changes that could break consumers, including changes in meaning, use versioned interfaces.
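
As an illustration of several of the implications above (an explicit contract, published metadata, versioned interfaces and translation into a consumer's own schema), the following is a minimal Python sketch rather than a prescribed implementation. It assumes a hypothetical provider that publishes customer data under two contract versions and a hypothetical billing consumer. Every name and value in it (ContractMetadata, CustomerV1, CustomerV2, InvoiceRecipient, the contact address, the service-level figures) is invented for the example.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Union


@dataclass(frozen=True)
class ContractMetadata:
    """Metadata a provider publishes alongside its data (hypothetical shape)."""
    owner_contact: str        # who to contact with questions about the data
    definition: str           # business meaning of a record
    max_staleness: timedelta  # freshness the provider commits to
    rpo: timedelta            # recovery point objective
    rto: timedelta            # recovery time objective


@dataclass(frozen=True)
class CustomerV1:
    """Version 1 of the provider's published customer record."""
    customer_id: str
    name: str  # full name held as a single string


@dataclass(frozen=True)
class CustomerV2:
    """Version 2: splitting the name would break V1 consumers,
    so it is published as a new version instead of altering V1."""
    customer_id: str
    given_name: str
    family_name: str


# Illustrative values only; real figures would come from the owning team.
CUSTOMER_METADATA = ContractMetadata(
    owner_contact="customer-data-team@example.com",
    definition="A person or organisation holding at least one account.",
    max_staleness=timedelta(minutes=15),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)


@dataclass(frozen=True)
class InvoiceRecipient:
    """The consumer's own schema, shaped for its billing context."""
    customer_id: str
    display_name: str


def to_invoice_recipient(record: Union[CustomerV1, CustomerV2]) -> InvoiceRecipient:
    """Translate either contract version into the consumer's local model."""
    if isinstance(record, CustomerV2):
        display_name = f"{record.given_name} {record.family_name}"
    else:
        display_name = record.name
    return InvoiceRecipient(record.customer_id, display_name)
```

In this sketch the consumer depends only on the published record types and converts them at its boundary, so the provider can introduce CustomerV2 alongside CustomerV1 without forcing every consumer to change at once.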