Originally posted on LinkedIn here.
I work for a company that provides SaaS applications in the healthcare and higher education space. We have multiple product lines, each with its own variations in data structure (loosely overlapping, perhaps, but varied when viewed as a whole). We also have thousands of members providing self-structured content that does not conform to any universal, standard submission structure.
I have been leading a project in our Higher Ed space for the last couple of years to establish a unifying infrastructure that attacks this diversity directly, by building a meta data repository that describes the content and its relationships to our various product lines. Without getting too deep into details: our solution uses the meta data to auto-generate on-the-fly transformation programs that deliver member data to each of a member's subscriptions.
Our meta data model assumes diversity of domains, not just in sources but also in targets, and provides a basic representation of the equivalences across the different domains. It treats the domain as a first-order concept, which lets us track aliases across domains and capture formulas that establish construction logic to force consistency (this is how we add logic to our auto-generated code).
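To make the idea concrete, here is a minimal sketch in Python of a meta data repository that records aliases and formulas per pair of domains and builds a transformation from them. It is an illustration under assumptions, not our actual implementation: the class `MetadataRepo`, its methods, and the domain and field names (`member_a`, `product_x`, `stu_id`, and so on) are all hypothetical.

```python
class MetadataRepo:
    """Hypothetical repository of cross-domain meta data."""

    def __init__(self):
        # (source domain, target domain) -> {target field: source field}
        self.aliases = {}
        # (source domain, target domain) -> {target field: function of a source record}
        self.formulas = {}

    def add_alias(self, source, target):
        """Record that a (domain, field) in the source names the same thing as one in the target."""
        (src_domain, src_field), (tgt_domain, tgt_field) = source, target
        self.aliases.setdefault((src_domain, tgt_domain), {})[tgt_field] = src_field

    def add_formula(self, src_domain, target, fn):
        """Record construction logic that computes a target field from a source record."""
        tgt_domain, tgt_field = target
        self.formulas.setdefault((src_domain, tgt_domain), {})[tgt_field] = fn

    def build_transform(self, src_domain, tgt_domain):
        """Generate a transformation function from the recorded meta data."""
        aliases = dict(self.aliases.get((src_domain, tgt_domain), {}))
        formulas = dict(self.formulas.get((src_domain, tgt_domain), {}))

        def transform(record):
            # Copy aliased fields, renaming them into the target domain's vocabulary.
            out = {tgt: record[src] for tgt, src in aliases.items() if src in record}
            # Apply formulas to construct fields the source does not carry directly.
            for tgt, fn in formulas.items():
                out[tgt] = fn(record)
            return out

        return transform


repo = MetadataRepo()
repo.add_alias(("member_a", "stu_id"), ("product_x", "student_id"))
repo.add_alias(("member_a", "gpa"), ("product_x", "grade_point_average"))
repo.add_formula("member_a", ("product_x", "full_name"),
                 lambda r: f"{r['first']} {r['last']}")

to_product_x = repo.build_transform("member_a", "product_x")
result = to_product_x({"stu_id": "S1", "gpa": 3.5, "first": "Ada", "last": "Lovelace"})
```

Here `result` holds the member's record expressed in the target domain's field names, with `full_name` constructed by the formula. The point of the design is that no transformation is hand-written: adding a new source or target only means adding meta data.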
When we started, we performed extensive searches for existing software frameworks that could be applied to our use case. While we found many potential products, we never found one that actually addressed our conception of heterogeneous, multitudinous domains. What we found instead were a great many products and frameworks that took the stance that there would really be only one “canonical” domain sitting above all else.
Our concern with these tools was that they would force our product lines to prematurely establish a single, overarching model of the business space. We felt this would constrain our ability to develop new features and products in an aggressive and agile manner.
Hence, we undertook development of our own multi-domain meta data repository and the associated basic transformation generation capability already mentioned. Its capabilities have been building slowly, and we are now supporting multiple product lines.
What is most interesting to me about the approach, however, is how it is beginning to show the potential power of a bottom-up approach to defining that overarching, “canonical” model. We can already see how, using such techniques as automated transitive closure to “clone” meta data across domains, a larger model of the Higher Ed space is emerging.
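As an illustration of how transitive closure can surface a larger model from pairwise aliases, here is a small Python sketch using a union-find structure. It is one common way to compute such a closure, not necessarily the one we use, and the function `close_equivalences` and all domain and field names are hypothetical.

```python
from collections import defaultdict


def close_equivalences(pairs):
    """Group (domain, field) references into equivalence classes.

    `pairs` lists the aliases that were declared directly; the result also
    connects references that are only linked through intermediate domains.
    """
    parent = {}

    def find(x):
        # Return the representative of x's class, shortening paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    groups = defaultdict(set)
    for ref in list(parent):
        groups[find(ref)].add(ref)
    return list(groups.values())


declared = [
    (("member_a", "stu_id"), ("product_x", "student_id")),
    (("product_x", "student_id"), ("product_y", "learner_id")),
    (("member_b", "dept"), ("product_x", "department")),
]
classes = close_equivalences(declared)
```

In this example nobody declared that `member_a.stu_id` corresponds to `product_y.learner_id`, yet the two end up in the same class because both are aliased to `product_x.student_id`. Each resulting class is a candidate concept in the emerging shared model.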
I’ve been thinking about terminology a lot, and I used to love to coin a phrase. But I’m now of a mind that permitting multiple, slightly overlapping names for slightly overlapping concepts can be a good thing. So, getting to the actual point of this discussion, I thought I’d ask for comments around some of the variations of names I’ve been working through to describe this technique/phenomenon. Let me know what you all think of these. What do they connote to you?
Emergent Data Management, or Emergent MDM: the master model emerges from, or is made visible through, the amalgamation of hundreds of variations.
Master-less Data Management: kind of a play on words, really, reflecting that the whole is, at its core, not purposefully curated, while still implying that there is control and regularity to the endeavor.
I thought about trying to borrow the jaunty, self-contradictory nuance of the name “NoSQL”, but I thought “No Data Management” would be too easily mistaken as implying no actual management.
(I posted about this once before on another blog, where I came up with several other names.)
I’d also be interested in seeing hints/suggestions for other emerging technologies, services, frameworks or packages that might support the approach I’ve just described.