If you take a look at most companies' requirements when hiring a Data Architect, especially for enterprise-wide positions, you will notice that most of them share a barely hidden agenda - set up their enterprise data lake, capable of self-service and able to support all kinds of users.
Many people may say this is a utopia, something impossible to achieve. I say the opposite - this is exactly what needs to be done, but using a different approach: simpler, more effective, less theoretical, and more hands-on.
Adopt a practical approach with some method and some clear rules, and the impossible can be achieved.
Imagine the following situation:
You, as an architect, have to create the enterprise data lake to support self-service in your company. This company is a typical producer of goods and also provides post-sales support for all its products. The company has around 100 systems on multiple platforms, including ERPs, cloud platforms, standalone applications, and legacy systems. Master data is mostly duplicated, there is little integration across environments, and data quality is limited. You don't need to worry about hardware and software, as a substantial budget has been allocated to this task, but due to the diversity, low quality, and poor integration of the data, you have a daunting task ahead.
What would be your solution to this nightmare? And if you think it is a rare scenario, think again. Most companies in the world face something similar, so it is commonplace.
One possible solution:
Since the lake is still the goal, you can:
A) Model the business and derive the core business entities: customers, products, services, finance records. With an enterprise data model in place, you can plan to load data from all systems into it;
B) Map the data sources and converge them to the enterprise model, making sure that all data is mapped to the core entities to avoid duplication and inconsistency (a minimal sketch of such a mapping follows this list);
C) Integration can occur in the lake, as long as you map the interconnections inside it;
D) The lake can have as many as three information layers and become the company's single data hub: raw or transactional data, the business data model, and dimensional data;
E) Track everything in a data catalog (a sketch of a catalog entry with role-based access appears a little further below);
F) Provide a self-service interface based on the catalog;
G) Provide an API to access the lake's services;
H) Provide a security layer with role-based access and encryption, and all standard, up-to-date security protocols in place;
I) Allow external applications to access the lake through these protocols and the API;
J) Data can come in any format as long as it conforms to the enterprise data model;
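To make items A, B, and J more concrete, here is a minimal Python sketch of how one core entity of the enterprise model could be defined and how a raw source record could be converged to it. The entity, the source field names, and the mapping are illustrative assumptions, not a prescription.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Customer:
        """One core business entity of the enterprise data model (item A)."""
        customer_id: str                  # enterprise-wide identifier
        name: str
        country: Optional[str] = None

    # Hypothetical field mapping for one of the ~100 source systems (item B);
    # the source field names are illustrative only.
    ERP_CUSTOMER_MAPPING = {
        "KUNNR": "customer_id",
        "NAME1": "name",
        "LAND1": "country",
    }

    def to_core_customer(source_row: dict, mapping: dict) -> Customer:
        """Converge a raw source row to the core Customer entity."""
        canonical = {target: source_row.get(source) for source, target in mapping.items()}
        return Customer(**canonical)

    raw_row = {"KUNNR": "0000012345", "NAME1": "Acme GmbH", "LAND1": "DE"}
    print(to_core_customer(raw_row, ERP_CUSTOMER_MAPPING))

Whatever the actual tooling (ETL jobs, SQL views, streaming pipelines), the point is that every source field ends up attached to exactly one core entity, which is what prevents duplication and inconsistency.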
And remember: once the lake becomes active, it must be treated as the single place in the company where all users go for their data, while the external applications become source providers only, since they handle just the operational side of the data.
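Access to that single place still has to be governed. The sketch below illustrates, under the same kind of assumptions, how items D, E, F, and H could fit together: every dataset is registered in the catalog with its layer, owner, source system, and the roles allowed to read it, and both the self-service interface and the API consult that entry before serving data. The layer names, roles, and catalog structure are assumptions for illustration only.

    from dataclasses import dataclass, field

    LAYERS = ("raw", "business", "dimensional")   # the three information layers (item D)

    @dataclass
    class CatalogEntry:
        """A minimal data catalog record (item E)."""
        dataset: str
        layer: str
        owner: str
        source_system: str
        allowed_roles: set = field(default_factory=set)

    # A hypothetical catalog with one registered dataset.
    CATALOG = {
        "business.customer": CatalogEntry(
            dataset="business.customer",
            layer="business",
            owner="sales",
            source_system="ERP",
            allowed_roles={"analyst", "data_engineer"},
        ),
    }

    # Every catalog entry must sit in one of the three layers.
    assert all(entry.layer in LAYERS for entry in CATALOG.values())

    def can_read(role: str, dataset: str) -> bool:
        """Role-based check used by the self-service interface and the API (items F, G, H)."""
        entry = CATALOG.get(dataset)
        return entry is not None and role in entry.allowed_roles

    print(can_read("analyst", "business.customer"))    # True
    print(can_read("marketing", "business.customer"))  # False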
So, if you consider the data lake as a complete ecosystem, then a possible representation of this environment is the following:
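A simplified, text-only way to express it, using only the components already described above (the grouping itself is my own assumption):

    DATA_LAKE_ECOSYSTEM = {
        "sources": ["ERPs", "cloud platforms", "standalone applications", "legacy systems"],
        "layers": {
            "raw": "transactional data as received from the sources",
            "business": "data converged to the enterprise data model",
            "dimensional": "analytics-ready structures for reporting and self-service",
        },
        "services": ["data catalog", "self-service interface", "API", "security layer"],
        "consumers": ["internal users", "external applications (via the API)"],
    }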
The enterprise data architect is responsible for implementing the data lake and its components, providing all the models for it, and defining the project directives for the solution teams to follow.
Regardless of the technological solution, the structure is the same for any platform chosen, which means that solution designers and platform teams will work together, but under the direction of the enterprise data architect and the enterprise solution architect.
In the vast majority of companies, such a project may last up to one year, regardless of data volumes or complexity. If it follows the suggestions above, it can be implemented the same way in any kind of company.
What was described is a standard method with proven results, and it can be used anywhere, delivering results at least three times faster than the lengthy traditional approaches.
A Note on Silos:
As a personal consideration, we can use the silos to our benefit if we turn them into specialization centers, where each area of the company, mirroring the enterprise model, becomes a silo of its own, with stronger guarantees for its internal data assets. Use the silos to your benefit and collect fast results. As a sanity measure, make sure each silo runs on its own with 100% reliability of its data.
If you do so, the next steps are moving the silos to the lake, and integrating and converging them to the enterprise business model.
