About Project Sphynx
I created Sphynx to implement the principles
presented in my book, An Introduction to Modern Data Architecture, which is on
sale here on the blog. It is a fully featured data catalog capable of
ingesting and classifying physical data objects from databases, as well as the
applications, written on any platform, that access those data sources.
Its main purpose is to provide a complete
information inventory for companies of any size using a simplified repository
model. Only the relevant aspects of a database are managed, so that users at
all levels can browse the catalog and find company data faster than with a
manual approach.
Although Sphynx will not be open source, it
will be sold to interested companies at a low price under a subscription
model: customers pay a monthly fee for the on-premises product and receive
support and updates.
What is Sphynx?
Sphynx is the codename for a generic data
catalog consisting of only four components. These components have a small code
footprint and respond quickly to user requests. There are no server components
apart from the repository database, which runs on a standard DBMS; I chose
SQL Server. Future releases will support practically any database on the market.
Sphynx Architecture
As mentioned, Sphynx is composed of a management client,
a repository that stores physical and logical objects, and an API that allows
external integration with the repository. It also includes utilities that
complement the product architecture, all accessible from the client.
[Figure: Sphynx core components]
DBMS Connectivity
Sphynx connects to each supported database platform using
standard SQL. For each database, a series of SQL commands is executed to
extract its physical data objects. The version of Sphynx currently under
development supports Teradata, Oracle, SQL Server, Postgres, and Greenplum.
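The extraction step can be pictured with a short sketch. This is a minimal illustration, not Sphynx's actual code: it assumes each platform's standard metadata views (INFORMATION_SCHEMA.COLUMNS for SQL Server, Postgres and Greenplum; ALL_TAB_COLUMNS for Oracle; DBC.ColumnsV for Teradata) and any Python DB-API 2.0 connection. The names `METADATA_QUERIES`, `ColumnInfo` and `extract_columns` are hypothetical, and the SQLite entry exists only so the sketch can run locally without a database server.

```python
from __future__ import annotations

import sqlite3
from typing import NamedTuple

# Hypothetical per-platform catalog queries (not Sphynx's real SQL).
# Each returns rows of (database, schema, table, column, data_type).
_ANSI_QUERY = """
    SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA NOT IN ('pg_catalog', 'information_schema')
"""

METADATA_QUERIES = {
    # SQL Server, Postgres and Greenplum expose the ANSI INFORMATION_SCHEMA views.
    "sqlserver": _ANSI_QUERY,
    "postgres": _ANSI_QUERY,
    "greenplum": _ANSI_QUERY,
    # Oracle: data dictionary view; the owner plays the role of the schema.
    "oracle": """
        SELECT SYS_CONTEXT('USERENV', 'DB_NAME'), OWNER, TABLE_NAME,
               COLUMN_NAME, DATA_TYPE
        FROM ALL_TAB_COLUMNS""",
    # Teradata: databases are the only namespace, so it fills both slots.
    # ColumnType is a short type code such as 'CV' or 'I'.
    "teradata": """
        SELECT DatabaseName, DatabaseName, TableName, ColumnName, ColumnType
        FROM DBC.ColumnsV""",
    # SQLite: included only so this sketch runs without a server.
    "sqlite": """
        SELECT 'main', 'main', m.name, p.name, p.type
        FROM sqlite_master AS m, pragma_table_info(m.name) AS p
        WHERE m.type = 'table'
        ORDER BY m.name, p.cid""",
}


class ColumnInfo(NamedTuple):
    database: str
    schema: str
    table: str
    column: str
    data_type: str


def extract_columns(conn, platform: str) -> list[ColumnInfo]:
    """Run the platform's metadata query over a DB-API 2.0 connection."""
    cur = conn.cursor()
    try:
        cur.execute(METADATA_QUERIES[platform])
        return [ColumnInfo(*row) for row in cur.fetchall()]
    finally:
        cur.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    for col in extract_columns(conn, "sqlite"):
        print(col)
```

Keeping the SQL in a per-platform table means adding a new DBMS only requires a new query, while the code that loads the results into the repository stays the same.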
What Objects will Sphynx manage?
Sphynx manages a simple set of data objects,
limited to the relevant ones, so that it is as useful as possible to end users
and IT teams. The objects supported in this version are:
- Data Sources – database servers containing one or more databases;
- Databases – individual sets of data objects;
- Tables – database tables, regardless of the DBMS;
- Columns – data attributes of a table.
There are also some extended objects:
- KPI – a specialization of a column used for data aggregations;
- Application – any reporting or analysis application that accesses the data sources in the catalog;
- Business Dimensions – user-defined classification criteria for managing data objects: users define category types, add values to each dimension, and then use those values to classify objects;
- Glossary – a generic glossary that can also be used to classify data objects, in addition to the business dimensions.
[Figure: Sphynx conceptual model]
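To make the object hierarchy above more concrete, here is a minimal sketch of how the conceptual model could be represented in code. It is an illustration under assumptions, not the actual Sphynx repository schema: all class and field names (including `aggregation` on `KPI` and the `classify` helper) are hypothetical, and for brevity only columns carry classifications and glossary terms.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    # Dimension name -> chosen value, e.g. {"Subject Area": "Sales"}.
    classifications: dict[str, str] = field(default_factory=dict)
    # Glossary terms attached to this column.
    glossary_terms: set[str] = field(default_factory=set)


@dataclass
class KPI(Column):
    """A column specialized for aggregation (the field is an assumption)."""
    aggregation: str = "SUM"


@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)


@dataclass
class Database:
    name: str
    tables: list[Table] = field(default_factory=list)


@dataclass
class DataSource:
    server: str
    platform: str
    databases: list[Database] = field(default_factory=list)


@dataclass
class Application:
    name: str
    data_sources: list[DataSource] = field(default_factory=list)


@dataclass
class BusinessDimension:
    """A user-defined category type with its allowed values."""
    name: str
    values: set[str] = field(default_factory=set)

    def classify(self, column: Column, value: str) -> None:
        # Only values registered for this dimension may be assigned.
        if value not in self.values:
            raise ValueError(f"{value!r} is not a value of {self.name!r}")
        column.classifications[self.name] = value


if __name__ == "__main__":
    total = KPI(name="total_sales", data_type="DECIMAL(18,2)")
    sales = Table(name="sales", columns=[Column("id", "INT"), total])
    source = DataSource("srv01", "SQL Server", [Database("dw", [sales])])
    dashboard = Application("Sales Dashboard", [source])
    area = BusinessDimension("Subject Area", {"Sales", "Finance"})
    area.classify(total, "Sales")
    print(total.classifications)  # {'Subject Area': 'Sales'}
```

Modeling KPI as a subclass of Column mirrors the description of a KPI as "a specialization of a column", so anything that works on columns (classification, glossary terms) also works on KPIs.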
Sphynx Applications
Sphynx can be used for a variety of purposes,
including:
- Enterprise data asset management – map and classify all data assets in a company;
- Application management – manage all data-based applications that read data from or write data to the data sources;
- Enterprise governance – control existing data structures and the changes made to them, and validate them against standards (data domains);
- Data set generation – create new data sets for AI and data science applications;
- Master data management – compare enterprise data with master data standards to ensure consistency across multiple data sources;
- Data cleaning and quality controls – an extension of enterprise governance that identifies low-quality data in the data sources and proposes corrections.
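As an illustration of the enterprise governance item, validating structures against data domains could look like the sketch below. This is a hypothetical example, not Sphynx's actual rule engine: it assumes a data domain is defined by a column-name pattern plus an expected data type, and the names `DataDomain` and `validate` are invented for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DataDomain:
    """Hypothetical domain: columns whose names fully match `name_pattern`
    must be declared with `expected_type`."""
    name: str
    name_pattern: str
    expected_type: str


def validate(
    columns: Sequence[tuple[str, str, str]],
    domains: Sequence[DataDomain],
) -> list[tuple[str, str, str, str]]:
    """Return (table, column, domain, actual_type) for every violation."""
    violations = []
    for table, column, data_type in columns:
        for domain in domains:
            matches = re.fullmatch(domain.name_pattern, column, flags=re.IGNORECASE)
            # Type comparison is case-insensitive but otherwise exact.
            if matches and data_type.upper() != domain.expected_type.upper():
                violations.append((table, column, domain.name, data_type))
    return violations


if __name__ == "__main__":
    domains = [
        DataDomain("Email", r".*_email", "VARCHAR(254)"),
        DataDomain("Date key", r".*_date_key", "INTEGER"),
    ]
    columns = [
        ("customer", "customer_email", "VARCHAR(100)"),
        ("sales", "order_date_key", "INTEGER"),
    ]
    print(validate(columns, domains))
    # [('customer', 'customer_email', 'Email', 'VARCHAR(100)')]
```

The column tuples could come straight from the metadata extraction step, so the same check can be rerun after every catalog refresh to detect structures that drift away from the standards.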
Sphynx will also offer an API and a set of
services for programmatic access to the repository: querying data, importing
from a data source, and importing and exporting definitions to and from the
catalog. The API will be available in version 2.
Current Project Status
The repository data model is complete, and 40%
of the code has been written. The second phase, in which the UI is being
built, is now starting.
Platforms
Sphynx will initially be available for Windows
10; future versions will be available for Mac and Linux.
Requirements
Sphynx 1.0 requires a client machine with at least
8 GB of RAM, and the repository database must be any SQL Server version after
2012. Starting with version 1.1, Sphynx will also support Oracle databases as
its repository.
Product Roadmap
The first release will reach the market in November,
followed by monthly updates until June 2021, when release 2 will be launched.
After version 2, there will be annual releases based on user feedback. Each
subscriber company will receive all releases during its subscription period at
no extra cost.
Competition
Sphynx will compete with larger companies and
products such as Collibra, Dell Boomi, and Informatica. However, thanks to its
low cost and the fact that it can be used right after installation, it is an
ideal entry-level package for companies of any size that want an effective,
practical data catalog.
Contact
Please contact me for more information on
Sphynx, to subscribe, or to request a 30-day evaluation. You can also follow
the updates on the blog.
Marco Aurelio Ribeiro
+55 51 983250251
https://truedatarch.blogspot.com/