Introducing the Sphynx Project

 

About the Sphynx Project

I created Sphynx to implement all the principles presented in my book, An Introduction to Modern Data Architecture, on sale here on the blog. It is a fully featured data catalog, capable of ingesting and classifying physical data from databases, as well as the applications that access those data sources, whatever platform they are written on.

Its main purpose is to provide a complete information inventory for companies of any size using a simplified repository model. Only the relevant aspects of a database are managed, so that users of all levels can use the catalog to find company data faster than with a manual approach.

Sphynx will not be open source. It will be sold to interested companies at a minimal price under a subscription model: users pay a monthly fee for an on-premise product and receive support and updates.

What is Sphynx?

Sphynx is the codename for a generic data catalog consisting of only four components. These components are very light in code and respond quickly to user requests. There are no server components except the repository database, which runs on a standard DBMS; I chose SQL Server. Future releases will support practically any database on the market.

Sphynx Architecture

As mentioned, Sphynx is composed of a management client, a repository that stores physical and logical objects, and an API that allows external integration with the repository. It also includes utilities that complement the product architecture, all accessible from the client.

 


Sphynx Core components

            


DBMS Connectivity

Sphynx connects to each supported platform using standard SQL. For each database, it runs a series of SQL commands to extract the physical data objects. The version of Sphynx currently under development supports Teradata, Oracle, SQL Server, Postgres, and Greenplum.
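
As a rough illustration of the extraction step described above, the sketch below maps each listed platform to a query against that platform's own system catalog. The dictionary, the function name, and the choice of catalog views are my assumptions for illustration; the actual SQL commands Sphynx runs may differ.

```python
# Hypothetical sketch: one column-metadata query per platform.
# The views used here are the standard system catalog views of each DBMS;
# they are an assumption, not the queries Sphynx actually executes.
COLUMN_QUERIES = {
    "sqlserver": (
        "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE "
        "FROM INFORMATION_SCHEMA.COLUMNS"
    ),
    "postgres": (
        "SELECT table_schema, table_name, column_name, data_type "
        "FROM information_schema.columns"
    ),
    "oracle": (
        "SELECT OWNER, TABLE_NAME, COLUMN_NAME, DATA_TYPE "
        "FROM ALL_TAB_COLUMNS"
    ),
    "teradata": (
        "SELECT DatabaseName, TableName, ColumnName, ColumnType "
        "FROM DBC.ColumnsV"
    ),
}
# Greenplum is built on Postgres and exposes the same information_schema.
COLUMN_QUERIES["greenplum"] = COLUMN_QUERIES["postgres"]


def column_query(platform: str) -> str:
    """Return the column-extraction query for a platform (case-insensitive)."""
    key = platform.strip().lower()
    if key not in COLUMN_QUERIES:
        raise ValueError(f"Unsupported platform: {platform!r}")
    return COLUMN_QUERIES[key]
```

Keeping the queries in a lookup table means a new platform only needs a new entry, which fits the goal of supporting more databases in future releases.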

What Objects will Sphynx manage?

Sphynx will manage a simple set of data objects to be as useful as possible to end-users and IT teams, considering only the relevant data objects. The objects considered in this version are:

  • Data Sources – Database servers containing one or multiple databases;
  • Databases – Individual data object sets;
  • Tables – Database tables, regardless of the DBMS;
  • Columns – Data attributes of a table.

It will also manage some extended objects:

  • KPI – A specialization of a column for data aggregations;
  • Application – Any reporting and/or analysis application that accesses the data sources in the catalog;
  • Business dimensions – Classification criteria defined by the users to manage their data objects; users define category types and add values to the dimensions, and those values can be used to classify objects;
  • Glossary – A generic glossary that can be used to classify data objects, in addition to the business dimensions.
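
To make the relationships between these objects concrete, here is a minimal sketch of the hierarchy in Python. All class names, fields, and the sample values are hypothetical and only mirror the lists above; they are not the actual Sphynx repository model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Column:
    name: str
    data_type: str
    # Business dimension name -> value assigned to this column
    classifications: Dict[str, str] = field(default_factory=dict)


@dataclass
class KPI(Column):
    # A KPI specializes a column by adding an aggregation rule.
    aggregation: str = "SUM"


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class Database:
    name: str
    tables: List[Table] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    platform: str
    databases: List[Database] = field(default_factory=list)


@dataclass
class BusinessDimension:
    name: str
    values: List[str] = field(default_factory=list)

    def classify(self, column: Column, value: str) -> None:
        """Tag a column with one of the values defined for this dimension."""
        if value not in self.values:
            raise ValueError(f"{value!r} is not a value of {self.name!r}")
        column.classifications[self.name] = value


# Sample data, invented for the example.
revenue = KPI(name="revenue", data_type="DECIMAL(18,2)", aggregation="SUM")
sales = Table(name="sales", columns=[Column("order_id", "INT"), revenue])
source = DataSource("dw01", "SQL Server", [Database("sales_dw", [sales])])

sensitivity = BusinessDimension("Sensitivity", ["Public", "Internal", "Confidential"])
sensitivity.classify(revenue, "Confidential")
```

In this sketch only columns carry classifications, to keep it short; in the catalog, business dimensions and the glossary can classify any data object.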


Sphynx conceptual model

Sphynx Applications

Sphynx can be used for a variety of purposes, including:

  • Enterprise data assets management – map and classify all data assets in a company;
  • Application management – manage all data-based applications that read from or write to the data sources;
  • Enterprise Governance – control the structure of, and updates to, existing data structures and validate them against standards (data domains);
  • Data Set Generation – create new data sets for AI and data science applications;
  • Master data management – compare enterprise data with master data standards to ensure consistency across multiple data sources;
  • Data Cleaning and Quality controls – an extension to enterprise governance that identifies low-quality data in the data sources and proposes corrections.

APIs

Sphynx will feature an API and a set of services that give access to the repository through functions for querying data, importing from a data source, and importing and exporting definitions to and from the catalog. The API will be available in version 2.
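
Since the API is not released yet, the following is only a sketch of what exporting and importing definitions could look like, assuming the definitions travel as a JSON document. The function names and the "data_sources" key are hypothetical and do not describe the actual version 2 API.

```python
import json
from typing import Any, Dict


def export_definitions(catalog: Dict[str, Any]) -> str:
    """Serialize catalog definitions to a JSON document."""
    return json.dumps(catalog, indent=2, sort_keys=True)


def import_definitions(document: str) -> Dict[str, Any]:
    """Load definitions from JSON and check the top-level shape."""
    catalog = json.loads(document)
    if not isinstance(catalog, dict) or "data_sources" not in catalog:
        raise ValueError("Expected a JSON object with a 'data_sources' key")
    return catalog


# Round trip with invented sample data.
definitions = {
    "data_sources": [
        {"name": "dw01", "platform": "SQL Server", "databases": ["sales_dw"]}
    ]
}
restored = import_definitions(export_definitions(definitions))
```

A text format such as JSON would let definitions move between repositories without a direct database connection, but that is a design assumption on my part.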

 

Current Project Status

The repository data model is ready and 40% of the full code is complete. The second phase, in which the UI is being built, is now starting.

   

Platforms

Sphynx will be available initially on Windows 10, and future versions will be available for macOS and Linux.

 

Requirements

Sphynx 1.0 requires a client with a minimum of 8 GB of RAM, and the repository database must be SQL Server, in any version after 2012. From version 1.1, Sphynx will also support Oracle databases as its repository.

 

Product Roadmap

The first release will reach the market in November, and there will be monthly updates until June 2021, when release 2 will be launched. After version 2, there will be annual releases based on user feedback. Each subscriber company will receive all releases during the subscription period at no extra cost.

Competition

Sphynx will enter an arena with bigger companies and their products, such as Collibra, Dell Boomi, Informatica, and others. Thanks to its low cost and the fact that it can be used right after installation, it is the perfect entry-level package for companies of any size that want an effective data catalog with practical applications.

Contact

Please contact me for further information on Sphynx and how to subscribe, or to request a 30-day evaluation. You can also check for updates on the blog.

 

Marco Aurelio Ribeiro

macr2011@gmail.com

+55 51 983250251

https://truedatarch.blogspot.com/

 
