The changing business environment and advances in technology have resulted in exponential growth in the demand for data storage. As businesses capture more data about their everyday activities and the people they interact with, the demand for storage continues to grow.
Government legislation, through the Promotion of Access to Information Act, provides for the constitutional right of access to any information that is held by the State or any other person and that is required for the exercise or protection of any rights. It promotes the retention of data for a number of years for the exercise or protection of any rights.
Thus data recorded today must be available and easily accessible in the foreseeable future. This applies to data related to business transactions, emails, internet-related traffic, voice calls, etc., resulting in the creation of large data repositories containing read-only data.
These requirements place a tremendous overhead on the data management function and consume huge amounts of resources within the business. Increases in database license fees and hardware costs are forcing executives to look at alternatives to address this business problem. The data must be backed up and included in business continuity strategies, data management plans, large data warehouses, etc. Searching through the large volumes of data is both time consuming and expensive. Most of the data is stored in large relational databases that are maintained by highly skilled individuals. In most instances the data is never modified or altered after creation and seldom adds any business value once it has been used.
This paper provides a fresh look at how this business problem can be addressed using a file-based data repository that is optimized for large volumes of read-only data, combined with access to online relational databases via a common data access layer. Organisations can now reduce the amount of data kept online by migrating data to flat files. This reduces the cost of keeping and managing huge databases while still providing a common access layer to the data.
Implemented as a standalone solution, or as a complementary technology to existing investments in relational databases, Cornastone’s Data Access solution provides a high-performance solution for capturing, storing and accessing enormous volumes of historical transaction file-based data at approximately ten percent of the cost of storing the same data in a relational database solution of equal performance. Cornastone achieves this remarkable cost-performance by focusing on the specific needs of business event data, and meeting those needs in a highly optimized fashion. Cornastone uses ODBC to create a common access layer to all data stored in an organisation.
Our file-based Indexing and Query service is specifically focused on the requirements of capturing high-transaction-rate data, storing it as cost-effectively as possible, and then providing fast access to the specific data of interest, even when it is buried within literally billions of other transactions. Relational database technology is certainly capable of handling the same task; however, it is a general-purpose technology whose robust feature set imposes significant performance overhead, and achieving performance comparable to our solution can require up to ten times the hardware investment.
The Cornastone Data Access layer connects to existing relational databases and to the file server via a Virtualisation Service, which provides a common access point to data irrespective of its source. It masks the operational names and details of the operational stores from the applications accessing the data stores, and provides a level of security by exposing only the required data to end-user applications.
The Cornastone Data Access Solution can dramatically improve both the length of history available and access to that history, while at the same time dramatically improving the performance of other applications by offloading the burden of capturing and storing enormous volumes of business event data from applications that previously struggled to deal with it.
Figure 1 depicts the functional components that make up the Cornastone architecture. The architecture comprises three logical services, i.e.
the indexing service,
the query service and
the virtualisation service,
as well as a configuration editor and a management console.
The indexing service discovers new data files when they appear in the target file directories, parses those files based upon predefined file structures, and then creates the required indexes to enable rapid queries. The indexing service supports multiple file parsing conventions, which cover:
ASCII and binary file formats
Fixed and variable format records
Fixed, variable and delimited fields
A rich and extensible portfolio of field data types
An extensive range of byte ordering protocols
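To make the parsing conventions above concrete, the following is a minimal sketch of how a fixed-format binary record with an explicit byte ordering might be decoded. The record layout (account id, timestamp, amount) is a hypothetical example for illustration, not the actual Cornastone schema or implementation.

```python
import struct

# Hypothetical fixed-format binary record layout (illustrative only):
# a 4-byte big-endian account id, an 8-byte big-endian timestamp and
# a 4-byte big-endian signed amount in cents.
RECORD_FORMAT = ">IQi"          # ">" selects big-endian byte ordering
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def parse_records(raw: bytes):
    """Split a file's contents into fixed-size records and decode each one."""
    for offset in range(0, len(raw), RECORD_SIZE):
        account, timestamp, cents = struct.unpack_from(RECORD_FORMAT, raw, offset)
        yield {"account": account, "timestamp": timestamp, "cents": cents}

# Example: encode two records, then parse them back.
data = (struct.pack(RECORD_FORMAT, 1001, 1700000000, 2500)
        + struct.pack(RECORD_FORMAT, 1002, 1700000060, -150))
records = list(parse_records(data))
```

Variable-format records and delimited fields would follow the same pattern, with the parser selecting a layout per record based on the predefined file structure.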
The query service accepts queries in SQL format through ODBC, decides the best way to service each query using optimization logic, retrieves the data from the source data files, and then formats the returned fields for presentation back through ODBC as a table view.
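The essential idea behind index-backed query servicing can be sketched as follows. This is our illustration of the general technique, not the actual Cornastone internals: an index maps a field value to the positions of matching records, so a selective predicate touches only those records instead of scanning the entire population.

```python
# Sample records standing in for rows parsed from flat data files.
records = [
    {"account": 1001, "amount": 2500},
    {"account": 1002, "amount": -150},
    {"account": 1001, "amount": 900},
]

# Build an equality index on "account": value -> list of record positions.
index = {}
for pos, rec in enumerate(records):
    index.setdefault(rec["account"], []).append(pos)

def query_by_account(account):
    """Service the equivalent of `SELECT * WHERE account = ?` via the index."""
    return [records[pos] for pos in index.get(account, [])]

rows = query_by_account(1001)   # touches only the two matching records
```

In the real service the index would reference byte offsets within files on disk rather than in-memory positions, but the selectivity benefit is the same.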
The virtualisation service operates as a remote database server that connects both to flat-file data via the query service and to relational databases. Common linked tables to the various data sources are created in this layer. Other relational databases and business intelligence tools can then access record-level information directly from the source via one common virtual data store. Thus the Cornastone Data Access layer does not load or move the data it finds, but provides direct access to it via SQL.
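The two roles the virtualisation layer plays, routing a logical table name to the correct backing store and exposing only permitted columns to applications, can be sketched as below. All names here (tables, columns, backends) are hypothetical stand-ins, not the actual Cornastone catalogue.

```python
def file_store_fetch():
    # Stand-in for rows served by the file-based query service.
    return [{"msisdn": "27820000001", "cell_id": "A17", "cost": 120}]

def rdbms_fetch():
    # Stand-in for rows served by an operational relational database.
    return [{"cust_no": 42, "name": "Acme", "credit_limit": 5000}]

# Catalogue of linked tables: logical name -> (backend, exposed columns).
# The operational store names stay hidden behind the logical names,
# and only the listed columns ever reach end-user applications.
catalog = {
    "call_history": (file_store_fetch, ["msisdn", "cost"]),
    "customers":    (rdbms_fetch, ["cust_no", "name"]),
}

def select(table):
    """Route a query to the backing store and project the allowed columns."""
    backend, allowed = catalog[table]
    return [{k: row[k] for k in allowed} for row in backend()]

rows = select("call_history")
```

Note that `select` never copies data into the virtualisation layer; it fetches from the source on demand, mirroring the no-load, no-move behaviour described above.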
The current demonstration box is a Linux server with 2 x 3GHz CPUs, 1 GB memory and 6 x 400GB disks. This small server manages volumes in excess of 25 billion records and is able to accrue an additional 200 million records daily with 5 indexes maintained across the entire dataset.
The solution is able to discover, parse and index each file within seconds of it appearing on the file system. Highly selective queries can be run simultaneously against the entire record population with sub-second response times.
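The discovery step can be sketched as a comparison of the current directory listing against the set of files already indexed; any new arrival is handed to the parser. This is our simplified illustration, assuming a simple polling model, and the file name used is hypothetical.

```python
import os
import tempfile

def discover_new_files(directory, already_indexed):
    """Return files present in the target directory but not yet indexed."""
    current = set(os.listdir(directory))
    return sorted(current - already_indexed)

# Demonstration against a throwaway directory.
target = tempfile.mkdtemp()
indexed = set()

# Simulate a new data file arriving in the target directory.
with open(os.path.join(target, "cdr_20240101.dat"), "w") as f:
    f.write("dummy record\n")

new_files = discover_new_files(target, indexed)
indexed.update(new_files)   # after indexing, remember them
```

A production service would more likely use file-system notification (e.g. inotify on Linux) rather than polling, but the bookkeeping is the same.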
Since the query is passed directly to the DBMS of the operational data store, performance is based on the response time of the source data system.