Data Integrity Institute Inc. has developed, and through ongoing research continues to systematically improve, some of the most efficient sorting algorithms, based on advanced nanotechnology research.
Today's major database and ETL engines use comparison-based sorting algorithms derived from well-known algorithms such as quicksort, heapsort, and merge sort. All of these comparison-based algorithms operate at the item level, performing multiple active and passive comparisons per item.
Active comparison compares a selected item with other items by calling a comparison function customized for the particular item type. The function returns one of three possible outcomes: equal (0), greater than (+1), or less than (-1). This process can be very slow because each item must be compared with other items multiple times and, especially, because of the overhead of calling the customized comparison function.
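To make the cost of active comparison concrete, here is a minimal Python sketch in which every item comparison goes through a three-way comparison function (wrapped for Python's built-in `sorted` via `functools.cmp_to_key`), and that function counts how often it is called. The names `CountingComparator` and `sort_with_comparator` are hypothetical, chosen only for this illustration.

```python
# Illustrative sketch: counts calls to a three-way comparison function.
# CountingComparator and sort_with_comparator are hypothetical names.
import functools
import random


class CountingComparator:
    """Three-way comparator for floats that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, a: float, b: float) -> int:
        self.calls += 1
        if a < b:
            return -1  # less than
        if a > b:
            return 1   # greater than
        return 0       # equal


def sort_with_comparator(values: list[float]) -> tuple[list[float], int]:
    """Sort values, routing every item comparison through a function call."""
    cmp = CountingComparator()
    result = sorted(values, key=functools.cmp_to_key(cmp))
    return result, cmp.calls


if __name__ == "__main__":
    data = [random.random() for _ in range(10_000)]
    ordered, calls = sort_with_comparator(data)
    print(f"{calls} comparator calls for {len(data)} items "
          f"({calls / len(data):.1f} per item)")
```

With random input, the number of calls per item grows roughly with log2 of the array size, and every call pays the interpreter's function-call overhead on top of the comparison itself.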
Passive comparison checks the position of each item in the memory array and/or the file several times to ensure that the item is within its expected boundaries. These checks are needed in order to swap compared items, among many other purposes. Although the position of an item is represented by an integer, performing these checks several times for each item is still time consuming.
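Passive comparison, as described here, can be read as the index and boundary checks that a comparison sort performs on item positions. The sketch below is an ordinary quicksort (Lomuto partition, random pivot) instrumented to count item-to-item comparisons (active) separately from position checks (passive). Mapping the document's two terms onto these counters is an interpretation, and the names `Counters` and `quicksort` are hypothetical.

```python
# Illustrative sketch: a standard quicksort instrumented to separate
# item comparisons ("active") from position/boundary checks ("passive").
import random
from dataclasses import dataclass


@dataclass
class Counters:
    active: int = 0   # item-to-item comparisons
    passive: int = 0  # checks of item positions (indices) against range bounds


def quicksort(a: list, lo: int, hi: int, c: Counters) -> None:
    """In-place quicksort of a[lo..hi] (inclusive), Lomuto partition, random pivot."""
    c.passive += 1                  # boundary check: is the range non-trivial?
    if lo >= hi:
        return
    r = random.randint(lo, hi)      # random pivot guards against sorted input
    a[r], a[hi] = a[hi], a[r]
    pivot = a[hi]
    i = lo                          # next slot for an item smaller than the pivot
    j = lo
    while True:
        c.passive += 1              # boundary check: is j still inside the range?
        if j >= hi:
            break
        c.active += 1               # active comparison: item at j vs. pivot
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
        j += 1
    a[i], a[hi] = a[hi], a[i]       # pivot moves to its final position
    quicksort(a, lo, i - 1, c)
    quicksort(a, i + 1, hi, c)


if __name__ == "__main__":
    data = [random.random() for _ in range(10_000)]
    counters = Counters()
    quicksort(data, 0, len(data) - 1, counters)
    print(counters)
```

Every partition loop performs one more position check than item comparisons, plus a range check per recursive call, so the passive count always exceeds the active count in this sketch.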
Through its nanotechnology research initiative, Data Integrity Institute Inc. has developed the most efficient sorting algorithm, one that operates not at the item level but at the sub-item level (nano level), and never performs either active or passive comparisons of items. Working at this fine granularity eliminates the comparison overhead, making the sorting process fast and efficient.
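Sorting at a sub-item level without item-to-item comparisons is reminiscent of the well-known least-significant-digit (LSD) radix sort, which processes each key one byte at a time. The Python sketch below shows that textbook technique for 64-bit doubles. It is an illustrative stand-in, not the Institute's proprietary algorithm, which the text does not disclose, and the helper names are hypothetical. Doubles are first mapped to unsigned 64-bit keys whose integer order matches numeric order (flip all bits of negative values, set the sign bit of non-negative values), so the same byte-wise passes serve both integers and doubles.

```python
# Illustrative sketch: textbook LSD radix sort (NOT the Institute's algorithm).
# double_to_key, key_to_double, radix_sort_u64, radix_sort_doubles are hypothetical names.
import struct

MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def double_to_key(x: float) -> int:
    """Map an IEEE 754 double to an unsigned 64-bit key with the same ordering."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits >> 63:                  # negative: flip all bits
        return bits ^ MASK64
    return bits | (1 << 63)         # non-negative: set the sign bit


def key_to_double(key: int) -> float:
    """Inverse of double_to_key."""
    if key >> 63:                   # key came from a non-negative double
        bits = key & (MASK64 >> 1)
    else:                           # key came from a negative double
        bits = key ^ MASK64
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def radix_sort_u64(keys: list[int]) -> list[int]:
    """LSD radix sort of unsigned 64-bit keys, one byte per pass, no item comparisons."""
    for shift in range(0, 64, 8):
        counts = [0] * 256
        for k in keys:                      # histogram of the current byte
            counts[(k >> shift) & 0xFF] += 1
        pos = [0] * 256
        total = 0
        for b in range(256):                # prefix sums give each bucket's start offset
            pos[b] = total
            total += counts[b]
        out = [0] * len(keys)
        for k in keys:                      # stable scatter into buckets
            b = (k >> shift) & 0xFF
            out[pos[b]] = k
            pos[b] += 1
        keys = out
    return keys


def radix_sort_doubles(values: list[float]) -> list[float]:
    """Sort doubles by radix-sorting their order-preserving 64-bit keys."""
    keys = radix_sort_u64([double_to_key(v) for v in values])
    return [key_to_double(k) for k in keys]
```

Each of the eight passes touches every key a fixed number of times, so for fixed-width keys the work grows linearly with the array size; in this sketch, the only extra work for doubles compared with integers is the key mapping on the way in and out.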
Nanotechnology Sorting Statistics (on a single workstation)
Nanotechnology Sorting sorts an array of 1 billion (1 000 000 000) 64-bit double-precision floating-point numbers (IEEE Standard 754) within one second on a single 4-core CPU with 32 GB of RAM.
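As a back-of-the-envelope check, the stated figures imply the following data volume and throughput. The per-core figure assumes the work is split evenly across all four cores, which the text does not state:

```latex
\begin{aligned}
\text{data size}  &= 10^{9} \times 8\ \text{B} = 8 \times 10^{9}\ \text{B} \approx 7.45\ \text{GiB} \\
\text{throughput} &= \frac{10^{9}\ \text{items}}{1\ \text{s}} = 10^{9}\ \text{items/s} \quad (8\ \text{GB/s}) \\
\text{per core}   &= \frac{10^{9}\ \text{items/s}}{4} = 2.5 \times 10^{8}\ \text{items/s}
                   \;\Rightarrow\; 4\ \text{ns per item per core}
\end{aligned}
```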
Workstation: Single
Motherboard: One
CPU: One
Cores per CPU: Four
RAM: 32 GB
Operating system: Either 64-bit Microsoft Windows or 64-bit Linux
Sorted data: Randomly generated array of 1 billion 64-bit double-precision floating-point numbers (IEEE Standard 754)
FPU used: No; all nanotechnology, sub-item-level access
Data types supported: Any
SQL data types supported: All SQL data types, including all types of Binary Large Objects (BLOB)
Most efficient data to sort: Integer
Linearity distortion (slowdown) when sorting 64-bit double-precision floating-point numbers (IEEE Standard 754) compared with 64-bit integers: Less than 0.01 (1%)
- Description: Sorting an array of 64-bit double-precision floating-point numbers takes no more than 1.01 times as long as sorting a same-size array of 64-bit integers.
Linearity distortion (slowdown) when the size of the sorted array is doubled: Less than 0.01 (1%)
- Description: Sorting an array of the same data type that is twice as large takes no more than 2.01 times as long.
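The doubling figure can be put in context by working out the ratio expected from a comparison-based sort. Comparison sorts need on the order of n log n comparisons; assuming, as an idealization, that running time is exactly proportional to n log2 n, doubling a 1-billion-item array gives:

```latex
\begin{aligned}
\text{comparison sort: } \frac{T(2n)}{T(n)}
  &= \frac{2n \log_2 (2n)}{n \log_2 n}
   = 2\left(1 + \frac{1}{\log_2 n}\right) \\
n = 10^{9},\ \log_2 n \approx 29.9: \quad \frac{T(2n)}{T(n)}
  &\approx 2\left(1 + \frac{1}{29.9}\right) \approx 2.067
\end{aligned}
```

A perfectly linear-time algorithm would give a ratio of exactly 2, so the stated upper bound of 2.01 describes near-linear scaling.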
Nanotechnology Sorting (hard drive operations)
When the array to be sorted exceeds the available RAM and a hard drive must be used, Nanotechnology Sorting accesses the drive in the following steps:
1. Sequential read of data
- Description: Data are read sequentially in very large blocks whose size depends on the available RAM. Each block is read immediately after the previous one, and so on.
2. Sequential write of data
- Description: Data are written sequentially in very large blocks whose size depends on the available RAM. Each block is placed immediately after the previous one, and so on.
3. Random read of data
- Description: Data are read in random order, but still in large blocks rather than item by item; the block size depends on the available RAM. The next block to be read is chosen by the internal logic of the Nanotechnology Sorting algorithm.
4. Sequential write of data
- Description: Data are written sequentially in very large blocks whose size depends on the available RAM. Each block is placed immediately after the previous one, and so on.
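The four steps above match the I/O pattern of a classic external distribution (bucket) sort. The Python sketch below is one such sort, offered only to illustrate that access pattern and not as the Institute's algorithm, which the text does not describe. Assumptions: the input is a binary file of native-endian doubles; `BLOCK_ITEMS` is a hypothetical block size standing in for "depends on the available RAM"; buckets are chosen by the top byte of an order-preserving 64-bit key; every bucket fits in RAM; and the in-memory step uses Python's built-in `sorted` as a stand-in for any in-RAM sort. All names are hypothetical.

```python
# Illustrative sketch: external bucket sort reproducing the four-step I/O pattern.
# NOT the Institute's algorithm; BLOCK_ITEMS, double_to_key, external_bucket_sort are hypothetical.
import os
import struct
import tempfile
from array import array

BLOCK_ITEMS = 1 << 16  # hypothetical block size; a real sort would size blocks from free RAM


def double_to_key(x: float) -> int:
    """Order-preserving map from an IEEE 754 double to an unsigned 64-bit integer."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits ^ 0xFFFF_FFFF_FFFF_FFFF if bits >> 63 else bits | (1 << 63)


def external_bucket_sort(in_path: str, out_path: str) -> None:
    """Sort a binary file of native doubles via 256 on-disk buckets (each must fit in RAM)."""
    with tempfile.TemporaryDirectory() as tmp:
        names = [os.path.join(tmp, f"bucket{b:03d}.bin") for b in range(256)]
        files = [open(name, "wb") for name in names]
        try:
            with open(in_path, "rb") as src:
                while True:
                    raw = src.read(BLOCK_ITEMS * 8)   # step 1: sequential read of a large block
                    if not raw:
                        break
                    block = array("d")
                    block.frombytes(raw)
                    parts = [array("d") for _ in range(256)]
                    for x in block:
                        parts[double_to_key(x) >> 56].append(x)  # top key byte selects the bucket
                    for part, fh in zip(parts, files):
                        if part:
                            part.tofile(fh)           # step 2: sequential append to each bucket file
        finally:
            for fh in files:
                fh.close()
        with open(out_path, "wb") as dst:
            for name in names:                        # buckets visited in key order
                part = array("d")
                with open(name, "rb") as fh:
                    part.frombytes(fh.read())         # step 3: read jumps to another bucket file
                # Stand-in in-RAM sort, then step 4: sequential append to the output file.
                array("d", sorted(part, key=double_to_key)).tofile(dst)
```

Fixed top-byte buckets can be badly skewed for real data (many doubles share exponent bits), so a production distribution sort would pick bucket boundaries from the data, for example by sampling, to keep every bucket within RAM.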
Number of hard drives supported: Limited by the operating system
Number of files supported: Limited by the operating system
File size supported: Limited by the operating system
Number of nodes supported: Limited by the operating system
RAM size supported: Limited by the operating system
Support for multiple operating systems in a single sorting session: Yes
Number of operating systems in a single sorting session: No limit
Support for different operating systems in a single sorting session: Yes
Number of different operating systems in a single sorting session: No limit
Contact Data Integrity Institute Inc.
For further information on Data Integrity Institute Inc.'s research, or on how Data Integrity Institute Inc. can help you implement, save, or maintain an enterprise ETL or metadata project, please send a detailed inquiry to info@DataIntegrityInstitute.com or call (416) 282-2298.