Thursday, January 26, 2006

OldPosts: RDBMS - a 'sql' server of my own

RDBMS : Relational Database Management Systems.
Actually this idea came to me while i was thinking of what project to do in the compilers course; that was last summer. It was orignally about parsing SQL, but it was extended to a full idea of a DBMS.
I liked the idea. Overtime, ideas to implement and how to implement it started to accumulate. Until it had to burst out. Knowing how to do something, and wanting to do it, is a powerful force. So i started seriuosly thinking about implementing it this vacation. I started by reading some book called "Database management systems, 2nd edition". A great introduction which gave me a great backgroung on the basics of the database theory and Relational algebra and calculus ( search for it in http://wikipedia.com ).
There is different kinds of DBs, i will mention the names only, not diving in details. Old systems includes Navigational databases. Then about 70s they invented the well-known commonly-used Relational databases. Now there is Object Databases, supporting OOP. This is NOT the whole list, it is just what i could remember for the moment.
What really made me think seriously about making one myself, is an articel i read on Mohamed Meshref's Blog about one student who made one server which can -under some circumstances- execute some queries faster than Oracle's.
With all that information on my head, i opend Visual C++, created an empty header file, gazed at the screen for about 10 minutes trying to figure how to start. As from experience i knew that all systems starts small, i did that. I made a small program that saves an array of structs in a file. And then focused my "grey brain cells" -like Agatha Christie said- on how to generalize that to any user input; on the same struct ( username, pk, and age ). I figured it needs a SQL parser, so i left that part, and went in the other direction, how to generalize the struct contents first ?. I found a great one here http://groups.google.com/group/comp.lang.c/browse_thread/thread/527b451b96ce957a/d612253f370e6aab?q=variable+length&rnum=3#d612253f370e6aab . So i started designing the datastructures required.
I forgot to mention that in coincidence i read in "Introcution to Algorithms" about B-Trees and how they are used in DBMS to index tables in order to minimize disk I/O. Imagine thay can minimize disk I/O for a billion-record table from 32 IO read ( using binary search ), to only 3 IO reads ( of course it traded-off to 3000 comparisons ).
The second step after designing the data structure is defining the way of altering a table's content. Then after that build a way to handle 'parsed' queries and mapping them to these APIs. Parsing SQL is not an issue, some hours with Yacc/flex, and BOOM we have one.
The only remaining part, is what is required "exactly" from a DBMS. The basic implementation/engine is defined by Relational calculus. Other parts will just complay to any SQL standard, i have chosen SQL-92.
As usual i feel this is too short article, maybe i will follow up sometime with part2 or something continuing the in-depth details of the data-structures and how i 'will' -isA- handle the data integrity checks, and table altering, and the indexing problems - B-Trees is not easy to implement yo know !
I'll just go complete the data-structure heirarchy now, wish me luck!

No comments: