Relational Database Fundamentals
In This Chapter
– Organizing information
– Defining “database” in digital terms
– Deciphering DBMS
– Looking at the evolution of database models
– Defining “relational database” (can you relate?)
– Considering the challenges of database design
SQL (pronounced ess-que-ell, not see’qwl, though database geeks still argue about that) is a language specifically designed with databases in mind. SQL enables people to create databases, add new data to them, maintain the data in them, and retrieve selected parts of the data. Developed in the 1970s at IBM, SQL has grown and advanced over the years to become the industry standard. It is governed by a formal standard maintained by the International Organization for Standardization (ISO).
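To make those four activities concrete, here is a minimal sketch that runs SQL through Python's built-in sqlite3 module. The customer table, its columns, and the sample rows are hypothetical, invented only for illustration, and SQLite is just one convenient DBMS to try this on.

```python
import sqlite3

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create: define a (hypothetical) table.
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Add: insert new rows.
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York"), (3, "Edgar", "London")],
)

# Maintain: change data that is already stored.
conn.execute("UPDATE customer SET city = ? WHERE id = ?", ("Arlington", 2))

# Retrieve: pull out only the selected part of the data.
london_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customer WHERE city = ? ORDER BY name", ("London",)
    )
]
grace_city = conn.execute("SELECT city FROM customer WHERE id = 2").fetchone()[0]

print(london_names)  # ['Ada', 'Edgar']
print(grace_city)    # Arlington
conn.close()
```

The SQL statements are the point here; the Python around them only delivers those statements to the DBMS and collects the results.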
Various kinds of databases exist, each adhering to a different model of how the data in the database is organized.
SQL was originally developed to operate on data in databases that follow the relational model. Beginning with SQL:1999, the international SQL standard has incorporated part of the object model, resulting in hybrid structures called object-relational databases. In this chapter, I discuss data storage, devote a section to how the relational model compares with other major models, and provide a look at the important features of relational databases.
Before I talk about SQL, however, I want to nail down what I mean by the term database. Its meaning has changed, just as computers have changed the way people record and maintain information.
Keeping Track of Things
Today people use computers to perform many tasks formerly done with other tools. Computers have replaced typewriters for creating and modifying documents. They’ve surpassed electromechanical calculators as the best way to do math. They’ve also replaced millions of pieces of paper, file folders, and file cabinets as the principal storage medium for important information. Compared with those old tools, of course, computers do much more, much faster — and with greater accuracy. These increased benefits do come at a cost, however: Computer users no longer have direct physical access to their data.
When computers occasionally fail, office workers may wonder whether computerization really improved anything at all. In the old days, a manila file folder “crashed” only if you dropped it — then you merely knelt down, picked up the papers, and put them back in the folder. Barring earthquakes or other major disasters, file cabinets never “went down,” and they never gave you an error message. A hard-drive crash is another matter entirely: You can’t “pick up” lost bits and bytes. Mechanical, electrical, and human failures can make your data go away into the Great Beyond, never to return.
Taking the necessary precautions to protect yourself from accidental data loss allows you to start cashing in on the greater speed and accuracy that computers provide.
If you’re storing important data, you have four main concerns:
– Storing data has to be quick and easy because you’re likely to do it often.
– The storage medium must be reliable. You don’t want to come back later and find some (or all) of your data missing.
– Data retrieval has to be quick and easy, regardless of how many items you store.
– You need an easy way to separate the exact information you want now from the tons of data that you don’t want right now.
State-of-the-art computer databases satisfy these four criteria. If you store more than a dozen or so data items, you probably want to store those items in a database.
What Is a Database?
The term database has fallen into loose use lately, losing much of its original meaning. To some people, a database is any collection of data items (phone books, laundry lists, parchment scrolls . . . whatever). Other people define the term more strictly.
In this book, I define a database as a self-describing collection of integrated records. And yes, that does imply computer technology, complete with programming languages such as SQL.
A database consists of both data and metadata. Metadata is the data that describes the data’s structure within a database. If you know how your data is arranged, then you can retrieve it. Because the database contains a description of its own structure, it’s self-describing. The database is integrated because it includes not only data items but also the relationships among data items.
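One way to picture "integrated" is a relationship that the database itself knows about and enforces. The sketch below assumes SQLite (via Python's sqlite3 module) and two hypothetical tables, customer and purchase, linked by a foreign key. Other DBMS products express the same idea with slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE purchase (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(id),
           item TEXT NOT NULL
       )"""
)

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase (id, customer_id, item) VALUES (10, 1, 'notebook')")

# The stored relationship lets one query combine both tables.
rows = conn.execute(
    "SELECT customer.name, purchase.item "
    "FROM purchase JOIN customer ON customer.id = purchase.customer_id"
).fetchall()

# A purchase that points at a nonexistent customer is rejected.
try:
    conn.execute("INSERT INTO purchase (id, customer_id, item) VALUES (11, 99, 'pen')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows)             # [('Ada', 'notebook')]
print(orphan_rejected)  # True
```

Because the relationship is part of the database rather than part of any one program, every application that touches these tables gets the same protection.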
The database stores metadata in an area called the data dictionary, which describes the tables, columns, indexes, constraints, and other items that make up the database.
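You can ask a database to describe itself. As a small sketch, assuming SQLite, the catalog table sqlite_master and the PRAGMA table_info command play the role of the data dictionary. Other products expose their metadata differently (many offer INFORMATION_SCHEMA views), and the customer table and index here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

# Metadata about which objects exist: one row per table, index, and so on.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# Metadata about one table's columns: each row holds
# (position, name, declared type, not-null flag, default value, primary-key flag).
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")
]

print(catalog)  # [('table', 'customer'), ('index', 'idx_customer_city')]
print(columns)  # [('id', 'INTEGER'), ('name', 'TEXT'), ('city', 'TEXT')]
```

Nothing in this program states what columns the table has; that knowledge comes back from the database itself.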
Because a flat-file system (described later in this chapter) has no metadata, applications written to work with flat files must contain the equivalent of the metadata as part of the application program.
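For contrast, here is a rough sketch of a program reading a flat file. The pipe-delimited layout, the field names, and the sample records are all hypothetical. Notice that the file carries no description of itself, so the field order and data types have to be spelled out in the program.

```python
# The "metadata" lives here, in the application, not in the file:
# field names, their order, and how to convert each value.
FIELDS = [("name", str), ("city", str), ("balance_cents", int)]


def parse_record(line):
    """Turn one raw line into a dict, using the layout hard-coded above."""
    values = line.rstrip("\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return {name: convert(value) for (name, convert), value in zip(FIELDS, values)}


# Stand-in for the contents of a flat file: bare values, no structure.
flat_file_lines = [
    "Ada|London|12550\n",
    "Grace|New York|999\n",
]

records = [parse_record(line) for line in flat_file_lines]
print(records[0])  # {'name': 'Ada', 'city': 'London', 'balance_cents': 12550}
```

If the file's layout ever changes, every program that holds its own copy of FIELDS has to change with it, which is exactly the burden a self-describing database removes.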
Database Size and Complexity
Databases come in all sizes, from simple collections of a few records to mammoth systems holding millions of records. Most databases fall into one of three categories, which are based on the size of the database itself, the size of the equipment it runs on, and the size of the organization that is maintaining it:
– A personal database is designed for use by a single person on a single computer. Such a database usually has a rather simple structure and a relatively small size.
– A departmental or workgroup database is used by the members of a single department or workgroup within an organization. This type of database is generally larger than a personal database and is necessarily more complex; such a database must handle multiple users trying to access the same data at the same time.
– An enterprise database can be huge. Enterprise databases may model the critical information flow of entire large organizations.
What Is a Database Management System?
Glad you asked. A database management system (DBMS) is a set of programs used to define, administer, and process databases and their associated applications. The database being managed is, in essence, a structure that you build to hold valuable data. A DBMS is the tool you use to build that structure and operate on the data contained within the database.
You can find many DBMS programs on the market today. Some run on large and powerful machines, and some on personal computers, notebooks, and tablets. A strong trend, however, is for such products to work on multiple platforms or on networks that contain different classes of machines. An even stronger trend is to store data in data centers or out in the cloud. That cloud could be a public one, run by a large company such as Amazon, Google, or Microsoft and reached via the Internet, or it could be a private one, operated on its own intranet by the same organization that is storing the data.
These days, cloud is a buzzword that is bandied about incessantly in techie circles. Like the puffy white things up in the sky, it has indistinct edges and seems to float somewhere out there. In reality, it is a collection of computing resources that is accessible via a browser, either over the Internet or on a private intranet. The thing that distinguishes the computing resources in the cloud from similar computing resources in a physical data center is the fact that the resources are accessible via a browser rather than an application program that directly accesses those resources.