Saturday, January 17, 2004

Data Modeling

Database Systems; Design, Implementation, and Management
Rob/Coronel

Chapter 1

  1. Introducing the database
    It is necessary in study of databases we need to know the diffference between Data and information. Data are raw facts. Information is data that has been procesed so that it can be useful to people who must make decisions. Out of necessity, for this information to be accurate, it must be clear, timely and useful. For this to be done efficently a computer is used normally and the main tool for this would be a database. A database is shared, integratedcomputer structure that houses a collection of: 1)End user data (raw facts) 2) Metadata, or data about the data. A Database Management System (DBMS) is a collection of programs that manage the database structure and controls access to the data stored in the database. When we speak of database design, we speak of the data structure itself since we cannot change the software that we purchace to do the databases.
    1. Why Database Design is Important
      You must have a well designed structure to your data for it to be able to be used well. The best DBMS will not work well with poorly designed data. Redundant data, same data in several places, is a sign of a poorly designed error proned database. This will lead to poor decisions based on poor information.
    2. A Practical Approach to Database Design
      We will study good database design and not the outer limits of database theory.
  2. The Historical Roots of Database:Files and File Systems
    File systems are now for the most part obsolete but the study of our past lets us see the mistakes not to make in the future. Originally the data was kept in a system of physical folders and cabinets and data that was needed to be extracted to useful information took a long time. As computers became more prevalant, the data processing (DP) specialist became a new professional in many companies. They used File systems for this. Some terminology:
    DataRaw facts. Has little meaning unless organized in logical manner. Smallest piece is charachter
    FieldCharachter or group of charachters that have a specific meaning. Used to define and store data
    RecordLogically connected set of one or more fields.
    FileA collection of related records
    As file systems grew and the need for maintaining them did also, the data processing manager became a prominante position.
  3. A File System Critique
    1. File System Data Management
      Data retirvial is done with use of 3GL, third generaltion languages, time consuming and high skill. This makes 'ad hoc' or at the spur of the moment data needs an impossibility. Furthermore changing field in a file system can cause extensive reprograming not only in the file in question, but in any programs that reference that file. File system method promotes ownership of the data which tends to scatter the same data over many locations.
    2. Structural and Data Dependence
      File systems exhibit structural dependence, meaning that access to file is dependant to its structure. Changing any one field is also a problem, and the means a file system is data depandent. Data Logical Format, how the human sees the layout of data, and data physical format, how the computer actual stores it, are two important concepts. In a file system, this layout has to be known and the program accessing things needs to know how the data is stored(how the computer stores it) and the format of the information the program is storing.
    3. Field Definitions and Naming Conventions
      A person designing a file system as well as any database should be aware of definitons and naming conventions. Definitions deals with what is being stored. Do we want to store a person's name as one long string or break it up in to last name and first name. Naming conventions also is important. Generaly you would put the ownership as part of the nameplus a short sesical representation of what the data contains. An example of this would be CUS_Renew_Date.
    4. Data Redundancy
      As noted, data redundency is a common problem with File Systems. This can lead to data incosistancy where data is not updated accross the whole system, effecting the data integrity. Because the way the data is stored, one column may have several repeat entries, this can lead to data anomalies, like when a sales agent leaes the company and a new one is assigned. If the agent is not corrected in all fields, a non-exisitant sales agent is assigned to account.
  4. Database Systems
    1. The Database System Environment
      The database system is composed of five major components.
      • Hardware - the physical devices used by a system. Computer(s) and peripherals
      • Software - Programs used by the computer in the database system. These include:
        • Operating system - the programs that run the computer be it a mainframe or a micro (and inbetween)
        • DBMS software - programs that mange the database
        • Application programs/utilty software - used to access and manipulate the data in the DBMS.
      • People
        • System administrators - oversees the database system's general operations
        • Database administrators - (DBA) - manage the DBMS and makes sure it is functioning properly
        • Database Designers - Design the database, and if the foundation is not good, the rest will crumble.
        • Systems analysts/programers - design and implement the programs needed to access the data
        • End Users- the people who all this was put together for in the first place
      • Procedures - instructions and rules that govern the design and use of the database system. These are the rules that the company runs itself by. Though important part, it is usually fogotten.
      • Data - the facts stored in the database.
      The exsistance of these will be as complex as the enviornment is.
    2. Types of Database Management Systems
      • How many users
        • single user - only one user at a time
        • muti-user - many users can access database at same time
      • Location
        • centralized - data at one location
        • distributed - accross several different sites.
        • Type of information
          • transactional/productional DBMS - designed to give immediate response to time critical information
          • decision support system - uses large pools of data, often called data wharehouses, to help decision makers make decisions. Data comes from many sources and often time is not as critical as in transactional systems
      Desigining a data base these items must be taken into concideration.
    3. DBMS Functions
      Data directory management definitions of the data elements and their relationships
      Data storage managementThe complex structures of the data storage are handeled by the DBMS
      Data transformation and presentationThe DBMS handles the transformation of the data in the format it is stored in to the format that can make sense to the user.
      Security managementThe DBMS handles the user security and data privacy as preset by what the admnstrators create.
      Multiuser access controlThe DBMS handles the necessities so that multiple users can access the database at the same time.
      Backup and recovery managementThe DBMS has tools that make backup and recovery procures for both normal and special circumstances
      Data Integrity ManagementThe DBMS will enforce rules so that the data is not redundant nor in the wrong format for use.
      Database Access Language/Application Programming InterfacesDBMS provides access per a query language, a way for users to get information. This consist of data definition language (defining the data structure) and Data manipulation language (the way the data is extracted for use).
      Database communication interfacesThe DBMS allows for requests through a network (example, a query form a browser that is then fed back to browser in format that it can use)
    4. Managing the Database Systems: a Shift in Focus
      A shift in focus has happened as can be seen. The DBMS takes over many of the tasks that the file system's DP specialist had in the past.
    5. Database Design and Modeling
      Database design is done with models, simplified abstractions of the data. If the model is good, the databse design should be as well. If the design is good, then the applications should be. Like a building, if the foundation (model) is not good, then the building (database and applications) will be faulty and not do what it is designed to do.
  5. Database Models
    A conceptual model is based on what is represented in the database. An implementation model is based on how the data is represented in the database. Conceptual models are based on three types and show the realtionship of entities. They are:
    1. one to many relationship - 1:M - example painter paints many paintings, painting only has one painter
    2. many to many relationship - M:N - example employee has job skill, job skill is had by many employee
    3. one to one relationship - 1:1 - example one store has one senior manager

    1. The Hierarchical Database Model

    2. In 1960's Rockwell was doing the contracting for the Apollo project and much of the data for it was stored in computer file systems, leading to large amounts of data derdundency. Because of this they developed the Generalized Update Access Method (GUAM). Later IBM jined Rockwell in what they created, changed it from tape to (new) disk dtorage and called it the Information Magement System (IMS). It used a hierarchial database model, resembling a upside down tree, or an organizational chart for a company. This created a one to many approach.
      Basic structure As noted it took on the structure of a organizational chart. This allowed for an entity to have many children, but it could only have one parent. The base layer is called the root. In order to store the data the way the computer would see it, the tree would have to be flatened a bit. The final asembly might have 3 parts off of the root, 1, 2 and 3. 1 may have 1A and from it 1a1 and 1a2. 2 may have no other parts. 3 might have 3A and 3B, with 3a having 3a1, 3a2 and 3a3, and 3b having nothing. Drawing this out as a tree makes sense. But the computer would store it in order, in a manner like this:

      root -> 1 -> 1A -> 1A1 -> 1A2 -> 2 -> 3 -> 3A1 -> 3A2 -> 3A3 -> 3B

      The dat goes left to right and top to bottom. This is called preorder travel or hierarchial sequence. This order would be the way the data is stoed on the computer. Because of the data that is used more activly should be moved more to the front of the storage. If 3A2 were the most changed item than the data model should be changed to put the 3 to the left side.
      Advantages Simplicity, security, independece of data, integrity of data and efficiency (if in a 1:M relationship).
      Disadvantages Complex to implement, difficult to manage, no structural independence, complex application programing, limitations of implmentation (1:M realtionships only), lack of standards.
    3. The Network Database Mode
      Created to reprresent more complex data. It also improved database performance and imposed a database standard. The Confrence on Data Systems Languages (CODASYL) attempted to create a standard for databases after they had standardized Cobol. The recomended three things
      1. A network schema or conceptual organization of the data
      2. A sub schema that defines what the application program sees
      3. A data management language to define the characteristics and data structure.

      Though ANSI implemented it as a standard, it was 'stretched' to meet the vendor's needs at many times.
      Basic structureResembles the hierachial model. Major difference is that an entity could have more than one parent.
      Advantages Simple concept, M:N relationships handled better, Data access flexibility, data integrity, data independence, and conformace to standards.
      Disadvantages Complexity of system, lack of structural independence.
    4. The Relational Database Model
      First created in 1970 but impractial till the 90's.
      Basic structureInformation is stored in tables with rows and columns. Each table in the databse will share some information with another table. These columns will have the same name so that the information stored in one could be brought into the other table by relation. This allows for all three relational types to be met.
      Advantages Structural independence, conceptual simplicity, easier design/implementation/management/use, ad hoc querying, powerful.
      Disadvantages high overhead for software and hardware, poor design is too easy to do, islands of information are possible because it is too easy for many create their own systems.
    5. The Entity Relationship Data Model
      Because of the complexity of the data, tools are needed to allow user to graphicaly create the database. This becomes the entity relationship model (E-R).
      Basic structureEntity instances or entity occurances (rows in tables), has an attribute (collumn in tables). This entities can have relationship to other information elsewhere. This model was developed by Peter Chen, named the Chen Model in 1976. It uses rectangles to represent entities and diamonds to depict the relationship between them. A similar model, the Crow's foot model uses single or 3 prong lines to represent the various type of relationships that are handled by charachters in Chen's model.
      Advantages simple concepts, visual representation, allows for good communications, integrates with relational databse model.
      Disadvantages Limited constraint representation, limited relationship representation, no data manipulation language, loss of information content.
    6. The Object-Oriented Database Model
      First called the semantic data model (SDM) it uses the concepts of an object of a model. The object contains the structure of the data as well as the things that can be done to that data, as well as the relationships they have.
      Basic structureObject represents one instance of an entity. Attributes describe the properties of an object. Objects that are similar are grouped in classes. Classes are organized in a class hierarchy. An object can inherit attributes and methods form another class.
      Advantages Semantic content added, visual presentation (includes samantic content, integrity of database, structure and data indpendence.
      Disadvantages Standards, complex to navigate date, steep learning curve, high system overhead.
  6. Wrap-Up: The Evolution of Data Models
    1. Database Models and the Internet
  7. Summary
  8. Key Terms
  9. Review Questions
  10. Problems

No comments:

Post a Comment