Course Introduction

Lecture 1: Course Introduction & History of Database Systems


Welcome!

• This course focuses on the design and implementation of database management systems (DBMSs).
• We will study the internals of modern database management systems.
• We will cover the core concepts and fundamentals of the components that are used in
high-performance transaction processing systems (OLTP) and large-scale analytical
systems (OLAP).


Today’s Agenda

• Course Outline & Logistics
• History of Database Systems


Course Outline & Logistics


Why should you take this course?

• You want to learn how to make database systems scalable, for example, to support
web or mobile applications with millions of users.
• You want to make applications that are highly available (i.e., minimizing downtime)
and operationally robust.
• You have a natural curiosity for the way things work and want to know what goes on
inside major websites and online services.
• You are looking for ways of making systems easier to maintain in the long run, even as
they grow and as requirements and technologies change.
• If you are good enough to write code for a database system, then you can write code
on almost anything else.


Course Objectives

• Learn about modern practices in database internals and systems programming.
• Students will become proficient in:
▶ Writing correct + performant code
▶ Proper documentation + testing
▶ Working on a systems programming project


Course Topics

• Logging & Recovery Methods
• Concurrency Control
• Query Optimization, Compilation
• New Hardware (NVM, FPGA, GPU)


Background

• I assume that you have already taken an intro course on database systems (e.g., GT 4400).
• We will discuss modern variations of classical algorithms that are designed for today’s
hardware.
• Things that we will not cover: SQL, Relational Algebra, Basic Algorithms + Data
Structures.


Background

• All programming assignments will be written in C++11.
• You will learn how to debug and profile multi-threaded programs.
• Assignment 1 will help get you caught up with C++.


Course Logistics

• Course Web Page
▶ Schedule: https://www.cc.gatech.edu/~jarulraj/courses/8803-s21/
• Discussion Tool: Piazza
▶ https://www.piazza.com/gatech/spring2021/cs8803dsi
▶ For all technical questions, please use Piazza. Don’t email me directly.
▶ All non-technical questions should be sent to me
• Grading Tool: Gradescope
▶ You will get immediate feedback on your assignments.
▶ You can iteratively improve your score over time.
• Virtual Office Hours
▶ Will be posted on Piazza.


Course Logistics

• Course Policies
▶ The programming assignments and exercise sheets must be your own work.
▶ They are not group assignments.
▶ You may not copy source code from other people or the web.
▶ Plagiarism will not be tolerated.
• Academic Honesty
▶ Refer to Georgia Tech Academic Honor Code.
▶ If you are not sure, ask me.


Late Policy

• You are allowed ten total slip days (for programming assignments and exercise sheets).
• You lose 25% of an assignment’s points for every 24 hrs it is late.
• Mark on your submission (1) how many days you are late and (2) how many slip days
you have left.


Teaching Assistants

• Gaurav Tarlok Kakkar
▶ M.S. (Computer Science)
▶ Worked at Adobe (2 years).
▶ Research Topic: Video analytics using deep learning.
• If you are acing through the assignments, you might want to hack on the video
analytics system (codenamed EVA) that we are building.
• Drop me a note if you are interested!


Course Rubric

• Project (20%)
• Programming Assignments (45%)
• Exercise Sheets (15%)
• Mid-term Exam (20%)


Project – Outline

• A key component of this course will be an original research project.
• Students will organize into groups and choose to implement a project that:
▶ Is relevant to the topics discussed in class.
▶ Requires a significant programming effort from all team members.


Project – Outline

• You don’t have to pick a topic until midway through the course.
• We will provide sample project topics.
• This project can be a conversation starter in job interviews.


Project – Deliverables

• Proposal: 2-page report + presentation
• Status Update: 3-page report + presentation
• Final: 4-page report + presentation


Project – Proposal

• Five-minute presentation to the class that discusses the high-level topic.
• Each proposal must discuss:
▶ What is the problem being addressed by the project?
▶ Why is this problem important?
▶ How will the team solve this problem?


Project – Status Update

• Five-minute presentation to update the class about the current status of your project.
• Each presentation should include:
▶ Current development status.
▶ Whether anything in your plan has changed.
▶ Anything that surprised you.


Project – Final Presentation

• Ten-minute presentation on the final status of your project during finals week.
• You’ll want to include any performance measurements or benchmarking numbers for
your implementation.
• Demos are always hot too.


Programming Assignments

• Five assignments based on the BuzzDB academic DBMS.
• The goal is to familiarize you with the internals of database management systems.
• We will use Gradescope for giving you immediate feedback on programming
assignments and Piazza for providing clarifications.
• We will provide you with test cases and scripts for the programming assignments.
• If you have not yet received an invite from Gradescope, you can use the entry code
that will be shared on Piazza.


Exercise Sheets

• Three pencil-and-paper tasks.
• You will need to upload the sheets to Gradescope.
• We will share the grading rubric for exercise sheets via Gradescope.


Exercise Sheet #1

• Hand in one page with the following information:
▶ Digital picture (ideally 2x2 inches of face)
▶ Name and interests (more details on Gradescope)
• The purpose of this sheet is to help me:
▶ know more about your background for tailoring the course, and
▶ recognize you in class


History of Database Systems


History Repeats Itself

• Reference
• Design decisions in early database systems are still relevant today.
• The “SQL vs. NoSQL” debate is reminiscent of the “Relational vs. CODASYL” debate.
• Old adage: he who does not understand history is condemned to repeat it.
• Goal: ensure that future researchers avoid replaying history.


1960s – IBM IMS

• Information Management System
• Early database system developed to keep track of purchase orders for the Apollo moon
mission.
▶ Hierarchical data model.
▶ Programmer-defined physical storage format.
▶ Tuple-at-a-time queries.


Hierarchical Data Model

students
sno   sname  scity     sstate  parts
1001  Maria  New York  NY      part-1
1002  Rahul  rahul@cs  MA      part-2

part-1
pno  pname      psize  qty  price
999  Batteries  Large  10   100

part-2
pno  pname      psize  qty  price
999  Batteries  Large  14   99


Hierarchical Data Model

• Advantages
▶ No need to reinvent the wheel for every application
▶ Logical data independence: new record types may be added as the logical requirements
of an application change over time.


Hierarchical Data Model

• Limitations
▶ Information is repeated.
▶ Tree-structured data model is very restrictive: a child record cannot exist without its parent record.
▶ No physical data independence: cannot freely change the storage organization to tune a
database application, because there is no guarantee that the applications will continue to
run.
▶ Optimization: A tuple-at-a-time user interface forces the programmer to do manual query
optimization, and this is often hard.
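To make these limitations concrete, here is a minimal Python sketch of the students/part example as a tree of nested records. The names `Student`, `Part`, `total_qty`, and `copies_of_part` are hypothetical and do not come from IMS, which had its own record-oriented interface. The sketch shows three things. A query must hand-code its navigation path one record at a time. The descriptive data for part 999 is stored once per parent. Removing a parent also removes its children.

```python
from dataclasses import dataclass, field


# Hypothetical record types mirroring the students/part-1/part-2 example.
@dataclass
class Part:
    pno: int
    pname: str
    psize: str
    qty: int
    price: int


@dataclass
class Student:
    # Root record: its child Part records are stored inside it (tree structure).
    sno: int
    sname: str
    scity: str
    sstate: str
    parts: list = field(default_factory=list)


db = [
    Student(1001, "Maria", "New York", "NY", [Part(999, "Batteries", "Large", 10, 100)]),
    Student(1002, "Rahul", "rahul@cs", "MA", [Part(999, "Batteries", "Large", 14, 99)]),
]


def total_qty(records, pno):
    """Tuple-at-a-time access: the programmer spells out the navigation path,
    visiting every parent and then each of its children."""
    total = 0
    for student in records:
        for part in student.parts:
            if part.pno == pno:
                total += part.qty
    return total


def copies_of_part(records, pno):
    """Count how many times the same part's descriptive data is stored."""
    return sum(1 for s in records for p in s.parts if p.pno == pno)


print(total_qty(db, 999))       # 24
print(copies_of_part(db, 999))  # 2: "Batteries"/"Large" is repeated

# Existence depends on the parent: dropping student 1002 drops its part record too.
remaining = [s for s in db if s.sno != 1002]
print(copies_of_part(remaining, 999))  # 1
```

Any change to the nesting, for example moving parts to the root level, would force `total_qty` to be rewritten. This is the missing physical data independence described above.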


1960s – IDS

• Integrated Data Store
• Developed internally at GE in the early 1960s.
• GE sold their computing division to Honeywell in 1969.
• One of the first DBMSs:
▶ Network data model.
▶ Tuple-at-a-time queries.


1960s – CODASYL

• COBOL people got together and proposed a standard for how programs will access a
database. Led by Charles Bachman.
▶ Network data model.
▶ Tuple-at-a-time queries.


Network Data Model

• Advantages
▶ Graph-structured data models are less restrictive.
• Limitations
▶ Poorer physical and logical data independence: cannot freely change storage
organizations or change the application schema.
▶ Slow loading and recovery: Data is typically stored in one large network. This much
larger object had to be bulk-loaded all at once, leading to very long load times.
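For contrast, here is a similarly hedged Python sketch of the network (graph) model. The part record is stored once and both owners point to it, so a many-to-many relationship needs no duplication. Real CODASYL systems expressed these links as named owner/member "sets" navigated through a record-at-a-time DML. The `Supply` link record and the helper functions here are hypothetical stand-ins. The query is still written tuple-at-a-time, following pointers by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    pno: int
    pname: str
    psize: str


@dataclass
class Supply:
    # Hypothetical link record: connects one owner to one shared Part.
    qty: int
    price: int
    part: Part  # pointer to the shared Part record


@dataclass
class Student:
    sno: int
    sname: str
    supplies: list = field(default_factory=list)  # owner -> member links


batteries = Part(999, "Batteries", "Large")  # stored once, shared by both owners
maria = Student(1001, "Maria", [Supply(10, 100, batteries)])
rahul = Student(1002, "Rahul", [Supply(14, 99, batteries)])
db = [maria, rahul]


def total_qty(records, pno):
    # Still tuple-at-a-time: follow each pointer record by record.
    total = 0
    for student in records:
        for link in student.supplies:
            if link.part.pno == pno:
                total += link.qty
    return total


def distinct_part_records(records):
    # Count distinct Part objects by identity: sharing removes the repetition.
    return len({id(link.part) for s in records for link in s.supplies})


print(total_qty(db, 999))         # 24
print(distinct_part_records(db))  # 1
```

The graph removes the repetition, but every query is still coupled to the pointer structure. That coupling is why the model's data independence remains poor.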


1970s – Relational Data Model

• Ted Codd was a mathematician working at IBM Research.
• He saw developers spending their time rewriting IMS and CODASYL programs every
time the database’s schema or layout changed.
• Database abstraction to avoid this maintenance:
▶ Store the database in simple data structures.
▶ Access data through a high-level declarative language.
▶ Leave physical storage up to the implementation.


Relational Data Model

• Advantages
▶ Set-at-a-time languages are good, regardless of the data model, since they offer physical data
independence.
▶ Logical data independence is easier with a simple data model than with a complex one.
▶ Query optimizers can beat all but the best tuple-at-a-time DBMS application
programmers.
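Physical data independence can be illustrated with a small Python sketch of a toy engine; it does not model any real DBMS. The query `sum_where` is written once against a logical interface, `scan`. The same query then runs unchanged whether the relation is stored row by row or column by column. The layouts, names, and the `scan` helper are all hypothetical. A real relational system would express the query in SQL and let the optimizer choose the access path.

```python
# One relation, two hypothetical physical layouts.
rows = [  # row-oriented: one record per tuple
    {"sno": 1001, "pno": 999, "qty": 10, "price": 100},
    {"sno": 1002, "pno": 999, "qty": 14, "price": 99},
]
columns = {  # column-oriented: one list per attribute
    "sno": [1001, 1002],
    "pno": [999, 999],
    "qty": [10, 14],
    "price": [100, 99],
}


def scan(storage):
    """Physical layer: yield each tuple as a dict, whatever the layout."""
    if isinstance(storage, list):
        yield from storage
    else:
        names = list(storage)
        for values in zip(*(storage[n] for n in names)):
            yield dict(zip(names, values))


def sum_where(storage, attr, pred):
    """Set-at-a-time query: state *what* to compute over all qualifying
    tuples, not *how* to walk the storage."""
    return sum(t[attr] for t in scan(storage) if pred(t))


def is_part_999(t):
    return t["pno"] == 999


print(sum_where(rows, "qty", is_part_999))     # 24
print(sum_where(columns, "qty", is_part_999))  # 24
```

Swapping `rows` for `columns` changes only the storage, not the query. With a tuple-at-a-time interface, the application's navigation code would have to change instead.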


1970s – Relational Data Model

• Early implementations of relational DBMSs:
▶ System R – IBM Research
▶ INGRES – U.C. Berkeley
▶ Oracle – Larry Ellison


1980s – Relational Data Model

• The relational model wins.
▶ IBM comes out with DB2 in 1983.
▶ “SEQUEL” becomes the standard (SQL).
• Many new “enterprise” DBMSs, but Oracle wins marketplace.
• Examples: Teradata, Informix, Tandem, etc.


1980s – Object-Oriented Data Model

• Avoid the relational-object impedance mismatch by tightly coupling objects and the
database.
• Analogy: Gluing an apple onto a pancake
• Objects are treated as first-class citizens.
• Objects may have many-to-many relationships and are accessed using pointers.
• Few of these original DBMSs from the 1980s still exist today, but many of their
technologies live on in other forms (e.g., JSON, XML).
• Examples: ObjectStore, MarkLogic, etc.


1990s – Boring Days

• No major advancements in database systems or application workloads.
▶ Microsoft forks Sybase and creates SQL Server.
▶ MySQL is written as a replacement for mSQL.
▶ Postgres gets SQL support.
▶ SQLite started in early 2000.


2000s – Internet Boom

• All the big players were heavyweight and expensive.
• Open-source databases were missing important features.
• Many companies wrote their own custom middleware to scale out databases across
single-node DBMS instances.


2000s – Data Warehouses

• Rise of the special-purpose OLAP DBMSs.
▶ Distributed / Shared-Nothing
▶ Relational / SQL
▶ Usually closed-source.
• Significant performance benefits from using Decomposition Storage Model (i.e.,
columnar storage)
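Here is a rough Python sketch of why the Decomposition Storage Model helps analytical scans. It makes simplifying assumptions. The table contents are made up, and "values touched" stands in for the I/O a real system performs at page granularity. Summing one attribute in the N-ary (row) layout brings in every attribute of every tuple. The columnar layout reads only the one column it needs.

```python
# Hypothetical 3-attribute table with 4 tuples, stored two ways.
nsm = [  # N-ary Storage Model: all attributes of a tuple stored together
    (1, "Batteries", 100),
    (2, "Bulbs", 20),
    (3, "Cables", 15),
    (4, "Drills", 250),
]
dsm = {  # Decomposition Storage Model: each attribute stored separately
    "id": [1, 2, 3, 4],
    "name": ["Batteries", "Bulbs", "Cables", "Drills"],
    "price": [100, 20, 15, 250],
}


def sum_price_nsm(table):
    """Return (sum of price, values touched) for the row layout."""
    touched = 0
    total = 0
    for tup in table:
        touched += len(tup)  # the whole tuple is read to get one attribute
        total += tup[2]
    return total, touched


def sum_price_dsm(cols):
    """Return (sum of price, values touched) for the columnar layout."""
    price = cols["price"]  # only the needed column is read
    return sum(price), len(price)


print(sum_price_nsm(nsm))  # (385, 12)
print(sum_price_dsm(dsm))  # (385, 4)
```

The gap grows with the number of attributes that a query does not reference, which is typical of wide OLAP tables.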


2000s – NoSQL Systems

• Focus on high-availability & high-scalability:
▶ Schema-less (i.e., “Schema Last”)
▶ Non-relational data models (document, key/value, etc.)
▶ No ACID transactions
▶ Custom APIs instead of SQL
▶ Usually open-source
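Here is a minimal Python sketch of the "schema last" idea. It assumes a toy in-memory key/value store with a hypothetical `put`/`get` API, not the interface of any specific NoSQL product. Documents under the same key prefix need not share fields. The reading code, not the database, decides how to interpret them and what to do when a field is missing.

```python
import json

# Hypothetical key/value store: each key maps to a serialized JSON document.
# No schema is declared up front, so documents can have different fields.
store = {}


def put(key, doc):
    store[key] = json.dumps(doc)  # the store treats the value as opaque text


def get(key):
    return json.loads(store[key])


put("student:1001", {"name": "Maria", "city": "New York"})
put("student:1002", {"name": "Rahul", "state": "MA", "interests": ["databases"]})


def city_of(key):
    # "Schema last": the reader interprets the document and handles
    # missing fields itself.
    return get(key).get("city", "unknown")


print(city_of("student:1001"))  # New York
print(city_of("student:1002"))  # unknown
```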


2010s – NewSQL

• Provide the same performance for OLTP workloads as NoSQL DBMSs without giving up
ACID:
▶ Relational / SQL
▶ Distributed
▶ Usually closed-source


2010s – Hybrid Systems

• Hybrid Transactional-Analytical Processing (HTAP).
• Execute fast OLTP like a NewSQL system while also executing complex OLAP queries
like a data warehouse system.
▶ Distributed / Shared-Nothing
▶ Relational / SQL
▶ Mixed open/closed-source.


2010s – Cloud Systems

• First database-as-a-service (DBaaS) offerings were containerized versions of existing
DBMSs.
• There are new DBMSs that are designed from scratch explicitly for running in a cloud
environment.


2010s – Specialized Systems

• Shared-disk DBMSs
• Embedded DBMSs
• Time-Series DBMSs
• Multi-Model DBMSs
• Blockchain DBMSs


Conclusion


Parting Thoughts

• There are many innovations that come from both industry and academia.
▶ Lots of ideas start in academia but few build complete DBMSs to verify them.
▶ IBM was the vanguard during the 1970s and 1980s, but now there is no single trendsetter.
▶ The era of cloud systems has begun.
• The relational model has won for operational databases.


Next Class

• Recap of topics covered in the first course
• Submit exercise sheet #1 via Gradescope.
