Database design for Google Calendar: pt. 7


Previously in the series

This is the final post in the “Modeling Google Calendar” series. Previously:

This series illustrates the database design approach explained in the book called “Database Design using Minimal Modeling”, scheduled to be released in Summer 2024. Here is the website of the book: https://databasedesignbook.com/. You can subscribe to receive updates on the book and to get notified about new database design content.

In the final chapter we’re going to design physical tables, based on the logical model defined in the previous chapters.

Part 7. Creating SQL tables

In the previous chapters we defined the complete logical model, so most of the work is actually already done. The rest is pretty straightforward.

For teaching purposes we’re going to use one specific table design strategy: “one table per anchor”. It is one of the most common approaches to physical table design. There are several more possible strategies, we’re going to discuss them in a book.

We’re going to revisit the tables from the previous section, and fill in our choices:

  • For anchors, fill in “Physical table” columns;
  • For each attribute, fill in “Physical column”, and choose the “Physical type”;
  • For each M:N link, choose the name of the physical table;
  • For each 1:N link, fill in the column name in the table that corresponds to the N-side anchor;

Anchors: choose names for tables

Here we just choose a straightforward plural name for each table.

Some companies or applications enforce different naming conventions (singular, camel case, etc.). In that case, you would just use the names that comply with the convention.

Attributes: choose the column name and physical type

For the physical column name, we choose some sensible name. For example:

  • day_events.end_date would be the column name for the “When does the DayEvent begin?” attribute;
  • ​​time_slots.is_skipped would be the column name for the “Is this TimeSlot skipped?” attribute

And so on. Due to the way relational databases work, you have to choose a very short name. In many cases this name by itself is not enough to fully explain the meaning of the data. That’s one of the reasons why we begin with the logical schema, and use long-ish human-readable questions to define the semantics of the attributes.

For the physical type, we choose a sensible type without much discussion. This topic is discussed extensively in the book. There is also a list of recommended data types for each logical type in the book, and we just use that directly.

If you’re working with an existing system, you may be required to choose an alternative physical data type for the column. For example, your database server may support a better suited data type; or there could be some engineering guidelines that make you choose a different data type.

We’re going to discuss this in the book. However, full discussion of all physical design concerns is well outside of the scope of any book. This is the stuff that you spend your career on learning.

For this problem, we used around half of logical attribute types:

  • string: 4 attributes;
  • date: 4 attributes;
  • either/or/or: 3 attributes;
  • integer: 2 attributes;
  • date/time (local): 4 attributes;
  • yes/no: 2 attributes.

You can see that the physical definitions of attributes of the same logical type are almost the same. The only differences are: a) maximum length of strings; and b) NULL vs NOT NULL.

We choose “NOT NULL” for attributes where the value always needs to be there due to business requirements. For example, the name of the event, or the start date of the all-day event. For tangled attributes, we choose nullable physical types (“NULL”). We discuss nullability in the book.

1:N Links

For 1:N links, we use a column in the N-side anchor table. For example:

Choosing the column name is usually quite easy. The only complication could be when there are two and more different links between the same two anchors. We have that situation with the timezones, and we’ll use two different columns.

M:N links

For M:N links we must use a separate table for each link. Every such table will have identical structure, only the column names would be different.

We only need to find a good name for such a table. There is no naming method that works for all, you will have to try some combinations, looking for readability. For links this is especially difficult because it’s not clear which of the two anchors is more important.

Same as with attributes, due to the way relational databases work, the name of the table is going to be quite short. In many cases this name by itself is not enough to fully explain the meaning of the data. That’s one of the reasons why we prepare logical schema, and use human-readable sentences to define the semantics of the links.

Anyway, here is the full table of links with the names chosen for the tables.

Finally: the tables

As we mentioned in the previous section, we’re going to have 8 (eight) SQL tables: 6 for anchors and 2 for M:N links. One anchor (DayOfTheWeek) is special, so we don’t create a physical table for that. We use a very common approach to designing physical tables. Other approaches are also possible, but this discussion is outside the scope of this post. So, let’s just write down all the tables, and add all the attributes that we have.

This is a very straightforward and even boring process at this point.


CREATE TABLE users (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
email VARCHAR(64) NOT NULL
);

CREATE TABLE timezones (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(64) NOT NULL
);

CREATE TABLE day_events (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
user_id INTEGER NOT NULL,
name VARCHAR(128) NOT NULL,
begin_date DATE NOT NULL,
end_date DATE NOT NULL,
repeated VARCHAR(24) NULL,
repetition_step INTEGER NULL,
repeated_monthly_on VARCHAR(24) NULL,
repeated_until VARCHAR(24) NULL,
repeated_until_date VARCHAR(24) NULL,
repeated_reps INTEGER NULL
);

CREATE TABLE time_events (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
user_id INTEGER NOT NULL,
start_timezone_id INTEGER NOT NULL,
end_timezone_id INTEGER NOT NULL,
name VARCHAR(128) NOT NULL,
begin_local_time DATETIME NOT NULL,
end_local_time DATETIME NOT NULL
);

CREATE TABLE day_slots (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
day_event_id INTEGER NOT NULL,
the_date DATE NOT NULL,
is_skipped TINYINT UNSIGNED NOT NULL DEFAULT 0
);

CREATE TABLE time_slots (
id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
time_event_id INTEGER NOT NULL,
begin_local_time DATETIME NOT NULL,
end_local_time DATETIME NOT NULL,
start_timezone_id INTEGER NOT NULL,
end_timezone_id INTEGER NOT NULL,
is_skipped TINYINT UNSIGNED NOT NULL DEFAULT 0
);

CREATE TABLE day_event_dows (
day_event_id INTEGER NOT NULL,
day_of_week VARCHAR(3) NOT NULL,
PRIMARY KEY (day_event_id, day_of_week),
KEY (day_of_week)
);

CREATE TABLE time_event_dows (
time_event_id INTEGER NOT NULL,
day_of_week VARCHAR(3) NOT NULL,
PRIMARY KEY (time_event_id, day_of_week),
KEY (day_of_week)
);

Is that really it? Mostly, yes. Though we need to talk about indexes and about the attributes that we’ve skipped for brevity.

Most experienced database developers would look at the schema above and immediately notice that some “obvious” indexes are missing. For example, day_events.user_id must certainly be indexed. Unfortunately, there is no hard and fast rule on what columns (and combinations of columns) need to be indexed. That depends on how the tables are going to be queried by the application. The best book about database indexes is called “Use The Index, Luke” (https://use-the-index-luke.com/). Go read it.

When we were talking about logical schema (especially in the beginning), we skipped some of the attributes, because they were either trivial, or too complicated. For example, we would probably want to add the name of the user, and the column that stores the user’s password hash. Some of the data elements just don’t add anything new to this text, for example the event location, or the list of invited guests. As an exercise, you could go ahead and add the elements that we did not discuss, the ones that you’re interested in. Add a few rows to the catalog tables, fill in the contents of each cell, and then edit the schema definition above to include the missing pieces of data.

What’s next

Here is a short summary of the process:

  • start with a free-form text that describes the business problem you’re working on;
  • write down the list of anchors in a structured format;
  • write down the structured list of attributes, pay particular attention to the questions;
  • write down the structured list of links, pay particular attention to the sentences, because they help you make sure that you get cardinality right;
  • draw a graphical schema based on the logical model, if visual representation helps you think;
  • fill in the physical model details: table names, column names, data types;
  • write down the SQL tables, using information from the previous step;
  • share the logical model with your team;

This series illustrated the database design approach explained in the book called “Database Design using Minimal Modeling”, scheduled to be released in Summer 2024. Here is the website of the book: https://databasedesignbook.com/. Leave your email address to receive updates on the book and to get notified about new posts.

Minimal Modeling

Minimal Modeling: making sense of your database

Read more from Minimal Modeling

Previously in the series This is the fifth post in the “Modeling Google Calendar” series. Previously: https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-1 for introduction, problem description and the first part, “Basic all-day events”; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-2 “Time-based events”, and some discussion of timezones; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-3 “Repeated all-day...

Introduction This is the fourth post in the “Modeling Google Calendar” series. Previously: https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-1 for introduction, problem description and the first part, “Basic all-day events”; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-2 “Time-based events”, and some discussion of timezones; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-3 Repeated all-day events; This...

Introduction This is the third post in the “Modeling Google Calendar” series. Previously: https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-1 for introduction, problem description and the first part, “Basic all-day events”; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-2 “Time-based events”, and some discussion of timezones; This series illustrates the database design approach explained in the book called “Database Design using Minimal...