In database design, normalization is the systematic process of organizing data to reduce redundancy and improve data integrity. One of the most fundamental and frequent patterns you will encounter is the One-to-Many (1:N) relationship.
Imagine we are building a system to track traffic violations. A naive approach might be to put all the information—the driver's details and the violation details—into a single "big table." However, this approach quickly reveals a major flaw: redundancy.
Initial Big Table
Redundancy Highlighted (Scott Morgan)
As shown in the highlighted example, Scott Morgan's name, VIN, and SSN are stored multiple times—once for every violation he commits. This design creates several problems:
To solve these issues, we apply normalization by splitting the data into two distinct logical entities: Drivers and Traffic Violations.
Drivers Table
Traffic Violations Table
By separating these entities, we achieve a "Single Source of Truth." Scott Morgan's personal information is now stored exactly once in the Drivers table, regardless of whether he has one violation or one hundred. This makes the database cleaner, smaller, and much easier to maintain.
Separating the tables is only half the battle; we still need a way to link them back together. We achieve this using a Foreign Key.
id.driver_id column. This column "points" to the id of the driver who committed the violation.Driver ID (Scott Morgan = 3)
Violations pointing to ID 3
Now, instead of duplicating Scott's entire profile, we simply store the integer 3. If we ever need to see Scott's name for a specific violation, we use a JOIN operation to fetch it from the Drivers table using that ID. This ensures that any update to the Drivers table is instantly reflected across all related violations.
A point of confusion for beginners may be deciding which table should hold the reference. Why don't we put a list of violations in the Drivers table?
Trying to put the relationship on the "One" side (Drivers) leads to two highly inefficient designs:
violation_1, violation_2, etc., what happens when a driver commits their 11th violation? We would have to change the entire table structure.By placing the Foreign Key on the "Many" side (Violations), we can have an unlimited number of violations for a single driver without ever changing the table schema.
Problem: Too many columns
Problem: Indeterminate cell size
When designing databases, engineers use Entity Relationship Diagrams (ERD). There are two primary ways to visually represent a One-to-Many connection.
Classic notation uses a simple line or arrow labeled with 1 on the "One" side and N (or sometimes M) on the "Many" side.
Here is how our specific Drivers-to-Violations relationship appears in the ERD diagram, where we typically present tables with their column names and data types instead of sample data rows.
Crow's Foot Notation is the industry standard for modern ERDs. The "three-pronged" foot symbolizes the "Many" side, while vertical bars and circles represent mandatory or optional relationships.
In formal Crow’s Foot notation, every end of a relationship line must have two symbols: an inner symbol and an outer symbol (totaling 4 symbols for the entire relationship).
There are four primary configurations based on whether the ends are mandatory or optional:
Both sides are required.
Example: A traffic_violation record must be linked to exactly one driver, and a driver must have at least one traffic_violation record in the system to be tracked.
The "One" side is required, but the "Many" side is not.
Example: A traffic_violation must belong to exactly one driver, but a driver can exist in the database without having any traffic_violations (a clean record).
The "One" side is optional, but the "Many" side is required.
Example: A traffic_violation can optionally be linked to a driver (e.g., a camera caught a speeding car but the driver hasn't been identified yet), but if a driver is present, they must have at least one traffic_violation record associated with them.
Both sides are optional.
Example: A traffic_violation can exist without an identified driver, and a driver can exist in the database without any registered traffic_violations.
Now, let's look at the actual SQL implementation. The magic happens with the REFERENCES keyword, which tells the database that these two tables are linked.
REFERENCES?
By marking the driver_id column as a REFERENCE to the drivers(id) column, we enable Referential Integrity.
PostgreSQL acts as a "guardian" of your data:
driver_id that doesn't exist in the drivers table, orThis prevents "orphaned" records and ensures your database never becomes a mess of disconnected information.
CASCADE
We can also configure the relationship to allow the removal of a driver even if they have records in the traffic_violations table. Using ON DELETE CASCADE will automatically remove all associated violations when a driver is deleted.
There are two ways to implement this, depending on whether the tables already exist.
Option 1: Updating an existing table
If the tables were already created as shown above, you must drop the current constraint and add a new one with the CASCADE option:
Option 2: Creating a new table
If you haven't created the traffic_violations table yet, you can include the instruction directly in the definition:
In a SQL database, Crow's Foot modality is controlled by NOT NULL constraints on the Foreign Key.
Mandatory Modality: A violation must have a driver:
Optional Modality: A violation can exist without a driver (e.g., an anonymous violation):
With the structure in place, we can load our sample data from CSV files.
When working with data visualization in Tableau, defining relationships is critical for accurate analysis. Here is how our drivers.csv and traffic_violations.csv datasets are connected.
Defining the Logical Relationship: In Tableau's Data Source tab, we drag the drivers and traffic_violations tables into the canvas. By connecting them, Tableau creates a "noodle" (logical relationship).
Mapping Foreign Keys: Here, we explicitly tell Tableau which columns form the link. We select the Primary Key (id) from the drivers source and the Foreign Key (driver_id) from the traffic_violations source. This ensures that every violation is correctly attributed to the right driver during visualization.
Performance Options (Cardinality): In the "Performance Options" settings, we can optimize the relationship. We set Cardinality to One-to-Many (One driver to Many traffic_violations). We also configure Referential Integrity to "All records match". ( "Some records match," would be appropriate if we have drivers without violations or anonymous violations.) These settings help Tableau generate more efficient SQL queries.
Verifying the Relationship: Once the relationship is established, we can easily create visualizations. For example, by dragging Id, First Name, and Last Name from the drivers table to the Rows shelf and Id from the traffic_violations table to the Text mark (in the Marks card), then setting its aggregation to Count, Tableau automatically performs the equivalent of a LEFT JOIN and a `GROUP BY`. This allows us to see exactly how many violations each driver has accumulated.
+48 790-430-860
analysislessons@gmail.com