Skip to Content
Learn
Joining Tables in R
Join on Specific Columns I

In the previous example, the inner_join() function “knew” how to combine tables based on the columns that were the same between two tables. For instance, orders and customers both had a column called customer_id. This won’t always be true when we want to perform a join.

Generally, the orders data frame would not have the column order_id and the customers data frame would not have the column customer_id. Instead, they would both have a column id and it would be implied that the id was the order_id for the orders table and the customer_id for the customers table. They would look like this:

Orders

idcustomer_idproduct_idquantitytimestamp
12312017-01-01
22232017-01-01
33112017-01-01
43222017-02-01
53332017-02-01
61422017-03-01
71112017-02-02
81412017-02-02

Customers

idcustomer_nameaddressphone_number
1John Smith123 Main St.212-123-4567
2Jane Doe456 Park Ave.949-867-5309
3Joe Schmo798 Broadway112-358-1321

How would this affect our joins?

Because the id columns would mean something different in each table, our default joins would be wrong.

One way that we could address this problem is to use the dplyr function rename() to rename the columns for our joins. In the example below, we will rename the column id in the customers data frame to customer_id, so that orders and customers now have a common column to join on.

customers <- customers %>% rename(customer_id = id) inner_join(orders, customers)

Instructions

1.

The id column of products stores the same information as the product_id column of orders. Rename the id column of products to product_id. Save the updated data frame to products.

2.

Join orders and products. Save the result to the variable orders_products, and view it.

Folder Icon

Take this course for free

Already have an account?