Let’s say we have a column called "type" with data entries in the format "admin_US" or "user_Kenya", as shown in the table below.
id | type |
---|---|
1011 | “user_Kenya” |
1112 | “admin_US” |
1113 | “moderator_UK” |
Just like we saw before, this column actually contains two types of data. One seems to be the user type (with values like “admin” or “user”) and one seems to be the country this user is in (with values like “US” or “Kenya”).
We can no longer just split along the first 4 characters because “admin” and “user” are of different lengths. Instead, we know that we want to split along the "_". We can thus use the tidyr function separate() to split this column into two separate columns:
    # Create the 'user_type' and 'country' columns
    df %>%
      separate(type, c('user_type', 'country'), '_')
- type is the column to split
- c('user_type','country') is a vector with the names of the two new columns
- '_' is the character to split on
This would transform the table above into a table like the one below. By default, separate() removes the original type column and puts the two new columns in its place, in the order they were named:
id | user_type | country |
---|---|---|
1011 | “user” | “Kenya” |
1112 | “admin” | “US” |
1113 | “moderator” | “UK” |
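To try the split end to end, here is a minimal, self-contained sketch. The data frame df is a hypothetical stand-in built from the example rows above, and the result names df_split and df_keep are made up for illustration. It also shows the remove argument, which controls whether the original type column is kept alongside the new ones.

```r
library(tidyr)  # provides separate() and re-exports the %>% pipe

# Hypothetical data frame mirroring the example table
df <- data.frame(
  id = c(1011, 1112, 1113),
  type = c("user_Kenya", "admin_US", "moderator_UK"),
  stringsAsFactors = FALSE
)

# Default behaviour: 'type' is replaced by the two new columns
df_split <- df %>%
  separate(type, c("user_type", "country"), "_")

# remove = FALSE keeps the original 'type' column next to the new ones
df_keep <- df %>%
  separate(type, c("user_type", "country"), "_", remove = FALSE)

print(df_split)
print(df_keep)
```

Because the split happens on the "_" character rather than at a fixed position, values of different lengths such as "user", "admin", and "moderator" are all handled the same way.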
Instructions
View the head() of students. Notice that the students’ names are stored in a column called full_name.
Separate the full_name column into two new columns, first_name and last_name, by splitting on the ' ' character.
Pass extra = 'merge' as an additional argument to the separate() function. This ensures that middle names or two-word last names all end up in the last_name column.
Save the result to students, and view the head().
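To see what extra = 'merge' changes, here is a small sketch on made-up data. The data frame students_demo and its names are hypothetical and are not the exercise's students dataset. With extra = 'merge', separate() splits at most once for two target columns, so everything after the first space stays together; extra = 'drop' is shown for contrast and silently discards the additional pieces.

```r
library(tidyr)  # provides separate() and re-exports the %>% pipe

# Hypothetical sample data, not the exercise's 'students' dataset
students_demo <- data.frame(
  id = 1:3,
  full_name = c("Ada Lovelace", "Ludwig van Beethoven", "Gabriel Garcia Marquez"),
  stringsAsFactors = FALSE
)

# extra = "merge": only the first space is used as a split point
merged <- students_demo %>%
  separate(full_name, c("first_name", "last_name"), " ", extra = "merge")

# extra = "drop": pieces beyond the second are discarded
dropped <- students_demo %>%
  separate(full_name, c("first_name", "last_name"), " ", extra = "drop")

print(merged)
print(dropped)
```

In the merged result, "Ludwig van Beethoven" yields the last name "van Beethoven", while the dropped result keeps only "van".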