Key Concepts

Review core concepts you need to learn to master this subject

dplyr package

The dplyr package provides functions for exploring and manipulating datasets. At the most basic level, its functions are data manipulation “verbs” such as select(), filter(), mutate(), arrange(), and summarize(), which can be chained together to perform multiple steps in a few lines of code. dplyr is suitable for working with a single dataset as well as for producing complex results from large datasets.
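As a minimal sketch of how these verbs chain together, the example below pipes a small data frame through filter(), mutate(), select(), and arrange(). The `orders` data frame, its column names, and its values are hypothetical and made up purely for illustration; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical data, invented for illustration only
orders <- data.frame(
  id = 1:5,
  customer = c("A", "B", "A", "C", "B"),
  price = c(20, 35, 15, 50, 10),
  quantity = c(2, 1, 4, 1, 3)
)

result <- orders %>%
  filter(price >= 15) %>%               # keep rows with a price of at least 15
  mutate(total = price * quantity) %>%  # add a computed column
  select(customer, total) %>%           # keep only the columns of interest
  arrange(desc(total))                  # sort from largest to smallest total

print(result)
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what lets the pipe operator `%>%` pass the result of one step directly into the next.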

Introduction to Data Frames in R
Lesson 1 of 2
  1. Data lies at the heart of nearly every problem in the business world and society. Having the right tools to manipulate data and organize it in a meaningful way is integral to performing data analys…
  2. A data frame is an R object that stores tabular data in a table structure made up of rows and columns. You can think of a data frame as a spreadsheet or as a SQL table. While data frames can be cre…
  3. When working with data frames, most of the time you will load in data from an existing data set. One of the most common formats for big datasets is the CSV. CSV (comma separated values) is a t…
  4. When you have data in a CSV, you can load it into a data frame in R using readr’s read_csv() function: df <- read_csv('my_csv_file.csv') * In the example above, the read_csv() function is called …
  5. When you load a new data frame from a CSV, you want to get an understanding of what the data looks like. If the data frame is small, you can display it by typing its name, df. If the data frame is …
  6. One of the most appealing aspects of dplyr is the ability to easily manipulate data frames. Each of the dplyr functions you will explore takes a data frame as its first argument. The _pipe operato…
  7. Suppose you have a data frame called customers, which contains the ages of your business’s customers: |name|age|gender| |-|-|-| |Rebecca Erikson|35|F| |Thomas Roberson|28|M| |Diane Ochoa|42|NA|…
  8. Sometimes rather than specify what columns you want to select from a data frame, it’s easier to state what columns you do not want to select. dplyr’s select() function also enables you to do just t…
  9. In addition to subsetting a data frame by columns, you can also subset a data frame by rows using dplyr’s filter() function and comparison operators! Consider an orders data frame that contains dat…
  10. The filter() function also allows for more complex filtering with the help of logical operators! Take a look at the same orders data frame from the last exercise: |id|first_name|last_name|email…
  11. Sometimes all the data you want is in your data frame, but it’s all unorganized! Step in the handy dandy dplyr function arrange()! arrange() will sort the rows of a data frame in ascending order by…
  12. There you have it! With the power of readr and dplyr in your hands, you can now: * load data from a CSV into a data frame * inspect the data frame with head() and summary() * select() the columns y…
