This article gives an overview of exploratory data analysis (EDA). Many people associate data science with fields like machine learning and artificial intelligence, but EDA often takes up a larger percentage of a data scientist’s day-to-day work! This is because:

  • Before fitting any sort of machine learning model, it is important to inspect a dataset and get to know it! Often the best way to improve a model is to spend more time thinking about the data itself. EDA can help you make decisions about what data to include, exclude, or transform.
  • Sometimes a data scientist does not plan to fit a predictive model to their data at all. Instead, their goal may be to inspect and analyze existing data to answer questions like: What proportion of visitors to a website made a purchase? Or, how has the purchase rate changed over the last 6 months?
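
The two questions in the last bullet can be answered with very little code. The following is a minimal sketch, assuming a hypothetical visit log in which each record holds a month and a flag for whether the visitor made a purchase. The data and the names `visits`, `overall_rate`, and `monthly_rate` are invented for illustration, and the log covers three months rather than six to keep it short.

```python
from collections import defaultdict

# Hypothetical visit log: (month, made_purchase) pairs, invented for illustration.
visits = [
    ("2023-01", True), ("2023-01", False), ("2023-01", False), ("2023-01", False),
    ("2023-02", True), ("2023-02", True), ("2023-02", False), ("2023-02", False),
    ("2023-03", True), ("2023-03", True), ("2023-03", True), ("2023-03", False),
]

# Question 1: what proportion of visitors made a purchase?
# True counts as 1 and False as 0 when summed.
overall_rate = sum(purchased for _, purchased in visits) / len(visits)

# Question 2: how has the purchase rate changed from month to month?
counts = defaultdict(lambda: [0, 0])  # month -> [purchases, visits]
for month, purchased in visits:
    counts[month][0] += purchased
    counts[month][1] += 1

monthly_rate = {month: p / n for month, (p, n) in sorted(counts.items())}

print(f"Overall purchase rate: {overall_rate:.2f}")
for month, rate in monthly_rate.items():
    print(f"{month}: {rate:.2f}")
```

In practice a library such as pandas would typically handle this kind of grouping, but the idea is the same: summarize the data first, then look at how the summary varies across a dimension such as time.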

This article gives a more formal definition of EDA and describes some of the techniques involved.

