Cereal Nutritional Information

Data Source

cereal-nutrition.csv (9.7 KB)

cereal_albertsons.csv (3.8 KB)

Data Description

The cereal_nutrition dataset contains nutritional information about 95 cereals. The cereal_albertsons dataset contains information about 89 cereals in a grocery store.

Data Provenance and Purpose

I collected this data in April and May 2019. It was not collected to answer any particular real-world research question.

Nutritional information in cereal_nutrition was obtained from the websites of four cereal manufacturers (General Mills, Kellogg’s, Post, and Quaker) in April 2019.

Location and price information in cereal_albertsons was recorded at an Albertsons in Irvine, California on May 1, 2019.

Variable Names and Descriptions

cereal_nutrition

There are 32 columns in this dataset.

  • Cereal.Name: the name of the cereal
  • Manufacturer: the company that makes the cereal
  • Serving.Size: the size (in g) of a single serving of cereal
  • Calories: the number of calories in a single serving
  • Total.Fat: the total amount of fat (in g) in a single serving
  • Saturated.Fat: the amount of saturated fat (in g) in a single serving
  • Sodium: the amount of sodium (in mg) in a single serving
  • Potassium: the amount of potassium (in mg) in a single serving
  • Total.Carbohydrate: the total amount of carbohydrates (in g) in a single serving
  • Dietary.Fiber: the amount of dietary fiber (in g) in a single serving
  • Sugar: the amount of sugar (in g) in a single serving, including natural and added sugars
  • Protein: the amount of protein (in g) in a single serving
  • Vitamin.A: the % recommended daily value (RDV) of Vitamin A in a single serving
  • Vitamin.C: the % RDV of Vitamin C in a single serving
  • Calcium: the % RDV of calcium in a single serving
  • Iron: the % RDV of iron in a single serving
  • Vitamin.D: the % RDV of Vitamin D in a single serving
  • Vitamin.E: the % RDV of Vitamin E in a single serving
  • Thiamin: the % RDV of thiamin (Vitamin B1) in a single serving
  • Riboflavin: the % RDV of riboflavin (Vitamin B2) in a single serving
  • Niacin: the % RDV of niacin (Vitamin B3) in a single serving
  • Pantothenic.Acid: the % RDV of pantothenic acid (Vitamin B5) in a single serving
  • Vitamin.B6: the % RDV of Vitamin B6 in a single serving
  • Folic.Acid: the % RDV of folic acid (folate) in a single serving
  • Vitamin.B12: the % RDV of Vitamin B12 in a single serving
  • Zinc: the % RDV of zinc in a single serving
  • Magnesium: the % RDV of magnesium in a single serving
  • Phosphorus: the % RDV of phosphorus in a single serving
  • Selenium: the % RDV of selenium in a single serving
  • Copper: the % RDV of copper in a single serving
  • Manganese: the % RDV of manganese in a single serving

cereal_albertsons

There are 6 columns in this dataset.

  • Cereal.Name: the name of the cereal
  • Manufacturer: the company that makes the cereal
  • Size: the size (in oz) of a regular-sized box
  • Shelf: the shelf (1 = bottom, 5 = top) on which the box’s price tag was located
  • Location: the location along the aisle (1 = closest to front of store, 12 = closest to back of store) in which the box’s price tag was located
  • Price: the regular price (in dollars) of the box of cereal

Classroom Uses

Data Science Content

  • Merging Data: These datasets are useful for illustrating the difference between left outer join, right outer join, full outer join, and inner join. It helps that you get a different number of rows after each type of join.
  • Exploratory Data Analysis: It is fairly easy to prompt data science students to form their own initial questions and practice both the thought process and coding process of exploratory data analysis.
  • Variable Transformation: I ask students to repeat their analysis after normalizing to 1 ounce of cereal (instead of 1 serving) and see how the results change.
  • Clustering: The cereal_nutrition dataset is (just barely) small enough that you can get decent-looking dendrograms out of hierarchical clustering, and students enjoy seeing which cereals are “most similar” to other cereals. That dataset is also extremely useful for illustrating what happens when you perform k-means clustering without normalizing appropriately.
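
For anyone who wants to try the merging and transformation ideas above outside of a classroom tool, here is a minimal sketch in Python with pandas. The rows are made-up toy values, not rows from the real CSV files; only the column names come from the descriptions above. It assumes Cereal.Name matches exactly across the two files, which may not hold for every cereal in the real data. The z-score step at the end is one common way to put variables on a comparable scale before distance-based clustering such as k-means, not necessarily the normalization used in class.

```python
import pandas as pd

GRAMS_PER_OUNCE = 28.3495  # grams in one avoirdupois ounce

# Toy rows with made-up values (NOT taken from the real CSV files).
# Only the column names come from the dataset description.
nutrition = pd.DataFrame({
    "Cereal.Name": ["Cereal A", "Cereal B", "Cereal C", "Cereal E"],
    "Serving.Size": [30.0, 60.0, 40.0, 55.0],   # g per serving
    "Calories": [120.0, 210.0, 150.0, 200.0],   # per serving
    "Sodium": [200.0, 10.0, 150.0, 180.0],      # mg per serving
    "Sugar": [10.0, 18.0, 4.0, 12.0],           # g per serving
})
albertsons = pd.DataFrame({
    "Cereal.Name": ["Cereal B", "Cereal C", "Cereal D"],
    "Price": [4.99, 3.49, 5.29],                # dollars
})

# Merging: each join type keeps a different set of cereals,
# so each one produces a different number of rows.
row_counts = {
    how: len(nutrition.merge(albertsons, on="Cereal.Name", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)  # {'inner': 2, 'left': 4, 'right': 3, 'outer': 5}

# Variable transformation: rescale per-serving amounts to per-ounce amounts.
per_serving_cols = ["Calories", "Sodium", "Sugar"]
per_gram = nutrition[per_serving_cols].div(nutrition["Serving.Size"], axis=0)
per_ounce = per_gram * GRAMS_PER_OUNCE

# Normalizing before clustering: without this step, Sodium (in mg) would
# dominate Euclidean distances simply because its numbers are larger.
standardized = (per_ounce - per_ounce.mean()) / per_ounce.std()
```

With the real files, the two toy data frames would be replaced by something like pd.read_csv on each CSV, and standardized could then be passed to whatever clustering routine the class is using.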

Content-with-Context

There is little background context that students need to understand to start working with this dataset, as most of them have at least heard of some of the cereals.

I scrapped an activity asking students to use this data to devise a nutritional ranking system, such as that developed by the Rudd Center for Food Policy & Obesity, but I might revisit it.

Culturally Responsive Pedagogy

I have seen students spontaneously discuss what they ate for breakfast as children while they’re working with this data. Students who grew up outside the United States contribute meaningfully to these discussions as well. With more intentional lesson planning, this dataset would definitely support the “engage and value identities” and “support deep learning” elements of CRP.

This dataset is ideal for my student’s honors research project, which investigates manipulative marketing tactics and ethical concerns in the ultra-processed breakfast cereal industry targeting children. It offers critical supplemental evidence for data analysis. Thank you for sharing!

Hi Dwight! Love this.

Is this Kaggle data set the same thing?

Nope! I’m pretty sure that cereal dataset is about 30 years old and has a bunch of cereals that students have never heard of. You can think of this as an “updated” version of that dataset.

Very cool. I made a CODAP link for these two data sets and added the background information.

Check it out!
https://codap.concord.org/app/static/dg/en/cert/index.html#shared=https%3A%2F%2Fcfm-shared.concord.org%2FhJiXYwnoeow9ydZnnhXR%2Ffile.json
