Cereal Nutritional Information

Dwight · September 28, 2024, 3:39am

Data Source

Data Description

The cereal_nutrition dataset contains nutritional information about 95 cereals. The cereal_albertsons dataset contains information about 89 cereals in a grocery store.

Data Provenance and Purpose

I collected this data in April and May 2019. It was not collected to answer any particular real-world research question.

Nutritional information in cereal_nutrition was obtained from the websites of four cereal manufacturers (General Mills, Kellogg’s, Post, and Quaker) in April 2019.

Location and price information in cereal_albertsons was recorded at an Albertsons in Irvine, California on May 1, 2019.

Variable Names and Descriptions

cereal_nutrition

There are 32 columns in this dataset.

Cereal.Name: the name of the cereal
Manufacturer: the company that makes the cereal
Serving.Size: the size (in g) of a single serving of cereal
Calories: the number of calories in a single serving
Total.Fat: the total amount of fat (in g) in a single serving
Saturated.Fat: the amount of saturated fat (in g) in a single serving
Sodium: the amount of sodium (in mg) in a single serving
Potassium: the amount of potassium (in mg) in a single serving
Total.Carbohydrate: the total amount of carbohydrates (in g) in a single serving
Dietary.Fiber: the amount of dietary fiber (in g) in a single serving
Sugar: the amount of sugar (in g) in a single serving, including natural and added sugars
Protein: the amount of protein (in g) in a single serving
Vitamin.A: the % recommended daily value (RDV) of Vitamin A in a single serving
Vitamin.C: the % RDV of Vitamin C in a single serving
Calcium: the % RDV of calcium in a single serving
Iron: the % RDV of iron in a single serving
Vitamin.D: the % RDV of Vitamin D in a single serving
Vitamin.E: the % RDV of Vitamin E in a single serving
Thiamin: the % RDV of thiamin (Vitamin B1) in a single serving
Riboflavin: the % RDV of riboflavin (Vitamin B2) in a single serving
Niacin: the % RDV of niacin (Vitamin B3) in a single serving
Pantothenic.Acid: the % RDV of pantothenic acid (Vitamin B5) in a single serving
Vitamin.B6: the % RDV of Vitamin B6 in a single serving
Folic.Acid: the % RDV of folic acid (folate) in a single serving
Vitamin.B12: the % RDV of Vitamin B12 in a single serving
Zinc: the % RDV of zinc in a single serving
Magnesium: the % RDV of magnesium in a single serving
Phosphorus: the % RDV of phosphorus in a single serving
Selenium: the % RDV of selenium in a single serving
Copper: the % RDV of copper in a single serving
Manganese: the % RDV of manganese in a single serving

cereal_albertsons

There are 6 columns in this dataset.

Cereal.Name: the name of the cereal
Manufacturer: the company that makes the cereal
Size: the size (in oz) of a regular-sized box
Shelf: the shelf (1 = bottom, 5 = top) on which the box’s price tag was located
Location: the location along the aisle (1 = closest to front of store, 12 = closest to back of store) in which the box’s price tag was located
Price: the regular price (in dollars) of the box of cereal

Classroom Uses

Data Science Content

Merging Data: These datasets are useful for illustrating the difference between left outer join, right outer join, full outer join, and inner join. It helps that you get a different number of rows after each type of join.
Exploratory Data Analysis: It is fairly easy to prompt data science students to form their own initial questions and practice both the thought process and coding process of exploratory data analysis.
Variable Transformation: I ask students to repeat their analysis after normalizing to 1 ounce of cereal (instead of 1 serving) and see how the results change.
Clustering: The cereal_nutrition dataset is (just barely) small enough that you can get decent-looking dendrograms out of hierarchical clustering, and students enjoy seeing which cereals are “most similar” to other cereals. That dataset is also extremely useful for illustrating what happens when you perform k-means clustering without normalizing appropriately, since variables measured on larger scales (for example, Sodium in mg) then dominate the distances.
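
For anyone who wants to preview the merging and variable transformation ideas before opening the real files, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the cereal names and numbers are made up, the join helper is hypothetical, and matching the two tables on Cereal.Name alone is a guess about how you would key the merge. In class you would more likely use something like pandas `merge` with its `how` argument, or the dplyr join functions.

```python
# Toy rows with MADE-UP values; the real datasets have 95 and 89 rows.
# Each dict is keyed by Cereal.Name (assumed here to be the join key).
nutrition = {
    "Cereal A": {"Serving.Size": 28.0, "Sugar": 9.0},
    "Cereal B": {"Serving.Size": 40.0, "Sugar": 12.0},
    "Cereal C": {"Serving.Size": 56.0, "Sugar": 5.0},
}
albertsons = {
    "Cereal B": {"Size": 12.0, "Price": 3.99},
    "Cereal C": {"Size": 18.0, "Price": 4.49},
    "Cereal D": {"Size": 10.5, "Price": 2.99},
    "Cereal E": {"Size": 14.0, "Price": 3.49},
}


def join(left, right, how):
    """Hypothetical helper: join two dicts of rows on their keys."""
    if how == "inner":
        keys = left.keys() & right.keys()   # only names in both tables
    elif how == "left":
        keys = left.keys()                  # every name in the left table
    elif how == "right":
        keys = right.keys()                 # every name in the right table
    elif how == "outer":
        keys = left.keys() | right.keys()   # every name in either table
    else:
        raise ValueError(f"unknown join type: {how}")
    # A table that lacks a name contributes no columns for that row
    # (a data-frame library would fill those cells with missing values).
    return {k: {**left.get(k, {}), **right.get(k, {})} for k in sorted(keys)}


row_counts = {
    how: len(join(nutrition, albertsons, how))
    for how in ("inner", "left", "right", "outer")
}
print(row_counts)  # inner 2, left 3, right 4, outer 5

# Variable transformation: rescale a per-serving amount to 1 ounce.
GRAMS_PER_OUNCE = 28.3495


def per_ounce(row, column):
    """Convert a per-serving amount to a per-ounce amount."""
    return row[column] * GRAMS_PER_OUNCE / row["Serving.Size"]


sugar_per_ounce = {name: per_ounce(row, "Sugar") for name, row in nutrition.items()}
print(sugar_per_ounce)
```

With these toy tables each join type returns a different number of rows, which is the same property that makes the real pair of datasets handy for this lesson.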

Content-with-Context

There is little background context that students need to understand to start working with this dataset, as most of them have at least heard of some of the cereals.

I scrapped an activity asking students to use this data to devise a nutritional ranking system, such as that developed by the Rudd Center for Food Policy & Obesity, but I might revisit it.

Culturally Responsive Pedagogy

I have seen students spontaneously discuss what they ate for breakfast as children while they’re working with this data. Students who grew up outside the United States contribute meaningfully to these discussions as well. With more intention in my lesson plan, this dataset would definitely support the “engage and value identities” and “support deep learning” elements of CRP.

prusmevichientong · September 29, 2024, 2:52am

This dataset is ideal for my student’s honors research project, which investigates manipulative marketing tactics and ethical concerns in the ultra-processed breakfast cereal industry targeting children. It offers critical supplemental evidence for data analysis. Thank you for sharing!

bdruken · November 4, 2024, 4:24pm

Hi Dwight! Love this.

Is this Kaggle data set the same thing?

Dwight · November 5, 2024, 5:00am

Nope! I’m pretty sure that cereal dataset is about 30 years old and has a bunch of cereals that students have never heard of. You can think of this as an “updated” version of that dataset.

bdruken · November 6, 2024, 5:39pm

Very cool. I made a CODAP link for these two data sets and added the background information.

Check it out!
https://codap.concord.org/app/static/dg/en/cert/index.html#shared=https%3A%2F%2Fcfm-shared.concord.org%2FhJiXYwnoeow9ydZnnhXR%2Ffile.json