STAT 29000: Project 7 — Spring 2021
Motivation: Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! As you probably noticed in the previous project, matplotlib
can be finicky — certain types of plots are really easy to create, while others are not. For example, you would think changing the color of a boxplot would be easy to do in matplotlib
, perhaps we just need to add an option to the function call. As it turns out, this isn’t so straightforward (as illustrated at the end of [this section](#p-matplotlib-boxplot)). Occasionally this will happen and that is when packages like seaborn
or plotnine
(both are packages built using matplotlib
) can be good. In this project we will explore this a little bit, and learn about some useful pandas
functions to help shape your data in a format that any given package requires.
Context: In the next project, we will continue to learn about and become comfortable using matplotlib
, seaborn
, and plotnine
.
Scope: python, visualizing data
Dataset
The following questions will use the dataset found in Scholar:
/class/datamine/data/apple/health/watch_dump.xml
Questions
Question 1
In an earlier project we explored some XML data in the form of an Apple Watch data dump. Most health-related apps give you some sort of graph or set of graphs as an output. Use any package you want to parse the XML data. Record
is a dense category in this dataset. Each Record
has an attribute called creationDate
. Create a barplot of the number of Records
per day. Make sure your plot is polished, containing proper labels and good colors.
You could start by parsing out the required data into a |
The |
-
Python code used to solve the problem.
-
Output from running your code (including the graphic).
Question 2
The plot in question 1 should look bimodal. Let’s focus only on the first apparent group of readings. Create a new dataframe containing only the readings for the time period from 9/1/2017 to 5/31/2019. How many Records
are there in that time period?
-
Python code used to solve the problem.
-
Output from running your code.
Question 3
It is hard to discern weekly patterns (if any) based on the graphics created so far. For the period of time in question 2, create a labeled bar plot for the count of Record
by day of the week. What (if any) discernable patterns are there? Make sure to include the labels provided below:
labels = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
-
Python code used to solve the problem.
-
Output from running your code (including the graphic).
Question 4
Create a pandas
dataframe containing the following data from watch_dump.xml
:
-
A column called
bpm
with thebpm
(beats per minute) of theInstantaneousBeatsPerMinute
. -
A column called
time
with thetime
of each individualbpm
reading inInstantaneousBeatsPerMinute
. -
A column called
date
with the date. -
A column called
dayofweek
with the day of the week.
You may want to use |
This is one way to convert the numbers 0-6 to days of the week:
|
-
Python code used to solve the problem.
-
Output from running your code.
Question 5
Create a heatmap using seaborn
, where the y-axis shows the day of the week ("Mon" - "Sun"), the x-axis shows the hour, and the values on the interior of the plot are the average bpm
by hour by day of the week.
-
Python code used to solve the problem.
-
Output from running your code (including the graphic).