In the process of adding old and new material.

Projects

Unsupervised Machine Learning to Quantify Exercise Performance

[Note: This builds on previous work: scraping MyFitnessPal here, importing and cleaning Jefit data here, and creating a class to merge and deliver basic analysis on both here.] We'd like to compare the effects of nutrition on exercise performance. First, we must define what constitutes "good" performance and how we can extract it from exercise logs...

Nutrition and Exercise Habits – Basic Analysis

Click here to skip the code and see results. The code below imports previously cleaned data to offer quantitative advice based on dietary and exercise logs, and to draw some descriptive conclusions about our general habits. (See https://josetorres.us/data-science/jefit-etl-with-python/ and https://josetorres.us/data-science/scraping-myfitnesspal-with-python/ for data acquisition, cleaning, and warehousing steps.)

    %matplotlib inline
    import sqlite3 # importing databases
    import numpy...

Scraping MyFitnessPal with Python

UPDATE: Updated some methods for recent MFP site changes, including JavaScript handling. MyFitnessPal is a great website and app for logging nutritional intake and other health metrics. To really dig into the numbers and find patterns, I wanted to import the data into Python. In sum, the code below will return an SQL database...

Jefit ETL with Python

The fitness app market is enormous, but some apps are clearly superior to others for a given use pattern. Jefit fits this niche perfectly for me, so it is a pity that it tries its hardest to limit exporting your own data. Only by making a backup within the app (an option unavailable on its fully fledged...

Using P-Ratio to Plan a Diet with Python and Excel

[To access the P-Ratio Excel spreadsheet associated with this, click here.] As a very goal-oriented person, I find it helps to define a structure and a finish line, especially for something as mentally difficult as dieting. When we estimate how much weight we want to lose, we ballpark without regard for the...

Increasing GPS Accuracy using Accelerometer Data I – Creating an Android App to Record Data

https://www.youtube.com/watch?v=F2_alblwOGM

If you've ever looked at the path drawn by your phone's GPS, you might have noticed some inaccurate stretches, usually when there are few or no cell towers around to help triangulate. The problem is worse out in the wilderness, where there can be dense foliage and little reception. I first...

Lil Bits Tiny and tiny and fits right in

Analyzing Pedometer Data with R

Loading and preprocessing the data

    library(dplyr)
    data <- read.csv("activity.csv")

What is the mean total number of steps taken per day?

We first group the data by date, then collapse the steps rows into each day using the sum function, as below:

    grouped.by.day <- group_by(data, date)
    steps.daily.summed <- summarise_each(grouped.by.day, funs(sum))
    barplot(steps.daily.summed$steps, names.arg = steps.daily.summed$date, xlab = 'Date',...

Economic Effects of Weather Events with R

Synopsis

Using data from the NOAA Storm Database ranging from 1950 to November 2011, we review the effects that different types of weather events have on the United States. We compare which weather events are most harmful to the health of the population, in terms of injuries and fatalities, in total number (sum) and per weather...

Classifying via Random Forests with R

Predicting Exercise Form via Accelerometer Data

Synopsis

We analyze a data set of accelerometer measurements used to classify a type of dumbbell curl. With over 19,000 observations in our training set, each with 159 predictors, and 5 potential classifications, model choice is a large factor in predictive performance. Using a random forest implementation, we achieve...

Automatic vs Manual for better MPG with R

Executive Summary

Using the mtcars dataset, we explore the relationship between several variables and gas mileage, measured in miles per gallon (MPG). We cannot deliver a solid conclusion on whether transmission type is a causal factor in MPG, as the coefficient's sign flipped depending on which covariates were included in a linear regression. If...

Inferential Analysis of Supplement Efficacy with R

Inferential Data Analysis with ToothGrowth Dataset

Exploratory Analysis

    summary(data)
    ##       len       supp         dose
    ##  Min.   : 4.2   OJ:30   Min.   :0.50
    ##  1st Qu.:13.1   VC:30   1st Qu.:0.50
    ##  Median :19.2           Median :1.00
    ##  Mean   :18.8           Mean   :1.17
    ##  3rd Qu.:25.3           3rd Qu.:2.00
    ##  Max.   :33.9           Max.   :2.00

We can see that there are three columns

Verifying the Central Limit Theorem with R

We know the mean and variance of an exponential distribution are $\frac{1}{\lambda}$ and $\frac{1}{\lambda^2}$, respectively. Given $\lambda = 0.2$ and $n = 40$, this implies our mean should be $\mu = \frac{1}{0.2} = 5$ and our variance $\sigma^2 = \frac{1}{0.2^2} = 25$. In a given sampling distribution of the mean we expect the mean to remain the...
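
As a rough illustration of the numbers above, here is a minimal Python sketch (the post itself uses R) that simulates many samples of 40 exponential draws with rate 0.2 and compares the mean and variance of the sample means against theory. The number of simulated samples, the seed, and all function and variable names are assumptions made for this example. The theoretical variance of the sample mean, $\sigma^2 / n = 25 / 40 = 0.625$, is the standard result for independent draws and is not quoted from the excerpt.

```python
import random
import statistics

LAMBDA = 0.2           # rate of the exponential distribution (from the text)
N = 40                 # observations per sample (from the text)
NUM_SAMPLES = 10_000   # number of simulated samples (assumed for illustration)


def simulate_sample_means(lam, n, num_samples, seed=0):
    """Return the means of `num_samples` samples of `n` exponential draws each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(lam) for _ in range(n))
        for _ in range(num_samples)
    ]


sample_means = simulate_sample_means(LAMBDA, N, NUM_SAMPLES)

# Theoretical values: population mean 1/lambda, and variance of the
# sample mean (1/lambda^2) / n for independent draws.
theoretical_mean = 1 / LAMBDA
theoretical_variance = (1 / LAMBDA ** 2) / N

observed_mean = statistics.fmean(sample_means)
observed_variance = statistics.variance(sample_means)

print(f"mean of sample means:     {observed_mean:.3f} (theory {theoretical_mean:.3f})")
print(f"variance of sample means: {observed_variance:.3f} (theory {theoretical_variance:.3f})")
```

With enough simulated samples, the observed values should land close to 5 and 0.625, and a histogram of `sample_means` should look roughly bell-shaped even though the underlying distribution is heavily skewed.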
