The fitness app market is enormous, but some are clearly superior than others for a given use pattern. Jefit fits this niche perfectly for me, which makes it a pity it tries its hardest to limit exporting your own data. Only by making a backup within the app (an option unavailable on its fully fledged web interface), transferring it to a computer and then decrypting it can one access the SQL database inside.
Performing the backup and transferring it are trivial, and a huge thanks to josama who posted a .jar to decrypt the file at the Jefit support forums.
In [1]:
import pandas as pd
import numpy as np
import sqlite3
import warnings
import re
warnings.filterwarnings('ignore')
Raw export SQL below after decryption
In [2]:
conn = sqlite3.connect("jefit.bak")
cursor = conn.cursor()
df = pd.read_sql_query("SELECT * FROM sqlite_master", conn)
print(df)
This table tracks weight and other dimensional measurements. Note that whether one enters data through the app vs the website can result in not imputing previous values, resulting in a lot of 0 values.
In [3]:
df = pd.read_sql_query("SELECT * FROM profile", conn)
print(df.tail())
This table keeps track of all-time records for each exercise
In [4]:
df = pd.read_sql_query("SELECT * from exerciseRecord", conn)
print(df.head())
As can be seen below, the measurements for time based activities (planks), cardio (running), and exercises with weights all follow a different scheme. In fact, some of the time based activities can have an arbitrary number of columns that need to be parsed dynamically.
In [5]:
df = pd.read_sql_query("SELECT * from exerciseLogs", conn)
print(df.tail())
print(df[df['ename'] == 'Treadmill Running'])
Function to parse and clean dimensional measurements
In [6]:
def load_jefit_measurements(jefitbak="jefit.bak"):
# init sqlite
conn = sqlite3.connect(jefitbak)
# extract weight/bodyfat/tape measurements
profile_logs_query = "SELECT mydate,weight,fatpercent,chest,arms,waist,calves,hips, \
thighs,shoulders,neck,forearms FROM profile;"
measurements = pd.read_sql(profile_logs_query, conn, parse_dates=['mydate'])
measurements.columns = ['Date', 'Weight', 'Bodyfat', 'Chest', 'Arms', 'Waist',
'Calves', 'Hips', 'Thighs', 'Shoulders', 'Neck', 'Forearms']
measurements = measurements.set_index('Date', drop=True).sort()
# remove any rows with no data, separate into 3, and then nan missing data
measurements = measurements.loc[(measurements!=0).any(axis=1)]
bodyfat = pd.DataFrame(measurements.Bodyfat.replace(0, np.nan).dropna())
weight = pd.DataFrame(measurements.Weight.replace(0, np.nan).dropna())
measurements = measurements.drop(['Weight', 'Bodyfat'],axis=1).replace(0, np.nan).dropna()
return weight, bodyfat, measurements
weight, bodyfat, measurements = load_jefit_measurements()
In [7]:
print(weight.head())
In [8]:
print(bodyfat.head())
In [9]:
print(measurements.head())
Helper functions to clean up the different types of exercise data
In [10]:
def _time_based_df(exercises_df, time_based_list):
# extract time based exercises from full list
time_df = exercises_df.query('ExerciseName in %s' %time_based_list).set_index(['Date', 'ExerciseName']).sort()
# clean log data from 0x0.0, 0x0.0 etc to simply Time
time_df = time_df['Logs'].apply(lambda x: pd.Series(re.split('x|,', x))) # , dtype=np.float64))
# dropping empty columns
time_df.replace(0, np.nan, inplace=True)
time_df.replace(0.0, np.nan, inplace=True)
time_df = time_df.dropna(how='all', axis=1)
# count how many columns left (i.e. max # of sets done for time-based exercises)
num_cols = len(time_df.columns)
cols = ['Set' + str(idx) + 'Time' for idx in range(1, num_cols+1)]
cols.insert(0, 'ExerciseName')
# time in seconds
time_df = time_df.reset_index().set_index(['Date'])
time_df.index = pd.to_datetime(time_df.index)
time_df.columns = cols
time_df['Sets'] = 1
return time_df
In [11]:
def _cardio_based_df(exercises_df, cardio_based_list):
# extract cardio based exercises from full list
cardio_df = exercises_df.query('ExerciseName in %s' %cardio_based_list).set_index(['Date', 'ExerciseName']).sort()
# clean log data from 0x0.0, 0x0.0 etc to appropriate columns
# split , and x's using regular expressions
cardio_df = cardio_df['Logs'].apply(lambda x: pd.Series(re.split('x|,',x)))
cardio_df.drop(cardio_df.columns[[0,2,4,6,7,8]], axis=1, inplace=True)
# sometimes a random tenth column with a few nan's is present.. inexplicable
cardio_df = cardio_df.dropna(how='any', axis=1)
cardio_df = cardio_df.reset_index().set_index(['Date'])
cardio_df.index = pd.to_datetime(cardio_df.index)
# cals, miles, mph, seconds
cardio_df.columns = ['ExerciseName', 'CardioCalsBurned', 'CardioDistance', 'CardioSpeed', 'CardioTime']
cardio_df['Sets'] = 1
return cardio_df
In [12]:
def _lifts_based_df(exercises_df, non_lift_list):
# extract lift exercises, removing cardio/time based
lifts_df = exercises_df.query('ExerciseName not in %s' %non_lift_list).set_index(['Date', 'ExerciseName']).sort()
# cleaning log data from 0x0.0,0x0.0,0x0,0x30,0x... garbage to weight x reps
# first step, split by , to isolate each set
lifts_df = lifts_df['Logs'].apply(lambda x: pd.Series(re.split(',',x)))
# stack here makes separate row for each wgt x rep set
lifts_df = pd.DataFrame(lifts_df.stack())
# stacking adds another index level we don't want
lifts_df.index = lifts_df.index.droplevel(-1)
lifts_df.columns = ['Logs']
# now we split each set up into wgt x rep
lifts_df = pd.DataFrame(lifts_df.Logs.str.split('x').tolist(), columns=['Weight', 'Reps'],
index=lifts_df.index, dtype=float)
lifts_df = lifts_df.reset_index().set_index(['Date'])
lifts_df.index = pd.to_datetime(lifts_df.index)
# add sets, volume, 1rm column to all rows to help calculations later
lifts_df['Sets'] = 1
lifts_df['Volume'] = lifts_df.Weight * lifts_df.Reps
lifts_df['1RM'] = (lifts_df.Volume * 0.033) + lifts_df.Weight
return lifts_df
Combining the exercise helper functions into one call
This function will extract and transform all the exercise data from a Jefit backup file
In [13]:
def load_jefit_exercises(jefitbak="jefit.bak", cardiolist = ['Treadmill Running', 'Walking'],
timelist = ['Plank']):
# init sqlite
conn = sqlite3.connect(jefitbak)
# extract lifting/cardio/time based logs
exercise_logs_query = "SELECT mydate, ename, logs FROM exerciseLogs;"
exercises = pd.read_sql(exercise_logs_query, conn, parse_dates=['mydate'])
exercises.columns = ['Date', 'ExerciseName', 'Logs']
# clean all logs return in 3 dataframes
time_df = _time_based_df(exercises, timelist)
cardio_df = _cardio_based_df(exercises, cardiolist)
lifts_df = _lifts_based_df(exercises, (timelist+cardiolist))
# merge time_based back into lifts
lifts_df = pd.merge(lifts_df.reset_index(), time_df.reset_index(),
on=['Date', 'ExerciseName', 'Sets'], how='outer')
lifts_df = lifts_df.set_index(['Date']).sort()
# merge cardio back into lifts
lifts_df = pd.merge(lifts_df.reset_index(), cardio_df.reset_index(),
on=['Date', 'ExerciseName', 'Sets'], how='outer')
lifts_df = lifts_df.set_index(['Date']).sort()
# extract bodypart for each exercise
bodypart_query = "SELECT exercisename, bodypart FROM workOutExerciseList;"
bodyparts = pd.read_sql(bodypart_query, conn)
bodyparts.columns = ['ExerciseName', 'Bodypart']
# manually adding exercises since removed from jefit's database
bodyparts.loc[len(bodyparts)] = ['Front Two Dumbell Raise', '6']
# some duplicates present if have multiple routines with same exercise
bodyparts.drop_duplicates('ExerciseName', inplace=True)
lifts_df = pd.merge(lifts_df.reset_index(), bodyparts, on='ExerciseName',
how='left').set_index(['Date']).sort()
bodypart_dict = {0: 'Abs', 1: 'Back', 2: 'Biceps', 3: 'Chest', 4: 'Forearms',
5: 'Glutes', 6: 'Shoulders', 7: 'Triceps', 8: 'Upper Legs',
9: 'Lower Legs', 10: 'Cardio', np.nan: 'Unknown'}
lifts_df.Bodypart = lifts_df.Bodypart.replace(bodypart_dict)
lifts_df.fillna(0, inplace=True)
return lifts_df
lifts_df = load_jefit_exercises()
print(lifts_df.columns)
print(lifts_df.tail())
Finally, we encapsulate everything with the Load function, which will store all the dataframes as tables in a database
In [14]:
def loadJefit(jefitbak="jefit.bak", jefitDB="jefit_clean.db", cardiolist = ['Treadmill Running', 'Walking'], timelist = ['Plank']):
conn = sqlite3.connect(jefitDB)
lifts = load_jefit_exercises(jefitbak, cardiolist, timelist)
weight, bodyfat, measurements = load_jefit_measurements()
lifts.to_sql('lifts', conn, if_exists='replace')
weight.to_sql('weight', conn, if_exists='replace')
bodyfat.to_sql('bodyfat', conn, if_exists='replace')
measurements.to_sql('measurements', conn, if_exists='replace')
conn.commit()
conn.close()
loadJefit()
The resulting transformed and cleaned database
In [15]:
conn = sqlite3.connect("jefit_clean.db")
cursor = conn.cursor()
df = pd.read_sql_query("SELECT * FROM sqlite_master", conn)
print(df)
Confirming everything is still as it should be
In [16]:
df = pd.read_sql_query("SELECT * FROM lifts", conn)
print(df.tail(2))
Now that we finally have access to our Jefit data, we can begin having some fun deriving insights! More to follow.