Getting Started with GPS Data Analysis in Python

Possibly the last thing the world needs is another gps analysis tool, but I have been recording my bike rides, hikes, walks, runs, and even canoe and 4×4 trips consistently since 2015, so it seems like a good data set to work from to get beyond the basics in Python, data conditioning, analysis, and visualization.

Understanding the Data Source

Whether recorded using your phone, sports watch, bike computer, or handheld, most services will output GPS data in the form of a .gpx file. I use a web service called Tapiriik to export all activities I upload to RideWithGPS, Strava, or Garmin Connect to a folder in Dropbox, so I’ve already got a complete set of .gpx files to work with.

Here’s a sample of the .gpx file we’ll be working from:

<?xml version='1.0' encoding='UTF-8'?> <gpx xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpxext="http://www.garmin.com/xmlschemas/GpxExtensions/v3" xmlns:gpxdata="http://www.cluetrust.com/XML/GPXDATA/1/0" xmlns="http://www.topografix.com/GPX/1/1" creator="tapiriik-sync"> <metadata> <name>Midnight Massacre singlespeed </name> </metadata> <trk> <name>Midnight Massacre singlespeed </name> <trkseg> <trkpt lat="33.48812802" lon="-97.16527113"> <time>2017-08-05T23:59:57+00:00</time> <ele>220.0</ele> </trkpt> <trkpt lat="33.48812882" lon="-97.16526836"> <time>2017-08-05T23:59:59+00:00</time> <ele>220.0</ele> </trkpt> ... <trkpt lat="33.48796018" lon="-97.16536649"> <time>2017-08-06T02:17:50+00:00</time> <ele>219.8</ele> </trkpt> <trkpt lat="33.48797073" lon="-97.16536446"> <time>2017-08-06T02:17:52+00:00</time> <ele>219.8</ele> </trkpt> </trkseg> </trk> </gpx>
Code language: HTML, XML (xml)

You can see it’s just an XML document with a standardized format for representing waypoints, tracks, and routes. The basic component of the GPX file is the <trkpt> or “track point”.

At minimum, the <trkpt> includes longitude, latitude, and a timestamp. It may also include elevation, either from a barometer on the device or added in post-processing from a digital elevation model.

My Garmin Vivoactive 3 also includes heart rate from the onboard optical heart rate sensor, and cadence from an additional ant+ sensor on the crank-arm of my bike. These attributes are stored in a sub-level of the <trkpt> called <extensions>.

The snippet below is from a different file that includes these attributes.

<trkpt lat="32.845418" lon="-96.704896"> <time>2020-07-27T13:17:37+00:00</time> <ele>145.0</ele> <extensions> <gpxtpx:TrackPointExtension> <gpxtpx:hr>75</gpxtpx:hr> <gpxtpx:cad>0</gpxtpx:cad> </gpxtpx:TrackPointExtension> </extensions> </trkpt>
Code language: HTML, XML (xml)

For this exercise, we’ll stick to an older file recorded with my phone and won’t be dealing with the extra data types yet.

I followed along with these articles before doing my own thing here:

Loading the Data

Since GPX files are just XML, we could write a script to parse the records ourselves, but for this we’ll be using the gpxpy library. gpxpy creates a “gpx object”, a nested set of arrays for tracks, segments, and points, respectively. The points array contains the data from <trkpt> as attributes: longitude, latitude, elevation, timestamp, and extension. points.extension contains the additional attributes and is accessed with:

ext = point.extensions[0].getchildren()[0] hr = int(ext.text)
Code language: Python (python)

We’ll pour the data from the gpx object into a Pandas dataframe object, a data table or lightweight database held in memory with lots of useful high-level functions for data manipulation and analysis.

import gpxpy import pandas as pd gpx_file = open('2017-08-05_Midnight Massacre singlespeed _Cycling.gpx', 'r') # open the file in read mode gpx_data = gpxpy.parse(gpx_file)
Code language: Python (python)

Let’s have a look at the gpx object gpx_data and see how data is organized within it.

len_tracks = len(gpx_data.tracks) len_segments = len(gpx_data.tracks[0].segments) len_points = len(gpx_data.tracks[0].segments[0].points) print(f"Tracks: {len_tracks} \nSegments: {len_segments} \nPoints: {len_points} \n")
Code language: Python (python)
Tracks: 1 Segments: 1 Points: 3774
Code language: HTTP (http)

Looks like there is only 1 Track with 1 Segment with 3774 Points.

Since the point data is burried under an unweildy path, lets make a variable as a shortcut to it. Then we’ll iterate through the points and append them to a dataframe.

gpx_points = gpx_data.tracks[0].segments[0].points df = pd.DataFrame(columns=['lon', 'lat', 'elev', 'time']) # create a new Pandas dataframe object with give column names # loop through the points and append their attributes to the dataframe for point in gpx_points: df = df.append({'lon' : point.longitude, 'lat' : point.latitude, 'elev' : point.elevation, 'time' : point.time}, ignore_index=True)
Code language: Python (python)

Exploratory Data Analysis & Cleanup

Now that that the data is in a dataframe, it’s ready for some basic Exploratory Data Analysis to get a feel for the data set, how it’s organized, and see if any cleanup is needed. Since the data came from a well-formatted XML file, we know its structure, but for the sake of practice lets run through a few things anyway.

I’ll start with a look at the first n rows of the data with df.head(n) and a couple of quick summaries of the dataframe with df.info() and df.describe().

print(f"df.head(10) displays the first 10 rows of the dataframe:\n\n{df.head(10)}\n\n") print(f"df.info() gives a summary of the dataframe's contents and data-types:\n") print(df.info(), "\n\n") # I had to pull this out separately because it was outputting the info table before the text, no idea why. print(f"df.describe() gives a basic statistical summary of the numerical data:\n\n{df.describe()}")
Code language: Python profile (profile)
df.head(10) displays the first 10 rows of the dataframe: lon lat elev time 0 -97.165271 33.488128 220.0 2017-08-05 23:59:57+00:00 1 -97.165268 33.488129 220.0 2017-08-05 23:59:59+00:00 2 -97.165268 33.488129 220.0 2017-08-06 00:00:01+00:00 3 -97.165269 33.488129 220.0 2017-08-06 00:00:03+00:00 4 -97.165269 33.488130 220.0 2017-08-06 00:00:05+00:00 5 -97.165269 33.488129 220.0 2017-08-06 00:00:07+00:00 6 -97.165268 33.488130 220.0 2017-08-06 00:00:09+00:00 7 -97.165267 33.488127 220.0 2017-08-06 00:00:11+00:00 8 -97.165264 33.488127 220.0 2017-08-06 00:00:13+00:00 9 -97.165261 33.488125 220.0 2017-08-06 00:00:15+00:00 df.info() gives a summary of the dataframe's contents and data-types: <class 'pandas.core.frame.DataFrame'> RangeIndex: 3774 entries, 0 to 3773 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 lon 3774 non-null float64 1 lat 3774 non-null float64 2 elev 3774 non-null float64 3 time 3774 non-null datetime64[ns, SimpleTZ("Z")] dtypes: datetime64[ns, SimpleTZ("Z")](1), float64(3) memory usage: 118.0 KB None df.describe() gives a basic statistical summary of the numerical data: lon lat elev count 3774.000000 3774.000000 3774.000000 mean -97.165497 33.537495 229.502676 std 0.022932 0.028110 9.971957 min -97.203490 33.487960 198.600000 25% -97.182061 33.514003 222.300000 50% -97.165627 33.532457 229.500000 75% -97.148153 33.559099 236.000000 max -97.119257 33.589732 255.600000
Code language: HTML, XML (xml)

From this we can see that all columns have 3774 rows with non-null values. Good! That means we don’t have any NaN or other issues to clean up.

One thing that we do need to fix is the “time” datatype. Pandas is usually pretty good at recognizing datatypes. If it doesn’t, the default assignment is “object”, the same type it assigns to string data. Since this is fundamentally time-series data, we definitely want that to be correctly assigned.

We can use the .astype() function in Pandas to tell it that the data in the “time” column is of type datetime64.

Also, I know that the ride started at about 19:00 CST, but the timestamp on the first trackpoint shows 24:00. The extra +00:00 at the end of each timestamp tells us that the time is in UST, so we’ll also need to correct the records in the “time” column to CST.

The Pandas function .tz_convert() will handle that for us. Here’s a good guide to handling time data in Pandas.

And finally, timezone aware timestamps can cause some issues with calculations, so we’ll want to strip away the timezone indicator with tz.localize().

Note to self: In the future it would be good to use the coordinates of the first point to determine time zone and apply the correction automatically.

Note to reader: Also, Pandas seems to intermittently recognize the time data and apply the correct datatype. I left the .astype() example in case yours doesn’t.

# df['time'].astype('datetime64') df['time'] = df['time'].dt.tz_convert('US/Central') df['time'] = df['time'].dt.tz_localize(None) df.info() df.head(10)
Code language: PHP (php)
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3774 entries, 0 to 3773 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 lon 3774 non-null float64 1 lat 3774 non-null float64 2 elev 3774 non-null float64 3 time 3774 non-null datetime64[ns] dtypes: datetime64[ns](1), float64(3) memory usage: 118.0 KB lon lat elev time 0 -97.165271 33.488128 220.0 2017-08-05 18:59:57 1 -97.165268 33.488129 220.0 2017-08-05 18:59:59 2 -97.165268 33.488129 220.0 2017-08-05 19:00:01 3 -97.165269 33.488129 220.0 2017-08-05 19:00:03 4 -97.165269 33.488130 220.0 2017-08-05 19:00:05 5 -97.165269 33.488129 220.0 2017-08-05 19:00:07 6 -97.165268 33.488130 220.0 2017-08-05 19:00:09 7 -97.165267 33.488127 220.0 2017-08-05 19:00:11 8 -97.165264 33.488127 220.0 2017-08-05 19:00:13 9 -97.165261 33.488125 220.0 2017-08-05 19:00:15
Code language: HTML, XML (xml)

Basic Visualization

The data is now loaded into a dataframe and basic conditioning and cleanup are finished. Let’s see what it looks like!

We’ll start with some basic plots using plotly.express.

References:

import plotly.express as px
Code language: JavaScript (javascript)

Plotting the Longitude vs Latitude will give a pretty good picture of the track. Since this is a simple XY scatter, it has no notion of coordinate systems, projections, or the ellipsoidal shape of the Earth. Those factors will be important later.

fig_1 = px.scatter(df, x='lon', y='lat', template='plotly_dark') fig_1.show()
Code language: JavaScript (javascript)
Figure 1. Static png of simple longitude vs latitude plot of gps data using plotly express.

Now let’s look at Elevation vs Time.

Since this ride was a loop back to the start, the elevation should end pretty close to where it started. My phone doesn’t have it’s own barometer so the elevation data will have come from RideWithGPS’s digital elevation model. DEM-based elevations should align at the start-finish, so if there’s a disparity, that could point to a problem in the dataset.

fig_2 = px.line(df, x='time', y='elev', template='plotly_dark') fig_2.show()
Code language: JavaScript (javascript)
Figure 2. Static png of simple time vs elevation plot of gps data using plotly express.

Looks like it lines up well.

How about something fancy-ish?

fig_3 = px.scatter_3d(df, x='lon', y='lat', z='elev', color='elev', template='plotly_dark') fig_3.update_traces(marker=dict(size=2), selector=dict(mode='markers')) fig_3.show()
Code language: JavaScript (javascript)
Figure 3. Static png of 3D scatter plot displaying longitude, latitude, and elevation with plotly express.

THAT is pretty damn cool.

References:

Okay just one more because I ran across it while looking up how to do the plots above.

fig_4 = px.line_mapbox(df, lat='lat', lon='lon', hover_name='time', mapbox_style="open-street-map", zoom=11) fig_4.update_layout(margin={"r":0,"t":0,"l":0,"b":0}) fig_4.show()
Code language: JavaScript (javascript)
Figure 4. Static png of gps data displayed on OpenStreetMap basemap using plotly express and mapbox.

I really cannot express how satisfying that is. A few lines of code and we’ve got GPX track data on a map.

Okay time to get into it for real and deal with distance and speed.

But first…

The Earth is Not Flat

In a simple flat plane, the good ol’ distance formula d = sqrt((x2-x1)^2 + (y2-y1)^2) is all we need to calculate distance, then delta_d/delta_t gives us speed.

Latitude and Longitude represent two points on a spherical surface with respect to a given reference point, or “datum”.

To find the distance between any two points on the surface of a sphere, we must consider that the path is not a straight line but rather follows a circular path across the sphere’s surface. This is known as the “great-circle distance”.

Imagine a circle drawn on the surface of the sphere, with the same center, and passing through both points, dividing the sphere into equal halves. This is a “great-circle”.

The length of the segment of the circle between the two points is the “great-circle distance”.

The Haversine formula is used to calculate the the great-circle distance between two points on a sphere from latitude and longitude.

Great circle distance diagram from Wikipedia.
Diagram of great-circle distance concept from Wikipedia.

However, the Earth is not quite spherical.

Due to its rotation, the equator bulges out slightly forming an oblate spheroid. This shape is approximated by an ellipsoid.

The current standard reference ellipsoid is the World Geodetic System model defined in 1984, known as WGS 84.

Calculating the distance between two points on an ellipsoid is significantly more complex than on a sphere. Formulae developed by Vincenty in 1975 handled the problem well, with some exceptions, and were the standard until improved by Karney in 2013.

WGS 84 reference frame diagram from Wikipedia.

The latitude and longitude reported by our GPS are geodetic coordinates for points on the surface of the WGS 84 reference ellipsoid.

The coordinates of the same points on a different reference ellipsoid, North American Datum 1927 (NAD27), for example, would be different, because the NAD 27 reference ellipsoid is a slightly different shape.

It’s important to understand this because any algorithm operating on these coordinates must reference the correct ellipsoid model, as does any mapping application.

The geopy package contains methods to calculate the distance using either the spherical method (Haversine) or the geodesic method (Vincenty/Karney) with your choice of reference ellipsoid.

Accounting for elevation change

If we move X miles horizontally on a flat surface, we will have travelled X miles. If that surface were not flat, and we gained some Z miles of elevation, our true distance travelled is the hypotenuse of a triangle with sides X and Z.

The latitude and longitude only give us the location on the surface of smooth ellipsoid. The GPS track we’re analyzing contains elevation data and, as we saw in the Elevation vs Time plot, it is constantly changing. What is the effect of elevation on the total distance?

Since the distance between points in our GPS track is known to be very small, we can safely ignore the complication of Earth’s curvature and work out the added distance from elevation change with the Euclidean distance formula.

Calculating Distance and speed

To calculate distance and speed we will first need to calculate the distance dist for each segment between a point and the previous point. We will also need the change in elevation, delta_elev, to calculate the contribution of elevation to the distance and the length of time between records, delta_time, which we’ll use for calculating instantaneous speed.

Out of curiosity, lets run the distance calculations with both spherical and geodesic models to see how different they really are at this scale.

from geopy import distance from math import sqrt, floor import datetime # first create lists to store the results, these will be appended ot the dataframe at the end # Note: i'll be working from the gps_points object directly and then appending results into the dataframe. It would make a lot more sense to operate directly on the dataframe. delta_elev = [0] # change in elevation between records delta_time = [0] # time interval between records delta_sph2d = [0] # segment distance from spherical geometry only delta_sph3d = [0] # segment distance from spherical geometry, adjusted for elevation dist_sph2d = [0] # cumulative distance from spherical geometry only dist_sph3d = [0] # cumulative distance from spherical geometry, adjusted for elevation delta_geo2d = [0] # segment distance from geodesic method only delta_geo3d = [0] # segment distance from geodesic method, adjusted for elevation dist_geo2d = [0] # cumulative distance from geodesic method only dist_geo3d = [0] # cumulative distance from geodesic method, adjusted for elevation for idx in range(1, len(gpx_points)): # index will count from 1 to lenght of dataframe, beginning with the second row start = gpx_points[idx-1] end = gpx_points[idx] # elevation temp_delta_elev = end.elevation - start.elevation delta_elev.append(temp_delta_elev) # time temp_delta_time = (end.time - start.time).total_seconds() delta_time.append(temp_delta_time) # distance from spherical model temp_delta_sph2d = distance.great_circle((start.latitude, start.longitude), (end.latitude, end.longitude)).m delta_sph2d.append(temp_delta_sph2d) dist_sph2d.append(dist_sph2d[-1] + temp_delta_sph2d) temp_delta_sph3d = sqrt(temp_delta_sph2d**2 + temp_delta_elev**2) delta_sph3d.append(temp_delta_sph3d) dist_sph3d.append(dist_sph3d[-1] + temp_delta_sph3d) # distance from geodesic model temp_delta_geo2d = distance.distance((start.latitude, start.longitude), (end.latitude, end.longitude)).m delta_geo2d.append(temp_delta_geo2d) dist_geo2d.append(dist_geo2d[-1] + temp_delta_geo2d) temp_delta_geo3d = sqrt(temp_delta_geo2d**2 + temp_delta_elev**2) delta_geo3d.append(temp_delta_geo3d) dist_geo3d.append(dist_geo3d[-1] + temp_delta_geo3d) # dump the lists into the dataframe df['delta_elev'] = delta_elev df['delta_time'] = delta_time df['delta_sph2d'] = delta_sph2d df['delta_sph3d'] = delta_sph3d df['dist_sph2d'] = dist_sph2d df['dist_sph3d'] = dist_sph3d df['delta_geo2d'] = delta_geo2d df['delta_geo3d'] = delta_geo3d df['dist_geo2d'] = dist_geo2d df['dist_geo3d'] = dist_geo3d # check bulk results {floor(sum(delta_time)/60)}min {int(sum(delta_time)%60)}sec \n print(f"Spherical Distance 2D: {dist_sph2d[-1]/1000}km \nSpherical Distance 3D: {dist_sph3d[-1]/1000}km \nElevation Correction: {(dist_sph3d[-1]) - (dist_sph2d[-1])} meters \nGeodesic Distance 2D: {dist_geo2d[-1]/1000}km \nGeodesic Distance 3D: {dist_geo3d[-1]/1000}km \nElevation Correction: {(dist_geo3d[-1]) - (dist_geo2d[-1])} meters \nModel Difference: {(dist_geo3d[-1]) - (dist_sph3d[-1])} meters \nTotal Time: {str(datetime.timedelta(seconds=sum(delta_time)))}")
Code language: Python (python)
Spherical Distance 2D: 51.73662623269029km Spherical Distance 3D: 51.74915725516392km Elevation Correction: 12.53102247363131 meters Geodesic Distance 2D: 51.740080195107666km Geodesic Distance 3D: 51.752603641615416km Elevation Correction: 12.523446507751942 meters Model Difference: 3.44638645149098 meters Total Time: 2:17:55
Code language: CSS (css)

Okay, that is definitely not the most efficient way to do that, but it’ll do for now.

As a quick check, our Total Time is exactly the same as what RWGPS reported. That’s good further confirmation that the dataset is behaving and doesn’t have anything weird going on.

Did the elevation correction make much of a difference?

Not really, but this course also had a maximum elevation difference of around 60 meters. On a course with more topography, I would expect it to be more meaningful, but we can check that another time.

Was there a significant difference between models?

Again, not really, just 3 meters over ~52,000 meters travelled (0.006%). Our points are quite close together relative to the curvature of either model, so it is probably reasonable to stick with the much simpler great-circle distance. This will be hugely beneficial when running the calculation over 5 years of GPS records.

Elevation Change

fig_6 = px.line(df, x='dist_geo3d', y='delta_elev', template='plotly_dark') fig_6.show()
Code language: JavaScript (javascript)
Figure 6. Static png of simple plot of distance vs elevation using plotly express.

Here’s a plot of the Elevation Change vs Distance. We saw that the elevation correction was fairly trivial (in this context), just 12 meters over almost 52 kilometers. The greatest elevation difference between two consecutive points was just 2.6 meters. The distance between points would have to be similarly small for that to have a significant effect.

Moving Time

We have Total Activity Time, but what about Moving Time? Most activity analysis apps offer this metric. RideWithGPS says my moving time for this ride was 2:03:05. It is cutting 00:14:50 off the time by dropping records with instantaneous speed below some threshold.

Lets look at a Time vs Distance plot and see if the stationary periods stand out.

fig_5 = px.line(df, x='time', y='dist_geo3d', template='plotly_dark') fig_5.show()
Code language: JavaScript (javascript)
Figure 5. Static png of simple plot of time vs distance using plotly express.

Yep, every pause is seen as a jog in the Distance vs Time plot (Time continued to increase but Distance did not).

Let’s make a new column with the instantaneous speed in meters per second at each data point and check out a histogram of the results. Then we’ll calculate the total moving time by excluding records below a thresholve value. We can run this test for a series of thresholds and see which aligns with RideWithGPS.

df['inst_mps'] = df['delta_geo3d'] / df['delta_time'] fig_8 = px.histogram(df, x='inst_mps', template='plotly_dark') fig_8.update_traces(xbins=dict(start=0, end=12, size=0.1)) fig_8.show()
Code language: JavaScript (javascript)
Figure 8. Static png showing histogram of instantaneous speed (m/s) using plotly express.

I love a Gausian Distribution. Let’s lop off that stoppage time at the low end and balance it out. Average human walking speed is about 1.4 meters per second. I’ve had rides that required some walking, so the threshold must be below that.

Note: Python’s range() won’t accept floats, so I brought in numpy to handle it rather than write a list of thresholds manually

import numpy as np for threshold in np.arange(0.1, 2.1, 0.1): # delta-distance thresholds from 0 to 2 meters by 0.1 meters stop_time = sum(df[df['inst_mps'] < threshold]['delta_time']) print(f"Distance Threshold: {round(threshold,2)}m ==> {str(datetime.timedelta(seconds=stop_time))}")
Code language: PHP (php)
Distance Threshold: 0.1m ==> 0:12:40 Distance Threshold: 0.2m ==> 0:13:38 Distance Threshold: 0.3m ==> 0:14:04 Distance Threshold: 0.4m ==> 0:14:22 Distance Threshold: 0.5m ==> 0:14:30 Distance Threshold: 0.6m ==> 0:14:34 Distance Threshold: 0.7m ==> 0:14:40 Distance Threshold: 0.8m ==> 0:14:48 Distance Threshold: 0.9m ==> 0:14:50 Distance Threshold: 1.0m ==> 0:15:00 Distance Threshold: 1.1m ==> 0:15:04 Distance Threshold: 1.2m ==> 0:15:04 Distance Threshold: 1.3m ==> 0:15:10 Distance Threshold: 1.4m ==> 0:15:12 Distance Threshold: 1.5m ==> 0:15:18 Distance Threshold: 1.6m ==> 0:15:18 Distance Threshold: 1.7m ==> 0:15:20 Distance Threshold: 1.8m ==> 0:15:20 Distance Threshold: 1.9m ==> 0:15:25 Distance Threshold: 2.0m ==> 0:15:29
Code language: PHP (php)

It looks like RWGPS is using a threshold of 0.9 m/s instantaneous speed to filter out stoppage time.

Average Speed and Moving Average

Let’s apply the same threshold and calculate Moving Time and Average Moving Speed.

We’ll start by making a new dataframe df_moving with only the records above the threshold. This simplifies the arguments we hand to our functions, but it also takes up more memory, May not be the best way to handle a very large dataset.

df.fillna(0, inplace=True) # fill in the NaN's in the first row of distances and deltas with 0. They were breaking the overall average speed calculation df_moving = df[df['inst_mps'] >= 0.9] # make a new dataframe filtered to only records where instantaneous speed was greater than 0.9m/s avg_mps = (sum((df['inst_mps'] * df['delta_time'])) / sum(df['delta_time'])) avg_mov_mps = (sum((df_moving['inst_mps'] * df_moving['delta_time'])) / sum(df_moving['delta_time'])) print(f"Maximum Speed: {round((2.23694 * df['inst_mps'].max(axis=0)), 2)} mph") print(f"Average Speed: {round((2.23694 * avg_mps), 2)} mph") print(f"Average Moving Speed: {round((2.23694 * avg_mov_mps), 2)} mph") print(f"Moving Time: {str(datetime.timedelta(seconds=sum(df_moving['delta_time'])))}")
Code language: PHP (php)
Maximum Speed: 26.06 mph Average Speed: 13.99 mph Average Moving Speed: 15.66 mph Moving Time: 2:03:05
Code language: CSS (css)

The average speed over the entire ride was 13.99mph. Filtering down to only “moving” records brings the average speed up to 15.66mph, much better, and it matches with RWGPS’s metrics.

Smoother Plots

When we plot speed vs time, we’re seeing instantaneous speed every 1 to 4 seconds (mostly). That’s a lot of points jumping up and down. The figure isn’t useful to look at.

We could down-sample and display fewer points, but that could completely cut out highs and lows.

A more elegant way that still preserves the character of the data is a moving average. In this way, we generate a new column that contains the average value of all points a given count before and after.

Pandas has a function called .rolling() that does exactly that.

df['avg_mps_roll'] = df['inst_mps'].rolling(20, center=True).mean() fig_16 = px.line(df, x='time', y=['inst_mps', 'avg_mps_roll'], template='plotly_dark') # as of 2020-05-26 Plotly 4.8 you can pass a list of columns to either x or y and plotly will figure it out fig_16.show()
Code language: PHP (php)
Figure 9. Static png of plot of time vs instantaneous speed and rolling average speed using plotly express.

Elevation Gain

Another metric we still need is the total elevation gain. The author of the original article wrote a custom function to return only positive inputs, then used map to apply it to df['delta_elev'] and generate a temporary list of positive values.

def positive_only(x): if x > 0: return x else: return 0 pos_only = list(map(positive_only, df['delta_elev'])) print(f"Total Elevation Gain: {round(sum(pos_only), 2)}")
Code language: PHP (php)
Total Elevation Gain: 381.2
Code language: CSS (css)

That method worked, but I think it can be simplified, or at least compressed with lambda and filter.

elev_gain = sum(list(filter(lambda x : x > 0, df['delta_elev']))) print(f"Total Elevation Gain: {round(elev_gain, 2)}")
Code language: PHP (php)
Total Elevation Gain: 381.2
Code language: CSS (css)

filter is like map but as the name implies, it is a filter on the series you apply it to. If the function evaluates True, it passes the value from the series, if not, the value is skipped.

References:

But is there a pure-Pandas way to do it? Can we simplify further?

print(f"Elevation Gain: {round(sum(df[df['delta_elev'] > 0]['delta_elev']), 2)}") print(f"Elevation Loss: {round(sum(df[df['delta_elev'] < 0]['delta_elev']), 2)}")
Code language: PHP (php)
Elevation Gain: 381.2 Elevation Loss: -381.4
Code language: CSS (css)

There is, actually, we used it earlier and it’s pretty slick. It’s called boolean indexing. 

df['delta_elev'] > 0 returns the row indices for every value in delta_elev that is positive.

We used those indices in the next level up to select the values we want to sum, e.g. df[idx]['delta_elev'], and sum them together.

I wrapped it in round() to clean up the output.

References:

Wrapping Up

We’ve been able to replicate most of the metrics that sites like Strava, Garmin, and RideWithGPS present for activities.

  • Distance
  • Elevation Gain/Loss
  • Total Time
  • Stopped Time
  • Moving Time
  • Max Speed
  • Average Speed
  • Moving Average Speed

Average Pace and Moving pace are transformations of Average Speed and Moving Speed. Max Grade and Average Grade are simple calculations on delta_elev and segment length, as are Ascent Time and Descent Time. VAM and Calories will take some further research to understand.

Next Steps

In the long term, we’ll analyze a large number of .gpx files. There’s a tongue-in-cheek axiom in cycling that goes “It doesn’t get easier, you just go faster.” Seems like an easy one to test since I’ve got 5 years of cycling records to work with.

The challenge will be handling that volume of points efficiently. At the least, I don’t want to read-in hundreds of .gpx over and over while work on the code. Some kind of external storage like JSON or CSV would be a simple solution, but an external database like PostGIS (built on PostgreSQL) might be better, and I’d learn some database handling finally. As an added bonus, it can be read by QGIS.

Once all the .gpx are usefully gathered and today’s code has been collapsed and made more efficient, we’ll take a crack at the ever satisfying heatmap, probably with folio and Mapbox.

Chicken Waterer

Anyone who has kept chickens knows that it’s a daily battle to keep their water clean and fresh. Our favorite waterer, the double-walled galvanized metal style, needed to be cleaned out almost daily, and with North Texas summer temps often hitting the 110s, we were struggling to keep their water cool for more than a few hours at a time. Since we can’t come home and refresh their water every few hours all summer, I had to come up with something.

The concept is pretty simple. Put an insulated water jug in a shaded area of the coop (they never roost in the hen-house, so I may as well re-purpose the space) to supply nipple or cup waterers. The difficult part, much more difficult than I ever expected, was figuring out how to connect a hose to the water container. The spigot isn’t designed for this, and replacement options only offered barbed connections meant to be crammed into pvc hose, but after 4 hours of wandering around the hardware store plumbing aisles, I’m happy with the result.

Tools:

  • Tape measure
  • Level
  • Drill
  • 3/8″ drill bit
  • Driver (optional)
  • Teflon tape for pipe threads
  • Adhesive for PVC pipe and fittings
  • Saw (for cutting PVC pipe, I use a hack saw because the fine teeth make a smooth cut)
  • Workbench & Clamps (will make life easier)
  • A bit of sandpaper ~180 grit
    • to de-burr and clean up cut edges of pvc.

Supplies:

  • For the Reservoir
    • Water container (Igloo 5 Gallon Drinking Water Cooler)
    • Coupling – Brass – 3/8″ FIP (e.g. Everbilt #LFA-760)
      • used as a nut
    • Nipple – Brass – 3/8″ MIP x 1 1/2″ (e.g. Everbilt #LFA-786)
      • could probably get away with just 1″ length
    • Coupling – Brass – 1/2″ FIP x 3/8″ FIP (e.g. Everbilt #LFA-815)
    • Rubber Grommet (e.g. Jandorf #61508)
      • 1 1/8″ OD x 5/8″ ID x 3/8″ Thick x 7/8″ Groove Diameter x 1/8″ Groove Width
    • Hose Bibb – Brass – 1/2″ 1/4-turn MPT x MHT with 3/4″ male threaded outlet that works with water hose connections (e.g. Everbilt #VHBQTCC3EB)
    • 2x standard rubber washers for water hose fittings
    • 3x Metric M16 stainless steel washers
      • these slide perfectly over the 3/8″ brass nipple
  • For the Drinking Cups/Nipples
    • 3/4″ PVC pipe, length depends on your desired setup.
    • 3/4″ x 3/4″ x 3/4″ Adapter PVC Fitting (e.g. Genova #75623)
    • 3/4″ x 3/4″ x 3/4″ Street Elbow PVC Fitting (e.g. Lasco #412007RMC)
    • 3/4″ x 3/4″ Cap PVC Fitting (e.g. Lasco #448007RMC)
    • A few 3/4″ conduit straps (e.g. Sigma Electric ProConnex #42821)
    • Water hose, length depends on your desired setup.
      • Needs 3/4″ female threaded fitting on one end, and 3/4″ male threaded fitting on the other.
    • 3/4″ threaded double-female water hose adapter fitting (e.g. B&K #GH-662B)
    • A few 3/4″+ exterior wood screws.
    • Poultry Watering Cups or Nipples (e.g. Harris Farms #1000304)

Assembling the new spigot:

Remove push-button spigot from the water container. There’s only the one plastic nut on the inside. Then fit the rubber grommet into the hole.


The new spigot will be assembled as shown, with the 3/8″ nipple passing through the rubber grommet.

Thread the 3/8″ nipple into the 3/8″ coupling until snug, then slide on an M16 washer and rubber washer. The coupling is just serving as a nut since I couldn’t find a pipe-threaded nut.

Push the remaining 3/8″ nipple through the rubber grommet from the inside of the container. A spray silicone lubricant or similar will make this much easier, but be sure whatever you use use is food-safe and will not degrade the rubber.


Slide on another rubber washer and two M16 washers (one isn’t thick enough with the 1 1/2″ nipple).

Thread on the 3/8″ FIP x 1/2″ FIP coupler and 1/2″ hose bib. Use teflon tape to ensure a good seal.

Once assembled, give it a leak check.

And that’s it for the reservoir.

Assembling and installing the waterer:

Before you get started, take a moment to think about where you want to put the fountain-end of the waterer.

For watering cups like I’ve used, consider:

  • They need to be mounted high enough to avoid debris kicked up by dust-bathing birds.
  • …low enough for your smallest bird to access them easily
  • …in a place where birds won’t be moving around or roosting above them. Anything under a chicken will get pooped on.

The mounting location you pick will determine the lengths of PVC pipe and water hose you need.

Here, I picked a location along the back wall, protected from droppings by our coop’s “mezzanine”. This didn’t work out, but for a different reason that I’ll come to later.


Cut your 3/4″ PVC pipe to length.

Mark a line down it’s length using a straight-edge. You’ll drill holes for the cups along this line so that they are at the same angle once the pipe is mounted.

Mark points along the length of the pipe where you want cups. Then drill out the holes with the 3/8″ bit.

Now, carefully thread the water cups into these holes. Once again using teflon tape to ensure a good seal.


Clamp the pipe to your workbench so that the cups are level.

Add the 3/4″ x 3/4″ x 3/4″ Adapter PVC Fitting to the end that is towards where you plan to place the water cooler. Thread the double-female water hose adapter onto this end.

Add the 3/4″ x 3/4″ x 3/4″ Street Elbow PVC Fitting to the end opposite the water cooler, then thread on the 3/4″ x 3/4″ Cap PVC Fitting. This end is your clean-out, so you can flush out any debris that happens to make it in here. You could use another straight fitting like the other end, but I went with the bend so that it could also be loosened to vent air bubbles from the system. Not really necessary.

I recommend a dry-fit first, make sure everything works out like you imagined fully installed, then take it apart to cement the fittings permanently.


Use clamps to support the waterer while you get it leveled, then mount it with the conduit straps. If you tighten the bottom screw fully, you can loosen the top one to rotate the pipe and adjust the cup position. Just be sure to re-tighten it after.


Place your cooler in the coop. Since my girls have never used the hen-house to roost at night (they’ve always preferred to roost in the run) I opted to store the water in here where it’s shaded.

Connect the cooler to the waterer with the short length of water hose. I used a ready-made 3ft extension, but fittings are available make a custom length. The double-female adapter makes it easy to install and remove the hose while the waterer is mounted.

That’s it!

Chickens may not be brilliant, but ours learned to use the water cups pretty quickly once we introduced them to it. What we think worked best was to fill the cup (hit the float with your finger), then take each of the girls over and dip their beak into it. We then repeated this with the cup empty and made their beak hit the float. They don’t really have to learn to hit the float to fill the cup themselves, they’ll bump it naturally when drinking. Check it out!

Interested in that small door in the last pic above? I’ll explain in another post, linked here when it’s published. Probably another year from now.


Follow-up & Changes

First, pretend that I published the guide above soon after the day I built it, instead of a year later. Then, pretend that I came back with an update and some changes just a few weeks after, and that I’m not actually writing all of this at once. Got it? Great.

Two things came up that forced me to change things up at bit.

First, the water is still too hot.

In the post above, I suggested three things to consider when deciding where to place your waterer. Well, I overlooked something almost hilarious given the reason for this whole project. It’s real damn hot in the summer, even in the shade, and I had placed the waterer such that the pipe and hose were both in full afternoon sun.

I moved the pipe to the other side of the coop where it would be shaded most of the day, but as it turns out, water hose and pvc will still soak up a lot of ambient heat. The cooler worked great, but the girls just didn’t go through the water fast enough to keep cool water flowing to the cups.

So, I cut the hose short and made a new waterer with the goal of having as little water as possible outside the cooler at any time.

In addition to the tools and materials for the original project, you will need:

  • Sharp knife
  • Screwdriver
  • 1/8-27 NPT pipe tap (e.g. Irwin Hanson #359001)
  • T-Handle Tap Wrench (e.g. Irwin Hanson TR-2E #12002)
  • 5/8″ x 3/4″ metal hose mender (male or female) (e.g. Yardsmith #8811 or #8812)
  • Short piece of scrap 3/4″ PVC pipe
  • 3/4″ slip-on PVC cap (e.g. Lasco #447007RMC)

To shorten the hose, just cut it and add a clamp-on fitting.

For the waterer, I used a short 3-4″ section of 3/4″ PCV pipe with a standard slip-on cap glued on one-end and another 3/4″ x 3/4″ x 3/4″ adapter at the other. The cups are actually threaded into the cap and pipe. Which brings me to the second thing.

Second, a few drips a minute will empty a 5 gallon jug while you’re at work.

Nearly every article, forum, post, Youtube video, and Amazon review about these water cups will tell you not to worry about tapping the PVC with the correct threads. They’re wrong. I’ll skip all the ways I tried and just tell you to either buy or borrow the tap and do it right the first time.

The Harris Farms Poultry Cups Model #1000304 use a 1/8-27 NPT thread. The NPT designates National Pipe Thread, tapered. The threads are ever so slightly tapered so that as they come together, they form a tight seal.

I had trouble finding the correct tap at my local hardware stores, both big box and independent, but they’re easy to find online and cost less than $10. Don’t forget a T-Handle Tap Wrench too, also about $10.

The tap calls for a 21/64″ bit, but a 5/16″ (slightly smaller) will do fine since we’re tapping relatively soft plastic.

Drill the holes, then twist in the tap until it has cut threads through the entire hole. Unscrew the tap and then simply screw on the cups, with a little teflon tape. Since doing this properly mine haven’t leaked a drop.

A Plan: Bike-camping at 4R Ranch Vineyards & Winery

The campground at 4R Ranch, from when we camped ahead of the 2018 Spinistry Red River Riot.

Back in spring of 2018 (perhaps earlier?), 4R Ranch introduced a small campground a short way inside their property. It’s a simple setup, open field, some trees, and very pretty, tucked away in the other Texas hill country you may not have known about. Amenities include running water, fire pits, showers, and restrooms. (and a short walk to the tasting room…)

Riders of the 2018 Spinistry Red River Riot were invited to camp before and after to save having to drive in the day of the race, and it was absolutely worthwhile. Not having to drive was nice, but whereas most events you arrive in a flurry of activity, and only talk to the riders passing you throughout the day (just me?), the campground promoted an entirely different and much more fulfilling pre-ride experience. Every big ride should start like this.

A few weeks ago Walt added their humble campground to Hipcamp (think AirBnB for private campgrounds) just as I was searching for somewhere to go over Memorial Day weekend after all the state parks were booked-up. Perfect!

The plan, or A plan, such as it is, is to ride from Denton and pick up local prairie-cycling wizard @Pondero from his home near Bolivar (many thanks to him for the majority of the route planning) then carry on winding through the back-roads to Muenster. After a quick stop for lunch, we’ll make the final jaunt to camp and that’s where the plan stops, at least until we ride back the next day.

Things to know:
– If you plan to use the A-Train to get to Denton, don’t forget that it doesn’t run on Sundays.