Ashley Zacharias
Data Analyst
Cyclistic Bike-Sharing Co.
This project analyzes differences in bike usage between casual riders and annual members of Cyclistic, a fictional bike-share company in Chicago, Illinois.
Summary
Project Background
For the Google Data Analytics Capstone Project, I performed many of the real-world tasks of a junior data analyst on the marketing team of a fictional bike-share company called Cyclistic. Cyclistic is located in Chicago, Illinois, and offers more than 5,800 bicycles and 600 docking stations. Cyclistic also offers reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike.
Cyclistic launched its successful bike-share offering in 2016. Until now, Cyclistic’s marketing strategy has relied on building general awareness and appealing to broad consumer segments. One approach that helped make this possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
The purpose of this case study is to analyze how the company can maximize the number of annual memberships with a new marketing strategy. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, the director of marketing, Lily Moreno, believes that maximizing the number of annual members will be key to future growth.
To answer the key business questions, I followed the six phases of data analysis (Ask, Prepare, Process, Analyze, Share, Act) as defined in the course.
The Ask Phase
Stakeholder Question
Lily Moreno, the Director of Marketing and my manager in this scenario, has set a clear goal for our data analytics team: design marketing strategies aimed at converting casual riders into annual members. To do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. We are interested in analyzing Cyclistic’s historical bike trip data to identify these trends.
The business task includes the following:
- How do annual members and casual riders use Cyclistic bikes differently?
- Why would casual riders buy Cyclistic annual memberships?
- How can Cyclistic use digital media to influence casual riders to become members?
My role is to focus on the first question: how annual members and casual riders use Cyclistic bikes differently.
The Prepare Phase
Data Integrity
Dataset Information
The dataset used for this case study can be found in “Divvy Tripdata” and is made available by Motivate International Inc. under their license. It is a public, open-source dataset that stands in for the fictional Cyclistic’s data in this case study.
There is a CSV file for every month since January 2020, and the data is in wide format. For this study, I used the past 12 months of data: the files from October 2021 to September 2022.
All files were securely downloaded and stored with clear file names in the format YYYYMM-divvy-tripdata.
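Because the names follow a fixed pattern, the list of files to download or load can be generated rather than typed. The sketch below is in Python rather than the R the author used, and the helper name `monthly_file_names` is hypothetical; it builds the 12 names from October 2021 through September 2022 (without the `.csv` extension).

```python
from datetime import date


def monthly_file_names(start: date, months: int) -> list[str]:
    """Return YYYYMM-divvy-tripdata names for `months` consecutive months from `start`."""
    names = []
    year, month = start.year, start.month
    for _ in range(months):
        names.append(f"{year:04d}{month:02d}-divvy-tripdata")
        # Step forward one calendar month, rolling over to January of the next year.
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return names


files = monthly_file_names(date(2021, 10, 1), 12)
print(files[0], files[-1])  # 202110-divvy-tripdata 202209-divvy-tripdata
```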
The data contains 13 fields, including ride IDs, dates, membership types, and the geographic locations of start and end stations. The number of records ranged from about 103,000 to 824,000 across the 12 files.
Data Limitations
The data is current and relevant, as it runs through the most recent month. It was collected directly by the bike-share company, and its customer data is privacy-protected. Data-privacy restrictions prohibit us from using riders’ personally identifiable information. This means that we can’t connect pass purchases to credit card numbers to determine whether casual riders live in the Cyclistic service area or whether they have purchased multiple single passes.
The Process Phase
Data Cleaning and Preparation
This phase incorporates the steps to clean the data and prepare it for effective analysis. It is important to keep the business task and questions in mind here in order to ensure the data needed is readily available for later use in the analysis.
I first used Microsoft Excel to explore and clean the data. Some of the steps I took were:
- I verified that all heading names were meaningful and clear.
- I viewed the data and, in each month’s file, stored a copy to use for cleaning.
- I removed rows with invalid data, and I checked and adjusted the data type of each column.
- I removed the start and end station ID and name columns, as too many of these were blank, inconsistent, or invalid. Because we don’t have access to information about where riders live relative to Cyclistic’s service areas, I also removed the station latitude and longitude columns.
- I created new columns using the LEFT() and RIGHT() functions to split the date and time in the started_at and ended_at columns into start_date, end_date, start_time, and end_time. This was necessary for the next step.
- I calculated ride_length by subtracting start_time from end_time. Because some rides crossed midnight into the next day, I wrote an =IF formula to calculate the elapsed time correctly in those cases.
- I copied the new cells and pasted them as values so that I could remove the started_at and ended_at columns without breaking the formulas that referenced them.
- Finally, I added a day_of_week column using the =WEEKDAY function, with 1 = Sunday and 7 = Saturday.
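The two trickiest spreadsheet steps, the midnight-crossing ride_length and the WEEKDAY numbering, can be sketched in Python as follows. This is an illustration, not the author’s workbook: the exact =IF formula isn’t shown, so the sketch assumes the common approach of adding one day when end_time is earlier than start_time. The function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def ride_length(start: time, end: time) -> timedelta:
    """Elapsed time between two clock times, assuming no ride lasts 24 hours or more."""
    anchor = date(2000, 1, 1)  # arbitrary day, used only to subtract clock times
    delta = datetime.combine(anchor, end) - datetime.combine(anchor, start)
    # An end time earlier than the start time means the ride crossed midnight,
    # so add one day (the spreadsheet equivalent of adding 1).
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


def excel_weekday(d: date) -> int:
    """Mimic Excel WEEKDAY(d) with its default numbering: 1 = Sunday ... 7 = Saturday."""
    # isoweekday() is 1 = Monday ... 7 = Sunday; shift it so Sunday becomes 1.
    return d.isoweekday() % 7 + 1


print(ride_length(time(23, 50), time(0, 10)))  # 0:20:00
print(excel_weekday(date(2022, 9, 4)))         # 1 (a Sunday)
```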
Because the files were too large to work with comfortably in Microsoft Excel or Google Sheets, I then loaded them into RStudio to finish cleaning. Some of the steps I took were:
- I loaded the necessary R packages, including:
  - ‘tidyverse’ (a meta-package for ggplot2, readr, tidyr, dplyr, etc.),
  - ‘lubridate’ and ‘chron’ (used for dates), and
  - ‘data.table’ (to merge files).
- I loaded all of the individual monthly files and combined them into one data frame.
- I previewed the rows and checked the data type of each column to make sure they made sense.
- I removed duplicates within each month, rows with N/A values, and rows whose ride_length was zero or negative (an end time at or before the start time).
- I added month, day, time, and season columns to help me find the most frequent riding patterns.
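The R cleaning steps above can be sketched in plain Python. This is not the author’s RMarkdown code: the sample rows are made up, the column names (ride_id, member_casual, started_at, ended_at) are assumed to follow the Divvy files, duplicates are detected by ride_id, and the season mapping uses meteorological seasons (December to February = winter), which may differ from the author’s.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format assumed for started_at / ended_at


def season_for(month: int) -> str:
    """Meteorological seasons; an assumption, since the author's mapping isn't stated."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


def clean(monthly_files: list[list[dict]]) -> list[dict]:
    """Combine monthly row lists, drop bad rows, and add analysis columns."""
    seen: set[str] = set()
    cleaned: list[dict] = []
    for rows in monthly_files:  # combine every month into one table
        for row in rows:
            if row["ride_id"] in seen:  # duplicate ride: keep the first copy only
                continue
            if any(value in (None, "") for value in row.values()):  # N/A or blank
                continue
            start = datetime.strptime(row["started_at"], FMT)
            end = datetime.strptime(row["ended_at"], FMT)
            minutes = (end - start).total_seconds() / 60
            if minutes <= 0:  # end at or before start: not a valid ride
                continue
            seen.add(row["ride_id"])
            cleaned.append({
                **row,
                "ride_length_min": minutes,
                "month": start.month,
                "day_of_week": start.strftime("%A"),
                "hour": start.hour,
                "season": season_for(start.month),
            })
    return cleaned


# Made-up rows standing in for two monthly files.
oct_2021 = [
    {"ride_id": "A1", "member_casual": "member",
     "started_at": "2021-10-02 08:00:00", "ended_at": "2021-10-02 08:12:00"},
    {"ride_id": "A1", "member_casual": "member",  # exact duplicate
     "started_at": "2021-10-02 08:00:00", "ended_at": "2021-10-02 08:12:00"},
    {"ride_id": "A2", "member_casual": "",  # missing rider type
     "started_at": "2021-10-03 09:00:00", "ended_at": "2021-10-03 09:05:00"},
]
jul_2022 = [
    {"ride_id": "B1", "member_casual": "casual",  # ends before it starts
     "started_at": "2022-07-09 17:00:00", "ended_at": "2022-07-09 16:59:00"},
    {"ride_id": "B2", "member_casual": "casual",
     "started_at": "2022-07-09 17:00:00", "ended_at": "2022-07-09 17:25:00"},
]

rides = clean([oct_2021, jul_2022])
print([(r["ride_id"], r["ride_length_min"], r["season"]) for r in rides])
# [('A1', 12.0, 'fall'), ('B2', 25.0, 'summer')]
```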
Details of the steps above, along with the code I used, can be found in my RMarkdown file.
The Analyze Phase
Aggregating Data
With the data cleaned and organized, I can run calculations to better understand how casual riders and members use Cyclistic, which is the business task I was given.
- I computed ride_length and converted it to minutes.
- I also calculated summary statistics, including the minimum, maximum, and mean ride times.
  - For casual riders, the maximum ride time was 1,409 minutes (about 23.5 hours) and the minimum was 1 minute. The average ride was 10 minutes.
  - For members, the maximum ride time was 1,166 minutes (about 19.4 hours) and the minimum was 1 minute. The average ride was 8 minutes.
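A minimal Python sketch of the per-rider-type summary, assuming each cleaned ride is a dict with hypothetical member_casual and ride_length_min keys. The author computed these in R; the toy values below are illustrative only and are not Cyclistic results.

```python
from collections import defaultdict
from statistics import mean


def ride_stats(rides: list[dict]) -> dict[str, dict[str, float]]:
    """Min, max, and mean ride length (minutes) for each rider type."""
    lengths = defaultdict(list)
    for ride in rides:
        lengths[ride["member_casual"]].append(ride["ride_length_min"])
    return {
        rider_type: {"min": min(values), "max": max(values), "mean": mean(values)}
        for rider_type, values in lengths.items()
    }


# Toy values for illustration only.
sample = [
    {"member_casual": "casual", "ride_length_min": 2.0},
    {"member_casual": "casual", "ride_length_min": 30.0},
    {"member_casual": "casual", "ride_length_min": 16.0},
    {"member_casual": "member", "ride_length_min": 3.0},
    {"member_casual": "member", "ride_length_min": 5.0},
    {"member_casual": "member", "ride_length_min": 10.0},
]
stats = ride_stats(sample)
print(stats["casual"])  # {'min': 2.0, 'max': 30.0, 'mean': 16.0}
```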
- I compared the number of rides taken by casual riders and by members, and found the total number of rides.
  - Casual riders took just over 2 million rides and members took about 3.5 million. The total is about 5.8 million rides.
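Each row in the trip files is one ride, keyed by ride_id, so counting rows by rider type counts rides rather than distinct people. A Python sketch of the count and percentage split follows; the author did this in R, and the helper name and five-ride toy data are hypothetical.

```python
from collections import Counter


def ride_split(rides) -> tuple[Counter, int, dict[str, float]]:
    """Count rides per rider type and each type's percentage of all rides."""
    counts = Counter(ride["member_casual"] for ride in rides)
    total = sum(counts.values())
    shares = {rider_type: round(100 * n / total, 1) for rider_type, n in counts.items()}
    return counts, total, shares


# Toy data: five rides, not the real dataset.
toy = [{"member_casual": t} for t in ["casual", "member", "member", "casual", "member"]]
counts, total, shares = ride_split(toy)
print(total, shares)  # 5 {'casual': 40.0, 'member': 60.0}
```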
- I created a tibble (the tidyverse’s version of a data frame) to look at how many rides started at each time of day, grouped by rider type.
  - The results were similar for both rider types: the most popular time for both casual riders and members was between 3:00 and 7:00 p.m.
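One way to find the busiest hours per rider type is to tally ride start hours for each group. This Python sketch (not the author’s R tibble) assumes an hour field holding the start hour (0 to 23) and treats 3:00 to 7:00 p.m. as start hours 15 through 18; the helper names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def rides_by_hour(rides) -> dict[str, Counter]:
    """Tally ride start hours (0-23) separately for each rider type."""
    table: dict[str, Counter] = defaultdict(Counter)
    for ride in rides:
        table[ride["member_casual"]][ride["hour"]] += 1
    return table


def share_in_window(hours: Counter, first: int, last_exclusive: int) -> float:
    """Fraction of rides whose start hour falls in [first, last_exclusive)."""
    in_window = sum(n for hour, n in hours.items() if first <= hour < last_exclusive)
    return in_window / sum(hours.values())


toy = [{"member_casual": "casual", "hour": h} for h in (17, 17, 18, 9)] + [
    {"member_casual": "member", "hour": h} for h in (8, 17, 17, 17, 12)
]
table = rides_by_hour(toy)
for rider_type, hours in table.items():
    busiest = hours.most_common(1)[0][0]
    print(rider_type, busiest, share_in_window(hours, 15, 19))
# casual 17 0.75
# member 17 0.6
```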
- I calculated the most frequent day, month, and time for riding for each rider type.
  - Casual riders most frequently used Cyclistic bikes on Saturday, while members preferred Tuesday.
  - Both rider types used Cyclistic bikes most often during the summer months, with casual riders peaking in July and members in August.
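The most frequent day and month are the modes of the day_of_week and month columns within each rider type. A Python sketch under the same assumed column names; the author used R, and the helper name and toy rows are hypothetical.

```python
from collections import Counter, defaultdict


def most_frequent(rides, field: str) -> dict[str, object]:
    """Most common value of `field` (e.g. day_of_week or month) for each rider type."""
    tallies = defaultdict(Counter)
    for ride in rides:
        tallies[ride["member_casual"]][ride[field]] += 1
    # most_common(1) returns [(value, count)]; ties go to the value seen first.
    return {rider_type: c.most_common(1)[0][0] for rider_type, c in tallies.items()}


toy = [
    {"member_casual": "casual", "day_of_week": "Saturday", "month": 7},
    {"member_casual": "casual", "day_of_week": "Saturday", "month": 7},
    {"member_casual": "casual", "day_of_week": "Sunday", "month": 8},
    {"member_casual": "member", "day_of_week": "Tuesday", "month": 8},
    {"member_casual": "member", "day_of_week": "Tuesday", "month": 8},
    {"member_casual": "member", "day_of_week": "Monday", "month": 7},
]
print(most_frequent(toy, "day_of_week"))  # {'casual': 'Saturday', 'member': 'Tuesday'}
print(most_frequent(toy, "month"))        # {'casual': 7, 'member': 8}
```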
- Finally, I compared which season (fall, winter, spring, summer) and time of day (morning, afternoon, evening, night) had the most and least frequent riding.
  - Consistent with the hourly results above, afternoon and evening were the most popular riding times for both rider types. Summer, followed by fall, was the most popular season for riding.
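The season and time-of-day comparison depends on how hours are binned. The author’s cut-offs aren’t given, so the bins in this Python sketch (morning 6:00 to noon, afternoon noon to 5:00 p.m., evening 5:00 to 9:00 p.m., night otherwise) are assumptions, as are the helper names and toy data.

```python
from collections import Counter


def time_of_day(hour: int) -> str:
    """Assumed bins: morning 6-11, afternoon 12-16, evening 17-20, night otherwise."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


def most_and_least(labels) -> tuple[str, str]:
    """Return the most and least frequent label."""
    ranked = Counter(labels).most_common()  # sorted from most to least frequent
    return ranked[0][0], ranked[-1][0]


hours = [13, 14, 15, 17, 18, 7, 9, 23]
seasons = ["summer"] * 4 + ["fall"] * 3 + ["spring"] * 2 + ["winter"]
print(most_and_least(time_of_day(h) for h in hours))  # ('afternoon', 'night')
print(most_and_least(seasons))                        # ('summer', 'winter')
```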
The Share Phase
Data Insights
I created a Tableau Story to present my findings to the Cyclistic team and my manager. It answers the first question of our business task by comparing how casual riders and annual members use Cyclistic, as a first step toward marketing strategies that convert casual riders into annual members.
The first area to focus on is how rides split between casual riders and annual members. Of about 5.8 million rides, 2.4 million (40%) were taken by casual riders and 3.5 million (60%) by annual members.
Looking at the types of bikes each type of rider uses also helps show which products users are interested in. Cyclistic offers electric and classic bikes, as well as docking stations that hold docked bikes. Casual riders account for all of the docked-bike rides, and they use electric bikes more than classic bikes. Annual members show the opposite pattern, using classic bikes more than electric bikes.
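A cross-tabulation of bike type by rider type shows this pattern directly. This Python sketch assumes Divvy’s rideable_type values (classic_bike, electric_bike, docked_bike) and uses made-up rows; the helper name is hypothetical, and the author’s actual chart was built in Tableau.

```python
from collections import Counter


def bike_type_table(rides) -> Counter:
    """Count rides for each (rider type, bike type) pair."""
    return Counter((ride["member_casual"], ride["rideable_type"]) for ride in rides)


toy = [
    ("casual", "docked_bike"), ("casual", "electric_bike"), ("casual", "electric_bike"),
    ("casual", "classic_bike"), ("member", "classic_bike"), ("member", "classic_bike"),
    ("member", "electric_bike"),
]
table = bike_type_table({"member_casual": m, "rideable_type": b} for m, b in toy)

for rider_type in ("casual", "member"):
    # A Counter returns 0 for pairs that never occur, e.g. members on docked bikes.
    row = {bike: table[(rider_type, bike)]
           for bike in ("classic_bike", "electric_bike", "docked_bike")}
    print(rider_type, row)
# casual {'classic_bike': 1, 'electric_bike': 2, 'docked_bike': 1}
# member {'classic_bike': 2, 'electric_bike': 1, 'docked_bike': 0}
```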
The summer months had the most bike usage for both types of riders. Many riders also used Cyclistic bikes in late spring and early fall. As Cyclistic is located in Chicago, this makes sense due to the warmer temperatures and sunnier weather during these times of the year.
Casual riders rode Cyclistic bikes most frequently, and for the longest periods of time, on weekends. Annual members, however, rode mostly during the week, particularly on Tuesdays, and their ride lengths stayed fairly consistent throughout the week. Based on these peak days, one might infer that casual riders ride mostly for leisure, while annual members use the bikes for commuting or other transportation.
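The weekday/weekend contrast can be checked with two small summaries: the mean ride length for each rider type and day, and the share of each rider type’s rides that fall on a weekend. This is a Python sketch, not the author’s R code; the helper names and toy rows are hypothetical, and day names are assumed to be full English names such as "Saturday".

```python
from collections import defaultdict
from statistics import mean

WEEKEND = {"Saturday", "Sunday"}


def mean_length_by_day(rides) -> dict[tuple[str, str], float]:
    """Mean ride length in minutes for each (rider type, day of week) pair."""
    groups = defaultdict(list)
    for ride in rides:
        groups[(ride["member_casual"], ride["day_of_week"])].append(ride["ride_length_min"])
    return {key: mean(values) for key, values in groups.items()}


def weekend_share(rides, rider_type: str) -> float:
    """Fraction of one rider type's rides that start on Saturday or Sunday."""
    days = [ride["day_of_week"] for ride in rides if ride["member_casual"] == rider_type]
    return sum(day in WEEKEND for day in days) / len(days)


toy = [
    {"member_casual": "casual", "day_of_week": "Saturday", "ride_length_min": 30.0},
    {"member_casual": "casual", "day_of_week": "Saturday", "ride_length_min": 20.0},
    {"member_casual": "casual", "day_of_week": "Tuesday", "ride_length_min": 10.0},
    {"member_casual": "member", "day_of_week": "Tuesday", "ride_length_min": 8.0},
    {"member_casual": "member", "day_of_week": "Tuesday", "ride_length_min": 10.0},
    {"member_casual": "member", "day_of_week": "Saturday", "ride_length_min": 9.0},
]
print(mean_length_by_day(toy)[("casual", "Saturday")])  # 25.0
print(weekend_share(toy, "casual"), weekend_share(toy, "member"))
```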
Late afternoon and early evening are the most common times of day to ride for both casual riders and annual members.
The Act Phase
Recommendations
Along with my team’s recommendations, I would include the following to support Cyclistic’s marketing strategies for converting casual riders into annual members:
- Offer and advertise annual membership deals before the peak riding period in late spring and early summer, persuading casual riders to buy annual memberships.
- Offer annual members discounts and priority service during the days and times when casual riders ride most, typically weekends and afternoons/evenings. Seeing these perks when they are most interested in riding may persuade casual riders to upgrade to a membership.
- Offer special discounts on certain bike types to annual members only, especially electric bikes, since casual riders preferred these.
- Consider raising weekend rental fees for casual riders, especially for docked bikes, to push them toward a membership upgrade.
To view the full Tableau Story Presentation, please click here.