Cyclistic bikeshare data analysis
1.0 Introduction
Developing strategies is a crucial element in maximizing a company's profit potential. Cyclistic, a bike-sharing company located in the heart of Chicago, has recorded over 4.5 million rides in the past 12 months. The company believes that a slight shift in their marketing strategy can enhance their profit margins. Cyclistic offers a variety of ride packages, including single-ride, full-day passes, and annual memberships, to cater to the diverse needs of its customers. In this analysis, single ride and full-day passes are categorized as casual riders while annual membership riders are categorized as members.
Despite its growing customer base, Cyclistic has identified an opportunity to further optimize profitability. This analysis, conducted by the marketing team's data analyst, aims to explore the differences between annual members and casual riders. The insights gained from this analysis will inform the development of strategies to convert current casual riders into annual members.
2.0 About The data
The data I will be looking at is provided by a bike-sharing company in Chicago called 'Divvy Bikes.' For the purpose of this analysis, we will use a fictional name, 'Cyclistic,' for the bike-sharing platform.
The data contains trip data from 2014 to 2023, however, for this analysis, we will only be using data from the last twelve months, May 2022 to April 2023. There are separate data for each month, and each monthly data contains the following columns.
- "ride_id" contains distinct ID for each row. This column is in string format.
- "rideable_type" contains the different categories of bikes: classic bike, electric bike and docked bike. This column is in string.
- "started_at": Contains the date and time each ride starts. The column is in string format.
- "ended_at": Containing the date and time each ride ends. The column is in string format.
- "start_station_name" Contains the exact name of the station where the rider starts riding from. The column is in string format
- "start_station_id" contains the ID of each start station. The column is in string format.
- "end_station_name" Contains the exact name of the station where the rider ends his ride. The column is in string format.
- "end_station_id" contains the ID of each station where the rider ends the trip. The column is in string format.
- "start_lat" `contains the latitude of the location where the trip starts. The column is in string format.
- "start_lng" contains the longitude of the location where the trip starts. The column is in string format.
- "end_lat" contains the latitude of the location where the trip ends. The column is in string format.
- "end_lng" contains the longitude of the location where the trip ends. The column is in string format.
- "member_casual" denotes if the rider is a casual rider or an annual member. The column is in string format.
Data integrity was maintained as the data was obtained from a reputable bike-sharing company in Chicago. The data included a few null values in the start_station_name, end_station_name, start_station_id and end_station_id columns. However, these values were inconsequential as they were not used in our analysis. The data is unbiased as it records every ride within this period. The data is also recent as we have all the rides till the date of the analysis.
3.0 Data cleaning
All the columns were in string formats and were in quotation marks, I started by removing the quotation marks from all the columns that had them. I joined all the data from the twelve tables of the twelve months together using the Union statement on SQL. Next, I created new columns to convert the strings from the started_at and ended_at columns to DateTime format.
I created a column for trip duration, to get this, I found the difference in HH:MM:SS between the ended_at cells and their corresponding started-at cells. II then used data from the started_at column to create new columns for month, season and day of the trip. Lastly, I converted the string of the start_lat and end_lat columns to Longitude data and converted that of start_lng and end_lng data to longitude.
The SQL code for this data-cleaning is in the link below: datacleaninglink
3.0 Exploratory data analysis
I compared the number of rides per month and per week for both members and casuals and found out that the number of members and casual rides peaked in the summer months. However, membership rides peaked during weekdays while casual rides peaked during weekends.
I also compared the average duration of rides for casuals and members per month, day of the week and seasons.
The exploratory data analysis is in the link: exploratorydataanalysis
3.1 Data Visualisation
3.1.1 Monthly Trends
A line chart comparing the number of rides for both member types across different months (Fig 1) demonstrates a consistent trend. Both annual members and casual riders experience a gradual increase in ride counts from month to month, peaking during the summer months and declining during the winter months. However, a closer examination reveals a noteworthy disparity in ride volumes between winter and summer months. For instance, in February (winter), casual members accounted for only 23% of the total riders, whereas in August (summer), the percentage of casual rides surged to approximately 48%. This shows that the percentage of casual customers increases during the summer months.
Fig 1: Line graph for the number of rides per month for both casuals and members
3.1.2 Weekly Trends
Another line chart comparing the number of rides for both annual members and casual riders (Fig 2) across weekdays reveals that Casual riders show a peak in ride activity from Friday to Sunday, with a noticeable drop on weekdays. In contrast, annual members exhibit reduced ride activity during the weekends, with a significant increase on weekdays. This pattern suggests that annual members primarily use Cyclistic services for commuting on workdays, while casual riders tend to utilize the service more frequently on weekends, probably for leisure.
Fig 2: Line graph for the number of rides per weekday for both casuals and members
3.1.3 Starting location comparison
A geographical analysis of the longitude and latitude coordinates of trip starting locations (Fig 3)reveals intriguing insights. Annual members are evenly distributed around the city of Chicago, indicating a wide range of starting points for their rides. In contrast, casual members are also located throughout the city, but a significant concentration of rides occurs on the right-hand side of the map. A closer look reveals that the latitude and longitude coordinates of these clustered locations correspond to popular parks like Millennium Park and Grant Park. This buttresses our earlier insight, that annual members use Cyclistic mostly for work, while casual riders use Cyclistic mostly for recreation.
Fig 3: starting location of rides
3.1.4 Trip Duration Comparison
To gain more insights into rider behavior, we conducted an analysis comparing the average duration of single trips between casual riders and annual membership riders. Surprisingly, despite the higher count of annual membership riders, casual riders, on average, spend more time on a single trip than their annual membership counterparts. This might mean that because casual users use the service for leisure, they spend more time on each ride.
4.0 Conclusion
In conclusion, our findings suggest that casual riders predominantly utilize Cyclistic's services for leisure, while annual members primarily use Cyclistic for their daily work commute. It is worth noting that some individuals work on weekends, which warrants further investigation into demographic data to validate this hypothesis. Additionally, gathering qualitative data on user motivations for choosing Cyclistic bike-sharing is essential before arriving at any definitive conclusions.
Considering the insights from this analysis, it may not be the most effective strategy to convert casual riders into annual members, as both customer categories have distinct motivations for using Cyclistic's services.
Comments
Post a Comment