MTA turnstile analysis. Do they actually gain or lose money?

"Welcome to the NYC MTA system"

Leonardo Ridwan


Purpose of this project

Living in New York and taking public transport, you've seen it all. What you might also have seen is some people paying to enter the MTA while other people jump the turnstiles. The purpose of this project proposal is to see whether the MTA is losing or gaining money by using turnstiles. I plan to use the public entry and exit data provided in the links below to calculate the difference between the number of people entering the MTA subway stations and the number of people leaving them.


Overview

What I did

My hypothesis for this project was that the MTA turnstile system earns more money than it loses. To test it, I needed MTA turnstile open data showing the number of people entering and leaving through the turnstiles at every station in New York. My method was to use pandas to target the relevant columns.


Data

Data used for project

https://catalog.data.gov/dataset/turnstile-usage-data-2020
A CSV file that contains the data needed for the project for the entire year of 2020. Files for other years are available, but each one is a couple of GB, so it takes a while for my laptop to process the data. Excel cuts off a lot of the data when it opens the file, so download the CSV and do not open it with Excel, because that alters the data. The other years could be used to make the graph prediction more precise, but my laptop already takes too long to process one year of data, and loading other years at the same time doesn't sound good for my laptop. Feel free to do it yourself.
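If the file is too large to load comfortably, one option is to read it in pieces and keep only the columns that matter. The sketch below is not what this project ran; it is a minimal illustration that assumes the columns are named Date, Entries and Exits and that any odd header only differs by surrounding whitespace. The function name `monthly_totals`, the chunk size and the sample rows are all made up for the example.

```python
import io
import pandas as pd

def monthly_totals(csv_source, chunksize=100_000):
    """Sum Entries and Exits per month while reading the file in pieces."""
    wanted = {"Date", "Entries", "Exits"}  # assumed column names
    partial_sums = []
    reader = pd.read_csv(
        csv_source,
        usecols=lambda name: name.strip() in wanted,  # skip unused columns
        chunksize=chunksize,                          # rows held in memory at once
    )
    for chunk in reader:
        chunk.columns = chunk.columns.str.strip()     # tolerate padded headers
        chunk["Month"] = pd.to_datetime(chunk["Date"]).dt.month
        partial_sums.append(chunk.groupby("Month")[["Entries", "Exits"]].sum())
    # The same month can show up in several chunks, so add the partial sums together.
    return pd.concat(partial_sums).groupby(level=0).sum()

# Tiny made-up sample standing in for the real file
sample_csv = io.StringIO(
    "Date,Entries,Exits\n"
    "01/01/2020,10,7\n"
    "01/02/2020,5,4\n"
    "02/01/2020,8,9\n"
    "01/03/2020,1,1\n"
)
totals = monthly_totals(sample_csv, chunksize=2)
print(totals)
```

With the real data you would pass the path to the downloaded CSV instead of `sample_csv`. A smaller chunk size uses less memory but takes more passes through the loop.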


Techniques Section

The techniques used are the ones we learned in class: using pandas to target specific columns and manipulate the entire data frame to do what I need, then using matplotlib to create a visual (attached below) that shows the results.
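As a small illustration of what targeting columns in pandas looks like, here is a sketch on made-up rows. The station names and numbers are hypothetical and only mimic the shape of the turnstile data.

```python
import pandas as pd

# Made-up rows that mimic the shape of the turnstile data
df = pd.DataFrame({
    "Station": ["A", "A", "B", "B"],
    "Date": ["01/15/2020", "02/15/2020", "01/15/2020", "02/15/2020"],
    "Entries": [120, 80, 200, 150],
    "Exits": [100, 90, 180, 140],
})

# Target a single column (returns a Series)
entries = df["Entries"]

# Target several columns at once (returns a DataFrame)
counts = df[["Entries", "Exits"]]

# Add a derived column, then group and sum only the numeric columns
df["Month"] = pd.to_datetime(df["Date"]).dt.month
monthly = df.groupby("Month")[["Entries", "Exits"]].sum()
print(monthly)
```

Selecting the numeric columns before calling `sum()` keeps text columns such as Station out of the aggregation.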


<Code>

import pandas as pd
import matplotlib.pyplot as plt

FARE = 2.75  # MTA fare price (in dollars) used in this project

df = pd.read_csv("2020.csv", header=0)  # open the 2020 CSV file saved on the computer
df.columns = [*df.columns[:-1], 'Exits']  # rename the last column of the CSV file to 'Exits'
df['Month'] = pd.to_datetime(df['Date']).dt.month  # convert the Date column with pandas to_datetime and keep the month
# group by month and sum only Entries and Exits, so text columns are not summed too;
# reset_index turns the month back into a regular column for plotting
df = df.groupby('Month')[['Entries', 'Exits']].sum().reset_index()
df['Total_Difference'] = (df['Entries'] - df['Exits']) * FARE  # entries minus exits, multiplied by the fare price
ax = plt.gca()  # get the current axes so all three lines are drawn on one graph
df.plot(kind='line', x='Month', y='Entries', color='green', ax=ax)  # line graph for entries
df.plot(kind='line', x='Month', y='Exits', color='red', ax=ax)  # line graph for exits
df.plot(kind='line', x='Month', y='Total_Difference', color='blue', ax=ax)  # line graph for the total difference
plt.show()  # display the graph
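One thing worth checking against the dataset's description is whether Entries and Exits are per-row counts or running counter readings for each turnstile. If they are running totals, summing them directly would add up the same riders many times, and the counts would first need to be turned into per-interval differences. The sketch below shows that idea on made-up readings. The `Turnstile` and `Timestamp` columns and every value are hypothetical, and dropping negative jumps is just one simple way to handle counter resets.

```python
import pandas as pd

# Made-up counter readings for two turnstiles (hypothetical IDs and values)
readings = pd.DataFrame({
    "Turnstile": ["T1", "T1", "T1", "T2", "T2", "T2"],
    "Timestamp": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 04:00", "2020-01-01 08:00",
        "2020-01-01 00:00", "2020-01-01 04:00", "2020-01-01 08:00",
    ]),
    "Entries": [1000, 1010, 1035, 500, 520, 560],
    "Exits": [900, 905, 925, 400, 430, 450],
})

readings = readings.sort_values(["Turnstile", "Timestamp"])
# diff() within each turnstile turns running totals into per-interval counts;
# the first reading of each turnstile has no predecessor and becomes NaN.
per_interval = readings.groupby("Turnstile")[["Entries", "Exits"]].diff()
# Counters can reset or run backwards; this sketch simply blanks out negative jumps.
per_interval = per_interval.where(per_interval >= 0)
totals = per_interval.sum()  # NaN values are skipped by default
print(totals)
```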


Image Results

The image above shows the number of entries and exits per month. As you can see in the image, entries surpass exits when totaled across stations. When you subtract exits from entries, you can see that the total difference is still a net positive number. This means that the NYC MTA system gained more money than it lost in 2020.
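The arithmetic behind the total difference is simple enough to show on its own. In this sketch the helper name `fare_difference` and the example totals are hypothetical and are not taken from the dataset. The 2.75 fare is the one used in this project.

```python
def fare_difference(entries, exits, fare=2.75):
    """Dollar value of (entries - exits) at a flat fare."""
    return (entries - exits) * fare

# Hypothetical totals, only to show the calculation
print(fare_difference(1_000, 900))  # 275.0
```

A positive result means entries were higher than exits, and a negative result means the opposite.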
