gtfs_segments.utils
download_write_file(url, folder_path)
Takes a URL and a folder path, creates the folder if it does not exist, downloads the file from the URL, and writes it to that folder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the GTFS file you want to download. | required |
folder_path | str | The path to the folder where you want to save the GTFS file. | required |
Returns:
Type | Description |
---|---|
str | The location of the file that was downloaded. |
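The behaviour described above (create the folder, download, write, return the path) can be sketched with the standard library alone. This is not the library's source: `download_write_file_sketch` and its `file_name` parameter are hypothetical, since the reference does not say what name the downloaded file is given.

```python
import os
import urllib.request


def download_write_file_sketch(url, folder_path, file_name="gtfs.zip"):
    """Download `url` into `folder_path` and return the saved file's path.

    `file_name` is an assumption made for this sketch; the reference does
    not state how the library names the downloaded file.
    """
    # Create the target folder (and any missing parents) if needed.
    os.makedirs(folder_path, exist_ok=True)
    file_path = os.path.join(folder_path, file_name)

    # Fetch the raw bytes from the URL.
    with urllib.request.urlopen(url) as response:
        data = response.read()

    # Write the bytes to disk and report where they went.
    with open(file_path, "wb") as out_file:
        out_file.write(data)
    return file_path
```

Because `urllib.request.urlopen` also accepts `file://` URLs, the sketch can be tried against a local file without network access.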
Source code in gtfs_segments/utils.py
export_segments(df, file_path, output_format, geometry=True)
Exports a GeoDataFrame of segments to the given file path in the requested output format.
If the output format is GeoJSON, the GeoDataFrame is written to a GeoJSON file.
If the output format is CSV, the GeoDataFrame is written to a CSV file. The geometry column is included when geometry is True and omitted when it is False.
The CSV output also gains additional columns: the start and end points of each segment, the start and end longitude and latitude, and the segment distance.
A further column records the number of times each segment was traversed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe containing the segments. | required |
file_path | str | The path to the file you want to export to. | required |
output_format | str | Either geojson or csv. | required |
geometry | bool | Optional. If True, the output will include the geometry of the segments. If False, the output will only include the start and end points of the segments. Defaults to True. | True |
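The branching on output_format and geometry can be illustrated with a standard-library sketch that works on plain dictionaries instead of a GeoDataFrame. Everything named here is hypothetical: `export_segments_sketch`, the segment keys (`segment_id`, `coords`, `distance`, `traversals`), and the CSV column names are assumptions, not the library's actual schema.

```python
import csv
import json

# Hypothetical CSV columns; the real column names are not given in the reference.
CSV_FIELDS = ["segment_id", "start_lon", "start_lat",
              "end_lon", "end_lat", "distance", "traversals"]


def to_wkt(coords):
    """Format a list of (lon, lat) pairs as a WKT LineString."""
    return "LINESTRING (" + ", ".join(f"{lon} {lat}" for lon, lat in coords) + ")"


def export_segments_sketch(segments, file_path, output_format, geometry=True):
    """Write `segments` (a list of dicts) to `file_path` as GeoJSON or CSV."""
    if output_format == "geojson":
        features = [
            {
                "type": "Feature",
                "geometry": {"type": "LineString",
                             "coordinates": [list(c) for c in seg["coords"]]},
                # Every key except the coordinates becomes a property.
                "properties": {k: v for k, v in seg.items() if k != "coords"},
            }
            for seg in segments
        ]
        with open(file_path, "w") as f:
            json.dump({"type": "FeatureCollection", "features": features}, f)
    elif output_format == "csv":
        # The geometry column is only present when requested.
        fields = CSV_FIELDS + (["geometry"] if geometry else [])
        with open(file_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            for seg in segments:
                start_lon, start_lat = seg["coords"][0]
                end_lon, end_lat = seg["coords"][-1]
                row = {
                    "segment_id": seg["segment_id"],
                    "start_lon": start_lon, "start_lat": start_lat,
                    "end_lon": end_lon, "end_lat": end_lat,
                    "distance": seg["distance"],
                    "traversals": seg["traversals"],
                }
                if geometry:
                    row["geometry"] = to_wkt(seg["coords"])
                writer.writerow(row)
    else:
        raise ValueError(f"Unsupported output format: {output_format!r}")
```

With geometry=False the CSV keeps only the endpoint coordinates, which makes the file smaller when the full line shape is not needed.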
Source code in gtfs_segments/utils.py
failed_pipeline(message, filename, folder_path)
Deletes the folder at folder_path if it exists, then returns the failure message.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
message | str | The message to be returned. | required |
filename | str | The name of the file that is being processed. | required |
folder_path | str | The path to the folder where the file is located. | required |
Returns:
Type | Description |
---|---|
str | A string that concatenates the message and the filename, indicating failure. |
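A minimal sketch of this cleanup-and-report pattern, assuming plain string concatenation for the returned message (the reference does not give the exact format). `failed_pipeline_sketch` is a hypothetical name, not the library function.

```python
import os
import shutil


def failed_pipeline_sketch(message, filename, folder_path):
    """Remove `folder_path` if present and return a failure string."""
    # Delete the working folder so no partial output is left behind.
    if os.path.exists(folder_path):
        shutil.rmtree(folder_path)
    # Assumption: the message and filename are joined with no separator.
    return message + filename
```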
Source code in gtfs_segments/utils.py
plot_hist(df, save_fig=False, show_mean=False, **kwargs)
Takes a dataframe with two columns, one holding the distance between stops and the other the number of traversals between those stops, and plots a histogram of the distances weighted by the traversals.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe that contains the data. | required |
save_fig | bool | If True, the figure will be saved to the file_path. Defaults to False. | False |
show_mean | bool | If True, will show the mean of the distribution. Defaults to False. | False |
Returns:
Type | Description |
---|---|
Figure | A matplotlib axis |
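To make the weighting concrete, the sketch below computes the quantities a weighted histogram displays, without plotting: each distance contributes its traversal count to its bin rather than a count of one. The function name and the `bin_width` parameter are hypothetical. For actual plotting, matplotlib's `hist` accepts a `weights` argument that applies the same idea.

```python
def weighted_histogram(distances, traversals, bin_width=50.0):
    """Return ({bin_start: total_traversals}, weighted_mean_distance)."""
    counts = {}
    for distance, weight in zip(distances, traversals):
        # Bins are half-open intervals [bin_start, bin_start + bin_width).
        bin_start = (distance // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + weight

    total_weight = sum(traversals)
    if total_weight == 0:
        raise ValueError("traversals must sum to a positive number")

    # The mean is weighted by traversals as well, so heavily used
    # segments dominate it.
    mean = sum(d * w for d, w in zip(distances, traversals)) / total_weight
    return dict(sorted(counts.items())), mean
```

For example, two segments of 100 m and 120 m traversed 10 and 30 times fall into the same 50 m bin, and that bin's height is 40 rather than 2.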
Source code in gtfs_segments/utils.py
process(pipeline_gtfs, row, max_spacing)
Takes a pipeline function, a row from sources_df, and a max_spacing value, and returns the output of the pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline_gtfs | Any | The function that will be used to process the GTFS data. | required |
row | Series | A row in the sources_df dataframe. It contains the name of the provider, the URL of the GTFS file, and the bounding box of the area that the GTFS file covers. | required |
max_spacing | float | Maximum allowed spacing between two consecutive stops. | required |
Returns:
Type | Description |
---|---|
Any | A tuple of the form (filename, folder_path, df). |
Source code in gtfs_segments/utils.py
summary_stats(df, max_spacing=3000, min_spacing=10, export=False, **kwargs)
Takes a dataframe and returns a dataframe of summary statistics. The max_spacing and min_spacing values serve as thresholds for removing outliers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe that you want to get the summary statistics for. | required |
max_spacing | float | The maximum spacing between two stops, in metres. Defaults to 3000. | 3000 |
min_spacing | float | The minimum spacing between two stops, in metres. Defaults to 10. | 10 |
export | bool | If True, the summary will be exported to a CSV file. Defaults to False. | False |
Returns:
Type | Description |
---|---|
DataFrame | A dataframe with the summary statistics. |
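The outlier thresholds can be illustrated with a small standard-library sketch that filters spacings before summarising them. This is only an illustration under stated assumptions: `summary_stats_sketch`, the inclusive treatment of both thresholds, and the particular statistics returned are hypothetical, as the reference does not list which statistics the library computes.

```python
def summary_stats_sketch(distances, traversals, max_spacing=3000, min_spacing=10):
    """Summarise stop spacings after dropping values outside the thresholds."""
    # Assumption: both thresholds are inclusive.
    kept = [
        (d, w) for d, w in zip(distances, traversals)
        if min_spacing <= d <= max_spacing
    ]
    if not kept:
        raise ValueError("no segments remain after filtering")

    total_traversals = sum(w for _, w in kept)
    if total_traversals == 0:
        raise ValueError("remaining segments have no traversals")

    return {
        "segment_count": len(kept),
        "total_traversals": total_traversals,
        # Mean spacing weighted by how often each segment is traversed.
        "weighted_mean_spacing": sum(d * w for d, w in kept) / total_traversals,
        "shortest_spacing": min(d for d, _ in kept),
        "longest_spacing": max(d for d, _ in kept),
    }
```

With the default thresholds, a 5 m segment (likely duplicate stops) and a 5000 m segment (likely an express link or a data error) are both excluded before any statistic is computed.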
Source code in gtfs_segments/utils.py