Python GIS: Advantages and Integration
Introduction #
Geographic Information Systems (GIS) have long had a place in public health, helping teams understand spatial patterns for planning outreach or allocating resources. Traditionally, GIS work has relied heavily on desktop mapping software like QGIS or ArcGIS. These tools offer powerful interfaces for exploring and visualizing spatial relationships, but they can be limited when it comes to automation and integration with modern data infrastructure.
Python is a versatile programming language that public health data teams have increasingly adopted for its ability to work with many forms of data.
This article explores how Python can enhance public health workflows - for data analysts, epidemiologists, data scientists, and GIS professionals - in some of the key areas where traditional GIS software falls short, offering advantages in automation, reproducibility, and integration with data systems. We’ll look at how Python’s growing ecosystem of GIS libraries makes it a flexible alternative to traditional GIS software, how spatial data can be stored and versioned within a data lake, and why reproducible code-based workflows are becoming essential for modern public health analytics.
GIS #
GIS involves tools for capturing, storing, analyzing, and visualizing spatial data - that is, data that has a geographic component. GIS allows users to understand patterns and relationships tied to location, such as where people live, how resources are distributed, or how environmental factors vary across regions.
GIS data typically comes in two forms:
- Vector data: points, lines, and polygons representing things like schools, roads, or boundaries.
- Raster data: grid-based data such as satellite imagery or heatmaps.
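To make the two forms concrete, here is a minimal sketch in plain Python with no GIS libraries. The coordinates, grid values, origin, cell size, and the helper `cell_value` are all made up for illustration; real workflows would more likely hold vector data in GeoPandas and raster data in Rasterio or NumPy.

```python
import math

# Vector data: discrete features with explicit coordinates (GeoJSON-style dicts).
school = {"type": "Point", "coordinates": [-80.25, 43.55]}  # [longitude, latitude]
boundary = {
    "type": "Polygon",
    "coordinates": [[
        [-80.5, 43.4], [-80.0, 43.4], [-80.0, 43.8], [-80.5, 43.8], [-80.5, 43.4],
    ]],
}

# Raster data: a regular grid of cell values, plus the information needed to
# place that grid on the map (top-left corner and cell size).
grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
origin_lon, origin_lat = -80.5, 43.8  # top-left corner (hypothetical)
cell_size = 0.2                       # degrees per cell (hypothetical)


def cell_value(lon, lat):
    """Return the raster value of the cell containing (lon, lat)."""
    col = math.floor((lon - origin_lon) / cell_size)
    row = math.floor((origin_lat - lat) / cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the raster extent")
    return grid[row][col]


# Look up the raster cell underneath the vector point.
value_at_school = cell_value(*school["coordinates"])
```

The vector features carry their own coordinates, while the raster only makes sense together with its origin and cell size.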
GIS is used across many fields - from urban planning to environmental science - but its value in public health is especially significant.
GIS in Public Health #
In public health, GIS helps teams make data-driven decisions that are supported by geography. Whether mapping vaccine coverage or tracking environmental exposures, GIS provides a spatial lens for understanding health outcomes and service gaps.
Some common GIS applications in public health include:
- Clinic catchment analysis: determining which populations are served by which facilities.
- Outbreak mapping: visualizing the spread of infectious diseases.
- Environmental health monitoring: mapping water quality, air pollution, or vector habitats.
- Equity analysis: identifying underserved communities or areas with limited access to care.
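As a rough illustration of the catchment idea, the sketch below assigns each neighbourhood to its nearest clinic using great-circle (haversine) distance. The clinic and neighbourhood names and coordinates are hypothetical, and straight-line distance is a simplifying assumption; real catchment analyses often use travel time or administrative boundaries instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_clinic(lat, lon, clinics):
    """Return the name of the clinic closest to (lat, lon)."""
    return min(clinics, key=lambda name: haversine_km(lat, lon, *clinics[name]))


# Hypothetical locations as (latitude, longitude).
clinics = {"Clinic A": (43.55, -80.25), "Clinic B": (43.90, -80.10)}
neighbourhoods = {"North": (43.85, -80.12), "South": (43.50, -80.30)}

# Map each neighbourhood to the clinic that serves it under this rule.
catchments = {
    name: nearest_clinic(lat, lon, clinics)
    for name, (lat, lon) in neighbourhoods.items()
}
print(catchments)
```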
Spatial thinking is especially important for planning and communication - helping public health units (PHUs) design outreach strategies, allocate resources, and share insights with stakeholders through maps and dashboards.
Python in GIS Workflows #
Python offers a wide range of libraries for working with spatial data, making it a powerful alternative - or complement - to traditional GIS software. These tools allow public health teams to build automated, reproducible workflows that integrate directly with data lakes and other systems.
Python, with its dedicated libraries, acts as a bridge between data science and spatial analysis. It offers several key advantages over traditional GIS software:
- Automation and reproducibility: Python scripts automate repetitive tasks and run the same steps the same way every time, which keeps workflows reproducible, reduces human error, and improves consistency.
- Integration with data lakes and APIs: Python connects to modern data infrastructures, enabling direct access to datasets stored in data lakes and real-time data from APIs - eliminating manual data transfer and supporting up-to-date analysis.
- Open-source ecosystem: Python offers extensive open-source libraries (e.g. GeoPandas, Rasterio, Folium) for spatial analysis and visualization. This flexibility allows customization to meet program needs.
Code-based GIS also improves institutional memory. With modern, readable syntax - and with comments when needed - Python workflows effectively document themselves. Anyone reviewing the script can see the exact sequence of steps that produced a map or dataset. This is a major strength for PHUs, where GIS tasks may be infrequent, staff may change, and dropdown-driven workflows can be difficult to reconstruct later.
Python GIS Tools #
Several Python libraries work together to provide powerful tools for spatial data processing and visualization:
- GeoPandas: Combines the functionality of pandas with spatial operations. It’s ideal for reading, filtering, and joining spatial datasets.
- Shapely: Provides geometric operations such as buffering, intersection, and distance calculations.
- Folium: Enables interactive web-based maps using Leaflet.js. It’s great for sharing spatial insights with non-technical audiences.
- Plotly / Matplotlib: Though not specific to GIS, both offer customizable visualizations for spatial data.
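The geometric operations listed for Shapely can be sketched in a few lines. The clinic locations below are hypothetical and are assumed to be in a projected CRS, so that distances and buffer radii are in metres; buffering raw latitude/longitude values would give radii in degrees, which is rarely what is wanted.

```python
from shapely.geometry import Point

# Hypothetical clinic locations in a projected CRS (units: metres).
clinic_a = Point(0, 0)
clinic_b = Point(3000, 4000)

# Distance: straight-line separation between the two clinics.
separation_m = clinic_a.distance(clinic_b)

# Buffering: a 3 km service area around each clinic.
area_a = clinic_a.buffer(3000)
area_b = clinic_b.buffer(3000)

# Intersection: the zone covered by both service areas.
overlap = area_a.intersection(area_b)

print(f"Clinics are {separation_m:.0f} m apart")
print(f"Overlap area: {overlap.area / 1e6:.2f} km^2")
```

GeoPandas exposes the same operations column-wise on a GeoDataFrame's geometry, so the pattern carries over to whole layers.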
Data Lake Integration #
Modern public health analytics often rely on centralized data systems, such as data lakes, to store and manage datasets. Integrating GIS into these systems ensures that spatial data becomes part of a broader analytical ecosystem, placing it alongside demographic and environmental information. This centralization allows analysts to work from a single source of truth rather than juggling multiple tools, and the data lake structure supports versioning and provenance so curated map layers can be documented and trusted. It also enables scalable workflows that serve multiple programs - such as vaccine outreach and environmental monitoring - without duplicating effort, making GIS more consistent and useful across the organization.
Python enables this integration, as the language’s GIS libraries work seamlessly with common data lake formats like Parquet, CSV, and GeoJSON. Scripts can automate ingestion, cleaning, and transformation of spatial data, reducing manual steps. Metadata can be generated programmatically, making it easier to track coordinate systems, sources, and update history.
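One way to generate metadata programmatically is to build a small record for each layer and serialize it as JSON. This is only a sketch: the function `build_layer_metadata`, its field names, and the example values are assumptions for illustration, not a standard schema.

```python
import json
from datetime import date


def build_layer_metadata(name, crs, source, feature_count, updated=None):
    """Assemble a small metadata record for a spatial layer."""
    return {
        "layer": name,
        "crs": crs,
        "source": source,
        "feature_count": feature_count,
        "last_updated": (updated or date.today()).isoformat(),
    }


# Hypothetical values for illustration only.
metadata = build_layer_metadata(
    name="region_boundaries",
    crs="EPSG:4326",
    source="region_boundaries.shp",
    feature_count=3,
    updated=date(2024, 1, 15),
)

# Serialize so the record can be stored next to the layer it describes.
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

In a GeoPandas workflow, the CRS and feature count would typically be read from the layer itself (for example `gdf.crs.to_string()` and `len(gdf)`) rather than typed by hand.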
A typical medallion structure in a data lake separates data into zones that reflect its level of transformation. Folders within these zones are often where project-specific data is stored. A /gis/ section in a data lake might include:
- /raw/gis/ for unprocessed shapefiles or GeoJSON
- /processed/gis/ for intermediate outputs (e.g. reprojected layers, cleaned geometries, or dissolved/merged boundaries)
- /curated/gis/ for standardized layers with documented CRS
- /curated/gis/layers/ for thematic maps
- /curated/gis/metadata/ for provenance and data dictionaries
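The folder layout above could be set up with a short script. In this sketch a temporary directory stands in for the data lake root, and `create_gis_layout` is a hypothetical helper; many data lakes sit on object storage, where "folders" are key prefixes and would be created through the storage service's own client instead.

```python
import tempfile
from pathlib import Path

# Zone folders taken from the layout described above.
GIS_FOLDERS = [
    "raw/gis",
    "processed/gis",
    "curated/gis",
    "curated/gis/layers",
    "curated/gis/metadata",
]


def create_gis_layout(lake_root):
    """Create the GIS folders under lake_root and return their paths."""
    paths = [Path(lake_root) / folder for folder in GIS_FOLDERS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


# A temporary directory stands in for the data lake root in this sketch.
with tempfile.TemporaryDirectory() as lake_root:
    created = create_gis_layout(lake_root)
    all_exist = all(path.is_dir() for path in created)
    print(f"Created {len(created)} folders: {all_exist}")
```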
By integrating GIS into a data lake, PHUs can build workflows that are transparent, reproducible, and ready for advanced analytics.
Example: Disease Outbreak Map #
One use case for Python GIS tools is outbreak mapping. Consider a report on a (fictional) disease outbreak that requires a bubble map of cases in towns within the Wellington-Dufferin-Guelph (WDG) region.
First, we will load the geographic data.
Within Python, the relevant files can be loaded in and processed with GeoPandas. We will load two files:
- case_count.csv - a CSV file containing each town name, number of disease cases, and town coordinates (latitude and longitude).
- region_boundaries.shp - the shapefile containing the region boundaries.
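import geopandas as gpd
import pandas as pd
# Paths to the two input files described above (adjust to your storage location)
region_shapefile_path = "region_boundaries.shp"
outbreak_file_path = "case_count.csv"
# Read the shapefile into a GeoDataFrame
region_gdf = gpd.read_file(region_shapefile_path)
# Read the outbreak data and convert to GeoDataFrame
df = pd.read_csv(outbreak_file_path)
outbreak_gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.Longitude, df.Latitude),
    crs="EPSG:4326",  # WGS 84, the typical CRS for GPS latitude/longitude
)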
Make sure both dataframes use the same coordinate reference system (CRS). Having the same CRS is imperative for merging and plotting data.
# Match CRS
region_gdf = region_gdf.to_crs(outbreak_gdf.crs)
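A slightly more defensive version of the CRS step checks for a missing CRS and reprojects only when the two layers differ. The sketch below uses two tiny made-up layers (the town coordinates are approximate and the region is a placeholder box) so that it runs on its own; in the walkthrough, `regions` and `points` would be `region_gdf` and `outbreak_gdf`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two tiny hypothetical layers that start in different CRSs.
points = gpd.GeoDataFrame(
    {"Town": ["Guelph"]},
    geometry=[Point(-80.25, 43.55)],  # approximate longitude, latitude
    crs="EPSG:4326",
)
regions = gpd.GeoDataFrame(
    {"WDG Region": ["Example region"]},
    geometry=[box(-81.0, 43.0, -80.0, 44.0)],
    crs="EPSG:4326",
).to_crs("EPSG:3857")  # Web Mercator, units in metres

crs_matched_before = regions.crs == points.crs

# Fail loudly if a layer has no CRS; reproject only when the CRSs differ.
if regions.crs is None or points.crs is None:
    raise ValueError("A layer is missing its CRS; set it before reprojecting.")
if regions.crs != points.crs:
    regions = regions.to_crs(points.crs)

crs_matched_after = regions.crs == points.crs
print(crs_matched_before, crs_matched_after)
```

A layer without any CRS cannot be reprojected, so catching that case early tends to give a clearer error than a failed join later on.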
Assign each data point to the region that contains it using a spatial join. This will allow us to create map layers based on the data points in each region.
joined_gdf = gpd.sjoin(
    outbreak_gdf,
    region_gdf,
    how="left",
    predicate="within",  # or 'intersects', to include points on the boundary
)
# Group points outside of WDG boundaries as 'Other'
joined_gdf['WDG Region'] = joined_gdf['WDG Region'].fillna('Other')
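The difference between the 'within' and 'intersects' predicates mentioned in the join above only matters for points that fall exactly on a boundary. A minimal Shapely sketch, using a made-up square region, shows the behaviour:

```python
from shapely.geometry import Point, box

region = box(0, 0, 10, 10)  # hypothetical square region
inside = Point(5, 5)
on_edge = Point(0, 5)       # lies exactly on the region boundary

inside_within = inside.within(region)         # interior point
edge_within = on_edge.within(region)          # boundary point, 'within'
edge_intersects = on_edge.intersects(region)  # boundary point, 'intersects'

print(inside_within, edge_within, edge_intersects)
```

With 'within', a boundary point is not matched and would end up in the 'Other' group; 'intersects' matches it, but a point on a shared border can then match more than one region.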
We’ll also calculate the total number of cases per region.
cases_by_region = joined_gdf.groupby('WDG Region')['Count of Cases'].sum()  # total cases per region
region_gdf['Cases'] = region_gdf['WDG Region'].map(cases_by_region).fillna(0).astype(int)  # regions with no cases get 0
Plot the case counts for each town and region using Folium.
import folium
from folium.plugins import GroupedLayerControl
# Base map: region polygons with a tooltip showing name and total cases
m = region_gdf.explore(tooltip=['WDG Region', 'Cases'])
groups = []
# Create a FeatureGroup for each region
for region in joined_gdf['WDG Region'].unique():
    group = folium.FeatureGroup(name=region)
    region_data = joined_gdf[joined_gdf['WDG Region'] == region]
    for _, row in region_data.iterrows():
        folium.Circle(
            location=[row.geometry.y, row.geometry.x],
            radius=row['Count of Cases']*200,  # Radius in metres; adjust the scale factor as needed
            popup=f"{row['Town']}: {row['Count of Cases']} cases",
            tooltip=f"{row['Town']}: {row['Count of Cases']} cases",
            color='orange',
            fill=True,
            fill_color='orange',
            fill_opacity=0.6,
        ).add_to(group)
    m.add_child(group)
    groups.append(group)
# Add layer control
folium.LayerControl().add_to(m)
GroupedLayerControl(
    groups={'Region': groups},
    collapsed=False,
    exclusive_groups=False,
).add_to(m)
# Display the map (in a notebook, the last expression is rendered)
m
Folium’s GroupedLayerControl allows us to assign map overlays to specific groups. These groups can be activated by toggling checkboxes. Groups can also be exclusive (only one layer within a group can be displayed at one time), though here we set ours to be non-exclusive.
This code gives us the following map:
This map and its layers can now be used in reporting or saved to the appropriate section of a data lake or other storage service for future reference.
Many PHU data products - long-form reports, blogs, surveillance summaries, and dashboards - can embed interactive Folium maps directly, offering a more engaging alternative to static screenshots. This approach complements existing tools like Power BI: Python handles complex geospatial processing and custom map generation, while Power BI remains a strong option for dashboarding and business reporting.
Conclusion #
GIS is an essential tool for understanding and addressing public health challenges, and Python has much to offer when it comes to improving key aspects of spatial workflows.
Moving toward code-based GIS workflows doesn’t mean leaving behind the functionality of traditional mapping tools. Instead, it adds efficiency, reproducibility, and flexibility. As public health continues to depend on data-driven decision-making, adopting Python for GIS can help public health teams deliver reliable, high-quality insights.