AI in Satellite Imagery and Remote Sensing: A Practical Guide

Every day, Earth observation satellites capture petabytes of imagery. Landsat, Sentinel, Planet, Maxar — the volume of geospatial data available has exploded. The bottleneck isn’t data anymore. It’s analysis.

AI, specifically deep learning for computer vision, has become the essential tool for extracting actionable information from satellite imagery at scale. What took a team of analysts weeks to manually interpret, a trained model processes in minutes.

Why Satellite Imagery Is Different

Before diving into techniques, understand what makes remote sensing data unique:

Multi-spectral bands. Unlike RGB photos, satellite images capture data across many spectral bands — visible, near-infrared, shortwave infrared, thermal. A Sentinel-2 image has 13 bands. Each band reveals different information about the Earth’s surface.

Spatial resolution varies. From 30cm (commercial high-res) to 30m (Landsat) to 1km (weather satellites). Your resolution determines what you can detect — buildings at 50cm, fields at 10m, land cover at 30m.

Temporal dimension. The same location is imaged repeatedly (every 5 days for Sentinel-2). This temporal stack enables change detection, trend analysis, and seasonal monitoring.

Georeferencing. Every pixel has geographic coordinates. This enables integration with maps, GIS data, and ground truth measurements.

Atmospheric interference. Clouds, haze, and atmospheric scattering affect image quality. Pre-processing (atmospheric correction) is mandatory for quantitative analysis.

Core AI Applications

Land Use and Land Cover Classification

The foundational task: classify every pixel (or region) into categories — forest, urban, water, agriculture, barren land.

How it works:

Train a semantic segmentation model (U-Net, DeepLabV3, or transformer-based) on labeled satellite imagery
The model learns spectral and spatial patterns associated with each land cover type
Apply to new imagery to generate land cover maps

Training data sources:

LULC datasets: ESA WorldCover, Dynamic World (Google), National Land Cover Database (USGS)
Manual annotation: Using tools like Label Studio or QGIS with satellite imagery
Transfer learning: Pre-trained models on large geospatial datasets, fine-tuned on your specific region

Practical tips:

Always include a temporal component — a field in winter looks very different from a field in summer
Multi-spectral features (NDVI, NDWI, NDBI) dramatically improve classification over RGB alone
Edge cases between classes (is it scrubland or sparse forest?) are where human review matters most

Change Detection

Detecting what changed between two (or more) images of the same area. Critical for:

Deforestation monitoring
Urban expansion tracking
Disaster damage assessment
Agricultural change (crop rotation, land conversion)
Coastal erosion

Approaches:

Image differencing: Subtract pixel values between two dates. Simple but effective for dramatic changes.

Deep learning change detection: Models like BiT (Binary Change Detection Transformer) take image pairs as input and produce change masks. These handle subtle changes that simple differencing misses.

Time series analysis: For gradual changes, analyze the full temporal stack. LSTM or temporal convolution networks detect trends and breakpoints in pixel-level time series.

Key challenge: Distinguishing real change from noise (clouds, shadows, seasonal variation, sensor differences). Pre-processing and normalization are essential.

Object Detection

Finding specific objects in satellite imagery: buildings, vehicles, ships, aircraft, solar panels, swimming pools.

Architecture choices:

YOLO variants for fast inference on large image tiles
Faster R-CNN for higher accuracy on small objects
Anchor-free detectors (FCOS, CenterNet) increasingly popular for geospatial applications

Unique challenges:

Scale variation: Objects can be very small (a few pixels for vehicles at 30cm resolution)
Rotation invariance: Unlike street-level photos, satellite imagery has no fixed orientation
Dense scenes: Urban areas can have thousands of buildings per tile
Class imbalance: The thing you’re looking for (e.g., illegal mining sites) is rare

Practical approach:

Tile large satellite images into manageable chips (512×512 or 1024×1024)
Use overlapping tiles to avoid cutting objects at boundaries
Apply non-maximum suppression across tile boundaries

Semantic Segmentation

Pixel-level classification — every pixel gets a label. Used for:

Building footprint extraction
Road network mapping
Crop type identification
Flood extent mapping
Solar panel detection

The SpaceNet challenges have driven significant progress here, with top models achieving >90% IoU on building extraction in diverse urban environments.

Architecture evolution:

U-Net remains the baseline (and is surprisingly hard to beat significantly)
HRNet preserves high-resolution features throughout the network
Segment Anything Model (SAM) with geospatial fine-tuning is gaining traction for interactive annotation

Foundation Models for Remote Sensing

2024-2026 saw the emergence of geospatial foundation models — large models pre-trained on massive satellite image datasets.

Key Models

IBM/NASA Prithvi: Pre-trained on harmonized Landsat-Sentinel data. Open-source. Good for land cover, crop classification, and flood detection. Fine-tunes well with limited labeled data.

Google’s geospatial FM: Powers Dynamic World and other Google Earth Engine products. Not fully open, but accessible through the Earth Engine API.

Clay Foundation Model: Open-source, trained on diverse satellite imagery including SAR. Strong on multi-modal geospatial tasks.

SatMAE and Scale-MAE: Masked autoencoder approaches adapted for satellite imagery. Learn strong representations from unlabeled data.

Why Foundation Models Matter

Traditional approach: collect labeled data for your specific task, train a model from scratch. This requires thousands of labeled examples.

Foundation model approach: start with a model that already understands satellite imagery, fine-tune on your specific task with 50-200 labeled examples. Dramatically reduces the data and compute needed for new applications.

Data Pipeline

A realistic satellite imagery AI pipeline:

1. Data Acquisition
   - Download from Copernicus, USGS, or commercial providers
   - Define area of interest and date range

2. Pre-Processing
   - Atmospheric correction (Sen2Cor for Sentinel-2)
   - Cloud masking (s2cloudless or Fmask)
   - Georeferencing verification
   - Band selection and stacking

3. Feature Engineering
   - Spectral indices (NDVI, NDWI, NDBI, etc.)
   - Texture features (GLCM)
   - Temporal composites (median, max NDVI)

4. Model Training / Inference
   - Tile imagery into chips
   - Train or apply model
   - Post-processing (morphological operations, CRF smoothing)

5. Validation
   - Compare against ground truth
   - Accuracy metrics (IoU, F1, overall accuracy)
   - Visual inspection of edge cases

6. Integration
   - Export to GIS formats (GeoTIFF, vector shapefiles)
   - Serve via web mapping (GeoServer, Mapbox)
   - Dashboard and reporting

Tools and Platforms

Processing

Google Earth Engine: Cloud-based, massive data catalog, JavaScript/Python API
Microsoft Planetary Computer: Azure-hosted, STAC-compatible, open data
Rasterio + GeoPandas: Python libraries for raster/vector geospatial processing
GDAL: The foundation library everything else builds on

Deep Learning

TorchGeo: PyTorch library specifically for geospatial deep learning — handles datasets, samplers, and transforms
Raster Vision: End-to-end framework for satellite image deep learning
Solaris: CosmiQ Works library for geospatial ML (SpaceNet heritage)

Annotation

Label Studio: Open-source, supports geospatial imagery with GeoTIFF import
CVAT: Computer Vision Annotation Tool, works well for object detection annotation
QGIS: Manual digitization for creating training labels from imagery

Real-World Applications

Deforestation Monitoring

Brazil’s PRODES and DETER systems use satellite AI to detect deforestation in near-real-time across the Amazon. AI processes Sentinel and Landsat imagery weekly, flagging new clearings for enforcement teams. This has directly contributed to reducing illegal deforestation.

Disaster Response

After earthquakes, hurricanes, or floods, AI processes satellite imagery within hours to map damage extent, identify blocked roads, and locate displaced populations. Organizations like the Humanitarian OpenStreetMap Team use AI-assisted building damage classification to prioritize relief efforts.

Agricultural Monitoring

AI analyzes satellite imagery to estimate crop yields, detect irrigation problems, monitor crop health, and verify insurance claims. Planet’s daily imagery combined with AI models gives farmers field-level insights throughout the growing season.

Urban Planning

City planners use satellite AI to track informal settlement growth, map impervious surfaces (for stormwater planning), monitor construction activity, and assess green space distribution.

Common Pitfalls

Ignoring atmospheric correction. Raw satellite imagery has atmospheric effects baked in. Comparing uncorrected images from different dates will produce false change detections.
Training and testing on adjacent tiles. Satellite imagery is spatially autocorrelated — nearby tiles are similar. If training and test tiles are adjacent, your accuracy is inflated. Use spatially separated train/test splits.
Ignoring class imbalance. If 95% of pixels are “not building,” a model that predicts “not building” everywhere gets 95% accuracy. Use IoU, F1, or balanced accuracy.
Resolution mismatch. A model trained on 50cm imagery won’t work on 10m imagery (and vice versa). Always match your model to your data resolution.
Ignoring temporal context. A green field in June might be a parking lot in January. Use multi-temporal data or be explicit about the seasonal context.

Key Takeaways

Satellite imagery is multi-spectral, multi-temporal, and massive — AI is essential for analysis at scale
Spectral indices (NDVI, NDWI) are powerful features that should complement raw bands
Foundation models dramatically reduce labeled data requirements for new tasks
U-Net remains the workhorse for segmentation; object detection uses standard architectures adapted for overhead imagery
Spatial train/test splits are critical for honest evaluation
Google Earth Engine and TorchGeo are the most productive starting points for new projects
The field is moving from task-specific models to foundation models that generalize across geospatial applications

AI in Satellite Imagery and Remote Sensing: A Practical Guide

AI in Satellite Imagery and Remote Sensing: A Practical Guide

Why Satellite Imagery Is Different

Core AI Applications

Land Use and Land Cover Classification

Change Detection

Object Detection

Semantic Segmentation

Foundation Models for Remote Sensing

Key Models

Why Foundation Models Matter

Data Pipeline

Tools and Platforms

Processing

Deep Learning

Annotation

Real-World Applications

Deforestation Monitoring

Disaster Response

Agricultural Monitoring

Urban Planning

Common Pitfalls

Key Takeaways

Related reads

Stay ahead of the AI curve