Detectron2 FPN + PointRend Model for amazing Satellite Image Segmentation


The main objective of this task is semantic segmentation of satellite images: assigning each pixel of the image to one of five classes, namely greenery, soil, water, building, and utility. To keep the labeling simple, anything related to plants and trees (forests, fields, bushes, etc.) is treated as the single class greenery. The soil, water, building, and utility (roads, vehicles, parking lots, etc.) classes are grouped in the same way.

Data Preparation for Modeling

We collected 500 random RGB satellite images using the Google Maps API for modeling. For segmentation tasks, the annotations need to be RGB mask images with the same height and width as the input image, where each pixel value is the color code of its class (i.e., greenery: [0,255,0], soil: [255,255,0], water: [0,0,255], buildings: [255,0,0], utility: [255,0,255]). For the annotation process, we chose the labelme tool. Additionally, we applied image augmentations such as horizontal flip, random crop, and brightness alterations so that the model learns the features robustly. After the annotations were done, we split the dataset into training and validation sets in a 90:10 ratio. Below is a sample image from the training dataset and the corresponding RGB mask image.

Fig 1: Sample image with corresponding annotated RGB mask from the training dataset
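Most training pipelines expect one class index per pixel rather than a color, so the RGB masks usually get converted before training. Here is a minimal sketch of that conversion in plain Python. The color codes are the ones listed above, but the index order (0 to 4) and the function name are our own assumptions for illustration, not something fixed by labelme or Detectron2.

```python
# Class color codes as listed in the article.
# The index assigned to each class is an assumption made for this sketch.
CLASS_COLORS = {
    (0, 255, 0): 0,      # greenery
    (255, 255, 0): 1,    # soil
    (0, 0, 255): 2,      # water
    (255, 0, 0): 3,      # building
    (255, 0, 255): 4,    # utility
}


def rgb_mask_to_class_ids(mask):
    """Convert an H x W grid of (R, G, B) pixels into an H x W grid of class indices."""
    class_ids = []
    for row in mask:
        ids_row = []
        for pixel in row:
            key = tuple(pixel)
            if key not in CLASS_COLORS:
                # Fail loudly on colors that do not belong to any class.
                raise ValueError(f"Unknown mask color: {key}")
            ids_row.append(CLASS_COLORS[key])
        class_ids.append(ids_row)
    return class_ids


# A tiny 2 x 2 mask: greenery, water / building, utility.
tiny_mask = [
    [(0, 255, 0), (0, 0, 255)],
    [(255, 0, 0), (255, 0, 255)],
]
class_ids = rgb_mask_to_class_ids(tiny_mask)
```

For real images a vectorized version with an array library would be much faster; the loop form is used here only to keep the mapping easy to read.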

Model understanding

For modeling, we have used the Basic FPN segmentation model + PointRend model from Facebook’s Detectron2 library. Now let us understand the architecture of both models.

Basic FPN Model

FPN (Feature Pyramid Network) mainly consists of two parts: an encoder and a decoder. An image is processed by passing it first through the encoder, then through the decoder, and finally through a segmentation head that generates pixel-wise class probabilities. The encoder follows a bottom-up approach using a ResNet backbone, and the decoder follows a top-down approach using a structured CNN.

Fig 2: Feature Pyramid Network (FPN) model process flow (Image Source [1])
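To make the top-down idea concrete, here is a heavily simplified sketch in plain Python. It assumes single-channel feature maps where each level is half the size of the previous one, and it merges them by upsampling the coarser map and adding it to the finer one. A real FPN also applies 1x1 lateral convolutions and 3x3 smoothing convolutions, which are left out here, and the function names are ours.

```python
def upsample_nearest_2x(feature_map):
    """Double the height and width of a 2D map by repeating each value."""
    out = []
    for row in feature_map:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # separate copy so rows are independent
    return out


def top_down_merge(bottom_up_maps):
    """Merge encoder maps, given finest first, into a top-down pyramid."""
    merged = [bottom_up_maps[-1]]  # start from the coarsest map
    for lateral in reversed(bottom_up_maps[:-1]):
        upsampled = upsample_nearest_2x(merged[0])
        fused = [
            [a + b for a, b in zip(lateral_row, up_row)]
            for lateral_row, up_row in zip(lateral, upsampled)
        ]
        merged.insert(0, fused)  # keep the finest level at index 0
    return merged


# Toy encoder outputs: 4x4, 2x2 and 1x1 maps with constant values.
c2 = [[1.0] * 4 for _ in range(4)]
c3 = [[10.0] * 2 for _ in range(2)]
c4 = [[100.0]]
pyramid = top_down_merge([c2, c3, c4])
```

In this toy example the finest output level carries information from all three encoder levels, which is the point of the top-down pathway: high-resolution maps get enriched with context from the coarse ones.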

PointRend Model

The basic idea of the PointRend model is to treat segmentation as a computer graphics rendering problem. Just as rendering refines pixels with high variance through subdivision and adaptive sampling, PointRend takes the most uncertain pixels in the semantic segmentation output, upsamples them, and makes point-wise predictions, which results in more refined predictions. The PointRend model performs two main tasks to generate its final predictions:

Points Selection Strategy

During inference, points are selected where the coarse prediction output from the FPN model (a prediction map with a resolution equal to 1/4th of the input image) has class probabilities close to 1/(number of classes), i.e., 0.2 in our case since we have 5 classes. During training, however, instead of selecting points based only on probabilities, it first selects kN random points from a uniform distribution. It then picks the βN most uncertain points (points where no class is predicted with high probability) from these kN points. Finally, the remaining (1 - β)N points are sampled from a uniform distribution. For the segmentation task, k=3 and β=0.75 have shown good results during training. See Fig 3 for more details on the point selection strategy during training.

Fig 3: Point selection strategy demonstration (Image source [2])
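The training-time strategy can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `prob_at` is a hypothetical callable that returns the class probabilities at a normalized (x, y) location, and uncertainty is measured as the negative gap between the two most likely classes, which is one common choice and not necessarily the exact measure used in our training runs.

```python
import random


def uncertainty(probs):
    """Negative gap between the two most likely classes (higher = more uncertain)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return -(top1 - top2)


def select_training_points(prob_at, n_points, k=3, beta=0.75, rng=None):
    """Pick n_points locations: beta*N uncertain ones plus (1 - beta)*N uniform ones."""
    rng = rng or random.Random()
    # Step 1: over-generate k*N candidate points uniformly at random.
    candidates = [(rng.random(), rng.random()) for _ in range(k * n_points)]
    # Step 2: keep the beta*N most uncertain candidates.
    n_uncertain = int(beta * n_points)
    candidates.sort(key=lambda p: uncertainty(prob_at(*p)), reverse=True)
    important = candidates[:n_uncertain]
    # Step 3: fill the remaining (1 - beta)*N points uniformly at random.
    uniform = [(rng.random(), rng.random()) for _ in range(n_points - n_uncertain)]
    return important + uniform


def toy_prob_at(x, y):
    """Hypothetical coarse prediction: confident on the left, undecided elsewhere."""
    if x < 0.2:
        return [0.96, 0.01, 0.01, 0.01, 0.01]
    return [0.2, 0.2, 0.2, 0.2, 0.2]


points = select_training_points(toy_prob_at, n_points=8, rng=random.Random(0))
```

With N=8, k=3 and β=0.75, the sketch draws 24 candidates, keeps the 6 most uncertain, and adds 2 uniformly sampled points. The uniform part keeps some coverage of confident regions so the point head does not train only on hard locations.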

Point-Wise Predictions

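Point-wise predictions are made by combining two feature vectors for each selected point: fine-grained features interpolated from the CNN feature maps at that point, and coarse prediction features taken from the coarse segmentation output at the same point. The combined vector is passed through a small multi-layer perceptron (MLP) that predicts the class of the point.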
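In the PointRend design, the two vectors are fine-grained features read from a backbone feature map and the coarse class prediction, both sampled at the same point. A rough sketch of building that combined vector with bilinear interpolation is shown below. The function names, the tiny feature maps, and the coordinate convention (normalized coordinates mapped corner to corner) are assumptions made for illustration, and the MLP that consumes the vector is left out.

```python
def bilinear_sample(grid, x, y):
    """Sample a 2D grid at continuous pixel coordinates (x = column, y = row)."""
    h, w = len(grid), len(grid[0])
    # Clamp to the valid range so border points stay inside the grid.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def point_features(fine_maps, coarse_maps, u, v):
    """Concatenate fine-grained and coarse features at normalized point (u, v)."""
    features = []
    for channel in fine_maps + coarse_maps:
        h, w = len(channel), len(channel[0])
        features.append(bilinear_sample(channel, u * (w - 1), v * (h - 1)))
    return features


# One fine-grained channel and two coarse class-probability channels (toy values).
fine = [[[0.0, 2.0], [4.0, 6.0]]]
coarse = [
    [[0.2, 0.4], [0.2, 0.4]],
    [[0.8, 0.6], [0.8, 0.6]],
]
features = point_features(fine, coarse, u=0.5, v=0.5)
```

Because the features are interpolated, points do not have to sit on the pixel grid of either map, which is what lets the same point head work at any output resolution.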

Combined Model (FPN + PointRend) Flow

Now that we have understood the main tasks of the PointRend model, let us look at the flow of the complete model.

Fig 4: PointRend model process flow (Image source [2])
Fig 5: PointRend model upsampling and point-wise prediction demo for a 4*4 coarse prediction vector (Image source [2])
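One refinement step of the kind illustrated in Fig 5 can be sketched as follows: upsample the coarse prediction by 2x, find the N most uncertain locations, and overwrite only those with fresh point-wise predictions. This is a simplified sketch, not the Detectron2 implementation. It uses nearest-neighbor upsampling instead of bilinear interpolation to stay short, two classes instead of five, and `point_head` is a hypothetical stand-in for the trained point-wise predictor.

```python
def upsample_nearest_2x(prob_map):
    """Double the resolution of an H x W grid of per-class probability lists."""
    out = []
    for row in prob_map:
        wide = [cell for cell in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # separate row object for the duplicated line
    return out


def margin(probs):
    """Gap between the two most likely classes (small gap = uncertain)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def refine_step(prob_map, point_head, n_points):
    """Upsample 2x, then re-predict only the n_points most uncertain cells."""
    upsampled = upsample_nearest_2x(prob_map)
    coords = [(r, c) for r in range(len(upsampled)) for c in range(len(upsampled[0]))]
    coords.sort(key=lambda rc: margin(upsampled[rc[0]][rc[1]]))
    for r, c in coords[:n_points]:
        upsampled[r][c] = point_head(r, c)
    return upsampled


confident = [0.9, 0.1]
unsure = [0.5, 0.5]
coarse = [[confident, unsure], [confident, confident]]


def toy_point_head(row, col):
    """Hypothetical point head that always decides for the second class."""
    return [0.1, 0.9]


refined = refine_step(coarse, toy_point_head, n_points=4)
```

Repeating this step doubles the resolution each time while spending computation only on the uncertain cells, which are typically the ones near class boundaries.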


For model training, we used Facebook’s Detectron2 library. Training was done on an NVIDIA Titan Xp GPU with 12GB of VRAM and ran for 100,000 (1 lakh) steps with an initial learning rate of 0.00025. The best validation IoU was obtained at the 30,000th step. The Detectron2 FPN + PointRend model outperformed the U-Net model on all classes. Below are some of the predictions from both models. As you can see, the Detectron2 model was able to distinguish the features of the greenery and water classes, whereas U-Net failed in almost all cases. The boundary predictions of the Detectron2 model are also far better than U-Net’s.

Fig 6: Sample predictions from the U-Net and Detectron2 models. For each image, the left is the prediction from the U-Net model, the middle is the original RGB image, and the right is the prediction from the Detectron2 model
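Since validation IoU is the metric used to pick the best checkpoint, it may help to see how a per-class IoU can be computed from a predicted and a ground-truth class map. The sketch below is a plain-Python illustration with hypothetical names and toy data; it is not the evaluation code used for the results above.

```python
def per_class_iou(pred, target, num_classes=5):
    """Return IoU per class, or None for classes absent from both maps."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Correct pixel: counts once in both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Wrong pixel: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return [i / u if u else None for i, u in zip(intersection, union)]


# Toy 2 x 2 example with class indices 0 (greenery) and 2 (water).
target = [[0, 0], [2, 2]]
pred = [[0, 2], [2, 2]]
ious = per_class_iou(pred, target)
```

Here one greenery pixel is mistaken for water, so greenery scores 1/2 and water scores 2/3, while the three classes that never appear are reported as None instead of being counted as zero.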


In this blog, we have seen how the Detectron2 FPN + PointRend model performs segmentation on an input image. The PointRend model can be applied as an extension to any image segmentation task to get better predictions at class boundaries. As further steps to improve accuracy, we can enlarge the training dataset, apply more augmentations to the images, and tune hyperparameters such as learning rate, decay, and thresholds.


[1] Feature Pyramid Network for Multi-Class Land Segmentation





Affine is a provider of analytics solutions, working with global organizations solving their strategic and day to day business problems