Introduction to Mask RCNN

Mask RCNN is a simple, flexible, and general framework for object instance segmentation.

Before we move on to Mask RCNN, let’s understand instance segmentation. Instance segmentation is the task in which the model detects and delineates each distinct object of interest that appears in an image. The model assigns every pixel that belongs to a known object both a class and an object instance, so two objects of the same class get separate masks. In other words, the labels are instance-aware.
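To make "instance-aware" concrete, here is a tiny sketch in plain Python. The toy image, the `instances` list and both helper functions are made up for illustration and are not part of any Mask RCNN code. It shows that collapsing the masks into one class map loses the information about which cat is which, while an instance map keeps it.

```python
# Two "cat" objects on a toy 4x6 image, each described by its own binary mask.
instances = [
    {"label": "cat", "mask": [[1, 1, 0, 0, 0, 0],
                              [1, 1, 0, 0, 0, 0],
                              [0, 0, 0, 0, 0, 0],
                              [0, 0, 0, 0, 0, 0]]},
    {"label": "cat", "mask": [[0, 0, 0, 0, 0, 0],
                              [0, 0, 0, 0, 1, 1],
                              [0, 0, 0, 0, 1, 1],
                              [0, 0, 0, 0, 0, 0]]},
]


def to_semantic_map(instances, height, width):
    """Collapse instance masks into one class label per pixel (identity is lost)."""
    semantic = [["background"] * width for _ in range(height)]
    for inst in instances:
        for y in range(height):
            for x in range(width):
                if inst["mask"][y][x]:
                    semantic[y][x] = inst["label"]
    return semantic


def to_instance_map(instances, height, width):
    """One id per pixel: 0 is background, 1..N index the individual objects."""
    ids = [[0] * width for _ in range(height)]
    for idx, inst in enumerate(instances, start=1):
        for y in range(height):
            for x in range(width):
                if inst["mask"][y][x]:
                    ids[y][x] = idx  # a later instance wins if masks overlap
    return ids


semantic = to_semantic_map(instances, 4, 6)
instance_ids = to_instance_map(instances, 4, 6)
print(sorted({v for row in semantic for v in row}))      # class labels only
print(sorted({v for row in instance_ids for v in row}))  # one id per object
```

The semantic map only knows "cat" and "background", whereas the instance map tells the two cats apart. That second kind of label is what an instance segmentation model has to predict.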

Example of instance segmentation (and also of Mask RCNN) –



This post assumes that the reader has some basic knowledge of deep learning, so I won’t explain CNNs and other deep learning fundamentals here. I will soon write another post to explain these things.

As the name suggests, Mask RCNN generates a “mask” for each object (see the image above). Mask RCNN extends the architecture of Faster RCNN by adding a branch that predicts an object mask in parallel with the existing bounding box prediction branch. Mask RCNN generalizes really well and provides a solid baseline for future research in instance segmentation.

Mask RCNN Network Architecture –

[Figure: Mask RCNN network architecture]

As we have seen above, Mask RCNN is built on top of Faster RCNN (follow the blog, because a post about it is coming really soon). Faster RCNN can use different backbones for its CNN feature extractor. Mask RCNN comes in 2 architectures: one based on Faster R-CNN with a vanilla ResNet backbone and one based on Faster R-CNN with a ResNet-FPN (Feature Pyramid Network) backbone. Using a ResNet-FPN backbone for feature extraction with Mask RCNN gives excellent gains in both accuracy and speed.

Mask RCNN –

Mask RCNN adopts the same two-stage procedure as Faster RCNN, with an identical first stage (the RPN, Region Proposal Network, which proposes candidate object boxes). In the second stage, in parallel to predicting the class and box offset, Mask R-CNN also outputs a binary mask for each RoI (Region of Interest).

Since Mask RCNN is a multi-task model, the loss on each sampled RoI is L = L(cls) + L(box) + L(mask).

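Note that it is the classification branch that predicts the class, not the mask branch! The mask branch outputs one binary mask per class, and L(mask) is computed only on the mask of the ground-truth class, so masks of different classes do not compete with each other. This decouples the job of classification and mask prediction. This formulation is key for good instance segmentation results.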
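The loss for a single RoI can be sketched in plain Python as below. This is only a minimal illustration and not the code from any Mask RCNN implementation. All function and parameter names are my own, and it assumes a foreground RoI, the log loss and smooth L1 loss that the paper takes over from Fast R-CNN, and a per-pixel sigmoid with average binary cross-entropy for the mask.

```python
import math


def softmax_cross_entropy(class_logits, gt_class):
    """L(cls): log loss of the ground-truth class under a softmax."""
    m = max(class_logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in class_logits))
    return log_sum - class_logits[gt_class]


def smooth_l1(box_pred, box_target):
    """L(box): smooth L1 loss summed over the 4 box offsets."""
    total = 0.0
    for p, t in zip(box_pred, box_target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def mask_loss(mask_logits, gt_class, gt_mask):
    """L(mask): average per-pixel binary cross-entropy on ONE class mask.

    mask_logits[k] is the m x m grid of logits predicted for class k.
    Only the grid of the ground-truth class is used; all others are ignored.
    """
    total, count = 0.0, 0
    for row_logits, row_gt in zip(mask_logits[gt_class], gt_mask):
        for z, y in zip(row_logits, row_gt):
            # Numerically stable sigmoid cross-entropy on a logit z.
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


def mask_rcnn_roi_loss(class_logits, gt_class, box_pred, box_target,
                       mask_logits, gt_mask):
    """L = L(cls) + L(box) + L(mask) for one foreground RoI."""
    return (softmax_cross_entropy(class_logits, gt_class)
            + smooth_l1(box_pred, box_target)
            + mask_loss(mask_logits, gt_class, gt_mask))


# Toy RoI: 2 classes, a 2x2 mask, ground-truth class 0.
gt_mask = [[1, 0], [1, 1]]
masks_a = [[[2.0, -1.0], [0.5, 3.0]], [[0.0, 0.0], [0.0, 0.0]]]
masks_b = [[[2.0, -1.0], [0.5, 3.0]], [[9.0, -9.0], [9.0, 9.0]]]
# Changing the mask predicted for class 1 does not change the mask loss.
print(mask_loss(masks_a, 0, gt_mask) == mask_loss(masks_b, 0, gt_mask))
```

Because `mask_loss` only looks at `mask_logits[gt_class]`, the mask branch never has to decide which class a pixel belongs to. That decision is left entirely to the classification branch.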

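Another important contribution of the paper is ROI Align. ROI Pooling rounds (quantizes) the ROI coordinates to the feature map grid, which misaligns the extracted features with the input. ROI Align removes this quantization and instead uses bilinear interpolation to sample feature values at exact locations, which keeps the features properly aligned with the input.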
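Here is a rough sketch of the idea behind ROI Align in plain Python. It is not taken from any library. The function names, the 2x2 output size, the average aggregation and the convention that pixel centers sit at integer coordinates are all assumptions made for this example. Each output bin averages a few bilinear samples taken at fractional positions, and nothing is rounded to the grid.

```python
import math


def bilinear_sample(feature, y, x):
    """Read a 2D feature map (list of rows) at a fractional (y, x) location."""
    h, w = len(feature), len(feature[0])
    # Clamp to the valid range so border samples stay inside the map.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool an RoI into out_size x out_size bins without any rounding.

    roi = (y1, x1, y2, x2) in feature-map coordinates, kept as floats.
    Each bin averages samples x samples regularly spaced bilinear samples.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    yy = y1 + (i + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear_sample(feature, yy, xx)
            row.append(total / (samples * samples))
        out.append(row)
    return out


# Toy feature map whose value equals the x coordinate.
ramp = [[0.0, 1.0, 2.0, 3.0] for _ in range(4)]
pooled = roi_align(ramp, (0.5, 0.5, 2.5, 2.5))
print(pooled)
```

On this ramp the two output columns come out centered at x = 1.0 and x = 2.0, which are the true centers of the two bins of the fractional RoI. Rounding the RoI to whole cells first, as ROI Pooling does, would shift those centers.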


Play with Mask RCNN code –

First, clone this GitHub repository

> git clone

then go inside the repository

> cd Mask_RCNN

set up the project

> pip3 install -r requirements.txt

> python3 setup.py install --user

Download pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.

(Optional) To train or test on MS COCO, install pycocotools from one of these repos. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn’t seem to be active anymore).

Windows: You must have the Visual C++ 2015 build tools on your path (see the repo for additional details)

now start jupyter notebook

> jupyter notebook

then open your browser and go to this link –



What next?

There is so much cool stuff you can do with Mask RCNN.

  1. Check out the amazing projects people made using Mask RCNN –
  2. Human Pose Estimation
  3. You can also solve this Kaggle problem if you are interested –


References –

  1. Mask RCNN paper
