Quantization
Now comes the part where we compress the YOLOv3 model. This compression is needed to fit the model on the PYNQ-Z2 board, as I have said a million times, I know... Specifically, we will run a process called quantization, which reduces the bit-width of the weights and activations of the Neural Network. For the PYNQ-Z2 we have to limit these values to 8 bits, since that's the maximum width of the channel connecting the PL to the PS.
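To make the idea of reducing the bit-width more concrete, here is a minimal sketch of 8-bit quantization in plain Python. It assumes a symmetric scheme with a power-of-two scale, which is only an illustrative choice. The function names are hypothetical, and the algorithm DECENT_Q uses internally may differ.

```python
import math

def choose_fraction_bits(values, bit_width=8):
    """Pick how many fractional bits keep the largest value in the signed range."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    # Bits needed for the integer part of the largest magnitude (sign bit excluded).
    int_bits = math.floor(math.log2(max_abs)) + 1
    return bit_width - 1 - int_bits

def quantize(values, bit_width=8):
    """Map floats to signed integers of the given width using a power-of-two scale."""
    frac_bits = choose_fraction_bits(values, bit_width)
    scale = 2.0 ** frac_bits
    lowest, highest = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    # Round to the nearest integer and clip to the representable range.
    quantized = [min(highest, max(lowest, round(v * scale))) for v in values]
    return quantized, frac_bits

def dequantize(quantized, frac_bits):
    """Recover approximate float values from the quantized integers."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in quantized]

weights = [0.5, -1.25, 0.03]  # made-up example values
quantized, frac_bits = quantize(weights)
restored = dequantize(quantized, frac_bits)
```

Comparing `weights` with `restored` shows the price of quantization: values that do not land exactly on the 8-bit grid come back slightly changed, which is why the quantizer needs real images to calibrate against.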
To let DNNDK quantize the YOLO model, we need to configure a quantization script. First I will go over a few specific parameters, then I will show the rest of the script and explain how to complete it.
The first thing to know is the names of the inputs and outputs of this network. To find them, we will use DECENT_Q to run an inspection of the model, which gives us the required information. Run this command in the terminal, changing the path of the model to match your setup:
You should get an output similar to the one I got. Remember to ignore the numpy warnings; they only appear because we are using an older version.
As you can see, the inspection found one input called "input_1" and 3 outputs called "conv2d_59/BiasAdd", "conv2d_67/BiasAdd" and "conv2d_75/BiasAdd". The second part of the image shows the inspection of Tiny YOLO, a simplified version of YOLOv3. You can clearly see that it has fewer outputs, emphasizing the fact that this Neural Network is smaller than the YOLO we will be working on.
From the inspection output you can also see that the variable "shapes" has question marks in its first three fields. What does that mean? It means those dimensions are arbitrary in size, so you can choose whatever value you want. The first value is the number of images processed simultaneously by the Network, the second is the height of the image, the third is the width of the image and the last is the number of channels. Three channels represent an RGB image. The next image illustrates this.
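As a small illustration of what the question marks mean, the sketch below fills the unknown dimensions of such a shape with concrete values. The helper name is hypothetical, `None` stands in for a question mark, and the 416x416 size with a batch of one is just an assumed example.

```python
def resolve_shape(inspected_shape, batch=1, height=416, width=416):
    """Replace unknown (None) dimensions of a (batch, height, width, channels) shape."""
    n, h, w, c = inspected_shape
    return (
        batch if n is None else n,
        height if h is None else h,
        width if w is None else w,
        c,  # the channel count is fixed by the model
    )

# Shape as reported by the inspection: three unknown fields and 3 channels.
shape = resolve_shape((None, None, None, 3))

# Number of values the network receives for one batch of this shape.
values_per_batch = shape[0] * shape[1] * shape[2] * shape[3]
```

With these assumed values, `shape` becomes `(1, 416, 416, 3)`: one RGB image of 416 by 416 pixels.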
The next thing we need is a set of images containing objects, together with annotations about those objects. In other words, we need a set in which each image is associated with information about its objects and their bounding boxes. This is what we call a Dataset. The best approach here is to use an existing Dataset with a lot of images and annotations, as the results of the network will have a better chance of being accurate. One of the most famous Datasets is COCO (Common Objects in COntext). It has 80 classes (80 types of possible objects) and annotations for more than 118 thousand images! For this application this Dataset will be more than enough.
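To give an idea of how images and annotations are tied together, here is a minimal sketch that groups bounding boxes by image. It relies on the layout of the COCO instances JSON (top-level "images", "annotations" and "categories" lists, with each box stored as [x, y, width, height]). The function name is hypothetical and the sample entry is made up for illustration, not taken from the real Dataset.

```python
from collections import defaultdict

def boxes_by_image(coco):
    """Group COCO annotations by image file name as (class, x1, y1, x2, y2)."""
    class_names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores the top-left corner plus the size
        grouped[file_names[ann["image_id"]]].append(
            (class_names[ann["category_id"]], x, y, x + w, y + h)
        )
    return dict(grouped)

# Tiny made-up sample with the same structure as the annotations file.
sample = {
    "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 18, "name": "dog"}],
    "annotations": [{"image_id": 1, "category_id": 18, "bbox": [10.0, 20.0, 30.0, 40.0]}],
}
boxes = boxes_by_image(sample)
```

For the real file you would pass the result of `json.load` on the annotations JSON instead of `sample`. Keep in mind that this file is large, so loading it takes a while.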
You can download the COCO dataset from the Downloads tab of the official website. You want the "2017 Train images [118k/18GB]" and also the "Train/Val annotations [241MB]". You can use these links if you want:
The Dataset is about 19GB in size, so get ready to wait for the download and to make your computer's storage suffer.
After the painfully long download is complete, I recommend you store the images and annotations in a folder called "yolo_dataset". Inside this folder, put the images in a folder called "images" and the file called "instances_train2017.json" in a folder called "labels". The Dataset should be placed in the quantization folder of the repository!
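If you want to confirm the layout without opening the folders in a file manager, a small check like the one below can help. It is only a sketch: the function name is hypothetical, and the path you pass in is whatever location your quantization folder has on your machine.

```python
from pathlib import Path

def check_dataset_layout(quantization_dir):
    """Return a list of problems found in the expected yolo_dataset layout."""
    root = Path(quantization_dir) / "yolo_dataset"
    expected = {
        "images folder": root / "images",
        "annotations file": root / "labels" / "instances_train2017.json",
    }
    # Only test for existence; the images folder is never listed.
    return [
        f"missing {label}: {path}"
        for label, path in expected.items()
        if not path.exists()
    ]
```

An empty list means both the images folder and the annotations file are where the rest of this chapter expects them. For example, `check_dataset_layout(".")` checks the current directory.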
No no no no, don't open the folder with the images of the Dataset! Ubuntu will get slow and the file manager will crash if you try to open it. There are too many images and the system doesn't handle them as well as Windows does.
Well, I think the important stuff is now covered and we can take a look at the quantization script. The following image gives you an idea of the parameters the script needs.
The first parameter, "input_frozen_graph", refers to the YOLOv3 model in TensorFlow. This model was copied to the quantization directory in the last chapter, but if that were not the case we could give the full path to its location here. The next parameters are "input_nodes" and "output_nodes", which correspond to the Neural Network input and output names we talked about earlier. In "input_shapes" we are just going to specify the size as 416x416, as it is very common, leaving the other values as default. The parameter "method" selects the quantization method, which can be non-overflow or min-diffs. The first might give worse results when there are outliers; the second is the default option and the one we will use. "input_fn" points to a function that pre-processes the Dataset images so they can be sent to the model in the right proportions. This function is based on the DNNDK examples, but I stole it directly from Wu-tianze's work as it is ready for our application.
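To see how the parameters described above fit together, here is a hedged sketch that assembles them into a `decent_q quantize` command line from Python. Treat the values as assumptions: "calibration.calib_input" is a hypothetical module and function name, the "?" in the shape stands for the arbitrary batch size, and I am assuming that min-diffs is selected with the value 1. Check the DNNDK User Guide and the script in the repository for the values actually used.

```python
import shlex

def build_quantize_command(params):
    """Turn a dict of parameter names and values into a decent_q argument list."""
    command = ["decent_q", "quantize"]
    for name, value in params.items():
        command += [f"--{name}", str(value)]
    return command

params = {
    "input_frozen_graph": "yolo.pb",
    "input_nodes": "input_1",
    "input_shapes": "?,416,416,3",  # batch size left arbitrary
    "output_nodes": ",".join(
        ["conv2d_59/BiasAdd", "conv2d_67/BiasAdd", "conv2d_75/BiasAdd"]
    ),
    "method": 1,  # assumed to mean min-diffs
    "input_fn": "calibration.calib_input",  # hypothetical name
}

command = build_quantize_command(params)
# shlex.join quotes the "?" so the shell does not treat it as a wildcard.
print(shlex.join(command))
```

Building the command as a list keeps each parameter separate, which makes it easy to spot a missing or misspelled node name before starting a run that takes hours.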
Here is the completed quantization script:
Before executing the quantization, you will have to open the calibration function and change the path to the training dataset images to match your location. I'm assuming you only need to change the user name, since I asked you to place the dataset in the right place.
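For readers curious about what a calibration function has to do, the sketch below covers two of its pieces in plain Python: picking the images for a given calibration step and computing a resize that keeps the image proportions. It assumes an aspect-ratio-preserving resize with padding, which is a common choice for YOLO. The function names are hypothetical, and the function in the repository may work differently. A real calibration function would also load the pixels and hand them to the quantizer under the input name "input_1".

```python
def batch_paths(image_paths, iteration, batch_size):
    """Select the image paths that belong to one calibration iteration."""
    start = iteration * batch_size
    return image_paths[start:start + batch_size]

def letterbox_geometry(width, height, target=416):
    """Compute the resized size and padding that fit an image into a square.

    Integer arithmetic is used so the result does not depend on float rounding.
    """
    if width >= height:
        new_w, new_h = target, height * target // width
    else:
        new_w, new_h = width * target // height, target
    pad_x = (target - new_w) // 2  # padding on the left side
    pad_y = (target - new_h) // 2  # padding on the top side
    return new_w, new_h, pad_x, pad_y

# Example: a 640x480 image placed inside a 416x416 input.
geometry = letterbox_geometry(640, 480)
```

Here the 640x480 image is scaled to fill the full width, and the remaining rows are split between the top and bottom padding, so objects keep their original proportions.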
All the necessary files are in the repository and you probably won't need to change anything else. Just make sure the TensorFlow model "yolo.pb", the calibration function and the dataset images are present in the quantization folder and you should be fine.
Finally, to execute the quantization, simply run this command:
The quantization process is very demanding and will take a long time, considering the Dataset is enormous. For me it took 12 hours to finish, but it might be faster depending on your computer's capabilities, especially if you have a GPU.
By the end of the process you have probably had a nice night of sleep, or you just went to ride a bike down the mountain. What matters is that there is a new folder called "quantize_results" containing two TensorFlow files that represent the quantized YOLOv3 model. We will only need "deploy_model.pb".
The next step is to compile the model into a "DPU language", so I suggest you copy "deploy_model.pb" to the compilation directory of the repository.
It is possible to use a GPU to make the quantization much faster, but you would need to install other dependencies. I recommend you check the DNNDK v3.1 User Guide to see how to do it. I apologize that I couldn't test the GPU way; I don't have one and I didn't have time.