Another learning project!
So this time I wanted to see how quickly I could make a CNN recognize obvious defects in products. I wanted to validate my own understanding of building CNNs and of the kinds of problems they can solve, and also to test whether I could do this with very little data and in just a few hours.
I was very happy to find out that the answer to both was - yes, can do!
The premise was quite simple - a bunch of coke cans, some with big scratches, some with missing tabs, some crumpled. The goal is to determine which cans are good and which are bad, so that e.g. an automatic quality assurance system could then flag the bad cans for removal.
I started by taking 50 photos of coke cans. I then randomly took five images out of the whole set; two intact cans and three broken ones.
These random cans were the final test cans. In a production case, five samples would definitely not be enough for the final testing, but they represent 10% of the 50 images I took and would have to do this time. Unlike in the randomly picked images above, the training and validation data also contained combinations such as tabless but unscratched, both crumpled and scratched, etc. I actually had only five different coke cans at hand, so I first took images of all of them intact and then started breaking them in various combinations. Also, they are all opened since, well, I had drunk them earlier!
After separating the test images and cropping and resizing the images I took, I augmented my data by a fair bit. Augmentation, in machine learning, means creating new samples by applying various transformations to existing samples.
I applied a random zoom and randomly flipped some of the samples horizontally.
I ended up with 450 images - ten transformed images per one original image, minus the final test images which, at this point, were completely separated from the data I used for training and tuning the network.
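To make the augmentation step concrete, here is a minimal sketch of a random zoom plus a random horizontal flip. It is not the code from augment.py (which uses OpenCV); it is a NumPy-only illustration with a nearest-neighbour resize. The function names, the zoom range of 1.0 to 1.3 and the 50% flip probability are my assumptions, and the input is a random placeholder array rather than a real photo.

```python
import numpy as np

def random_zoom(image, rng, max_zoom=1.3):
    """Zoom into the image centre by a random factor, keeping the output size.

    max_zoom is an assumed value, not taken from the original project.
    """
    h, w = image.shape[:2]
    factor = rng.uniform(1.0, max_zoom)
    crop_h, crop_w = int(round(h / factor)), int(round(w / factor))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    crop = image[top:top + crop_h, left:left + crop_w]
    # Nearest-neighbour resize back to (h, w) by looking up source indices.
    rows = np.arange(h) * crop_h // h
    cols = np.arange(w) * crop_w // w
    return crop[rows][:, cols]

def augment(image, rng, copies=10, flip_prob=0.5):
    """Create `copies` transformed variants of one image."""
    samples = []
    for _ in range(copies):
        sample = random_zoom(image, rng)
        if rng.random() < flip_prob:
            sample = sample[:, ::-1]  # flip left-right
        samples.append(sample)
    return np.stack(samples)

rng = np.random.default_rng(0)
# Placeholder standing in for one cropped and resized photo (height 200, width 290).
original = rng.integers(0, 256, size=(200, 290, 3), dtype=np.uint8)
augmented = augment(original, rng)
print(augmented.shape)
```

With ten variants per original and 45 originals left after removing the five test images, this gives the 450 images mentioned above.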
My final model was this:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

model = Sequential()
# Five convolution + max-pooling stages with a growing number of filters
model.add(Conv2D(32, 5, padding="same", activation="relu", input_shape=(200, 290, 3)))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(32, 3, padding="same", activation="relu"))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(64, 3, padding="same", activation="relu"))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(128, 3, padding="same", activation="relu"))
model.add(MaxPool2D((2, 2)))
model.add(Conv2D(256, 3, padding="same", activation="relu"))
model.add(MaxPool2D((2, 2)))
# Dropout before the classifier head, then a single sigmoid output (good vs. bad)
model.add(Dropout(0.6))
model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
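As a quick sanity check on the architecture, the sketch below traces how the feature map shrinks through the five pooling stages. It relies on two Keras behaviours: convolutions with padding="same" keep the spatial size, and a 2x2 max pool with default settings halves each dimension, rounding down. The helper name is my own, and this is plain arithmetic rather than something read from the actual model.

```python
def pooled_size(size, pool=2):
    """Spatial size after a 2x2 max pool with default stride and 'valid' padding."""
    return size // pool

height, width = 200, 290           # input_shape=(200, 290, 3)
filters = [32, 32, 64, 128, 256]   # filters of the five Conv2D layers

shapes = []
for f in filters:
    # The "same"-padded convolution keeps (height, width); the pool halves them.
    height, width = pooled_size(height), pooled_size(width)
    shapes.append((height, width, f))

flattened = height * width * filters[-1]   # input size of Dense(512)
dense_params = flattened * 512 + 512       # weights + biases of Dense(512)

for shape in shapes:
    print(shape)
print(flattened, dense_params)
```

Most of the trainable parameters end up in the first dense layer, which is one reason a fairly heavy dropout right before it is plausible for a data set this small.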
Trained with the standard adam optimizer for 20 epochs.
I split the augmented data in two: 80% (360 images) for training and 20% (90 images) for validation, which I used for optimizing the network.
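The split itself can be sketched as a shuffled index split. This is a minimal illustration, assuming the 450 augmented images sit in one array and are split at random; the function name and the seed are hypothetical and not taken from the repository.

```python
import numpy as np

def train_val_split(n_samples, val_fraction=0.2, seed=0):
    """Return shuffled (train_indices, val_indices) for n_samples items."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = train_val_split(450)
print(len(train_idx), len(val_idx))
```

The index arrays can then be used to slice both the image array and the label array, so images and labels stay aligned.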
Then I combined the training set, validation set and the five image test set and ran predictions for them all. Perfect confusion matrix!
I also tried to run the same thing without augmentation. The resulting confusion matrix was this:
Still quite good, I'd say, considering the data size, but for production this rate of false negatives would be far too high.
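For reference, this is roughly how a 2x2 confusion matrix and a false negative rate come out of sigmoid outputs. It is a NumPy-only sketch, not the code in stats.py (which relies on scikit-learn). The labels and probabilities are made-up toy values, not my actual results, and I am assuming here that label 1 means a defective can, so a false negative is a bad can that slips through as good.

```python
import numpy as np

def confusion_matrix_2x2(y_true, y_prob, threshold=0.5):
    """Rows are true labels, columns are predicted labels (0 = good, 1 = defective)."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    matrix = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Toy values for illustration only.
y_true = [0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.7, 0.9, 0.4, 0.8, 0.6]

cm = confusion_matrix_2x2(y_true, y_prob)
tn, fp, fn, tp = cm.ravel()
false_negative_rate = fn / (fn + tp)
print(cm)
print(false_negative_rate)
```

The row and column layout matches the convention scikit-learn uses, so the numbers can be compared directly.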
In the end, I spent something like 8 hours on this and only 50 photos were needed.
There are a few potential pitfalls - there could simply be too little data to draw good conclusions from. The network could fail with a larger sample set. Also, since I only took images of five different cans, it could be that there's a bias introduced from that. I may have held the camera differently for good and bad cans, which could have screwed the samples up. And I may have coded the confusion matrix generation wrong, in which case my model isn't actually as good as I think it is. But eh, it is what it is - I think this is enough to show that the above approach can work. Pun intended.
My initial network model was copy pasted from here: https://www.analyticsvidhya.com/blog/2020/10/create-image-classification-model-python-keras/
And the idea for augmentation came from here: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Overall I am very satisfied with this result.
The code, with the cropped images of the coke cans, is available on GitHub: https://github.com/tzaeru/cnn-quality-assurance
augment.py generates augmented data from the images in the
The true-validation directory has the images I separated early on from the rest to do some final testing with.
stats.py generates the confusion matrix and gives some stats.
Unfortunately I ended up using OpenCV for the augmentation and it's quite a large dependency, sorry about that. Other than that you would need Python 3, Keras, NumPy and scikit-learn.