Build a Melanoma Image Classification Model with a Convolutional Neural Network (CNN)

Mary Adewunmi
5 min read · Jun 10, 2021

Melanoma, also called malignant melanoma, is a type of skin cancer that develops from the pigment-producing cells known as melanocytes. Melanomas typically occur in the skin but may rarely occur in the mouth, intestines, or eye (uveal melanoma). (Source: Wikipedia)
Symptoms might include a new, unusual growth or a change in an existing mole. Melanomas can occur anywhere on the body.
Treatment may involve surgery, radiation, medication or in some cases, chemotherapy.

📌 Task at hand
This notebook identifies melanoma in images of skin lesions. The task is to predict the probability (a floating-point value) that the lesion in an image is malignant (the target).

The datasets can be found here.

The complete notebook can be found here.

📌 Importing the necessary libraries

📌 The datasets

The datasets were made publicly available on Kaggle and can be found here. They are provided by the International Skin Imaging Collaboration (ISIC), an international initiative to improve melanoma diagnosis that is sponsored by the International Society for Digital Imaging of the Skin (ISDIS). The ISIC Archive houses the world’s largest collection of high-resolution dermoscopic photographs of skin lesions. Contributors to the images include:

1. Dermatology Service, Melanoma Unit, Hospital Clínic de Barcelona, IDIBAPS, Universitat de Barcelona, Barcelona, Spain
2. Memorial Sloan Kettering Cancer Center New York, NY
3. Department of Dermatology, Medical University of Vienna. Vienna, Austria
4. Melanoma Institute Australia. Sydney, Australia
5. The University of Queensland, Brisbane, Australia
6. Department of Dermatology, University of Athens Medical School

The dataset has 9 classes of skin lesions: pigmented benign keratosis, melanoma, vascular lesion, actinic keratosis, squamous cell carcinoma, basal cell carcinoma, seborrheic keratosis, dermatofibroma, and nevus.

📌 Splitting datasets into train and validation data using Keras image data generator
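
The notebook's own code for this step is not shown here, so what follows is only a minimal sketch of one way to do the split. It assumes the images sit in one sub-folder per class and uses Keras's `image_dataset_from_directory` utility rather than the older `ImageDataGenerator` class. The function name `load_train_val`, the 80/20 split, the image size, the batch size, and the seed are illustrative assumptions, not values taken from the notebook.

```python
def load_train_val(data_dir, image_size=(180, 180), batch_size=32,
                   val_fraction=0.2, seed=123):
    """Return (train_ds, val_ds) built from a folder with one sub-folder per class."""
    # Imported here so TensorFlow is only needed when the function actually runs.
    import tensorflow as tf

    common = dict(
        validation_split=val_fraction,  # fraction of images held out for validation
        seed=seed,                      # same seed in both calls keeps the two subsets disjoint
        image_size=image_size,          # every image is resized to this (height, width)
        batch_size=batch_size,
    )
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="training", **common)
    val_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="validation", **common)
    return train_ds, val_ds
```

Both datasets expose a `class_names` attribute, inferred from the sub-folder names, which is handy for labelling plots later on.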

📌 Visualizing the classes of skin lesion images in the dataset

The figure above shows only 5 of the 9 classes of lesion images. The images were picked from the training data, which was split randomly rather than sorted by class, so some classes are bound to repeat in a small sample while others do not appear at all.

📌 Prefetching and Autotuning of Images as part of Image preprocessing

Prefetching overlaps data loading with training: while the model trains on one batch, the next batch is already being read from disk, so training does not stall on I/O. Autotuning lets the input pipeline pick the prefetch buffer size at run time, so that loading a large dataset does not become a bottleneck while training the model.
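
As a rough illustration, prefetching and autotuning can be expressed with the `tf.data` API as below. This is a sketch under assumptions: the helper name `configure_for_performance` is made up for this example, and the `cache()` step is an optional extra that the text above does not mention.

```python
def configure_for_performance(ds):
    """Cache and prefetch a tf.data.Dataset of image batches."""
    # Imported here so TensorFlow is only needed when the function actually runs.
    import tensorflow as tf

    return (
        ds.cache()                      # optional: keep loaded images in memory after the first epoch
          .prefetch(tf.data.AUTOTUNE)   # prepare upcoming batches while the current one is in use
    )
```

The same helper would be applied to both the training and the validation dataset before calling `fit`.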

📌 Normalization

Image normalization is a process, often used in the preparation of data sets for artificial intelligence (AI), in which multiple images are put into a common statistical distribution in terms of size and pixel values; however, a single image can also be normalized within itself.

This is a very important part of preprocessing, because some images have large pixel intensity values and the model may struggle to extract features from them. Another challenge is that colour images carry several channels, which makes them more computationally intensive to work with.
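
To make the two ideas concrete, here is a small NumPy sketch (not the notebook's code). The first function rescales 8-bit pixel values to a common range, and the second normalizes a single image within itself. The function names and the tiny example array are assumptions made for illustration.

```python
import numpy as np

def rescale_to_unit_range(image):
    """Map 8-bit pixel values (0..255) to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0

def standardize(image, eps=1e-7):
    """Shift and scale one image to zero mean and (roughly) unit variance."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + eps)

pixels = np.array([[0, 128, 255]], dtype=np.uint8)  # made-up pixel values
scaled = rescale_to_unit_range(pixels)
print(scaled.min(), scaled.max())  # 0.0 1.0
```

In a Keras model the first variant is usually done with a `Rescaling(1./255)` layer, so the same scaling is applied at training and prediction time.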

📌 Training the Model with CNN

A CNN has three main types of layers: convolutional, pooling, and fully connected. Dropout layers are added to reduce overfitting, and activation functions add non-linearity to the model. More on CNNs can be found here.
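
A minimal Keras sketch of such a network is shown below. It is not the notebook's exact architecture: the function name `build_cnn`, the filter counts, the 180x180 input size, the dropout rate, and the optimizer are all assumptions chosen for illustration.

```python
def build_cnn(num_classes=9, image_size=(180, 180), dropout_rate=0.25):
    """Small CNN: three conv/pool stages, dropout, then fully connected layers."""
    # Imported here so TensorFlow is only needed when the function actually runs.
    import tensorflow as tf
    layers = tf.keras.layers

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(*image_size, 3)),
        layers.Rescaling(1.0 / 255),                               # pixel values to [0, 1]
        layers.Conv2D(16, 3, padding="same", activation="relu"),  # convolution + ReLU activation
        layers.MaxPooling2D(),                                     # pooling halves height and width
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(dropout_rate),                              # randomly zero some activations during training
        layers.Flatten(),
        layers.Dense(128, activation="relu"),                      # fully connected layer
        layers.Dense(num_classes, activation="softmax"),           # one probability per class
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_cnn().summary()` prints the layers with their output shapes and parameter counts.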


📌 Visualizing the model’s parameters
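
The parameter counts that a Keras `model.summary()` reports can be checked by hand. The sketch below does the arithmetic for a hypothetical stack, not the notebook's actual model: a 180x180 RGB input, three 3x3 convolutions with 16, 32, and 64 filters, each followed by 2x2 max pooling, then dense layers with 128 and 9 units.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (in_units + 1) * out_units

conv_total = (conv2d_params(3, 3, 16)
              + conv2d_params(3, 16, 32)
              + conv2d_params(3, 32, 64))

side = 180 // 2 // 2 // 2        # 180 -> 90 -> 45 -> 22 after three poolings
flat_units = side * side * 64    # size of the flattened feature map
dense_total = dense_params(flat_units, 128) + dense_params(128, 9)

print(conv_total, dense_total, conv_total + dense_total)
# 23584 3966217 3989801
```

Almost all of the parameters sit in the first dense layer, which is why shrinking the feature map or the dense layer is an effective way to cut model size.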

The accuracy of the model changes from run to run because the training images are picked at random; the accuracy depends on which images end up in the training data.

📌 Model accuracy

Visualizing the model performance below shows that the model is overfitting: the training loss keeps decreasing but the validation loss does not. The model has memorized patterns in the training data instead of learning ones that generalize.
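
The same check can be done numerically. The sketch below uses made-up loss curves, not results from the notebook; in Keras the two lists would come from `history.history["loss"]` and `history.history["val_loss"]`. The helper name `overfitting_gap` is an assumption for this example.

```python
def overfitting_gap(train_loss, val_loss):
    """Return (best_epoch, gap).

    best_epoch is the index of the lowest validation loss; gap is how far the
    validation loss sits above the training loss at the final epoch.
    """
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
    gap = val_loss[-1] - train_loss[-1]
    return best_epoch, gap

# Made-up loss curves for illustration only.
train_loss = [1.90, 1.40, 1.00, 0.70, 0.45, 0.30]
val_loss = [1.95, 1.60, 1.45, 1.50, 1.70, 1.95]

best_epoch, gap = overfitting_gap(train_loss, val_loss)
print(best_epoch, round(gap, 2))  # 2 1.65
```

Here the validation loss bottoms out at epoch index 2 and then climbs while the training loss keeps falling, which is the overfitting pattern described above.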

📌 Ways to boost performance include one or more of the following:

👉 Reduce the number of layers in the neural network

👉 Add dropout and tune its rate ✔

👉 Apply L2 regularization and tune the lambda value ✔

👉 Reduce the number of neurons in each layer to cut the number of parameters

👉 Add more training data. This is not always economical because of storage costs; a workaround is to augment the images
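
To show what the two ticked items in the list above do, here is a small NumPy sketch of the underlying arithmetic. It is an illustration, not the notebook's code: the function names and example values are assumptions. In Keras the corresponding building blocks are `tf.keras.layers.Dropout` and `tf.keras.regularizers.l2`.

```python
import numpy as np

def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

def dropout(activations, rate, rng):
    """Inverted dropout: zero roughly a fraction `rate` of units, rescale the rest."""
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

weights = [np.array([[0.5, -1.0], [2.0, 0.0]])]  # made-up weight matrix
print(round(l2_penalty(weights, lam=0.01), 4))   # 0.0525

rng = np.random.default_rng(0)
dropped = dropout(np.ones(1000), rate=0.5, rng=rng)  # values are either 0.0 or 2.0
```

A larger lambda pushes the weights towards zero more strongly, and a higher dropout rate removes more units per training step; both need tuning against the validation loss.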

📌 Augmentation

We used a small dataset to save storage space and lower the cost of training, so we augment the images to give the model more varied examples.

More on augmentation can be found here.
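
As a rough idea of what augmentation does, the NumPy sketch below produces a randomly flipped and rotated copy of an image. It is a stand-in, not the notebook's code, which more likely relied on Keras layers such as `RandomFlip` and `RandomRotation`; the function name `augment` and the choice of transformations are assumptions.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and rotated copy of an (H, W, C) image array."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)        # horizontal flip
    if rng.random() < 0.5:
        image = np.flip(image, axis=0)        # vertical flip
    quarter_turns = int(rng.integers(0, 4))   # rotate by 0, 90, 180, or 270 degrees
    # For non-square images a 90 or 270 degree turn swaps height and width.
    return np.rot90(image, k=quarter_turns, axes=(0, 1)).copy()

rng = np.random.default_rng(0)
image = np.arange(4 * 4 * 3).reshape(4, 4, 3)  # made-up 4x4 RGB image
augmented = augment(image, rng)
```

Flips and right-angle rotations only rearrange pixels, so the lesion's class label stays valid for the augmented copy.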

To retrain the model with the augmented images, we apply:

👉 Reduce the number of layers in the neural network ✔

👉 Add more training data. This is not always economical because of storage costs; a workaround is to augment the images ✔

📌 View the model’s performance with augmented images

Accuracy/Loss

The model performance looks good: accuracy ranges between 60% and 65%, depending on the images randomly picked from the training data. It can still be improved with an EfficientNet model, which I will cover in a future post.

📌 Conclusion

Training on medical images can be an uphill task when the dataset is small, because the model tends to overfit; augmenting the available images helps to fix this. Accuracy can be improved further with preprocessing operations such as applying a filter to the images, by tuning the hyperparameters, or by using another network architecture such as EfficientNet.

The complete notebook can be found here.

📌 References

✔ Data curated from https://www.kaggle.com/c/siim-isic-melanoma-classification

https://en.wikipedia.org/wiki/Melanoma

https://neptune.ai/blog/data-augmentation-in-python

✔ Source: https://medium.com/techiepedia/binary-image-classifier-cnn-using-tensorflow-a3f5d6746697

https://www.upgrad.com/blog/basic-cnn-architecture/

If you found this tutorial useful, kindly appreciate it with a clap👏. Constructive criticism is welcome👍

More information about me.

Happy coding😉.

Mary Adewunmi

I am a data scientist and deep learning researcher with a focus on using deep learning and computer vision for medical image diagnosis.