While convolutional neural networks (CNNs) have been successfully applied to skin lesion classification, previous studies have generally considered only a single clinical/macroscopic image and produced only a binary decision. In this work, we presented a method that combines multiple imaging modalities with patient metadata to improve the performance of automated skin lesion diagnosis. We evaluated our method on a binary classification task, for comparison with previous studies, as well as on a five-class classification task representative of a real-world clinical scenario. We showed that our multimodal classifier outperforms a baseline classifier that uses only a single macroscopic image in both binary melanoma detection (AUC 0.866 vs 0.784) and multiclass classification (mAP 0.729 vs 0.598). In addition, we showed quantitatively that automated diagnosis of skin lesions achieves higher performance with dermatoscopic images than with macroscopic images. We performed our experiments on a new data set of 2917 cases, each of which contains a dermatoscopic image, a macroscopic image, and patient metadata.