This study focused on the possibility of implementing a vision-based architecture to monitor and detect the presence of trash or valuables in shared cars. The system was introduced to take pictures of the rear seating area of a four-door passenger car. Image capture was performed with a stationary wide-angled camera unit, and image classification was conducted with a prediction model in a remote server. For classification, a convolutional neural network (CNN) in the form of a fine-tuned VGG16 model was developed. The CNN yielded an accuracy of 91.43% on a batch of 140 test images. To determine the correlation among the predictions, a confusion matrix was used, and in addition, for each predicted image, the certainty of the distinct output classes was examined. The execution time of the system, from capturing an image to displaying the results, ranged from 5.7 to 17.2 s. Misclassifications from the prediction model were observed in the results primarily due to the variation in ambient light levels and shadows within the images, which resulted in the target items lacking contrast with their neighbouring background. Developments pertaining to the modularity of the camera unit and expanding the dataset of training images are suggested for potential future research.