If we have high resolution images, their respective tensor will also be very large, therefore resulting in longer computation times for the 1 on 1 comparison to other image tensors.įinally, we add the resulting resized tensor to our imgs_matrix list, and go on with the next file in our directory (if present).Ĭool - after we have written this function, we are left with a list of tensors for all images in our directory. We do this, to speed up the comparison process of our images. we want it to have a pixel width and height of 50. In our last step, we resize our image tensor to a pixel size of 50 i. We then check if it has been successfully converted to a numpy n-dimensional array, and make sure our tensor has 3 layers at maximum. If both outputs are valid, the function moves on to the next line, where we use the opencv library to decode our image file and convert it to a tensor: img = cv2.imdecode(np.fromfile(directory + filename, dtype=np.uint8), cv2.IMREAD_UNCHANGED) If it is not an image, this function will return None and therefore move on to the next file in the directory. We first check if the file is accessible with: os.path.isdir(directory + filename)Īnd then we check whether it is an image: imghdr.what(directory + filename) There is a very handy library called imghdr which can help us in this process. We read into our computer directory, iterate over all the files and make sure that the format of the files in our folder are actually images (i. This function will create a tensor for each image our algorithm finds in a specific folder. A matrix therefore is a 2-dimensional tensor. A tensor is a container which can hold data in N dimensions. Those of you that are familiar with machine learning and computer vision may already know that images can be translated into matrices, or more precisely into a tensor. In order to compare images to one another, we need to somehow translate them into comparable, computer-readable i. □ 1 | In Theory: Translating Images to Numeric Data View the difPy project on GitHub and on PyPi. In today’s article, I will go through the process of writing a Python 3.8 script for the automated search for duplicate images in a folder on your local computer. That’s when I thought: let’s automate this process. Furthermore, some images may be easily classifiable as being duplicates at first look, but some images may need precise checking and may also result in you deleting images, that in reality were no duplicates. Especially, as mentioned above, if you have to go through thousands of images. I have found myself in this scenario quite a few times. Did you ever find yourself in the situation of going through hundreds, maybe even thousands of images, only to realize that some actually look a bit “ too similar”? Could they be duplicates? Then you probably checked both image resolutions, to then delete the one having the lowest.
0 Comments
Leave a Reply. |