This blog post is the first in a series about OpenCV and Computer Vision.
Ever since my university days, I have been impressed by Computer Vision. The eccentric possibilities of the field pushed the boundaries of what was once only part of sci-fi productions like Star Trek (face recognition? biometrics? self-driving cars?). Not only that, but CV seems like the most tangible and streamlined way to introduce a newbie to ML (I’m not saying that it is in any way easy, but compared to other subdivisions it seems more… tactile). Now, being already deep into my Software Engineering career (going on 7 years strong), I sometimes get bored and want to learn new things, so why not go with Computer Vision?
Ok, so what is Computer Vision? Computer Vision is a fascinating discipline/category/subdivision of ML that combines artificial intelligence, image processing, and machine learning techniques in order to analyze and comprehend visual data similarly to how our own eyes and brains do it. We can utilize CV to recognize and classify objects and faces and to analyze images and videos; we can use it in robotics, automation, VR and AR (virtual and augmented reality), and medical imaging, to mention just a couple of areas of use.
A group of followers and believers call it the next benevolent revolution in technology, others call it the harbinger of the end of privacy, the death of selective information sharing as we know it. You can decide on your own where you stand, I am here to go through the moves in hopes we can learn a bit more about this fascinating field. So without further ado, let us begin.
OpenCV Setup
Pip and Venv
First things first: in order to leverage the vast potential of CV we need the hammer and the nail. In our case, the hammer is the Python programming language, elegant in syntax and powerful in features, and our nail (or vice versa?) is the beautiful open-source library called OpenCV. It provides a comprehensive set of features, algorithms, and functionality for various computer vision tasks. We will most probably also need some related Python libraries such as NumPy and SciPy. NumPy (Numerical Python) is a package that provides support for numerical operations in Python. SciPy (Scientific Python) is built on top of NumPy and provides additional high-level mathematical algorithms for scientific computing.
After the installation of Python, we’ll need to set up a development environment, and two tools called pip and venv will help us in this endeavor. pip is a recursive acronym for “Pip Installs Packages” and it is the default package manager for Python. Basically, it lets us install and manage packages (if you have worked in the web ecosystem, think of npm, only for Python). venv stands for virtual environments, and it is useful for creating isolated Python environments. Why? So each project can have its own set of packages and dependencies, to mention only the most important rationale. Thankfully I’m running Linux, where Python comes preinstalled, so by just checking the version in the terminal I know it is there. Installing all of these tools on other operating systems is a bit out of scope for this article, but it shouldn’t be too hard to do by following a dedicated tutorial.
mehmed@mehmed-Nitro-AN515-45:~/Projects/CVWork$ python3 --version
Python 3.10.6
The OpenCV packages
The next thing to do is to install the OpenCV library and some additional packages. Looking at the names of installed packages below you’ll notice all of the keywords we talked about in the last paragraph – Pip, NumPy, SciPy, and Venv.
mehmed@mehmed-Nitro-AN515-45:~/Projects/CVWork$ sudo apt install python3-opencv -y
mehmed@mehmed-Nitro-AN515-45:~/Projects/CVWork$ sudo apt install python3-pip python3-numpy python3-scipy python3-venv -y
The next thing to do is to create a new directory and a virtual environment in the same directory.
mehmed@mehmed-Nitro-AN515-45:~/Projects$ mkdir CVWork
mehmed@mehmed-Nitro-AN515-45:~/Projects$ cd CVWork/
mehmed@mehmed-Nitro-AN515-45:~/Projects/CVWork$ python3 -m venv test_env
mehmed@mehmed-Nitro-AN515-45:~/Projects/CVWork$ source test_env/bin/activate
(test_env) mehmed@mehmed-Nitro-AN515-45:~/Projects/CVWork$
The command python3 -m venv test_env creates a new virtual environment named test_env using the venv module. After creating the virtual environment we activate it using source test_env/bin/activate (Unix systems only!). Activating the environment sets up the environment variables and configures our machine’s PATH to use the virtual environment’s Python version and installed packages. From now on we can use pip inside our virtual environment (keep an eye out for the parentheses in our command prompt – they tell us which virtual environment we are using). Using the pip command we’ll now install the opencv-python module.
(test_env) mehmed@mehmed-Nitro-AN515-45:~/Projects/CVWork$ pip3 install opencv-python
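As a quick sanity check (my own suggestion, not part of the original setup steps), you can confirm that the active interpreter lives inside the virtual environment and that OpenCV imports cleanly:

```shell
# Print the prefix of the active interpreter; inside the venv this
# should point at .../CVWork/test_env rather than the system Python.
python3 -c "import sys; print(sys.prefix)"

# Confirm OpenCV is importable and report its version. The fallback
# message is printed if the package is not installed yet.
python3 -c "import cv2; print(cv2.__version__)" || echo "OpenCV not installed yet"
```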
I/O Functionality
After setting everything up, we can get to the meaty basics of OpenCV: reading images from files, video files, or live camera devices; writing images to image files; manipulating image data; and handling keyboard and mouse input. If we simplify CV down to the bare bones, we can think of it as a function – one that takes an input (let’s say an image) and produces an output (again, an image). Thinking of images first is easier, but we can also have other inputs such as videos and camera feeds, and other outputs such as images and videos, but also image/video processing results, semantic segmentations, object recognition results, and others.
The most common formats of images used are the usual suspects – the PNGs, JPEGs, TIFFs, and BMPs.
The image itself
In OpenCV an image is represented as a two-dimensional array of pixels. Each pixel contains information about the color or intensity at a specific location in the image. The image can be grayscale, in which every pixel represents a single intensity value, or in color in which each pixel contains multiple color channels (e.g. Red, Green, Blue). Now, essentially, behind the scenes an image in OpenCV is just a NumPy array with the proper shape (height, width, channels) where height and width represent the number of pixel rows/columns in the image, and channels represent the number of color channels. If the image below were in color the first pixel would have three values instead of only one, one for each color channel.
import numpy
import cv2

img = numpy.zeros((3, 3), dtype=numpy.uint8)
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
This tiny snippet above creates a 3×3 NumPy array consisting of zeroes using numpy.zeros. The additional argument dtype specifies that each pixel value is an 8-bit unsigned integer ranging from 0 to 255. The image is displayed using cv2.imshow, which takes a window title and the image itself. The last two lines wait for a keypress and then close all OpenCV windows. If we print this image we get the following result:
[[0 0 0]
 [0 0 0]
 [0 0 0]]
Additionally, we can convert this image into the BGR format (blue-green-red) using the cv2.cvtColor function. In this format each pixel is represented by a three-element array, with each integer representing one of the three color channels. By printing img.shape we retrieve the shape of the image (the height, width, and the number of color channels).
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
print(img)
print(img.shape)
[[[0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]]]
(3, 3, 3)
Changing image data
Let us go through a simple code snippet that should show us the most basic way how OpenCV works on images.
import cv2

img = cv2.imread("Lenna.png")

for x in range(50):
    for y in range(50):
        img[x, y] = [255, 255, 255]

for x in range(50, 100):
    for y in range(50):
        img.itemset((x, y, 1), 25)

img[100:150, :50, 2] = 25

slice = img[200:300, 200:300]
img[300:400, 300:400] = slice

cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
The first block of code above, consisting of the nested for loops, creates a white box using plain array/list manipulation. For performance reasons, this approach is only suitable for very small sections of an image.
The second block of code uses the itemset method to address a particular pixel and change a single color channel. img.itemset((x, y, 1), 25) will set channel 1 of the pixel at position (x, y) to 25 – in OpenCV’s BGR order, that is the green channel.
The third block of code uses NumPy’s array slicing to specify a whole range of indices at once. img[100:150, :50, 2] = 25 grabs rows 100 to 150 and columns 0 to 50, and sets their color channel 2 (corresponding to red in BGR order) to 25.
The last block of code binds a selection to a variable and then assigns it to another region, effectively cutting and pasting that slice of the image to another place in the image.
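One subtlety worth knowing when slicing like this: NumPy slices are views into the original array, not copies. The minimal sketch below (pure NumPy, no image file needed, and not part of the original snippet) demonstrates the difference:

```python
import numpy as np

# A stand-in for an image: a 4x4 single-channel array of zeroes.
img = np.zeros((4, 4), dtype=np.uint8)

# Slicing does NOT copy the data: roi is a view into img's memory.
roi = img[0:2, 0:2]
roi[:] = 255            # writing through the view modifies img too
print(img[0, 0])        # 255

# To get an independent snapshot of a region, copy it explicitly.
snapshot = img[0:2, 0:2].copy()
snapshot[:] = 7         # this change does NOT touch img
print(img[0, 0])        # still 255
```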
Working with videos
When it comes to working with videos, we can use the two classes that OpenCV provides us – VideoCapture and VideoWriter. Below I’ve written a small snippet of code, again showing the most basic functionality. The snippet will grab a screenshot every time we click the mouse.
import cv2

videoCapture = cv2.VideoCapture('p4g.mkv')

fps = videoCapture.get(cv2.CAP_PROP_FPS)
total_frames = int(videoCapture.get(cv2.CAP_PROP_FRAME_COUNT))
size = (int(videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))

print("FPS", fps)
print("total_frames", total_frames)
print(size)

duration = total_frames / fps
print(duration)

saved_frame = False

def handle_key_press(event, x, y, flags, param):
    global saved_frame
    if event == cv2.EVENT_LBUTTONDOWN:
        saved_frame = True

cv2.namedWindow('Video')

while videoCapture.isOpened():
    cv2.setMouseCallback('Video', handle_key_press)
    success, frame = videoCapture.read()
    if not success:
        break
    cv2.imshow('Video', frame)
    if saved_frame:
        cv2.imwrite('ss.png', frame)
        print("Frame Saved")
        saved_frame = False
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

videoCapture.release()
cv2.destroyAllWindows()
The first couple of lines of code do not contain any logic but they are useful as they show us how to retrieve characteristics of the video. The video itself is a cute little video of one of my favorite games Persona 4 Golden.
By using the .get() method, which returns a value for the specified property, we can for example get the FPS of the video, which in our case is roughly 30 (29.97 to be exact). Using videoCapture.get(cv2.CAP_PROP_FRAME_COUNT) gets us the total number of frames – 544. We can calculate the length of the video by dividing the number of frames by the frames per second, which tells us our video is close to 18 seconds long. Additionally, by retrieving cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT we can get the width and height of the video. The full list of properties can be viewed in the official documentation.
FPS 29.97
total_frames 544
(640, 360)
18.151484818151484
Next, we create a boolean variable that is switched to true when we click the left mouse button. The real meat of the code snippet above is the while loop, during whose execution we register the mouse callback to the method we created. If saved_frame is true, we save the current frame using cv2.imwrite('ss.png', frame). This is a fairly simplistic example, but one that shows the possibilities of the library. The last block of code, if cv2.waitKey(30) & 0xFF == ord('q'):, listens for keypresses and allows us to terminate the video by pressing the ‘q’ key.
And this is our result (getting a screenshot of a frame from a video, you can try this on your videos)!
Conclusion
I hope that this article was a decent introduction to OpenCV. In the next posts we will do things that are a bit more complex, so I hope you subscribe! For more blog posts related to software development, AI, and tech, please check my blog on the following link.