“Computer vision algorithms enable machines to identify and classify objects, then react accordingly.”

Lecturer: Prof. Hamid Dehghani

F2F: 12 noon Wed, 3pm Fri; 10am Mon on Zoom (lab section)

Matlab-Based tutorial

Robotic Vision

Start with what vision is for: to know what is where, by looking. P15~16

prior knowledge (physics etc.) matters

We see things because light reflects off them.

Light of different frequencies reacts differently with materials, which is how we collect the information.

Humans perceive electromagnetic radiation with wavelengths 360-760nm

$E = hc/\lambda$: E is the photon energy, h is Planck's constant, c = speed of light, $\lambda$ is the wavelength.
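A quick worked example, assuming the relation these symbols belong to is the photon energy $E = hc/\lambda$ (Planck's constant h is not mentioned in the notes, so it is an addition here). It suggests that a photon at the short-wavelength end of the visible range carries roughly twice the energy of one at the long-wavelength end.

```python
PLANCK_H = 6.62607015e-34   # Planck constant in J*s (assumed part of the formula)
LIGHT_C = 299_792_458.0     # speed of light in m/s


def photon_energy_joules(wavelength_nm: float) -> float:
    """Energy of a single photon, E = h * c / wavelength."""
    return PLANCK_H * LIGHT_C / (wavelength_nm * 1e-9)


# Compare the two ends of the visible range quoted in the notes.
for nm in (360, 760):
    print(nm, "nm ->", photon_energy_joules(nm), "J")
```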

**Photocell**

Only captures light from one direction

Multi cell

captures the light intensity from different directions, giving better directional resolution

**Pin Hole** for only projection

The image is now sharp, but a lot of light (information) is thrown away

Lenses

**Snell’s Law** - refraction makes things look shallower than they really are

So a lens uses refraction (Snell’s law) to collect and focus more light.
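A minimal sketch of Snell's law, $n_1 \sin\theta_1 = n_2 \sin\theta_2$, and of the "looks shallower" effect. The refractive index 1.33 for water and the near-normal viewing approximation (apparent depth = real depth / n) are assumptions chosen for illustration; the function names are made up.

```python
import math


def refraction_angle_deg(incidence_deg, n1, n2):
    """Angle of the refracted ray; None means total internal reflection."""
    s = n1 * math.sin(math.radians(incidence_deg)) / n2
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def apparent_depth(real_depth, n_medium, n_observer=1.0):
    """Near-normal viewing approximation of how deep an object appears."""
    return real_depth * n_observer / n_medium


print(refraction_angle_deg(30.0, 1.0, 1.33))  # air -> water bends towards the normal
print(apparent_depth(2.0, 1.33))              # a 2 m deep object looks shallower
```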

The pupil controls the amount of light entering the eye

Does the upper part of the brain process the lower part of the visual field?

Check it out.

The image formed on the retina is actually upside-down.

Magnification: how much the image is magnified or reduced.

The back of the eye is not flat (P34).

The retina contains two types of photoreceptors

**Rods** ~120M, sensitive but lacking spatial resolution, as many of them converge onto the same neuron within the retina.

**Cones** ~6M, active at higher light levels, with higher resolution as each signal is processed by several neurons.

The receptive field (RF) is the area on which light must fall for the neuron to be stimulated

Two types of ganglion cells: "on-center" and "off-center"

On-center: stimulated when the center of its receptive field is exposed to light, and is inhibited when the surround is exposed to light.

Off-center cells have just the opposite reaction
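A toy model of the two cell types, assuming a 3x3 receptive field where the center pixel excites and the mean of the eight surround pixels inhibits. This is only an illustration of the on/off behaviour described above, not a physiological model; the function names are hypothetical.

```python
def on_center_response(patch):
    """patch: 3x3 list of intensities in [0, 1]."""
    center = patch[1][1]
    surround = [patch[r][c] for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    # Excited by light on the center, inhibited by light on the surround.
    return center - sum(surround) / len(surround)


def off_center_response(patch):
    """Off-center cells react in exactly the opposite way."""
    return -on_center_response(patch)


spot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]      # light on the center only
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # light everywhere
print(on_center_response(spot), on_center_response(uniform))
```

Uniform illumination gives no net response in this model, which hints at why such cells respond to boundaries rather than to flat regions.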

Lecture 1.2 - Human Vision (1).pdf P12 ~ 13

Some ganglion cells are sensitive to boundaries…

Need more reading of the slides…..?

The firing rate also carries information.

No. 3: not a total crossover but a partial one, because each side of the brain needs information from both eyes.

Vision is generated by photoreceptors in the retina

The information leaves the eye by way of the optic nerve

There is a partial crossing of axons at the optic chiasm.

After the chiasm, the axons are called the optic tract.

The optic tract wraps around the midbrain to get to the lateral geniculate nucleus (LGN)

The LGN axons fan out through the deep white matter of the brain and ultimately travel to primary visual cortex, at the back of the brain.

Three different types of cones.

Trichromatic coding…

Why so few blue cones?

How to discriminate wavelengths 2nm in difference?

A camera has filters that each allow only one colour of light to pass through

**But some colors do not exist?**

One can imagine bluish-green or yellowish-green, but NOT greenish-red or bluish-yellow!

Many theories of colour vision have been proposed; until recently some were hard to disprove

1870s: Hering (German physiologist) suggested colour may be represented in the visual system as ‘opponent colours’

Yellow, Blue, Red and Green – Primary colours

Trichromatic theory cannot explain why yellow is a primary colour

Bluish green, yellowish green, orange (red and yellow), purple (red and blue) OK

Reddish green?? Bluish Yellow??

Opposite to each other

Excitation and inhibition cancel each other; no change in signal.

We have red-green ganglion cells and yellow-blue ganglion cells.

image scale function in matlab

squeeze -> edge more visible

Gradient of the intensity, namely how fast the pixel intensity is changing:

2-by-2 matrix for the corner:

**Roberts**

**Sobel**

Then we get a gradient matrix; by applying a threshold to it we get a binary edge image.
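A minimal pure-Python sketch of that pipeline using the standard 3x3 Sobel kernels: compute the gradient magnitude at each interior pixel, then threshold it into a binary edge image. The helper names and the choice to leave border pixels at zero are assumptions made for brevity.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to horizontal intensity change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to vertical intensity change


def apply_kernel(img, kernel, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on (r, c)."""
    return sum(kernel[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def sobel_edges(img, threshold):
    """Return (gradient magnitude, binary edge image); borders stay 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = apply_kernel(img, SOBEL_X, r, c)
            gy = apply_kernel(img, SOBEL_Y, r, c)
            mag[r][c] = math.hypot(gx, gy)
    edges = [[1 if m > threshold else 0 for m in row] for row in mag]
    return mag, edges
```

On a vertical step edge this marks a band two pixels wide, which is one reason a thinning step is usually applied afterwards.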

Edge values roughly follow a **Gaussian** distribution, but can be quite noisy.

If we use a single threshold, we may get multiple border lines for one edge, hence the use of **Canny**.

Apply Gaussian filter to smooth the image in order to remove the noise

Find the intensity gradients of the image, using Roberts, Prewitt, or Sobel, etc.

Apply gradient magnitude thresholding or lower bound cut-off suppression to get rid of spurious response to edge detection

Apply double threshold to determine potential edges

Track edge by hysteresis: Finalize the detection of edges by suppressing all the other edges that are weak and not connected to strong edges.
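The last two steps can be sketched as follows, assuming a gradient-magnitude image is already available. Pixels at or above `high` are strong edges; pixels at or above `low` survive only if they connect (8-neighbourhood) to a strong edge. The function name and the list-of-lists image format are illustrative choices, and the earlier steps (smoothing, gradient, thinning) are not shown.

```python
def hysteresis(mag, low, high):
    """Double threshold plus edge tracking on a 2D magnitude image."""
    h, w = len(mag), len(mag[0])
    edges = [[0] * w for _ in range(h)]
    # Seed the search with all strong pixels.
    stack = [(r, c) for r in range(h) for c in range(w) if mag[r][c] >= high]
    for r, c in stack:
        edges[r][c] = 1
    # Grow outwards through weak pixels connected to a strong one.
    while stack:
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < h and 0 <= cc < w
                        and not edges[rr][cc] and mag[rr][cc] >= low):
                    edges[rr][cc] = 1
                    stack.append((rr, cc))
    return edges
```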

Highly Directed Work

Second order operators

Thresholding

Mean filter:

good for randomly distributed noise (positive and negative noise even out)

Gaussian Filter:

The second derivative is useful: its zero-crossing points can be a good edge estimator, but they are not robust to noise.

So

What causes intensity changes?

Geometric:

surface orientation, depth, color and texture discontinuities

Non-geometric:

illumination, specularities (mirror-like reflections), shadows and inter-reflections.

**Edge Descriptors**

Direction - perpendicular to the direction of maximum intensity change (i.e., edge normal)

Strength - related to the local image contrast along the normal

And Position

**Main Steps in Edge Detection**

(1) Smoothing: suppress as much noise as possible, without destroying true edges.

(2) Enhancement: apply differentiation to enhance the quality of edges (i.e., sharpening)

(3) Thresholding: determine which edge pixels should be discarded as noise and which should be retained (i.e., threshold edge magnitude).

(4) Localization: determine the exact edge location.

Upsample: sub-pixel resolution might be required for some applications to estimate the location of an edge to better than the spacing between pixels

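But the result is very noisy.

h is a Gaussian filter: it suppresses the noise, but slightly blurs the edge.

Instead of differentiating the convolution of h and f, we can convolve f with the differentiated Gaussian, which saves one operation: $\frac{d}{dx}(h * f) = \frac{dh}{dx} * f$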
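A small 1D check of that shortcut, assuming a discrete setting where the derivative is the finite difference `[1, -1]` and convolution is the full (zero-padded) one. Because convolution is associative, differentiating the smoothed signal and convolving with a pre-differentiated Gaussian give the same result; all names below are illustrative.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian sampled at integer offsets."""
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def convolve_full(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


DIFF = [1.0, -1.0]                       # finite-difference derivative
signal = [0.0] * 10 + [1.0] * 10         # f: a step edge at index 10
h = gaussian_kernel(sigma=1.5, radius=4)

two_step = convolve_full(DIFF, convolve_full(h, signal))   # d/dx (h * f)
dog_kernel = convolve_full(DIFF, h)                        # dh/dx, computed once
one_step = convolve_full(dog_kernel, signal)               # (dh/dx) * f
```

The response peaks at index 14, i.e. the step position shifted by the kernel radius of 4 that a full convolution introduces.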

Noise suppression-localization tradeoff.

– Smoothing depends on mask size (e.g., depends on σ for Gaussian filters).

– Larger mask sizes reduce noise, but worsen localization (i.e., add uncertainty to the location of the edge) and vice versa

We want good localization and a single response per edge.

I got a thick edge, but now I keep only the local maximum along the gradient direction.

**Non-maxima suppression**

Check if gradient magnitude at pixel location (i,j) is local maximum along gradient direction
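A sketch of that check, assuming the gradient direction is given in radians (e.g. from `atan2(gy, gx)` with rows increasing downwards) and is quantised to 0, 45, 90 or 135 degrees. A pixel is kept only if its magnitude is at least as large as both neighbours along that direction; the quantisation scheme and names are assumptions for illustration.

```python
import math


def non_maxima_suppression(mag, angle):
    """Thin edges: zero every pixel that is not a local maximum along its gradient."""
    h, w = len(mag), len(mag[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            deg = math.degrees(angle[r][c]) % 180.0
            if deg < 22.5 or deg >= 157.5:
                dr, dc = 0, 1        # gradient points left/right
            elif deg < 67.5:
                dr, dc = 1, 1        # diagonal
            elif deg < 112.5:
                dr, dc = 1, 0        # gradient points up/down
            else:
                dr, dc = 1, -1       # other diagonal
            m = mag[r][c]
            if m >= mag[r + dr][c + dc] and m >= mag[r - dr][c - dc]:
                out[r][c] = m
    return out
```

Note that exact ties are both kept with `>=`; a stricter comparison on one side would break them.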

Standard thresholding can only select “strong” edges, does not guarantee “continuity”.

Given the noisy image, design the best suitable algorithm to detect edges.

Given the calculated edges, how would you quantify accuracy?

**Why do we want to match features?**

Tasks like Object Recognition, Tracking…

Good features should be robust to all sorts of nastiness that can occur between images.

Types of invariance:

illumination

Scale

Rotation

Affine

Full Perspective

How to achieve illumination invariance?

The easy way (normalized) - histogram

Difference-based metrics (SIFT)

How to achieve scale invariance?

Pyramids

Down Sampling

Repeat until image is tiny

Run the filter over each image size and hope it's robust
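The pyramid steps above can be sketched like this, assuming each level is produced by averaging 2x2 blocks (a crude blur plus subsample) and that "tiny" means a side length below `min_size`. Both choices, and the function names, are assumptions for illustration.

```python
def downsample2(img):
    """Halve both dimensions by averaging each 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]


def build_pyramid(img, min_size=1):
    """Repeat the down-sampling until the next level would be smaller than min_size."""
    levels = [img]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample2(levels[-1]))
    return levels
```

A feature detector would then be run on every level of `build_pyramid(img)`.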

Scale space (**Difference of Gaussians** (DoG) method) # TODO

Pyramid, but fill the gaps with blurred images

not down sampling, but blurring it..

Like having a nice linear scaling without the expense

Take features from differences of these images

If a feature is repeatedly present across the Differences of Gaussians, it is scale invariant and we should keep it.
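A 1D sketch of the Difference of Gaussians idea, assuming the same signal is blurred at a list of increasing sigmas and adjacent blurred versions are subtracted. Border handling by index clamping and the function names are illustrative choices; real implementations work on 2D images and search for extrema across both space and scale.

```python
import math


def blur(signal, sigma):
    """Gaussian blur of a 1D signal, clamping indices at the borders."""
    radius = int(3 * sigma) + 1
    kernel = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(kernel)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += weight / total * signal[idx]
        out.append(acc)
    return out


def difference_of_gaussians(signal, sigmas):
    """One DoG layer per adjacent pair of sigmas (more blurred minus less blurred)."""
    blurred = [blur(signal, s) for s in sigmas]
    return [[b - a for a, b in zip(fine, coarse)]
            for fine, coarse in zip(blurred, blurred[1:])]


blob = [0.0] * 10 + [1.0] * 3 + [0.0] * 10
layers = difference_of_gaussians(blob, sigmas=[1.0, 2.0, 4.0])
```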

Rotate all features to go the same way in a determined manner

Take histogram of Gradient directions

Rotate to the most dominant direction (maybe also the second if it's good enough)

If rotation, looking at the **histogram**: will be same distribution but offset.
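A small sketch of why the histogram helps with rotation, assuming gradients are given as `(gx, gy)` pairs and binned into 8 direction bins weighted by magnitude. Rotating the patch only shifts the histogram circularly, so shifting it to start at the dominant bin gives the same descriptor for both. The bin count and names are illustrative.

```python
import math


def orientation_histogram(gradients, bins=8):
    """Magnitude-weighted histogram of gradient directions."""
    hist = [0.0] * bins
    for gx, gy in gradients:
        angle = math.atan2(gy, gx) % (2 * math.pi)
        b = int(angle / (2 * math.pi) * bins) % bins
        hist[b] += math.hypot(gx, gy)
    return hist


def rotate_to_dominant(hist):
    """Circularly shift the histogram so the dominant direction comes first."""
    k = hist.index(max(hist))
    return hist[k:] + hist[:k]


original = [(1, 0), (1, 0), (0, 1)]
rotated_90 = [(0, 1), (0, 1), (-1, 0)]   # the same gradients turned by 90 degrees
```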

Hough Transform

Polar Space and Cartesian Space

coordinates

Distance from the origin

The Hough transform is a common approach to finding parameterised line segments (here straight lines)

**The basic idea:**

Each straight line in the image can be described by an equation.

Each isolated point can lie on an infinite number of straight lines.

In the Hough transform each point votes for every line it could be on.

The lines with the most votes win.
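A minimal voting sketch. The equation in the notes is missing, so this assumes the common normal form $\rho = x\cos\theta + y\sin\theta$, with $\theta$ sampled in 1-degree steps and $\rho$ rounded to whole pixels; those discretisation choices and the function names are assumptions.

```python
import math


def hough_lines(points, n_theta=180):
    """Each point votes for every (rho, theta) line it could lie on."""
    rho_max = int(math.ceil(max(math.hypot(x, y) for x, y in points)))
    # Accumulator rows are rho + rho_max (so negative rho fits), columns are theta index.
    acc = [[0] * n_theta for _ in range(2 * rho_max + 1)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[rho + rho_max][t] += 1
    return acc, rho_max


def best_line(acc, rho_max):
    """Return (rho, theta index, votes) of the accumulator cell with most votes."""
    votes, r, t = max((v, r, t) for r, row in enumerate(acc) for t, v in enumerate(row))
    return r - rho_max, t, votes


points = [(x, 5) for x in range(0, 100, 10)]   # ten points on the line y = 5
```

For these points the winner is the cell at rho = 5 and theta = 90 degrees, which is the horizontal line y = 5.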

**Hough Space**


It also applies non-maxima suppression (NMS) to keep the best lines.

We need to set a threshold

A hough map

There are generalised versions for ellipses, circles

For the straight line transform we need to suppress non-local maxima

The input image could also benefit from edge thinning

Single line segments not isolated

Will still fail in the face of certain textures

The main advantage of the Hough transform technique is that it is **tolerant of gaps in feature boundary descriptions and is relatively unaffected by image noise**, unlike edge detectors.

Segmentation of Ageing brain

atlas: a collection of maps

Geometric (and Photometric) alignment of one image with another

Implemented as the process of estimating an optimal transformation between two images.

Images may be of same or different types (MR, CT, visible, fluorescence, ...)

Co-register the image

Landmarks: eyes, ears etc. or curve of features

Image values: conservation of intensity

needs the same dimensions and resolution

hard to handle different features

Pixels with different values are more likely to belong to different groups.

**The joint histogram**

Class of Transforms:

Rigid: no scaling (6 degrees of freedom in 3D: 3 translations + 3 rotations)

Affine

Piecewise Affine

Typically use different affine transformation for different parts of the image

Non-rigid (Elastic)

some shrinking, some expanding or deforming

External forces drive transformation, Internal forces provide constraints.

**What similarity criterion to use?**

maintain the distances between features.

RMS

Mutual Info

maximise the probability of the location given the pixel value.
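A sketch of mutual information computed from the joint histogram, assuming two already-overlaid images flattened to equal-length lists of integer pixel values. Here the joint probability of seeing value i in one image and value j in the other is estimated by counting pairs, and MI is $\sum_{i,j} p_{i,j} \log_2 \frac{p_{i,j}}{p_i p_j}$; the function name is illustrative.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b):
    """Mutual information (in bits) between two aligned, flattened images."""
    n = len(img_a)
    joint = Counter(zip(img_a, img_b))   # joint histogram of value pairs
    hist_a = Counter(img_a)
    hist_b = Counter(img_b)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((hist_a[a] / n) * (hist_b[b] / n)))
    return mi


# Two "modalities" with different intensity values for the same two tissues.
aligned = mutual_information([0, 0, 1, 1], [5, 5, 9, 9])
misaligned = mutual_information([0, 0, 1, 1], [5, 9, 5, 9])
```

MI is high when one image's values predict the other's, even if the intensities themselves differ, which is why it suits images of different types.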

What is ${p}_{i,j}$?

What is **normalised cross-correlation**?
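As a partial answer to the question about normalised cross-correlation, here is a sketch of its usual zero-mean form: subtract each image's mean, then divide the sum of products by the product of the two norms. The result lies in [-1, 1] and is unchanged by a linear brightness/contrast change. Inputs are assumed to be aligned, flattened, non-constant images; the function name is illustrative.

```python
import math


def normalised_cross_correlation(img_a, img_b):
    """Zero-mean NCC of two equal-length pixel lists."""
    mean_a = sum(img_a) / len(img_a)
    mean_b = sum(img_b) / len(img_b)
    da = [a - mean_a for a in img_a]
    db = [b - mean_b for b in img_b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    return numerator / denominator


print(normalised_cross_correlation([1, 2, 3, 4], [7, 9, 11, 13]))   # brighter, higher contrast copy
```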

Spatial resolution: Pixel Size

Intensity resolution: Bits per pixel

Time resolution: Frames per sec.

**Spectral** resolution: Number of bands + bandwidth

**Image Statistics**

Mean, standard deviation

Histogram: **frequency distribution** graph

Signal-to-noise (SNR)

Non-automated: take 5~6 measurements and average them.

**Histogram-based segmentation**

Thresholding challenges

How do we determine the threshold?

Different regions / image areas may need different levels of threshold.

Many approaches possible

Interactive threshold

Adaptive threshold

Variance minimisation method (Otsu threshold selection algorithm)

**What is Otsu's method?** # TODO
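A sketch towards that TODO: Otsu's method tries every threshold and keeps the one that maximises the between-class variance of the two resulting pixel groups, which is equivalent to minimising the within-class variance. The version below assumes integer grey levels in `[0, levels)` and treats pixels `<= t` as one class; the function name is illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximising between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0          # number of pixels in the "dark" class so far
    sum0 = 0        # sum of their intensities
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t
```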

**Dilation**

adding a “layer” of pixels to the periphery of object

**Erosion**

removing a “layer” of pixels all round an object
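Both operations can be sketched on a binary image with a 3x3 square structuring element: dilation takes the maximum over each neighbourhood, erosion the minimum. Ignoring out-of-bounds neighbours at the image border is an assumption of this sketch, as are the function names.

```python
def _neighbourhood(img, r, c):
    """Values of the 3x3 neighbourhood of (r, c) that lie inside the image."""
    h, w = len(img), len(img[0])
    return [img[rr][cc]
            for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)
            if 0 <= rr < h and 0 <= cc < w]


def dilate(img):
    """Add a layer of pixels around every object."""
    return [[max(_neighbourhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]


def erode(img):
    """Remove a layer of pixels all round every object."""
    return [[min(_neighbourhood(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]
```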

Active contours (snakes)

Watershed

Level-set methods # TODO

Active shape model segmentation # TODO

**Active contours (snakes)**

**Watershed Segmentation**

About touching the world…

Robotic Manipulation

https://www.cs.bham.ac.uk/research/groupings/robotics

It is hard for people to interpret the first image.

We can use both of them at the same time.

Depth versus distance

How to measure depth and distance?

**Passive**

Stereophotogrammetry

Structure from motion

Depth from focus

**Active**

Time of flight (TOF)

Structured light imaging

Photometric stereo

But it is hard to relate the images (i.e. find the matching pixels)

we have one camera, but moving…

predict where the canvas is, and needs more prior knowledge like the location of the camera.

Move the lens to change the focus.

Look for sharp edges, but they do not always appear.

It is possible but it is quite noisy.

**Active Stereophotogrammetry**

R200 Camera

Can project surface features

Multiple cameras still do not interfere with each other

Holes if you don’t find correspondence.

noisy when multiple objects, so we only look at one direction at once.

We now have a wave, so a wave bouncing back..

collect different pixel at different time..

Dmitry..

Phase wrapping and unwrapping
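A sketch of the time-of-flight arithmetic, assuming either a directly measured round-trip time (distance = c·t/2) or a continuous-wave sensor that measures the phase shift of a modulated signal (distance = c·φ/(4πf)). The notes do not say which variant was shown, and the 20 MHz modulation frequency is purely illustrative. Phase is only known modulo 2π, so distances beyond c/(2f) wrap around, which is what unwrapping has to undo.

```python
import math

LIGHT_C = 299_792_458.0   # speed of light in m/s


def distance_from_round_trip(t_seconds):
    """The wave travels to the surface and back, hence the factor 1/2."""
    return LIGHT_C * t_seconds / 2.0


def distance_from_phase(phase_rad, mod_freq_hz):
    """Distance implied by a measured phase shift of the modulated wave."""
    return LIGHT_C * phase_rad / (4.0 * math.pi * mod_freq_hz)


def unambiguous_range(mod_freq_hz):
    """Largest distance measurable before the phase wraps past 2*pi."""
    return LIGHT_C / (2.0 * mod_freq_hz)


print(distance_from_round_trip(20e-9))   # a 20 ns round trip
print(unambiguous_range(20e6))           # wrap-around distance at 20 MHz
```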

goal is not the depth, but the surfaces…

convert depth data into point cloud
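A minimal sketch of that conversion, assuming a pinhole camera model with known intrinsics: focal lengths `fx`, `fy` in pixels and principal point `cx`, `cy`. These parameter names and the convention that depth 0 marks a hole are assumptions, not something given in the notes.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project each valid depth pixel (u, v, z) to a 3D point (X, Y, Z)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue   # hole: no depth measured for this pixel
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


depth = [[0.0, 2.0],
         [2.0, 2.0]]
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```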

Try to find the function to build surfaces (gradient)

Representations: Untextured mesh and textured mesh

Robot coordinates

ICP algorithm

Multi-steps…

ICP…

Others

**Covariance**

A measure of how much the dimensions vary from their means with respect to each other

Covariance Matrix

Diagonal is the variances of x, y and z

$cov(x,y)=cov(y,x)$, hence the matrix is symmetrical about the diagonal.

N-dimensional data will result in an NxN covariance matrix
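A short sketch of building the covariance matrix from samples, assuming the sample form with division by n - 1. Each sample is a tuple with one value per dimension; the function name is illustrative.

```python
def covariance_matrix(samples):
    """Sample covariance matrix: entry (i, j) is cov(dimension i, dimension j)."""
    n = len(samples)
    d = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
             for j in range(d)]
            for i in range(d)]


# y is always twice x, so the two dimensions vary together.
print(covariance_matrix([(1, 2), (2, 4), (3, 6)]))
```

The diagonal holds the variances, and the off-diagonal entries are equal, as the symmetry above requires.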

**How to interpret covariance?**

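The exact value is hard to interpret on its own, but its **sign** matters: positive means the two dimensions increase together, negative means one increases as the other decreases.

If it is 0: they are uncorrelated (no linear relationship), which does not necessarily mean they are independent.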

Principal component analysis (PCA) can simplify a dataset

A linear transformation that chooses a new coordinate system for the data set such that:

greatest variance by any projection of the data set comes to lie on the first axis (then called the first principal component),

the second greatest variance on the second axis,

and so on

The later components are discarded to reduce dimensionality.

The dimensions in PCA will be orthogonal.

What is the principal component?

By finding the **eigenvalues and eigenvectors** of the covariance matrix, we find that the eigenvectors with the **largest eigenvalues** correspond to the dimensions that have the **strongest correlation** in the dataset.
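A sketch of finding the first principal component without a linear algebra library, assuming power iteration on a given covariance matrix: repeatedly multiply a vector by the matrix and normalise it, and it converges to the eigenvector with the largest eigenvalue (provided the start vector is not orthogonal to it). The function name and iteration count are illustrative; in practice a library eigen-solver would be used.

```python
import math


def top_principal_component(cov, iterations=100):
    """Return (largest eigenvalue, unit eigenvector) of a symmetric matrix."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v


# Covariance of data lying on the line y = 2x.
value, direction = top_principal_component([[1.0, 2.0], [2.0, 4.0]])
```

Here the first principal component points along (1, 2), the direction in which the data actually varies.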

PCA is a useful statistical technique that has found application in:

fields such as face recognition and image compression

finding patterns in data of high dimension

Then, we gain the covariance matrix:

N can be the number of pixels in an image.

The eigenvalues tell how much each component contributes; choose the top-k components.

**Ideas:**

Eigenfaces: the idea

Eigenvectors and Eigenvalues

Co-variance

Learning Eigenfaces from training sets of faces

Recognition and reconstruction

**Eigenfaces**

Think of a face as a combination of some **component** faces.

These basis faces can be weighted differently to represent any face

So we can use different vectors of weights to represent faces.

**How do we pick the set of basis faces?**

**Statistical criterion** for measuring the notion of “best representation of the differences between the training faces”

**How to learn?**

Rearrange the training set into a **2D matrix**: rows: each face image, columns: each pixel value

Calculate the covariance matrix

Then find the **eigenvectors** of that covariance matrix.

Sort by **eigenvalues** and keep the top components.

Get the principal components ${v}_{k}$

**Image space to face space.**

**Recognition in face space**

The closest face in face space is the chosen match.
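A toy sketch of recognition in face space, assuming the mean face and the eigenfaces have already been learned and that images are flattened pixel lists. An image is projected onto each eigenface to get a weight vector, and the known face with the nearest weight vector (Euclidean distance) is returned. The 4-pixel "images" and the names `alice` and `bob` are made up for illustration.

```python
def project(image, mean_face, eigenfaces):
    """Image space -> face space: one weight per eigenface."""
    centred = [p - m for p, m in zip(image, mean_face)]
    return [sum(c * e for c, e in zip(centred, face)) for face in eigenfaces]


def closest_face(weights, known_faces):
    """known_faces: dict mapping a name to its weight vector in face space."""
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(weights, known_faces[name]))
    return min(known_faces, key=squared_distance)


mean_face = [1.0, 1.0, 1.0, 1.0]
eigenfaces = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]   # toy orthonormal basis
known_faces = {"alice": [2.1, 0.0], "bob": [0.0, 2.0]}

weights = project([3.0, 1.0, 1.0, 1.0], mean_face, eigenfaces)
match = closest_face(weights, known_faces)
```

In practice a maximum distance is often added so that an image far from every known face is rejected rather than matched.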

But if with hat or glasses ???

**Image registration**

Some Books

Image Matting Problem

Open Source **Detectron2** based on PyTorch

Temporal superresolution