An image is a sum of waves.
The 2D Fourier transform turns a picture into a list of cosine waves, each at a different frequency and orientation, each with its own amplitude and phase. Add them all back up and you get the picture again.
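In symbols, the standard DFT pair for an M×N grayscale image f(x, y) is written below. Texts differ on where the 1/(MN) factor goes, so treat this placement as one common convention. Indices are taken modulo M and N. The last line shows where the cosines come from. For a real image, F(−u, −v) is the complex conjugate of F(u, v), so each such pair of terms merges into one real cosine whose amplitude is set by |F(u, v)| and whose phase is the angle of F(u, v). Self-paired terms such as u = v = 0 contribute a single term instead of a pair.

```latex
\begin{aligned}
F(u,v) &= \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-2\pi i\,(ux/M + vy/N)} \\
f(x,y) &= \frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)\, e^{2\pi i\,(ux/M + vy/N)} \\
\frac{F(u,v)\,e^{i\theta} + F(-u,-v)\,e^{-i\theta}}{MN}
  &= \frac{2\,\lvert F(u,v)\rvert}{MN}\cos\!\bigl(\theta + \arg F(u,v)\bigr),
  \qquad \theta = 2\pi\,(ux/M + vy/N)
\end{aligned}
```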
Below, a small portrait and its spectrum side by side. With the zero frequency shifted to the middle, as spectra are usually displayed, the centre of the spectrum holds the slow stuff: broad shapes, average brightness. Its edges hold the fast stuff: sharp lines, fine texture. Erase the edges of the spectrum and the picture goes soft. Erase the centre and only the outlines and fine texture remain.
Figure: An image, as a sum of waves. A portrait and its 2D spectrum. Keeping only the frequencies within a given radius of the centre softens the picture, and widening the radius restores its detail.
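Keeping only the frequencies within a radius of the centre can be sketched in plain Python. This is a minimal, unoptimised sketch. It assumes a small grayscale image stored as a list of lists of floats. `dft2`, `idft2`, `signed` and `low_pass` are hypothetical names for illustration, not any library's API. Instead of shifting the spectrum to the middle, the sketch maps indices past the halfway point to negative frequencies. That measures the same distance from the zero frequency.

```python
import cmath
import math

def dft2(img):
    """Naive 2D DFT of a list-of-lists grayscale image: O(N^4) for N x N."""
    M, N = len(img), len(img[0])
    return [[sum(img[x][y] * cmath.exp(-2j * math.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N))
             for v in range(N)]
            for u in range(M)]

def idft2(spec):
    """Naive inverse 2D DFT. The filter below keeps conjugate pairs together,
    so for a real input image the imaginary part is only rounding error."""
    M, N = len(spec), len(spec[0])
    return [[(sum(spec[u][v] * cmath.exp(2j * math.pi * (u * x / M + v * y / N))
                  for u in range(M) for v in range(N)) / (M * N)).real
             for y in range(N)]
            for x in range(M)]

def signed(k, n):
    """Map DFT index k in [0, n) to a signed frequency; indices past n // 2 are negative."""
    return k if k <= n // 2 else k - n

def low_pass(img, radius):
    """Zero every coefficient farther than `radius` from the zero frequency, then invert."""
    spec = dft2(img)
    M, N = len(spec), len(spec[0])
    for u in range(M):
        for v in range(N):
            if math.hypot(signed(u, M), signed(v, N)) > radius:
                spec[u][v] = 0
    return idft2(spec)
```

Try it on an 8×8 image whose right half is bright. A radius of 0 keeps only the average brightness, so every pixel becomes 0.5. A large radius returns the original. A radius of 2 keeps only the slowest waves, so the jump across the step shrinks from 1 to 0.5.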
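There is one more reason this transform matters, and it is the reason image kernels and the Fourier transform are really the same essay viewed from two sides: convolution in the spatial domain is multiplication in the frequency domain. Sliding a 3×3 blur kernel across a photograph gives the same result as multiplying the photograph's spectrum by the kernel's own spectrum, which for a blur is a low-pass envelope, and transforming back. The match is exact when the kernel wraps around the image borders, because the DFT treats the image as periodic. With zero-padded or clamped borders, the two results differ only near the edges.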
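A small numerical check of the convolution theorem, again in plain Python, assuming wrap-around borders, which is what the DFT implicitly assumes. The function names are hypothetical. `dft2` doubles as the unnormalised inverse when called with `sign=+1`, and the result is then divided by M·N. Images must be at least 3×3 so the kernel's offsets do not collide.

```python
import cmath
import math

def dft2(a, sign=-1):
    """Naive 2D DFT (sign=-1), or the unnormalised inverse (sign=+1)."""
    M, N = len(a), len(a[0])
    return [[sum(a[x][y] * cmath.exp(sign * 2j * math.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N))
             for v in range(N)]
            for u in range(M)]

def blur_spatial(img):
    """3x3 box blur that wraps around the borders (circular convolution)."""
    M, N = len(img), len(img[0])
    return [[sum(img[(x + dx) % M][(y + dy) % N]
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1)) / 9
             for y in range(N)]
            for x in range(M)]

def blur_spectral(img):
    """The same blur via the convolution theorem: multiply spectra, transform back."""
    M, N = len(img), len(img[0])
    # Embed the 3x3 kernel in an M x N array centred on (0, 0);
    # offset -1 wraps to index M-1 (rows) or N-1 (columns).
    kernel = [[0.0] * N for _ in range(M)]
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            kernel[dx % M][dy % N] = 1 / 9
    F, K = dft2(img), dft2(kernel)
    product = [[F[u][v] * K[u][v] for v in range(N)] for u in range(M)]
    back = dft2(product, sign=+1)
    # Normalise the inverse; the result is real up to rounding error.
    return [[back[x][y].real / (M * N) for y in range(N)] for x in range(M)]
```

Replacing the wrap-around in `blur_spatial` with zero padding would change only the border pixels. That is the edge-handling difference between the two routes.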
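Working in the frequency domain is why JPEG works (it coarsens the high-frequency coefficients of a discrete cosine transform, a close relative of the DFT), why MP3 works, and why many audio plugins, from equalizers to noise reducers, work. Reach into the spectrum, scale down the frequencies you care less about, transform back. Whole subfields of signal processing are footnotes on this trade.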
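What we are doing here is the textbook discrete Fourier transform, evaluated naively. For an N×N image, each of the N² coefficients is a sum over all N² pixels, so the cost is O(N⁴). Real systems use the fast Fourier transform (FFT), applied to every row and then every column, for O(N² log N) in total. The arithmetic is faster; the picture is the same.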
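The cost difference becomes concrete in a sketch of the row-column approach. It applies a textbook recursive radix-2 Cooley–Tukey FFT to every row and then to every column. It assumes both image dimensions are powers of two, and the function names are illustrative. In practice, one would call a library routine such as NumPy's `numpy.fft.fft2`. Each 1D transform of length N costs O(N log N). There are N rows plus N columns, hence O(N² log N) overall.

```python
import cmath
import math

def fft(xs):
    """Recursive radix-2 Cooley-Tukey FFT; len(xs) must be a power of two."""
    n = len(xs)
    if n == 1:
        return [complex(xs[0])]
    even, odd = fft(xs[0::2]), fft(xs[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(img):
    """2D FFT by separability: a 1D FFT of every row, then of every column."""
    rows = [fft(row) for row in img]                 # rows[x][v]
    cols = [fft(list(col)) for col in zip(*rows)]    # cols[v][u]
    return [list(r) for r in zip(*cols)]             # transpose back to [u][v]

def dft2_naive(img):
    """Direct O(N^4) definition, kept for comparison."""
    M, N = len(img), len(img[0])
    return [[sum(img[x][y] * cmath.exp(-2j * math.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N))
             for v in range(N)]
            for u in range(M)]
```

On any power-of-two image, `fft2` and `dft2_naive` agree to within rounding error. Only the amount of arithmetic differs.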