This is typically done with “features”. A feature is a function that takes a small, localized group of pixels and outputs some value. An application computes these features on each video stream’s frames and lines up the inputs by matching the features the frames have in common.
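To make the idea of a feature concrete, here is a minimal sketch of one. The function name `gradient_energy`, the patch radius, and the choice of squared pixel differences are all illustrative assumptions, not the feature a real stitching application would necessarily use.

```python
def gradient_energy(image, row, col, radius=1):
    """Hypothetical feature: how strongly pixel values change around (row, col).

    `image` is a list of rows of grayscale values. The patch, plus a
    one-pixel border around it, must lie inside the image.
    """
    total = 0.0
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal difference
            gy = image[r + 1][c] - image[r - 1][c]  # vertical difference
            total += gx * gx + gy * gy
    return total
```

A flat region of the image gives a value of zero, while a patch containing an edge or corner gives a large value, which is why patches like that tend to be easier to match between frames.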
To line up the images, the application uses an algorithm called RANSAC (Random Sample Consensus), which finds a few “really good” feature matches and then refines the precision with the remaining data.
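The sketch below shows the general shape of RANSAC under a strong simplifying assumption: the two frames differ only by a translation, so a single match is enough to propose an alignment. Real stitching usually fits a richer model such as a homography. The function name, parameters, and default values are hypothetical.

```python
import random

def ransac_translation(matches, iterations=200, tolerance=2.0, seed=0):
    """Estimate a (dx, dy) shift from matches of the form ((x1, y1), (x2, y2)).

    Simplified illustration: the model is a pure translation.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Propose a model from a minimal random sample (one match here).
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Collect the matches that agree with the proposed shift.
        inliers = [
            (a, b) for a, b in matches
            if abs(b[0] - a[0] - dx) <= tolerance
            and abs(b[1] - a[1] - dy) <= tolerance
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the estimate by averaging over all agreeing matches.
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (dx, dy), best_inliers
```

Matches that disagree with the winning shift are treated as outliers and ignored in the final average, so a handful of bad feature matches does not pull the alignment off.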
For this to work reliably, the feature used to find the common areas in the video frames must be invariant to location, scale, rotation, and mild changes in illumination.
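As a small illustration of one of these requirements, the sketch below builds a patch descriptor that does not change when the brightness of the patch is scaled by a positive gain or shifted by a constant. It addresses illumination only, not scale or rotation, and the function name `normalized_descriptor` is a hypothetical choice for this example.

```python
import math

def normalized_descriptor(patch):
    """Flatten a patch, subtract its mean, and scale it to unit length."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]  # removes a constant brightness offset
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0:
        # A perfectly flat patch carries no structure to describe.
        return [0.0] * len(values)
    return [v / norm for v in centered]  # removes a positive contrast gain
```

Two views of the same patch, one of them uniformly brighter or with higher contrast, produce (up to rounding) the same descriptor, so they can still be matched.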