We select key corresponding features such as the eyes, nose, and hairline to define corresponding points on the two images. I selected around 40 points. Now, we can compute the midway shape by taking the average of each pair of corresponding points. We can also visualize the triangle mesh created by a Delaunay triangulation algorithm for the original face, the desired face, and the midway face. Why do we use the Delaunay algorithm? Because it maximizes the minimum angle over all the triangles, which means we avoid "skinny triangles". Skinny triangles are no good because they give an uneven pixel distribution. I.e., when you map pixels from a source triangle into a skinny triangle, many source pixels get smeared over the same small area, which loses a lot of detail. The results of the triangulation are below.
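As a minimal sketch of the midway shape and triangulation described above, assuming NumPy and SciPy are available: the arrays `pts_a` and `pts_b` are hypothetical stand-ins for the roughly 40 hand-picked points. Triangulating the midway shape once and reusing its vertex indices for both faces is one common choice, not necessarily the one used here.

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical correspondence points as (x, y) rows; row i in pts_a
# matches row i in pts_b. A real run would use ~40 points per face.
pts_a = np.array([[0, 0], [100, 0], [0, 100], [100, 100], [40, 50]], dtype=float)
pts_b = np.array([[0, 0], [100, 0], [0, 100], [100, 100], [60, 50]], dtype=float)

# Midway shape: the average of each pair of corresponding points.
pts_mid = 0.5 * (pts_a + pts_b)

# Delaunay triangulation of the midway shape. Each row of `triangles`
# holds three indices into the point arrays, so the same rows can be
# used to pick the matching triangle in pts_a, pts_b, and pts_mid.
triangles = Delaunay(pts_mid).simplices

for tri in triangles:
    print(tri, pts_a[tri].tolist(), pts_mid[tri].tolist())
```

Because the triangles are stored as indices rather than coordinates, triangle `k` in face A, face B, and the midway face always refers to the same three features.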
Performing the multiplication results in the following system of equations:
\[ \begin{aligned} x'_1 &= m_{00} \cdot x_1 + m_{01} \cdot y_1 + m_{02} \\ y'_1 &= m_{10} \cdot x_1 + m_{11} \cdot y_1 + m_{12} \\ x'_2 &= m_{00} \cdot x_2 + m_{01} \cdot y_2 + m_{02} \\ y'_2 &= m_{10} \cdot x_2 + m_{11} \cdot y_2 + m_{12} \\ x'_3 &= m_{00} \cdot x_3 + m_{01} \cdot y_3 + m_{02} \\ y'_3 &= m_{10} \cdot x_3 + m_{11} \cdot y_3 + m_{12} \end{aligned} \]
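The six equations above are linear in the six unknowns \(m_{00}, \dots, m_{12}\), so one way to recover them is to stack the equations into a 6x6 linear system and solve it. The following is a sketch under that assumption; the function name `compute_affine` is made up for illustration.

```python
import numpy as np

def compute_affine(src_tri, dst_tri):
    """Return the 3x3 homogeneous affine matrix that maps the three
    (x, y) vertices of src_tri onto the three (x', y') vertices of dst_tri."""
    src = np.asarray(src_tri, dtype=float)
    dst = np.asarray(dst_tri, dtype=float)

    A = np.zeros((6, 6))
    b = dst.reshape(6)  # [x'_1, y'_1, x'_2, y'_2, x'_3, y'_3]
    for i, (x, y) in enumerate(src):
        A[2 * i] = [x, y, 1, 0, 0, 0]      # x'_i = m00*x + m01*y + m02
        A[2 * i + 1] = [0, 0, 0, x, y, 1]  # y'_i = m10*x + m11*y + m12

    m = np.linalg.solve(A, b)  # [m00, m01, m02, m10, m11, m12]
    return np.vstack([m.reshape(2, 3), [0.0, 0.0, 1.0]])

# Usage: a triangle scaled by (2, 3) and shifted by (2, 3).
M = compute_affine([[0, 0], [1, 0], [0, 1]], [[2, 3], [4, 3], [2, 6]])
print(M)
```

The solve fails with a singular-matrix error if the three source points are collinear, which is another reason to avoid degenerate triangles.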
For each triangle in the midway face, we get the corresponding triangles from imageA and imageB. We find the affine transformation that maps the points of the triangle in A to the midway triangle (and we do the same for B). We then warp the images using these transformations. A mask is used to keep only the triangle we are interested in. We also accumulate these masks and normalize at the end so that pixels covered by more than one mask do not overcontribute.
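The per-triangle warp can be sketched as follows, using NumPy only. This is an illustrative version rather than the exact code behind the results: the helper names are hypothetical, sampling is nearest-neighbor, and it uses inverse mapping (each pixel in the target triangle looks up its source pixel), which is an assumed design choice to avoid holes.

```python
import numpy as np

def affine_from_triangles(src_tri, dst_tri):
    # Vertices as homogeneous columns: M @ S = D, so M = D @ inv(S).
    S = np.vstack([np.asarray(src_tri, dtype=float).T, np.ones(3)])
    D = np.vstack([np.asarray(dst_tri, dtype=float).T, np.ones(3)])
    return D @ np.linalg.inv(S)

def triangle_mask(tri, height, width):
    """Boolean mask of pixels lying inside or on the edge of tri."""
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    T = np.vstack([np.asarray(tri, dtype=float).T, np.ones(3)])
    bary = np.linalg.solve(T, pix)  # barycentric coordinates, shape (3, N)
    return np.all(bary >= -1e-9, axis=0).reshape(height, width)

def warp_to_shape(img, src_pts, dst_pts, triangles):
    """Warp img so that src_pts land on dst_pts, one triangle at a time."""
    h, w = img.shape[:2]
    out = np.zeros(img.shape, dtype=float)
    weight = np.zeros((h, w), dtype=float)  # accumulated masks
    for tri in triangles:
        M = affine_from_triangles(src_pts[tri], dst_pts[tri])  # source -> target
        M_inv = np.linalg.inv(M)                               # target -> source
        mask = triangle_mask(dst_pts[tri], h, w)
        ys, xs = np.nonzero(mask)
        src_xy = M_inv @ np.stack([xs, ys, np.ones(xs.size)])
        sx = np.clip(np.rint(src_xy[0]).astype(int), 0, w - 1)
        sy = np.clip(np.rint(src_xy[1]).astype(int), 0, h - 1)
        out[ys, xs] += img[sy, sx]
        weight[ys, xs] += 1.0
    # Normalize so pixels on shared edges are not counted twice.
    covered = weight > 0
    divisor = weight[covered][:, None] if img.ndim == 3 else weight[covered]
    out[covered] = out[covered] / divisor
    return out
```

Calling `warp_to_shape(imageA, pts_a, pts_mid, triangles)` and the same for imageB would give two images in the midway shape, where `pts_a`, `pts_mid`, and `triangles` are assumed to be NumPy arrays of points and vertex indices.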