>>13294290
cont.
>it's a matrix that has the entries of the forget gate on its diagonal
Why? Well, row i of the Jacobian is the gradient of the i-th component of C_t with respect to C_{t-1}. Assuming the standard LSTM update C_t = f_t * C_{t-1} + i_t * C~_t (elementwise products), the i-th component of C_t is f_t[i] * C_{t-1}[i] + i_t[i] * C~_t[i]. Then, holding the gates fixed, the partial derivatives with respect to the components of C_{t-1} are all zero except for the i-th component, where the partial derivative is the i-th forget gate component f_t[i]. Hence a diagonal matrix.
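If you want to convince yourself numerically, here's a rough sketch in plain Python. It assumes the standard update C_t = f_t * C_{t-1} + i_t * C~_t and treats the gates as constants, so it only checks the direct dependence on C_{t-1}. All numbers and the names `cell_update` and `jacobian` are made up for illustration.

```python
# Sketch: numerically check that dC_t/dC_{t-1} is diag(f_t).
# Assumption: gates are held fixed (made-up values), so only the direct
# dependence of C_t on C_{t-1} is differentiated.

def cell_update(c_prev, f, i_gate, c_tilde):
    """C_t = f * C_{t-1} + i * C~_t, all products elementwise."""
    return [fk * ck + ik * gk
            for fk, ck, ik, gk in zip(f, c_prev, i_gate, c_tilde)]

def jacobian(fn, x, eps=1e-6):
    """Central-difference Jacobian: J[r][c] = d fn(x)[r] / d x[c]."""
    n = len(x)
    cols = []
    for c in range(n):
        hi = list(x)
        lo = list(x)
        hi[c] += eps
        lo[c] -= eps
        y_hi, y_lo = fn(hi), fn(lo)
        cols.append([(a - b) / (2 * eps) for a, b in zip(y_hi, y_lo)])
    # cols is indexed [c][r]; transpose so the result is indexed [r][c]
    return [[cols[c][r] for c in range(n)] for r in range(len(cols[0]))]

f = [0.9, 0.5, 0.1]          # forget gate (hypothetical values)
i_gate = [0.2, 0.7, 0.4]     # input gate (hypothetical values)
c_tilde = [0.3, -0.8, 0.6]   # candidate cell state (hypothetical values)
c_prev = [1.0, -2.0, 0.5]    # C_{t-1}

J = jacobian(lambda c: cell_update(c, f, i_gate, c_tilde), c_prev)
for row in J:
    print([round(v, 6) for v in row])
```

The printed matrix should have the forget gate values on the diagonal and (numerically) zeros everywhere else.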
Btw, I'd recommend using the weighted gradient sum variant of the chain rule, it makes things a lot less fucky, for example when you're differentiating with respect to matrices.
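Here's a small sketch of what I mean, assuming "weighted gradient sum" is read as the chain rule written out as a sum over the intermediate components: dL/dW[j][k] = sum over i of dL/dy[i] * dy[i]/dW[j][k]. The setup (y = W x, L = 0.5 * sum of y[i]^2) and all names are just an example I picked, not anything from your derivation.

```python
# Sketch: chain rule as a weighted sum, applied to a matrix parameter.
# Assumed toy setup: y = W x and L = 0.5 * sum(y_i^2), so dL/dy_i = y_i.

def matvec(W, x):
    return [sum(w * xk for w, xk in zip(row, x)) for row in W]

def loss(W, x):
    return 0.5 * sum(y * y for y in matvec(W, x))

W = [[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]]
x = [2.0, -1.0]

y = matvec(W, x)
dL_dy = list(y)  # gradient of L with respect to each y_i

def dy_dW(i, j, k):
    """Partial derivative of y[i] with respect to W[j][k]."""
    return x[k] if i == j else 0.0

# dL/dW[j][k] = sum_i dL/dy[i] * dy[i]/dW[j][k]
grad = [[sum(dL_dy[i] * dy_dW(i, j, k) for i in range(len(y)))
         for k in range(len(x))]
        for j in range(len(W))]

def numeric_grad(W, x, eps=1e-6):
    """Central-difference check of dL/dW, entry by entry."""
    out = []
    for j in range(len(W)):
        row = []
        for k in range(len(W[0])):
            hi = [r[:] for r in W]
            lo = [r[:] for r in W]
            hi[j][k] += eps
            lo[j][k] -= eps
            row.append((loss(hi, x) - loss(lo, x)) / (2 * eps))
        out.append(row)
    return out

print(grad)
print(numeric_grad(W, x))
```

No tensor-shaped Jacobians needed: every entry of the matrix gradient is just a scalar sum, and here it collapses to dL/dy[j] * x[k].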