BELT: Block-wise Missing Embedding Learning Transformer
Matrix completion has attracted attention in many fields, including statistics, applied mathematics, and electrical engineering. Most existing work focuses on independent sampling models, under which the observed entries are sampled independently. Motivated by applications in the integration of multiple Electronic Health Record (EHR) datasets, we propose the Block-wise Missing Embedding Learning Transformer (BELT) to handle row-wise/column-wise missingness. Specifically, BELT can recover block-wise missing matrices efficiently whenever every pair of sub-matrices has an overlap. The idea is to exploit the orthogonal Procrustes problem to align the eigenspaces of two sub-matrices via their overlap, and then to complete the missing blocks by taking the inner product of the two low-rank components. In addition, we establish the statistical rate for estimating the eigenspace of the underlying matrix, which is comparable to the rate under the independent-sampling assumption. Simulation studies show that the method performs well under a variety of configurations. In the real data analysis, the method is applied to two tasks: (i) integrating several pointwise mutual information matrices built from English EHR data and Chinese medical text data, and (ii) machine translation between English and Chinese medical concepts. Our method shows advantages over existing methods.
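To make the alignment idea concrete, the following is a minimal NumPy sketch (not the authors' implementation) of the two-step recipe described above: take a rank-r spectral embedding of each observed sub-matrix, align the two embeddings on their overlap by solving an orthogonal Procrustes problem, and fill the missing block with the inner product of the aligned low-rank components. The function names, the noiseless symmetric PSD setting, and the index convention (overlap B is the last m indices of the first block and the first m indices of the second) are all illustrative assumptions.

import numpy as np

def top_r_embedding(M, r):
    """Rank-r spectral embedding V with M ≈ V V^T (M symmetric PSD)."""
    lam, U = np.linalg.eigh(M)
    idx = np.argsort(lam)[::-1][:r]              # top-r eigenpairs
    return U[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))

def complete_block(M1, M2, m, r):
    """Estimate the missing cross block M[A, C] from two observed
    overlapping sub-matrices M1 (on A ∪ B) and M2 (on B ∪ C), where
    the overlap B is the last m indices of M1 and the first m of M2."""
    V1 = top_r_embedding(M1, r)
    V2 = top_r_embedding(M2, r)
    # Orthogonal Procrustes step: find the orthogonal R minimizing
    # ||V1[B] - V2[B] R||_F, so the embeddings agree on the overlap.
    U, _, Vt = np.linalg.svd(V2[:m].T @ V1[-m:])
    V2_aligned = V2 @ (U @ Vt)
    # Missing block = inner product of the aligned low-rank components.
    return V1[:-m] @ V2_aligned[m:].T

# Sanity check on a synthetic noiseless rank-r matrix.
rng = np.random.default_rng(0)
n, r, m, nA = 60, 5, 20, 25                      # size, rank, |B|, |A|
V = rng.normal(size=(n, r))
M = V @ V.T                                      # underlying low-rank matrix
M1 = M[:nA + m, :nA + m]                         # observed block on A ∪ B
M2 = M[nA:, nA:]                                 # observed block on B ∪ C
est = complete_block(M1, M2, m, r)
truth = M[:nA, nA + m:]
print(np.linalg.norm(est - truth) / np.linalg.norm(truth))  # ≈ 0

In this noiseless rank-r case the recovery is exact up to numerical error, since each embedding equals the true factor up to an orthogonal rotation and the Procrustes step resolves that rotation on the overlap; with noisy or higher-rank data the same steps give an approximation.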