This is a very good idea: one often finds research articles with interesting tables in PDF format or as scanned documents, and those tables then have to be retyped in LaTeX or Excel.
Scanned PDFs can be converted to Excel using OCR software of this type:
https://www.cogniview.com/pdf-to-excel/pdf2xl-ocr
but I assume you want to use ML to build your own conversion software. That is an excellent idea, and I will be the first to use your solution.
For ANN-based OCR of tables, this is a potential solution:
Optical Character Recognition systems enable several applications, e.g. automatic character recognition in printed texts. For the success of such systems, reliable segmentation is an essential stage. This chapter presents two approaches to segmentation: the SLPTEO for segmentation of text lines and words, and SCORC for character segmentation. The first is applied to printed texts, but can be also applied to handwritten texts. The second handles printed overlapping and touching characters, working directly on grayscale images. Experimental results show great robustness of the methods presented.
SLPTEO e SCORC: Abordagens para Segmentação de Linhas, Palavras e Caracteres em Textos Impressos. Available from: https://www.researchgate.net/publication/256088532_SLPTEO_e_SCORC_Abordagens_para_Segmentao_de_Linhas_Palavras_e_Caracteres_em_Textos_Impressos [accessed Jun 16, 2015].
If you insist on using ML, I think you can (1) recognise the cells of the table using line detection algorithms, and (2) OCR the text in each cell using a method such as a neural network (NN).
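A minimal sketch of step (1), assuming the page is already binarised into a list of pixel rows (0 = white, 1 = ink) and the table has straight, unskewed ruling lines. It uses simple row and column ink counts as a stand-in for a real line detector such as a Hough transform; the names `ruling_positions`, `find_cells` and the `min_fill` threshold are illustrative assumptions, not part of any library.

```python
def ruling_positions(counts, length, min_fill=0.8):
    """Return the centre index of each run of rows/columns that are mostly ink."""
    positions, run = [], []
    for i, count in enumerate(counts):
        if count >= min_fill * length:
            run.append(i)
        elif run:
            positions.append(run[len(run) // 2])
            run = []
    if run:
        positions.append(run[len(run) // 2])
    return positions


def find_cells(image, min_fill=0.8):
    """Find table cells in a binary image given as a list of rows of 0/1.

    Returns boxes as (top, left, bottom, right), with bottom/right exclusive.
    Assumption: rulings are about one pixel thick; thicker rulings would
    leave part of the ruling inside each box.
    """
    height, width = len(image), len(image[0])
    row_counts = [sum(row) for row in image]
    col_counts = [sum(image[y][x] for y in range(height)) for x in range(width)]
    h_lines = ruling_positions(row_counts, width, min_fill)
    v_lines = ruling_positions(col_counts, height, min_fill)
    cells = []
    # Every pair of neighbouring horizontal and vertical rulings bounds one cell.
    for top, bottom in zip(h_lines, h_lines[1:]):
        for left, right in zip(v_lines, v_lines[1:]):
            cells.append((top + 1, left + 1, bottom, right))
    return cells


# Toy 7x9 page with rulings on rows 0, 3, 6 and columns 0, 4, 8.
page = [[1 if y in (0, 3, 6) or x in (0, 4, 8) else 0 for x in range(9)]
        for y in range(7)]
cells = find_cells(page)
```

Each box could then be cropped and passed to whatever recogniser is used for step (2); that part is deliberately left out here.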
The most significant properties of a text line that is part of a table are
a) that it consists of non-continuous text flow, i.e. having sequences of characters alternating with intervals of empty space, and
b) that the intensity distribution (distribution of black and white pixels) of the line has a strong correlation with the distribution of the line above and the line below.
Both properties can be measured easily and can be used as a feature vector for classification in order to decide if the line is part of a table or not, e.g. using thresholds or any kind of machine learning technique.
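The two properties can be sketched roughly as follows, assuming each text line has already been cut out as a band of binary pixel rows (0 = white, 1 = ink) and all bands have the same width. The function names and the thresholds `min_gap`, `min_gaps` and `min_corr` are assumptions chosen for illustration; in practice they would be tuned or replaced by a trained classifier.

```python
from math import sqrt


def column_profile(band):
    """Ink count per column for one text line (a list of pixel rows of 0/1)."""
    return [sum(column) for column in zip(*band)]


def gap_count(profile, min_gap=3):
    """Property (a): count empty runs of at least min_gap columns between inked runs."""
    gaps, run, seen_ink = 0, 0, False
    for value in profile:
        if value == 0:
            run += 1
        else:
            if seen_ink and run >= min_gap:
                gaps += 1
            run, seen_ink = 0, True
    return gaps


def correlation(p, q):
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0
    return cov / sqrt(var_p * var_q)


def line_features(bands, i, min_gap=3):
    """Feature vector for line i: (gap count, mean correlation with its neighbours)."""
    profile = column_profile(bands[i])
    neighbours = [bands[j] for j in (i - 1, i + 1) if 0 <= j < len(bands)]
    corrs = [correlation(profile, column_profile(b)) for b in neighbours]
    mean_corr = sum(corrs) / len(corrs) if corrs else 0.0
    return gap_count(profile, min_gap), mean_corr


def is_table_line(features, min_gaps=1, min_corr=0.5):
    """Threshold classifier; any ML classifier could take the same feature vector."""
    gaps, corr = features
    return gaps >= min_gaps and corr >= min_corr


# Toy example: three aligned "table" lines followed by one line of running text.
table = [[1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]]
prose = [[1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1]]
bands = [table, table, table, prose]
middle = line_features(bands, 1)
last = line_features(bands, 3)
```

Here the middle line has two wide gaps and matches its neighbours, so it passes the thresholds, while the prose line has only single-column gaps and is rejected.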
If you are also looking for tables that do NOT cover the whole width of the sheet, you have to apply the above method to smaller sections of the lines.
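One way to apply the idea to smaller sections is a sliding window over the columns, sketched below. This is only an illustration under assumptions: each line is given as a per-column ink profile, and the comparison with the neighbouring lines is simplified to the fraction of columns whose ink/no-ink state agrees, rather than a full correlation. The names `sections`, `table_sections`, `size`, `step` and `min_agree` are hypothetical.

```python
def sections(width, size, step):
    """Yield (start, end) column ranges of at most `size` columns, `step` apart."""
    start = 0
    while start < width:
        yield start, min(start + size, width)
        if start + size >= width:
            break
        start += step


def table_sections(profile, above, below, size, step, min_agree=0.9):
    """Return column ranges where a line's ink pattern matches both neighbours.

    profile, above, below: per-column ink counts of the line and its neighbours.
    Sections that are entirely empty or entirely inked are skipped because
    they show no alternation of text and white space.
    """
    hits = []
    for start, end in sections(len(profile), size, step):
        current = [v > 0 for v in profile[start:end]]
        if not any(current) or all(current):
            continue
        matches = True
        for other in (above, below):
            other_ink = [v > 0 for v in other[start:end]]
            agree = sum(a == b for a, b in zip(current, other_ink)) / len(current)
            matches = matches and agree >= min_agree
        if matches:
            hits.append((start, end))
    return hits


# Toy example: the left half lines up with the neighbours, the right half does not.
line = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0] + [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
neighbour = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0] + [0, 1, 1, 0, 1, 1, 0, 1, 0, 1]
found = table_sections(line, neighbour, neighbour, size=10, step=10)
```

In this toy case only the left section is reported, which corresponds to a table occupying half the page width. Overlapping windows (a `step` smaller than `size`) would give finer localisation at the cost of more comparisons.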
I am new to ML, but I am also looking for ways to do optical table recognition with ML. I found these interesting articles. I have yet to start working on it. Hope this helps.