
Extracting Table Cells from DocParse
table_demo.py
The output includes a `table` element that contains the information about the table on the page.
output.json
The `table` element has a `cells` field, which is an array of cell objects representing each of the cells in the table. Let’s focus on the first element of that list.
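As a rough illustration of how that first cell could be pulled out of the JSON, here is a minimal sketch. The sample document, its field names (`elements`, `type`, `table`, `cells`, `content`, `rows`, `cols`, `is_header`, `bbox`), the bounding-box values, and the helper `first_table_cells` are all assumptions made for this example; check them against your actual DocParse output before relying on them.

```python
import json

# Hypothetical, trimmed-down output. Field names and bbox values are
# illustrative assumptions, not a definitive description of the schema.
SAMPLE_OUTPUT = json.loads("""
{
  "elements": [
    {"type": "Text", "text_representation": "Cash flow summary"},
    {
      "type": "table",
      "table": {
        "cells": [
          {
            "content": "Years ended December 31 (Millions)",
            "rows": [0],
            "cols": [0],
            "is_header": true,
            "bbox": {"x1": 0.10, "y1": 0.20, "x2": 0.50, "y2": 0.25}
          },
          {
            "content": "2018",
            "rows": [0],
            "cols": [1],
            "is_header": true,
            "bbox": {"x1": 0.50, "y1": 0.20, "x2": 0.60, "y2": 0.25}
          }
        ]
      }
    }
  ]
}
""")


def first_table_cells(output):
    """Return the cells of the first table element, or an empty list if none exists."""
    for element in output.get("elements", []):
        if element.get("type", "").lower() == "table":
            return element.get("table", {}).get("cells", [])
    return []


cells = first_table_cells(SAMPLE_OUTPUT)
first_cell = cells[0]
print(first_cell["content"], first_cell["is_header"], first_cell["bbox"])
```

The helper returns an empty list rather than raising when no table is present, so callers can guard with a simple `if cells:` check.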
cells.json
Displaying the Table
Here we’ve detected the first cell, its bounding box (the coordinates of the cell in the PDF), whether it’s a header cell, and its contents. You can then process this JSON however you’d like for further analysis. In the notebook, we use the `tables_to_pandas` function to turn the JSON into a pandas dataframe and then perform some analysis on it:
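To give a feel for what such a conversion involves, the sketch below arranges a list of cell objects into a row-by-column grid using only the standard library. It is not the notebook's implementation: the helper `cells_to_grid`, the sample cells, and the `rows`/`cols`/`content` field names are assumptions for illustration.

```python
def cells_to_grid(cells):
    """Place each cell's content at every (row, col) position it covers."""
    if not cells:
        return []
    n_rows = max(max(cell["rows"]) for cell in cells) + 1
    n_cols = max(max(cell["cols"]) for cell in cells) + 1
    # Start from an empty grid so positions without a cell stay blank.
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        for r in cell["rows"]:
            for c in cell["cols"]:
                grid[r][c] = cell["content"]
    return grid


# Hypothetical cells; a real table would have many more.
sample_cells = [
    {"content": "Years ended December 31 (Millions)", "rows": [0], "cols": [0], "is_header": True},
    {"content": "2018", "rows": [0], "cols": [1], "is_header": True},
    {"content": "Free cash flow", "rows": [1], "cols": [0], "is_header": False},
    {"content": "$ 4,862", "rows": [1], "cols": [1], "is_header": False},
]

grid = cells_to_grid(sample_cells)
header, body = grid[0], grid[1:]
for row in grid:
    print(" | ".join(row))
```

If pandas is installed, `pandas.DataFrame(body, columns=header)` would turn this grid into a dataframe. Treating the first row as the header is a simplification that works for this sample but not for tables with multi-row headers.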
display_table.py
| | Years ended December 31 (Millions) | 2018 | 2017 | 2016 |
|---|---|---|---|---|
| 0 | Major GAAP Cash Flow Categories | | | |
| 1 | Net cash provided by operating activities | $ 6,439 | 6,240 | $ 6,662 |
| 2 | Net cash provided by (used in) investing activities | 222 | (3,086) | (1,403) |
| 3 | Net cash used in financing activities. | (6,701) | (2,655) | (4,626) |
| 4 | Free Cash Flow (non-GAAP measure) | | | |
| 5 | Net cash provided by operating activities | $ 6,439 | $ 6,240 | 6,662 |
| 6 | Purchases of property, plant and equipment (PP&E | (1,577) | (1,373) | (1,420) |
| 7 | Free cash flow | $ 4,862 | $ 4,867 | $ 5,242 |
| 8 | Net income attributable to 3M | $ 5,349 | $ 4,858 | $ 5,050 |
| 9 | Free cash flow conversion | 91 % | 100 % | 104 % |
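
The extracted values are strings in accounting notation, so any numeric analysis first has to parse them. The sketch below shows one way to do that and uses it to sanity-check the 2018 column of the table above. The helper `parse_amount` is a hypothetical name introduced for this example and is not part of any library.

```python
def parse_amount(text):
    """Convert strings such as '$ 6,439' or '(1,577)' to an int.

    Parentheses are treated as a negative sign, as is usual in
    accounting notation.
    """
    cleaned = text.replace("$", "").replace(",", "").strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    value = int(cleaned)
    return -value if negative else value


# Values copied from the 2018 column of the table above.
operating = parse_amount("$ 6,439")   # Net cash provided by operating activities
capex = parse_amount("(1,577)")       # Purchases of PP&E (a cash outflow)
net_income = parse_amount("$ 5,349")  # Net income attributable to 3M

free_cash_flow = operating + capex
conversion_pct = round(100 * free_cash_flow / net_income)
print(free_cash_flow, conversion_pct)  # prints: 4862 91
```

The recomputed figures match rows 7 and 9 of the extracted table (4,862 and 91 %), which is a quick way to confirm that the cells were read correctly.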