Amazon started as an online bookstore and has grown to be the world's largest online retailer.
Amazon Textract is now available to automate data management
Amazon US today announced the general availability of Amazon Textract, a machine learning service that automatically extracts text and data that would otherwise need manual review.
The tool allows users to automate document workflows, enabling them to process millions of document pages in hours. Users are able to create indexes, build automated approval workflows, and better maintain compliance with document archival rules by flagging data that may require redaction. This also allows users to analyse a comprehensive set of data and take strategic action for their business.
Many users extract text and data from files such as contracts, expense reports, mortgage guarantees, fund prospectuses, tax documents, hospital claims, and patient forms through manual data entry or simple optical character recognition (OCR) software. This is a time-consuming and often inaccurate process that produces an output requiring extensive post-processing before it can be put in a format that is usable by other applications.
How does Amazon Textract work?
Amazon Textract identifies text and data from tables and forms in documents. Amazon say that Amazon Textract goes beyond the usual OCR to identify the contents of fields in forms, information stored in tables, and the context in which the information is presented.
For instance, if a table has a couple of columns, traditional OCR would show a random “bag of letters.” Amazon say that Amazon Textract is “intelligent” as it reads information like a human. It’s able to tell that the data is allocated across two tables and organise the data in chronological order.
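To make the contrast with a flat “bag of letters” concrete, here is a minimal Python sketch of how table structure might be read back from the service. It assumes the boto3 SDK with AWS credentials already configured, and the response layout AWS documents for Textract (a list of Blocks linked by CHILD relationships, with RowIndex and ColumnIndex on each cell). The function names analyze_file and tables_from_response, the region and the sample data are hypothetical and used only for illustration.

```python
from collections import defaultdict


def analyze_file(path, region_name="us-east-1"):
    """Send one document image to Textract and return the raw response.

    Assumption: boto3 is installed and credentials are configured;
    the region is a placeholder.
    """
    import boto3  # imported lazily so the parser below also works offline

    client = boto3.client("textract", region_name=region_name)
    with open(path, "rb") as f:
        return client.analyze_document(
            Document={"Bytes": f.read()},
            FeatureTypes=["TABLES", "FORMS"],
        )


def tables_from_response(response):
    """Rebuild each detected table as a list of rows of cell text."""
    blocks = {b["Id"]: b for b in response["Blocks"]}

    def child_ids(block):
        # Yield the ids of all blocks nested under this one.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    def cell_text(cell):
        # Join the words inside a cell; an empty cell gives "".
        words = (blocks[i] for i in child_ids(cell))
        return " ".join(w["Text"] for w in words if w["BlockType"] == "WORD")

    tables = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        rows = defaultdict(dict)
        for cid in child_ids(block):
            cell = blocks[cid]
            if cell["BlockType"] == "CELL":
                rows[cell["RowIndex"]][cell["ColumnIndex"]] = cell_text(cell)
        # Order rows and columns by their indexes, not by arrival order.
        tables.append(
            [[cols[c] for c in sorted(cols)] for _, cols in sorted(rows.items())]
        )
    return tables


def _cell(cid, row, col, word_id):
    # Helper for building the hand-made sample below.
    return {"Id": cid, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": [word_id]}]}


# Hand-made sample in the documented response shape (not real output).
sample = {"Blocks": [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c4", "c1", "c3", "c2"]}]},
    _cell("c1", 1, 1, "w1"), _cell("c2", 1, 2, "w2"),
    _cell("c3", 2, 1, "w3"), _cell("c4", 2, 2, "w4"),
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Paper"},
    {"Id": "w4", "BlockType": "WORD", "Text": "4.50"},
]}

print(tables_from_response(sample))
```

In practice the parser would be fed the result of analyze_file on a scanned page rather than the sample. Keying cells by row and column index is what keeps the table intact even when the cells arrive out of order, as they do in the sample.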
Sellers can use the tool if they still receive paper catalogues from their suppliers.