Azure AI Document Intelligence: Backbone for Building Multimodal Gen AI
Use Cases and Deployment Scope
We use Azure AI Document Intelligence mainly in our multimodal Gen AI build process, which is the main use case for our customer. We take PDF documents and create text embeddings and image embeddings from them for a model that generates text-to-text and image-based output. Text embeddings, including those for the text surrounding an image, and image embeddings are easy for us to produce. Embeddings for the text inside an image, however, we can only get through the Optical Character Recognition (OCR) in Azure AI Document Intelligence. We call it with the API endpoint and key taken from the Azure portal, integrate it into our Python code, and embed the text it returns. Having embeddings for both the text around an image and the text in the image helps give accurate answers to user queries.
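The OCR step described above can be sketched roughly as follows. This is a minimal illustration, not our production code. It assumes the `azure-ai-formrecognizer` Python SDK and its `prebuilt-read` model, and it assumes the image has already been extracted from the PDF by some other step. The names `ImageTextRecord`, `read_lines`, and `build_record` are hypothetical helpers made up for this example, and the embedding call itself is left out because it depends on the embedding model in use.

```python
from dataclasses import dataclass


@dataclass
class ImageTextRecord:
    """Hypothetical container: the text to embed for one image."""
    page_number: int
    text_in_image: str      # OCR output from inside the image
    text_around_image: str  # caption or paragraph text near the image

    def embedding_input(self) -> str:
        # Join both sources so one embedding carries the image's context.
        parts = [self.text_around_image.strip(), self.text_in_image.strip()]
        return "\n".join(p for p in parts if p)


def read_lines(source, endpoint: str, key: str) -> list[tuple[int, str]]:
    """Run OCR on an open binary file and return (page_number, line_text) pairs."""
    # Imported lazily so ImageTextRecord can be used without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    # "prebuilt-read" is the general OCR model; the call is long-running,
    # so it returns a poller whose result() blocks until analysis finishes.
    poller = client.begin_analyze_document("prebuilt-read", document=source)
    result = poller.result()
    return [
        (page.page_number, line.content)
        for page in result.pages
        for line in (page.lines or [])
    ]


def build_record(image_file, page_number: int, surrounding_text: str,
                 endpoint: str, key: str) -> ImageTextRecord:
    """OCR one extracted image and pair its text with the text around it."""
    lines = read_lines(image_file, endpoint, key)
    text_in_image = "\n".join(text for _, text in lines)
    return ImageTextRecord(page_number, text_in_image, surrounding_text)
```

In use, the endpoint and key come from the resource's page in the Azure portal, `build_record` is called once per extracted image, and the string returned by `embedding_input()` is what gets passed to the embedding model.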
Pros
- The OCR technology extracts text from images in a form that can be stored properly as embeddings
- The OCR technology's automatic capture of text from images is brilliant
- Creating an Azure AI Document Intelligence endpoint and keys is very easy
Cons
- Azure AI Document Intelligence should support storing image-based embeddings for multiple languages
- Image storage is handled separately in our case, and that part can be extended only for a few images
- Cost is high; reducing it would bring in more users and more technology built on the service
Return on Investment
- Significant reduction in manual processing cost, by 40-60%
- Improved data accuracy and reduced workload
- Scales well without an increase in cost
Alternatives Considered
Azure AI Content Safety, Azure Blob Storage and Azure AI Search
Other Software Used
Azure AI Content Safety, Azure Blob Storage, Azure AI Search

