Overview
The OCR Image endpoint extracts text from images using Optical Character Recognition powered by Mistral Document AI. It can read printed text, handwriting, and structured documents from images in multiple formats, which makes it well suited to digitizing physical documents, automating data entry, and extracting information from visual content.
Key capabilities:
- Multi-format support - PNG, JPEG, GIF, WebP, and other common formats
- High accuracy - Advanced AI models trained on diverse document types
- Structured output - Returns both plain text and markdown-formatted results
- Layout preservation - Maintains document structure and formatting
- Handwriting recognition - Supports both printed and handwritten text
- Fast processing - Typical extraction in 2-5 seconds
Common use cases:
- Document digitization - Convert physical documents to searchable text
- Receipt processing - Extract line items, totals, and merchant info
- Invoice automation - Pull invoice details for accounting systems
- Forms processing - Digitize handwritten or printed forms
- Business card scanning - Extract contact information
- ID verification - Read text from identity documents
- Screenshot analysis - Extract text from UI screenshots
- Data entry automation - Eliminate manual typing from documents
- Archive digitization - Convert historical documents to searchable format
When to Use OCR Image vs OCR PDF
Use OCR Image for:
- Single-page documents captured as images
- Photos of documents (receipts, business cards, signs)
- Screenshots containing text
- Scanned single pages
- Social media images with text
- Product labels and packaging
Use OCR PDF for:
- Multi-page PDF documents
- Scanned PDFs
- Digital PDFs with embedded images
- Documents requiring page-by-page processing
- Large document sets
How OCR Works
The OCR process involves several steps:
- Image Preprocessing - The image is analyzed and optimized for text extraction
- Text Detection - AI identifies regions containing text
- Character Recognition - Individual characters are recognized
- Layout Analysis - Document structure and reading order are determined
- Post-processing - Text is cleaned, formatted, and structured
- Output Generation - Results are returned in both plain text and markdown formats
Best Practices for OCR Quality
Image Quality:
- Use high-resolution images (300 DPI or higher for scanned documents)
- Ensure good lighting and contrast
- Avoid blurry or out-of-focus images
- Keep text horizontal (or provide properly oriented images)
- Remove shadows and glare
- Ensure text is clearly visible
- Crop to document boundaries when possible
- Use white or light backgrounds for best contrast
File Format:
- PNG or JPEG for photos of documents
- PNG for screenshots (lossless)
- JPEG for scanned documents (with high quality settings)
- Keep files under 10MB for optimal performance
Authentication
All OCR endpoints require authentication via a Bearer token in the Authorization header.
Request
You can submit images via either file upload or base64-encoded JSON.
Method 1: File Upload (multipart/form-data)
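As a minimal sketch, a multipart upload with the Bearer token can be built using only Python's standard library. The endpoint URL and the form field name `image` are assumptions for illustration (the field name simply mirrors the JSON method), so check both against your API reference before use.

```python
import urllib.request
import uuid

# Hypothetical values: this page does not state the endpoint URL or the form field name.
OCR_IMAGE_URL = "https://api.example.com/ocr/image"
FILE_FIELD = "image"


def build_upload_request(image_bytes, filename, content_type, api_key, url=OCR_IMAGE_URL):
    """Build a multipart/form-data POST request carrying a single image file."""
    boundary = uuid.uuid4().hex
    # One form part: headers, a blank line, then the raw file bytes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{FILE_FIELD}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=head + image_bytes + tail, headers=headers, method="POST")
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response.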
Method 2: Base64 JSON
Request Body (JSON Method)
- image string — Base64-encoded image data (with or without data URI prefix)
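The JSON method can be sketched the same way. The `image` field and the optional data URI prefix come from the description above; the endpoint URL is a placeholder, and the `mime_type` parameter is a hypothetical convenience for adding the prefix.

```python
import base64
import json
import urllib.request

OCR_IMAGE_URL = "https://api.example.com/ocr/image"  # hypothetical endpoint URL


def build_json_request(image_bytes, api_key, url=OCR_IMAGE_URL, mime_type=None):
    """Build a JSON POST request whose `image` field holds base64-encoded image data."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    if mime_type:
        # Optional data URI prefix, e.g. "data:image/png;base64,..."
        encoded = f"data:{mime_type};base64,{encoded}"
    body = json.dumps({"image": encoded}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Base64 encoding inflates the payload by roughly a third, so file upload may be the better choice for larger images.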
Responses
Success Response
No Text Found
Field Reference
- success boolean — Whether text extraction succeeded.
- text string — Extracted text from the image (processed from markdown).
- method string — Always "mistral_document_ai" for OCR processing.
- raw_response object — Complete unprocessed response from Mistral Document AI, including:
  - pages array — Array of page data (single page for images)
    - index number — Page index (0 for images)
    - markdown string — Extracted text in markdown format
    - text string — Plain text version
    - Additional metadata fields
  - model string — Model used (e.g., mistral-document-ai-2505)
Supported Image Formats
- PNG
- JPEG/JPG
- GIF
- WebP
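A client-side pre-flight check can catch obvious problems before upload. The sketch below recognizes the four formats listed above by their file signatures and applies the 10MB guideline from the best practices. Treating 10MB as 10 * 1024 * 1024 bytes is an assumption, and a `None` result only means the data is not one of these four formats, not that the API would reject it.

```python
MAX_RECOMMENDED_BYTES = 10 * 1024 * 1024  # assumed reading of the "under 10MB" guideline


def detect_image_format(data):
    """Return 'png', 'jpeg', 'gif', 'webp', or None based on the file signature."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def preflight(data):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    if detect_image_format(data) is None:
        problems.append("unrecognized image format")
    if len(data) > MAX_RECOMMENDED_BYTES:
        problems.append("file is larger than 10MB")
    return problems
```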
Error Responses
Authentication Required
401 Unauthorized
Invalid API Key
503 Service Unavailable
Invalid Request
400 Bad Request
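One way to act on these status codes is to retry only the ones that may be transient. The sketch below assumes that 503 is worth retrying with exponential backoff and that 400 and 401 are not, since resending the same request would fail the same way. `send` is a hypothetical callable that performs one request and returns a `(status_code, body)` pair.

```python
import time

RETRYABLE_STATUSES = {503}  # assumption: only Service Unavailable is treated as transient


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns the last (status_code, body) pair received.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        # Exponential backoff: base_delay, then 2x, then 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` is enough.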
Understanding the Response
The API returns both processed text and raw OCR data:
- success - Boolean indicating whether text was successfully extracted
- text - Clean, processed text extracted from the image (recommended for most use cases)
- markdown - Formatted text preserving structure (available in raw_response)
- raw_response - Complete Mistral Document AI response with detailed metadata
When to use each:
- Use text for simple text extraction and display
- Use markdown (in raw_response) to preserve formatting and structure
- Use raw_response when you need bounding boxes, confidence scores, or detailed metadata
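The choice between these fields can be wrapped in a small helper. This sketch works on the decoded JSON response as a Python dict, using the field names from the Field Reference. It assumes that a failed or empty extraction should yield an empty string, and `prefer_markdown` is a hypothetical parameter name.

```python
def extract_text(response, prefer_markdown=False):
    """Return extracted text from a decoded OCR response, or "" if extraction failed."""
    if not response.get("success"):
        return ""
    if prefer_markdown:
        # Images produce a single page, so the markdown lives in pages[0].
        pages = response.get("raw_response", {}).get("pages", [])
        if pages and pages[0].get("markdown"):
            return pages[0]["markdown"]
    # Fall back to the processed plain text.
    return response.get("text", "")
```

Falling back to `text` when no markdown is present keeps callers from having to handle a missing `raw_response` themselves.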
Integration Patterns
Document Management Systems:
Performance Optimization
Image Optimization:
- Resize very large images (4000x4000+) to reduce upload time
- Use JPEG compression for photos (85-95% quality)
- Use PNG for text-heavy images and screenshots
- Crop unnecessary margins to focus on text areas
- Smaller files process faster (aim for < 2MB)
- Clear, high-contrast images process more quickly
- Parallel requests for multiple images
- Consider caching results for frequently processed images
- Batch similar documents together
- Cache results to avoid reprocessing
- Pre-process images to ensure good quality
- Use appropriate resolution (don’t oversample)
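When resizing very large images, the target dimensions should keep the aspect ratio so text is not distorted. The helper below is a sketch of that arithmetic only. The default cap of 4000 pixels on the longer side is an assumption drawn from the "4000x4000+" guideline above, and the actual resampling would be done with an image library of your choice.

```python
def fit_within(width, height, max_side=4000):
    """Scale (width, height) down so the longer side is at most max_side.

    Images already within the limit are returned unchanged (never upscaled).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Guard against a side rounding down to zero on extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```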
Error Handling
Implement robust error handling for production use:
Common Issues and Solutions
Issue: No text extracted from clear image
- Solution: Check image orientation; try rotating by 90, 180, or 270 degrees
- Solution: Increase image resolution or contrast
- Solution: Verify text is actually present and legible
- Solution: Improve image quality (better lighting, focus)
- Solution: Ensure text is horizontal
- Solution: Remove shadows and glare
- Solution: Try preprocessing (contrast adjustment, noise reduction)
- Solution: Check if text is very small or low contrast
- Solution: Increase resolution
- Solution: Verify entire document is within image bounds
- Solution: Reduce image size (resize before upload)
- Solution: Compress images appropriately
- Solution: Check network connectivity
Performance Tips
Image Quality:
- Use 300+ DPI for scanned documents
- Ensure good lighting and minimal glare
- Keep text horizontal and in focus
- Use high-quality camera settings
File Format:
- PNG for screenshots and text-heavy images
- JPEG (85-95% quality) for photos
- Keep files under 10MB for optimal speed
- Consider preprocessing (crop, rotate, enhance contrast)
- Process multiple images concurrently
- Use threading or async for batch operations
- Implement request pooling for high-volume use
- Cache OCR results for frequently accessed images
- Store extracted text in database
- Implement deduplication to avoid reprocessing identical images
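The concurrency, caching, and deduplication tips above can be combined in one helper. This is a sketch under stated assumptions: `ocr_fn` is a hypothetical callable that takes image bytes and returns the extracted text, the cache is a plain in-memory dict keyed by the SHA-256 of the image bytes, and a thread pool handles the concurrent calls.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def ocr_many(images, ocr_fn, cache=None, max_workers=4):
    """OCR a list of images concurrently, calling ocr_fn once per distinct uncached image.

    Results are returned in the same order as the input list.
    """
    cache = {} if cache is None else cache
    keys = [hashlib.sha256(img).hexdigest() for img in images]
    # Identical images share a key, so duplicates collapse to one pending entry.
    pending = {key: img for key, img in zip(keys, images) if key not in cache}
    if pending:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = pool.map(ocr_fn, pending.values())
            cache.update(zip(pending.keys(), results))
    return [cache[key] for key in keys]
```

Passing the same `cache` dict across calls avoids reprocessing images seen earlier; for persistence across runs, the dict could be replaced by a database table keyed on the same hash.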
