OCR Image

Overview

The OCR Image endpoint extracts text from images using state-of-the-art Optical Character Recognition powered by Mistral Document AI. This endpoint can read printed text, handwriting, and structured documents from images in multiple formats, making it perfect for digitizing physical documents, automating data entry, and extracting information from visual content.

Key capabilities:

Multi-format support - PNG, JPEG, GIF, WebP, and other common formats
High accuracy - Advanced AI models trained on diverse document types
Structured output - Returns both plain text and markdown-formatted results
Layout preservation - Maintains document structure and formatting
Handwriting recognition - Supports both printed and handwritten text
Fast processing - Typical extraction in 2-5 seconds

Real-world applications:

Document digitization - Convert physical documents to searchable text
Receipt processing - Extract line items, totals, and merchant info
Invoice automation - Pull invoice details for accounting systems
Forms processing - Digitize handwritten or printed forms
Business card scanning - Extract contact information
ID verification - Read text from identity documents
Screenshot analysis - Extract text from UI screenshots
Data entry automation - Eliminate manual typing from documents
Archive digitization - Convert historical documents to searchable format

When to Use OCR Image vs OCR PDF

Use OCR Image for:

Single-page documents captured as images
Photos of documents (receipts, business cards, signs)
Screenshots containing text
Scanned single pages
Social media images with text
Product labels and packaging

Use OCR PDF for:

Multi-page PDF documents
Scanned PDFs
Digital PDFs with embedded images
Documents requiring page-by-page processing
Large document sets

How OCR Works

The OCR process runs in six steps:

1. Image Preprocessing - The image is analyzed and optimized for text extraction
2. Text Detection - AI identifies regions containing text
3. Character Recognition - Individual characters are recognized
4. Layout Analysis - Document structure and reading order are determined
5. Post-processing - Text is cleaned, formatted, and structured
6. Output Generation - Results are returned in both plain text and markdown formats

Processing time: Typically 2-5 seconds per image, depending on resolution and complexity.

Best Practices for OCR Quality

Image Quality:

Use high-resolution images (300 DPI or higher for scanned documents)
Ensure good lighting and contrast
Avoid blurry or out-of-focus images
Keep text horizontal (or provide properly oriented images)

Document Preparation:

Remove shadows and glare
Ensure text is clearly visible
Crop to document boundaries when possible
Use white or light backgrounds for best contrast

Format Selection:

PNG or JPEG for photos of documents
PNG for screenshots (lossless)
JPEG for scanned documents (with high quality settings)
Keep files under 10MB for optimal performance
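
The format and size guidance above can be checked before uploading. The following is a minimal sketch, not part of the API: the `preflight` helper is hypothetical, the extension set mirrors the formats this page lists, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "10MB" is read as 10 * 1024 * 1024 bytes; the page does not say.
MAX_BYTES = 10 * 1024 * 1024
# Extensions for the formats this page lists (PNG, JPEG/JPG, GIF, WebP).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def preflight(path):
    """Return a list of warnings for an image file; empty means none found."""
    p = Path(path)
    warnings = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        warnings.append(f"unlisted format: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        warnings.append(f"file is {size} bytes, above the {MAX_BYTES}-byte guideline")
    return warnings
```

An empty list means the file passed both checks; it says nothing about resolution, lighting, or orientation, which still need a visual check.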

Authentication

All OCR endpoints require authentication via Bearer token in the Authorization header.

Authorization: Bearer ik_your_api_key_here

Request

You can submit images via either file upload or base64-encoded JSON.

Method 1: File Upload (multipart/form-data)

curl -X POST "https://api.incredible.one/ocr/image" \
  -H "Authorization: Bearer ik_your_api_key_here" \
  -F "file=@/path/to/receipt.png"

Method 2: Base64 JSON

curl -X POST "https://api.incredible.one/ocr/image" \
  -H "Authorization: Bearer ik_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "iVBORw0KGgoAAAANSUhEUgAAAAUA..."
  }'

Request Body (JSON Method)

image string — Base64-encoded image data (with or without data URI prefix)
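
For callers working in Python, the base64 JSON method might look like the sketch below. It uses only the standard library; the helper names `build_ocr_request` and `ocr_image_base64` are illustrative and not part of any SDK, and the 30-second timeout is an assumption.

```python
import base64
import json
import urllib.request

OCR_IMAGE_URL = "https://api.incredible.one/ocr/image"


def build_ocr_request(image_bytes, api_key):
    """Build (but do not send) a JSON POST carrying the image as base64."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        OCR_IMAGE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ocr_image_base64(path, api_key, timeout=30):
    """Send the image at `path` and return the decoded JSON response."""
    with open(path, "rb") as f:
        request = build_ocr_request(f.read(), api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so error bodies have to be read from the exception rather than from the return value.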

Responses

Success Response

{
  "success": true,
  "text": "Extracted text from the image...",
  "method": "mistral_document_ai",
  "raw_response": {
    "pages": [
      {
        "index": 0,
        "markdown": "Extracted text in markdown format...",
        "text": "...",
        "bounding_box": {...},
        "image_url": "..."
      }
    ],
    "id": "...",
    "object": "document",
    "model": "mistral-document-ai-2505"
  }
}

No Text Found

{
  "success": false,
  "error": "No text could be extracted from the image",
  "text": "",
  "method": "mistral_document_ai",
  "raw_response": {
    "pages": [...]
  }
}

Field Reference

success boolean — Whether text extraction succeeded.
text string — Extracted text from the image (processed from markdown).
method string — Always "mistral_document_ai" for OCR processing.
raw_response object — Complete unprocessed response from Mistral Document AI, including:
- pages array — Array of page data (single page for images)
  - index number — Page index (0 for images)
  - markdown string — Extracted text in markdown format
  - text string — Plain text version
  - Additional metadata fields
- model string — Model used (e.g., mistral-document-ai-2505)
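
As a rough illustration of the fields above, a response body can be flattened into a small typed record. `OcrResult` and `parse_ocr_response` are hypothetical names, and the sketch assumes that images yield a single page at index 0, as described above.

```python
from dataclasses import dataclass


@dataclass
class OcrResult:
    success: bool
    text: str
    markdown: str
    model: str


def parse_ocr_response(body):
    """Pick the documented fields out of a decoded JSON response body."""
    raw = body.get("raw_response") or {}
    pages = raw.get("pages") or []
    # Images are described as single-page, so only the first page is read.
    first_page = pages[0] if pages else {}
    return OcrResult(
        success=bool(body.get("success")),
        text=body.get("text", ""),
        markdown=first_page.get("markdown", ""),
        model=raw.get("model", ""),
    )
```

Missing fields fall back to empty strings, so the same function can be applied to a "No Text Found" body without raising.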

Supported Image Formats

PNG
JPEG/JPG
GIF
WebP
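
File extensions can be wrong, so it may help to check the file's leading bytes against the four formats listed above before uploading. This is a client-side sketch only; `sniff_image_format` is a hypothetical helper, and the endpoint's own validation is not documented here.

```python
def sniff_image_format(data):
    """Guess the image format from its leading bytes, or return None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

Reading the first 12 bytes of a file is enough for all four checks.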

Error Responses

Authentication Required

{
  "error": "Authentication required",
  "message": "API key must be provided in Authorization header as 'Bearer ik_your_api_key'"
}

Status Code: 401 Unauthorized

Invalid API Key

{
  "error": "Invalid API key",
  "message": "The provided API key is invalid, inactive, or expired"
}

Status Code: 503 Service Unavailable

Invalid Request

{
  "success": false,
  "error": "Missing 'image' or 'file' field in request body"
}

Status Code: 400 Bad Request
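
One way to route the responses shown above is a small classifier over the status code and decoded body. The category labels and the `classify_response` helper are illustrative. The sketch keys mostly off the `error` and `success` fields rather than status codes alone, because the status code returned with a "No Text Found" body is not stated on this page.

```python
def classify_response(status_code, body):
    """Map a status code and decoded JSON body to a coarse category."""
    if body.get("success") is True:
        return "ok"
    error = body.get("error", "")
    # Both authentication failures are identified by their error strings.
    if status_code == 401 or error in ("Authentication required", "Invalid API key"):
        return "auth"
    if status_code == 400:
        return "bad_request"
    # The "No Text Found" body has success=false and an empty text field.
    if body.get("success") is False and body.get("text") == "":
        return "no_text"
    return "unknown"
```

A caller would typically fix credentials on "auth", fix the request on "bad_request", and try a better image on "no_text".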

Understanding the Response

The API returns both processed text and raw OCR data:

success - Boolean indicating whether text was successfully extracted
text - Clean, processed text extracted from the image (recommended for most use cases)
markdown - Formatted text preserving structure (available in raw_response)
raw_response - Complete Mistral Document AI response with detailed metadata

When to use each:

Use text for simple text extraction and display
Use markdown (in raw_response) to preserve formatting and structure
Use raw_response when you need bounding boxes, confidence scores, or detailed metadata

Integration Patterns

Document Management Systems:

# Extract text from uploaded documents
# (API_KEY and save_to_database are assumed to be defined elsewhere)
import requests

def process_uploaded_document(file):
    response = requests.post(
        "https://api.incredible.one/ocr/image",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": file}
    )
    result = response.json()

    if result.get("success"):
        # Store extracted text in database
        text = result["text"]
        save_to_database(text)
        return text
    # No text was extracted; nothing to store
    return None

Receipt Processing:

# Extract and structure receipt data
# (ocr_image, client, and receipt_schema are assumed to be defined elsewhere)
def process_receipt(receipt_image):
    result = ocr_image(receipt_image)
    
    # Use AI to structure the extracted text
    structured_data = client.answer(
        query="Extract merchant name, date, items, and total",
        response_format=receipt_schema,
        context=result["text"]
    )
    return structured_data

Batch Processing:

# Process multiple images in parallel
# (ocr_image is assumed to be defined elsewhere)
import concurrent.futures

def ocr_batch(image_paths):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # executor.map returns results in the same order as image_paths,
        # so each result can be matched back to its input
        results = list(executor.map(ocr_image, image_paths))
    return results
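
When one failed image should not abort the whole batch, results can be keyed by path with errors captured per image. This is a sketch under stated assumptions: `ocr_fn` stands in for whatever function performs the OCR call, and the `{"result", "error"}` record shape is an illustrative choice.

```python
import concurrent.futures


def ocr_batch_by_path(image_paths, ocr_fn, max_workers=5):
    """Run ocr_fn over each path in parallel; map path -> result or error."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_path = {executor.submit(ocr_fn, p): p for p in image_paths}
        for future in concurrent.futures.as_completed(future_to_path):
            path = future_to_path[future]
            try:
                results[path] = {"result": future.result(), "error": None}
            except Exception as exc:
                # Record the failure and keep processing the other images.
                results[path] = {"result": None, "error": str(exc)}
    return results
```

Because the results are stored in a dictionary, completion order no longer matters, and failed paths can be retried separately.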

Performance Optimization

Image Optimization:

Resize very large images (4000x4000+) to reduce upload time
Use JPEG compression for photos (85-95% quality)
Use PNG for text-heavy images and screenshots
Crop unnecessary margins to focus on text areas
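
The resize advice above comes down to a small calculation that preserves the aspect ratio. In this sketch, `scaled_size` is a hypothetical helper and the 4000-pixel cap is an assumption taken from the "4000x4000+" figure above; the output would be passed to whatever imaging library does the actual resize.

```python
def scaled_size(width, height, max_side=4000):
    """Return (width, height) scaled down so the longest side is <= max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic keeps the result exact; never go below 1 pixel.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height
```

Images already within the cap are returned unchanged, so the function never upscales.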

Processing Speed:

Smaller files process faster (aim for < 2MB)
Clear, high-contrast images process more quickly
Parallel requests for multiple images
Consider caching results for frequently processed images

Cost Optimization:

Batch similar documents together
Cache results to avoid reprocessing
Pre-process images to ensure good quality
Use appropriate resolution (don’t oversample)

Error Handling

Implement robust error handling for production use:

import logging

import requests

# API_KEY is assumed to be defined elsewhere
def safe_ocr_image(image_path, max_retries=3):
    for attempt in range(max_retries):
        try:
            with open(image_path, "rb") as f:
                response = requests.post(
                    "https://api.incredible.one/ocr/image",
                    headers={"Authorization": f"Bearer {API_KEY}"},
                    files={"file": f},
                    timeout=30
                )
            
            result = response.json()
            
            if result.get("success"):
                return result["text"]
            else:
                logging.warning(f"OCR failed: {result.get('error')}")
                return None
                
        except requests.exceptions.Timeout:
            logging.error(f"Timeout on attempt {attempt + 1}")
            if attempt == max_retries - 1:
                raise
        except Exception as e:
            logging.error(f"OCR error: {e}")
            raise
    
    return None
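
The retry loop above retries a timed-out request immediately. A common refinement is to wait between attempts with exponential backoff. The helper below is a generic sketch, not part of the API: `retry_with_backoff`, its parameter names, and the default delays are all assumptions.

```python
import time


def retry_with_backoff(fn, max_retries=3, base_delay=1.0,
                       retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                # Out of attempts: let the caller see the last error.
                raise
            sleep(base_delay * (2 ** attempt))
```

With the `requests` library, `retry_on=(requests.exceptions.Timeout,)` would match the timeout handled above. The `sleep` parameter exists so that tests can substitute a no-op.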

Common Issues and Solutions

Issue: No text extracted from clear image

Solution: Check image orientation, try rotating 90/180/270 degrees
Solution: Increase image resolution or contrast
Solution: Verify text is actually present and legible

Issue: Garbled or incorrect text

Solution: Improve image quality (better lighting, focus)
Solution: Ensure text is horizontal
Solution: Remove shadows and glare
Solution: Try preprocessing (contrast adjustment, noise reduction)

Issue: Missing text sections

Solution: Check if text is very small or low contrast
Solution: Increase resolution
Solution: Verify entire document is within image bounds

Issue: Slow processing

Solution: Reduce image size (resize before upload)
Solution: Compress images appropriately
Solution: Check network connectivity

Performance Tips

Image Quality:

Use 300+ DPI for scanned documents
Ensure good lighting and minimal glare
Keep text horizontal and in focus
Use high-quality camera settings

File Optimization:

PNG for screenshots and text-heavy images
JPEG (85-95% quality) for photos
Keep files under 10MB for optimal speed
Consider preprocessing (crop, rotate, enhance contrast)

Parallel Processing:

Process multiple images concurrently
Use threading or async for batch operations
Implement request pooling for high-volume use

Caching:

Cache OCR results for frequently accessed images
Store extracted text in database
Implement deduplication to avoid reprocessing identical images
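
Deduplication can be sketched by keying cached results on a hash of the image bytes, so identical images are sent only once. `OcrCache` is a hypothetical in-memory example, and `ocr_fn` stands in for the function that actually calls the endpoint; a production system would more likely store results in a database, as suggested above.

```python
import hashlib


class OcrCache:
    """In-memory cache of OCR results keyed by the SHA-256 of the image bytes."""

    def __init__(self, ocr_fn):
        self._ocr_fn = ocr_fn
        self._results = {}

    def get(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._results:
            # First time this exact content is seen: run OCR and remember it.
            self._results[key] = self._ocr_fn(image_bytes)
        return self._results[key]
```

Hashing the content rather than the filename means a re-uploaded or renamed copy of the same image still hits the cache.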
