When using Cloud Vision API's OCR in a practical setting, you often run into the issue where "the official documentation samples are too fragmented, and it's unclear how to connect them all together."
Specifically, tasks like:
- Uploading to GCS
- Running asynchronous OCR
- Retrieving and merging the result JSONs
can be tedious to stitch together by hand, so it is very convenient to have a script that runs them as one continuous workflow.
In this post, I am sharing the "Minimal configuration script to safely run Cloud Vision OCR via CLI" that I actually use in my projects.
It is designed with security in mind, ensuring that credentials are not hardcoded. Feel free to copy, paste, and adapt it to your own environment.
Minimal Script (Python)
This script includes the following features:
- Creates a GCS bucket (if it doesn't exist)
- Uploads the PDF
- Executes asynchronous OCR
- Downloads the result JSONs and merges the text
import os
from google.cloud import vision
from google.cloud import storage
import json
# ==============================
# Configuration (ANONYMIZED)
# ==============================
# NOTE:
# - Do NOT hardcode real credential file names in public code
# - Use environment variables or placeholders
# Example (set outside this script):
# export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"
# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "your-service-account.json" # ❌ avoid in public code
def create_bucket_if_missing(bucket_name: str):
    """Create a GCS bucket if it does not exist."""
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    if not bucket.exists():
        print(f"Bucket '{bucket_name}' not found. Creating...")
        storage_client.create_bucket(bucket)
        print(f"Bucket '{bucket_name}' created.")
    else:
        print(f"Bucket '{bucket_name}' already exists.")


def upload_blob(bucket_name: str, source_file: str, destination_blob: str):
    """Upload a local file to GCS."""
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob)
    blob.upload_from_filename(source_file)
    print(f"Uploaded '{source_file}' to 'gs://{bucket_name}/{destination_blob}'.")
def async_detect_document(gcs_source_uri: str, gcs_destination_uri: str):
    """Run async OCR (PDF/TIFF) using Cloud Vision API."""
    client = vision.ImageAnnotatorClient()

    feature = vision.Feature(
        type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION
    )
    input_config = vision.InputConfig(
        gcs_source=vision.GcsSource(uri=gcs_source_uri),
        mime_type="application/pdf",
    )
    output_config = vision.OutputConfig(
        gcs_destination=vision.GcsDestination(uri=gcs_destination_uri),
        batch_size=1,
    )
    request = vision.AsyncAnnotateFileRequest(
        features=[feature],
        input_config=input_config,
        output_config=output_config,
    )

    print(f"Running OCR for: {gcs_source_uri}")
    operation = client.async_batch_annotate_files(requests=[request])
    operation.result(timeout=300)
    print("OCR completed.")
def list_blobs(bucket_name: str, prefix: str):
    """List blobs in a bucket with a given prefix."""
    storage_client = storage.Client()
    return storage_client.list_blobs(bucket_name, prefix=prefix)
def fetch_ocr_results(gcs_destination_uri: str):
    """Download OCR result JSON files and merge extracted text."""
    if not gcs_destination_uri.startswith("gs://"):
        raise ValueError("Destination URI must start with gs://")

    path = gcs_destination_uri.replace("gs://", "")
    bucket_name, prefix = path.split("/", 1)

    output_dir = "ocr_output"
    os.makedirs(output_dir, exist_ok=True)

    pages = []  # (page_number, text) pairs collected from all result files
    for blob in list_blobs(bucket_name, prefix):
        if not blob.name.endswith(".json"):
            continue
        local_path = os.path.join(output_dir, os.path.basename(blob.name))
        blob.download_to_filename(local_path)

        with open(local_path, "r", encoding="utf-8") as f:
            data = json.load(f)

        for idx, page in enumerate(data.get("responses", []), start=1):
            annotation = page.get("fullTextAnnotation")
            if not annotation:
                continue
            # Use the page number reported by the API; fall back to the index
            page_number = page.get("context", {}).get("pageNumber", idx)
            pages.append((page_number, annotation.get("text", "")))

    # Sort by page number so the merged text follows the original document order
    pages.sort(key=lambda item: item[0])
    full_text = "".join(
        f"\n\n--- Page {number} ---\n{text}" for number, text in pages
    )

    with open("final_text_output.txt", "w", encoding="utf-8") as f:
        f.write(full_text)
    print("Merged OCR text saved to final_text_output.txt")
if __name__ == "__main__":
    # ==============================
    # Example placeholders ONLY
    # ==============================
    BUCKET_NAME = "your-ocr-bucket-name"
    INPUT_PDF = "input.pdf"
    OUTPUT_PREFIX = "ocr_results/"

    gcs_source = f"gs://{BUCKET_NAME}/{INPUT_PDF}"
    gcs_output = f"gs://{BUCKET_NAME}/{OUTPUT_PREFIX}"

    # Uncomment only the steps you need:
    # create_bucket_if_missing(BUCKET_NAME)
    # upload_blob(BUCKET_NAME, INPUT_PDF, INPUT_PDF)
    # async_detect_document(gcs_source, gcs_output)
    fetch_ocr_results(gcs_output)
Key Points of This Code
1. No Hardcoded Credentials
The script assumes that GOOGLE_APPLICATION_CREDENTIALS is passed as an environment variable. By not writing the path or content of key.json directly in the code, we prevent accidents where credentials might be inadvertently published to GitHub or other public repositories.
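If you want the script to fail fast with a clear message when the variable is missing (rather than a less obvious authentication error from deep inside a client call), a small guard like the following can be added near the top of the script. This is just a sketch; the Google client libraries read GOOGLE_APPLICATION_CREDENTIALS on their own, so the check is purely for a friendlier error.

import os
import sys

# Fail early if the credential environment variable is not set.
if not os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
    sys.exit(
        "GOOGLE_APPLICATION_CREDENTIALS is not set. Export it first, e.g.\n"
        '  export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json"'
    )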
2. Merging OCR Results
The Cloud Vision API outputs JSON files split by page. The fetch_ocr_results function automatically downloads these files, merges them in page order, and saves the result to final_text_output.txt. In practice, since the goal is almost always "I just want the final text," automating this step makes the process much smoother.
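For reference, each result file written to GCS is an AnnotateFileResponse serialized as JSON, and the merge step only relies on a few of its fields. Here is a minimal sketch of inspecting one downloaded file; the file name output-1-to-1.json is only an example of the output naming pattern and may differ in your bucket.

import json

# Peek into one downloaded result file (hypothetical file name)
with open("ocr_output/output-1-to-1.json", "r", encoding="utf-8") as f:
    data = json.load(f)

for response in data.get("responses", []):
    page_number = response.get("context", {}).get("pageNumber")  # 1-based page index
    text = (response.get("fullTextAnnotation") or {}).get("text", "")
    print(f"Page {page_number}: {len(text)} characters of text")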
3. Ease of Re-execution
Inside the if __name__ == "__main__": block, the code is structured so you can uncomment and execute only the necessary processing steps. Since OCR can take time, this is convenient for situations like "The OCR is already finished, I just want to fetch the results."
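If you would rather not edit the file each time, one possible extension (not part of the script above) is a thin argparse wrapper that turns each step into a command-line flag. A sketch, assuming it lives in the same file as the functions above and that the placeholder constants (BUCKET_NAME, INPUT_PDF, gcs_source, gcs_output) are moved to module level:

import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Cloud Vision OCR helper")
    parser.add_argument("--create-bucket", action="store_true", help="create the GCS bucket if missing")
    parser.add_argument("--upload", action="store_true", help="upload the input PDF")
    parser.add_argument("--ocr", action="store_true", help="run asynchronous OCR")
    parser.add_argument("--fetch", action="store_true", help="download and merge the results")
    args = parser.parse_args()

    if args.create_bucket:
        create_bucket_if_missing(BUCKET_NAME)
    if args.upload:
        upload_blob(BUCKET_NAME, INPUT_PDF, INPUT_PDF)
    if args.ocr:
        async_detect_document(gcs_source, gcs_output)
    if args.fetch:
        fetch_ocr_results(gcs_output)

With this, something like python your_script.py --ocr --fetch (using whatever name you give the file) re-runs only the later steps.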
Usage
- Install the necessary libraries:
  pip install google-cloud-vision google-cloud-storage
- Set your credentials in an environment variable:
  export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-service-account.json"
- Update BUCKET_NAME and other variables in the script, then run it.
Note: The output ocr_output/ folder and final_text_output.txt will contain OCR results (which may include confidential information). It is recommended to add both to your .gitignore so they are not committed to version control.
Cloud Vision OCR Series Articles
This article is part of a series on setting up and operating OCR using Cloud Vision API.
- Summary: [Cloud Vision OCR x CLI x GCP Setup & Troubleshooting Log](/archives/ocr-series-summary-en)
- Setup: Why Cloud Vision API OCR Failed with IAM and How I Finally Got It Working
- Operation: Why Running Cloud Vision OCR via CLI Was the Easiest Method
- Workflow: Why GCP x VSC x CLI is the Correct Route for Frequent OCR
- Tools: Explanation of Practical Tools Created to Run Cloud Vision OCR via CLI
- Code: Free Code: Minimal Script to Safely Run Cloud Vision OCR via CLI
- Extra: When Stuck in GCP GUI, Escaping to Cloud Shell + CUI Was Faster (Failure Story)