🧠 AuditVision: Turning FDA Compliance Chaos into Clarity

Tags: Python, OCR, Django, OpenCV, FDA 483, Data Analytics, AI in Pharma

When I joined Zydus Lifesciences, I saw firsthand how painful and time-consuming it was to handle FDA Form 483 documents — critical post-audit observations that demand fast, accurate action.

Most of these PDFs weren’t even text-selectable. They were scanned image documents that QA teams had to retype manually. One audit could take up to a week to process — and there were dozens per year. It was slow, error-prone, and draining valuable resources.

🚨 The Pain: A Broken Process

QA teams would comb through these flat PDFs, retyping line by line, trying to extract key insights: auditor names, firm IDs, root causes, trends. The process was manual, fragmented, and inconsistent across sites. There was no centralized system, no dashboards — just Excel sheets and shared folders.

🧩 The Spark: Let’s Build Something Better

That’s when I built AuditVision — a platform that could turn these scanned FDA documents into clean, structured data and meaningful visual insights. From OCR to web dashboards, the goal was simple: automate what could be automated, and surface what mattered.

🔧 Building the System

📄 Step 1: Converting PDFs to Structured Data

  • Used Poppler to convert multi-page PDFs into high-res images
  • Applied OpenCV with custom morphological filters to:
    • Detect table lines
    • Remove scan noise and whitespace
    • Segment each page image into three zones: top (the upper 30% of the page), middle (observations), and bottom (footer)
  • Ran Tesseract OCR on each zone independently for better accuracy
  • Cleaned and parsed text using regex, converting it into Pandas DataFrames with fields like firm name, auditor, date, FEI number
  • Merged with QA Excel data to enrich with metadata (plant, department, etc.)
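
As a rough illustration of the zoning and regex steps above, here is a minimal Python sketch. The 30% header share comes from the list above; the 10% footer share, the field labels, the regex patterns, the function names zone_bounds and parse_header, and the sample text are all hypothetical stand-ins, not the production code. It uses only the standard library: in a real pipeline the row ranges would slice the page image before OCR, and the parsed dictionaries would become DataFrame rows.

```python
import re


def zone_bounds(height, header_frac=0.30, footer_frac=0.10):
    """Return (start_row, end_row) pixel ranges for the three page zones.

    header_frac follows the 30% figure in the text; footer_frac is an
    assumed value chosen only for illustration.
    """
    top_end = round(height * header_frac)
    bottom_start = round(height * (1 - footer_frac))
    return {
        "top": (0, top_end),
        "middle": (top_end, bottom_start),
        "bottom": (bottom_start, height),
    }


# Hypothetical label patterns; real OCR output would need tuning per scan.
FIELD_PATTERNS = {
    "firm_name": re.compile(r"FIRM\s+NAME[ \t]*[:\-]?[ \t]*(.+)", re.I),
    "fei_number": re.compile(r"FEI\s+NUMBER[ \t]*[:\-]?[ \t]*(\d{7,10})", re.I),
    "inspection_dates": re.compile(
        r"DATE\(S\)\s+OF\s+INSPECTION[ \t]*[:\-]?[ \t]*(.+)", re.I
    ),
}


def parse_header(text):
    """Extract header fields from OCR text; missing fields map to None."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else None
    return record


# Made-up sample text standing in for the OCR output of the top zone.
SAMPLE_HEADER = """FIRM NAME: Example Pharma Ltd
FEI NUMBER: 3001234567
DATE(S) OF INSPECTION: 01/15/2024 - 01/24/2024
"""

if __name__ == "__main__":
    print(zone_bounds(3300))
    print(parse_header(SAMPLE_HEADER))
```

Returning None for a field that did not match keeps OCR misses visible downstream, so they can be flagged for manual review instead of silently dropped.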

💻 Step 2: Interactive Dashboards

Once the data pipeline was set, I built a secure Django web app with real-time analytics using Chart.js, Plotly, and DataTables. Users could:

  • Track top auditors and their observation patterns
  • View company-wise and year-wise compliance trends
  • Compare auditor performance across regions and time
  • Filter, search, and export relevant slices of data
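
The aggregations behind views like these can be sketched in a few lines. The example below is only an illustration: the record layout, the sample rows, and the helper names top_auditors, yearly_counts, and filter_records are assumptions, and it uses plain Python so it runs without dependencies (the same counts could be produced with a pandas groupby before being handed to the charts).

```python
from collections import Counter

# Made-up observation records; the field names are assumed for illustration.
OBSERVATIONS = [
    {"auditor": "Auditor A", "firm": "Firm X", "year": 2022},
    {"auditor": "Auditor A", "firm": "Firm Y", "year": 2023},
    {"auditor": "Auditor B", "firm": "Firm X", "year": 2023},
    {"auditor": "Auditor A", "firm": "Firm X", "year": 2023},
    {"auditor": "Auditor C", "firm": "Firm Z", "year": 2022},
]


def top_auditors(records, n=3):
    """Return the n auditors with the most observations as (name, count)."""
    return Counter(r["auditor"] for r in records).most_common(n)


def yearly_counts(records):
    """Return observation counts keyed by year, in ascending year order."""
    counts = Counter(r["year"] for r in records)
    return dict(sorted(counts.items()))


def filter_records(records, **criteria):
    """Keep records whose fields equal every given keyword criterion."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    print(top_auditors(OBSERVATIONS, n=2))
    print(yearly_counts(OBSERVATIONS))
    print(filter_records(OBSERVATIONS, firm="Firm X", year=2023))
```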

🔐 Step 3: Securing Compliance

  • Role-based access control with separate user/admin views
  • Full audit trail of logins, CRUD operations, and exports, each recorded with IP address and timestamp
  • Locked Excel exports (row limits + embedded user metadata for traceability)
  • Live session APIs for session stats, forced termination, and timeout handling
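
To make the audit-trail and locked-export ideas concrete, here is a small framework-free sketch. Everything in it is an assumption made for illustration: the AuditEvent fields, the 500-row cap, the in-memory list standing in for a database table, and the helper names record_event and prepare_export. In a Django app the same logic would more likely live in a model plus a view or middleware.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

MAX_EXPORT_ROWS = 500  # hypothetical cap; the real limit is not stated


@dataclass(frozen=True)
class AuditEvent:
    """One immutable audit-trail entry."""
    username: str
    action: str
    ip_address: str
    timestamp: str


def record_event(log, username, action, ip_address):
    """Append a UTC-timestamped event to the log and return it."""
    event = AuditEvent(
        username=username,
        action=action,
        ip_address=ip_address,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    log.append(event)
    return event


def prepare_export(rows, username, ip_address, log, max_rows=MAX_EXPORT_ROWS):
    """Cap the export size, log the export, and build a traceability stamp.

    The stamp is the metadata that would be embedded in the exported file.
    """
    limited = rows[:max_rows]
    event = record_event(
        log, username, f"export:{len(limited)}_rows", ip_address
    )
    stamp = {
        "exported_by": username,
        "exported_at": event.timestamp,
        "row_count": len(limited),
    }
    return limited, stamp


if __name__ == "__main__":
    audit_log = []
    data = [{"id": i} for i in range(1000)]
    rows, stamp = prepare_export(data, "qa_user", "10.0.0.5", audit_log)
    print(len(rows), stamp["exported_by"], audit_log[0].action)
```

Logging the export inside the same function that applies the row cap means a file cannot be produced without leaving a trail entry.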

⚡ The Impact

  • Reduced processing time per audit from up to 1 week to 1 day
  • Enabled centralized, real-time visibility across multiple manufacturing sites
  • Adopted across both Corporate QA and plant-level teams
  • Improved traceability, transparency, and audit readiness

💡 Reflection

What started as a small automation script turned into a full-scale internal product. AuditVision blended OCR, analytics, UX, and security into one unified platform — and made life easier for people on the ground.

To me, this project wasn’t just about writing code. It was about understanding a real-world problem and designing a solution that respected the complexity while simplifying the work.

Tech Stack: Python, Django, OpenCV, Tesseract, Poppler, Pandas, Chart.js, Plotly, DataTables, SQL

Company: Zydus Lifesciences | Status: Internal deployment