Building the Solution: From Unstructured Chaos to Structured Intelligence

How We Turned 250+ Resumes Into a Searchable AI System in 2 Weeks

In Part 1, we explored how Copilot-powered search is changing the way employees find and work with information across Microsoft 365. From context-aware answers to natural language discovery, the promise of AI-driven search is clear.

Now, in Part 2, we move from possibility to precision, looking at what it actually takes to make Copilot Search work effectively in real-world environments. We designed a four-layer architecture using Microsoft Azure services.

Layer 1: Data Ingestion

Resumes uploaded from SharePoint to Azure Blob Storage
Preserves folder structure and metadata
Currently uses a custom C# script; after the POC, the upload will be automated with Power Automate
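
The upload script itself is not shown (the authors use C#). As a rough Python sketch of the first two rules above, assuming a hypothetical library root and hypothetical SharePoint column names, the mapping keeps the folder path as the blob name and turns SharePoint columns into blob metadata. Blob metadata names must be valid C# identifiers and values travel as HTTP headers, so non-ASCII values are percent-encoded here.

```python
import re
from urllib.parse import quote


def blob_name_for(sharepoint_path: str, library_root: str) -> str:
    """Map a SharePoint file path to a blob name that mirrors its folder structure."""
    root = library_root.rstrip("/") + "/"
    if not sharepoint_path.startswith(root):
        raise ValueError(f"{sharepoint_path!r} is not under {library_root!r}")
    # '/' in a blob name acts as a virtual folder, so the relative path keeps the hierarchy.
    return sharepoint_path[len(root):]


def blob_metadata_for(columns: dict) -> dict:
    """Convert SharePoint column values into blob metadata."""
    metadata = {}
    for key, value in columns.items():
        # Metadata names must be valid C# identifiers: ASCII letters, digits, underscores.
        name = re.sub(r"\W", "_", key, flags=re.ASCII)
        if not name or name[0].isdigit():
            name = "_" + name
        # Values are sent as HTTP headers; percent-encode only values containing non-ASCII text.
        text = str(value)
        metadata[name] = text if text.isascii() else quote(text)
    return metadata


print(blob_name_for("/sites/Talent/Resumes/Producers/jane-doe.pdf", "/sites/Talent/Resumes"))
print(blob_metadata_for({"Uploaded By": "José", "2024 Status": "Active"}))
```

The resulting name and dictionary could then be passed to azure-storage-blob's `ContainerClient.upload_blob(name, data, metadata=..., overwrite=True)`.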

Layer 2: AI Enrichment (The Game Changer)

This is where the magic happens. We built a custom Azure Function that:

Extracts text using Azure AI Search’s built-in OCR (handles scanned PDFs, images, complex layouts)
Calls Azure OpenAI GPT-4 with a strict extraction schema
Uses function calling to enforce structured output (every field follows the schema, and missing values come back as null)
Extracts 45+ fields from each resume:
- Basic info (name, title, location, years of experience)
- Industry details (networks, roles, genres, skills, guilds)
- Career history (shows with roles/networks/years, awards, production companies)
- Contact information and references
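
The extraction schema itself is not published. A minimal Python sketch of the idea, assuming a hypothetical tool name `record_resume` and a six-field subset of the 45+ fields: scalar fields allow null so the model can report a missing value instead of guessing, and the returned arguments are validated before indexing, because classic GPT-4 function calling treats the schema as strong guidance rather than a hard guarantee.

```python
import json

# Hypothetical six-field subset of the extraction schema.
RECORD_RESUME_TOOL = {
    "type": "function",
    "function": {
        "name": "record_resume",
        "description": "Record details stated in the resume. Use null or [] when a detail is absent; never guess.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": ["string", "null"]},
                "title": {"type": ["string", "null"]},
                "networks": {"type": "array", "items": {"type": "string"}},
                "roles": {"type": "array", "items": {"type": "string"}},
                "genres": {"type": "array", "items": {"type": "string"}},
                "yearsExperience": {"type": ["integer", "null"]},
            },
            "required": ["name", "title", "networks", "roles", "genres", "yearsExperience"],
            "additionalProperties": False,
        },
    },
}

_PY_TYPES = {"string": str, "integer": int, "array": list, "null": type(None)}


def _matches(value, json_type: str) -> bool:
    # bool is a subclass of int in Python, so exclude it from "integer".
    if json_type == "integer" and isinstance(value, bool):
        return False
    return isinstance(value, _PY_TYPES[json_type])


def parse_tool_arguments(arguments_json: str, tool: dict = RECORD_RESUME_TOOL) -> dict:
    """Validate the JSON arguments returned by the model against the tool schema."""
    schema = tool["function"]["parameters"]
    props = schema["properties"]
    data = json.loads(arguments_json)
    if not isinstance(data, dict):
        raise ValueError("arguments must be a JSON object")
    unknown = sorted(set(data) - set(props))
    if unknown:
        raise ValueError(f"unexpected fields: {unknown}")
    missing = [f for f in schema["required"] if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for field, value in data.items():
        allowed = props[field]["type"]
        allowed = allowed if isinstance(allowed, list) else [allowed]
        if not any(_matches(value, t) for t in allowed):
            raise ValueError(f"{field}: expected {allowed}, got {type(value).__name__}")
        if isinstance(value, list) and not all(isinstance(v, str) for v in value):
            raise ValueError(f"{field}: every item must be a string")
    return data
```

The dictionary would go in the `tools` list of a chat completions call, with `tool_choice={"type": "function", "function": {"name": "record_resume"}}` to force that call; `parse_tool_arguments` then runs on the returned `tool_calls[0].function.arguments` string, and a record that fails validation can be retried or flagged instead of being indexed.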

Example extraction

{
  "name": "James Sanderson",
  "title": "Executive Producer/Showrunner",
  "networks": ["Discovery+", "Investigation Discovery", "Bravo"],
  "roles": ["Executive Producer", "Showrunner", "Director"],
  "genres": ["True Crime", "Documentary", "Reality TV"],
  "shows": [
    {
      "show_name": "Murder in the Heartland",
      "role": "Executive Producer",
      "network": "Investigation Discovery",
      "years": "2018-2022",
      "seasons": "4"
    }
  ],
  "yearsExperience": 15,
  "awards": [...]
}

Critical Innovation: We add contextual annotations to entertainment industry terms, for example “Murder in the Heartland (TV Show Title – Professional Work Credit)”, to prevent content filtering issues. This annotation happens in our Azure Function code, completely under our control.

Layer 3: Azure AI Search Index

The structured data goes into Azure AI Search, which provides:

Semantic search – understands “true crime” = “murder investigation” = “cold case”
Hybrid ranking – combines keyword and vector relevance, then applies semantic re-ranking
Complex filtering – network AND role AND genre AND years of experience, all at once
Tunable relevance – we control scoring profiles and ranking
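
The index schema is not shown in the article. As a hedged sketch, assuming an index named `resumes`, a semantic configuration named `resume-semantic`, and a free-text `summary` field (all hypothetical), the Python dictionary below follows the Azure AI Search REST index format (api-version 2023-11-01). Collection fields are filterable and facetable for multi-criteria filters, and a magnitude scoring profile is one way to tune relevance.

```python
import json


def facet_field(name: str, collection: bool = True) -> dict:
    """A searchable, filterable, facetable string (or string collection) field."""
    return {
        "name": name,
        "type": "Collection(Edm.String)" if collection else "Edm.String",
        "searchable": True,
        "filterable": True,
        "facetable": True,
    }


RESUME_INDEX = {
    "name": "resumes",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "name", "type": "Edm.String", "searchable": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "summary", "type": "Edm.String", "searchable": True},
        facet_field("location", collection=False),
        facet_field("networks"),
        facet_field("roles"),
        facet_field("genres"),
        {"name": "yearsExperience", "type": "Edm.Int32",
         "filterable": True, "sortable": True, "facetable": True},
    ],
    # Boost profiles with more experience, linearly up to 30 years.
    "scoringProfiles": [{
        "name": "boostExperience",
        "functions": [{
            "type": "magnitude",
            "fieldName": "yearsExperience",
            "boost": 2,
            "interpolation": "linear",
            "magnitude": {"boostingRangeStart": 0, "boostingRangeEnd": 30,
                          "constantBoostBeyondRange": True},
        }],
    }],
    "semantic": {"configurations": [{
        "name": "resume-semantic",
        "prioritizedFields": {
            "titleField": {"fieldName": "title"},
            "prioritizedContentFields": [{"fieldName": "summary"}],
            "prioritizedKeywordsFields": [{"fieldName": "genres"}, {"fieldName": "roles"}],
        },
    }]},
}

# Body for PUT https://<service>.search.windows.net/indexes/resumes?api-version=2023-11-01
print(json.dumps(RESUME_INDEX, indent=2))
```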

Layer 4: Search API & User Interface

We built a REST API (an Azure Function) that:

Receives natural language queries from Copilot Studio
Translates them into optimized Azure AI Search queries
Applies filters and ranking
Returns structured JSON results
Handles all content annotation
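
The article does not detail how a question becomes filters. Assuming the criteria have already been parsed out of the user's question (that step is not shown), a sketch of the translation into an Azure AI Search query body could look like this. The field names and the semantic configuration `resume-semantic` are hypothetical; `search.in` with an explicit `|` delimiter is used because values such as “Investigation Discovery” contain spaces.

```python
from typing import Optional, Sequence


def odata_literal(value: str) -> str:
    """Quote a string for an OData filter; single quotes are escaped by doubling them."""
    return "'" + value.replace("'", "''") + "'"


def any_of(field: str, values: Sequence[str]) -> str:
    """Match documents whose collection field contains at least one of the values."""
    if any("|" in v for v in values):
        raise ValueError("values must not contain the '|' delimiter")
    return f"{field}/any(v: search.in(v, {odata_literal('|'.join(values))}, '|'))"


def build_search_request(text: str,
                         networks: Sequence[str] = (),
                         roles: Sequence[str] = (),
                         genres: Sequence[str] = (),
                         min_years: Optional[int] = None,
                         location: Optional[str] = None,
                         top: int = 50) -> dict:
    """Translate already-parsed criteria into an Azure AI Search query body."""
    clauses = []
    for field, values in (("networks", networks), ("roles", roles), ("genres", genres)):
        if values:
            clauses.append(any_of(field, values))
    if min_years is not None:
        clauses.append(f"yearsExperience ge {int(min_years)}")
    if location:
        clauses.append(f"location eq {odata_literal(location)}")

    body = {"top": top, "count": True}
    if text:
        # Semantic ranking needs query text to re-rank against.
        body.update(search=text, queryType="semantic", semanticConfiguration="resume-semantic")
    else:
        body["search"] = "*"
    if clauses:
        # AND across criteria, OR within each list of values.
        body["filter"] = " and ".join(clauses)
    return body


request = build_search_request("documentary producers",
                               networks=["Discovery", "Investigation Discovery"],
                               roles=["Producer", "Executive Producer"],
                               min_years=10)
print(request["filter"])
```

The body would be sent to `POST https://<service>.search.windows.net/indexes/resumes/docs/search?api-version=2023-11-01`. Keeping the filter builder deterministic means the same criteria always produce the same query, which is what makes results repeatable across users.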

Users interact through:

Copilot Studio for conversational search
React web app for advanced search with full filter controls

The Content Filtering Solution

Remember those “Content was filtered” errors? Here’s how we solved them.

The Problem

Microsoft’s Responsible AI system (rightfully) blocks harmful content. But it’s a black box that sometimes misclassifies legitimate business terms. Entertainment industry resumes containing show titles and genres like “Murder Mystery,” “True Crime,” or “Deadly Sins” were triggering filters.

Our Solution

We control content processing before it reaches Microsoft’s filters. In our custom Azure Function, we add contextual annotations:

Before (Gets Filtered):

“Murder in the Heartland”
“True Crime Documentary”

After (Passes Through):

“Murder in the Heartland (TV Show Title – Professional Work Credit)”
“True Crime Documentary (TV Genre – Entertainment Industry)”

These annotations clarify context, preventing misclassification. Since this happens in our code (not Microsoft’s), we have complete control.
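
The annotation code itself is not shown. A minimal sketch, assuming the annotated text is used only in the context handed to the language model while the raw values stay in the filterable index fields (so a filter on “True Crime” still matches exactly). The labels are copied from the examples above; `annotate_for_prompt` and the record layout are hypothetical.

```python
# Labels taken from the before/after examples; the "kind" keys are hypothetical.
LABELS = {
    "show": "TV Show Title – Professional Work Credit",
    "genre": "TV Genre – Entertainment Industry",
}


def annotate(term: str, kind: str) -> str:
    """Append a context label to an industry term, without double-annotating."""
    suffix = f" ({LABELS[kind]})"
    return term if term.endswith(suffix) else term + suffix


def annotate_for_prompt(record: dict) -> str:
    """Build annotated text for the LLM; the stored record itself is left unchanged."""
    lines = [f"Name: {record.get('name', '')}"]
    for show in record.get("shows", []):
        lines.append(f"Credit: {annotate(show['show_name'], 'show')}, {show.get('role', '')}")
    for genre in record.get("genres", []):
        lines.append(f"Genre: {annotate(genre, 'genre')}")
    return "\n".join(lines)


profile = {
    "name": "James Sanderson",
    "genres": ["True Crime"],
    "shows": [{"show_name": "Murder in the Heartland", "role": "Executive Producer"}],
}
print(annotate_for_prompt(profile))
```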

Result: Zero content filtering errors in production. Every query works.

The Results: From 30 Minutes to a Few Seconds

After a 2-week POC (32-40 hours of implementation), here’s what we delivered:

Performance Metrics

| Metric | Before | After | Improvement |
|---|---|---|---|
| Search Time | 15-30 minutes | A few seconds | 99% faster |
| Content Filtering Errors | Frequent blocks | Zero | 100% eliminated |
| Search Accuracy | Inconsistent | 95%+ | Reliable |
| Complex Queries | Not possible | Fully supported | New capability |
| Result Completeness | Unknown | 100% of matches | Trustworthy |

Business Impact

Productivity Gains

20+ searches per day × 20 minutes saved per search = 400 minutes daily
Nearly 7 hours of productivity gained per day
ROI achieved in 6-8 weeks
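
The productivity figures follow from simple arithmetic. The sketch below reproduces them; the 5-second search time is an assumption standing in for “a few seconds”.

```python
def minutes_saved_per_day(searches_per_day: int, minutes_saved_per_search: float) -> float:
    return searches_per_day * minutes_saved_per_search


def percent_faster(before_seconds: float, after_seconds: float) -> float:
    """Reduction in search time, as a percentage of the original time."""
    return 100 * (1 - after_seconds / before_seconds)


daily_minutes = minutes_saved_per_day(20, 20)       # 20 searches x 20 minutes
print(daily_minutes, round(daily_minutes / 60, 2))  # 400 minutes, about 6.67 hours
print(round(percent_faster(15 * 60, 5), 1))         # 15-minute search vs. an assumed 5 s
print(round(percent_faster(30 * 60, 5), 1))         # 30-minute search vs. an assumed 5 s
```

Either end of the 15-30 minute range comes out above 99%, consistent with the “99% faster” figure.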

Better Outcomes

Find talent that would have been missed with manual search
Multi-criteria matching works flawlessly (network + role + genre + experience)
Every team member gets the same consistent results
Confidence that results are complete, not just examples

Scalability

Handles 250+ resumes today and is designed to scale to 1,000+ without performance degradation
Auto-indexes new/updated resumes (no manual maintenance)
Supports concurrent users without slowdown
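
The article does not say how new and updated resumes are picked up. One common way, sketched here with hypothetical names, is an Azure AI Search indexer on a schedule: blob indexers detect new and changed blobs by their last-modified timestamp, and the `imageAction` setting produces the normalized images the built-in OCR skill reads. The body follows the indexer REST format; the one-hour interval is an assumption (the service's minimum is five minutes).

```python
import json
import re

# Hypothetical names for the data source, skillset, and index.
RESUME_INDEXER = {
    "name": "resumes-indexer",
    "dataSourceName": "resumes-blob",
    "skillsetName": "resume-enrichment",
    "targetIndexName": "resumes",
    # ISO 8601 duration; each run processes only blobs changed since the last run.
    "schedule": {"interval": "PT1H"},
    "parameters": {
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            # Needed so the OCR skill can read scanned PDFs and images.
            "imageAction": "generateNormalizedImages",
        }
    },
}


def interval_minutes(iso_duration: str) -> int:
    """Parse simple PT#H#M durations (enough for typical indexer schedules)."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", iso_duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {iso_duration}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


assert interval_minutes(RESUME_INDEXER["schedule"]["interval"]) >= 5  # service minimum
print(json.dumps(RESUME_INDEXER, indent=2))
```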

Real-World Examples

Here’s how it works in practice:

Query 1: “Find documentary producers who worked with Discovery”

Returns 15 profiles in 2 seconds
Automatically includes Discovery, Discovery+, Investigation Discovery
All are producers (not directors or other roles)
All have documentary experience
Sorted by relevance (most Discovery credits first)
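
The article does not say how “Discovery” is widened to its sister networks. A simple, deterministic option is a curated alias table applied before the network filter is built; the table and function name below are hypothetical, with the family members taken from this example.

```python
from typing import Dict, Iterable, List

# Hypothetical alias table; family members taken from the Query 1 example.
NETWORK_FAMILIES: Dict[str, List[str]] = {
    "discovery": ["Discovery", "Discovery+", "Investigation Discovery"],
}


def expand_networks(requested: Iterable[str]) -> List[str]:
    """Widen each requested network to its family, keeping order and dropping duplicates."""
    expanded: List[str] = []
    for name in requested:
        cleaned = name.strip()
        for network in NETWORK_FAMILIES.get(cleaned.lower(), [cleaned]):
            if network not in expanded:
                expanded.append(network)
    return expanded


print(expand_networks(["Discovery", "Bravo"]))
```

Unknown networks pass through unchanged, so the table only needs entries for brands that actually have sister channels.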

Query 2: “True crime producers with 10+ years experience in Los Angeles”

Returns 8 profiles in 2 seconds
All match: true crime OR crime investigation OR murder mystery (semantic understanding)
All have 10+ years experience
All based in Los Angeles
Zero content filtering errors

Query 3: Complex multi-criteria

User selects: Networks (Discovery, Investigation Discovery), Roles (Producer, Executive Producer), Genres (Documentary, True Crime), Min Experience (10 years)
Returns exact matches only
Fast, accurate, complete

Key Architectural Decisions

Why Azure AI Search Instead of Vector Database?

Azure AI Search provides hybrid search (vector + keyword + semantic) out of the box, with built-in ranking, filtering, and faceting. Dedicated vector databases such as Pinecone or Weaviate are excellent for similarity search, but Azure AI Search also gave us OCR-based enrichment, semantic re-ranking, and faceted filtering in a single managed Azure service, without stitching separate components together.
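
In a hybrid query, Azure AI Search merges the keyword and vector result lists with Reciprocal Rank Fusion (RRF), and semantic ranking, when enabled, then re-ranks the fused top results. The service does this internally; the sketch below only illustrates the fusion step, with made-up document IDs and the commonly used constant k = 60. Each document scores the sum of 1 / (k + rank) over the lists it appears in, so a document ranked well by both methods rises to the top.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["resume-a", "resume-b", "resume-c"]  # e.g. BM25 order
vector_hits = ["resume-b", "resume-c", "resume-d"]   # e.g. embedding-similarity order
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

In a real request both lists come from a single query: a `search` string drives the keyword leg and a `vectorQueries` entry drives the vector leg.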

Why GPT-4 Function Calling?

GPT-4’s function-calling feature constrains the output to a JSON schema we define. Because every field may be null, the model has an explicit way to report that a value is missing instead of inventing one, which sharply reduces hallucinated data. This matters for mission-critical applications where accuracy is essential.

Why a Custom Azure Function for Search API?

This gives us complete control over:

Content annotation (preventing RAI filter misclassification)
Query translation (natural language → structured search)
Scoring and ranking logic
Response formatting
Security and access control

We could have used Azure AI Search directly, but the API layer provides a cleaner interface for Copilot Studio and allows business logic centralization.

Why Both Copilot Studio AND a Web App?

Different users, different needs:

Copilot Studio: Great for conversational, quick searches (“Find producers with true crime credits”)
React Web App: Better for power users who want full control over filters, export to CSV, and advanced sorting

One backend, multiple interfaces.

When to Use Out-of-the-Box Copilot vs. Production Search

This isn’t an either/or. Both have their place.

Use Out-of-the-Box Copilot Studio When

You need conversational Q&A about documents
Approximate answers are fine (“Here are some examples…”)
You have <100 documents
Queries are simple (no complex filtering)
You need something fast (days to deploy)
You don’t need 100% consistency

Perfect for: Policy questions, document summaries, general knowledge base

Use Production Search Architecture When

You need to find ALL matching results (not examples)
Multi-criteria filtering is essential
Results must be consistent and trustworthy
You have 100s-1000s of documents
Sub-second response times matter
You need complete control over ranking and relevance
Content filtering is causing issues

Perfect for: Talent search, contract search, technical documentation, research repositories

Lessons Learned

1. Pre-Processing Beats On-Demand Processing

Extracting structured data upfront (even if it takes hours for initial indexing) is far better than asking an LLM to read PDFs on every query. The upfront cost pays dividends in speed, accuracy, and consistency.

2. Structured Data Curbs Hallucinations

Using GPT-4 function calling with strict schemas means you get structured data or null rather than free-form prose, which leaves far less room for hallucinated data. This makes the system trustworthy for business-critical applications.

3. Control Over Content Processing Is Essential

When dealing with industry-specific terminology that might trigger content filters, you need to control the processing pipeline. Our custom Azure Function gives us that control.

4. Semantic Search Changes Everything

Azure AI Search’s semantic understanding (“true crime” = “murder investigation” = “cold case”) finds results that a keyword search would miss. This is the power of modern AI search.

5. The Right Tool for the Right Job

Copilot Studio is brilliant for what it does. But when you need production-grade search, you need a proper search engine. Trying to force Copilot to be something it’s not leads to frustration.

The Technology Stack

For those interested in the technical details:

Search & AI

Azure AI Search (Basic tier, ~$75/month)
Azure OpenAI GPT-4 (pay-per-use)

Compute & Storage

Azure Functions (.NET 8 Isolated Worker)
Azure Blob Storage (Standard tier)

User Interface

Microsoft Copilot Studio (conversational search)
React 18 + TypeScript (web app)
Ant Design (UI component library)

Total Cloud Cost: $300-400/month at most for production workloads.

Beyond Talent Search: Where This Architecture Applies

While we built this for talent search, the same architecture works for any industry with large collections of unstructured documents:

Recruitment & HR: Resume search, candidate matching
Legal: Contract search, clause extraction
Healthcare: Clinical document search (with a HIPAA-compliant configuration)
Technical Documentation: Knowledge base, support systems
Research: Academic paper search, citation analysis
Real Estate: Property document search
Financial Services: Policy and compliance document search

If you have PDFs or Word documents and need reliable, filterable search, this pattern applies.

The Bottom Line

Out-of-the-box Microsoft Copilot Studio is an excellent tool for conversational document Q&A. But when your business needs complete, fast, reliable search with complex filtering, you need a different architecture.

Our client went from

❌ 15-30 minute manual searches

❌ Inconsistent, incomplete results

❌ Content filtering blocking legitimate queries

❌ No way to filter by multiple criteria

To

✅ Search results in a few seconds

✅ 100% consistent, complete results

✅ Zero content filtering issues

✅ Full multi-criteria filtering support

✅ Complete control and visibility

The difference? Pre-indexed structured data with a production-grade search engine.

Sometimes the out-of-the-box solution is perfect. Sometimes you need to build something better.

The key is knowing which is which.

What’s Next?

If you’re facing similar challenges, such as unreliable search results, content filtering issues, or the need for production-grade AI search, we can help.

We offer:

POC Assessment (1 week)

Document analysis workshop
Architecture design for your use case
Cost estimate and timeline
Risk assessment

Full POC (2-4 weeks)

Complete implementation with your data
Stakeholder demonstration
Production deployment roadmap

This architecture is proven, scalable, and delivers measurable ROI in weeks.

Ready to go from blocked to brilliant? Contact us and we will be happy to help.

📧 Email us at info@netwoven.com

Appendix: Technical Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                 USER INTERFACE LAYER                    │
│    Microsoft Copilot Studio  +  React Web App           │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ Natural Language Query
                     ↓
┌─────────────────────────────────────────────────────────┐
│              SEARCH API LAYER (Our Control)             │
│              Azure Function - REST API                  │
│  • Query translation                                    │
│  • Content annotation (prevents filtering)              │
│  • Multi-filter support                                 │
│  • Response formatting                                  │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ Structured Search Query
                     ↓
┌─────────────────────────────────────────────────────────┐
│               SEARCH ENGINE LAYER                       │
│              Azure AI Search Index                      │
│  • Semantic search (context understanding)              │
│  • Vector search (similarity matching)                  │
│  • Keyword search (exact match)                         │
│  • Hybrid ranking (best of all three)                   │
│  • Complex filtering                                    │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ Pre-Indexed Structured Data
                     ↑
┌─────────────────────────────────────────────────────────┐
│            AI ENRICHMENT LAYER (Critical!)              │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │      CUSTOM SKILL - Azure Function + GPT-4       │   │
│  │                                                  │   │
│  │  1. Extract text (OCR for scanned docs)          │   │
│  │  2. Call Azure OpenAI GPT-4                      │   │
│  │  3. Use function calling (strict schema)         │   │
│  │  4. Extract 45+ structured fields                │   │
│  │  5. Add content annotations                      │   │
│  │  6. Return JSON (no hallucinations)              │   │
│  │                                                  │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ Raw Document Files
                     ↑
┌─────────────────────────────────────────────────────────┐
│                 DATA SOURCE LAYER                       │
│               Azure Blob Storage                        │
│          PDF & Word Documents (250+)                    │
│     Auto-sync from SharePoint (Power Automate)          │
└─────────────────────────────────────────────────────────┘
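
The diagram labels the enrichment step a custom skill. Azure AI Search calls a custom Web API skill with a batch shaped as `{"values": [{"recordId": ..., "data": {...}}]}` and expects each `recordId` back with `data`, `errors`, and `warnings`. The authors' function runs on .NET 8; as a language-neutral illustration, here is a Python sketch of that request/response handling, assuming a hypothetical `text` input mapped in the skillset and an injected `extract` function standing in for the GPT-4 call.

```python
import json
from typing import Callable, Dict


def handle_skill_request(body: str, extract: Callable[[str], Dict]) -> str:
    """Process one custom Web API skill batch and build the response payload."""
    request = json.loads(body)
    results = []
    for record in request.get("values", []):
        result = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        text = record.get("data", {}).get("text")
        if not text:
            result["warnings"].append({"message": "No text extracted; skipped enrichment."})
        else:
            try:
                result["data"] = extract(text)
            except Exception as exc:  # report the failure for this record, keep the batch going
                result["errors"].append({"message": str(exc)})
        results.append(result)
    return json.dumps({"values": results})


def fake_extract(text: str) -> Dict:
    # Stand-in for the GPT-4 function-calling step.
    return {"name": text.split("\n", 1)[0]}


batch = json.dumps({"values": [
    {"recordId": "1", "data": {"text": "James Sanderson\nExecutive Producer"}},
    {"recordId": "2", "data": {"text": ""}},
]})
print(handle_skill_request(batch, fake_extract))
```

Reporting errors per record lets the indexer keep the rest of the batch instead of failing it outright.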

If your organization is struggling with unreliable AI search, content filtering issues, or unstructured content chaos, Netwoven can help.

Our proven architecture delivers measurable ROI in weeks – not months. Schedule a consultation with our experts to get started.