
How to Build a HIPAA-Aware Medical AI Agent with LangGraph, FHIR, and Human-in-the-Loop Triage



Important: The system described in this article is a triage support and administrative tool. It does not diagnose medical conditions, prescribe treatments, or replace the judgment of a licensed healthcare professional. If you are experiencing a medical emergency, call your local emergency services immediately.

There are two moments that define patient experience in healthcare, and both are broken.

The first is the moment a patient notices a symptom and tries to decide what to do. Is this urgent? Should I go to the ER? Can this wait until Monday? Most people have no structured way to answer these questions. Some default to the emergency department — contributing to an estimated $32 billion in avoidable ED costs in the US annually. Others dismiss symptoms that should not be dismissed. The information gap is costly in both directions.

The second moment is scheduling. The patient has decided they need to see someone. Now they navigate a phone tree, get placed on hold, explain their symptoms twice, and wait three days to confirm an appointment that could have been booked in thirty seconds.

The Medical AI Agent addresses both moments — within a strictly bounded scope. It is a LangGraph-powered conversational triage support assistant that helps patients describe and organise their symptoms, classifies urgency using a clinician-reviewed rule-based knowledge graph (not LLM reasoning), escalates unconditionally to emergency services when red-flag patterns are detected, routes to a human clinician for review when confidence is insufficient, handles appointment scheduling via FHIR R4 API integration, and hands off to a telemedicine session when needed — all within a HIPAA-aware architecture that separates protected health information from application state.

What this system does not do: it does not diagnose. It does not recommend treatments. It does not interpret test results. It does not make clinical decisions. Every urgency classification comes from a clinician-reviewed YAML rule file, not LLM inference. The LLM's role is conversation facilitation — helping patients describe their symptoms in an organised way — nothing more.

This architectural constraint is not a limitation. It is the design. The risk in a medical AI system is not obvious failure — it is the plausible-sounding but clinically incorrect response that a patient trusts. Preventing this requires structural separation: the LLM must be architecturally prevented from being in the urgency classification path, not merely instructed not to diagnose.

Real-world use cases this application handles:

  • Health-tech founders building a HIPAA-aware first-contact layer for a clinic or telehealth platform

  • AI engineers studying LangGraph HITL patterns, clinical knowledge graph integration, and FHIR APIs in a realistic context

  • Full-stack developers learning production health-tech architecture with compliance scaffolding included

  • CS students building a real-world LangGraph project with clinical safety patterns and FHIR integration

  • Digital health researchers prototyping triage workflow evaluation tools with an IRB-ready audit log

Full source code (synthetic test data only) is available at labs.codersarts.com.



The Safety Architecture First

Before the LangGraph implementation, the safety design must be established — because in a medical application, the safety architecture is not a feature. It is the foundation everything else is built on.

The Three Structural Safety Rules

Rule 1: The LLM does not classify urgency. Ever.

The urgency classification — EMERGENCY, URGENT, SEMI-URGENT, or ROUTINE — comes entirely from a rule-based knowledge graph implemented as a YAML file of symptom patterns. Every entry has a reviewed_by (clinician ID) and reviewed_at (date) field. Any rule without a clinical review annotation is blocked from production by the CI pipeline.
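That CI gate can be sketched as a small validation pass over the loaded rules. The sketch below operates on rules as plain dicts so it stays self-contained (in the real pipeline they would come from the YAML file); the function name is illustrative:

```python
from datetime import date

REQUIRED_REVIEW_FIELDS = ("reviewed_by", "reviewed_at")

def unreviewed_rules(rules: list[dict]) -> list[str]:
    """Return rule_ids lacking a valid clinical review annotation.

    CI fails the build if this list is non-empty.
    """
    bad = []
    for rule in rules:
        for field in REQUIRED_REVIEW_FIELDS:
            if not rule.get(field):
                bad.append(rule["rule_id"])
                break
        else:
            # Both fields present: reviewed_at must also parse as an ISO date.
            try:
                date.fromisoformat(rule["reviewed_at"])
            except ValueError:
                bad.append(rule["rule_id"])
    return bad
```

A build step would call this on every rule file and exit non-zero on any finding.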

The LLM's job is to conduct the conversation that collects the symptom data. It does not assess the data it collected. Assessment happens in a separate node, in a separate process, without LLM involvement.

Rule 2: The emergency path contains no LLM calls.

When an EMERGENCY classification fires — from the knowledge graph or the parallel red-flag monitor — a pre-written message is delivered to the patient. The message is static. It does not go through gpt-4o. No async operation may block this path. It must reach the patient within 500 milliseconds.

Rule 3: PHI never enters LangGraph state.

The LangGraph checkpointer persists graph state. LangSmith traces log it. Development logs print it. If protected health information is in the LangGraph state, it will eventually appear somewhere it should not.

The SymptomProfile — everything that qualifies as PHI — is stored in a separate, AES-256 encrypted data store, keyed by session_id. LangGraph state contains only session_id, urgency metadata, routing decisions, and de-identified event logs. A PII scrubbing function runs on every LLM response before it is written to conversation history in state.

These three rules are structural, not instructional. They cannot be bypassed by a misconfigured environment variable or a clever prompt.

📄 Before you dive in — grab the free PRD template that maps out this entire system: architecture, API spec, sprint plan, and system prompt. [Download the free PRD]




How It Works: Core Concept

The concept powering this system is parallel safety monitoring alongside a stateful triage conversation, with strict graph-level separation between LLM facilitation and rule-based clinical classification.



PATIENT CONVERSATION PIPELINE:

  Patient sends message
          │
          ▼
  [SESSION INIT NODE]
  Validate session; confirm consent
  (No PHI collection before consent confirmed)
          │
          ├──────────────────────────────────────┐
          ▼                                      ▼
  [SYMPTOM COLLECTOR NODE]          [RED FLAG MONITOR NODE]
  gpt-4o (temp: 0.2)                Static keyword list (< 1ms)
  Collect: chief complaint,          + gpt-4o-mini binary classifier
  duration, severity,                (temp: 0.0, every message)
  associated symptoms,               │
  age group, medications             ├── (EMERGENCY_FLAG)
          │                          │         ▼
          │                          │   [EMERGENCY ESCALATION]
          │                          │   LangGraph interrupt()
          │                          │   Pre-written message ONLY
          │                          │   < 500ms — no LLM call
          │                          │   → 911 / 112 / 999 / 988
          │                          │   → END
          └────────────────┬─────────┘
          (symptom profile complete)
          │
          ▼
  [KNOWLEDGE GRAPH NODE]
  Rule-based YAML matching — NO LLM
  → EMERGENCY → emergency_escalation
  → URGENT + confidence < 0.75 → hitl_review
  → URGENT + confidence ≥ 0.75 → scheduling
  → SEMI-URGENT / ROUTINE → scheduling
          │
          ▼
  [HITL REVIEW NODE] (low-confidence URGENT only)
  LangGraph interrupt() — clinician notified < 30s
  Patient sees holding message
  10-minute SLA → auto-escalation if unanswered
          │
          ▼
  [SCHEDULING NODE]
  FHIR R4: Slot search + Appointment create
  SMART on FHIR OAuth 2.0
          │
          ├── (telemedicine selected)
          │         ▼
          │   [TELEMEDICINE HANDOFF]
          │   Twilio Video session link
          │   FHIR Communication → clinician inbox
          └── (in-person) → Appointment created

The parallel architecture between symptom_collector and red_flag_monitor is the critical pattern. The red-flag monitor runs on every message — it does not wait for the symptom profile to be complete. A patient can describe a stroke symptom in their first message and the escalation fires immediately, before a single clarifying question is asked.
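The conditional edges leaving the knowledge graph node reduce to a pure routing function of the kind LangGraph's `add_conditional_edges` consumes. A sketch, using a minimal `TriageResult` stand-in for illustration:

```python
from dataclasses import dataclass

@dataclass
class TriageResult:
    """Minimal stand-in for the real TriageResult, for illustration only."""
    urgency_level: str
    confidence: float

def route_after_classification(result: TriageResult) -> str:
    """Map the knowledge graph output to the next node name."""
    if result.urgency_level == "EMERGENCY":
        return "emergency_escalation"
    if result.urgency_level == "URGENT" and result.confidence < 0.75:
        return "hitl_review"
    # URGENT with confidence >= 0.75, SEMI_URGENT, and ROUTINE all schedule
    return "scheduling"
```

Because the function is pure, the routing table can be exhaustively unit-tested without running the graph.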


System Architecture Deep Dive

The Medical AI Agent has eight layers. The overriding constraint across all layers is that PHI must not cross into layers 3 and 4 — the LangGraph orchestration and application state layers — except as de-identified identifiers.

Layer 1 — Patient Chat UI (Next.js 15). Conversational chat interface, symptom collection form elements (body location selector, severity scale), urgency result display, appointment slot selector, telemedicine link, and the emergency escalation screen (pre-rendered, not dynamically generated). The UI renders the escalation screen from a static React component — no API call required.

Layer 2 — Clinician Dashboard (Next.js 15). HITL review queue showing pending sessions with de-identified symptom summaries and urgency classifications, approve/override/escalate controls, audit log viewer (compliance officer role only). Receives real-time updates via WebSocket push.

Layer 3 — API Gateway (FastAPI + WebSocket). Session management, WebSocket event streaming, FHIR API proxy (adds OAuth tokens server-side, strips PHI from outbound audit logs), Twilio API proxy, auth middleware, and the HIPAA audit log writer (all PHI in audit events is replaced with de-identified identifiers before writing).

Layer 4 — LangGraph Orchestration Engine. StateGraph definition — 8 nodes, conditional edges, both interrupt() patterns. Compiled with SqliteSaver (prototype) or PostgresSaver (production). The state schema is enforced to contain zero PHI fields at compile time via type annotations.

Layer 5 — Symptom Knowledge Graph. YAML rule file, Python matcher, version registry, and red-flag list. No LLM dependency. Runs synchronously. The knowledge graph version number is the most important audit field — it is the only way to identify which classification logic was applied to any given session.

Layer 6 — LLM Layer (OpenAI gpt-4o). Symptom collection conversation and red-flag language monitoring. Strictly bounded — no urgency assessment, no diagnostic reasoning. Post-generation PII scrubbing runs on every response before any state write.

Layer 7 — Integration Layer. FHIR R4 client (SMART on FHIR OAuth with proactive token refresh), Twilio Video + SMS, OpenFDA drug interaction API (for informational medication alerts only).

Layer 8 — Data & Compliance Layer. Encrypted PHI store (AES-256), append-only audit log (6-year retention minimum per HIPAA), PostgreSQL session metadata (de-identified), LangGraph checkpointer (de-identified).


Architecture Table

Layer | Component | Role
------|-----------|-----
1 | Patient Chat UI (Next.js 15) | Symptom collection, urgency display, scheduling, emergency screen
2 | Clinician Dashboard (Next.js 15) | HITL review queue, override controls, audit viewer
3 | FastAPI + WebSocket | Session management, FHIR proxy, audit logging, auth
4 | LangGraph StateGraph | Graph execution, HITL interrupt(), de-identified state
5 | Symptom Knowledge Graph | Rule-based YAML urgency classification, red-flag registry
6 | OpenAI gpt-4o | Symptom collection conversation, red-flag language monitoring
7 | FHIR + Twilio + OpenFDA | Scheduling, telemedicine, medication alerts
8 | Encrypted PHI store + Audit Log | PHI storage, HIPAA compliance trail


The Knowledge Graph Pattern

The knowledge graph is the system's most important component and deliberately its simplest. Each rule is a YAML entry:



rules:
  - rule_id: "chest_pain_radiation_v3"
    symptom_pattern:
      - "chest_pain"
      - "arm_radiation"
    exclusion_pattern: []
    urgency_level: "EMERGENCY"
    confidence: 0.95
    specialty: "emergency"
    red_flag: true
    clinical_notes: "Classic STEMI/unstable angina presentation. Immediate emergency response required."
    reviewed_by: "DR_SMITH_001"
    reviewed_at: "2025-04-15"
    version: "3"

  - rule_id: "headache_sudden_severe_v2"
    symptom_pattern:
      - "headache"
      - "sudden_onset"
      - "severity_gte_9"
    exclusion_pattern:
      - "chronic_migraine_history"
    urgency_level: "EMERGENCY"
    confidence: 0.90
    specialty: "emergency"
    red_flag: true
    clinical_notes: "Thunderclap headache — possible subarachnoid haemorrhage."
    reviewed_by: "DR_JONES_002"
    reviewed_at: "2025-04-20"
    version: "2"


The Python matcher is a pure function — no I/O, no LLM, runs in under 5ms:



# Severity ranking used to pick the most urgent matched rule.
urgency_rank = {"ROUTINE": 0, "SEMI_URGENT": 1, "URGENT": 2, "EMERGENCY": 3}

def match_urgency(symptom_profile: SymptomProfile, kg: KnowledgeGraph) -> TriageResult:
    matched_rules = []
    for rule in kg.rules:
        # A rule matches only if every symptom in its pattern is present...
        if not all(s in symptom_profile.symptom_codes for s in rule.symptom_pattern):
            continue
        # ...and none of its exclusions are.
        if any(s in symptom_profile.symptom_codes for s in rule.exclusion_pattern):
            continue
        matched_rules.append(rule)

    if not matched_rules:
        # No match → SEMI-URGENT + low confidence → triggers HITL
        return TriageResult(urgency_level="SEMI_URGENT", confidence=0.40,
                            specialty="general_practice", no_match=True)

    # When multiple rules match, the most severe urgency wins.
    best = max(matched_rules, key=lambda r: urgency_rank[r.urgency_level])
    return TriageResult(urgency_level=best.urgency_level, confidence=best.confidence,
                        specialty=best.specialty, matched_rule_id=best.rule_id,
                        kg_version=kg.version)

The kg_version is logged in the audit trail with every session. If a rule is later found to be clinically incorrect, you can identify every session that was classified by that version.

The "no match" fallback — SEMI-URGENT with confidence 0.40 — is the safety-critical default. Unknown symptom combinations must reach a human clinician, not silently default to ROUTINE.



FHIR R4 Integration Pattern

FHIR R4 is the scheduling standard, and SMART on FHIR is the authentication layer. The two most common implementation failures are token expiry mid-session and transmitting more PHI than necessary.


Proactive Token Refresh



async def get_fhir_token(client: FHIRClient) -> str:
    """Return cached token, refreshing proactively if < 5 minutes remaining."""
    if client.token and client.expires_at:
        # total_seconds() handles an already-expired token correctly;
        # timedelta.seconds would wrap around for negative deltas.
        if (client.expires_at - datetime.utcnow()).total_seconds() > 300:
            return client.token
    # Refresh token (httpx.post is synchronous — use an AsyncClient here)
    async with httpx.AsyncClient() as http:
        resp = await http.post(client.token_url, data={
            "grant_type": "client_credentials",
            "client_id": client.client_id,
            "client_secret": client.client_secret,
            "scope": "system/Slot.read system/Appointment.write",
        })
    resp.raise_for_status()
    data = resp.json()
    client.token = data["access_token"]
    client.expires_at = datetime.utcnow() + timedelta(seconds=data["expires_in"])
    return client.token

The 300-second buffer (5 minutes before expiry) prevents 401 errors for patients who take longer than expected during slot selection.


Minimum Necessary Data

The FHIR Appointment resource should contain only fields required for scheduling. The free-text symptom description — the most sensitive PHI field — goes into a FHIR Communication resource transmitted separately to the clinician's inbox, only after patient consent:



# Appointment resource — scheduling staff access
appointment = {
    "resourceType": "Appointment",
    "status": "proposed",
    "priority": urgency_to_fhir_priority[triage_result.urgency_level],
    "serviceType": [{"coding": [{"code": triage_result.specialty}]}],
    "reasonCode": [{"coding": [{"system": "http://snomed.info/sct",
                                 "code": symptom_profile.chief_complaint_code}]}],
    "slot": [{"reference": f"Slot/{selected_slot_id}"}],
    "participant": [{"actor": {"reference": f"Patient/{patient_fhir_id}"},
                     "status": "accepted"}],
    # NOT included: free-text symptom description, medication list, allergy list
}

# Communication resource — clinician inbox only, with patient consent
clinical_summary = {
    "resourceType": "Communication",
    "status": "completed",
    "category": [{"coding": [{"code": "triage-summary"}]}],
    "payload": [{"contentString": format_clinical_summary(symptom_profile, triage_result)}],
    "recipient": [{"reference": f"Practitioner/{clinician_fhir_id}"}],
}
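The `urgency_to_fhir_priority` lookup used above is not shown in this article. FHIR R4 models `Appointment.priority` as an unsignedInt following the iCal convention, where 1 is the highest priority, so one plausible, hypothetical mapping is:

```python
# Hypothetical mapping — FHIR R4 Appointment.priority is an unsignedInt;
# the iCal convention treats 1 as highest priority.
urgency_to_fhir_priority = {
    "EMERGENCY": 1,    # should never normally reach scheduling — escalated instead
    "URGENT": 2,
    "SEMI_URGENT": 5,
    "ROUTINE": 9,
}
```

The exact values would be agreed with the EHR provider, since downstream scheduling systems may interpret the integer differently.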


Implementation Phases


Phase 1: Safety Architecture Before Code

Write the emergency escalation function first — as a pure Python function that returns the emergency services number for a given locale. Write a unit test that calls it with ten different red-flag phrasings, measures response time, and asserts delivery in under 500ms. This test must pass before any other code is written.
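A sketch of that first deliverable, with the caveat that the message text, locale table, and function name here are illustrative rather than the course's actual code:

```python
import time

# Static emergency numbers by locale — illustrative subset; clinician-reviewed in practice.
EMERGENCY_NUMBERS = {"US": "911", "EU": "112", "UK": "999"}

EMERGENCY_MESSAGE = (
    "Your symptoms may indicate a medical emergency. "
    "Please call {number} now or go to the nearest emergency department."
)

def emergency_escalation(locale: str) -> str:
    """Pure function: no I/O, no LLM, no awaits — returns the pre-written message."""
    number = EMERGENCY_NUMBERS.get(locale, EMERGENCY_NUMBERS["US"])
    return EMERGENCY_MESSAGE.format(number=number)

def test_escalation_latency() -> None:
    start = time.perf_counter()
    message = emergency_escalation("EU")
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert "112" in message
    assert elapsed_ms < 500  # end-to-end budget; the pure function itself is microseconds
```

The real 500ms test measures the full delivery path, not just this function, but isolating the message construction as a pure function keeps the LLM structurally out of the path.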

Key decisions to make before Sprint 1 ends:

  • PHI store interface: define the SymptomProfile storage and retrieval API; the interface between session_id (LangGraph state) and PHI (encrypted store) must be agreed before the symptom collector is implemented

  • PII scrubber specification: define the NER model (spaCy en_core_web_sm) and regex patterns; the scrubber must run on every LLM response before any state write — not as an afterthought

  • Audit log event schema: define all event types, required fields, and de-identification rules; harder to change retroactively than to get right initially

  • Red-flag keyword list: write the initial static list and have a clinician review it before any patient-facing code runs

Building the emergency escalation function as the first piece of code — structurally isolated from all LLM dependencies — is covered in detail in the full course.
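The regex layer of the scrubber specification above can be sketched as follows; in production a spaCy `en_core_web_sm` NER pass would run alongside these patterns, which are an illustrative subset only:

```python
import re

# Regex layer of the PII scrubber — illustrative subset; a spaCy NER pass
# (en_core_web_sm) runs alongside these patterns in the full specification.
PII_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DOB":   re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def scrub_pii(text: str) -> tuple[str, list[str]]:
    """Replace detected PII with [REDACTED]; return the categories found for audit logging."""
    found = []
    for category, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(category)
            text = pattern.sub("[REDACTED]", text)
    return text, found
```

The returned category list feeds the audit trail, so every scrubbing event is logged without logging the PII itself.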


Phase 2: Knowledge Graph and Symptom Collector

Implement the YAML rule loader, the rule matcher, and the symptom_collector LangGraph node. Run the matcher against 20 benchmark test cases — all red-flag patterns achieving 100% recall — before wiring any scheduling integration. The rule benchmark is the system's accuracy baseline.

Key decisions to make:

  • Symptom normalisation: the LLM collects free-text ("my chest hurts when I breathe"); a separate, auditable mapping step converts this to SNOMED-CT codes ("chest_pain", "pleuritic") before the rule matcher runs — never inline this mapping into the LLM prompt

  • No-match fallback: any symptom combination that matches no rule must be SEMI-URGENT with confidence 0.40 and trigger HITL; never silently default to ROUTINE

  • Consent gate: the session_init node must confirm consented: true has been recorded before symptom_collector begins; enforce this with a state field check, not a prompt instruction

Building the consent gate, the SNOMED-CT mapping step, and the knowledge graph accuracy benchmark is covered in detail in the full course.
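The 100% red-flag recall gate from Phase 2 can be expressed as a small helper; the `classify` callable and benchmark format here are hypothetical stand-ins for the real matcher and test fixtures:

```python
def red_flag_recall(benchmark: list[tuple[set, str]], classify) -> float:
    """Fraction of known-EMERGENCY benchmark cases the matcher flags as EMERGENCY.

    benchmark: (symptom_codes, expected_urgency) pairs; classify: codes -> urgency.
    """
    emergency_cases = [codes for codes, expected in benchmark if expected == "EMERGENCY"]
    hits = sum(1 for codes in emergency_cases if classify(codes) == "EMERGENCY")
    return hits / len(emergency_cases)
```

CI would assert `red_flag_recall(...) == 1.0` on the benchmark set before any scheduling work proceeds.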


Phase 3: HITL Patterns — Emergency and Clinical Review

Implement both HITL mechanisms: emergency escalation interrupt() (fires within 500ms, no LLM) and clinician review interrupt() (fires on low-confidence URGENT, waits for clinician action). These are structurally different and must be tested independently.

Key decisions to make:

  • Emergency interrupt() position: the red_flag_monitor runs in parallel with symptom_collector and can interrupt at any message — not just after symptom profile completion; a patient can trigger escalation in their first message

  • 10-minute SLA enforcement: an asyncio.sleep(600) timer started at HITL trigger; on expiry, the patient sees the emergency services contact and the on-call clinician receives an SMS alert; this is a patient safety requirement

  • Mental health crisis routing: suicidal ideation and self-harm language must be in the static keyword list with EMERGENCY_FLAG; the escalation message must include 988 Suicide and Crisis Lifeline (US) alongside 911; this must be implemented before any patient-facing deployment

Testing the parallel red_flag_monitor — ensuring it fires before the symptom profile is complete when emergency language appears in the first message — is covered in detail in the full course.
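The 10-minute SLA timer can be sketched with `asyncio.wait_for` rather than a bare `asyncio.sleep(600)`, so a clinician response cancels the timer cleanly; the names here are illustrative:

```python
import asyncio

async def hitl_sla_watchdog(clinician_responded: asyncio.Event,
                            timeout_s: float = 600.0) -> str:
    """Wait for clinician action; auto-escalate if the SLA window expires.

    Returns "clinician_responded" or "sla_breached". On a breach, the caller
    shows the emergency services contact, pages the on-call clinician via SMS,
    and writes the breach to the audit trail.
    """
    try:
        await asyncio.wait_for(clinician_responded.wait(), timeout=timeout_s)
        return "clinician_responded"
    except asyncio.TimeoutError:
        return "sla_breached"
```

Setting the event from the dashboard handler resolves the wait immediately, which makes both branches independently testable with short timeouts.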


Phase 4: FHIR Scheduling and PHI Audit

Implement FHIR slot retrieval and Appointment creation against the HAPI FHIR sandbox. Then run the PHI state audit — inspect every field of the LangGraph checkpointed state and assert that no SymptomProfile PHI fields appear as values.

Key decisions to make:

  • PHI state audit implementation: write a test that runs a complete synthetic triage session, retrieves the full state from SqliteSaver, serialises it to JSON, and asserts that none of the known PHI field values appear in the serialised state; fail the CI build if any PHI is found

  • FHIR error user experience: FHIR API errors (full calendar, rate limit, network failure) must surface a user-facing alternative contact message, not a raw error; the patient should never see a stack trace or HTTP error code
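The PHI state audit described above reduces to a small assertion helper; the field names and PHI examples below are synthetic:

```python
import json

def assert_no_phi_in_state(checkpointed_state: dict, phi_values: list[str]) -> None:
    """Serialise the full checkpointed state and fail if any known PHI value appears."""
    serialised = json.dumps(checkpointed_state, default=str)
    leaked = [v for v in phi_values if v in serialised]
    assert not leaked, f"PHI leaked into LangGraph state: {len(leaked)} value(s)"
```

In CI this runs after a complete synthetic triage session: load the state from SqliteSaver, pass the known synthetic PHI fixture values, and fail the build on any hit.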


Phase 5: Telemedicine Handoff and Deployment Preparation

Implement the Twilio Video session link generation, the FHIR Communication resource, and the production compliance checklist. Run the complete 6-sprint test suite with synthetic patient data. Prepare the compliance documentation for clinical stakeholder review.

Key decisions to make:

  • Telemedicine consent: session recording requires a visible UI consent element — a checkbox, not a pre-checked default — before the Twilio recording API is called; log the consent event to the audit trail

  • Synthetic data discipline: every test in the 6-week development process must use synthetic patient data; define a fixture library in Sprint 1 and use it exclusively; never test with real symptom data, even anonymised



Common Challenges

1. LLM produces diagnostic-sounding language despite system prompt prohibition. Root cause: For common presentations, gpt-4o has seen extensive clinical text in training. Even with explicit prohibition, at temperature > 0.2, it occasionally produces "this could be related to..." or "symptoms like yours are sometimes associated with..." Fix: Set temperature to 0.2. Add a post-generation validator that scans for prohibited phrase patterns. If detected, replace the response with a safe redirect phrase and log the event as a safety violation in the audit trail. Do not rely solely on the system prompt for safety enforcement — the validator is the real enforcement mechanism.
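A sketch of that post-generation validator; the prohibited patterns shown are an illustrative subset and would be clinician-reviewed in practice:

```python
import re

# Prohibited diagnostic-sounding patterns — illustrative subset only.
PROHIBITED_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\bthis could be (related to|a sign of)\b",
        r"\bsymptoms like yours are (sometimes |often )?associated with\b",
        r"\byou (may|might|probably) have\b",
        r"\bsounds like (you have|a case of)\b",
    )
]

SAFE_REDIRECT = (
    "I can't assess what might be causing your symptoms. A clinician will review "
    "what you've described. Let's continue: how long has this been going on?"
)

def validate_response(llm_text: str) -> tuple[str, bool]:
    """Return (text_to_send, violation_flag); violations are audit-logged upstream."""
    if any(p.search(llm_text) for p in PROHIBITED_PATTERNS):
        return SAFE_REDIRECT, True
    return llm_text, False
```

The validator, not the system prompt, is the enforcement point: every LLM response passes through it before reaching the patient or the state store.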

2. Emergency escalation misses novel phrasing of emergency symptoms. Root cause: "My heart is squeezing" describes chest pain. "I can't see right" describes visual disturbance. The static keyword list covers standard phrasings but cannot cover all novel expressions. Fix: Two-layer detection: static keyword list (< 1ms, always runs) plus a gpt-4o-mini binary classifier (temperature 0.0, runs on every message). Either layer triggers escalation. Test with at least 100 diverse phrasings, including regional and idiomatic expressions, before any patient-facing deployment.
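The two-layer detector, including the degraded-mode behaviour described in challenge 8 below, can be sketched as follows; the `llm_classifier` callable stands in for the gpt-4o-mini binary call:

```python
def detect_red_flag(message: str, keywords: set,
                    llm_classifier=None) -> tuple[bool, str]:
    """Layer 1: static keywords (always runs). Layer 2: LLM classifier (best-effort)."""
    lowered = message.lower()
    if any(k in lowered for k in keywords):
        return True, "static_keyword"
    if llm_classifier is not None:
        try:
            if llm_classifier(message):  # gpt-4o-mini binary call in production
                return True, "llm_classifier"
        except Exception:
            # LLM outage: log DEGRADED_MODE and continue with static-only coverage.
            return False, "degraded_mode"
    return False, "no_flag"
```

Either layer firing triggers escalation; an LLM outage degrades coverage but never takes detection offline.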

3. PHI appears in LangGraph state via LLM echo. Root cause: When the symptom collector confirms collected information ("So you're experiencing chest pain for 2 hours"), it echoes back symptom details that were provided by the patient. If this response is written directly to conversation_history in state, PHI is in the checkpointed state. Fix: Apply the PII scrubber — spaCy NER plus regex patterns — before every state write. The scrubber replaces detected PHI with [REDACTED] placeholders. Log all scrubbing events to the audit trail.

4. SMART on FHIR token expires during a long patient session. Root cause: SMART on FHIR access tokens typically expire after 60 minutes. A patient who takes 65 minutes from symptom collection to slot selection encounters a 401 at the scheduling step. Fix: Proactive token refresh: check expiry before every FHIR API call and refresh if less than 5 minutes remain. Handle 401 responses with a single token refresh + retry; never surface a 401 to the patient.

5. HITL clinician does not respond within the 10-minute SLA. Root cause: Off-hours operation or high triage volume means the dashboard is unmanned. An URGENT patient waits without a response beyond the safe window. Fix: asyncio.sleep(600) timer started at HITL trigger. On expiry: display emergency services contact to the patient with a message explaining the delay; send urgent SMS to on-call clinician; log the SLA breach to the audit trail with timestamp and session ID.

6. Knowledge graph confidence values are set too optimistically. Root cause: Developers set initial confidence values without clinical calibration. High-confidence rules skip HITL even on genuinely ambiguous presentations. Fix: Start with conservative values — cap all first-version rules at 0.75. Run the knowledge graph against the 200-case clinical benchmark and measure HITL trigger rate against the expected clinical rate. Adjust confidence values iteratively, with clinician sign-off on each adjustment.

7. Patient describes a mental health crisis during the triage flow. Root cause: The symptom knowledge graph focuses on physical presentations. A patient expressing suicidal ideation receives symptom collection questions instead of crisis resources. Fix: Mental health red-flag patterns (suicidal ideation, self-harm, "I don't want to be here anymore") must be in the static keyword list with EMERGENCY_FLAG. The escalation message must include 988 Suicide and Crisis Lifeline (US) alongside 911. This is not an edge case — it is a foreseeable user need, and it must be implemented before any patient-facing deployment.

8. OpenAI API outage partially degrades the emergency detection path. Root cause: The gpt-4o-mini secondary emergency detector is unavailable during an OpenAI outage. The static keyword list continues operating, but novel-phrasing coverage is reduced. Fix: During an OpenAI outage, the system must log a DEGRADED_MODE alert and continue operating with static-only detection. The system must not go offline because its secondary emergency detector is unavailable. Monitor OpenAI API status and alert operations when the system enters degraded mode.



A Note on Production Deployment

Everything in this article describes development architecture using synthetic test data and sandbox credentials.

Before any real patient data enters this system, the following are required — this list is not exhaustive:

  • HIPAA Business Associate Agreements with OpenAI, your cloud provider, Twilio, and every vendor whose service may touch PHI

  • Clinical review of the complete knowledge graph by at least two qualified clinicians, with documented sign-off on every rule

  • Legal review of all patient-facing disclosures, consent language, and terms of service by a healthcare attorney

  • Penetration testing by a third party focused on healthcare application security

  • PHI encryption upgraded from development Fernet to a managed KMS (AWS KMS, Azure Key Vault, or GCP Cloud KMS)

  • Production FHIR endpoint agreement with the EHR provider — a contractual relationship with the health system, not just an API integration

  • Staff training on HITL operational procedures, SLA expectations, and escalation protocols

The course at labs.codersarts.com includes a compliance checklist designed to be shared with your legal and clinical teams.



Ready to Build This Yourself?

The gap between this article and a working, compliance-ready Medical AI Agent includes: PHI separation that survives code review, a knowledge graph that passes clinical validation, FHIR integration that handles real-world API variability, and a compliance checklist your legal team can sign off on.

The Medical AI Agent course on labs.codersarts.com gives you everything you need:

✅ Full source code for all 6 sprints — LangGraph backend + Next.js chat widget + clinician dashboard, fully commented

✅ Symptom knowledge graph with 50+ rules using synthetic clinical data (clinician-reviewed format, ready for your clinical team to expand)

✅ FHIR R4 integration with HAPI FHIR sandbox — SMART on FHIR OAuth, proactive token refresh, minimum necessary data

✅ HITL clinician dashboard fully wired — approve / override / escalate, 10-minute SLA enforcement

✅ Emergency escalation path with < 500ms latency verification tests

✅ PHI separation architecture with PII scrubber, encrypted PHI store, and mandatory PHI state audit

✅ HIPAA-aware audit logging scaffold — append-only, encrypted, 6-year retention design

✅ Production compliance checklist for sharing with your legal and clinical teams

✅ Lifetime access — including updates as FHIR standards and LangGraph APIs evolve

✅ Community support via the Codersarts Discord

$30 for everything above. Synthetic data throughout.

Need help with a specific EHR integration, clinical workflow, or deployment context? Book a 1:1 guided session at $20/hour — work through the FHIR endpoint configuration, knowledge graph design, and compliance architecture for your specific deployment alongside the Codersarts team. Session recording included.



Conclusion

The Medical AI Agent is an eight-layer system built on two architectural principles that cannot be compromised: the LLM does not classify urgency (that belongs to a clinician-reviewed rule-based knowledge graph), and the emergency path contains no LLM calls (a pre-written message, delivered in under 500ms, unconditionally). Everything else — FHIR scheduling, HITL clinician review, telemedicine handoff, PHI-separated data model — is meaningful, but secondary to those two rules.

The simplest starting point: LangGraph + FastAPI + HAPI FHIR sandbox + YAML knowledge graph + SQLite. No Twilio, no managed KMS, no PostgreSQL. You can have a working triage conversation with a functioning knowledge graph, an emergency escalation path, and a HITL breakpoint running locally — using synthetic patient data throughout — in a weekend.

When you are ready to move from architecture to working code, the full course is waiting at labs.codersarts.com — complete source, compliance scaffold, and a clinical-ready knowledge graph format included.

Reminder: This system is a triage support and administrative tool. It does not diagnose medical conditions. All urgency classification uses a clinician-reviewed rule-based knowledge graph, not LLM reasoning. Production deployment requires clinical review, HIPAA compliance, and BAAs with all vendors handling patient data.
