Schema Diagrams: Bidirectional Visualization for the Schema Languages That Need It Most
Table of Contents
- 1. About schemaVisualization dataModelling
- 2. The Visualization Gap schemas toolingGap
- 3. The State of Schema Visualization Tooling marketAnalysis
- 4. How Schema Diagrams Works features
- 5. Benefits productivity collaboration
- 6. Beyond Avro extensibility protobuf jsonSchema
- 7. Getting Started bootstrap ai
- 8. Contributing openSource
- 9. Conclusion
- 10. tldr
1. About schemaVisualization dataModelling
Figure 1: JPEG produced with DALL-E 4o
If you've ever worked with a relational database, you've almost certainly seen an entity-relationship diagram. Maybe it was generated by DataGrip or pgAdmin, maybe someone drew it in Lucidchart, maybe it was a dbdiagram.io sketch. Regardless – the point is you saw the data model. You could point at boxes, trace lines, and say "this table references that one."
Now think about the last time you reviewed an Avro schema. If you're like most engineers, you opened a JSON file, scrolled through hundreds of lines of deeply nested objects, and mentally assembled the structure in your head. No boxes. No lines. Just JSON.
This gap – between the rich visual tooling that SQL databases have enjoyed for decades and the near-complete absence of equivalent tooling for schema languages like Avro – is what Schema Diagrams aims to close. It's a diagrams-as-code tool that renders interactive entity-relationship diagrams from Avro schemas, with bidirectional sync between the code editor and the visual canvas. Edit the code, the diagram updates. Edit the diagram, the code updates. Everyone works on the same artifact.[1]
This post is a companion to my earlier piece on schema language selection, which explored which schema language to choose. This post asks a different question: once you've chosen, how do you actually see what you've built?
2. The Visualization Gap schemas toolingGap
Avro schemas are defined as JSON. This is a reasonable serialization choice – JSON is ubiquitous, parsable everywhere, and schema registries speak it natively. But it's a terrible authoring format for complex data models.
Consider a moderately complex schema: a User record with nested Address, PaymentMethod, and OrderHistory types, each referencing further records and enums. In raw .avsc JSON, this might look like 300 lines of nested objects, arrays, and union types. The structural relationships – which records reference which, what the cardinality is, where the enums live – are buried in syntax.[2]
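To make "buried in syntax" concrete, here is a minimal sketch that walks a heavily trimmed, hypothetical version of that User schema and prints an outline of its named types. The schema contents and the helper name `named_types` are illustrative assumptions, not taken from any real registry or from Schema Diagrams itself.

```python
import json

# Hypothetical, heavily trimmed version of the User schema described above.
USER_AVSC = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "address", "type": {
      "type": "record",
      "name": "Address",
      "fields": [
        {"name": "city", "type": "string"},
        {"name": "country", "type": {
          "type": "enum", "name": "Country", "symbols": ["US", "NL"]
        }}
      ]
    }},
    {"name": "previousAddress", "type": ["null", "Address"], "default": null}
  ]
}
"""

def named_types(schema, depth=0):
    """Yield (depth, kind, name) for every record, enum, or fixed definition."""
    if isinstance(schema, list):  # union: inspect each branch
        for branch in schema:
            yield from named_types(branch, depth)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in ("record", "enum", "fixed"):
            yield depth, kind, schema["name"]
        if kind == "record":
            for field in schema["fields"]:
                yield from named_types(field["type"], depth + 1)
        elif kind == "array":
            yield from named_types(schema["items"], depth)
        elif kind == "map":
            yield from named_types(schema["values"], depth)
        elif isinstance(kind, (dict, list)):  # {"type": {...}} wrapper
            yield from named_types(kind, depth)

outline = list(named_types(json.loads(USER_AVSC)))
for depth, kind, name in outline:
    print("  " * depth + f"{kind} {name}")
```

Even for three types, recovering the outline takes a recursive traversal of unions, arrays, and inline definitions. That is the work a reviewer does mentally when reading the raw file.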
This creates real problems:
- PR reviews become rubber stamps. When a schema change is a diff of 50 lines of JSON, reviewers skim rather than analyze. Subtle issues – an accidental field removal, a type change that breaks backward compatibility, a new nullable union that downstream consumers don't handle – slip through.
- Onboarding stalls. A new engineer joining a team with 40 Avro schemas across a Kafka-based architecture faces weeks of spelunking through .avsc files to build a mental model of the data domain. With SQL, they'd open an ERD and have the big picture in minutes.
- Cross-team communication breaks down. When a data engineer needs to explain a schema to a product owner or data analyst, they have two options: hand them the raw JSON (useless) or manually draw a diagram in Lucidchart (labor-intensive and immediately stale). Neither is sustainable.
- Duplicate schemas proliferate. Without a visual map of existing types, teams create new schemas that partially duplicate existing ones. In large organizations, this leads to dozens of slightly different Address or Timestamp definitions scattered across registries.
SQL databases solved these problems ages ago. You can point any of thirty-plus tools at a database, and within seconds you have an ERD showing every table, column, relationship, and constraint. The schema languages powering modern event-driven systems – the schemas that are the API contracts between services – have nothing comparable.
3. The State of Schema Visualization Tooling marketAnalysis
The landscape splits along two axes: which schema language is supported, and whether the tool is one-directional (code → diagram) or bidirectional (code ↔ diagram). The findings are sobering.
3.1. SQL ERD: The Gold Standard
SQL has been around since the 1970s, and its visual tooling reflects that maturity. Over thirty actively maintained tools exist:
- Free web-based: dbdiagram.io (with its DBML diagram-as-code format), DrawDB, DrawSQL, ChartDB, ERDPlus, QuickDBD
- Commercial: Lucidchart, Vertabelo, SQLDBM, DataGrip, Navicat
- Open-source CLI: SchemaSpy, SchemaCrawler, tbls, ERAlchemy
Many of these support true bidirectional workflows. DBML, the language behind dbdiagram.io, lets you write schema definitions in a concise DSL, generate visual diagrams, and export DDL for multiple database engines. The round-trip is seamless: design visually, export code; import code, refine visually.
AI-powered options are emerging too – ChartDB includes an AI agent for generating schemas, and Eraser.io can generate ERDs from natural language. This is what mature tooling ecosystems look like.
3.2. Avro: The Tooling Desert
Now compare to Avro. The total number of visualization tools I've been able to identify: roughly six.
- Hackolade Studio (~$700/year per seat) – The most capable option. It supports forward-engineering (diagram → .avsc) and reverse-engineering (.avsc → diagram), generates ERD-style views, and integrates with Confluent Schema Registry. But it's a proprietary desktop application, not a diagram-as-code tool, and the price point puts it out of reach for many teams.
- bol.com Avro Schema Viewer (free, open source) – A web-based hierarchical tree viewer for .avsc files. It renders schemas as expandable/collapsible trees with URL-based navigation. It's useful for browsing, but it's read-only and doesn't produce relationship diagrams. You can see the tree, but you can't see the forest.
- schema-uml (free, open source) – A Python tool that converts .avdl and .proto files into UML diagrams via Graphviz. One-directional: schema → diagram. No editing, no round-trip.
- Javro (free) – An Avro schema editor with autocomplete and JSON preview. Focused on authoring, not visualization. No diagram output.
- Schema Registry UIs (Lenses, Confluent Cloud) – Show schemas as raw JSON with version history. No diagram view, no relationship visualization.
That's it. Six tools, two of which are commercial, and none of which offers bidirectional diagram-as-code editing. Compare that with thirty-plus for SQL, with multiple free bidirectional options, AI integration, and active communities. The gap is staggering.[3]
3.3. Protobuf and JSON Schema
The other major schema languages fare somewhat better, but still lag far behind SQL.
Protocol Buffers benefit from Google's ecosystem. proto-gen-md-diagrams (from Google Cloud Platform) generates Markdown documentation with embedded Mermaid UML diagrams from .proto files. protobuf-uml-diagram and protobuf2uml provide additional UML generation. Protobuf's .proto format is also inherently more readable than Avro's JSON schema, which reduces (but doesn't eliminate) the need for visualization.
JSON Schema has the broadest visualization ecosystem among non-SQL formats: JSON Crack (graph visualization of any JSON), Atlassian's JSON Schema Viewer, IntelliJ plugins, and even a GSoC 2025 project specifically focused on interactive graphical schema viewing.
But even JSON Schema visualization is nowhere near SQL's maturity. And critically, across all three non-SQL formats, the category of bidirectional diagram-as-code tools is essentially empty.
3.4. The Missing Category: Bidirectional Diagram-as-Code
This is the core insight. The tools that exist for Avro (and Protobuf, and JSON Schema) fall into two categories:
- Viewers – one-directional tools that render a schema as a diagram or tree. Useful for understanding, but the output is a dead end. You can't edit the diagram and have the changes flow back to your schema code.
- Editors – authoring tools with autocomplete and validation, but no visual output. You're still staring at text.
What's missing is the third category: tools where the code and the diagram are the same artifact, kept in sync bidirectionally. In the SQL world, DBML and dbdiagram.io pioneered this. In the architecture diagramming world, CN Diagrams takes the same approach.
Schema Diagrams brings this same bidirectional paradigm to schema visualization. Write Avro JSON or IDL in the code editor, see the diagram update in real time. Click a field in the diagram to rename it, and the code updates. The code is the diagram, and the diagram is the code.
4. How Schema Diagrams Works features
Schema Diagrams provides several capabilities designed to make Avro schemas tangible. You can try it directly at Schema Diagrams.
4.1. Bidirectional Editing
The Monaco editor (the same engine behind VS Code) and the Svelte Flow canvas stay synchronized. Add a field in the code, it appears as a row in the corresponding entity node. Click a field name in the diagram to rename it, and the code updates. Change a field's type via the visual dropdown, and the JSON or IDL reflects the change.
This eliminates the choice between maintainability and accessibility. Engineers version-control the schema code through pull requests. Data analysts and product owners explore and propose changes through the visual interface. Both are editing the same artifact.
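The round trip can be sketched in a few lines, assuming the simplest possible design: both the editor and the canvas read from and write to one parsed model. The function names (`parse`, `serialize`, `rename_field`) are hypothetical and only illustrate the idea; they are not the tool's actual API, and the rename handles a top-level record only.

```python
import json

def parse(code: str) -> dict:
    """Code -> model: the parsed schema is the single source of truth."""
    return json.loads(code)

def serialize(model: dict) -> str:
    """Model -> code: regenerate the text shown in the editor."""
    return json.dumps(model, indent=2)

def rename_field(model: dict, record: str, old: str, new: str) -> dict:
    """Apply a diagram-side edit (rename a field) to a top-level record."""
    if model.get("name") != record:
        raise KeyError(record)
    fields = [
        {**f, "name": new} if f["name"] == old else f
        for f in model["fields"]
    ]
    return {**model, "fields": fields}

code = '{"type": "record", "name": "User", "fields": [{"name": "mail", "type": "string"}]}'
model = parse(code)                                   # editor change -> diagram refresh
model = rename_field(model, "User", "mail", "email")  # diagram click -> model change
code = serialize(model)                               # model change -> editor refresh
print(code)
```

Because every edit passes through the same model, neither view can drift from the other: the text is regenerated from whatever the diagram changed, and the diagram is rebuilt from whatever the text says.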
4.2. Dual Format Support
Schema Diagrams supports both major Avro representations:
- Avro JSON (.avsc) – the format that Schema Registries consume, and the format most Avro tooling produces. Verbose but ubiquitous.
- Avro IDL (.avdl) – the human-readable interface definition language. More concise, closer to how engineers think about types.
Format detection is automatic. Paste a JSON schema, and the tool recognizes it. Switch to IDL, and it parses that instead. Both formats are first-class citizens.[4]
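One plausible way to detect the format, offered as a sketch and not as the tool's actual logic: an Avro JSON schema always begins with `{`, `[`, or a quoted type name, while Avro IDL begins with a keyword, an annotation, or a comment. Looking only at the first non-whitespace character also keeps half-typed JSON classified as JSON. The function name `detect_format` is an assumption.

```python
def detect_format(text: str) -> str:
    """Guess "avsc" (Avro JSON) or "avdl" (Avro IDL) from the first character."""
    first = text.lstrip()[:1]
    return "avsc" if first in ("{", "[", '"') else "avdl"

print(detect_format('{"type": "record", "name": "User", "fields": []}'))
print(detect_format('@namespace("com.example")\nprotocol Users { }'))
```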
4.3. Automatic Layout and Inline Validation
Schema Diagrams uses ELK (Eclipse Layout Kernel) to position entity nodes automatically. There's no manual arrangement required – the layout algorithm handles node positioning, edge routing, and spacing. This is a deliberate design choice: the diagram's layout is deterministic based on the schema structure, which means it stays consistent as schemas evolve.
The Monaco editor provides inline validation with squiggly underlines and error markers at specific line/column positions. Syntax errors, missing fields, and type resolution failures surface immediately as you type – no waiting for a separate compilation or validation step.
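As an illustration of where line and column positions can come from, here is a minimal sketch built on Python's standard `json` module, whose `JSONDecodeError` carries `lineno`, `colno`, and `msg`. The marker dictionaries and the `validate` function are hypothetical; the real editor's checks are richer than this.

```python
import json

def validate(text: str) -> list[dict]:
    """Return editor-style markers: 1-based line/column plus a message."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as err:
        return [{"line": err.lineno, "column": err.colno, "message": err.msg}]
    markers = []
    if isinstance(schema, dict) and schema.get("type") == "record":
        for key in ("name", "fields"):
            if key not in schema:
                # Position is a placeholder: plain json.loads does not keep
                # source offsets for successfully parsed values.
                markers.append({
                    "line": 1,
                    "column": 1,
                    "message": f"record is missing required attribute '{key}'",
                })
    return markers

broken = '{\n  "type": "record",\n  "name": \n}'
print(validate(broken))
print(validate('{"type": "record", "name": "User"}'))
```

Syntax errors get exact positions for free. Semantic errors such as a missing attribute need a parser that tracks offsets, which is why the sketch falls back to a placeholder position there.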
4.4. Relationship Discovery
The tool automatically detects and visualizes relationships between schema types:
- Reference edges (blue) – when a field's type is another named record
- Nested record edges (purple) – when a record is defined inline within another
- Join edges (orange, dashed) – explicit cross-schema relationships declared via @join annotations in IDL or x.join metadata in JSON
Each edge includes cardinality labels (1:1, 1:N, N:1, N:N), making the data model's topology immediately visible. Records can be collapsed to reduce visual clutter while maintaining relationship visibility – useful when working with schemas that have dozens of fields.
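The first two edge kinds can be derived from the schema alone. The sketch below shows one way to do it, under a simplifying assumption of my own: a field wrapped in an array or map is labelled 1:N, and everything else 1:1. How Schema Diagrams actually assigns cardinality is not specified here, the `discover_edges` name is hypothetical, and join edges are left out.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

def discover_edges(record: dict) -> list[tuple]:
    """Return (source, field, target, kind, cardinality) for each relationship."""
    found = []

    def visit(source, field, schema, many):
        cardinality = "1:N" if many else "1:1"
        if isinstance(schema, str):
            # A non-primitive string is a name defined elsewhere. This sketch
            # does not check whether that name is a record or an enum.
            if schema not in PRIMITIVES:
                found.append((source, field, schema, "reference", cardinality))
        elif isinstance(schema, list):  # union: visit every branch
            for branch in schema:
                visit(source, field, branch, many)
        elif schema["type"] == "array":
            visit(source, field, schema["items"], True)
        elif schema["type"] == "map":
            visit(source, field, schema["values"], True)
        elif schema["type"] == "record":  # defined inline
            found.append((source, field, schema["name"], "nested", cardinality))
            walk(schema)
        # enums, fixed types, and annotated primitives produce no edge

    def walk(rec):
        for f in rec["fields"]:
            visit(rec["name"], f["name"], f["type"], False)

    walk(record)
    return found

user = {
    "type": "record", "name": "User",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "address", "type": {
            "type": "record", "name": "Address",
            "fields": [{"name": "city", "type": "string"}],
        }},
        {"name": "billing", "type": ["null", "Address"]},
        {"name": "orders", "type": {"type": "array", "items": "Order"}},
    ],
}
for edge in discover_edges(user):
    print(edge)
```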
5. Benefits productivity collaboration
5.1. Cross-Persona Accessibility
The bidirectional sync is not just a technical feature – it's an organizational one. Data engineers author schemas in IDL or JSON. Data analysts read the diagram. Product owners can understand the data model without parsing nested JSON. Governance teams audit schema structures visually. Everyone contributes in their preferred medium, and everyone stays in sync.
This matters especially in organizations practicing data mesh, where data products are owned by domain teams rather than a central data platform. The schema is the product's interface, and that interface needs to be readable by consumers who may not share the producer's technical background.
5.2. Schema Reviews That Work
Pull request reviews of .avsc files are notoriously unproductive. A 50-line JSON diff tells you what changed syntactically, but not what it means structurally. Did we add a new entity? Change a relationship? Break backward compatibility?
Paste both versions into Schema Diagrams and the structural changes become immediately visible. The visual representation surfaces the intent of the change, making reviews faster and more thorough.
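The same structural view can be approximated in code. This is a minimal sketch of a field-level diff between two versions of a record; the schemas and the `structural_diff` helper are hypothetical, and it compares top-level fields only.

```python
def field_types(record: dict) -> dict:
    """Map field name -> declared type for one record."""
    return {f["name"]: f["type"] for f in record["fields"]}

def structural_diff(old: dict, new: dict) -> dict:
    """Summarize which fields were added, removed, or changed type."""
    before, after = field_types(old), field_types(new)
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "retyped": sorted(n for n in shared if before[n] != after[n]),
    }

v1 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": "int"},
]}
v2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": ["null", "string"]},
    {"name": "nickname", "type": "string"},
]}
diff = structural_diff(v1, v2)
print(diff)
```

A summary like this answers the reviewer's questions directly (what was added, what disappeared, what changed type), where the line-based JSON diff only shows which characters moved.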
5.3. Faster Onboarding
New team members joining a Kafka-based architecture with dozens of Avro topics can explore schemas visually rather than reading raw JSON files one by one. The diagram reveals the data domain's structure at a glance – which records reference which, where the enums live, what the cardinalities are – providing an overview that would otherwise take days to assemble mentally.
5.4. Schema Evolution Awareness
Avro's schema evolution rules – backward compatibility, forward compatibility, full compatibility – are powerful but difficult to reason about from raw JSON alone. While Schema Diagrams is not a full evolution validator, the visual representation makes structural changes between schema versions obvious, helping teams catch potential issues before they reach production.
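One of those rules is easy to check mechanically and shows why they are hard to spot by eye: a reader using the new schema cannot decode data written with the old one if the new schema adds a field without a default. The sketch below checks only that single rule for top-level fields; it ignores aliases and type promotion, and the function name is hypothetical.

```python
def backward_incompatible_additions(old: dict, new: dict) -> list[str]:
    """Names of fields added in `new` that have no default value.

    A consumer reading old data with `new` as its reader schema has nothing
    to fill these fields with, so decoding fails.
    """
    old_names = {f["name"] for f in old["fields"]}
    return [
        f["name"]
        for f in new["fields"]
        if f["name"] not in old_names and "default" not in f
    ]

old_schema = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "string"},
]}
new_schema = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "string"},
    {"name": "nickname", "type": "string"},                 # no default
    {"name": "locale", "type": "string", "default": "en"},  # safe to add
]}
print(backward_incompatible_additions(old_schema, new_schema))
```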
6. Beyond Avro extensibility protobuf jsonSchema
The architectural pattern behind Schema Diagrams – parse schema → build graph model → automatic layout → bidirectional sync – is fundamentally schema-agnostic. The parser produces an intermediate representation of entities, fields, and relationships. The layout engine and visual editor don't care whether those entities came from Avro, Protobuf, or JSON Schema.
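A rough sketch of what such an intermediate representation could look like follows. All class and method names are assumptions made for illustration, and the toy Avro parser handles only a single flat record. The point is the shape of the boundary: anything downstream of `SchemaGraph` never sees the source format.

```python
import json
from dataclasses import dataclass, field
from typing import Protocol

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

@dataclass
class Field:
    name: str
    type_label: str

@dataclass
class Entity:
    name: str
    fields: list[Field] = field(default_factory=list)

@dataclass
class Relationship:
    source: str
    target: str
    kind: str

@dataclass
class SchemaGraph:
    entities: list[Entity] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

class SchemaParser(Protocol):
    """Any format plugs in by implementing this one method."""
    def parse(self, text: str) -> SchemaGraph: ...

class AvroJsonParser:
    """Toy parser: one flat record whose field types are plain strings."""
    def parse(self, text: str) -> SchemaGraph:
        schema = json.loads(text)
        graph = SchemaGraph()
        entity = Entity(schema["name"])
        for f in schema["fields"]:
            entity.fields.append(Field(f["name"], str(f["type"])))
            if isinstance(f["type"], str) and f["type"] not in PRIMITIVES:
                graph.relationships.append(
                    Relationship(schema["name"], f["type"], "reference")
                )
        graph.entities.append(entity)
        return graph

def build_graph(parser: SchemaParser, text: str) -> SchemaGraph:
    """Layout and editing code would depend only on the returned graph."""
    return parser.parse(text)

order = '{"type": "record", "name": "Order", "fields": [{"name": "id", "type": "long"}, {"name": "buyer", "type": "User"}]}'
graph = build_graph(AvroJsonParser(), order)
print(graph)
```

Under this design, supporting Protobuf or JSON Schema means writing another class with a `parse` method that returns the same `SchemaGraph`.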
The current focus is Avro because the need is most acute there – it has the weakest visualization tooling relative to its widespread use in streaming data platforms. But the framework is designed to be extensible. Protocol Buffers, JSON Schema, Thrift, and even GraphQL SDL could all be supported by adding a parser that produces the same intermediate graph representation.
The vision is a single tool where teams can visualize any schema format with the same bidirectional editing experience, regardless of which serialization technology their platform uses.
7. Getting Started bootstrap ai
The fastest path to a useful diagram is simple: paste a schema into the editor at Schema Diagrams and watch the diagram appear. Built-in example schemas (simple records, deeply nested hierarchies) provide starting points if you want to explore the tool before bringing your own data.
For teams with existing schemas spread across a Schema Registry, AI assistance can accelerate the process. LLMs are remarkably good at reading structured formats like JSON and IDL, and they can help consolidate, clean up, or annotate schemas before visualization. The workflow is straightforward:
- Export schemas from your Schema Registry (Confluent CLI, REST API, etc.)
- Ask an AI agent to consolidate related schemas, add @join annotations for cross-schema relationships, or convert between JSON and IDL
- Paste the result into Schema Diagrams
- Refine visually – renames, type adjustments, and field additions happen directly on the canvas
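The export step can be sketched against the Confluent Schema Registry REST API, which lists subjects at `/subjects` and returns the latest version of one at `/subjects/{subject}/versions/latest`, with the schema itself as a JSON-encoded string in the `schema` property. The sketch assumes an unauthenticated registry holding Avro schemas; the base URL and the function names are placeholders.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

def http_get_json(url: str):
    """Fetch a URL and decode the JSON body (no auth, no retries)."""
    with urlopen(url) as resp:
        return json.load(resp)

def export_latest_schemas(base_url: str, get_json=http_get_json) -> dict:
    """Map each subject name to its latest schema, parsed from its JSON string."""
    schemas = {}
    for subject in get_json(f"{base_url}/subjects"):
        path = f"{base_url}/subjects/{quote(subject, safe='')}/versions/latest"
        entry = get_json(path)
        schemas[subject] = json.loads(entry["schema"])
    return schemas

# Example call, assuming a registry on the conventional local port:
# schemas = export_latest_schemas("http://localhost:8081")
# print(json.dumps(schemas, indent=2))
```

The `get_json` parameter exists so the function can be exercised without a live registry, for example by passing a lookup into a dictionary of canned responses.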
This isn't as transformative as AI-bootstrapping an entire architecture diagram from a codebase (as described in the CN Diagrams post), because schemas already exist as structured text. But it meaningfully lowers the barrier to getting a comprehensive visual overview of a data domain that might span dozens of .avsc files.
8. Contributing openSource
Schema Diagrams is open source under the MIT license. The source code is available at github.com/chiply/schema-diagrams.
8.1. Tech Stack
For those interested in contributing, Schema Diagrams is built with:
- SvelteKit (Svelte 5) – Application framework with runes for reactivity
- Svelte Flow (@xyflow/svelte) – Interactive node/edge diagram canvas
- ELK (elkjs) – Automatic graph layout engine
- Monaco Editor – VS Code's editor component for schema authoring
- TypeScript throughout
- Vercel – Deployment platform
8.2. How to Contribute
Contributions are welcome in several forms:
Bug Reports: If something doesn't work as expected, open an issue with steps to reproduce. Include your browser and any error messages from the console.
Feature Requests: Have an idea for improvement? Open an issue describing the use case and proposed solution. New parser support (Protobuf, JSON Schema) is a particularly impactful area.
Pull Requests: Fork the repository, create a feature branch, and submit a PR. Please include tests for new functionality and ensure existing tests pass.
Documentation: Improvements to README, examples, or inline comments are valuable contributions that don't require deep code knowledge.
The project is early-stage, and there's significant opportunity to shape its direction. Parser contributions for additional schema formats would be especially welcome.
9. Conclusion
Schema visualization shouldn't be a luxury reserved for SQL databases. The schema languages powering modern event-driven architectures – Avro, Protobuf, JSON Schema – define the contracts between services, the shape of data flowing through pipelines, and the interfaces of data products. Those contracts deserve the same visual tooling that ERDs have provided for relational databases for decades.
Schema Diagrams is an attempt to close that gap, starting with the format that needs it most. Avro's JSON verbosity, its deep nesting, its implicit relationships – these aren't flaws in the format, but they do make visualization essential rather than optional. And by building on the bidirectional sync paradigm, the tool serves not just the engineer who writes the schema, but every persona in the organization who needs to understand it.
Try it at Schema Diagrams. Browse the source at github.com/chiply/schema-diagrams. If you build something interesting or have feedback, I'd love to hear about it.
10. tldr
tl;dr: Schema Diagrams is an open-source tool that brings bidirectional, diagram-as-code visualization to Apache Avro schemas – closing a gap that SQL ERD tools filled decades ago. The problem is straightforward: Avro schemas are defined as deeply nested JSON, and the tooling for visualizing them is nearly nonexistent. While SQL has thirty-plus ERD tools (many free, many bidirectional), Avro has roughly six – two commercial, the rest read-only viewers or one-directional generators. No bidirectional diagram-as-code tool existed for Avro.
Schema Diagrams fills this gap with a Monaco editor (VS Code's engine) synchronized bidirectionally with a Svelte Flow canvas. Edit Avro JSON or IDL in the code editor, and the ERD updates in real time. Click fields in the diagram to rename them, change types, or add new fields, and the code updates. The tool supports both .avsc JSON and .avdl IDL with automatic format detection, uses ELK for deterministic automatic layout, and renders relationship edges with cardinality labels.
The payoff is organizational: engineers version-control schemas through PRs while analysts and product owners explore visually – everyone works on the same artifact. Schema reviews become structural rather than syntactic, onboarding accelerates, and cross-team communication improves. The architecture is schema-agnostic – Protobuf, JSON Schema, and other formats could reuse the same engine. Try it at the Schema Diagrams page or explore the source at github.com/chiply/schema-diagrams.
Footnotes:
[1] This is the same bidirectional sync paradigm behind CN Diagrams, my architecture diagramming tool. The core insight is the same: when code and visuals are the same artifact, you don't have to choose between maintainability and accessibility.
[2] Avro IDL (.avdl) is significantly more readable than the JSON representation, and Schema Diagrams supports both. But in practice, most Schema Registry tooling and most Avro codegen pipelines work with .avsc JSON, so that's what engineers end up reading and reviewing.
[3] There's an irony here. Avro was designed specifically for data exchange – it's a schema-first format built for interoperability between systems. Yet the tooling for understanding those schemas across teams is almost nonexistent. The format that most needs cross-persona visualization has the least of it.
[4] Automatic format detection sounds simple, but it matters more than you might think. In practice, engineers switch between JSON and IDL constantly – JSON for registry interactions, IDL for human authoring. A tool that requires you to specify the format adds friction. One that figures it out adds flow.