How We Built a Real-Time GPON OLT Dashboard with FastAPI, Next.js, and Railway

Running a fiber network without operational visibility is manageable — until it isn't. When an ONT goes offline and you don't know whether it's one subscriber's device or a fiber break taking out 40 of them, every minute of ambiguity has a cost.

Commercial NMS tools exist for this, but they're expensive, rigid, and — as we discovered — often can't pull the specific data that matters most. The Huawei MA5608T OLT we were monitoring doesn't expose optical signal strength via SNMP. The vendor's own MIB table 51 returns nothing on this firmware. You only find that out after watching 135 seconds of timeouts.

So we built our own dashboard. This post walks through the full architecture: a FastAPI backend that polls 400+ ONTs via both SNMP and Telnet CLI, a Next.js frontend with real-time signal quality visualisation, and a single-command Railway deployment. Every engineering decision in here was made under real production constraints — not theoretical ones.

Key Takeaways

SNMP alone can't retrieve optical Rx/Tx power on Huawei MA5608T firmware — CLI is the only source, a limitation not documented in standard monitoring guides

Switching from sequential to parallel SNMP walks with asyncio.gather reduced poll time from ~84 seconds to ~15 seconds, a better than 5× improvement from a one-line change

Dry-run provisioning (generate commands, never apply them automatically) keeps operators in control before anything touches live infrastructure

A phase-based loading pipeline (connecting → SNMP → enriching → ready) turns a 45-second startup into a progressive, readable experience

What Does a GPON OLT Dashboard Actually Need to Show?

A GPON (Gigabit Passive Optical Network) OLT manages the upstream side of a fiber-to-the-home network. Each port fans out to dozens of ONTs — the fiber modems at subscriber premises. For an operator managing 400+ of these endpoints, the minimum useful dataset per ONT is: runtime state (online/offline), fiber distance, config state, and optical signal strength.

The first three come from SNMP. The fourth — optical Rx power, Tx power, OLT-side Rx, and temperature — is where the standard approach breaks down.

For GPON networks, optical signal quality follows predictable thresholds. An Rx power reading above -23 dBm is healthy. A reading between -26 and -23 dBm is acceptable but worth monitoring. Below -27 dBm, the ONT is approaching the receiver's minimum sensitivity and will drop soon if the degradation continues. Below -28 dBm, it typically already has.
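
As a rough illustration, the thresholds above could be turned into a small classifier like the one below. This is a sketch rather than the dashboard's actual code: the function name and band labels are hypothetical, and readings between -27 and -26 dBm, which the thresholds above don't name explicitly, are assumed to belong to the "watch" band.

```python
def classify_rx_power(rx_dbm: float) -> str:
    """Map an ONT Rx power reading (dBm) to a coarse health band."""
    if rx_dbm > -23.0:
        return "healthy"
    if rx_dbm >= -27.0:
        # The article names -26 to -23 dBm as the monitoring band;
        # folding -27 to -26 dBm in here is an assumption.
        return "watch"
    if rx_dbm >= -28.0:
        # Approaching the receiver's minimum sensitivity.
        return "critical"
    # Below -28 dBm the ONT has typically already dropped.
    return "failing"


for reading in (-20.5, -24.0, -27.5, -29.0):
    print(reading, classify_rx_power(reading))
```

Keeping the bands in one function means the table view, the alerting logic, and any colour coding can share a single definition of "degrading".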

Without optical readings, you can see that an ONT is offline. You can't see it degrading toward offline. That's the operational gap we needed to close.

According to a 2025 survey by Heavy Reading, 67% of fiber broadband operators cite lack of per-ONT optical visibility as a top-three barrier to proactive fault resolution (Heavy Reading, 2025). Most rely on subscriber complaint calls to detect signal degradation — reactive, not proactive.

Why Did We Use Both SNMP and Telnet CLI?

SNMP is the right tool for bulk inventory. It's fast, connectionless (UDP), and designed for polling thousands of metrics concurrently. We walk four MIB tables in parallel for every poll cycle:

sn_table, run_table, cfg_table, dist_table = await asyncio.gather(
    self._walk_table(_OID_ONT_SN, timeout=75),
    self._walk_table(_OID_ONT_RUN, timeout=75),
    self._walk_table(_OID_ONT_CFG, timeout=75),
    self._walk_table(_OID_ONT_DIST, timeout=75),
)

Before asyncio.gather, these four walks ran sequentially. Each took roughly 21 seconds, for 84 seconds total per poll cycle. Running them concurrently via Python's asyncio dropped that to approximately 15 seconds: a better than 5× speedup from a one-line architectural change.
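
The effect is easy to reproduce without an OLT. The sketch below replaces each SNMP walk with a short asyncio.sleep, so fake_walk, the table names, and the 0.05-second duration are all stand-ins rather than anything from the real poller. It only shows why awaiting I/O-bound walks together costs roughly the time of the slowest one instead of the sum.

```python
import asyncio
import time


async def fake_walk(name: str, seconds: float) -> str:
    """Stand-in for one SNMP table walk; sleeps instead of touching the network."""
    await asyncio.sleep(seconds)
    return name


async def run_sequential(names, seconds):
    # One walk at a time: total time is the sum of all walks.
    return [await fake_walk(n, seconds) for n in names]


async def run_concurrent(names, seconds):
    # All walks in flight at once: total time is roughly the slowest walk.
    return await asyncio.gather(*(fake_walk(n, seconds) for n in names))


def timed(coro_factory):
    start = time.perf_counter()
    result = asyncio.run(coro_factory())
    return result, time.perf_counter() - start


TABLES = ["sn", "run", "cfg", "dist"]
seq_result, seq_time = timed(lambda: run_sequential(TABLES, 0.05))
con_result, con_time = timed(lambda: run_concurrent(TABLES, 0.05))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

asyncio.gather returns results in the order the awaitables were passed, which is what lets the real code unpack four tables into four named variables.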

But here's the hard constraint: the Huawei MA5608T firmware doesn't respond to optical signal OIDs via SNMP. MIB table 51 returns nothing: no error response, just silence until the request times out. The only way to retrieve Rx power, Tx power, and temperature is via the CLI command:

display ont optical-info {frame}/{slot}/{port} all

This returns a full table for all ONTs on a given port in one shot. We run one command per port — typically 8-16 commands — across a dedicated Telnet session that runs concurrently with API requests.

What we found in production: The SNMP optical OID gap isn't prominently documented. You discover it by eliminating every other possibility first — wrong community string, wrong OID path, wrong MIB version — before accepting that this OLT firmware simply doesn't implement that table. CLI is the only source of truth for optical health on this hardware.

How Does the Polling Pipeline Handle a 45-Second Startup?

A naive implementation would poll all data sources, show a spinner, then display results. With a startup that can run 45 seconds or longer, that's an unacceptable experience. We model the pipeline as four explicit phases instead:

Phase	What's happening	Typical duration
`connecting`	App started, first poll pending	—
`snmp`	Parallel SNMP walks — all 400+ ONTs	~15s
`enriching`	CLI Telnet — optical data per port	~10-30s
`ready`	Full data available, normal cycle begins	—

The backend exposes the current phase on /api/status. The frontend polls status every 3 seconds during non-ready phases, driving a three-step progress bar: Connecting → ONT Inventory → Optical & Details. Each step shows a tooltip explaining exactly what's being fetched and why it takes as long as it does.
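
One way to model the phases and the status payload is sketched below. The phase names and progress-bar labels come from the description above. The Phase enum, the status_payload helper, its field names, and the mapping of each phase to a label are assumptions made for illustration, not the dashboard's actual schema.

```python
from enum import Enum


class Phase(str, Enum):
    CONNECTING = "connecting"
    SNMP = "snmp"
    ENRICHING = "enriching"
    READY = "ready"


# Progress-bar step shown for each non-ready phase (mapping is assumed).
STEP_LABELS = {
    Phase.CONNECTING: "Connecting",
    Phase.SNMP: "ONT Inventory",
    Phase.ENRICHING: "Optical & Details",
}


def status_payload(phase: Phase, ont_count: int) -> dict:
    """Body a status endpoint could return; field names are hypothetical."""
    return {
        "phase": phase.value,
        "ready": phase is Phase.READY,
        "step": STEP_LABELS.get(phase),  # None once the pipeline is ready
        "ont_count": ont_count,
    }


print(status_payload(Phase.SNMP, 0))
print(status_payload(Phase.READY, 412))
```

Because the phase is a plain string in the payload, the frontend can switch its polling interval on a single comparison.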

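_background_tasks: set[asyncio.Task] = set()

async def _poll_loop() -> None:
    while True:
        await _poll_once()                              # SNMP: sets phase=enriching
        task = asyncio.create_task(_enrich_with_cli_data())  # CLI: non-blocking
        _background_tasks.add(task)                     # hold a reference so the task isn't garbage-collected
        task.add_done_callback(_background_tasks.discard)
        await asyncio.sleep(settings.poll_interval)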

Enrichment fires as a background task, so it doesn't delay the next SNMP cycle. SNMP data appears in the UI ~15 seconds after startup. Optical signal data follows another 10-30 seconds later. The frontend's ONT query stays at a 5-second refetch interval until phase === "ready", so enriched data shows up within about 5 seconds of landing rather than waiting for the next scheduled poll.

From our operators: the phase-based progress bar eliminated the "is it broken or is it loading?" question entirely. Before we added it, the first 60 seconds of every session generated a support ticket.

What Is Lock Separation and Why Does It Affect the UI?

When a user clicks an ONT row, the frontend requests /api/onts/{slot}/{port}/{ont_id}/detail, which returns live CLI output with uptime, fault history, and last down cause. This should respond within a few seconds.

The naive implementation gave enrichment and detail fetches the same lock. When enrichment ran across 16 ports with a 10-second timeout per command, it held the lock for up to 160 seconds. Every detail request during that window would queue — and fail with a timeout.

We separated the concerns into two locks and a lock-free read path:

_olt_lock    = asyncio.Lock()  # provisioning writes only — held briefly
_enrich_lock = asyncio.Lock()  # prevents concurrent enrichment runs
# reads: no lock — multiple Telnet sessions run concurrently

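Provisioning operations (Add, Edit, Delete) use _olt_lock and are serialised. Read operations (enrichment and detail fetches) go through _run_olt_commands_ro(), which opens its own Telnet session and never takes _olt_lock. Enrichment additionally holds _enrich_lock so that two enrichment runs can't overlap; detail fetches take no lock at all. The OLT handles multiple concurrent sessions without issue.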
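
The behaviour this buys can be shown with a toy model. In the sketch below, enrich and fetch_detail are hypothetical stand-ins that sleep instead of opening Telnet sessions, and the durations are scaled down. The point is only the ordering: the detail fetch completes while enrichment is still holding its own lock.

```python
import asyncio


async def demo() -> list[str]:
    enrich_lock = asyncio.Lock()   # guards against overlapping enrichment runs
    events: list[str] = []

    async def enrich() -> None:
        async with enrich_lock:
            events.append("enrich:start")
            await asyncio.sleep(0.2)      # stands in for a long multi-port CLI sweep
            events.append("enrich:end")

    async def fetch_detail() -> None:
        # Read path takes no lock, so it never queues behind enrichment.
        await asyncio.sleep(0.02)         # stands in for one short CLI command
        events.append("detail:done")

    await asyncio.gather(enrich(), fetch_detail())
    return events


events = asyncio.run(demo())
print(events)
```

If fetch_detail were changed to acquire enrich_lock as well, "detail:done" would move to the end of the list, which is the queueing problem described above.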

The detail cache (_detail_cache) stores per-ONT CLI output with a 150-second TTL and automatic eviction at 2× TTL. After the first enrichment cycle completes, detail modals open instantly from cache for as long as the entry is fresh. On a cache miss, the detail endpoint fetches live, which typically takes 5-8 seconds.
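
A cache with those two thresholds might look like the sketch below. It is one possible reading of the description, not the real _detail_cache: entries older than the TTL are treated as misses, and entries older than twice the TTL are deleted on the next lookup. The DetailCache class, its method names, and the injectable clock are assumptions added so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class DetailCache:
    """Per-ONT cache: entries are fresh for `ttl` seconds and purged after 2 * ttl."""

    def __init__(self, ttl: float = 150.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._entries: dict[tuple, tuple[float, str]] = {}

    def put(self, key: tuple, value: str) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: tuple) -> Optional[str]:
        self._evict()
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        # Stale entries count as a miss so the caller fetches live.
        return value if self._clock() - stored_at <= self.ttl else None

    def _evict(self) -> None:
        cutoff = self._clock() - 2 * self.ttl
        for key in [k for k, (ts, _) in self._entries.items() if ts < cutoff]:
            del self._entries[key]

    def __len__(self) -> int:
        return len(self._entries)


now = [0.0]                                   # fake clock for the demonstration
cache = DetailCache(ttl=150.0, clock=lambda: now[0])
cache.put((1, 2, 3), "uptime: 4 days")        # key is (slot, port, ont_id)
fresh = cache.get((1, 2, 3))
now[0] = 200.0
stale = cache.get((1, 2, 3))
size_after_stale = len(cache)
now[0] = 301.0
cache.get((1, 2, 3))
size_after_evict = len(cache)
```

Separating "stale" from "evicted" keeps memory bounded for ONTs nobody is looking at, while still letting a caller choose to show the last known value if it wants to.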

Why Is Provisioning a Dry Run?

The dashboard supports Add, Edit, and Delete ONT operations. None of them execute on the OLT.

@app.post("/api/olt/onts")
async def api_add_ont(request: AddOntRequest):
    commands = build_add_ont_commands(request)
    return OperationResult(
        message="ONT add commands generated. No OLT changes were applied.",
        commands=commands,
        output=[]
    )

Every provisioning endpoint generates the CLI commands that would be executed, displays them in a read-only panel, and stops. The operator reviews the output, copies the commands to a Telnet session, and applies them with human confirmation at each step.

This is a deliberate design choice, not a missing feature. A wrong ONT ID in a delete command takes down a live subscriber. A dry-run gate means every change goes through human review before it touches infrastructure. The only operation in the provisioning panel that actually contacts the OLT is autofind, which runs display ont autofind all to list unregistered ONTs. Even that is read-only.
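
The same pattern for a delete operation could be sketched as follows. Everything here is hypothetical: DeleteOntRequest, build_delete_ont_commands, and dry_run_delete are illustrative names, OperationResult is a plain-dataclass stand-in for the response model used above, and the CLI lines are illustrative strings that should be checked against your firmware's command reference before anyone pastes them into a session.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeleteOntRequest:
    frame: int
    slot: int
    port: int
    ont_id: int


@dataclass
class OperationResult:
    message: str
    commands: list[str]
    output: list[str] = field(default_factory=list)


def build_delete_ont_commands(req: DeleteOntRequest) -> list[str]:
    """Return CLI lines for review. Syntax is illustrative, not verified against any firmware."""
    if req.ont_id < 0:
        raise ValueError("ont_id must be non-negative")
    return [
        f"interface gpon {req.frame}/{req.slot}",
        f"ont delete {req.port} {req.ont_id}",
        "quit",
    ]


def dry_run_delete(req: DeleteOntRequest) -> OperationResult:
    # Nothing here opens a connection: the result is text for an operator to review.
    return OperationResult(
        message="ONT delete commands generated. No OLT changes were applied.",
        commands=build_delete_ont_commands(req),
    )


result = dry_run_delete(DeleteOntRequest(frame=0, slot=1, port=3, ont_id=12))
print("\n".join(result.commands))
```

Because the builder is a pure function of the request, it can be unit-tested exhaustively without any OLT, which is a large part of what makes the dry-run gate cheap to maintain.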

How Do You Deploy the Whole Stack to Railway in One Command?

The system deploys as a single Railway service. Next.js builds with output: 'export' — generating a static site in frontend/out/. The FastAPI backend serves these files directly alongside the API:

if os.path.isdir(_frontend_next_dir):
    app.mount("/_next", StaticFiles(directory=_frontend_next_dir))

A nixpacks.toml at the repository root handles the dual Python + Node.js build:

[phases.setup]
nixPkgs = ["python311", "nodejs_20"]

[phases.install]
cmds = [
  "pip install -r backend/requirements.txt",
  "cd frontend && npm ci"
]

[phases.build]
cmds = ["cd frontend && npm run build"]

[start]
cmd = "cd backend && uvicorn main:app --host 0.0.0.0 --port $PORT"

All configuration is passed as Railway environment variables: OLT host, login credentials, SNMP community string, poll interval, and CORS origins. Nothing sensitive lives in the repository.

railway login
railway init --name "ont-olt"
railway variables set OLT_HOST=your-olt-host SNMP_COMMUNITY=your-community POLL_INTERVAL=60
railway up --detach
railway domain generate

Five commands take you from a fresh terminal to a live production deployment with automatic restarts and a health check on /api/status. Once the project exists, railway up is the only command needed to redeploy.

How Do You Configure This for a Different OLT?

If you're adapting this for your network, the configuration surface is entirely environment-variable driven:

Variable	Purpose
`OLT_HOST`	OLT IP address or hostname
`OLT_PORT`	Telnet port (standard is 23)
`OLT_USER` / `OLT_PASSWORD`	Telnet login credentials
`OLT_ENABLE_PASSWORD`	Enable-mode password if required
`OLT_FRAME`	Frame number (almost always 0)
`SNMP_PORT`	SNMP port (standard is 161)
`SNMP_COMMUNITY`	Read-only SNMP community string
`POLL_INTERVAL`	Seconds between SNMP poll cycles
`CORS_ORIGINS`	Allowed origins in production
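
A minimal loader for a subset of these variables might look like the sketch below. The Settings dataclass, the load_settings helper, the default values, and the comma-separated format for CORS_ORIGINS are all assumptions made for illustration. The login variables are omitted for brevity and would be read the same way.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    olt_host: str
    olt_port: int
    olt_frame: int
    snmp_port: int
    snmp_community: str
    poll_interval: int
    cors_origins: list[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables; defaults below are assumptions."""
    return Settings(
        olt_host=env["OLT_HOST"],                       # required: fail fast if missing
        olt_port=int(env.get("OLT_PORT", "23")),        # standard Telnet port
        olt_frame=int(env.get("OLT_FRAME", "0")),
        snmp_port=int(env.get("SNMP_PORT", "161")),     # standard SNMP port
        snmp_community=env["SNMP_COMMUNITY"],           # required
        poll_interval=int(env.get("POLL_INTERVAL", "60")),
        cors_origins=[o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()],
    )


example = load_settings({
    "OLT_HOST": "192.0.2.10",
    "SNMP_COMMUNITY": "example-community",
    "CORS_ORIGINS": "https://a.example, https://b.example",
})
print(example)
```

Passing the environment in as a mapping, rather than reading os.environ inside the function body, makes the loader testable without touching the real process environment.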

OLT compatibility: The CLI commands and SNMP OID structure are specific to Huawei's MA5600 series (MA5608T, MA5603T, MA5680T, MA5683T). Other models in the same family should work with minimal changes. ZTE or Calix OLTs use different CLI syntax and SNMP MIBs — you'd need to rewrite the parser layer.

Remote access note: For cloud deployments, both SNMP (UDP) and Telnet (TCP) must be accessible from your deployment host's egress IP. Add that IP to the OLT's SNMP ACL and Telnet access list. If your security policy requires SSH instead of Telnet, replace the Telnet client with asyncssh — the command interface is identical, only the transport changes.

Does Your Team Need a Custom Operations Dashboard?

This architecture — dual data sources, phase-based polling, human-in-the-loop provisioning — applies well beyond fiber networks. ERP visibility dashboards, warehouse operations panels, IoT fleet monitors, and multi-vendor infrastructure tools all hit the same pattern: the data you actually need isn't available through the standard API, and off-the-shelf tools won't do the custom integration work.

TkTurners builds exactly these systems. If you have an operational blind spot that generic tooling can't close, talk to us about what a custom dashboard would look like for your environment.

Frequently Asked Questions

Why not use an existing NMS tool instead of building from scratch?

Open-source NMS tools such as LibreNMS or Zabbix excel at standard SNMP polling but don't run the vendor-specific CLI commands needed for optical data retrieval out of the box. On Huawei MA5608T firmware, optical signal readings are only accessible via Telnet CLI, a gap that off-the-shelf tools can't bridge without custom plugin development that rivals the effort of a focused custom build.

How does the system handle OLT connection drops during enrichment?

Each enrichment run wraps its Telnet session in a try/except block. If the connection drops mid-run, the exception is caught, the phase transitions to ready, and the next poll cycle retries. The in-memory detail cache retains the last successful optical readings until their 150-second TTL expires, so the UI continues showing recent valid data rather than blanking out.

Can this run locally without a cloud deployment?

Yes. The backend runs with uvicorn main:app --host 0.0.0.0 --port 8000 from the backend/ directory with a local .env file configured. The Next.js frontend runs in dev mode separately (npm run dev) and proxies API calls to the local backend. For a combined local deployment, build the frontend first with npm run build — the backend automatically detects and serves frontend/out/.

What Huawei OLT models does this support?

The system is built and tested on the Huawei MA5608T. The CLI command format and SNMP OID structure are consistent across the MA5600 series family, so MA5603T, MA5680T, and MA5683T should work with minimal changes. MA5800 series models may require parser updates due to firmware differences in the display ont optical-info output format.

Conclusion

The core pattern here is simple: when the data you need isn't accessible through the standard API, you build the adapter. SNMP gives you speed and breadth. CLI gives you depth. Lock separation gives you a UI that stays responsive while background work runs. Dry-run provisioning gives operators confidence before anything changes on live hardware.

These decisions aren't specific to fiber networks. Any operational system with multiple data sources, background enrichment, and safety-critical write operations benefits from the same structure.

The source code for this dashboard is on GitHub. Configuration is entirely environment-variable driven — clone, set your variables, and deploy.

Bilal Mehmood

Co-founder

Bilal Mehmood is a TkTurners co-founder focused on AI automation, systems integration, and practical operational infrastructure for growing businesses.
