Nearly Doubling My CPU’s Clock Speed by Removing Complexity

Over the past few weeks, I’ve been grinding on timing closure for my custom dual-issue CPU core (written in SystemVerilog for a Lattice FPGA).
Today, something huge happened: I improved the max clock frequency from ~25.26 MHz to 44.92 MHz.

And the wild part?

I did it by removing logic, not adding more clever tricks.

What Changed?

My instruction fetch stage originally had a pretty complex alignment system.
It was designed to handle instruction boundaries and speculative fetch behavior, but after reviewing the ISA and memory model, I realized:

I didn’t need alignment logic at all.

It was dead weight sitting directly on the critical path.

So I ripped it out.

The new fetch path is dramatically simpler, cleaner, and easier to pipeline.

The Timing Report Before

Here’s what the project was hitting earlier:

Max frequency: ~25.26 MHz
Critical path: Fetch → Alignment → Decode

The alignment logic had multiple nested layers of case trees plus shifting/merging operations.
On an FPGA LUT fabric, every one of those layers becomes more LUT levels and routing in series, and that’s basically a death sentence for timing.

The Timing Report After

After removing the alignment machinery, here’s what I saw:

ERROR: Max frequency for clock '$glbnet$clk$TRELLIS_IO_IN': 44.92 MHz (FAIL at 50.00 MHz)

That is roughly a 1.78x improvement (25.26 MHz to 44.92 MHz) while staying in the same 5-stage pipeline structure. The run still fails the 50 MHz constraint, but only by about 5 MHz.

This is the closest I’ve ever been to closing timing on this core, though it is still well short of my target of 200 MHz.
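
To put those frequencies in terms of clock period, here is a small Python sketch of the arithmetic. It uses only the figures quoted in this post (25.26 MHz, 44.92 MHz, the 50.00 MHz constraint from the tool message, and the 200 MHz target). The variable names are just for illustration, and it says nothing about which path is critical now.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


before_mhz = 25.26      # Fmax before removing the alignment logic
after_mhz = 44.92       # Fmax reported after the change
constraint_mhz = 50.00  # constraint from the "FAIL at 50.00 MHz" message
target_mhz = 200.0      # target stated in the post

speedup = after_mhz / before_mhz
before_ns = period_ns(before_mhz)
after_ns = period_ns(after_mhz)
saved_ns = before_ns - after_ns                       # delay removed from the worst path
shortfall_ns = after_ns - period_ns(constraint_mhz)   # still needed to pass at 50 MHz
target_ratio = after_ns / period_ns(target_mhz)       # how far the path is from 200 MHz

print(f"speedup:        {speedup:.2f}x")        # 1.78x
print(f"period before:  {before_ns:.2f} ns")    # 39.59 ns
print(f"period after:   {after_ns:.2f} ns")     # 22.26 ns
print(f"delay removed:  {saved_ns:.2f} ns")     # 17.33 ns
print(f"shortfall @50:  {shortfall_ns:.2f} ns") # 2.26 ns
print(f"vs 200 MHz:     {target_ratio:.2f}x too long")  # 4.45x
```

Read this way, the worst path went from about 39.59 ns to about 22.26 ns. Passing at 50 MHz needs roughly another 2.26 ns off the current critical path, while 200 MHz needs that path to be about 4.45x shorter.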

What I Learned

This was a powerful reminder:

Performance often comes from simplification, not complexity.
Long combinational chains are the true enemy of Fmax.
FPGA design rewards clarity over cleverness.
Sometimes stepping back and questioning assumptions wins harder than optimization.
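
The point about long combinational chains can be sketched with a toy model: a pipeline’s Fmax is set by its slowest register-to-register path, so only shortening that path helps. The per-stage delays below are made-up round numbers for illustration only, not values from the timing reports, and the function names are hypothetical.

```python
def critical_stage(stage_delays_ns: dict[str, float]) -> str:
    """Name of the stage with the longest combinational delay."""
    return max(stage_delays_ns, key=stage_delays_ns.get)


def fmax_mhz(stage_delays_ns: dict[str, float]) -> float:
    """Fmax in MHz, limited by the slowest stage (delays in ns)."""
    return 1000.0 / max(stage_delays_ns.values())


# Hypothetical delays in ns (NOT measured): fetch includes alignment logic.
with_alignment = {
    "fetch": 40.0,
    "decode": 22.0,
    "execute": 18.0,
    "memory": 15.0,
    "writeback": 8.0,
}

# Same pipeline with the alignment logic removed from fetch (also hypothetical).
without_alignment = {**with_alignment, "fetch": 10.0}

for name, stages in [("with", with_alignment), ("without", without_alignment)]:
    print(f"{name:8s} critical={critical_stage(stages):8s} "
          f"Fmax={fmax_mhz(stages):.2f} MHz")
# with     critical=fetch    Fmax=25.00 MHz
# without  critical=decode   Fmax=45.45 MHz
```

In this model, once the worst stage is fixed, the next-longest stage becomes the limit. That is why the follow-up work moves on to other stage boundaries instead of polishing fetch further.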

As a 16-year-old building a dual-issue CPU from scratch, seeing a jump like this is insanely motivating.

What’s Next?

Now that fetch is no longer dominating the critical path, I’m turning my attention to:

The decode → issue boundary
Hazard logic delays
Register file fanout
Better pipelining between ID and EX

Screenshots, timing plots, and waveform captures coming soon.

If you have suggestions for improving Fmax further, especially around dual-issue decode stages, feel free to reach out.
