3-stage MCU generation teardown · Qwen3.6-27B base vs SFT-top2 adapter · skywater130 PDK
Both models were prompted with the same 3-stage pipelined MCU contract (4-op ISA, 4 registers, F/X/W pipeline, sky130). Each produced 8 candidates at temperature 0.5. Each candidate was compiled with Verilator, run against 4 test programs, and synthesized with yosys + abc against the sky130_fd_sc_hd typical-corner library. Reported metrics are line items, not aggregate scores.
| metric | base 27B | SFT-top2 27B | hand-written ref |
|---|---|---|---|
| candidates generated | 8 | 8 | 1 |
| compiled (Verilator) | 0/8 | 1/8 | 1/1 |
| programs passed (of compiled) | — | 2/4 | 3/4 |
| area_um2 (median) | — | 1573.0 (min 1573.0 · max 1573.0) | 1126.0 (min 1126.0 · max 1126.0) |
| fmax_mhz (median) | — | 2074.2 (min 2074.2 · max 2074.2) | 1183.1 (min 1183.1 · max 1183.1) |
| ops_per_sec (median) | — | 592600000.0 (min 592600000.0 · max 592600000.0) | 591600000.0 (min 591600000.0 · max 591600000.0) |
| est_power_uW (median) | — | 580.8 (min 580.8 · max 580.8) | 373.6 (min 373.6 · max 373.6) |

| name | source | compiled | tests pass | area um² | fmax MHz | ops/sec | est power µW | first diagnostic |
|---|---|---|---|---|---|---|---|---|
| reference | reference | yes | 3/4 | 1126 | 1183.1 | 5.916e+08 | 373.55 | |
| base_0 | base | no | 0/0 | — | — | — | — | SYSTEMVERILOG `int` declaration in a for-loop or task. Yosys (default Verilog-2005) rejects this; we use --sv to accept it. |
| base_1 | base | no | 0/0 | — | — | — | — | ILLEGAL LITERAL `8's0`: per IEEE 1800-2017 §5.7.1, the unsized literals `'0`/`'1`/`'x`/`'z` cannot take a size prefix. Use `8'sd0` instead. Frequent LLM confusion between sized literals and the unsized constant-context literals. |
| base_2 | base | no | 0/0 | — | — | — | — | |
| base_3 | base | no | 0/0 | — | — | — | — | |
| base_4 | base | no | 0/0 | — | — | — | — | |
| base_5 | base | no | 0/0 | — | — | — | — | SYSTEMVERILOG `int` declaration in a for-loop or task. Yosys (default Verilog-2005) rejects this; we use --sv to accept it. |
| base_6 | base | no | 0/0 | — | — | — | — | |
| base_7 | base | no | 0/0 | — | — | — | — | |
| sft_0 | sft | no | 0/0 | — | — | — | — | |
| sft_1 | sft | no | 0/0 | — | — | — | — | BYTE-ORDER BUG: descending slice `[255 - pc*8 -: 8]` on an ascending `[0:255]` vector reads from the wrong end — every byte is the trailing padding byte (0x00). Should be `[pc*8 +: 8]` for the byte-0-at-MSB convention. |
| sft_2 | sft | no | 0/0 | — | — | — | — | BYTE-ORDER BUG: descending slice `[255 - pc*8 -: 8]` on an ascending `[0:255]` vector reads from the wrong end — every byte is the trailing padding byte (0x00). Should be `[pc*8 +: 8]` for the byte-0-at-MSB convention. |
| sft_3 | sft | no | 0/0 | — | — | — | — | SYSTEMVERILOG `int` declaration in a for-loop or task. Yosys (default Verilog-2005) rejects this; we use --sv to accept it. |
| sft_4 | sft | no | 0/0 | — | — | — | — | BYTE-ORDER BUG: descending slice `[255 - pc*8 -: 8]` on an ascending `[0:255]` vector reads from the wrong end — every byte is the trailing padding byte (0x00). Should be `[pc*8 +: 8]` for the byte-0-at-MSB convention. |
| sft_5 | sft | no | 0/0 | — | — | — | — | ILLEGAL LITERAL `8's0`: per IEEE 1800-2017 §5.7.1, the unsized literals `'0`/`'1`/`'x`/`'z` cannot take a size prefix. Use `8'sd0` instead. Frequent LLM confusion between sized literals and the unsized constant-context literals. |
| sft_6 | sft | no | 0/0 | — | — | — | — | SYSTEMVERILOG `int` declaration in a for-loop or task. Yosys (default Verilog-2005) rejects this; we use --sv to accept it. |
| sft_7 | sft | yes | 2/4 | 1573 | 2074.2 | 5.926e+08 | 580.75 | |

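
The summary rows (compile counts, median/min/max) can be re-derived from the per-candidate table. The following is a minimal Python sketch of that aggregation, not the dashboard's actual code: the dictionary keys and the helper names `summarize` and `ratio` are hypothetical, and only the area, fmax, and power columns of the two compiled designs are transcribed. With a single compiled candidate per source, median, min, and max are necessarily the same number.

```python
from statistics import median

# Rows transcribed from the per-candidate table. Only compiled designs
# carry metrics; the other 15 generated candidates did not compile.
rows = [
    {"name": "reference", "source": "reference", "compiled": True,
     "area_um2": 1126.0, "fmax_mhz": 1183.1, "power_uw": 373.55},
    {"name": "sft_7", "source": "sft", "compiled": True,
     "area_um2": 1573.0, "fmax_mhz": 2074.2, "power_uw": 580.75},
]
rows += [{"name": f"base_{i}", "source": "base", "compiled": False}
         for i in range(8)]
rows += [{"name": f"sft_{i}", "source": "sft", "compiled": False}
         for i in range(7)]


def summarize(rows, source):
    """Compile counts and median/min/max area for one source (hypothetical helper)."""
    group = [r for r in rows if r["source"] == source]
    areas = [r["area_um2"] for r in group if r["compiled"]]
    return {
        "generated": len(group),
        "compiled": len(areas),
        "area_median": median(areas) if areas else None,
        "area_min": min(areas) if areas else None,
        "area_max": max(areas) if areas else None,
    }


def ratio(rows, metric, name_a="sft_7", name_b="reference"):
    """Metric of design a divided by the same metric of design b."""
    by_name = {r["name"]: r for r in rows}
    return by_name[name_a][metric] / by_name[name_b][metric]


summary = {s: summarize(rows, s) for s in ("base", "sft", "reference")}
area_ratio = ratio(rows, "area_um2")    # sft_7 area relative to the reference
fmax_ratio = ratio(rows, "fmax_mhz")    # sft_7 fmax estimate relative to the reference
```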
The SFT-top2 adapter was trained on 2 high-throughput 4×4 int8 matmul implementations (~13 GOPS each on sky130). It was not exposed to any MCU code. The point of this experiment was to measure how that narrow specialization affects an out-of-distribution task.
[…] (a) the `8's0` literal, a size prefix attached to the unsized-literal form (illegal per IEEE 1800-2017 §5.7.1), and (b) SystemVerilog `'{}` array-pattern syntax written with bare `{}`, or used inside Verilog-2005 contexts that don't accept it.

[…] (`sft_7`), and it passes 2 of 4 test programs. The remainder fail on missing variable declarations (`is_branch_w` referenced but never declared) and the same SV pattern-assignment confusion.

[…] (`pc_f`, `instr_x`, `result_w`) and forwarding logic comments. It does NOT collapse the MCU into matmul-shaped output. The bias is at the token level, not at the architectural level, which is clean evidence that LoRA r=16 on attention projections is a localized intervention.

[…] `IMEM_INIT` parameter is declared `[0:255]` (ascending, byte 0 at MSB), but every candidate that attempted byte access used the descending idiom `[255 - pc*8 -: 8]`, so the model reads from the LSB end of the vector and gets only trailing padding. This is the same mistake the experiment author made first. Ascending packed-vector parameters are rare in real Verilog, so neither model has strong priors for them. The dashboard's BYTE-ORDER BUG diagnostic flags this.

[…] `sft_7`: 1573 µm², 2074 MHz (abc-stime estimate; final P&R would land lower), 593 MOPS on a 2-instruction test. For comparison, the hand-written reference is 1126 µm², 1183 MHz, 592 MOPS. The LLM's compiled candidate is ~40% larger and reports a higher Fmax estimate. abc's pre-floorplan timing doesn't include wireload, so neither figure is signoff-grade, but both come from the same pipeline, so they are directly comparable.
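
The byte-order issue with the ascending `[0:255]` parameter can be checked with a small bit-level model. The sketch below is in Python rather than Verilog, and it rests on assumptions: the two-byte program is made up, the helper names are hypothetical, and the image is taken to hold program bytes first with zero padding after them, as the diagnostic text describes. It follows the standard indexed part-select rule for ascending vectors, where `v[b +: 8]` is `v[b : b+7]` and `v[b -: 8]` is `v[b-7 : b]`.

```python
def pack_ascending(image, width_bits=256):
    """Model a [0:width_bits-1] vector as a bit list.

    Element i is the bit at declared index i; index 0 is the MSB, so byte 0
    of the image occupies indices 0..7. Short images are zero-padded at the
    trailing (LSB) end.
    """
    assert len(image) * 8 <= width_bits
    padded = image.ljust(width_bits // 8, b"\x00")
    bits = []
    for byte in padded:
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def part_select(bits, lo, hi):
    """Value of v[lo:hi] on an ascending vector (index lo is most significant)."""
    value = 0
    for bit in bits[lo:hi + 1]:
        value = (value << 1) | bit
    return value


def plus_colon(bits, base, width=8):
    """v[base +: width] on an ascending vector, i.e. v[base : base+width-1]."""
    return part_select(bits, base, base + width - 1)


def minus_colon(bits, base, width=8):
    """v[base -: width] on an ascending vector, i.e. v[base-width+1 : base]."""
    return part_select(bits, base - width + 1, base)


program = bytes([0x12, 0x34])      # hypothetical 2-byte program image
imem = pack_ascending(program)

# Ascending idiom: byte pc sits at indices pc*8 .. pc*8+7.
correct = [plus_colon(imem, pc * 8) for pc in range(len(program))]

# Descending idiom from the failing candidates: starts at index 255 and
# walks backwards, so low pc values land in the zero padding.
buggy = [minus_colon(imem, 255 - pc * 8) for pc in range(len(program))]

print([hex(b) for b in correct], [hex(b) for b in buggy])
```

In this model the descending idiom at `pc` returns byte `31 - pc` of the image, so it only reaches real program bytes once `pc` approaches 31.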