

Datacenter Performance. Edge Efficiency. Accelerating Inference, Everywhere.

Emulation: from C Tests to Tapeout - a Case Study

DVClub World - November 25 2025

Antoine Madec

## Who am I?

- 14 years in simulation & emulation
- Europe and US
- Technical leader, FAE, manager: always hands-on
- Teams from 3 to 12 people
- Block-level / Top-level
- Automotive, Video Decoder, AI
- Axelera AI: accelerating inference









# A case study

- Just our opinion
  - Depends on the project, the needs
  - Might not work for you
- Key figures about Metis
  - 1 year: from Emulation start to chip bringup
  - multi-CPUs
  - 1 day: functional test bringup
    - DDR: 1 week
    - PCIe: 1 week





# Verification platform landscape

Where does emulation fit?

- Simulation at top-level is too slow
  - Emulation, models, etc, are a must
  - The right platform for the right tests for the right interfaces
  - Trade-offs are inevitable
    - Speed
    - Realism
    - Debuggability
    - Platform bring-up time
    - Price
- It is our job to choose the right platform





# Emulation cloud setup

Why choose the cloud?

- ICE vs virtual emulation
- Our needs
  - CPU based tests
  - Some PCIe tests
  - Flash/mem models
  - UART, gdb, etc
- Conclusion: virtual emulation
  - Simpler: set up the **Veloce** once
  - Cloud: emulation as a service
  - Subscription model





# Building Emulation databases (1)

Getting and modifying RTL sources

- Getting files
  - Easy tool to fetch files is critical
  - Same tool for sim, emu and synthesis
    - Different targets: emulation/simulation/RTL
- Emulation models
  - RAMs/ROMs/OTP
  - Flashes: Siemens Softmodels
  - RNG ring oscillators: \$urandom with XRTL
  - Matrix multiplication: save AVB
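
The "same tool, different targets" idea can be sketched as one manifest filtered per target. This is only an illustration under assumptions: the `SourceFile` type, the `select_files` helper and every path below are hypothetical and are not taken from the project's actual tool.

```python
from dataclasses import dataclass

# Target names follow the slide: emulation / simulation / RTL.
ALL_TARGETS = frozenset({"simulation", "emulation", "rtl"})


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical manifest entry: a file and the targets that use it."""
    path: str
    targets: frozenset


def select_files(manifest, target):
    """Return the paths that belong to `target`, in manifest order."""
    if target not in ALL_TARGETS:
        raise ValueError(f"unknown target: {target}")
    return [f.path for f in manifest if target in f.targets]


# Illustrative manifest; all paths are made up.
MANIFEST = [
    SourceFile("rtl/core.sv", ALL_TARGETS),
    SourceFile("rtl/pll.sv", frozenset({"simulation", "rtl"})),
    SourceFile("models/pll_bypass.sv", frozenset({"emulation"})),
    SourceFile("models/ram_emu.sv", frozenset({"emulation"})),
]

if __name__ == "__main__":
    for target in sorted(ALL_TARGETS):
        print(target, select_files(MANIFEST, target))
```

Keeping a single manifest means an emulation model replaces its RTL counterpart in one place, and sim, emu and synthesis cannot drift apart silently.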

## Don't model

- PLL -> drive output clock directly
- Analog sensors -> use simulation
- PHYs (DDR, PCIe) -> Siemens Virtualab
  - Use sim to test PHY programming
  - PHYs aren't real, just models
  - Impact on performance
- Do not be afraid to experiment to speed up your process



# Building Emulation databases (2)

Taking advantage of the Cloud

- 4 comodels and 2 runtime servers
  - SSH: accessible from our network
  - Administered by our IT
- Nightly CI
  - Running on GitLab
  - 10h builds / 2h tests
- Release system
  - Built by the nightly CI
  - Users: copy/paste in the morning
  - Top platforms with different stubs
    - 3 to 24 boards





## The human factor

Emulation is a cross-team effort

- Love thy EDA consultant
  - Trust: being fair / demanding

#### FW/DV communication is critical

- FW and DV: a gradient in Verification
  - Running C code on CPUs
  - DV can write C driver
  - FW can debug waveforms
- Bringup will run C code on CPUs, not SV
- Weekly meetings, daily chats
- Using the same Git repo
- Other users
  - Architects: perf and compliance tests
  - Application SW
  - DFT



"I see ourselves as one big Verification team composed of people with different skillsets"

- Jovin Langenegger

Engineering Manager - AI Embedded Software



## Runtime (1)

C tests

- Building SW test
  - Bin to hex files: in sim and emu
  - Backdoor loading of the memories
  - Can also load the ELF through GDB
    - slower, more realistic
    - bringup preparation
- Building the TCL runtime script
  - Bash script
    - generates TCL script: templates
    - calls velrun (EDA specific)
  - misc: flash, snapshots (Linux boot), etc.
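
The bin-to-hex step can be sketched as below, assuming a 32-bit, little-endian memory image written as one hex word per line (the plain-text style that `$readmemh` accepts). The word width, the byte order and the function name `bin_to_hex_lines` are assumptions for illustration, not the project's actual script.

```python
def bin_to_hex_lines(data: bytes, word_bytes: int = 4) -> list[str]:
    """Convert a raw little-endian image into one hex word per line."""
    if word_bytes <= 0:
        raise ValueError("word_bytes must be positive")
    # Pad the tail with zeros so the last word is complete.
    data = data + b"\x00" * ((-len(data)) % word_bytes)
    lines = []
    for offset in range(0, len(data), word_bytes):
        word = data[offset:offset + word_bytes]
        # Reverse the bytes so each line reads most-significant byte first.
        lines.append(word[::-1].hex())
    return lines


if __name__ == "__main__":
    image = bytes.fromhex("97659b00" "83b5454a")
    print("\n".join(bin_to_hex_lines(image)))
```

Because the same hex file feeds both simulation and emulation, a test that passes in one and fails in the other points at the platform rather than at the software build.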

```
./run --help
./run test_hello_world
./run test_hello_world --clock fast
./run test_hello_world --dump_cpu_instructions
./run test_hello_world --xwaves noc
./run test_hello_world --no_printf
./run test_hello_world --coverage

./run test_hello_world --interactive
```
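
A wrapper of this shape can be sketched as an argument parser feeding a TCL template. The talk's wrapper is a Bash script; this sketch uses Python instead. The flag names come from the command lines above, but the defaults, the template variables and the `render_tcl` helper are assumptions. The template uses only plain TCL `set` commands and deliberately contains no tool-specific command.

```python
import argparse
from string import Template

# Hypothetical template: plain TCL variable assignments only.
TCL_TEMPLATE = Template(
    "set test_name $test\n"
    "set clock_mode $clock\n"
    "set dump_cpu_instructions $dump\n"
    "set xwaves_group {$xwaves}\n"
    "set enable_printf $printf\n"
    "set enable_coverage $coverage\n"
    "set interactive $interactive\n"
)


def build_parser() -> argparse.ArgumentParser:
    """Mirror the options shown in the ./run examples."""
    parser = argparse.ArgumentParser(prog="run")
    parser.add_argument("test")
    parser.add_argument("--clock", default="default")  # default is an assumption
    parser.add_argument("--dump_cpu_instructions", action="store_true")
    parser.add_argument("--xwaves", default="")
    parser.add_argument("--no_printf", action="store_true")
    parser.add_argument("--coverage", action="store_true")
    parser.add_argument("--interactive", action="store_true")
    return parser


def render_tcl(argv) -> str:
    """Turn command-line arguments into the text of a TCL runtime script."""
    args = build_parser().parse_args(argv)
    return TCL_TEMPLATE.substitute(
        test=args.test,
        clock=args.clock,
        dump=int(args.dump_cpu_instructions),
        xwaves=args.xwaves,
        printf=int(not args.no_printf),
        coverage=int(args.coverage),
        interactive=int(args.interactive),
    )


if __name__ == "__main__":
    print(render_tcl(["test_hello_world", "--clock", "fast", "--xwaves", "noc"]))
```

Generating the script from a template keeps the emulator-specific part in one file, so users only ever see the `./run` options.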

- Pass/Fail
  - **GPIO[2]**: test is finished
  - GPIO[3]: pass/fail
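
The GPIO convention can be decoded on the host side roughly as follows. The bit positions come from the slide; the polarity (1 means pass) and the `decode_status` helper are assumptions.

```python
GPIO_DONE_BIT = 2  # GPIO[2]: test is finished
GPIO_PASS_BIT = 3  # GPIO[3]: pass/fail


def decode_status(gpio_value: int, pass_level: int = 1) -> str:
    """Map a sampled GPIO bus value to RUNNING, PASS or FAIL.

    `pass_level` is the value of GPIO[3] that means "pass"; 1 is an assumption.
    """
    done = (gpio_value >> GPIO_DONE_BIT) & 1
    if not done:
        # The pass/fail bit is meaningless until the test signals completion.
        return "RUNNING"
    result = (gpio_value >> GPIO_PASS_BIT) & 1
    return "PASS" if result == pass_level else "FAIL"


if __name__ == "__main__":
    for value in (0b0000, 0b0100, 0b1100):
        print(f"{value:#06b} -> {decode_status(value)}")
```

Using pins rather than log parsing means the same check works in simulation, in emulation and on the bench during bringup.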



## Runtime (2)

CI

- MR pipeline
  - Simulation runs
    - Same SW build flow
    - Same top platforms (stubs)
  - Emulation sanity check
    - Cannot run full compilation
- Nightly CI
  - Back-to-back test
    - Start TCL server when starting emulation
    - Saves overhead of start/finish
    - Test as fast as 5s
  - Gather IO toggle coverage
    - on top instances
    - merge emulation + simulation coverage
    - good integration metric
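
Merging IO toggle coverage from both platforms amounts to a union per port. In practice the merge would go through the EDA tool's coverage database; the sketch below only illustrates the idea, with a made-up data layout (port name mapped to a rise/fall pair) and made-up port names.

```python
def merge_toggle_coverage(*runs):
    """Union toggle coverage from several runs.

    Each run maps a port name to (saw_0_to_1, saw_1_to_0).
    """
    merged = {}
    for run in runs:
        for port, (rise, fall) in run.items():
            prev_rise, prev_fall = merged.get(port, (False, False))
            merged[port] = (prev_rise or rise, prev_fall or fall)
    return merged


def toggle_ratio(merged) -> float:
    """Fraction of ports that toggled in both directions."""
    if not merged:
        return 0.0
    covered = sum(1 for rise, fall in merged.values() if rise and fall)
    return covered / len(merged)


if __name__ == "__main__":
    # Hypothetical port names and results.
    sim = {"noc.req": (True, False), "ddr.ack": (False, False)}
    emu = {"noc.req": (False, True), "ddr.ack": (True, False), "pcie.irq": (True, True)}
    merged = merge_toggle_coverage(sim, emu)
    print(merged, toggle_ratio(merged))
```

A port that never toggles in either platform is a likely integration gap: either it is tied off or no test exercises that interface.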





## Runtime (3)

Debug

- Avoid waveforms when possible
  - Too slow
  - Logs: XRTL + \$write()
  - CPU traces
    - Codelink?
    - fw\_trace\_utils
      - EDA agnostic
      - no license
      - Adapt to any CPU
- Xwaves (EDA specific)
  - Great for performance debugging
  - Add relevant signals at compile time
- Switch to simulation
  - For short tests
  - Same platforms, same tools
  - Slower, easier waveform debugging

```
7007264: 97 65 9b 00 auipc a1, 0xb6
7007268: 83 b5 45 4a ld a1, 0x4a4(a1)
// First core: Initialization.
#ifndef NO_INIT_BSS_AND_TLS
    init_complete = 1;
    while (init_complete == 0);
70072b0: 03 35 c5 49 ld a0, 0x49c(a0)
70072b4: 02 95 jalr a0
l main
```
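
A trace like the one above becomes easier to read once each program counter is mapped back to a function. The sketch below shows one common way to do that with a sorted symbol table and a binary search. It is not a description of how fw_trace_utils works, and the symbol names and addresses are invented for the example.

```python
import bisect


def build_index(symbols):
    """Sort (start_address, name) pairs into two parallel lists."""
    ordered = sorted(symbols)
    starts = [address for address, _ in ordered]
    names = [name for _, name in ordered]
    return starts, names


def symbolize(pc, starts, names):
    """Return 'function+0xoffset' for `pc`, or None if below all symbols."""
    index = bisect.bisect_right(starts, pc) - 1
    if index < 0:
        return None
    return f"{names[index]}+0x{pc - starts[index]:x}"


if __name__ == "__main__":
    # Hypothetical symbol table; in practice it would come from the ELF.
    starts, names = build_index([
        (0x7007000, "_start"),
        (0x7007260, "init_first_core"),
        (0x7008000, "main"),
    ])
    for pc in (0x7007264, 0x70072B0, 0x7008010):
        print(f"{pc:x}: {symbolize(pc, starts, names)}")
```

Since it only needs a list of executed addresses and a symbol table, this kind of post-processing stays independent of the CPU and of the EDA vendor.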



## Conclusion

Key take-aways

- Emulation: top-level crossroads
  - Factorize tools, code and effort
    - Sim/emu, verif/bringup, FW/DV
  - Work with multiple departments
    - Efficiency > politics
    - FW and DV relationship is critical
  - Make it user friendly
    - releases, run scripts
    - Arch, FW, application SW, DFT
- Our successful choices
  - Leverage CI and Cloud: builds, coverage
  - Know when to rely on EDA tools, when to code it yourself
  - Simulation: close to emulation, complements emulation
  - The right trade-offs: efficiency > realism





## Conclusion

- Emulation is now our main top-level DV platform
  - Metis: bringup functional test in 1 day
  - New projects: more users, more complexity







# Thank You!

