Dr. Ian Cutress Explains The Hype Around RISC-V

Having not watched the video (due to being at work) my take on RISC-V is not that it is a particularly remarkable instruction set. It's probably good enough though.

Really any instruction set can do the job.

The great part about it is that it is license free. Anyone who wants to can use it and design their own CPU around it.

In an ideal world we would have this kind of license free instruction set to foster CPU competition so that big companies can't lock out others with their IP like Intel tried to do with AMD, and successfully did with some others when it comes to x86.

Competition is key, and with an open source instruction set where CPU designs from different vendors should be software compatible with each other we wind up with a much more competitive market.

I would love nothing more than to see the death of all proprietary instruction sets.
 
Hype is good 😃 If something cannot generate hype, it probably does not deserve interest. I don't see RISC-V as overhyped.
 
you could run the ISA through GPT4+ a few times to optimize it, right?
 

I mean, there is nothing technically about RISC-V that makes it super special, so in that sense, yea, I guess it is over-hyped.

But having a license free instruction set that anyone can use to foster competition. Now that is a pretty big deal. Especially with all the uncertainty about the future ownership of ARM.
 
thought we was all about hardware enthusiasm on here, so the hype is welcome

risc-v pwns imo

also jim keller and tenstorrent

they hired the Apple M1 architect
 
we've been hearing how its gonna change the world since the mid 90s...
 
Please do not confuse RISCV with RISC. RISC is not an ISA. BTW all modern high performance processors are based on RISC principles underneath. This has been true since... I would guess the 90's? :p
This^^
 
Please do not confuse RISCV with RISC. RISC is not an ISA. BTW all modern high performance processors are based on RISC principles underneath. This has been true since... I would guess the 90's? :p

Yep, x86 hasn't been truly CISC under the hood since.. uh.. at least the mid-90s?

It's RISC underneath: the front end breaks the CISC instructions down into RISC-like micro-operations for processing. As such, RISC has changed the world, as today's CPUs wouldn't be where they are without it.

RISC just stands for Reduced Instruction Set Computer, as opposed to CISC which stands for Complex Instruction Set Computer. They are not instruction sets in themselves, but they describe loose categories of instruction sets. The original x86 design was a CISC-type design, but for practical purposes it has been RISC-like internally, with a translation layer on top, for about 30 years.
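To make the "translation layer" idea concrete, here is a minimal sketch of how a decoder might split a memory-operand instruction into simpler load/compute/store steps. Everything in it is an assumption for illustration: the mnemonics, the bracket syntax for memory operands, and the `tmp0`/`tmp1` names are hypothetical and do not reflect any real x86 decoder or micro-op format.

```python
# Toy sketch only: the mnemonics, bracket syntax, and "tmp" names are
# hypothetical and do not match any real x86 decoder or micro-op format.

def is_mem(operand):
    """Treat anything written as [something] as a memory operand."""
    return operand.startswith("[") and operand.endswith("]")

def crack(instr):
    """Split one (op, dst, src) instruction into load/compute/store micro-ops."""
    op, dst, src = instr
    uops = []
    if is_mem(src):
        # Memory source: load it into a temporary first.
        uops.append(("load", "tmp0", src))
        src = "tmp0"
    if is_mem(dst):
        # Memory destination: read, modify, write back.
        uops.append(("load", "tmp1", dst))
        uops.append((op, "tmp1", src))
        uops.append(("store", dst, "tmp1"))
    else:
        # Register destination: already a simple operation.
        uops.append((op, dst, src))
    return uops

if __name__ == "__main__":
    for uop in crack(("add", "[r2]", "r1")):
        print(uop)
```

A register-to-register instruction passes through unchanged, while the read-modify-write form becomes three simple steps, which is roughly the sense in which a CISC front end feeds a RISC-like core.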

RISC-V is relatively new, with work having started at UC Berkeley in 2010 and the RISC-V Foundation only forming in 2015. It seeks to be an open ISA based on a reduced instruction set design, and that IS pretty cool, if they can pull it off. The technical challenge is not easy, but it has been done before. The business challenge is the difficult part. They are going to struggle to get actual business behind it and reach critical mass.

If they do, however, it will be huge for the industry. A common instruction set that anyone can design a CPU around. No more lawsuits over who owns x86 or expensive fees to purchase the rights to use ARM. Just a free, binary-compatible ISA, so that you can use a CPU (and presumably a compatible motherboard) from one vendor one day and one from a different vendor the next.

It's a long shot, but if it happens it might just result in the entirety of the compute world being provided the benefit that PC/x86 was inadvertently given when IBM lost control of it in the 1980's.
 
RISC-V began in 2010, went public in 2015, it is still based on RISC, just now is open source. same shit different pile, imo. same hype weve been hearing for 30 years.
 
The main benefits for this I believe will be hardware makers selling cheap microchips for other companies to build upon.

however. an open standard processor would never be faster than current generation hardware.
 
however. an open standard processor would never be faster than current generation hardware.

We are not talking an open standard processor. Just the instruction set.

Essentially the list of commands the software uses to give instructions to the CPU.

On the back end the CPU design/architecture would be proprietary with competition driving advances.

It would be sort of like Intel and AMD competing with each other using the same x86 instruction set, but by using vastly differing architectures to achieve their goals.

The only open thing here would be the instruction set, assuring that they can run the same binaries.
 
The great part about it is that it is license free. Anyone who wants to can use it and design their own CPU around it.
Some dude implemented the more-or-less minimum spec on discrete logic, mostly 7400 series. It's nine small PCBs, runs at 500 kHz, IIRC, and has a couple of things for (primitive) I/O.
 
RISC-V began in 2010, went public in 2015, it is still based on RISC, just now is open source. same shit different pile, imo. same hype weve been hearing for 30 years.

RISC is not an instruction set. It is just a term used to describe a type of instruction set with a reduced number of instructions.

x86 is a CISC instruction set (that breaks those CISC instructions down to something resembling RISC for processing internally)

ARM is a RISC instruction set.

MIPS is a RISC instruction set.

Digital's ALPHA ISA was a RISC instruction set.

HP's PA-RISC was a RISC instruction set.

SUN's SPARC ISA was a RISC instruction set.

PowerPC is a RISC instruction set.

They are all different, and not binary compatible, but they are all RISC. All RISC means is that the instructions are fewer and simpler compared to a classic complex design.

RISC has already changed the world. It's embedded in just about all the tech you use these days.

RISC-V is just another RISC instruction set, but this time no one owns it. It is open.

It's not the RISC part that is special. Everyone is RISC these days.

It's the part where no one owns it that is special. Anyone can take RISC-V and design a CPU around it, and that could change the industry if it catches on. There is also a risk of fragmentation killing it since it is customizable, but that is another topic.
 
The main benefits for this I believe will be hardware makers selling cheap microchips for other companies to build upon.

however. an open standard processor would never be faster than current generation hardware.
It can be massively scaled up though to achieve that high performance

Similar to how an H100 has well over ten thousand CUDA cores
 
RISC-V began in 2010, went public in 2015, it is still based on RISC, just now is open source. same shit different pile, imo. same hype weve been hearing for 30 years.
Ok. So what open source ISA do you recommend instead?
 
I think we're far too fixated on the instruction set itself. As Zarathustra[H] indicated - even the most convoluted instruction set is just immediately digested into a series of smaller, reduced, instructions.

Sure, if I spun up something today, which didn't require binary compatibility, RISC-V looks great. I'd still have that other problem of making a really good architecture.
 
i've been toying around with the idea of a RRSC or Reduced Register Set Computing ISA

"In general, having fewer registers could potentially reduce hardware complexity, save power, and decrease the physical size of the processor, which could be beneficial in certain specialized or low-power contexts. However, it could also lead to increased use of slower memory storage for variables and intermediate results in calculations, which could lead to reduced overall performance.

If the focus is solely on performance, without worrying about physical size, complexity, or power draw, a Reduced Register Set Computing (RRSC) architecture could focus on maximizing the efficient use of registers and minimizing memory latency.

Let's imagine some design features for this scenario:

  1. Smart Register Allocation: This architecture would utilize highly intelligent dynamic register allocation algorithms to optimally allocate the limited register set. These could take into account both immediate and future computational needs to reduce latency and improve throughput.
  2. Pipeline Optimization: Without the constraints of size and power, more complex pipeline stages could be developed to improve parallelism and reduce pipeline hazards, even with a reduced number of registers.
  3. Fast Context Switching: With fewer registers to save and restore during context switches, an RRSC architecture might be able to switch between threads or processes more quickly than traditional architectures.
  4. Custom Instruction Set: The instruction set could be custom-designed around the limited register set, with instructions that combine common sequences of operations into single instructions to reduce the need for intermediate storage.
  5. Hardware-level Multithreading: Given the reduced context switching time, the RRSC architecture could heavily invest in hardware-level multithreading, ensuring that the processor is always working on some task and reducing idle time.
  6. Advanced Prefetching: Implementing advanced prefetching techniques to predictively load data into cache from memory, reducing the latency of memory accesses, which would be more frequent due to the reduced register set."
Reduced Register Set Computing (RRSC) ISA Proposal

Instruction Format:


Let's use a simple 32-bit fixed-length instruction format:

  • OpCode: 6 bits
  • Destination Register: 5 bits
  • Source Register 1: 5 bits
  • Source Register 2 or Immediate: 16 bits
This provides space for 32 unique registers (5 bits), 64 operations (6 bits), and either a second register or a 16-bit immediate value.
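As a rough sketch of how this 6/5/5/16 layout could be packed into a 32-bit word, something like the following would work. The bit positions (opcode in the top bits, immediate in the bottom 16) and the opcode numbers are assumptions for illustration; the proposal only gives the field widths.

```python
# Sketch of the 32-bit format above. Field order within the word is an
# assumption: opcode [31:26], dest [25:21], src1 [20:16], src2/imm [15:0].

OPCODE_BITS, REG_BITS, IMM_BITS = 6, 5, 16

def _check(name, value, bits):
    """Reject values that do not fit in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")

def encode(opcode, rd, rs1, imm):
    """Pack the four fields into one 32-bit instruction word."""
    _check("opcode", opcode, OPCODE_BITS)
    _check("rd", rd, REG_BITS)
    _check("rs1", rs1, REG_BITS)
    _check("imm", imm, IMM_BITS)
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | imm

def decode(word):
    """Unpack a 32-bit word back into (opcode, rd, rs1, imm)."""
    return (
        (word >> 26) & 0x3F,
        (word >> 21) & 0x1F,
        (word >> 16) & 0x1F,
        word & 0xFFFF,
    )

if __name__ == "__main__":
    ADD = 1  # hypothetical opcode number, not defined in the proposal
    word = encode(ADD, rd=2, rs1=3, imm=4)
    print(hex(word), decode(word))
```

When the last field holds a second register rather than an immediate, only its low 5 bits would be needed, which leaves 11 bits unused in register-register instructions.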

Register Specification:

There are 32 registers in total, with a size of 32 bits each:

  • R0 to R31: General purpose registers
Instructions:

Here are a few examples of potential operations:

  • ADD Rdest, Rsrc1, Rsrc2: Add values from Rsrc1 and Rsrc2 and store the result in Rdest.
  • SUB Rdest, Rsrc1, Rsrc2: Subtract value in Rsrc2 from Rsrc1 and store the result in Rdest.
  • AND Rdest, Rsrc1, Rsrc2: Perform bitwise AND on values from Rsrc1 and Rsrc2 and store the result in Rdest.
  • OR Rdest, Rsrc1, Rsrc2: Perform bitwise OR on values from Rsrc1 and Rsrc2 and store the result in Rdest.
  • LD Rdest, address: Load the value at memory address into Rdest.
  • ST Rsrc, address: Store the value from Rsrc at memory address.
  • JMP address: Jump to memory address.
  • BEQ Rsrc1, Rsrc2, offset: Branch to offset from PC if values in Rsrc1 and Rsrc2 are equal.
128-bit Memory Addressable Space:

For a 128-bit memory space, the architecture becomes more complex. Here's a potential solution:

  • OpCode: 6 bits
  • Destination Register: 7 bits
  • Source Register 1: 7 bits
  • Source Register 2 or Immediate: 108 bits
There are 128 registers (R0 to R127), each 128 bits wide. In a 128-bit instruction word this provides space for 128 unique registers, 64 operations, and either a second register or a 108-bit immediate value.

The load and store instructions would be updated to support 128-bit addresses:

  • LD Rdest, address: Load the 128-bit value at memory address into Rdest.
  • ST Rsrc, address: Store the 128-bit value from Rsrc at memory address.
This system would be able to address a memory space of 2^128 unique locations, but the increased size of the registers and instruction set could lead to significantly increased complexity and power consumption
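Field widths like these are easy to sanity-check by subtracting the opcode and register fields from the instruction word size. The sketch below assumes the instruction word is 128 bits in the wide variant, which the proposal does not actually state; under that assumption the last field comes out to 108 bits.

```python
# Sanity check for instruction field widths. The 128-bit instruction word
# size is an assumption; the proposal only lists the individual fields.

def immediate_bits(word_bits, opcode_bits, reg_bits, reg_fields=2):
    """Bits left for the last field after the opcode and register fields."""
    remaining = word_bits - opcode_bits - reg_fields * reg_bits
    if remaining <= 0:
        raise ValueError("opcode and register fields do not fit in the word")
    return remaining

def register_count(reg_bits):
    """Number of registers addressable with a register field of this width."""
    return 1 << reg_bits

if __name__ == "__main__":
    print("32-bit format:", immediate_bits(32, 6, 5), "bits,",
          register_count(5), "registers")
    print("128-bit format:", immediate_bits(128, 6, 7), "bits,",
          register_count(7), "registers")
```

Either way the last field is narrower than a full 128-bit address, so a load or store would have to take its address from a register rather than from the instruction itself.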
 
i've been toying around with the idea of a RRSC or Reduced Register Set Computing ISA

I don't think I'd consider 32 GP registers "reduced register set". Many well-known 8-bit CPUs had around 3 general purpose registers, like the 6502.

The end result of RRSC is a zero-register architecture. Instead, everything goes on the stack. IIRC Java's VM and .NET IL are more or less stack-based, not register-based.
 
Hmm, fascinating, so about this 1_rick

an ISA for this kind of RRSC architecture:

Instruction Set Architecture:

Instructions would operate directly on the stack, with operands being popped from the stack and results pushed onto it. Unlike JVM or .NET, this would be a hardware level stack and not a virtual one. This could allow the stack to operate faster than traditional memory.

Here's an outline of how a few common operations might look:

  • PUSH value: Push a value onto the stack.
  • POP: Pop a value from the top of the stack and discard it.
  • ADD: Pop two values from the stack, add them, and push the result.
  • SUB: Pop two values from the stack, subtract the second from the first, and push the result.
  • MUL: Pop two values from the stack, multiply them, and push the result.
  • DIV: Pop two values from the stack, divide the first by the second, and push the result.
  • JMP address: Jump to a given memory address.
  • BEQ address: Pop two values from the stack, if they are equal, jump to the given memory address.
Memory:

In addition to the main stack, this architecture would need some form of random-access memory for storing and retrieving data that's not currently being operated on.

A stack-based architecture like this could have some interesting advantages. It could potentially be simpler and more compact than traditional architectures, and it could be very efficient for executing certain kinds of code. However, it would likely face significant challenges in terms of performance and compatibility with existing software, which is mostly designed around register-based architectures.
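The operations listed above are small enough to try out in software. Here is a minimal interpreter sketch, with several assumptions that the outline leaves open: "first" means the value pushed earlier, DIV is integer division, and jump addresses are indexes into the program list rather than memory addresses.

```python
import operator

# Minimal interpreter for the stack operations outlined above.
# Assumptions (not in the outline): "first" is the value pushed earlier,
# DIV is integer division, and addresses are indexes into the program list.

ARITHMETIC = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.floordiv,
}

def run(program):
    """Execute a list of (op, *args) tuples and return the final stack."""
    stack = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "POP":
            stack.pop()
        elif op in ARITHMETIC:
            second = stack.pop()  # top of stack, pushed most recently
            first = stack.pop()   # pushed before it
            stack.append(ARITHMETIC[op](first, second))
        elif op == "JMP":
            pc = args[0]
        elif op == "BEQ":
            second = stack.pop()
            first = stack.pop()
            if first == second:
                pc = args[0]
        else:
            raise ValueError(f"unknown operation: {op}")
    return stack

if __name__ == "__main__":
    # (2 + 3) * 4, with no register names anywhere in the program
    print(run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]))
```

Notice that none of the arithmetic instructions name an operand, which is where the compact encoding of stack machines comes from.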
 
Forth is a stack-based language that lends itself well to stack-based microcontrollers. See the Forth-based J1 or the RTX2010 (rad hardened).
 
That sounds rather like the Mill conceptually. What am I missing?
 
Mill? What do you mean
https://millcomputing.com/

Here's a bit of talk about parts of the project: https://millcomputing.com/docs/ "The Belt is the data interchange mechanism for the Mill general purpose CPU architecture, replacing the general registers of other architectures. The Mill’s belt is unique both in its programming model and its implementation at the micro-architecture level. Destination addressing is implicit, yielding more compact instruction encoding. The Belt is integrated with the function call mechanism; it eliminates caller/callee save conventions and callee pre-/postlude instructions, and it supports multi-result calls naturally. The Belt is Single-assignment, so rename registers and pipeline phases are unnecessary."
 
Oh, this is really cool

Data Interchange like associating to a Beltway in the city?

Is The Belt like a Ring Bus that ATi used?

—-
“the "Belt" is a key feature of the Mill architecture. In traditional CPU architectures, registers are used to hold temporary data during computations. This can be a source of inefficiency, as managing these registers and ensuring they contain the correct values at the right time can add overhead. The "Belt" replaces these registers with a stream of values. When an operation is performed, the result is placed on the "Belt," and it "shifts" the other values down, with the oldest value "falling off" the end.

Comparing this to ATI's X1900 series ring bus, both can be considered as methods of managing data flow, but they serve very different purposes and operate in different contexts:

  • The "Belt" in the Mill architecture deals with CPU data handling, aiming to streamline the execution of instructions by eliminating the need for explicit destination addressing and eliminating caller/callee save conventions, among other benefits.
  • The ring bus in the ATI X1900 series, on the other hand, was used as a memory controller within a GPU to efficiently distribute data across various components.
So while they both concern data flow management in their respective hardware, the similarities largely end there. They are designed to tackle different problems in different hardware (CPU vs GPU), and they operate based on different principles. The Mill "Belt" is more akin to a radical redesign of the CPU register concept, while the ring bus is a specific type of memory controller design for GPUs.”
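The "shift down and fall off the end" behavior described above can be modeled in a few lines. This is only a toy sketch of the idea as described in this thread, not of the actual Mill hardware: the belt length of 4, the `Belt` class, and the `add` helper are all hypothetical, and operands are named by their current position on the belt (0 being the newest value).

```python
from collections import deque

# Toy model of a belt as described above: results are dropped at the front,
# older values shift down, and the oldest falls off the end.
# The length and the method names are assumptions for illustration.

class Belt:
    def __init__(self, length=4):
        # A bounded deque discards from the far end when a new item is added.
        self._slots = deque(maxlen=length)

    def drop(self, value):
        """Place a new result at position 0, shifting everything else down."""
        self._slots.appendleft(value)

    def __getitem__(self, position):
        """Read a value by belt position; 0 is the most recent result."""
        return self._slots[position]

    def contents(self):
        return list(self._slots)

def add(belt, a, b):
    """Add the values at belt positions a and b; the result has no named destination."""
    belt.drop(belt[a] + belt[b])

if __name__ == "__main__":
    belt = Belt(length=4)
    for value in (1, 2, 3, 4):
        belt.drop(value)
    add(belt, 0, 1)  # reads the two newest values
    print(belt.contents())
```

The point of the sketch is that an operation names its sources but never a destination, which is the "implicit destination addressing" mentioned in the Mill description quoted earlier.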
 
Is The Belt like a Ring Bus that ATi used?
I'm not clear on that. I have a video that talks about the Belt cued up but it's 90 minutes so I haven't watched it yet.
 