h/t RedGamingTech
It is here: a multi-GCD chip, the holy grail of GPUs.
https://www.freepatentsonline.com/y2023/0376318.html
DISTRIBUTED GEOMETRY
In one implementation, chiplets (labelled 320A to 320N) are coupled to a single index buffer (labelled 360), stored in memory (labelled 350), via a communication link (labelled 310).
In one implementation, when a draw call is initiated on GPU 300, chiplets 320A-N are notified of the draw call and the location and size of index buffer 360 which includes the indices corresponding to one or more graphics objects of the draw call. Index buffer 360 can be stored in any number and type of memory devices accessible by chiplets 320A-N. In one implementation, index buffer 360 includes a list of pointers to vertices of graphics primitives that make up the graphics object(s). The graphics primitives can be, but are not limited to, points, lines, triangles, rectangles, patches, and so on.
In response to receiving the notification of the initiation of the draw call, each chiplet 320A-N calculates which indices to process from index buffer 360. Then each chiplet 320A-N fetches and processes indices independently and in parallel with other chiplets 320A-N fetching and processing their corresponding portions of index buffer 360. In one implementation, the chiplets 320A-N fetch indices a primitive group at a time in a round-robin fashion, resulting in an interleaving arrangement of portions of indices of index buffer 360 mapped to chiplets 320A-N. This allows chiplets 320A-N to process different portions of a draw call independently and in parallel with each other. This distributed geometry processing scheme relies on each chiplet 320A-N determining which portion(s) of the draw call to process without relying on a central distributor of work that dispenses work to the chiplets 320A-N. In other words, each chiplet knows where in index buffer 360 the previous chiplet left off and where the next chiplet will pick up again.
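To make the "no central distributor" part concrete, here is a minimal sketch of how each chiplet could work out its own slices of the index buffer from nothing but its own ID, the chiplet count, and the draw call's size. This is my own illustration, not code from the patent: the function name `chiplet_index_ranges`, its parameters, and the fixed primitive-group size are all assumptions, and the patent does not say how the group size is chosen.

```python
from typing import Iterator, Tuple


def chiplet_index_ranges(
    chiplet_id: int,
    num_chiplets: int,
    index_count: int,
    group_size_indices: int,
) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) index ranges owned by one chiplet.

    Hypothetical helper: primitive groups are dealt out round-robin,
    so chiplet k takes groups k, k + N, k + 2N, ... of the index buffer.
    """
    # Distance between two consecutive groups owned by the same chiplet.
    stride = num_chiplets * group_size_indices
    start = chiplet_id * group_size_indices
    while start < index_count:
        # The last group may be shorter than a full primitive group.
        yield (start, min(start + group_size_indices, index_count))
        start += stride


if __name__ == "__main__":
    # Assumed example: 3 chiplets, 30 indices, groups of 2 triangles (6 indices).
    for cid in range(3):
        print(cid, list(chiplet_index_ranges(cid, 3, 30, 6)))
```

Every chiplet runs the same arithmetic, so each one knows where the previous chiplet left off and where the next one picks up without exchanging any messages. That is the interleaving the quoted paragraph describes.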
Referring now to FIG. 5, a block diagram of another implementation of a chiplet GPU 500 is shown. As shown in FIG. 5, chiplet GPU 500 includes chiplets 510A-N which are representative of any number of chiplets. In one implementation, in order to keep chiplets 510A-N in synchronization when processing draw calls, chiplets 510A-N utilize a state management scheme. For example, in this implementation, for a given draw call, each command processor 520A-N generates a state ID 560A-P, respectively, for each corresponding pipeline. The pipeline refers to the various graphics processing stages implemented by each chiplet 510A-N.
It should be emphasized that the above-described implementations are only non-limiting examples of implementations. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.