• @BorgDrone@lemmy.one
    8 · 1 year ago

    The Last of Us Part 1 is another example. We know it should run better on our hardware (…) because we have already seen the original game run far better on less capable hardware.

    You cannot directly compare PC specs with those of a console. TLoU was made by Naughty Dog, who are well known for squeezing absurd amounts of performance out of console hardware. The way to do this is by leveraging a platform’s specific strengths. The engine is very likely designed around the strengths of the console’s hardware.

    PCs have a different architecture from consoles, with different trade-offs. For example: PCs are designed to be modular. You can replace graphics cards, processors, RAM, etc. This comes at a cost. One such cost is that a PC GPU has to have its own discrete RAM, and there is a performance penalty to this. On a console things can be much more tightly integrated. I/O on a PS5 is a good example. It’s not just a fast SSD, it’s also a storage controller with more priority levels, one that interfaces directly with the GPU cache, etc.

    • ono
      English
      10 · 1 year ago

      Sigh… You conveniently deleted important parts of my comment, such as “at least with low-graphics settings” and “adjust for a few years of hardware inflation”, and completely ignored the fact that I am talking about cases of abnormally bad performance compared to entire categories of games. The straw man you’re arguing against is not what I wrote.

      • @BorgDrone@lemmy.one
        4 · 1 year ago

        You conveniently deleted important parts of my comment, such as “at least with low-graphics settings” and “adjust for a few years of hardware inflation”,

        No, that just supports my theory. Graphics settings usually scale really well; that’s the reason they are adjustable by the end user in the first place. Those should not cause any of the issues you are talking about. The problems lie in the parts that take advantage of certain architectural differences.

        A hypothetical example that highlights a real architectural difference between consoles and PCs:

        Say you have a large chunk of data and you need to perform some kind of operation on all this data. Say, adjust the contents of buffer A based on the contents of buffer B. It’s all pretty much the same: read some data from A and B, perform some operation on it, write back the results to A. Just for millions of data points. There are many things you could be doing that follow such a pattern. You know who’s really good at doing a similar operation millions of times? The GPU! It was made specifically to perform such operations. So as a smart console game developer you decide to leverage the GPU for this task instead of doing it on the CPU. You write a small compute kernel, some lines in your CPU code to invoke it. Boom, super fast operation.

        Now imagine you’re tasked with porting this code to the PC. Suddenly this super fast operation is dog slow. Why? Because the data is generated by the CPU, and the result is needed by the CPU. The console developer was only using the GPU for this one operation, as part of a larger piece of code, to take advantage of the GPU’s parallel performance. On PC, however, this won’t fly. The GPU cannot access this data directly because it sits on a separate card with its own RAM, and the only path between the two is the (relatively slow) PCIe bus. So now you have to copy the data to the GPU, perform the operation, and then copy the result back to system RAM, all over the limited bandwidth of a PCIe bus that’s already being used for graphics-related tasks as well. On a console this is completely free: the GPU and CPU share the same memory, so handing data back and forth is a zero-cost operation. On PC the copying may take so much time that it’s actually faster to do the work on the CPU, even though the CPU takes much longer to perform the operation itself, simply to avoid the overhead of copying the data back and forth.

        If an engine uses such an optimisation this will never run well on the PC, regardless of how fast your GPU is. You’d need a lot of years of ‘hardware inflation’ before either doing it on the CPU or doing it on the GPU + 2 times the copy overhead is faster than just doing it on the GPU of the console with zero overhead.
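
        To put rough numbers on that trade-off, here is a back-of-the-envelope cost model in Python. Everything in it is an assumption made up for illustration: the buffer sizes, the PCIe bandwidth, both compute times, and the function names. It is not a measurement of any real game, console, or GPU. It only shows how the two copies can end up dominating the total.

```python
# Back-of-the-envelope cost model. All figures are hypothetical.

def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    """Time to move num_bytes over a link with the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_s


def discrete_gpu_seconds(upload_bytes, download_bytes, gpu_compute_s, bus_bandwidth):
    """PC case: copy inputs to the GPU, run the kernel, copy the result back."""
    upload = transfer_seconds(upload_bytes, bus_bandwidth)
    download = transfer_seconds(download_bytes, bus_bandwidth)
    return upload + gpu_compute_s + download


def unified_memory_seconds(gpu_compute_s):
    """Console case as described above: shared memory, so no copy cost is modelled."""
    return gpu_compute_s


# Assumed workload: buffers A and B are 32 MiB each, the result overwrites A.
BUFFER_A_BYTES = 32 * 1024**2
BUFFER_B_BYTES = 32 * 1024**2

# Assumed hardware figures, picked only to make the arithmetic easy to follow.
BUS_BANDWIDTH = 16e9      # bytes per second available on the bus (assumption)
GPU_COMPUTE_S = 0.0005    # kernel run time on the GPU (assumption)
CPU_COMPUTE_S = 0.004     # same operation done on the CPU (assumption)

console_gpu = unified_memory_seconds(GPU_COMPUTE_S)
pc_gpu = discrete_gpu_seconds(
    upload_bytes=BUFFER_A_BYTES + BUFFER_B_BYTES,  # send A and B to the GPU
    download_bytes=BUFFER_A_BYTES,                 # bring the updated A back
    gpu_compute_s=GPU_COMPUTE_S,
    bus_bandwidth=BUS_BANDWIDTH,
)
pc_cpu = CPU_COMPUTE_S

timings = {"console GPU": console_gpu, "PC CPU": pc_cpu, "PC GPU + copies": pc_gpu}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1000:.3f} ms")
```

        With these made-up figures the PC GPU path is the slowest of the three even though the kernel itself is 8x faster than the CPU version, because the copies cost more than the compute they save. Change the assumed bandwidth or buffer sizes and the ordering can flip, which is the whole point: the answer depends on the architecture, not just on how fast the GPU is.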

        In fact, things like this are why Apple moved away from dedicated GPUs in favour of a unified memory model. If you design your engine around such an architecture you can achieve impressive performance gains. A good example of this is how the Affinity Photo developers designed their app around the ‘ideal GPU’, one with unified memory, which didn’t exist yet at the time but which they expected to arrive in the future. When Apple released its M-series SoCs they finally had a GPU architecture that matched their predictions, and when benchmarked with their code the M1 Max beat the crap out of a $6000 AMD Radeon Pro W6900X. Note that the AMD part is still much faster if you measure raw performance; it’s just that the system architecture doesn’t allow you to leverage that power in this particular use case.

        It’s not just how fast the individual components are, it’s how well they are integrated, and with a modular system like a PC this is always going to cause a performance bottleneck.