What are the typical pipeline stages of L1 caches?

C.P. Ravikumar

Pipeline stages are not really related to caches. You can have more stages in the instruction pipeline of a CPU: fetch instruction, decode instruction, compute address of operand 1, fetch operand 1, compute address of operand 2, fetch operand 2, execute instruction, ...

Caches store the instructions and data for the CPU so that these steps can happen at a very high speed.

Hugo Décharnes

@C.P. Ravikumar: That's not my question. I wanted to know what the four pipeline stages dedicated to cache access (known as EXE, DC1, DC2 and DC3) do in detail.

Vladimir Stankovic

It can be seen from the picture that the fourth cycle is for Write Back. I cannot tell for sure without "the whole picture" (you attached only the picture for the L1 pipeline stages), but the write is not necessary if you only read from the cache, for example when you use the cache in the instruction fetch stage, or when you have instructions that read from memory (like Load). If, on the other hand, you have instructions like Store, which write to memory, then that fourth cycle is spent on the write. Again, I am not sure this answer is complete (or at all correct), but I hope I helped you a little bit...

@Nikolaos Alachiotis: Write-back is not taken into account in the four stages.

@Vladimir Stankovic: Write-back is not taken into account in the four cycles. Moreover, these stages correspond to load accesses.

Hm... I'm confused... First of all, the fourth cycle IS dedicated to the write-back, as shown in the picture (the 1st cycle is for EXE, the 2nd and 3rd for DC1 and DC2 respectively, and the 4th for the write-back). But I don't understand the purpose of the write-back. In case of a TLB miss, the new item is read from the next level of cache/memory and then written back into the TLB, but I am not sure that it can all be done in a couple of cycles (the TLB read in the 2nd cycle and the write-back in the 4th cycle)...

During the write-back (denoted as WB) stage, data are driven to the register file and the forwarding logic.

TLB misses are handled by an L2 TLB in high-end CPUs. The miss penalty is 8 cycles in Haswell.

If the write-back is for the register file, then it indeed is not part of the L1 pipeline stages. I misunderstood your question, sorry... You attached a file for a 3-cycle L1 cache and asked about 4-cycle caches, for which you don't have a figure. I was trying to give a logical explanation based on my general knowledge, and you obviously need someone with knowledge about this specific question. Hope you find someone... I can continue "thinking out loud" about the possible answers, if you think that might help to find the answer?

As I said, write-back is not taken into account in the four cycles. It is shown for information purposes only.

Low-end CPUs, such as Silvermont, have a 3-cycle data cache pipeline (cf. my picture: EXE, DC1 and DC2). High-end CPUs have a 4-cycle one (EXE, DC1, DC2, DC3). My question is: why four cycles?

I can only make a wild guess... It could be that high-end CPUs cannot do all the necessary "work" in just 3 cycles because of their operating frequency, so they need one more cycle. In that case, DC1, DC2 and DC3 in high-end CPUs would do exactly the same as DC1 and DC2 in low-end CPUs, only in 3 cycles instead of 2. But that is just a thought; I have no idea if it is true. You did not answer me: do you want me (and the other people here) to write our thoughts about what the answer could be, or are you specifically asking whether someone knows the answer for sure?
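To make the guess above concrete, here is a minimal arithmetic sketch. It assumes the cache work takes a fixed wall-clock time, so the number of pipeline cycles is that time divided by the clock period, rounded up. The 1000 ps access time and the clock frequencies are hypothetical numbers chosen for illustration, not measurements of any real CPU.

```python
def cycles_needed(access_time_ps, clock_mhz):
    """Cycles required to cover a fixed access time at a given clock.

    cycles = ceil(access_time / clock_period), computed with integers:
    access_time_ps * clock_mhz / 1e6 is the access time in clock periods.
    """
    periods_times_1e6 = access_time_ps * clock_mhz
    return -(-periods_times_1e6 // 1_000_000)  # integer ceiling division


# Hypothetical 1000 ps of cache work (assumed value, for illustration only).
print(cycles_needed(1000, 2400))  # 3 cycles at 2.4 GHz
print(cycles_needed(1000, 3500))  # 4 cycles at 3.5 GHz
```

Under these assumed numbers the same amount of work fits in 3 cycles at the lower clock but needs 4 at the higher one. This only illustrates the reasoning; it does not show that frequency is the actual cause of the fourth stage.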

I'm looking for someone who knows the answer for sure.

Alberto Ros

Hi Hugo,

Did you find the answer to that question? I am also interested in it.

Also, in your picture I can see that the cache hit/miss is known in the last stage (cycle 3). Do you know if, in a 4-cycle pipelined cache, the hit/miss information is also obtained in the last stage?

Thanks,

Alberto

The diagram I attached seems to represent the typical pipeline of an L1 cache. A fourth cycle can be added because of:

– the presence of multiplexing/demultiplexing circuits in a multibank cache;

– the absence of load-hit speculation, which would explain why the load latency in Intel's cores is 4-5 cycles (cf. 7-cpu.com), although this seems unlikely.

The addition of a fourth cycle does not necessarily imply delaying the hit/miss information.

Thanks a lot! Very interesting.

In the book by Antonio González et al., "Processor Microarchitecture: An Implementation Perspective", Chapter 2, they mention the four stages of a pipelined cache: address calculation, disambiguation (decoder), cache access (parallel tag and data), and result drive (aligner).
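The four stages named above can be sketched as a toy functional model of a load. This is only an illustration under stated assumptions, not the book's design or any real CPU's implementation: the class `ToyL1`, its methods, the 64-byte line, the 64 sets and 8 ways, and the exact-match store buffer are all hypothetical, and loads are assumed not to cross a line boundary. It also models what each stage does, not how long it takes.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64   # assumed line size
NUM_SETS = 64     # assumed set count
NUM_WAYS = 8      # assumed associativity: 64 sets * 8 ways * 64 B = 32 KiB


def split(addr):
    """Split a byte address into (tag, set index, offset within the line)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, index, offset


@dataclass
class ToyL1:
    # Each set is a list of (tag, line_bytes) pairs, at most NUM_WAYS long.
    sets: list = field(default_factory=lambda: [[] for _ in range(NUM_SETS)])
    # Older stores that have not reached the cache yet: address -> bytes.
    store_buffer: dict = field(default_factory=dict)

    def fill(self, addr, line):
        """Install a full line (helper for the example; FIFO replacement)."""
        assert len(line) == LINE_BYTES
        tag, index, _ = split(addr)
        ways = self.sets[index]
        ways[:] = [w for w in ways if w[0] != tag]
        if len(ways) == NUM_WAYS:
            ways.pop(0)
        ways.append((tag, bytes(line)))

    # Stage 1: address calculation.
    def address_calc(self, base, displacement):
        return base + displacement

    # Stage 2: disambiguation against older stores (exact match only here).
    def disambiguate(self, addr, size):
        forwarded = self.store_buffer.get(addr)
        if forwarded is not None and len(forwarded) == size:
            return forwarded
        return None

    # Stage 3: cache access; each way's tag is checked alongside its data.
    def access(self, addr):
        tag, index, _ = split(addr)
        candidates = [(way_tag == tag, data) for way_tag, data in self.sets[index]]
        for hit, data in candidates:
            if hit:
                return True, data
        return False, None

    # Stage 4: result drive; the aligner extracts the requested bytes.
    def result_drive(self, line, addr, size):
        _, _, offset = split(addr)
        return int.from_bytes(line[offset:offset + size], "little")

    def load(self, base, displacement, size):
        """Run one load through the four stages; returns (hit, value)."""
        addr = self.address_calc(base, displacement)
        forwarded = self.disambiguate(addr, size)
        if forwarded is not None:
            return True, int.from_bytes(forwarded, "little")
        hit, line = self.access(addr)
        if not hit:
            return False, None
        return True, self.result_drive(line, addr, size)


cache = ToyL1()
cache.fill(0x1000, bytes(range(64)))
print(cache.load(0x1000, 4, 2))  # (True, 1284): bytes 0x04, 0x05 read little-endian
print(cache.load(0x2000, 0, 4))  # (False, None): same set, different tag
```

In this sketch the hit/miss outcome is available at the end of the cache access stage, before the result drive, which is one way a fourth stage could be added without delaying the hit/miss information.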