Why is shifting a register faster than multiplying?




















The answer to the first part of the question depends on the processor.

Why not leave that decision to the compiler, which knows the target architecture? Will the compiler always do it in the fastest way, though? He is not saying that he is optimizing code; some people are simply curious and want to learn, you know, for the sake of learning. I suppose I had that coming. I perfectly understand wanting to learn. The only workable approach is to (1) understand as many of the things involved as you can (and that means everything, from theoretical CS down to machine code, logic gates, and their interactions), and then (2) use logic and the scientific method to derive answers for specific circumstances.

You're asking the wrong questions. But you can simply view the disassembly of each option and see for yourself.

BTW, you might get a different result if you enable optimization (mine was disabled). — barak manos. But again, let the compiler do this. It's its job. Thanks, I wondered if there is some algorithm to find those shift/add sequences.

But if you say the compiler knows best, then I'll just leave it to it ;) — Kelu Thatsall. @KeluThatsall: Regarding an algorithm, see the edit. It's a lot more complexity than what you can possibly gain.
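The algorithm alluded to boils down to decomposing the constant multiplier by its set bits: each set bit i of the constant contributes one shift, x << i, and the partial results are added up. A minimal Python sketch of the idea (the function name is mine, not from any answer here):

```python
def shift_add_multiply(x, c):
    """Multiply x by a non-negative integer constant c using only
    shifts and adds.

    Each set bit i of c contributes (x << i) to the result; this is
    the decomposition a compiler's strength reduction considers.
    """
    result = 0
    i = 0
    while c:
        if c & 1:
            result += x << i
        c >>= 1
        i += 1
    return result

# Example: x * 10 == (x << 3) + (x << 1), since 10 = 0b1010
assert shift_add_multiply(7, 10) == 70
assert all(shift_add_multiply(x, c) == x * c
           for x in range(-5, 6) for c in range(20))
```

Whether the shift/add sequence actually wins depends on how many bits of the constant are set; a real compiler weighs that sequence against a single multiply instruction, which is exactly the complexity the comment refers to.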

The logic of summary point 1 is faulty. Multiple instructions are not necessarily slower than a single instruction that does something completely different. Different instructions can have radically different latency, throughput, and pipeline behavior. As an extreme example, consider register moves (virtually free) versus reading from RAM with a cache miss (hundreds of cycles of latency, and probably lower throughput too).

I did not want to overcomplicate things and get into a lot of architecture details, but you can safely assume that the same premises hold for both cases; i.e. given the same premises, when you compare apples to apples, one instruction executes faster than multiple instructions, no matter what the architecture. The upshot, then, is that on recent Intel architectures, a shift by a variable amount takes three "micro-operations", while most other simple operations (add, bitwise ops, even multiplication) take only 1.

Such shifts can execute at most once every 2 cycles. The trend in modern desktop and laptop hardware is to make multiplication a fast operation.

On recent Intel and AMD chips, in fact, one multiplication can be issued every cycle (we call this the reciprocal throughput). The latency, however, of a multiplication is 3 cycles. So that means you get the result of any given multiplication 3 cycles after you start it, but you are able to start a new multiplication every cycle.

Which value (1 cycle or 3 cycles) is more important depends on the structure of your algorithm. If the multiplication is part of a critical dependency chain, the latency is important. If not, the reciprocal throughput or other factors may be more important.

The key takeaway is that on modern laptop chips (or better), multiplication is a fast operation, and likely to be faster than the three- or four-instruction sequence that a compiler would issue to "get the rounding" right for strength-reduced shifts. For variable shifts, on Intel, multiplication would also generally be preferred due to the above-mentioned issues. On smaller form-factor platforms, multiplication may still be slower, since building a full and fast 32-bit or (especially) 64-bit multiplier takes a lot of transistors and power.

If someone can fill in with details of the performance of multiply on recent mobile chips it would be much appreciated. Divide is both a more complex operation, hardware-wise, than multiplication and is also much less common in actual code - meaning that fewer resources are likely allocated to it.

The trend in modern chips is still towards faster dividers, but even modern top-of-the-line chips take many cycles to do a divide, and they are only partially pipelined. In general, 64-bit divides are even slower than 32-bit divides.

Unlike most other operations, division may take a variable number of cycles depending on the arguments. Avoid divides and replace them with shifts (or let the compiler do it, but you may need to check the assembly), if you can! That is, if you have any binary number and need to bit-shift it by N, all you have to do is shift the digits over that many places and fill in with zeros.
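The "shift the digits and fill with zeros" description is just the binary analogue of appending zeros in decimal, and corresponds directly to the identity x << n == x * 2**n:

```python
x = 0b1011                   # 11 in decimal
# The bits move left by three places and zeros fill in from the right:
assert x << 3 == 0b1011000   # 88 in decimal
# Shifting left by N is the same as multiplying by 2**N:
assert x << 3 == x * 2**3
```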

Binary multiplication is in general more complicated, though techniques like the Dadda multiplier make it quite fast. Looking at the disassembled bytecode, Python apparently doesn't do this.
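One way to check this yourself is with CPython's `dis` module; a sketch (the exact opcode names vary across CPython versions):

```python
import dis

def mul(x):
    return x * 2

def shift(x):
    return x << 1

# CPython compiles the multiply and the shift to distinct bytecode
# operations (e.g. BINARY_MULTIPLY vs BINARY_LSHIFT on older 3.x,
# BINARY_OP with different operators on 3.11+); neither expression
# is rewritten into the other at compile time.
dis.dis(mul)
dis.dis(shift)
```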

Asked 7 years, 7 months ago. Active 5 years ago.

Viewed 33k times. Bit shift is faster in all languages, not just Python. Many processors have a native bit-shift instruction that will accomplish it in one or two clock cycles. It should be kept in mind, however, that bit-shifting, instead of using the normal division and multiplication operators, is generally bad practice and can hinder readability.

There are exceptions to this, such as when code is extremely performance-critical, but most of the time all you're doing is obfuscating your code. @Crizly: Any compiler with a decent optimizer will recognize multiplications and divisions that can be done with bit shifts and generate code that uses them.

Don't ugly up your code trying to outsmart the compiler. In this question on Stack Overflow, a microbenchmark found slightly better performance in Python 3 for multiplication by 2 than for an equivalent left shift, for small enough numbers. I think I traced the reason down to small multiplications currently being optimized differently than bit shifts. Just goes to show you can't take for granted what will run faster based on theory.

Let's look at two little C programs that do a bit shift and a divide. The key is: what do they actually do? We can then compare their results with a quick Perl script.
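The original C and Perl listings aside, the point they made can be sketched in Python: an arithmetic right shift rounds toward negative infinity, while C-style signed integer division truncates toward zero, so for negative operands the two operations give different answers. This is exactly why a compiler must emit extra instructions to "get the rounding" right when strength-reducing a signed divide:

```python
import math

x = -7
# Arithmetic right shift floors toward negative infinity:
assert x >> 1 == -4             # floor(-3.5) == -4
# C-style signed division truncates toward zero:
assert math.trunc(x / 2) == -3
# So a compiler cannot blindly replace a signed divide by 2 with a
# single shift: it needs a fix-up sequence for negative operands.
```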


