import {AssemblyInstructionInfo} from '../base.js'; export function getAsmOpcode(opcode: string | undefined): AssemblyInstructionInfo | undefined { if (!opcode) return; switch (opcode.toUpperCase()) { case 'RET': return { url: `https://llvm.org/docs/LangRef.html#ret-instruction`, html: `
‘ret’ Instruction
ret <type> <value>       ; Return a value from a non-void function
ret void                 ; Return from void function
The ‘ret’ instruction is used to return control flow (and optionally a value) from a function back to the caller.
There are two forms of the ‘ret’ instruction: one that returns a value and then causes control flow, and one that just causes control flow to occur.
The ‘ret’ instruction optionally accepts a single argument, the return value. The type of the return value must be a ‘first class’ type.
A function is not well formed if it has a non-void return type and contains a ‘ret’ instruction with no return value or a return value with a type that does not match its type, or if it has a void return type and contains a ‘ret’ instruction with a return value.
When the ‘ret’ instruction is executed, control flow returns back to the calling function’s context. If the caller is a “call” instruction, execution continues at the instruction after the call. If the caller was an “invoke” instruction, execution continues at the beginning of the “normal” destination block. If the instruction returns a value, that value shall set the call or invoke instruction’s return value.
ret i32 5                       ; Return an integer value of 5
ret void                        ; Return from a void function
ret { i32, i8 } { i32 4, i8 2 } ; Return a struct of values 4 and 2
‘br’ Instruction
br i1 <cond>, label <iftrue>, label <iffalse>
br label <dest>          ; Unconditional branch
The ‘br’ instruction is used to cause control flow to transfer to a different basic block in the current function. There are two forms of this instruction, corresponding to a conditional branch and an unconditional branch.
The conditional branch form of the ‘br’ instruction takes a single ‘i1’ value and two ‘label’ values. The unconditional form of the ‘br’ instruction takes a single ‘label’ value as a target.
Upon execution of a conditional ‘br’ instruction, the ‘i1’ argument is evaluated. If the value is true, control flows to the ‘iftrue’ label argument. If ‘cond’ is false, control flows to the ‘iffalse’ label argument. If ‘cond’ is poison or undef, this instruction has undefined behavior.
Test:
  %cond = icmp eq i32 %a, %b
  br i1 %cond, label %IfEqual, label %IfUnequal
IfEqual:
  ret i32 1
IfUnequal:
  ret i32 0
‘switch’ Instruction
switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ]
The ‘switch’ instruction is used to transfer control flow to one of several different places. It is a generalization of the ‘br’ instruction, allowing a branch to occur to one of many possible destinations.
The ‘switch’ instruction uses three parameters: an integer comparison value ‘value’, a default ‘label’ destination, and an array of pairs of comparison value constants and ‘label’s. The table is not allowed to contain duplicate constant entries.
The switch instruction specifies a table of values and destinations. When the ‘switch’ instruction is executed, this table is searched for the given value. If the value is found, control flow is transferred to the corresponding destination; otherwise, control flow is transferred to the default destination. If ‘value’ is poison or undef, this instruction has undefined behavior.
Depending on properties of the target machine and the particular switch instruction, this instruction may be code generated in different ways. For example, it could be generated as a series of chained conditional branches or with a lookup table.
; Emulate a conditional br instruction
%Val = zext i1 %value to i32
switch i32 %Val, label %truedest [ i32 0, label %falsedest ]
; Emulate an unconditional br instruction
switch i32 0, label %dest [ ]
; Implement a jump table:
switch i32 %val, label %otherwise [ i32 0, label %onzero
                                    i32 1, label %onone
                                    i32 2, label %ontwo ]
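The table search described for ‘switch’ can be sketched in TypeScript. This is only an illustrative model, not how LLVM lowers the instruction: `switchDest`, `Label`, and `jumpTable` are hypothetical names, and block labels are represented as plain strings.

```typescript
// Hypothetical model of `switch` dispatch: case constants map to block labels.
type Label = string;

function switchDest(
  value: number,
  defaultDest: Label,
  table: ReadonlyArray<readonly [number, Label]>,
): Label {
  // The table is not allowed to contain duplicate constant entries.
  const seen = new Set<number>();
  for (const [constant] of table) {
    if (seen.has(constant)) {
      throw new Error(`duplicate case constant ${constant}`);
    }
    seen.add(constant);
  }
  // Search the table; fall back to the default destination if nothing matches.
  for (const [constant, dest] of table) {
    if (constant === value) return dest;
  }
  return defaultDest;
}

// Mirrors the jump-table example: 0 -> onzero, 1 -> onone, 2 -> ontwo.
const jumpTable: ReadonlyArray<readonly [number, Label]> = [
  [0, "onzero"],
  [1, "onone"],
  [2, "ontwo"],
];
```

With an empty table the function always returns the default destination, which matches the "emulate an unconditional br" example.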
‘indirectbr’ Instruction
indirectbr ptr <address>, [ label <dest1>, label <dest2>, ... ]
The ‘indirectbr’ instruction implements an indirect branch to a label within the current function, whose address is specified by “address”. Address must be derived from a blockaddress constant.
The ‘address’ argument is the address of the label to jump to. The rest of the arguments indicate the full set of possible destinations that the address may point to. Blocks are allowed to occur multiple times in the destination list, though this isn’t particularly useful.
This destination list is required so that dataflow analysis has an accurate understanding of the CFG.
Control transfers to the block specified in the address argument. All possible destination blocks must be listed in the label list, otherwise this instruction has undefined behavior. This implies that jumps to labels defined in other functions have undefined behavior as well. If ‘address’ is poison or undef, this instruction has undefined behavior.
This is typically implemented with a jump through a register.
indirectbr ptr %Addr, [ label %bb1, label %bb2, label %bb3 ]
‘invoke’ Instruction
<result> = invoke [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
              [operand bundles] to label <normal label> unwind label <exception label>
The ‘invoke’ instruction causes control to transfer to a specified function, with the possibility of control flow transfer to either the ‘normal’ label or the ‘exception’ label. If the callee function returns with the “ret” instruction, control flow will return to the “normal” label. If the callee (or any indirect callees) returns via the “resume” instruction or other exception handling mechanism, control is interrupted and continued at the dynamically nearest “exception” label.
The ‘exception’ label is a landingpad for the exception. As such, the ‘exception’ label is required to have the “landingpad” instruction, which contains the information about the behavior of the program after unwinding happens, as its first non-PHI instruction. The restrictions on the “landingpad” instruction tightly couple it to the “invoke” instruction, so that the important information contained within the “landingpad” instruction can’t be lost through normal code motion.
This instruction requires several arguments:
‘ret attrs’: only the ‘zeroext’, ‘signext’, and ‘inreg’ attributes are valid here.
‘ty’: the type of the call instruction itself, which is also the type of the return value. Functions that return no value are marked void.
‘fnty’: shall be the signature of the function being invoked. The argument types must match the types implied by this signature. This type can be omitted if the function is not varargs.
‘fnptrval’: An LLVM value containing a pointer to a function to be invoked. In most cases, this is a direct function invocation, but indirect invokes are just as possible, calling an arbitrary pointer to function value.
‘function args’: argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.
‘normal label’: the label reached when the called function executes a ‘ret’ instruction.
‘exception label’: the label reached when a callee returns via the resume instruction or other exception handling mechanism.
This instruction is designed to operate as a standard ‘call’ instruction in most regards. The primary difference is that it establishes an association with a label, which is used by the runtime library to unwind the stack.
This instruction is used in languages with destructors to ensure that proper cleanup is performed in the case of either a longjmp or a thrown exception. Additionally, this is important for implementation of ‘catch’ clauses in high-level languages that support them.
For the purposes of the SSA form, the definition of the value returned by the ‘invoke’ instruction is deemed to occur on the edge from the current block to the “normal” label. If the callee unwinds then no return value is available.
%retval = invoke i32 @Test(i32 15) to label %Continue
            unwind label %TestCleanup              ; i32:retval set
%retval = invoke coldcc i32 %Testfnptr(i32 15) to label %Continue
            unwind label %TestCleanup              ; i32:retval set
‘callbr’ Instruction
<result> = callbr [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
              [operand bundles] to label <fallthrough label> [indirect labels]
The ‘callbr’ instruction causes control to transfer to a specified function, with the possibility of control flow transfer to either the ‘fallthrough’ label or one of the ‘indirect’ labels.
This instruction should only be used to implement the “goto” feature of gcc style inline assembly. Any other usage is an error in the IR verifier.
This instruction requires several arguments:
‘ret attrs’: only the ‘zeroext’, ‘signext’, and ‘inreg’ attributes are valid here.
‘ty’: the type of the call instruction itself, which is also the type of the return value. Functions that return no value are marked void.
‘fnty’: shall be the signature of the function being called. The argument types must match the types implied by this signature. This type can be omitted if the function is not varargs.
‘fnptrval’: An LLVM value containing a pointer to a function to be called. In most cases, this is a direct function call, but other callbr’s are just as possible, calling an arbitrary pointer to function value.
‘function args’: argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.
‘fallthrough label’: the label reached when the inline assembly’s execution exits the bottom.
‘indirect labels’: the labels reached when a callee transfers control to a location other than the ‘fallthrough label’. Label constraints refer to these destinations.
This instruction is designed to operate as a standard ‘call’ instruction in most regards. The primary difference is that it establishes an association with additional labels to define where control flow goes after the call.
The output values of a ‘callbr’ instruction are available only to the ‘fallthrough’ block, not to any ‘indirect’ block(s).
The only use of this today is to implement the “goto” feature of gcc inline assembly where additional labels can be provided as locations for the inline assembly to jump to.
; "asm goto" without output constraints.
callbr void asm "", "r,!i"(i32 %x)
            to label %fallthrough [label %indirect]
; "asm goto" with output constraints.
<result> = callbr i32 asm "", "=r,r,!i"(i32 %x)
            to label %fallthrough [label %indirect]
‘resume’ Instruction
resume <type> <value>
The ‘resume’ instruction is a terminator instruction that has no successors.
The ‘resume’ instruction requires one argument, which must have the same type as the result of any ‘landingpad’ instruction in the same function.
The ‘resume’ instruction resumes propagation of an existing (in-flight) exception whose unwinding was interrupted with a landingpad instruction.
resume { ptr, i32 } %exn
‘catchswitch’ Instruction
<resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind to caller
<resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind label <default>
The ‘catchswitch’ instruction is used by LLVM’s exception handling system to describe the set of possible catch handlers that may be executed by the EH personality routine.
The parent argument is the token of the funclet that contains the catchswitch instruction. If the catchswitch is not inside a funclet, this operand may be the token none.
The default argument is the label of another basic block beginning with either a cleanuppad or catchswitch instruction. This unwind destination must be a legal target with respect to the parent links, as described in the exception handling documentation.
The handlers are a nonempty list of successor blocks that each begin with a catchpad instruction.
Executing this instruction transfers control to one of the successors in handlers, if appropriate, or continues to unwind via the unwind label if present.
The catchswitch is both a terminator and a “pad” instruction, meaning that it must be both the first non-phi instruction and last instruction in the basic block. Therefore, it must be the only non-phi instruction in the block.
dispatch1:
  %cs1 = catchswitch within none [label %handler0, label %handler1] unwind to caller
dispatch2:
  %cs2 = catchswitch within %parenthandler [label %handler0] unwind label %cleanup
‘catchret’ Instruction
catchret from <token> to label <normal>
The ‘catchret’ instruction is a terminator instruction that has a single successor.
The first argument to a ‘catchret’ indicates which catchpad it exits. It must be a catchpad. The second argument to a ‘catchret’ specifies where control will transfer to next.
The ‘catchret’ instruction ends an existing (in-flight) exception whose unwinding was interrupted with a catchpad instruction. The personality function gets a chance to execute arbitrary code to, for example, destroy the active exception. Control then transfers to normal.
The token argument must be a token produced by a catchpad instruction. If the specified catchpad is not the most-recently-entered not-yet-exited funclet pad (as described in the EH documentation), the catchret’s behavior is undefined.
catchret from %catch to label %continue
‘cleanupret’ Instruction
cleanupret from <value> unwind label <continue>
cleanupret from <value> unwind to caller
The ‘cleanupret’ instruction is a terminator instruction that has an optional successor.
The ‘cleanupret’ instruction requires one argument, which indicates which cleanuppad it exits, and must be a cleanuppad. If the specified cleanuppad is not the most-recently-entered not-yet-exited funclet pad (as described in the EH documentation), the cleanupret’s behavior is undefined.
The ‘cleanupret’ instruction also has an optional successor, continue, which must be the label of another basic block beginning with either a cleanuppad or catchswitch instruction. This unwind destination must be a legal target with respect to the parent links, as described in the exception handling documentation.
The ‘cleanupret’ instruction indicates to the personality function that one cleanuppad it transferred control to has ended. It transfers control to continue or unwinds out of the function.
cleanupret from %cleanup unwind to caller
cleanupret from %cleanup unwind label %continue
‘unreachable’ Instruction
unreachable
The ‘unreachable’ instruction has no defined semantics. This instruction is used to inform the optimizer that a particular portion of the code is not reachable. This can be used to indicate that the code after a no-return function cannot be reached, and other facts.
‘fneg’ Instruction
<result> = fneg [fast-math flags]* <ty> <op1>   ; yields ty:result
The ‘fneg’ instruction returns the negation of its operand.
The argument to the ‘fneg’ instruction must be a floating-point or vector of floating-point values.
The value produced is a copy of the operand with its sign bit flipped. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.
<result> = fneg float %val          ; yields float:result = -%val
‘add’ Instruction
<result> = add <ty> <op1>, <op2>          ; yields ty:result
<result> = add nuw <ty> <op1>, <op2>      ; yields ty:result
<result> = add nsw <ty> <op1>, <op2>      ; yields ty:result
<result> = add nuw nsw <ty> <op1>, <op2>  ; yields ty:result
The ‘add’ instruction returns the sum of its two operands.
The two arguments to the ‘add’ instruction must be integer or vector of integer values. Both arguments must have identical types.
The value produced is the integer sum of the two operands.
If the sum has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.
Because LLVM integers use a two’s complement representation, this instruction is appropriate for both signed and unsigned integers.
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.
<result> = add i32 4, %var          ; yields i32:result = 4 + %var
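The wrapping rule and the nuw/nsw conditions for ‘add’ can be illustrated with a small TypeScript model fixed at n = 32. This is a sketch under assumptions: `addI32` and `AddResult` are hypothetical names, and the two flags only report when the corresponding keyword would make the result a poison value.

```typescript
// Hypothetical model of `add i32`: the result plus the two overflow conditions.
interface AddResult {
  bits: number;          // result as an unsigned 32-bit pattern
  unsignedWrap: boolean; // true when `nuw` would yield poison
  signedWrap: boolean;   // true when `nsw` would yield poison
}

function addI32(a: number, b: number): AddResult {
  // Interpret both operands as unsigned 32-bit values.
  const ua = a >>> 0;
  const ub = b >>> 0;
  // The exact sum is below 2^33, so it is represented exactly in a double.
  const exactUnsigned = ua + ub;
  // Mathematical result modulo 2^n with n = 32.
  const bits = exactUnsigned % 2 ** 32;
  // Interpret the same operands as signed 32-bit values.
  const exactSigned = (a | 0) + (b | 0);
  return {
    bits,
    unsignedWrap: exactUnsigned >= 2 ** 32,
    signedWrap: exactSigned < -(2 ** 31) || exactSigned > 2 ** 31 - 1,
  };
}
```

For instance, `addI32(0xffffffff, 1)` wraps to 0 as an unsigned sum but is simply -1 + 1 as a signed sum, so only the unsigned condition is reported.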
‘fadd’ Instruction
<result> = fadd [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
The ‘fadd’ instruction returns the sum of its two operands.
The two arguments to the ‘fadd’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.
The value produced is the floating-point sum of the two operands. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.
<result> = fadd float 4.0, %var          ; yields float:result = 4.0 + %var
‘sub’ Instruction
<result> = sub <ty> <op1>, <op2>          ; yields ty:result
<result> = sub nuw <ty> <op1>, <op2>      ; yields ty:result
<result> = sub nsw <ty> <op1>, <op2>      ; yields ty:result
<result> = sub nuw nsw <ty> <op1>, <op2>  ; yields ty:result
The ‘sub’ instruction returns the difference of its two operands.
Note that the ‘sub’ instruction is used to represent the ‘neg’ instruction present in most other intermediate representations.
The two arguments to the ‘sub’ instruction must be integer or vector of integer values. Both arguments must have identical types.
The value produced is the integer difference of the two operands.
If the difference has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.
Because LLVM integers use a two’s complement representation, this instruction is appropriate for both signed and unsigned integers.
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the sub is a poison value if unsigned and/or signed overflow, respectively, occurs.
<result> = sub i32 4, %var          ; yields i32:result = 4 - %var
<result> = sub i32 0, %val          ; yields i32:result = -%val
‘fsub’ Instruction
<result> = fsub [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
The ‘fsub’ instruction returns the difference of its two operands.
The two arguments to the ‘fsub’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.
The value produced is the floating-point difference of the two operands. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.
<result> = fsub float 4.0, %var           ; yields float:result = 4.0 - %var
<result> = fsub float -0.0, %val          ; yields float:result = -%val
‘mul’ Instruction
<result> = mul <ty> <op1>, <op2>          ; yields ty:result
<result> = mul nuw <ty> <op1>, <op2>      ; yields ty:result
<result> = mul nsw <ty> <op1>, <op2>      ; yields ty:result
<result> = mul nuw nsw <ty> <op1>, <op2>  ; yields ty:result
The ‘mul’ instruction returns the product of its two operands.
The two arguments to the ‘mul’ instruction must be integer or vector of integer values. Both arguments must have identical types.
The value produced is the integer product of the two operands.
If the result of the multiplication has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.
Because LLVM integers use a two’s complement representation, and the result is the same width as the operands, this instruction returns the correct result for both signed and unsigned integers. If a full product (e.g. i32 * i32 -> i64) is needed, the operands should be sign-extended or zero-extended as appropriate to the width of the full product.
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the mul is a poison value if unsigned and/or signed overflow, respectively, occurs.
<result> = mul i32 4, %var          ; yields i32:result = 4 * %var
‘fmul’ Instruction
<result> = fmul [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
The ‘fmul’ instruction returns the product of its two operands.
The two arguments to the ‘fmul’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.
The value produced is the floating-point product of the two operands. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.
<result> = fmul float 4.0, %var          ; yields float:result = 4.0 * %var
‘udiv’ Instruction
<result> = udiv <ty> <op1>, <op2>         ; yields ty:result
<result> = udiv exact <ty> <op1>, <op2>   ; yields ty:result
The ‘udiv’ instruction returns the quotient of its two operands.
The two arguments to the ‘udiv’ instruction must be integer or vector of integer values. Both arguments must have identical types.
The value produced is the unsigned integer quotient of the two operands.
Note that unsigned integer division and signed integer division are distinct operations; for signed integer division, use ‘sdiv’.
Division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior.
If the exact keyword is present, the result value of the udiv is a poison value if %op1 is not a multiple of %op2 (as such, “((a udiv exact b) mul b) == a”).
<result> = udiv i32 4, %var          ; yields i32:result = 4 / %var
‘sdiv’ Instruction
<result> = sdiv <ty> <op1>, <op2>         ; yields ty:result
<result> = sdiv exact <ty> <op1>, <op2>   ; yields ty:result
The ‘sdiv’ instruction returns the quotient of its two operands.
The two arguments to the ‘sdiv’ instruction must be integer or vector of integer values. Both arguments must have identical types.
The value produced is the signed integer quotient of the two operands rounded towards zero.
Note that signed integer division and unsigned integer division are distinct operations; for unsigned integer division, use ‘udiv’.
Division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by doing a 32-bit division of -2147483648 by -1.
If the exact keyword is present, the result value of the sdiv is a poison value if the result would be rounded.
<result> = sdiv i32 4, %var          ; yields i32:result = 4 / %var
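The ‘sdiv’ rules (round towards zero, undefined behavior on a zero divisor or on overflow, poison under exact when rounding occurs) can be modelled for i32 as follows. This is a rough sketch: `sdivI32` is a hypothetical name, undefined behavior is represented by throwing, and a poison value is represented by the string "poison".

```typescript
// Hypothetical model of `sdiv i32`, optionally with the `exact` keyword.
function sdivI32(op1: number, op2: number, exact = false): number | "poison" {
  // Interpret both operands as signed 32-bit integers.
  const a = op1 | 0;
  const b = op2 | 0;
  if (b === 0) {
    throw new Error("undefined behavior: division by zero");
  }
  if (a === -2147483648 && b === -1) {
    throw new Error("undefined behavior: signed overflow");
  }
  // Signed quotient rounded towards zero.
  const q = Math.trunc(a / b);
  // With `exact`, a quotient that had to be rounded is a poison value.
  if (exact && q * b !== a) return "poison";
  return q;
}
```

Note that `sdivI32(-7, 2)` gives -3, not -4: the quotient is truncated, not floored.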
‘fdiv’ Instruction
<result> = fdiv [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
The ‘fdiv’ instruction returns the quotient of its two operands.
The two arguments to the ‘fdiv’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.
The value produced is the floating-point quotient of the two operands. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.
<result> = fdiv float 4.0, %var          ; yields float:result = 4.0 / %var
‘urem’ Instruction
<result> = urem <ty> <op1>, <op2>   ; yields ty:result
The ‘urem’ instruction returns the remainder from the unsigned division of its two arguments.
The two arguments to the ‘urem’ instruction must be integer or vector of integer values. Both arguments must have identical types.
This instruction returns the unsigned integer remainder of a division. This instruction always performs an unsigned division to get the remainder.
Note that unsigned integer remainder and signed integer remainder are distinct operations; for signed integer remainder, use ‘srem’.
Taking the remainder of a division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior.
<result> = urem i32 4, %var          ; yields i32:result = 4 % %var
‘srem’ Instruction
<result> = srem <ty> <op1>, <op2>   ; yields ty:result
The ‘srem’ instruction returns the remainder from the signed division of its two operands. This instruction can also take vector versions of the values in which case the elements must be integers.
The two arguments to the ‘srem’ instruction must be integer or vector of integer values. Both arguments must have identical types.
This instruction returns the remainder of a division (where the result is either zero or has the same sign as the dividend, op1), not the modulo operator (where the result is either zero or has the same sign as the divisor, op2) of a value. For more information about the difference, see The Math Forum. For a table of how this is implemented in various languages, please see Wikipedia: modulo operation.
Note that signed integer remainder and unsigned integer remainder are distinct operations; for unsigned integer remainder, use ‘urem’.
Taking the remainder of a division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by taking the remainder of a 32-bit division of -2147483648 by -1. (The remainder doesn’t actually overflow, but this rule lets srem be implemented using instructions that return both the result of the division and the remainder.)
<result> = srem i32 4, %var          ; yields i32:result = 4 % %var
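The difference between the remainder computed by ‘srem’ and a modulo operator can be seen in a short TypeScript sketch. The names `sremI32` and `floorMod` are hypothetical; the sketch relies on the fact that JavaScript's `%` operator already yields a result with the sign of the dividend.

```typescript
// Hypothetical model of `srem i32`: result is zero or has the sign of op1.
function sremI32(op1: number, op2: number): number {
  const a = op1 | 0;
  const b = op2 | 0;
  if (b === 0) {
    throw new Error("undefined behavior: remainder of division by zero");
  }
  if (a === -2147483648 && b === -1) {
    throw new Error("undefined behavior: overflow in the implied division");
  }
  // JavaScript's % truncates, so the sign follows the dividend.
  return a % b;
}

// For contrast, a modulo operator: result is zero or has the sign of the divisor.
function floorMod(a: number, b: number): number {
  return ((a % b) + b) % b;
}
```

The two functions agree when both operands have the same sign and differ otherwise, e.g. for -7 and 3 the remainder is -1 while the modulo is 2.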
‘frem’ Instruction
<result> = frem [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
The ‘frem’ instruction returns the remainder from the division of its two operands.
The two arguments to the ‘frem’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.
The value produced is the floating-point remainder of the two operands. This is the same output as a libm ‘fmod’ function, but without any possibility of setting errno. The remainder has the same sign as the dividend. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.
<result> = frem float 4.0, %var          ; yields float:result = 4.0 % %var
‘shl’ Instruction
<result> = shl <ty> <op1>, <op2>           ; yields ty:result
<result> = shl nuw <ty> <op1>, <op2>       ; yields ty:result
<result> = shl nsw <ty> <op1>, <op2>       ; yields ty:result
<result> = shl nuw nsw <ty> <op1>, <op2>   ; yields ty:result
The ‘shl’ instruction returns the first operand shifted to the left a specified number of bits.
Both arguments to the ‘shl’ instruction must be the same integer or vector of integer type. ‘op2’ is treated as an unsigned value.
The value produced is op1 * 2^op2 mod 2^n, where n is the width of the result. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, this instruction returns a poison value. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.
If the nuw keyword is present, then the shift produces a poison value if it shifts out any non-zero bits. If the nsw keyword is present, then the shift produces a poison value if it shifts out any bits that disagree with the resultant sign bit.
<result> = shl i32 4, %var   ; yields i32: 4 << %var
<result> = shl i32 4, 2      ; yields i32: 16
<result> = shl i32 1, 10     ; yields i32: 1024
<result> = shl i32 1, 32     ; undefined
<result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2>   ; yields: result=<2 x i32> < i32 2, i32 4>
‘lshr’ Instruction
<result> = lshr <ty> <op1>, <op2>         ; yields ty:result
<result> = lshr exact <ty> <op1>, <op2>   ; yields ty:result
The ‘lshr’ instruction (logical shift right) returns the first operand shifted to the right a specified number of bits with zero fill.
Both arguments to the ‘lshr’ instruction must be the same integer or vector of integer type. ‘op2’ is treated as an unsigned value.
This instruction always performs a logical shift right operation. The most significant bits of the result will be filled with zero bits after the shift. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, this instruction returns a poison value. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.
If the exact keyword is present, the result value of the lshr is a poison value if any of the bits shifted out are non-zero.
<result> = lshr i32 4, 1   ; yields i32:result = 2
<result> = lshr i32 4, 2   ; yields i32:result = 1
<result> = lshr i8  4, 3   ; yields i8:result = 0
<result> = lshr i8 -2, 1   ; yields i8:result = 0x7F
<result> = lshr i32 1, 32  ; undefined
<result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2>   ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1>
‘ashr’ Instruction
<result> = ashr <ty> <op1>, <op2>         ; yields ty:result
<result> = ashr exact <ty> <op1>, <op2>   ; yields ty:result
The ‘ashr’ instruction (arithmetic shift right) returns the first operand shifted to the right a specified number of bits with sign extension.
Both arguments to the ‘ashr’ instruction must be the same integer or vector of integer type. ‘op2’ is treated as an unsigned value.
This instruction always performs an arithmetic shift right operation. The most significant bits of the result will be filled with the sign bit of op1. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, this instruction returns a poison value. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.
If the exact keyword is present, the result value of the ashr is a poison value if any of the bits shifted out are non-zero.
<result> = ashr i32 4, 1   ; yields i32:result = 2
<result> = ashr i32 4, 2   ; yields i32:result = 1
<result> = ashr i8  4, 3   ; yields i8:result = 0
<result> = ashr i8 -2, 1   ; yields i8:result = -1
<result> = ashr i32 1, 32  ; undefined
<result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3>   ; yields: result=<2 x i32> < i32 -1, i32 0>
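The three shift instructions (‘shl’, ‘lshr’, ‘ashr’) can be compared side by side on i8 with a small TypeScript model. This is a sketch under assumptions: the function names are hypothetical, results are returned as unsigned 8-bit patterns, and a poison value is represented by the string "poison". The explicit range check matters because JavaScript's own shift operators silently reduce the shift amount modulo 32.

```typescript
type Shifted = number | "poison";

const WIDTH = 8;   // modelling i8
const MASK = 0xff; // low 8 bits

// Shift amounts are treated as unsigned; amounts >= the bit width give poison.
function inRange(op2: number): boolean {
  return (op2 >>> 0) < WIDTH;
}

// shl: op1 * 2^op2 mod 2^n
function shlI8(op1: number, op2: number): Shifted {
  if (!inRange(op2)) return "poison";
  return (op1 << op2) & MASK;
}

// lshr: vacated high bits are filled with zeros.
function lshrI8(op1: number, op2: number): Shifted {
  if (!inRange(op2)) return "poison";
  return (op1 & MASK) >>> op2;
}

// ashr: vacated high bits are filled with the sign bit of op1.
function ashrI8(op1: number, op2: number): Shifted {
  if (!inRange(op2)) return "poison";
  const signExtended = (op1 << 24) >> 24; // sign-extend from 8 bits
  return (signExtended >> op2) & MASK;
}

// Helper to read an 8-bit pattern as a signed value.
function toSigned8(bits: number): number {
  return (bits << 24) >> 24;
}
```

For the operand -2 shifted right by 1, `lshrI8` returns 0x7F while `ashrI8` returns 0xFF, which reads as -1 when interpreted as a signed i8.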
‘and’ Instruction
<result> = and <ty> <op1>, <op2>   ; yields ty:result
The ‘and’ instruction returns the bitwise logical and of its two operands.
The two arguments to the ‘and’ instruction must be integer or vector of integer values. Both arguments must have identical types.
<result> = and i32 4, %var         ; yields i32:result = 4 & %var
<result> = and i32 15, 40          ; yields i32:result = 8
<result> = and i32 4, 8            ; yields i32:result = 0
‘or’ Instruction
<result> = or <ty> <op1>, <op2>   ; yields ty:result
The ‘or’ instruction returns the bitwise logical inclusive or of its two operands.
The two arguments to the ‘or’ instruction must be integer or vector of integer values. Both arguments must have identical types.
<result> = or i32 4, %var         ; yields i32:result = 4 | %var
<result> = or i32 15, 40          ; yields i32:result = 47
<result> = or i32 4, 8            ; yields i32:result = 12
‘xor’ Instruction
<result> = xor <ty> <op1>, <op2>   ; yields ty:result
The ‘xor’ instruction returns the bitwise logical exclusive or of its two operands. The xor is used to implement the “one’s complement” operation, which is the “~” operator in C.
The two arguments to the ‘xor’ instruction must be integer or vector of integer values. Both arguments must have identical types.
<result> = xor i32 4, %var         ; yields i32:result = 4 ^ %var
<result> = xor i32 15, 40          ; yields i32:result = 39
<result> = xor i32 4, 8            ; yields i32:result = 12
<result> = xor i32 %V, -1          ; yields i32:result = ~%V
‘extractelement’ Instruction
<result> = extractelement <n x <ty>> <val>, <ty2> <idx>  ; yields <ty>
<result> = extractelement <vscale x n x <ty>> <val>, <ty2> <idx> ; yields <ty>
The ‘extractelement’ instruction extracts a single scalar element from a vector at a specified index.
The first operand of an ‘extractelement’ instruction is a value of vector type. The second operand is an index indicating the position from which to extract the element. The index may be a variable of any integer type, and will be treated as an unsigned integer.
The result is a scalar of the same type as the element type of val. Its value is the value at position idx of val. If idx exceeds the length of val for a fixed-length vector, the result is a poison value. For a scalable vector, if the value of idx exceeds the runtime length of the vector, the result is a poison value.
<result> = extractelement <4 x i32> %vec, i32 0    ; yields i32
‘insertelement’ Instruction
<result> = insertelement <n x <ty>> <val>, <ty> <elt>, <ty2> <idx> ; yields <n x <ty>>
<result> = insertelement <vscale x n x <ty>> <val>, <ty> <elt>, <ty2> <idx> ; yields <vscale x n x <ty>>
The ‘insertelement’ instruction inserts a scalar element into a vector at a specified index.
The first operand of an ‘insertelement’ instruction is a value of vector type. The second operand is a scalar value whose type must equal the element type of the first operand. The third operand is an index indicating the position at which to insert the value. The index may be a variable of any integer type, and will be treated as an unsigned integer.
The result is a vector of the same type as val. Its element values are those of val except at position idx, where it gets the value elt. If idx exceeds the length of val for a fixed-length vector, the result is a poison value. For a scalable vector, if the value of idx exceeds the runtime length of the vector, the result is a poison value.
<result> = insertelement <4 x i32> %vec, i32 1, i32 0 ; yields <4 x i32>
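As a rough illustration of the two element instructions, the sketch below models a fixed-length vector as a Python list. The `POISON` sentinel and both helper names are hypothetical, and the model assumes that any index at or beyond the vector length counts as out of range.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value


def extractelement(vec, idx):
    """Return the element at `idx`, or POISON when the index is out of range."""
    # Assumption: idx is already a non-negative (unsigned) integer.
    return vec[idx] if idx < len(vec) else POISON


def insertelement(vec, elt, idx):
    """Return a copy of `vec` with position `idx` replaced by `elt`."""
    if idx >= len(vec):
        return POISON
    result = list(vec)   # SSA values are immutable, so build a new vector
    result[idx] = elt
    return result


vec = [10, 20, 30, 40]
updated = insertelement(vec, 1, 0)
```

The original `vec` is left untouched, which mirrors how the instruction yields a new vector value instead of modifying its operand.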
‘shufflevector’ Instruction
<result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask> ; yields <m x <ty>>
<result> = shufflevector <vscale x n x <ty>> <v1>, <vscale x n x <ty>> <v2>, <vscale x m x i32> <mask> ; yields <vscale x m x <ty>>
The ‘shufflevector’ instruction constructs a permutation of elements from two input vectors, returning a vector with the same element type as the input and length that is the same as the shuffle mask.
The first two operands of a ‘shufflevector’ instruction are vectors with the same type. The third argument is a shuffle mask vector constant whose element type is i32. The mask vector elements must be constant integers or undef values. The result of the instruction is a vector whose length is the same as the shuffle mask and whose element type is the same as the element type of the first two operands.
The elements of the two input vectors are numbered from left to right across both of the vectors. For each element of the result vector, the shuffle mask selects an element from one of the input vectors to copy to the result. Non-negative elements in the mask represent an index into the concatenated pair of input vectors.
If the shuffle mask is undefined, the result vector is undefined. If the shuffle mask selects an undefined element from one of the input vectors, the resulting element is undefined. An undefined element in the mask vector specifies that the resulting element is undefined. An undefined element in the mask vector prevents a poisoned vector element from propagating.
For scalable vectors, the only valid mask values at present are zeroinitializer and undef, since we cannot write all indices as literals for a vector with a length unknown at compile time.
<result> = shufflevector <4 x i32> %v1, <4 x i32> %v2, <4 x i32> <i32 0, i32 4, i32 1, i32 5> ; yields <4 x i32>
<result> = shufflevector <4 x i32> %v1, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32> - Identity shuffle.
<result> = shufflevector <8 x i32> %v1, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3> ; yields <4 x i32>
<result> = shufflevector <4 x i32> %v1, <4 x i32> %v2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7 > ; yields <8 x i32>
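The mask semantics can be sketched for fixed-length vectors as an index into the concatenation of the two inputs. This is an illustrative Python model only: `UNDEF` and `shufflevector` are hypothetical names, and undef is represented as `None`.

```python
UNDEF = None  # hypothetical stand-in for an undef element


def shufflevector(v1, v2, mask):
    """Select elements from v1 followed by v2 according to `mask`."""
    if len(v1) != len(v2):
        raise ValueError("both input vectors must have the same type")
    both = list(v1) + list(v2)   # elements are numbered across both inputs
    # An undef mask element yields an undef result element.
    return [UNDEF if m is UNDEF else both[m] for m in mask]


v1 = [10, 11, 12, 13]
v2 = [20, 21, 22, 23]
interleaved = shufflevector(v1, v2, [0, 4, 1, 5])
identity = shufflevector(v1, [UNDEF] * 4, [0, 1, 2, 3])
widened = shufflevector(v1, v2, [0, 1, 2, 3, 4, 5, 6, 7])
```

The result length always follows the mask, so the same inputs can produce a narrower, equal, or wider vector.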
‘extractvalue’ Instruction
<result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}*
The ‘extractvalue’ instruction extracts the value of a member field from an aggregate value.
The first operand of an ‘extractvalue’ instruction is a value of struct or array type. The other operands are constant indices to specify which value to extract in a similar manner as indices in a ‘getelementptr’ instruction.
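The major differences to getelementptr indexing are:
Since the value being indexed is not a pointer, the first index is omitted and assumed to be zero.
At least one index must be specified.
Not only struct indices but also array indices must be in bounds.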
The result is the value at the position in the aggregate specified by the index operands.
<result> = extractvalue {i32, float} %agg, 0 ; yields i32
‘insertvalue’ Instruction
<result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}* ; yields <aggregate type>
The first operand of an ‘insertvalue’ instruction is a value of struct or array type. The second operand is a first-class value to insert. The following operands are constant indices indicating the position at which to insert the value in a similar manner as indices in an ‘extractvalue’ instruction. The value to insert must have the same type as the value identified by the indices.
The result is an aggregate of the same type as val. Its value is that of val except that the value at the position specified by the indices is that of elt.
%agg1 = insertvalue {i32, float} undef, i32 1, 0 ; yields {i32 1, float undef}
%agg2 = insertvalue {i32, float} %agg1, float %val, 1 ; yields {i32 1, float %val}
%agg3 = insertvalue {i32, {float}} undef, float %val, 1, 0 ; yields {i32 undef, {float %val}}
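The aggregate examples can be mirrored with nested Python tuples. This is a sketch under stated assumptions: aggregates are modeled as tuples, undef as `None`, and the helper names are hypothetical. Rejecting an empty index list follows the syntax line, which requires at least one `<idx>`.

```python
UNDEF = None  # hypothetical stand-in for an undef member


def extractvalue(agg, *indices):
    """Walk into a nested aggregate and return the selected member."""
    if not indices:
        raise ValueError("at least one index is required")
    for index in indices:
        agg = agg[index]
    return agg


def insertvalue(agg, elt, *indices):
    """Return a copy of `agg` with the selected member replaced by `elt`."""
    if not indices:
        raise ValueError("at least one index is required")
    head, rest = indices[0], indices[1:]
    new_member = insertvalue(agg[head], elt, *rest) if rest else elt
    return agg[:head] + (new_member,) + agg[head + 1:]


agg1 = insertvalue((UNDEF, UNDEF), 1, 0)
agg2 = insertvalue(agg1, 2.5, 1)
agg3 = insertvalue((UNDEF, (UNDEF,)), 2.5, 1, 0)
```

Each call builds a new tuple, so `agg1` is unchanged after `agg2` is created, just as `%agg1` remains valid after `%agg2` is defined.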
‘alloca’ Instruction
<result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>] [, addrspace(<num>)] ; yields type addrspace(num)*:result
The ‘alloca’ instruction allocates memory on the stack frame of the currently executing function, to be automatically released when this function returns to its caller. If the address space is not explicitly specified, the object is allocated in the alloca address space from the datalayout string.
The ‘alloca’ instruction allocates sizeof(<type>)*NumElements bytes of memory on the runtime stack, returning a pointer of the appropriate type to the program. If “NumElements” is specified, it is the number of elements allocated, otherwise “NumElements” is defaulted to be one. If a constant alignment is specified, the value result of the allocation is guaranteed to be aligned to at least that boundary. The alignment may not be greater than 1 << 32. If not specified, or if zero, the target can choose to align the allocation on any convenient boundary compatible with the type.
‘type’ may be any sized type.
Memory is allocated; a pointer is returned. The allocated memory is uninitialized, and loading from uninitialized memory produces an undefined value. The operation itself is undefined if there is insufficient stack space for the allocation. ‘alloca’d memory is automatically released when the function returns. The ‘alloca’ instruction is commonly used to represent automatic variables that must have an address available. When the function returns (either with the ret or resume instructions), the memory is reclaimed. Allocating zero bytes is legal, but the returned pointer may not be unique. The order in which memory is allocated (i.e., which way the stack grows) is not specified.
Note that ‘alloca’ outside of the alloca address space from the datalayout string is meaningful only if the target has assigned it a semantics.
If the returned pointer is used by llvm.lifetime.start, the returned object is initially dead. See llvm.lifetime.start and llvm.lifetime.end for the precise semantics of lifetime-manipulating intrinsics.
%ptr = alloca i32 ; yields ptr
%ptr = alloca i32, i32 4 ; yields ptr
%ptr = alloca i32, i32 4, align 1024 ; yields ptr
%ptr = alloca i32, align 1024 ; yields ptr
‘load’ Instruction
<result> = load [volatile] <ty>, ptr <pointer>[, align <alignment>][, !nontemporal !<nontemp_node>][, !invariant.load !<empty_node>][, !invariant.group !<empty_node>][, !nonnull !<empty_node>][, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>][, !align !<align_node>][, !noundef !<empty_node>]
<result> = load atomic [volatile] <ty>, ptr <pointer> [syncscope("<target-scope>")] <ordering>, align <alignment> [, !invariant.group !<empty_node>]
!<nontemp_node> = !{ i32 1 }
!<empty_node> = !{}
!<deref_bytes_node> = !{ i64 <dereferenceable_bytes> }
!<align_node> = !{ i64 <value_alignment> }
The ‘load’ instruction is used to read from memory.
The argument to the load instruction specifies the memory address from which to load. The type specified must be a first class type of known size (i.e. not containing an opaque structural type). If the load is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this load with other volatile operations.
If the load is marked as atomic, it takes an extra ordering and optional syncscope("<target-scope>") argument. The release and acq_rel orderings are not valid on load instructions. Atomic loads produce defined results when they may see multiple atomic stores. The type of the pointee must be an integer, pointer, or floating-point type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. align must be explicitly specified on atomic loads, and the load has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee. !nontemporal does not have any defined semantics for atomic loads.
The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). A value of 0 or an omitted align argument means that the operation has the ABI alignment for the target. It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe. The maximum possible alignment is 1 << 32. An alignment value higher than the size of the loaded type implies memory up to the alignment value bytes can be safely loaded without trapping in the default address space. Access of the high bytes can interfere with debugging tools, so should not be accessed if the function has the sanitize_thread or sanitize_address attributes.
The optional !nontemporal metadata must reference a single metadata name <nontemp_node> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this load is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.
The optional !invariant.load metadata must reference a single metadata name <empty_node> corresponding to a metadata node with no entries. If a load instruction tagged with the !invariant.load metadata is executed, the memory location referenced by the load has to contain the same value at all points in the program where the memory location is dereferenceable; otherwise, the behavior is undefined.
The optional !invariant.group metadata must reference a single metadata name <empty_node> corresponding to a metadata node with no entries. See invariant.group metadata.
The optional !nonnull metadata must reference a single metadata name <empty_node> corresponding to a metadata node with no entries. The existence of the !nonnull metadata on the instruction tells the optimizer that the value loaded is known to never be null. If the value is null at runtime, a poison value is returned instead. This is analogous to the nonnull attribute on parameters and return values. This metadata can only be applied to loads of a pointer type.
The optional !dereferenceable metadata must reference a single metadata name <deref_bytes_node> corresponding to a metadata node with one i64 entry. See dereferenceable metadata.
The optional !dereferenceable_or_null metadata must reference a single metadata name <deref_bytes_node> corresponding to a metadata node with one i64 entry. See dereferenceable_or_null metadata.
The optional !align metadata must reference a single metadata name <align_node> corresponding to a metadata node with one i64 entry. The existence of the !align metadata on the instruction tells the optimizer that the value loaded is known to be aligned to a boundary specified by the integer value in the metadata node. The alignment must be a power of 2. This is analogous to the ‘align’ attribute on parameters and return values. This metadata can only be applied to loads of a pointer type. If the returned value is not appropriately aligned at runtime, a poison value is returned instead.
The optional !noundef metadata must reference a single metadata name <empty_node> corresponding to a node with no entries. The existence of !noundef metadata on the instruction tells the optimizer that the value loaded is known to be well defined. If the value isn’t well defined, the behavior is undefined. If the !noundef metadata is combined with poison-generating metadata like !nonnull, violation of that metadata constraint will also result in undefined behavior.
The location of memory pointed to is loaded. If the value being loaded is of scalar type then the number of bytes read does not exceed the minimum number of bytes needed to hold all bits of the type. For example, loading an i24 reads at most three bytes. When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type. If the value being loaded is of aggregate type, the bytes that correspond to padding may be accessed but are ignored, because it is impossible to observe padding from the loaded aggregate value. If <pointer> is not a well-defined value, the behavior is undefined.
%ptr = alloca i32 ; yields ptr
store i32 3, ptr %ptr ; yields void
%val = load i32, ptr %ptr ; yields i32:val = i32 3
‘store’ Instruction
store [volatile] <ty> <value>, ptr <pointer>[, align <alignment>][, !nontemporal !<nontemp_node>][, !invariant.group !<empty_node>] ; yields void
store atomic [volatile] <ty> <value>, ptr <pointer> [syncscope("<target-scope>")] <ordering>, align <alignment> [, !invariant.group !<empty_node>] ; yields void
!<nontemp_node> = !{ i32 1 }
!<empty_node> = !{}
The ‘store’ instruction is used to write to memory.
There are two arguments to the store instruction: a value to store and an address at which to store it. The type of the <pointer> operand must be a pointer to the first class type of the <value> operand. If the store is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this store with other volatile operations. Only values of first class types of known size (i.e. not containing an opaque structural type) can be stored.
If the store is marked as atomic, it takes an extra ordering and optional syncscope("<target-scope>") argument. The acquire and acq_rel orderings aren’t valid on store instructions. Atomic loads produce defined results when they may see multiple atomic stores. The type of the pointee must be an integer, pointer, or floating-point type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. align must be explicitly specified on atomic stores, and the store has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee. !nontemporal does not have any defined semantics for atomic stores.
The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). A value of 0 or an omitted align argument means that the operation has the ABI alignment for the target. It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe. The maximum possible alignment is 1 << 32. An alignment value higher than the size of the stored type implies memory up to the alignment value bytes can be stored to without trapping in the default address space. Storing to the higher bytes however may result in data races if another thread can access the same address. Introducing a data race is not allowed. Storing to the extra bytes is not allowed even in situations where a data race is known to not exist if the function has the sanitize_address attribute.
The optional !nontemporal metadata must reference a single metadata name <nontemp_node> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this store is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.
The optional !invariant.group metadata must reference a single metadata name <empty_node>. See invariant.group metadata.
The contents of memory are updated to contain <value> at the location specified by the <pointer> operand. If <value> is of scalar type then the number of bytes written does not exceed the minimum number of bytes needed to hold all bits of the type. For example, storing an i24 writes at most three bytes. When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten. If <value> is of aggregate type, padding is filled with undef. If <pointer> is not a well-defined value, the behavior is undefined.
%ptr = alloca i32 ; yields ptr
store i32 3, ptr %ptr ; yields void
%val = load i32, ptr %ptr ; yields i32:val = i32 3
‘fence’ Instruction
fence [syncscope("<target-scope>")] <ordering> ; yields void
The ‘fence’ instruction is used to introduce happens-before edges between operations.
‘fence’ instructions take an ordering argument which defines what synchronizes-with edges they add. They can only be given acquire, release, acq_rel, and seq_cst orderings.
A fence A which has (at least) release ordering semantics synchronizes with a fence B with (at least) acquire ordering semantics if and only if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M (either directly or through some side effect of a sequence headed by X), Y is sequenced before B, and Y observes M. This provides a happens-before dependency between A and B. Rather than an explicit fence, one (but not both) of the atomic operations X or Y might provide a release or acquire (resp.) ordering constraint and still synchronize-with the explicit fence and establish the happens-before edge.
A fence which has seq_cst ordering, in addition to having both acquire and release semantics specified above, participates in the global program order of other seq_cst operations and/or fences.
A fence instruction can also take an optional “syncscope” argument.
fence acquire ; yields void
fence syncscope("singlethread") seq_cst ; yields void
fence syncscope("agent") seq_cst ; yields void
‘cmpxchg’ Instruction
cmpxchg [weak] [volatile] ptr <pointer>, <ty> <cmp>, <ty> <new> [syncscope("<target-scope>")] <success ordering> <failure ordering>[, align <alignment>] ; yields { ty, i1 }
The ‘cmpxchg’ instruction is used to atomically modify memory. It loads a value in memory and compares it to a given value. If they are equal, it tries to store a new value into the memory.
There are three arguments to the ‘cmpxchg’ instruction: an address to operate on, a value to compare to the value currently at that address, and a new value to place at that address if the compared values are equal. The type of ‘<cmp>’ must be an integer or pointer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. ‘<cmp>’ and ‘<new>’ must have the same type, and the type of ‘<pointer>’ must be a pointer to that type. If the cmpxchg is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this cmpxchg with other volatile operations.
The success and failure ordering arguments specify how this cmpxchg synchronizes with other atomic operations. Both ordering parameters must be at least monotonic; the failure ordering cannot be either release or acq_rel.
A cmpxchg instruction can also take an optional “syncscope” argument.
The instruction can take an optional align attribute. The alignment must be a power of two greater than or equal to the size of the <value> type. If unspecified, the alignment is assumed to be equal to the size of the ‘<value>’ type. Note that this default alignment assumption is different from the alignment used for the load/store instructions when align isn’t specified.
The pointer passed into cmpxchg must have alignment greater than or equal to the size in memory of the operand.
The contents of memory at the location specified by the ‘<pointer>’ operand is read and compared to ‘<cmp>’; if the values are equal, ‘<new>’ is written to the location. The original value at the location is returned, together with a flag indicating success (true) or failure (false).
If the cmpxchg operation is marked as weak then a spurious failure is permitted: the operation may not write <new> even if the comparison matched.
If the cmpxchg operation is strong (the default), the i1 value is 1 if and only if the value loaded equals cmp.
A successful cmpxchg is a read-modify-write instruction for the purpose of identifying release sequences. A failed cmpxchg is equivalent to an atomic load with an ordering parameter determined by the second ordering parameter.
entry:
  %orig = load atomic i32, ptr %ptr unordered, align 4 ; yields i32
  br label %loop
loop:
  %cmp = phi i32 [ %orig, %entry ], [%value_loaded, %loop]
  %squared = mul i32 %cmp, %cmp
  %val_success = cmpxchg ptr %ptr, i32 %cmp, i32 %squared acq_rel monotonic ; yields { i32, i1 }
  %value_loaded = extractvalue { i32, i1 } %val_success, 0
  %success = extractvalue { i32, i1 } %val_success, 1
  br i1 %success, label %done, label %loop
done:
  ...
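The retry loop in the example can be paraphrased in Python to make the control flow easier to follow. This is a sketch only: `AtomicCell` is a hypothetical class that uses a lock to imitate atomicity, it models the strong form of cmpxchg, and it says nothing about memory orderings.

```python
import threading


class AtomicCell:
    """Hypothetical stand-in for an atomic i32 memory location."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def cmpxchg(self, cmp, new):
        """Strong compare-exchange: returns (original value, success flag)."""
        with self._lock:
            original = self._value
            success = original == cmp
            if success:
                self._value = new
            return original, success


def atomic_square(cell):
    """Square the cell's value, retrying until the exchange succeeds."""
    expected = cell.load()
    while True:
        squared = (expected * expected) & 0xFFFFFFFF  # wrap like i32 mul
        loaded, success = cell.cmpxchg(expected, squared)
        if success:
            return loaded
        expected = loaded  # another writer won; retry with the fresh value


cell = AtomicCell(7)
previous = atomic_square(cell)
```

As in the IR, a failed exchange still returns the value it observed, so the loop does not need a separate load before retrying.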
‘atomicrmw’ Instruction
atomicrmw [volatile] <operation> ptr <pointer>, <ty> <value> [syncscope("<target-scope>")] <ordering>[, align <alignment>] ; yields ty
The ‘atomicrmw’ instruction is used to atomically modify memory.
There are three arguments to the ‘atomicrmw’ instruction: an operation to apply, an address whose value to modify, an argument to the operation. The operation must be one of the following keywords: xchg, add, sub, and, nand, or, xor, max, min, umax, umin, fadd, fsub, fmax, fmin.
For most of these operations, the type of ‘<value>’ must be an integer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. For xchg, this may also be a floating point or a pointer type with the same size constraints as integers. For fadd/fsub/fmax/fmin, this must be a floating point type. The type of the ‘<pointer>’ operand must be a pointer to that type. If the atomicrmw is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this atomicrmw with other volatile operations.
The instruction can take an optional align attribute. The alignment must be a power of two greater than or equal to the size of the <value> type. If unspecified, the alignment is assumed to be equal to the size of the ‘<value>’ type. Note that this default alignment assumption is different from the alignment used for the load/store instructions when align isn’t specified.
An atomicrmw instruction can also take an optional “syncscope” argument.
The contents of memory at the location specified by the ‘<pointer>’ operand are atomically read, modified, and written back. The original value at the location is returned. The modification is specified by the operation argument:
xchg: *ptr = val
add: *ptr = *ptr + val
sub: *ptr = *ptr - val
and: *ptr = *ptr & val
nand: *ptr = ~(*ptr & val)
or: *ptr = *ptr | val
xor: *ptr = *ptr ^ val
max: *ptr = *ptr > val ? *ptr : val (using a signed comparison)
min: *ptr = *ptr < val ? *ptr : val (using a signed comparison)
umax: *ptr = *ptr > val ? *ptr : val (using an unsigned comparison)
umin: *ptr = *ptr < val ? *ptr : val (using an unsigned comparison)
fadd: *ptr = *ptr + val (using floating point arithmetic)
fsub: *ptr = *ptr - val (using floating point arithmetic)
fmax: *ptr = maxnum(*ptr, val) (match the llvm.maxnum.* intrinsic)
fmin: *ptr = minnum(*ptr, val) (match the llvm.minnum.* intrinsic)
%old = atomicrmw add ptr %ptr, i32 1 acquire ; yields i32
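The integer read-modify-write operations can be sketched as a lookup table of update functions. This Python model is illustrative only: it assumes 32-bit values stored as unsigned bit patterns, treats memory as a plain dict, ignores atomicity and ordering, and leaves out the floating point operations. The names `OPS`, `signed`, and `atomicrmw` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def signed(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# Each entry maps (old, val) to the new value stored in memory.
OPS = {
    "xchg": lambda old, val: val,
    "add": lambda old, val: old + val,
    "sub": lambda old, val: old - val,
    "and": lambda old, val: old & val,
    "nand": lambda old, val: ~(old & val),
    "or": lambda old, val: old | val,
    "xor": lambda old, val: old ^ val,
    "max": lambda old, val: max(old, val, key=signed),
    "min": lambda old, val: min(old, val, key=signed),
    "umax": lambda old, val: max(old, val),
    "umin": lambda old, val: min(old, val),
}


def atomicrmw(memory, address, operation, value):
    """Apply `operation` at `address` and return the original value."""
    old = memory[address]
    memory[address] = OPS[operation](old, value & MASK32) & MASK32
    return old


memory = {0: 5}
old = atomicrmw(memory, 0, "add", 1)
```

After this call `old` holds the previous value and `memory[0]` holds the updated one, which matches the `%old = atomicrmw add` example.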
‘getelementptr’ Instruction
<result> = getelementptr <ty>, ptr <ptrval>{, [inrange] <ty> <idx>}*
<result> = getelementptr inbounds <ty>, ptr <ptrval>{, [inrange] <ty> <idx>}*
<result> = getelementptr <ty>, <N x ptr> <ptrval>, [inrange] <vector index type> <idx>
The ‘getelementptr’ instruction is used to get the address of a subelement of an aggregate data structure. It performs address calculation only and does not access memory. The instruction can also be used to calculate a vector of such addresses.
The first argument is always a type used as the basis for the calculations. The second argument is always a pointer or a vector of pointers, and is the base address to start from. The remaining arguments are indices that indicate which of the elements of the aggregate object are indexed. The interpretation of each index is dependent on the type being indexed into. The first index always indexes the pointer value given as the second argument, the second index indexes a value of the type pointed to (not necessarily the value directly pointed to, since the first index can be non-zero), etc. The first type indexed into must be a pointer value, subsequent types can be arrays, vectors, and structs. Note that subsequent types being indexed into can never be pointers, since that would require loading the pointer before continuing calculation.
The type of each index argument depends on the type it is indexing into. When indexing into a (optionally packed) structure, only i32 integer constants are allowed (when using a vector of indices they must all be the same i32 integer constant). When indexing into an array, pointer or vector, integers of any width are allowed, and they are not required to be constant. These integers are treated as signed values where relevant.
For example, let’s consider a C code fragment and how it gets compiled to LLVM:
struct RT {
  char A;
  int B[10][20];
  char C;
};
struct ST {
  int X;
  double Y;
  struct RT Z;
};
int *foo(struct ST *s) {
  return &s[1].Z.B[5][13];
}
The LLVM code generated by Clang is:
%struct.RT = type { i8, [10 x [20 x i32]], i8 }
%struct.ST = type { i32, double, %struct.RT }
define ptr @foo(ptr %s) nounwind uwtable readnone optsize ssp {
entry:
  %arrayidx = getelementptr inbounds %struct.ST, ptr %s, i64 1, i32 2, i32 1, i64 5, i64 13
  ret ptr %arrayidx
}
In the example above, the first index is indexing into the ‘%struct.ST*’ type, which is a pointer, yielding a ‘%struct.ST’ = ‘{ i32, double, %struct.RT }’ type, a structure. The second index indexes into the third element of the structure, yielding a ‘%struct.RT’ = ‘{ i8 , [10 x [20 x i32]], i8 }’ type, another structure. The third index indexes into the second element of the structure, yielding a ‘[10 x [20 x i32]]’ type, an array. The two dimensions of the array are subscripted into, yielding an ‘i32’ type. The ‘getelementptr’ instruction returns a pointer to this element.
Note that it is perfectly legal to index partially through a structure, returning a pointer to an inner element. Because of this, the LLVM code for the given testcase is equivalent to:
define ptr @foo(ptr %s) {
  %t1 = getelementptr %struct.ST, ptr %s, i32 1
  %t2 = getelementptr %struct.ST, ptr %t1, i32 0, i32 2
  %t3 = getelementptr %struct.RT, ptr %t2, i32 0, i32 1
  %t4 = getelementptr [10 x [20 x i32]], ptr %t3, i32 0, i32 5
  %t5 = getelementptr [20 x i32], ptr %t4, i32 0, i32 13
  ret ptr %t5
}
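To see what address the index chain above computes, the byte offset can be worked out by hand. The sketch below assumes a layout that is typical for 64-bit targets (i8 is 1 byte, i32 is 4 bytes aligned to 4, double is 8 bytes aligned to 8); the real offsets depend on the target datalayout, and all constant names are hypothetical.

```python
# Assumed sizes in bytes; these are not taken from any particular datalayout.
I32_SIZE = 4
ROW_SIZE = 20 * I32_SIZE          # [20 x i32]
ARRAY_SIZE = 10 * ROW_SIZE        # [10 x [20 x i32]]

# %struct.RT = { i8, [10 x [20 x i32]], i8 }
# The array is aligned to 4, so it starts at offset 4.
RT_OFFSETS = (0, 4, 4 + ARRAY_SIZE)
RT_SIZE = 808                     # 805 bytes rounded up to a multiple of 4

# %struct.ST = { i32, double, %struct.RT }
# The double is aligned to 8, so it starts at offset 8.
ST_OFFSETS = (0, 8, 16)
ST_SIZE = 824                     # 16 + 808, already a multiple of 8

# One term per index: i64 1, i32 2, i32 1, i64 5, i64 13
steps = [
    1 * ST_SIZE,       # step over one whole %struct.ST
    ST_OFFSETS[2],     # field 2 of %struct.ST (the %struct.RT member)
    RT_OFFSETS[1],     # field 1 of %struct.RT (the array)
    5 * ROW_SIZE,      # row 5 of the outer array
    13 * I32_SIZE,     # element 13 of that row
]
offset = sum(steps)
```

Under these assumptions the single getelementptr and the five-step version both add 1296 bytes to `%s`.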
If the inbounds keyword is present, the result value of the getelementptr is a poison value if one of the following rules is violated:
nsw
).nsw
).nuw
).inbounds
keyword applies to each of the computations element-wise. These rules are based on the assumption that no allocated object may cross the unsigned address space boundary, and no allocated object may be larger than half the pointer index type space.
If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two’s complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer. The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage. See the Pointer Aliasing Rules section for more information.
If the inrange keyword is present before any index, loading from or storing to any pointer derived from the getelementptr has undefined behavior if the load or store would access memory outside of the bounds of the element selected by the index marked as inrange. The result of a pointer comparison or ptrtoint (including ptrtoint-like operations involving memory) involving a pointer derived from a getelementptr with the inrange keyword is undefined, with the exception of comparisons in the case where both operands are in the range of the element selected by the inrange keyword, inclusive of the address one past the end of that element. Note that the inrange keyword is currently only allowed in constant getelementptr expressions.
The getelementptr instruction is often confusing. For some more insight into how it works, see the getelementptr FAQ.
%aptr = getelementptr {i32, [12 x i8]}, ptr %saptr, i64 0, i32 1
%vptr = getelementptr {i32, <2 x i8>}, ptr %svptr, i64 0, i32 1, i32 1
%eptr = getelementptr [12 x i8], ptr %aptr, i64 0, i32 1
%iptr = getelementptr [10 x i32], ptr @arr, i16 0, i16 0
The getelementptr returns a vector of pointers, instead of a single address, when one or more of its arguments is a vector. In such cases, all vector arguments should have the same number of elements, and every scalar argument will be effectively broadcast into a vector during address calculation.
; All arguments are vectors:
; A[i] = ptrs[i] + offsets[i]*sizeof(i8)
%A = getelementptr i8, <4 x ptr> %ptrs, <4 x i64> %offsets
; Add the same scalar offset to each pointer of a vector:
; A[i] = ptrs[i] + offset*sizeof(i8)
%A = getelementptr i8, <4 x ptr> %ptrs, i64 %offset
; Add distinct offsets to the same pointer:
; A[i] = ptr + offsets[i]*sizeof(i8)
%A = getelementptr i8, ptr %ptr, <4 x i64> %offsets
; In all cases described above the type of the result is <4 x ptr>
The two following instructions are equivalent:
getelementptr %struct.ST, <4 x ptr> %s, <4 x i64> %ind1, <4 x i32> <i32 2, i32 2, i32 2, i32 2>, <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32> %ind4, <4 x i64> <i64 13, i64 13, i64 13, i64 13>
getelementptr %struct.ST, <4 x ptr> %s, <4 x i64> %ind1, i32 2, i32 1, <4 x i32> %ind4, i64 13
Let’s look at the C code, where the vector version of getelementptr makes sense:
// Let's assume that we vectorize the following loop:
double *A, *B; int *C;
for (int i = 0; i < size; ++i) {
  A[i] = B[C[i]];
}
; get pointers for 8 elements from array B
%ptrs = getelementptr double, ptr %B, <8 x i32> %C
; load 8 elements from array B into A
%A = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x ptr> %ptrs, i32 8, <8 x i1> %mask, <8 x double> %passthru)
‘trunc .. to’ Instruction
<result> = trunc <ty> <value> to <ty2> ; yields ty2
The ‘trunc’ instruction truncates its operand to the type ty2.
The ‘trunc’ instruction takes a value to trunc, and a type to trunc it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be larger than the bit size of the destination type, ty2. Equal sized types are not allowed.
The ‘trunc’ instruction truncates the high order bits in value and converts the remaining bits to ty2. Since the source size must be larger than the destination size, trunc cannot be a no-op cast. It will always truncate bits.
%X = trunc i32 257 to i8                        ; yields i8:1
%Y = trunc i32 123 to i1                        ; yields i1:true
%Z = trunc i32 122 to i1                        ; yields i1:false
%W = trunc <2 x i16> <i16 8, i16 7> to <2 x i8> ; yields <i8 8, i8 7>
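As a rough illustration of the rule above, truncation amounts to keeping only the low-order bits. The Python helper below is a hypothetical model (not LLVM code) that represents integers as unsigned bit patterns.

```python
def trunc(value: int, src_bits: int, dst_bits: int) -> int:
    """Keep the low dst_bits bits of value; the source must be strictly wider."""
    if src_bits <= dst_bits:
        raise ValueError("trunc requires the source type to be larger than the destination")
    return value & ((1 << dst_bits) - 1)
```

For instance, trunc(257, 32, 8) keeps only the low byte of 0x101.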
‘zext .. to’ Instruction
<result> = zext <ty> <value> to <ty2>             ; yields ty2
The ‘zext
’ instruction zero extends its operand to type ty2
.
The ‘zext’ instruction takes a value to cast and a type to cast it to. Both types must be integer types, or vectors of the same number of integers. The bit size of the value must be smaller than the bit size of the destination type, ty2.
The ‘zext’ instruction fills the high-order bits of the value with zero bits until it reaches the size of the destination type, ty2.
When zero extending from i1, the result will always be either 0 or 1.
%X = zext i32 257 to i64              ; yields i64:257
%Y = zext i1 true to i32              ; yields i32:1
%Z = zext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>
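A minimal Python sketch of zero extension, assuming integers are modelled as unsigned bit patterns; the function name zext is a hypothetical helper used only for illustration.

```python
def zext(value: int, src_bits: int, dst_bits: int) -> int:
    """Reinterpret value as an unsigned src_bits pattern; high bits become zero."""
    if src_bits >= dst_bits:
        raise ValueError("zext requires the source type to be smaller than the destination")
    return value & ((1 << src_bits) - 1)
```

Because the new high-order bits are all zero, the numeric value of the unsigned pattern is unchanged; only a negative input such as -1 as i8 changes its reading (it becomes 255).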
‘sext .. to’ Instruction
<result> = sext <ty> <value> to <ty2>             ; yields ty2
The ‘sext
’ sign extends value
to the type ty2
.
The ‘sext’ instruction takes a value to cast and a type to cast it to. Both types must be integer types, or vectors of the same number of integers. The bit size of the value must be smaller than the bit size of the destination type, ty2.
The ‘sext’ instruction performs a sign extension by copying the sign bit (highest-order bit) of the value until it reaches the bit size of the type ty2.
When sign extending from i1, the extension always results in -1 or 0.
%X = sext i8  -1 to i16              ; yields i16:65535
%Y = sext i1 true to i32             ; yields i32:-1
%Z = sext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>
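The following Python sketch models sign extension on unsigned bit patterns. Both helper names (sext, as_signed) are hypothetical; they only illustrate why the same result can be read as 65535 or as -1.

```python
def sext(value: int, src_bits: int, dst_bits: int) -> int:
    """Copy the sign bit of the src_bits pattern into the new high-order bits."""
    if src_bits >= dst_bits:
        raise ValueError("sext requires the source type to be smaller than the destination")
    pattern = value & ((1 << src_bits) - 1)
    if pattern >> (src_bits - 1):  # sign bit is set
        pattern |= ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
    return pattern


def as_signed(pattern: int, bits: int) -> int:
    """Read an unsigned bit pattern as a two's-complement signed integer."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern
```

Here sext(-1, 8, 16) produces the pattern 0xFFFF, which is 65535 when read as unsigned and -1 when passed through as_signed.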
‘fptrunc .. to’ Instruction
<result> = fptrunc <ty> <value> to <ty2>             ; yields ty2
The ‘fptrunc
’ instruction truncates value
to type ty2
.
The ‘fptrunc’ instruction takes a floating-point value to cast and a floating-point type to cast it to. The size of value must be larger than the size of ty2. This implies that fptrunc cannot be used to make a no-op cast.
The ‘fptrunc’ instruction casts a value from a larger floating-point type to a smaller floating-point type. This instruction is assumed to execute in the default floating-point environment.
%X = fptrunc double 16777217.0 to float    ; yields float:16777216.0
%Y = fptrunc double 1.0E+300 to half       ; yields half:+infinity
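The first example can be reproduced in Python by round-tripping a double through the 32-bit float format with the standard struct module. This is an approximation of fptrunc for in-range values only: struct.pack raises OverflowError for values too large for float instead of producing infinity, so the second example is not modelled. The helper name is hypothetical.

```python
import struct


def fptrunc_double_to_float(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]
```

16777217.0 needs 25 significant bits, one more than a float provides, so it rounds to 16777216.0.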
‘fpext .. to’ Instruction
<result> = fpext <ty> <value> to <ty2>             ; yields ty2
The ‘fpext’ instruction extends a floating-point value to a larger floating-point value.
The ‘fpext’ instruction takes a floating-point value to cast and a floating-point type to cast it to. The source type must be smaller than the destination type.
The ‘fpext’ instruction extends the value from a smaller floating-point type to a larger floating-point type. The fpext cannot be used to make a no-op cast because it always changes bits. Use bitcast to make a no-op cast for a floating-point cast.
%X = fpext float 3.125 to double         ; yields double:3.125000e+00
%Y = fpext double %X to fp128            ; yields fp128:0xL00000000000000004000900000000000
‘fptoui .. to’ Instruction
<result> = fptoui <ty> <value> to <ty2>             ; yields ty2
The ‘fptoui’ instruction converts a floating-point value to its unsigned integer equivalent of type ty2.
The ‘fptoui’ instruction takes a value to cast, which must be a scalar or vector floating-point value, and a type to cast it to, ty2, which must be an integer type. If ty is a vector floating-point type, ty2 must be a vector integer type with the same number of elements as ty.
The ‘fptoui’ instruction converts its floating-point operand into the nearest (rounding towards zero) unsigned integer value. If the value cannot fit in ty2, the result is a poison value.
%X = fptoui double 123.0 to i32      ; yields i32:123
%Y = fptoui float 1.0E+300 to i1     ; yields poison (value does not fit in i1)
%Z = fptoui float 1.04E+17 to i8     ; yields poison (value does not fit in i8)
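A small Python model of the rule, offered as a sketch: the result is truncated towards zero, and None stands in for a poison value. The name fptoui and the use of None are assumptions made for illustration, and treating NaN and infinities as not fitting is this sketch's reading of "cannot fit".

```python
import math
from typing import Optional


def fptoui(x: float, dst_bits: int) -> Optional[int]:
    """Round x towards zero; return None (modelling poison) if it does not fit."""
    if math.isnan(x) or math.isinf(x):
        return None
    n = math.trunc(x)
    if 0 <= n < (1 << dst_bits):
        return n
    return None
```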
‘fptosi .. to’ Instruction
<result> = fptosi <ty> <value> to <ty2>             ; yields ty2
The ‘fptosi’ instruction converts floating-point value to type ty2.
The ‘fptosi’ instruction takes a value to cast, which must be a scalar or vector floating-point value, and a type to cast it to, ty2, which must be an integer type. If ty is a vector floating-point type, ty2 must be a vector integer type with the same number of elements as ty.
The ‘fptosi’ instruction converts its floating-point operand into the nearest (rounding towards zero) signed integer value. If the value cannot fit in ty2, the result is a poison value.
%X = fptosi double -123.0 to i32      ; yields i32:-123
%Y = fptosi float 1.0E-247 to i1      ; yields undefined:1
%Z = fptosi float 1.04E+17 to i8      ; yields poison (value does not fit in i8)
‘uitofp .. to’ Instruction
<result> = uitofp <ty> <value> to <ty2>             ; yields ty2
The ‘uitofp’ instruction regards value as an unsigned integer and converts that value to the ty2 type.
The ‘uitofp’ instruction takes a value to cast, which must be a scalar or vector integer value, and a type to cast it to, ty2, which must be a floating-point type. If ty is a vector integer type, ty2 must be a vector floating-point type with the same number of elements as ty.
The ‘uitofp’ instruction interprets its operand as an unsigned integer quantity and converts it to the corresponding floating-point value. If the value cannot be exactly represented, it is rounded using the default rounding mode.
%X = uitofp i32 257 to float         ; yields float:257.0
%Y = uitofp i8 -1 to double          ; yields double:255.0
‘sitofp .. to’ Instruction
<result> = sitofp <ty> <value> to <ty2>             ; yields ty2
The ‘sitofp’ instruction regards value as a signed integer and converts that value to the ty2 type.
The ‘sitofp’ instruction takes a value to cast, which must be a scalar or vector integer value, and a type to cast it to, ty2, which must be a floating-point type. If ty is a vector integer type, ty2 must be a vector floating-point type with the same number of elements as ty.
The ‘sitofp’ instruction interprets its operand as a signed integer quantity and converts it to the corresponding floating-point value. If the value cannot be exactly represented, it is rounded using the default rounding mode.
%X = sitofp i32 257 to float         ; yields float:257.0
%Y = sitofp i8 -1 to double          ; yields double:-1.0
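The difference between the unsigned and signed integer-to-floating-point conversions comes down to how the same bit pattern is read. The Python sketch below models both for a double destination; the helper names are hypothetical, and Python's int-to-float conversion (round to nearest, ties to even) is assumed to stand in for the default rounding mode.

```python
def uitofp(value: int, src_bits: int) -> float:
    """Read the src_bits pattern as unsigned, then convert to a double."""
    return float(value & ((1 << src_bits) - 1))


def sitofp(value: int, src_bits: int) -> float:
    """Read the src_bits pattern as two's-complement signed, then convert."""
    pattern = value & ((1 << src_bits) - 1)
    if pattern >> (src_bits - 1):
        pattern -= 1 << src_bits
    return float(pattern)
```

The i8 pattern 0xFF therefore converts to 255.0 with uitofp and to -1.0 with sitofp.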
‘ptrtoint .. to’ Instruction
<result> = ptrtoint <ty> <value> to <ty2>             ; yields ty2
The ‘ptrtoint’ instruction converts the pointer or vector of pointers value to the integer (or vector of integers) type ty2.
The ‘ptrtoint’ instruction takes a value to cast, which must be a value of type pointer or a vector of pointers, and a type to cast it to, ty2, which must be an integer or a vector of integers type.
The ‘ptrtoint’ instruction converts value to integer type ty2 by interpreting the pointer value as an integer and either truncating or zero extending that value to the size of the integer type. If value is smaller than ty2 then a zero extension is done. If value is larger than ty2 then a truncation is done. If they are the same size, then nothing is done (no-op cast) other than a type change.
%X = ptrtoint ptr %P to i8                         ; yields truncation on 32-bit architecture
%Y = ptrtoint ptr %P to i64                        ; yields zero extension on 32-bit architecture
%Z = ptrtoint <4 x ptr> %P to <4 x i64>            ; yields vector zero extension for a vector of addresses on 32-bit architecture
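The three size cases can be sketched in Python by treating a pointer as an unsigned integer of ptr_bits width. The function and parameter names are hypothetical; a real target takes the pointer width from its data layout.

```python
def ptrtoint(address: int, ptr_bits: int, int_bits: int) -> int:
    """Truncate or zero extend a pointer-sized value to int_bits."""
    address &= (1 << ptr_bits) - 1
    if int_bits < ptr_bits:
        return address & ((1 << int_bits) - 1)  # truncation
    return address  # zero extension, or a no-op when the sizes match
```

On a model 32-bit target, ptrtoint(0x12345678, 32, 8) keeps only 0x78, while widening to 64 bits leaves the address value unchanged.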
‘inttoptr .. to’ Instruction
<result> = inttoptr <ty> <value> to <ty2>[, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>]             ; yields ty2
The ‘inttoptr’ instruction converts an integer value to a pointer type, ty2.
The ‘inttoptr’ instruction takes an integer value to cast, and a type to cast it to, which must be a pointer type.
The optional !dereferenceable metadata must reference a single metadata name <deref_bytes_node> corresponding to a metadata node with one i64 entry. See dereferenceable metadata.
The optional !dereferenceable_or_null metadata must reference a single metadata name <deref_bytes_node> corresponding to a metadata node with one i64 entry. See dereferenceable_or_null metadata.
The ‘inttoptr’ instruction converts value to type ty2 by applying either a zero extension or a truncation depending on the size of the integer value. If value is larger than the size of a pointer then a truncation is done. If value is smaller than the size of a pointer then a zero extension is done. If they are the same size, nothing is done (no-op cast).
%X = inttoptr i32 255 to ptr           ; yields zero extension on 64-bit architecture
%Y = inttoptr i32 255 to ptr           ; yields no-op on 32-bit architecture
%Z = inttoptr i64 0 to ptr             ; yields truncation on 32-bit architecture
%Z = inttoptr <4 x i32> %G to <4 x ptr>; yields truncation of vector G to four pointers
‘bitcast .. to’ Instruction
<result> = bitcast <ty> <value> to <ty2>             ; yields ty2
The ‘bitcast’ instruction converts value to type ty2 without changing any bits.
The ‘bitcast’ instruction takes a value to cast, which must be a non-aggregate first class value, and a type to cast it to, which must also be a non-aggregate first class type. The bit sizes of value and the destination type, ty2, must be identical. If the source type is a pointer, the destination type must also be a pointer of the same size. This instruction supports bitwise conversion of vectors to integers and to vectors of other types (as long as they have the same size).
The ‘bitcast’ instruction converts value to type ty2. It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2. Pointer (or vector of pointers) types may only be converted to other pointer (or vector of pointers) types with the same address space through this instruction. To convert pointers to other types, use the inttoptr or ptrtoint instructions first.
There is a caveat for bitcasts involving vector types in relation to endianness. For example, bitcast <2 x i8> <value> to i16 puts element zero of the vector in the least significant bits of the i16 for little-endian while element zero ends up in the most significant bits for big-endian.
%X = bitcast i8 255 to i8          ; yields i8 :-1
%Y = bitcast i32* %x to i16*       ; yields i16*:%x
%Z = bitcast <2 x i32> %V to i64   ; yields i64: %V (depends on endianness)
%Z = bitcast <2 x i32*> %V to <2 x i64*>    ; yields <2 x i64*>
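The endianness caveat can be demonstrated with Python's int.from_bytes, which mimics "store to memory and read back" for a <2 x i8> vector reinterpreted as an i16. The helper name is hypothetical and the sketch covers only this one pair of types.

```python
from typing import List


def bitcast_2xi8_to_i16(elements: List[int], byteorder: str) -> int:
    """Lay the two i8 elements out in memory order and read them as one i16."""
    if len(elements) != 2:
        raise ValueError("expected exactly two i8 elements")
    return int.from_bytes(bytes(elements), byteorder)
```

For the vector <i8 1, i8 2>, the little-endian reading is 0x0201 (element zero in the least significant bits) and the big-endian reading is 0x0102.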
‘addrspacecast .. to’ Instruction
<result> = addrspacecast <pty> <ptrval> to <pty2>       ; yields pty2
The ‘addrspacecast’ instruction converts ptrval from pty in address space n to type pty2 in address space m.
The ‘addrspacecast’ instruction takes a pointer or vector of pointer value to cast and a pointer type to cast it to, which must have a different address space.
The ‘addrspacecast’ instruction converts the pointer value ptrval to type pty2. It can be a no-op cast or a complex value modification, depending on the target and the address space pair. Pointer conversions within the same address space must be performed with the bitcast instruction. Note that if the address space conversion produces a dereferenceable result then both result and operand refer to the same memory location. The conversion must have no side effects, and must not capture the value of the pointer.
If the source is poison, the result is poison.
If the source is not poison, and both source and destination are integral pointers, and the result pointer is dereferenceable, the cast is assumed to be reversible (i.e. casting the result back to the original address space should yield the original bit pattern).
%X = addrspacecast ptr %x to ptr addrspace(1)
%Y = addrspacecast ptr addrspace(1) %y to ptr addrspace(2)
%Z = addrspacecast <4 x ptr> %z to <4 x ptr addrspace(3)>
‘icmp’ Instruction
<result> = icmp <cond> <ty> <op1>, <op2>   ; yields i1 or <N x i1>:result
The ‘icmp’ instruction returns a boolean value or a vector of boolean values based on comparison of its two integer, integer vector, pointer, or pointer vector operands.
The ‘icmp’ instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is not a value, just a keyword. The possible condition codes are:
- eq: equal
- ne: not equal
- ugt: unsigned greater than
- uge: unsigned greater or equal
- ult: unsigned less than
- ule: unsigned less or equal
- sgt: signed greater than
- sge: signed greater or equal
- slt: signed less than
- sle: signed less or equal
The remaining two arguments must be integer or pointer or integer vector typed. They must also be identical types.
The ‘icmp’ instruction compares op1 and op2 according to the condition code given as cond. The comparison performed always yields either an i1 or vector of i1 result, as follows:
- eq: yields true if the operands are equal, false otherwise. No sign interpretation is necessary or performed.
- ne: yields true if the operands are unequal, false otherwise. No sign interpretation is necessary or performed.
- ugt: interprets the operands as unsigned values and yields true if op1 is greater than op2.
- uge: interprets the operands as unsigned values and yields true if op1 is greater than or equal to op2.
- ult: interprets the operands as unsigned values and yields true if op1 is less than op2.
- ule: interprets the operands as unsigned values and yields true if op1 is less than or equal to op2.
- sgt: interprets the operands as signed values and yields true if op1 is greater than op2.
- sge: interprets the operands as signed values and yields true if op1 is greater than or equal to op2.
- slt: interprets the operands as signed values and yields true if op1 is less than op2.
- sle: interprets the operands as signed values and yields true if op1 is less than or equal to op2.
If the operands are pointer typed, the pointer values are compared as if they were integers.
If the operands are integer vectors, then they are compared element by element. The result is an i1 vector with the same number of elements as the values being compared. Otherwise, the result is an i1.
<result> = icmp eq i32 4, 5          ; yields: result=false
<result> = icmp ne ptr %X, %X        ; yields: result=false
<result> = icmp ult i16  4, 5        ; yields: result=true
<result> = icmp sgt i16  4, 5        ; yields: result=false
<result> = icmp ule i16 -4, 5        ; yields: result=false
<result> = icmp sge i16  4, 5        ; yields: result=false
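The signed versus unsigned predicates can be modelled in Python by reading the same bit pattern two ways. This is only a sketch for scalar integers; the function name icmp and its argument order are assumptions made for illustration.

```python
import operator

_REL = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def icmp(cond: str, bits: int, op1: int, op2: int) -> bool:
    """Compare two integers of the given width under an icmp condition code."""
    mask = (1 << bits) - 1
    a, b = op1 & mask, op2 & mask  # unsigned bit patterns
    if cond == "eq":
        return a == b
    if cond == "ne":
        return a != b
    sign, rel = cond[0], cond[1:]
    if sign not in ("u", "s") or rel not in _REL:
        raise ValueError("unknown condition code: " + cond)
    if sign == "s":
        # Reinterpret both patterns as two's-complement signed values.
        a = a - (1 << bits) if a >> (bits - 1) else a
        b = b - (1 << bits) if b >> (bits - 1) else b
    return _REL[rel](a, b)
```

This reproduces the ule line in the example above: as an unsigned i16, -4 is 65532, which is not less than or equal to 5, whereas the signed sle comparison of the same operands is true.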
‘fcmp’ Instruction
<result> = fcmp [fast-math flags]* <cond> <ty> <op1>, <op2>     ; yields i1 or <N x i1>:result
The ‘fcmp’ instruction returns a boolean value or vector of boolean values based on comparison of its operands.
If the operands are floating-point scalars, then the result type is a boolean (i1).
If the operands are floating-point vectors, then the result type is a vector of boolean with the same number of elements as the operands being compared.
The ‘fcmp’ instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is not a value, just a keyword. The possible condition codes are:
- false: no comparison, always returns false
- oeq: ordered and equal
- ogt: ordered and greater than
- oge: ordered and greater than or equal
- olt: ordered and less than
- ole: ordered and less than or equal
- one: ordered and not equal
- ord: ordered (no nans)
- ueq: unordered or equal
- ugt: unordered or greater than
- uge: unordered or greater than or equal
- ult: unordered or less than
- ule: unordered or less than or equal
- une: unordered or not equal
- uno: unordered (either nans)
- true: no comparison, always returns true
Ordered means that neither operand is a QNAN while unordered means that either operand may be a QNAN.
Each of the op1 and op2 arguments must be either a floating-point type or a vector of floating-point type. They must have identical types.
The ‘fcmp’ instruction compares op1 and op2 according to the condition code given as cond. If the operands are vectors, then the vectors are compared element by element. Each comparison performed always yields an i1 result, as follows:
- false: always yields false, regardless of operands.
- oeq: yields true if both operands are not a QNAN and op1 is equal to op2.
- ogt: yields true if both operands are not a QNAN and op1 is greater than op2.
- oge: yields true if both operands are not a QNAN and op1 is greater than or equal to op2.
- olt: yields true if both operands are not a QNAN and op1 is less than op2.
- ole: yields true if both operands are not a QNAN and op1 is less than or equal to op2.
- one: yields true if both operands are not a QNAN and op1 is not equal to op2.
- ord: yields true if both operands are not a QNAN.
- ueq: yields true if either operand is a QNAN or op1 is equal to op2.
- ugt: yields true if either operand is a QNAN or op1 is greater than op2.
- uge: yields true if either operand is a QNAN or op1 is greater than or equal to op2.
- ult: yields true if either operand is a QNAN or op1 is less than op2.
- ule: yields true if either operand is a QNAN or op1 is less than or equal to op2.
- une: yields true if either operand is a QNAN or op1 is not equal to op2.
- uno: yields true if either operand is a QNAN.
- true: always yields true, regardless of operands.
The fcmp instruction can also optionally take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.
Any set of fast-math flags are legal on an fcmp instruction, but the only flags that have any effect on its semantics are those that allow assumptions to be made about the values of input arguments; namely nnan, ninf, and reassoc. See Fast-Math Flags for more information.
<result> = fcmp oeq float 4.0, 5.0    ; yields: result=false
<result> = fcmp one float 4.0, 5.0    ; yields: result=true
<result> = fcmp olt float 4.0, 5.0    ; yields: result=true
<result> = fcmp ueq double 1.0, 2.0   ; yields: result=false
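The ordered and unordered condition codes differ only in what they return when a NaN is involved. The Python sketch below models scalar fcmp without fast-math flags; the function name and signature are hypothetical.

```python
import math
import operator

_REL = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
}


def fcmp(cond: str, op1: float, op2: float) -> bool:
    """Evaluate an fcmp condition code on two Python floats."""
    if cond == "false":
        return False
    if cond == "true":
        return True
    unordered = math.isnan(op1) or math.isnan(op2)
    if cond == "ord":
        return not unordered
    if cond == "uno":
        return unordered
    kind, rel = cond[0], cond[1:]
    if kind not in ("o", "u") or rel not in _REL:
        raise ValueError("unknown condition code: " + cond)
    if unordered:
        # Ordered predicates fail on NaN; unordered predicates succeed.
        return kind == "u"
    return _REL[rel](op1, op2)
```

For example, fcmp("ueq", float("nan"), 1.0) is true while fcmp("oeq", float("nan"), float("nan")) is false.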
‘phi’ Instruction
<result> = phi [fast-math-flags] <ty> [ <val0>, <label0>], ...
The ‘phi’ instruction is used to implement the φ node in the SSA graph representing the function.
The type of the incoming values is specified with the first type field. After this, the ‘phi’ instruction takes a list of pairs as arguments, with one pair for each predecessor basic block of the current block. Only values of first class type may be used as the value arguments to the PHI node. Only labels may be used as the label arguments.
There must be no non-phi instructions between the start of a basic block and the PHI instructions: i.e. PHI instructions must be first in a basic block.
For the purposes of the SSA form, the use of each incoming value is deemed to occur on the edge from the corresponding predecessor block to the current block (but after any definition of an ‘invoke’ instruction’s return value on the same edge).
The optional fast-math-flags marker indicates that the phi has one or more fast-math-flags. These are optimization hints to enable otherwise unsafe floating-point optimizations. Fast-math-flags are only valid for phis that return a floating-point scalar or vector type, or an array (nested to any depth) of floating-point scalar or vector types.
At runtime, the ‘phi’ instruction logically takes on the value specified by the pair corresponding to the predecessor basic block that executed just prior to the current block.
Loop:       ; Infinite loop that counts from 0 on up...
  %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
  %nextindvar = add i32 %indvar, 1
  br label %Loop
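The runtime behaviour of the loop above can be traced with a tiny Python simulation in which a phi simply picks the incoming value associated with the block that ran last. The helpers phi and run_loop are hypothetical, and the loop is cut off after a fixed number of iterations since the original never terminates.

```python
from typing import Dict, List, Optional


def phi(incoming: Dict[str, Optional[int]], predecessor: str) -> Optional[int]:
    """Select the incoming value paired with the predecessor block."""
    return incoming[predecessor]


def run_loop(iterations: int) -> List[int]:
    """Simulate the %Loop block, recording %indvar on each iteration."""
    seen = []
    predecessor = "LoopHeader"  # the first entry comes from %LoopHeader
    nextindvar = None           # not yet defined on the first entry
    for _ in range(iterations):
        indvar = phi({"LoopHeader": 0, "Loop": nextindvar}, predecessor)
        nextindvar = (indvar + 1) & 0xFFFFFFFF  # add i32 wraps at 32 bits
        seen.append(indvar)
        predecessor = "Loop"    # br label %Loop: the block is its own predecessor
    return seen
```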
‘select’ Instruction
<result> = select [fast-math flags] selty <cond>, <ty> <val1>, <ty> <val2>             ; yields ty

selty is either i1 or {<N x i1>}
The ‘select’ instruction is used to choose one value based on a condition, without IR-level branching.
The ‘select’ instruction requires an ‘i1’ value or a vector of ‘i1’ values indicating the condition, and two values of the same first class type.
The optional fast-math flags marker indicates that the select has one or more fast-math flags. These are optimization hints to enable otherwise unsafe floating-point optimizations. Fast-math flags are only valid for selects that return a floating-point scalar or vector type, or an array (nested to any depth) of floating-point scalar or vector types.
If the condition is an i1 and it evaluates to 1, the instruction returns the first value argument; otherwise, it returns the second value argument.
If the condition is a vector of i1, then the value arguments must be vectors of the same size, and the selection is done element by element.
If the condition is an i1 and the value arguments are vectors of the same size, then an entire vector is selected.
%X = select i1 true, i8 17, i8 42 ; yields i8:17
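The three cases described above (scalar condition with scalar values, vector condition, and scalar condition with vector values) can be sketched in Python, using lists to stand in for vectors. The function name select is a hypothetical helper.

```python
from typing import List, Union

Value = Union[int, List[int]]


def select(cond: Union[bool, List[bool]], val1: Value, val2: Value) -> Value:
    """Choose between val1 and val2, element by element for a vector condition."""
    if isinstance(cond, list):
        if not (len(cond) == len(val1) == len(val2)):
            raise ValueError("condition and value vectors must have the same size")
        return [a if c else b for c, a, b in zip(cond, val1, val2)]
    # A scalar i1 condition selects an entire value, scalar or vector.
    return val1 if cond else val2
```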
‘freeze’ Instruction
<result> = freeze ty <val>    ; yields ty:result
The ‘freeze’ instruction takes a single argument.
If the argument is undef or poison, ‘freeze’ returns an arbitrary, but fixed, value of type ‘ty’. Otherwise, this instruction is a no-op and returns the input argument. All uses of a value returned by the same ‘freeze’ instruction are guaranteed to always observe the same value, while different ‘freeze’ instructions may yield different values.
While undef and poison pointers can be frozen, the result is a non-dereferenceable pointer. See the Pointer Aliasing Rules section for more information.
If an aggregate value or vector is frozen, the operand is frozen element-wise. The padding of an aggregate isn’t considered, since it isn’t visible without storing it into memory and loading it with a different type.
%w = i32 undef
%x = freeze i32 %w
%y = add i32 %w, %w         ; undef
%z = add i32 %x, %x         ; even number because all uses of %x observe
                            ; the same value
%x2 = freeze i32 %w
%cmp = icmp eq i32 %x, %x2  ; can be true or false

; example with vectors
%v = <2 x i32> <i32 undef, i32 poison>
%a = extractelement <2 x i32> %v, i32 0    ; undef
%b = extractelement <2 x i32> %v, i32 1    ; poison
%add = add i32 %a, %a                      ; undef

%v.fr = freeze <2 x i32> %v                ; element-wise freeze
%d = extractelement <2 x i32> %v.fr, i32 0 ; not undef
%add.f = add i32 %d, %d                    ; even number

; branching on frozen value
%poison = add nsw i1 %k, undef   ; poison
%c = freeze i1 %poison
br i1 %c, label %foo, label %bar ; non-deterministic branch to %foo or %bar
‘call’ Instruction
<result> = [tail | musttail | notail ] call [fast-math flags] [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs] [ operand bundles ]
The ‘call
’ instruction represents a simple function call.
This instruction requires several arguments:
The optional tail and musttail markers indicate that the optimizers should perform tail call optimization. The tail marker is a hint that can be ignored. The musttail marker means that the call must be tail call optimized in order for the program to be correct. This is true even in the presence of attributes like “disable-tail-calls”. The musttail marker provides these guarantees:
- If the musttail call appears in a function with the "thunk" attribute and the caller and callee both have varargs, then any unprototyped arguments in register or memory are forwarded to the callee. Similarly, the return value of the callee is returned to the caller’s caller, even if a void return type is in use.
Both markers imply that the callee does not access allocas from the caller. The tail marker additionally implies that the callee does not access varargs from the caller. Calls marked musttail must obey the following additional rules:
In addition, if the calling convention is not swifttailcc or tailcc:
- All ABI-impacting function attributes, such as sret, byval, inreg, returned, and inalloca, must match.
- The caller and callee prototypes must match. Pointer types of parameters or return types may differ in pointee type, but not in address space.
On the other hand, if the calling convention is swifttailcc or tailcc:
- Only these ABI-impacting attributes are allowed: sret, byval, swiftself, and swiftasync.
- Prototypes are not required to match.
Tail call optimization for calls marked tail is guaranteed to occur if the following conditions are met:
- Caller and callee both have the calling convention fastcc or tailcc.
- The call is in tail position (ret immediately follows call and ret uses value of call or is void).
- Option -tailcallopt is enabled, llvm::GuaranteedTailCallOpt is true, or the calling convention is tailcc.
- Platform-specific constraints are met.
- The optional notail marker indicates that the optimizers should not add tail or musttail markers to the call. It is used to prevent tail call optimization from being performed on the call.
- The optional fast-math flags marker indicates that the call has one or more fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations. Fast-math flags are only valid for calls that return a floating-point scalar or vector type, or an array (nested to any depth) of floating-point scalar or vector types.
- For the return value attributes (ret attrs), only the ‘zeroext’, ‘signext’, and ‘inreg’ attributes are valid here.
- ‘ty’: the type of the call instruction itself which is also the type of the return value. Functions that return no value are marked void.
- ‘fnty’: shall be the signature of the function being called. The argument types must match the types implied by this signature. This type can be omitted if the function is not varargs.
- ‘fnptrval’: An LLVM value containing a pointer to a function to be called. In most cases, this is a direct function call, but indirect calls are just as possible, calling an arbitrary pointer to function value.
- ‘function args’: argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.
The ‘call’ instruction is used to cause control flow to transfer to a specified function, with its incoming arguments bound to the specified values. Upon a ‘ret’ instruction in the called function, control flow continues with the instruction after the function call, and the return value of the function is bound to the result argument.
%retval = call i32 @test(i32 %argc)
call i32 (ptr, ...) @printf(ptr %msg, i32 12, i8 42)        ; yields i32
%X = tail call i32 @foo()                                    ; yields i32
%Y = tail call fastcc i32 @foo()                             ; yields i32
call void %foo(i8 signext 97)

%struct.A = type { i32, i8 }
%r = call %struct.A @foo()                        ; yields { i32, i8 }
%gr = extractvalue %struct.A %r, 0                ; yields i32
%gr1 = extractvalue %struct.A %r, 1               ; yields i8
%Z = call void @foo() noreturn                    ; indicates that @foo never returns normally
%ZZ = call zeroext i32 @bar()                     ; Return value is zero extended
LLVM treats calls to some functions with names and arguments that match the standard C99 library as being the C99 library functions, and may perform optimizations or generate code for them under that assumption. This is something we’d like to change in the future to provide better support for freestanding environments and non-C-based languages.
‘va_arg’ Instruction
<resultval> = va_arg <va_list*> <arglist>, <argty>
The ‘va_arg’ instruction is used to access arguments passed through the “variable argument” area of a function call. It is used to implement the va_arg macro in C.
This instruction takes a va_list* value and the type of the argument. It returns a value of the specified argument type and increments the va_list to point to the next argument. The actual type of va_list is target specific.
The ‘va_arg’ instruction loads an argument of the specified type from the specified va_list and causes the va_list to point to the next argument. For more information, see the variable argument handling Intrinsic Functions.
It is legal for this instruction to be called in a function which does not take a variable number of arguments, for example, the vfprintf function.
va_arg is an LLVM instruction instead of an intrinsic function because it takes a type as an argument.
See the variable argument processing section.
Note that the code generator does not yet fully support va_arg on many targets. Also, it does not currently support va_arg with aggregate types on any target.
‘landingpad’ Instruction
<resultval> = landingpad <resultty> <clause>+
<resultval> = landingpad <resultty> cleanup <clause>*

<clause> := catch <type> <value>
<clause> := filter <array constant type> <array constant>
The ‘landingpad’ instruction is used by LLVM’s exception handling system to specify that a basic block is a landing pad: one where the exception lands, and corresponds to the code found in the catch portion of a try/catch sequence. It defines values supplied by the personality function upon re-entry to the function. The resultval has the type resultty.
The optional cleanup flag indicates that the landing pad block is a cleanup.
A clause begins with the clause type (catch or filter) and contains the global variable representing the “type” that may be caught or filtered respectively. Unlike the catch clause, the filter clause takes an array constant as its argument. Use “[0 x ptr] undef” for a filter which cannot throw. The ‘landingpad’ instruction must contain at least one clause or the cleanup flag.
The ‘landingpad’ instruction defines the values which are set by the personality function upon re-entry to the function, and therefore the “result type” of the landingpad instruction. As with calling conventions, how the personality function results are represented in LLVM IR is target specific.
The clauses are applied in order from top to bottom. If two landingpad instructions are merged together through inlining, the clauses from the calling function are appended to the list of clauses. When the call stack is being unwound due to an exception being thrown, the exception is compared against each clause in turn. If it doesn’t match any of the clauses, and the cleanup flag is not set, then unwinding continues further up the call stack.
The landingpad instruction has several restrictions:
- A landing pad block is a basic block which is the unwind destination of an ‘invoke’ instruction.
- A landing pad block must have a ‘landingpad’ instruction as its first non-PHI instruction.
- There can be only one ‘landingpad’ instruction within the landing pad block.
- A basic block that is not a landing pad block may not include a ‘landingpad’ instruction.
;; A landing pad which can catch an integer.
%res = landingpad { ptr, i32 }
         catch ptr @_ZTIi
;; A landing pad that is a cleanup.
%res = landingpad { ptr, i32 }
         cleanup
;; A landing pad which can catch an integer and can only throw a double.
%res = landingpad { ptr, i32 }
         catch ptr @_ZTIi
         filter [1 x ptr] [ptr @_ZTId]
‘catchpad’ Instruction
<resultval> = catchpad within <catchswitch> [<args>*]
The ‘catchpad’ instruction is used by LLVM’s exception handling system to specify that a basic block begins a catch handler: one where a personality routine attempts to transfer control to catch an exception.
The catchswitch operand must always be a token produced by a catchswitch instruction in a predecessor block. This ensures that each catchpad has exactly one predecessor block, and it always terminates in a catchswitch.
The args correspond to whatever information the personality routine requires to know if this is an appropriate handler for the exception. Control will transfer to the catchpad if this is the first appropriate handler for the exception.
The resultval has the type token and is used to match the catchpad to corresponding catchrets and other nested EH pads.
When the call stack is being unwound due to an exception being thrown, the exception is compared against the args. If it doesn’t match, control will not reach the catchpad instruction. The representation of args is entirely target and personality function-specific.
Like the landingpad instruction, the catchpad instruction must be the first non-phi of its parent basic block.
The meaning of the tokens produced and consumed by catchpad and other “pad” instructions is described in the Windows exception handling documentation.
When a catchpad has been “entered” but not yet “exited” (as described in the EH documentation), it is undefined behavior to execute a call or invoke that does not carry an appropriate “funclet” bundle.
dispatch:
  %cs = catchswitch within none [label %handler0] unwind to caller
  ;; A catch block which can catch an integer.
handler0:
  %tok = catchpad within %cs [ptr @_ZTIi]
‘cleanuppad’ Instruction
<resultval> = cleanuppad within <parent> [<args>*]
The ‘cleanuppad’ instruction is used by LLVM’s exception handling system to specify that a basic block is a cleanup block: one where a personality routine attempts to transfer control to run cleanup actions. The args correspond to whatever additional information the personality function requires to execute the cleanup. The resultval has the type token and is used to match the cleanuppad to corresponding cleanuprets. The parent argument is the token of the funclet that contains the cleanuppad instruction. If the cleanuppad is not inside a funclet, this operand may be the token none.
The instruction takes a list of arbitrary values which are interpreted by the personality function.
When the call stack is being unwound due to an exception being thrown, the personality function transfers control to the cleanuppad with the aid of the personality-specific arguments. As with calling conventions, how the personality function results are represented in LLVM IR is target specific.
The cleanuppad instruction has several restrictions:
- A cleanup block must have a ‘cleanuppad’ instruction as its first non-PHI instruction.
- There can be only one ‘cleanuppad’ instruction within the cleanup block.
- A basic block that is not a cleanup block may not include a ‘cleanuppad’ instruction.
When a cleanuppad has been “entered” but not yet “exited” (as described in the EH documentation), it is undefined behavior to execute a call or invoke that does not carry an appropriate “funclet” bundle.
%tok = cleanuppad within %cs []