帮帮忙 SUBP: MOV R1,A 中 SUBPapec是什么意思思啊

点击联系发帖人 时间：2014-10-26 06:07

mov是什么格式

PDP-11 target for lcc
A PDP-11 target for lcc
I have made this commentary as a straightforward case study
which I hope might assist anyone else who is trying to retarget lcc.
I also hope that experts are willing to enlarge on its deficiences
and thereby improve my understanding of lcc (and even the PDP-11 :)
Please be free with constructive criticism.
Also, if anyone has suggestions to better explain the process of
retargeting, please let me know. Perhaps this can be the basis
for a tutorial on that subject.
I believe there already exists (somewhere) a PDP-11 back-end,
which I have not seen. Knowing this didn't deter me from going through this
self-educational exercise, however. It was also an opportunity to complete
my knowledge of .
If the other implementation surfaces, it may provide an interesting comparison.
I found producing the back-end challenging and at times very
frustrating. This is not to say lcc is at fault -
it's extremely well put together, but despite the many assertions
already in lcc, a machine description can be difficult to debug.
To forestall inevitable criticisms of the generated code:
I am aware of very many imperfections, some no doubt
due to bugs (or omissions) in the machine description, but others
are certainly due to the broad approach taken by lcc. Targeting
smaller architectures also rubs against some of its design assumptions.
Code sample
The following small function exploits several PDP-11-specific idioms.
unsigned countbits(unsigned v){
register unsigned t=v,n;
for( n=0 ; t>>=1 )
/* for better methods, see
http://www.caam.rice.edu/~dougm/twiddle/BitCount.html */
countbits:
mov fp,-(sp) ; save frame pointer
mov r2,-(sp) ; save
mov r3,-(sp) ; save
mov 4(fp),r3 ; reg <- opd
mov r2,r0 ; LOADU2
mov (sp)+,r3 ; restore
mov (sp)+,r2 ; restore
mov (sp)+, restore frame pointer
(For assembly listing, see .)
However, within these parameters, I think the generated code
is not too bad (I would like to compare in detail to other PDP compilers
Where there are exceptions I've tried to explain how they arise.
Note that the instruction templates below include many assembler comments
which I've found useful during debugging. Since this is a beta release,
it would be premature to remove them.
Limitations
For instance, I believe it is difficult to make use
of the PDP's auto-increment and auto-decrement modes, because lcc
must represent the increment and dereference as separate trees,
precluding templates. In any case, these PDP addressing modes
apply only to operations on pointers. (A peephole pass after tree
covering might be able to retrofit increments in obvious situations.
Certainly a peephole pass would remove some redundant moves
that lcc wants to emit.)
lcc's disdain for register starved architectures has so far
proved a showstopper to the implementation of long arithmetic.
Since I have decided to use a permanent frame pointer (R5),
five registers 0-4 remain. For char and short operations,
this can mean two variables and three temporaries. However,
for long operations, this allows only two register temporaries
(it would be foolish to try to allocate long variables
to registers) - and lcc simply hangs under such tight constraints.
It's not possible to write working templates for some byte operations,
such as INCB/DECB; lcc's automatic insertion of LOAD operators
breaks the match [e.g. ASGNI1(mem,ADDI2(CVII2(opd),con1))&]
Casting a constant to void* causes a fatal error, perhaps related to missing long support.
sizeof causes a fatal error, related to missing long support.
The machine description
The machine description, pdp11.md is input to lburg.
Like conventional parser descriptions it begins
with a C passthrough section bracketed by %{ and }%.
This is standard boilerplate. Let there be no confusion about the license:
This file is part of lcc-pdp11.md, an lcc machine description for PDP-11.
Copyright (C) 2004 Toby Thain, .au
This prog you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software F either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
alo if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
0.1b1, 16 Nov 2004
0.2b1, 16 Nov 2004 - fix costs on CNST, EQ(BAND(),0)
0.3b1, 16 Nov 2004 - fix local(), detect framesize==2 in function()
0.4b1, 16 Nov 2004 - added rules for the BCOMU2(ADDU2(x,1)), lcc's "negate unsigned";
put nonzero costs on INC/DEC rules
0.5b2, 18 Nov 2004 - shift-by-8 optimisations
0.6b1, 20 Nov 2004 - fix ASH generation bugs, register saving etc
0.6b2, 20 Nov 2004 - strip out some "long" rules
5 Dec 2004 - delete costs for base address rules.
also fix statics in function scope.
5 Dec 2004 - fix JMP/BR rules
5 Dec 2004 - improve handling of pointer constants (addresses of globals)
5 Dec 2004 - put all CALL variants in rules (don't use emit2)
#include &stdlib.h&
#include "c.h"
Next some constants that define the planned register layout.
The PDP-11 has eight general purpose registers. R7 and R6 are the
program counter (PC) and system stack pointer (SP) respectively,
which leaves us with six, R0-R5. I've decided to use R5 as a frame
pointer, leaving five registers for variables and temporaries.
#define INTTMP 0x13
/* temps R4,R0/1 - three short or one long */
#define INTVAR (0x1f ^ INTTMP)
/* rvars R2/3 - two short or one long */
#define FLTTMP 0x31
/* float temps */
#define FLTVAR (0x3f ^ FLTTMP)
/* the rest float vars */
The following macros and function declarations are standard boilerplate.
#define NODEPTR_TYPE Node
#define OP_LABEL(p) ((p)->op)
#define LEFT_CHILD(p) ((p)->kids[0])
#define RIGHT_CHILD(p) ((p)->kids[1])
#define STATE_LABEL(p) ((p)->x.state)
static void address(Symbol, Symbol, long);
... some prototypes omitted ...
static void target(Node);
Declarations of "cost functions" used by the templates in this file:
static int memop(Node);
static int cnstrhs(Node);
static int sametree(Node,Node);
Now variables representing the machine registers. Although there
are fewer than 32 registers, I superstitiously leave these arrays
with 32 lcc uses many 32-bit vectors to describe
register sets. The divd register is explained below in
progbeg(). The "wildcards" iregw,lregw,fregw
define allocable register sets for short, long and floating point
variables respectively.
static Symbol ireg[32],lreg[32],freg[32],divd,iregw,lregw,
Next come lburg's terminal definitions. These are generated
by the utility ops (in etc directory). Each term encodes
an operator, type (I,U,P,V,F,B), and size (1,2,4,8). These sizes
are passed to ops; for this target I have chosen
char=1short=2int=2long=4float=4double=8pointer=2.
Note that long, float and double operations are not yet included
in this machine description.
%start stmt
%term CNSTF4=4113 CNSTF8=8209
... many definitions omitted ...
%term VREGP=711
In the next section are described our rule patterns
and their corresponding templates. lcc's code generator
uses these patterns to cover and reduce a syntax tree of operators.
Templates ending in '\n' are complete instructions.
The rest are instruction "parts", usually instruction operands,
that are matched as part of larger rules.
First, list a series of rules that reduce references to "VREG"s
(variables in registers) to a register name. The nonterminal
"reg" is used wherever we want to refer to a variable.
These rules are common to all machine descriptions.
A template beginning with '#' is treated here
rather than causing an instruction to be generated directly,
this signals the code generator to call emit2() (defined below)
to do it. We'll later see where this is useful.
(emit2 need do nothing for these particular rules.)
INDIRI1(VREGP)
"# read reg\n"
INDIRU1(VREGP)
"# read reg\n"
INDIRI2(VREGP)
"# read reg\n"
INDIRU2(VREGP)
"# read reg\n"
INDIRP2(VREGP)
"# read reg\n"
INDIRF4(VREGP)
"# read reg\n"
INDIRF8(VREGP)
"# read reg\n"
The following rules reduce assignment to a register variable.
This works because, by referring to the nonterminal "reg",
we know that earlier rules have produced instructions that
reference the register.
ASGNI1(VREGP,reg)
"# write reg\n"
ASGNU1(VREGP,reg)
"# write reg\n"
ASGNI2(VREGP,reg)
"# write reg\n"
ASGNU2(VREGP,reg)
"# write reg\n"
ASGNP2(VREGP,reg)
"# write reg\n"
ASGNF4(VREGP,freg)
"# write reg\n"
ASGNF8(VREGP,freg)
"# write reg\n"
The LOAD rules implement register-to-register moves.
Unlike other operators, these don't correspond to C semantic constructs
but are compiler generated. (I'm not yet sure if/when LOADB is generated.)
LOADB(reg)
LOADF4(freg)
LOADF8(freg)
LOADI1(reg)
"movb %0,% LOADI1\n" move(a)
LOADU1(reg)
"movb %0,% LOADU1\n" move(a)
LOADI2(reg)
"mov %0,% LOADI2\n"
LOADU2(reg)
"mov %0,% LOADU2\n"
LOADP2(reg)
"mov %0,% LOADP2\n"
The following rule "discards" the rare statements that reduce to a register reference.
for an example.)
The rules for constants are
the most trivial of all. These say that wherever the CNST operator appears,
it should be substituted for a text fragment (its value). We'll refer to this
fragment, nonterminal "con", in rules for assembling instructions.
The following are "special case" constants used for many optimisations.
The cost function range(a,lo,hi) specifies the values
for which the rule should match. (For a detailed explanation, see lcc book.)
range(a,0,0)
range(a,0,0)
range(a,0,0)
range(a,0,0)
range(a,0,0)
range(a,1,1)
range(a,1,1)
range(a,2,2)
range(a,2,2)
range(a,8,8)
range(a,8,8)
range(a,-1,-1)
The constants above were self-explanatory. The next ones describe some constrained ranges.
Firstly shift amounts are restricted to -32..31 by the ASH instruction, so our templates
for shifts ensure that ASH is not generated for other amounts (shifts outside this range
don't make sense anyway, but this caution can prevent assembly errors).
We don't directly handle negative shifts (which are undefined by the C standard anyway).
The following rules define valid ranges for character constants.
range(a,0,31)
range(a,0,31)
range(a,-128,127)
The next rules define addressing modes and the fragments of operand text that
specify them in instruction formats.
Addresses of values in memory enter lcc's syntax tree as one of these three operators -
ADDRG, ADDRL and ADDRF - address of Global, Local, or Formal parameter respectively.
On the PDP-11, addresses of globals are always symbols, and locals and formals
are always on the stack frame (and accessed using Register Indexed mode).
(I have somewhat arbitrarily assigned costs for these modes
based on whether they use an extra instruction word.)
These parallel rules, which for stack variables
give the "raw" offset rather than a formatted addressing mode, come in handy later.
Now come rules for two further addressing modes. Where an address is found in a
register, the operand is accessed using Register Deferred mode.
Where an operand is at a constant offset from an address in a register,
we use Register Index mode.
ADDU2(reg,con)
ADDI2(reg,con)
ADDP2(reg,con)
Now, we can bundle the above into two flavours of addresses.
The first kind is simple addressing of a variable using the unchanged
templates above. The second kind matches a new operator, INDIR,
to show that the operand is actually a pointer to the variable's address.
The PDP supports this indirection by "Deferred" variants of
the modes already described, encoded by '@' in the instruction format.
For example, to load a global variable G into a register, we might use
MOV&G,R0. Where G contains the address-of a variable (i.e.
G is a pointer), and we wish to load the
variable pointed-to (this is what the INDIR operator means), we would MOV&@G,R0.
INDIRP2(addr)
The PDP lacks a "load effective address" instruction, so we manually construct
equivalent calculations for the addressing modes above. This is used, for instance,
when we take the address of a variable.
To take the address of a global we take
the 'literal' value of the symbol (format '#' means immediate constant).
To take the address of a local variable or formal parameter, we load the frame
pointer and add the appropriate offset.
ADDRGP2 "mov # %a,% reg <- ADDRGP2\n"
ADDRLP2 "mov fp,%c\nadd # %a,%c\n"
ADDRFP2 "mov fp,%c\nadd # %a,%c\n"
We can always reduce a constant zero to a register, if needed, by using
a CLR instruction. This is a typical use of the special-case constant
rules given above.
"clr %c\n"
Now a new, versatile non-terminal, "opd" (short for operand), is introduced, which represents yet another
architectural concept: an instruction operand -
in PDP parlance, a source or destination field of an instruction.
Many addressing modes are possible for such an operand:
immediate constant operands are formed by taking a "con"
nonterminal and prefixing '#'
any register is a valid operand (its name)
any "addr", as a variable reference (possibly Deferred).
INDIRI1(mem)
INDIRU1(mem)
INDIRI2(mem)
INDIRU2(mem)
INDIRP2(mem)
Any "opd" can reduce to a register, by loading it into that register.
This is the basic rule for MOV. (In templates, '%c' signifies the target register for a node.)
"mov %0,% reg <- opd\n" 5
Now it's time for an operator that actually does something: the ASGN operator.
Such rules, rather than reducing to a "reg" or "opd"
like the rules above, reduce to the special nonterminal "stmt"
("statement executed for side effect").
To assign to a global variable G, for instance,
the tree will look like ASGNx(ADDRGP2(G),...).
The left side of the operator is always an address,
so the nonterminal "addr" is exactly what we need here.
The value assigned (right side) can take various forms: Either a general
"opd" - constant, register or memory reference -
or a special case such as constant zero.
Note that without the latter rule, a register would be cleared using the "reg: con0"
rule above, and then that register would be assigned via the rules below.
By giving the rule for "con0" below, we provide a better reduction by matching
the zero-assignment with a single CLR or CLRB (depending on whether the destination is a word or byte in size).
First a bunch of byte assignments, then word equivalents:
ASGNI1(mem,opd)
"movb %1,%0 ; ASGNI1\n"
ASGNU1(mem,opd)
"movb %1,%0 ; ASGNU1\n"
ASGNI1(mem,con0)
"clrb %0 ; ASGNI1\n"
ASGNU1(mem,con0)
"clrb %0 ; ASGNU1\n"
ASGNI2(mem,opd)
"mov %1,%0 ; ASGNI2\n"
ASGNU2(mem,opd)
"mov %1,%0 ; ASGNU2\n"
ASGNP2(mem,opd)
"mov %1,%0 ; ASGNP2\n"
ASGNI2(mem,con0)
"clr %0 ; ASGNI2\n"
ASGNU2(mem,con0)
"clr %0 ; ASGNU2\n"
ASGNP2(mem,con0)
"clr %0 ; ASGNP2\n"
With ASGN we have an opportunity to give rules for
some useful optimisations. The following rules are for "in-place"
modifications, such as might occur in a statement such as A&+=&2.
Conceptually this is a match for a rule ASGN(A,OP(A,X)) where
the two A's are equal addresses, OP is one of several applicable operations,
and X is the second operand. The cost function memop(a)
ensures that these rules only apply when the assignment destination
and the operand address are the same (see lcc book for more details).
We give rules for each PDP-11 operation that can be performed
in-place on an operand (register or memory). You will note several
uses of the special-case constants, such as "con1", for instance to match
additions-by-1 with a template that emits a simple INC.
(Ditto for single-bit shifts.)
(Another tempting reduction/optimisation would be to match
in-place operations on bytes such as INCB and DECB. This would be done
with rules that resemble "ASGNI1(addr,ADDI2(CVII2(opd),con1))",
but unfortunately lcc automatically inserts a LOAD operator
above the right hand of this assignment, making it impossible
to write rules that match. :(&)
ASGNI2(mem,NEGI2(opd))
"neg %0\n"
ASGNI2(mem,ADDI2(opd,opd))
"add %2,%0\n"
ASGNU2(mem,ADDU2(opd,opd))
"add %2,%0\n"
ASGNP2(mem,ADDP2(opd,opd))
"add %2,%0\n"
ASGNI2(mem,ADDI2(opd,con1))
"inc %0\n"
ASGNU2(mem,ADDU2(opd,con1))
"inc %0\n"
ASGNP2(mem,ADDP2(opd,con1))
"inc %0\n"
ASGNI2(mem,ADDI2(opd,conm1))
"dec %0\n"
ASGNU2(mem,ADDU2(opd,conm1))
"dec %0\n"
ASGNP2(mem,ADDP2(opd,conm1))
"dec %0\n"
ASGNI2(mem,SUBI2(opd,opd))
"sub %2,%0\n"
ASGNU2(mem,SUBU2(opd,opd))
"sub %2,%0\n"
ASGNP2(mem,SUBP2(opd,opd))
"sub %2,%0\n"
ASGNI2(mem,SUBI2(opd,con1))
"dec %0\n"
ASGNU2(mem,SUBU2(opd,con1))
"dec %0\n"
ASGNP2(mem,SUBP2(opd,con1))
"dec %0\n"
ASGNI2(mem,LSHI2(opd,con1))
"asl %0\n"
ASGNU2(mem,LSHU2(opd,con1))
"asl %0\n"
ASGNI2(mem,LSHI2(opd,con8))
"swab %0\nclrb %0\n"
ASGNU2(mem,LSHU2(opd,con8))
"swab %0\nclrb %0\n"
ASGNI2(mem,RSHI2(opd,con1))
"asr %0\n"
ASGNU2(mem,RSHU2(opd,con1))
"clc\nror %0\n" memop(a)
ASGNU2(mem,RSHU2(opd,con8))
"clrb %0\nswab %0\n"
ASGNI2(mem,DIVI2(opd,con2))
"asr %0\n"
ASGNU2(mem,DIVU2(opd,con2))
"clc\nror %0\n" memop(a)
(Note rules above to reduce a divide-by-2 to shifts. We could also recognise larger
power-of-2 divisors in emit2 and generate ASH instructions.
lcc does not do this by itself, although it does generate shift operations
for multiply-by-2.)
Matching rules for bitwise AND operator (BAND) raises a quirk of the PDP-11
instruction set, which does not have an instruction to perform AND directly.
Instead, it has the BIC (Bit Clear) instruction, which is equivalent to
AND with the complement of the source operand. This has a couple of implications.
First, an instruction sequence to compute the AND of two registers will
require the source operand to be complemented first. Of course lcc is not
expecting us to "clobber" that operand, so we have to complement it back
afterwards. (We could allow a general "opd" here but since we are accessing
the source operand three times I decided to make lcc force it into a "reg" first.)
ASGNI2(mem,BANDI2(opd,reg))
"com %2 ; BANDI2\nbic %2,%0 ; BANDI2\ncom %2 ; BANDI2\n"
ASGNU2(mem,BANDU2(opd,reg))
"com %2 ; BANDU2\nbic %2,%0 ; BANDU2\ncom %2 ; BANDU2\n"
Second, we can make lemonade from the lemon (of not having AND),
by exploiting the semantics of BIC as a complemented AND.
An expression such as A&&=&~B can then reduce to a single PDP-11 instruction:
ASGNI2(mem,BANDI2(opd,BCOMI2(opd))) "bic %2,%0\n" memop(a)
ASGNI2(mem,BANDU2(opd,BCOMU2(opd))) "bic %2,%0\n" memop(a)
The rest of the bitwise operations are straightforward, except noting that
on the PDP-11, XOR's source must be a register.
ASGNI2(mem,BORI2(opd,opd))
"bis %2,%0\n"
ASGNU2(mem,BORU2(opd,opd))
"bis %2,%0\n"
ASGNI2(mem,BXORI2(opd,reg))
"xor %2,%0\n"
ASGNU2(mem,BXORU2(opd,reg))
"xor %2,%0\n"
ASGNI2(mem,BCOMI2(opd))
"com %0\n"
ASGNU2(mem,BCOMU2(opd))
"com %0\n"
That concludes the combined assign-and-operate rules.
Now we give a simple rule for struct assignment,
or block copy. This delegates emit2 to emit a series of MOV instructions,
or a macro call which expands to a copying loop.
(For macro definitions, see lcc.mlb.)
ASGNB(reg,INDIRB(reg))
"# use emit2\n"
In the spirit of the ASGN rules above, we present another optimisation technique.
This one notes that, when building actual parameters to a function on the stack,
we can perform some operations on the top-of-stack itself, which can save a temporary location.
Some of the most useful top-of-stack operations are zero extension
and sign extension when char-sized parameters are promoted to int, as required by C.
We prefer to do this directly at the top of stack, rather than requiring a scratch register (or worse).
An actual parameter of zero can be pushed trivially with a CLR.
ARGI2(opd)
"mov %0,-(sp) ; ARGI2\n"
ARGU2(opd)
"mov %0,-(sp) ; ARGU2\n"
ARGP2(opd)
"mov %0,-(sp) ; ARGP2\n"
ARGI2(CVII2(opd))
"movb %0,-(sp) ; ARGI2(CVII2)\n"
ARGU2(CVIU2(opd))
"movb %0,-(sp) ; ARGU2(CVIU2)\n"
ARGI2(CVUI2(opd))
"clr -(sp)\nbisb %0,(sp) ; ARGI2(CVUI2)\n"
ARGU2(CVUU2(opd))
"clr -(sp)\nbisb %0,(sp) ; ARGU2(CVUU2)\n"
ARGI2(con0)
"clr -(sp) ; ARGI2\n"
ARGU2(con0)
"clr -(sp) ; ARGU2\n"
ARGP2(con0)
"clr -(sp) ; ARGP2\n"
There is a laundry list of other useful things we can do to parameters
after they're pushed on to the stack. These largely correspond to the assignment rules above.
In several cases we can sign-extend (CVIx2) and push in one MOVB; even zero extension (CVUx2)
takes only two instructions and needs no temporary.
ARGI2(ADDI2(opd,opd))
"mov %0,-(sp)\nadd %1,(sp) ; ARGI2(ADDI1)\n"
ARGU2(ADDU2(opd,opd))
"mov %0,-(sp)\nadd %1,(sp) ; ARGU2(ADDU2)\n"
ARGP2(ADDP2(opd,opd))
"mov %0,-(sp)\nadd %1,(sp) ; ARGP2(ADDP2)\n"
ARGI2(ADDI2(opd,con1))
"mov %0,-(sp)\ninc (sp) ; ARGI2(ADDI1)\n"
ARGI2(ADDI2(CVII2(opd),con1))
"movb %0,-(sp)\ninc (sp) ; ARGI2(ADDI2(CVII2))\n"
ARGU2(ADDU2(CVIU2(opd),con1))
"movb %0,-(sp)\ninc (sp) ; ARGU2(ADDU2(CVIU2))\n"
ARGI2(ADDI2(CVUI2(opd),con1))
"clr -(sp)\nbisb %0,(sp)\ninc (sp) ; ARGI2(ADDI2(CVUI2))\n"
ARGU2(ADDU2(CVUU2(opd),con1))
"clr -(sp)\nbisb %0,(sp)\ninc (sp) ; ARGU2(ADDU2(CVUU2))\n"
ARGU2(ADDU2(opd,con1))
"mov %0,-(sp)\ninc (sp) ; ARGU2(ADDU2)\n"
ARGU2(ADDU2(BCOMU2(opd),con1))
"mov %0,-(sp)\nneg (sp) ; ARGU2(ADDU2(BCOMU2(opd),1))\n"
ARGP2(ADDP2(opd,con1))
"mov %0,-(sp)\ninc (sp) ; ARGP2(ADDP2)\n"
ARGP2(ADDP2(opd,conm1))
"mov %0,-(sp)\ndec (sp) ; ARGP2(ADDP2)\n"
ARGI2(SUBI2(opd,opd))
"mov %0,-(sp)\nsub %1,(sp) ; ARGI2(SUBI1)\n"
ARGU2(SUBU2(opd,opd))
"mov %0,-(sp)\nsub %1,(sp) ; ARGU2(SUBU2)\n"
ARGP2(SUBP2(opd,opd))
"mov %0,-(sp)\nsub %1,(sp) ; ARGP2(SUBP2)\n"
ARGI2(SUBI2(opd,con1))
"mov %0,-(sp)\ndec (sp) ; ARGI2(SUBI1)\n"
ARGU2(SUBU2(opd,con1))
"mov %0,-(sp)\ndec (sp) ; ARGU2(SUBU2)\n"
ARGP2(SUBP2(opd,con1))
"mov %0,-(sp)\ndec (sp) ; ARGP2(SUBP2)\n"
ARGI2(SUBI2(CVII2(opd),con1))
"movb %0,-(sp)\ndec (sp) ; ARGI2(SUBI2(CVII2))\n"
ARGU2(SUBU2(CVIU2(opd),con1))
"movb %0,-(sp)\ndec (sp) ; ARGU2(SUBU2(CVIU2))\n"
ARGI2(SUBI2(CVUI2(opd),con1))
"clr -(sp)\nbisb %0,(sp)\ndec (sp) ; ARGI2(SUBI2(CVUI2))\n"
ARGU2(SUBU2(CVUU2(opd),con1))
"clr -(sp)\nbisb %0,(sp)\ndec (sp) ; ARGU2(SUBU2(CVUU2))\n"
ARGI2(BORI2(opd,opd))
"mov %0,-(sp)\nbis %1,(sp) ; ARGI2(BORI2)\n"
ARGI2(BANDI2(reg,con))
"# give to emit2\n"
ARGU2(BANDU2(reg,con))
"# give to emit2\n"
ARGU2(BORU2(opd,opd))
"mov %0,-(sp)\nbis %1,(sp) ; ARGU2(BORU2)\n"
ARGI2(BXORI2(opd,reg))
"mov %0,-(sp)\nxor %1,(sp) ; ARGI2(BXORI2)\n"
ARGU2(BXORU2(opd,reg))
"mov %0,-(sp)\nxor %1,(sp) ; ARGU2(BXORU2)\n"
ARGI2(LSHI2(opd,con1))
"mov %0,-(sp)\nasl (sp) ; ARGI2(LSHI2)\n"
ARGU2(LSHU2(opd,con1))
"mov %0,-(sp)\nasl (sp) ; ARGU2(LSHU2)\n"
ARGI2(MULI2(opd,con2))
"mov %0,-(sp)\nasl (sp) ; ARGI2(MULI2)\n"
ARGU2(MULU2(opd,con2))
"mov %0,-(sp)\nasl (sp) ; ARGU2(MULU2)\n"
ARGI2(RSHI2(opd,con1))
"mov %0,-(sp)\nasr (sp) ; ARGI2(RSHI2)\n"
ARGU2(RSHU2(opd,con1))
"mov %0,-(sp)\nclc\nror (sp) ; ARGU2(RSHU2)\n"
ARGI2(DIVI2(opd,con2))
"mov %0,-(sp)\nasr (sp) ; ARGI2(DIVI2)\n"
ARGU2(DIVU2(opd,con2))
"mov %0,-(sp)\nclc\nror (sp) ; ARGU2(DIVU2)\n"
ARGI2(NEGI2(opd))
"mov %0,-(sp)\nneg (sp) ; ARGI2(NEGI2)\n"
ARGI2(BCOMI2(opd))
"mov %0,-(sp)\ncom (sp) ; ARGI2(BCOMI2)\n"
ARGU2(BCOMU2(opd))
"mov %0,-(sp)\ncom (sp) ; ARGU2(BCOMU2)\n"
The following two rules optimise the case where an actual parameter
is a computed address. Again, it's most efficient to do the offset computation (add)
on the top-of-stack. (We can't use the "mem" nonterminal because the
template is wrong: it is built as an addressing mode indexing the frame pointer.
In this case we need the bare offset, which is provided by nonterminal "memf".)
ARGP2(addrf)
"mov fp,-(sp)\nadd #%0,(sp) ; ARGP2(frame)\n"
There are two kinds of rules for CALL operators. The first kind is more general,
and can deal with, for instance, Deferred addressing: where the called function address
is in a variable. The arguments have already been pushed on the stack,
and after the call we must adjust the stack pointer to remove them.
CALLI2(mem) "jsr pc,%0\n"
(a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
CALLI2(mem) "jsr pc,%0\ntst (sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
CALLI2(mem) "jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
CALLI2(mem) "jsr pc,%0\nadd #%a, pop args\n"
CALLU2(mem) "jsr pc,%0\n"
(a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
CALLU2(mem) "jsr pc,%0\ntst (sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
CALLU2(mem) "jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
CALLU2(mem) "jsr pc,%0\nadd #%a, pop args\n"
CALLP2(mem) "jsr pc,%0\n"
(a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
CALLP2(mem) "jsr pc,%0\ntst (sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
CALLP2(mem) "jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
CALLP2(mem) "jsr pc,%0\nadd #%a, pop args\n"
CALLV(mem)
"jsr pc,%0\n"
(a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
CALLV(mem)
"jsr pc,%0\ntst (sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
CALLV(mem)
"jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
CALLV(mem)
"jsr pc,%0\nadd #%a, pop args\n"
CALLB(mem)
"jsr pc,%0\n"
(a->syms[0]->u.c.v.u == 0 ? 1 : LBURG_MAX)
CALLB(mem)
"jsr pc,%0\ntst (sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 2 ? 1 : LBURG_MAX)
CALLB(mem)
"jsr pc,%0\ncmp (sp)+,(sp)+ ; pop args\n"
(a->syms[0]->u.c.v.u == 4 ? 1 : LBURG_MAX)
CALLB(mem)
"jsr pc,%0\nadd #%a, pop args\n"
RET (return) operators require nothing from us, nor from emit2.
We do however use them for register targeting (see target), since the convention I've used
passes function return values in R0.
RETF4(freg) "# RETF4\n"
RETF8(freg) "# RETF8\n"
RETI2(reg)
"# RETI2\n"
RETU2(reg)
"# RETU2\n"
RETP2(reg)
"# RETP2\n"
"# RETV\n"
Now we deal with type conversions. We are only concerned here
with the widening conversions, the four operators:
CVII2: widen signed char to signed int (sign extend)
CVUI2: widen unsigned char to signed int (zero extend)
CVIU2: widen signed char to unsigned int (sign extend)
CVUU2: widen unsigned char to unsigned int (zero extend)
The PDP-11's instruction MOVB sign extends a byte to a word.
It takes one more instruction to zero extend,
which explains why signed chars are preferable on this target (right shifts
are another example of a penalised unsigned operation).
The conversions marked '#' are idempotent on this target
(unless I am mistaken about the subtleties of conversion operators),
and emit2 does nothing about them.
CVII1(reg)
"mov %0,% CVII1\n" move(a)
CVIU1(reg)
"mov %0,% CVIU1\n" move(a)
CVUI1(reg)
"mov %0,% CVUI1\n" move(a)
CVUU1(reg)
"mov %0,% CVUU1\n" move(a)
CVII2(opd)
"movb %0,% CVII2\n"
CVIU2(opd)
"movb %0,% CVIU2\n"
CVPU2(reg)
"mov %0,% CVPU2\n" move(a)
CVUP2(reg)
"mov %0,% CVUP2\n" move(a)
CVUI2(opd)
"movb %0,% CVUI2\nbic #^o-400,% CVUI2\n"
CVUU2(opd)
"movb %0,% CVUU2\nbic #^o-400,% CVUU2\n"
After those dry conversions, it's time for dessert: the arithmetic operations.
Again, you'll note special cases for add-by-one, shift-by-one. The costs
attached to the end of these rules try to encourage the special cases to be
used where the code generator has a choice.
The pattern of operand reductions
should also affect these cost calculations, so that where an operand can be
entirely avoided, that expansion will be preferred. (That's the theory, anyway.
In practice, choosing these costs seems a black art.)
NEGI2(reg)
"?mov %0,%c\nneg %c\n"
(The '?' suppresses the MOV where source (%0) and destination (%c) registers are the same.
See lcc book.)
For unsigned right-shifts where the shift is not a constant (i.e. it's in
a register or in memory), we invoke a macro to do the shift with the ASH instruction.
ASH always sign extends when it shifts right, so the macro includes code to mask away
any sign extension (see lcc.mlb). We even need a macro
for signed right shift, because the shift count operand must be negated before ASH can use it.
It's worth explaining the rule for ADDU2(BCOMU2(reg),con1) since this seems an obscure expression.
When "negating" an unsigned number, rather than invent an oxymoronic "NEGU" operation,
lcc prefers to convert the negate operator into an unsigned complement and then add 1. Makes perfect sense,
but the template can concisely specify the NEG instruction here (this special case also appeared earlier
in one of the ARG rules).
Where an unsigned value is shifted right by a constant amount, we let
to emit2 to generate the shift followed by a "Bit Clear"
with a computed mask to zero out the unwanted sign extension.
(A related rule below optimises the obscure case of a right shift
combined with a constant-bitwise-AND to use just one BIC. For instance,
on an unsigned variable U, we can compute (U>>3)&7 with just one BIC.
Without this optimisation we would be dismayed to see two BICs in a row, one generated for
the shift, and one for the &.)
ADDI2(reg,opd)
"?mov %0,%c\nadd %1,%c\n"
ADDU2(reg,opd)
"?mov %0,%c\nadd %1,%c\n"
ADDP2(reg,opd)
"?mov %0,%c\nadd %1,%c\n"
ADDI2(reg,con1)
"?mov %0,%c\ninc %c\n"
ADDU2(reg,con1)
"?mov %0,%c\ninc %c\n"
ADDP2(reg,con1)
"?mov %0,%c\ninc %c\n"
ADDI2(reg,conm1)
"?mov %0,%c\ndec %c\n"
ADDU2(reg,conm1)
"?mov %0,%c\ndec %c\n"
ADDP2(reg,conm1)
"?mov %0,%c\ndec %c\n"
ADDU2(BCOMU2(reg),con1)
"?mov %0,%c\nneg %c\n"
SUBI2(reg,opd)
"?mov %0,%c\nsub %1,%c\n"
SUBU2(reg,opd)
"?mov %0,%c\nsub %1,%c\n"
SUBP2(reg,opd)
"?mov %0,%c\nsub %1,%c\n"
SUBI2(reg,con1)
"?mov %0,%c\ndec %c\n"
SUBU2(reg,con1)
"?mov %0,%c\ndec %c\n"
SUBP2(reg,con1)
"?mov %0,%c\ndec %c\n"
LSHI2(reg,opd)
"?mov %0,%c\nash %1,%c\n"
LSHU2(reg,opd)
"?mov %0,%c\nash %1,%c\n"
LSHI2(reg,con1)
"?mov %0,%c\nasl %c\n"
LSHU2(reg,con1)
"?mov %0,%c\nasl %c\n"
LSHI2(reg,con8)
"?mov %0,%c\nswab %c\nclrb %c\n"
LSHU2(reg,con8)
"?mov %0,%c\nswab %c\nclrb %c\n"
RSHI2(reg,opd)
"?mov %0,%c\n$RSHI %1,%c\n" 6
RSHU2(reg,opd)
"?mov %0,%c\n$RSHU %1,%c\n" 6
RSHI2(reg,consh)
"?mov %0,%c\nash #-%1,%c\n" 6
RSHU2(reg,consh)
"?mov %0,%c\nash #-%1,%c\n" 6
RSHI2(reg,con1)
"?mov %0,%c\nasr %c\n"
RSHU2(reg,con1)
"?mov %0,%c\nclc\nror %c\n" 3
RSHU2(reg,con8)
"?mov %0,%c\nclrb %c\nswab %c\n"
RSHI2(reg,consh)
"?mov %0,%c\nash #-%1,%c\n" 3
RSHU2(reg,consh)
"# emit2 does shift and mask\n" 3
For constant-bitwise-AND, we simply complement
the constant first, and emit a BIC.
BANDI2(reg,reg)
"?mov %0,% BANDI2\ncom %1 ; BANDI2\nbic %1,% BANDI2\ncom %1 ; BANDI2\n"
BANDU2(reg,reg)
"?mov %0,% BANDU2\ncom %1 ; BANDU2\nbic %1,% BANDU2\ncom %1 ; BANDU2\n"
BANDI2(reg,con)
"# emit2 helps out\n"
BANDU2(reg,con)
"# emit2 helps out\n"
BANDI2(reg,BCOMI2(opd)) "?mov %0,%c\nbic %1,%c\n"
BANDU2(reg,BCOMU2(opd)) "?mov %0,%c\nbic %1,%c\n"
BANDI2(CVUI2(reg),con)
"# job for emit2\n"
BANDU2(CVUU2(reg),con)
"# job for emit2\n"
BANDU2(RSHU2(reg,consh),con)
"# emit2 does shift and mask\n"
BORI2(reg,opd)
"?mov %0,%c\nbis %1,%c\n"
BORU2(reg,opd)
"?mov %0,%c\nbis %1,%c\n"
BXORI2(reg,reg) "?mov %0,%c\nxor %1,%c\n"
BXORU2(reg,reg) "?mov %0,%c\nxor %1,%c\n"
BCOMI2(reg)
"?mov %0,%c\ncom %c\n"
BCOMU2(reg)
"?mov %0,%c\ncom %c\n"
DIV, MOD and MUL are only curious in that they are restricted in their operand registers.
Specifically, the DIV instruction takes a 32-bit register even/odd pair (we choose R0/1)
as its dividend. The divisor is the source operand (%1). The quotient and remainder
are produced in R0 and R1 respectively (see target() below). Again, we optimise
divide-by-2 into a shift, since lcc declines to do this by itself (probably because
it's only super-cheap if the target does arithmetic shift, and not all do). Unsigned divide uses
a runtime support subroutine that uses the signed DIV instruction,
but checks some special cases first.
DIVI2(reg,opd)
"tst r1 ; DIVI2\nsxt r0 ; DIVI2\ndiv %1,r0 ; DIVI2\n"
DIVU2(reg,reg)
"jsr pc,$DIVU ; DIVU2\n"
DIVI2(reg,con2) "?mov %0,%c\nasr %c\n"
DIVU2(reg,con2) "?mov %0,%c\nclc\nror %c\n"
MODI2(reg,opd)
"tst r1 ; MODI2\nsxt r0 ; MODI2\ndiv %1,r0 ; MODI2\n"
MODU2(reg,reg)
"jsr pc,$DIVU ; MODU2\n"
MULI2(reg,opd)
"mul %1,r1 ; MULI2\n"
MULU2(reg,opd)
"mul %1,r1 ; MULU2\n"
Now what you've all been waiting for, the conditional operators.
First it should be noted that all of these rely upon conditional branches,
which have a restricted range. This does put a limit on compiled function size,
as does the rule below for JUMP.
We could solve this by brute force,
inverting the sense of a branch to hop over an unconditional far jump
(e.g. BEQ&A becomes BNE&HOP, JMP&A, HOP:),
taking Pollyanna's view, the branch limits seem a very neat way
to enforce modularisation of big ugly functions, and I found it was a powerful
motivator to me, writing this back-end, to find shorter instruction
sequences to avoid branch-range fouls!
Of course, there will be people who can't
tolerate the assembler asking them to break up their multipage
functions. Those people can insert the branch diddling solution I just mentioned. :)
Note that we recognise and reduce some char-conversion operators,
and emit CMPB as appropriate. This is where we finally
get to use the "conc" nonterminal. (Where "conc" doesn't match,
we must have a degenerate compare (e.g. char&==&900) and
lcc will take the long route, extending with a MOVB and using a word CMP.)
Compares to zero can exploit the cheaper TST instruction.
Even more so for byte compares to zero, which optimise away a type conversion.
EQI2(opd,opd)
"cmp %0,%1\nbeq %a\n"
EQU2(opd,opd)
"cmp %0,%1\nbeq %a\n"
EQI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nbeq %a\n"
EQI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nbeq %a\n"
EQI2(CVII2(opd),consc)
"cmpb %0,#%1\nbeq %a\n"
EQI2(opd,con0)
"tst %0\nbeq %a\n"
EQU2(opd,con0)
"tst %0\nbeq %a\n"
EQI2(CVII2(opd),con0)
"tstb %0\nbeq %a\n"
EQI2(CVUI2(opd),con0)
"tstb %0\nbeq %a\n"
This is where it gets interesting. The PDP-11 has a combined "bitwise-AND then
compare to zero" instruction, Bit&Test (BIT), and its byte-sized equivalent, BITB.
Let's make the most of them:
EQI2(BANDI2(opd,opd),con0)
"bit %1,%0\nbeq %a\n"
EQU2(BANDU2(opd,opd),con0)
"bit %1,%0\nbeq %a\n"
EQI2(BANDI2(CVII2(opd),opd),con0)
"bitb %1,%0\nbeq %a\n"
EQI2(BANDI2(CVUI2(opd),opd),con0)
"bitb %1,%0\nbeq %a\n"
EQU2(BANDU2(CVIU2(opd),opd),con0)
"bitb %1,%0\nbeq %a\n"
EQU2(BANDU2(CVUU2(opd),opd),con0)
"bitb %1,%0\nbeq %a\n"
And all the same logic applies to NE (not&equal) en&bloc,
except the BNE conditional branch is used.
NEI2(opd,opd)
"cmp %0,%1\nbne %a\n"
NEU2(opd,opd)
"cmp %0,%1\nbne %a\n"
NEI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nbne %a\n"
NEI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nbne %a\n"
NEI2(CVII2(opd),consc)
"cmpb %0,#%1\nbne %a\n"
NEI2(opd,con0)
"tst %0\nbne %a\n"
NEU2(opd,con0)
"tst %0\nbne %a\n"
NEI2(CVII2(opd),con0)
"tstb %0\nbne %a\n"
NEI2(CVUI2(opd),con0)
"tstb %0\nbne %a\n"
NEI2(BANDI2(opd,opd),con0)
"bit %1,%0\nbne %a\n"
NEU2(BANDU2(opd,opd),con0)
"bit %1,%0\nbne %a\n"
NEI2(BANDI2(CVII2(opd),opd),con0)
"bitb %1,%0\nbne %a\n"
NEI2(BANDI2(CVUI2(opd),opd),con0)
"bitb %1,%0\nbne %a\n"
NEU2(BANDU2(CVIU2(opd),opd),con0)
"bitb %1,%0\nbne %a\n"
NEU2(BANDU2(CVUU2(opd),opd),con0)
"bitb %1,%0\nbne %a\n"
The remaining relational operators are only interesting in that
the PDP-11 has conditional branches useful for unsigned comparisons
(BLO, BLOS, BHI, BHIS).
LTI2(opd,opd)
"cmp %0,%1\nblt %a\n"
LTU2(opd,opd)
"cmp %0,%1\nblo %a\n"
LTI2(opd,con0)
"tst %0\nbmi %a\n"
LTI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nblt %a\n"
LTI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nblo %a\n"
LTI2(CVII2(opd),consc)
"cmpb %0,#%1\nblt %a\n"
LTI2(CVII2(opd),con0)
"tstb %0\nbmi %a\n"
LEI2(opd,opd)
"cmp %0,%1\nble %a\n"
LEU2(opd,opd)
"cmp %0,%1\nblos %a\n"
LEI2(opd,con0)
"tst %0\nble %a\n"
LEU2(opd,con0)
"tst %0\nbeq %a\n"
LEI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nble %a\n"
LEI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nblos %a\n"
LEI2(CVII2(opd),consc)
"cmpb %0,#%1\nble %a\n"
LEI2(CVII2(opd),con0)
"tstb %0\nble %a\n"
LEI2(CVUI2(opd),con0)
"tstb %0\nbeq %a\n"
GTI2(opd,opd)
"cmp %0,%1\nbgt %a\n"
GTU2(opd,opd)
"cmp %0,%1\nbhi %a\n"
GTI2(opd,con0)
"tst %0\nbgt %a\n"
GTU2(opd,con0)
"tst %0\nbne %a\n"
GTI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nbgt %a\n"
GTI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nbhi %a\n"
GTI2(CVII2(opd),consc)
"cmpb %0,#%1\nbgt %a\n"
GTI2(CVII2(opd),con0)
"tstb %0\nbgt %a\n"
GTI2(CVUI2(opd),con0)
"tstb %0\nbne %a\n"
GEI2(opd,opd)
"cmp %0,%1\nbge %a\n"
GEU2(opd,opd)
"cmp %0,%1\nbhis %a\n"
GEI2(opd,con0)
"tst %0\nbpl %a\n"
GEI2(CVII2(opd),CVII2(opd)) "cmpb %0,%1\nbge %a\n"
GEI2(CVUI2(opd),CVUI2(opd)) "cmpb %0,%1\nbhis %a\n"
GEI2(CVII2(opd),consc)
"cmpb %0,#%1\nbge %a\n"
GEI2(CVII2(opd),con0)
"tstb %0\nbpl %a\n"
The remaining two rules are trivial. Note that the "addr" nonterminal can theoretically
produce addressing modes that are incompatible with BR, but I can't think of C constructs
that might result in such a tree. For instance, it's only possible to goto a label.
JUMPV(addrg)
JUMPV(mem)
"jmp %0\n"
That's it for the rules. Now we cover the support functions.
First some fairly standard cost functions.
memop() detects the case ASGN(A,OP(A,...)) as required
for the in-place operations.
static int memop(Node p) {
assert( p && generic(p->op) == ASGN );
assert( p->kids[0] && p->kids[1] && p->kids[1]->kids[0] );
if (generic(p->kids[1]->kids[0]->op) == INDIR
&& sametree(p->kids[0], p->kids[1]->kids[0]->kids[0]))
return LBURG_MAX;
static int sametree(Node p, Node q) {
return (p == NULL && q == NULL)
|| (p && q && p->op == q->op && p->syms[0] == q->syms[0]
&& sametree(p->kids[0], q->kids[0])
&& sametree(p->kids[1], q->kids[1]));
cnstrhs() matches a constant right-hand of a rule.
static int cnstrhs(Node p) {
assert(p && p->kids[1]);
return generic(p->kids[1]->op) == CNST ? 1 : LBURG_MAX;
progbeg() emits the assembly source header. I believe it's polite
to name the assembler you are targeting right here. It can be very difficult
on some platforms to deduce the flavour of assembly, otherwise, for someone
confronted with your back-end output.
Importantly, here is where we define the machine registers, r0-r7.
We don't let lcc we set tmask and vmask to bit masks
of the allowed register sets for temporaries and variables respectively.
static void progbeg(int argc, char *argv[]) {
parseflags(argc, argv);
print("; PDP-11 assembly generated by lcc 4.2\n");
print("; assemble with MACRO-11\n");
print(".INCLUDE /LCC.MLB/\n");
//print(".MCALL $CPYB,$CPYW,$LSH,$RSHI,$RSHU\n");
print(".RADIX 10\n");
print("fp=%%5 ; R5 is frame pointer\n");
print(".ENABL LSB ; file-wide local symbols\n\n");
//print(".LIST ME ; list macro expansions\n\n");
for(i=0;ix.regnode->mask |= 2;
//rem = mkreg("r1",1,1,IREG); rem->x.regnode->mask |= 1;
for(i=0;i<=3;++i){
lreg[i] = mkreg("%%%d",i*2,3,IREG);
tmask[FREG] = FLTTMP; vmask[FREG] = FLTVAR;
tmask[IREG] = INTTMP; vmask[IREG] = INTVAR;
iregw = mkwildcard(ireg);
lregw = mkwildcard(lreg);
fregw = mkwildcard(freg);
Return the corresponding register wildcard for a given operator type
(Block, Pointer, Int, Unsigned int, Floating).
static Symbol rmap(int opk) {
switch (optype(opk)) {
case B: case P:
case I: case U:
return opsize(opk)==4 ? lregw :
static void segment(int n) {
/* put everything in default PSECT for now */
static void progend(void) {
print(".END\n");
#define INTCONST(P,V) ( generic(P->op)==CNST && \
( (optype(P->op)==I && P->syms[0]->u.c.v.i==V) \
|| (optype(P->op)==U && P->syms[0]->u.c.v.u==V) ) )
The target() function allows us to specify particular registers
we want operands in, for operators that need this.
On the PDP-11, we expect
DIV and MUL to operate on specific registers, and those nodes also produce
a result in a specific register. Set these with rtarget and setreg
(see lcc book).
Our calling convention puts function returns in R0, so we specify
that CALL nodes reduce to that register, and RET nodes must develop their value therein.
static void target(Node p) {
int op = specific(p->op);
assert(p);
switch (op) {
case MUL+I: case MUL+U:
[ Re,o means the register pair Rn (even) and Rn+1 (odd) ]
both multiplier (src) and multiplicand (R) are 16 bits
32-bit product: MUL src,Re
Re,o <- Re * src
16-bit product: MUL src,Ro
Ro kids[1],2)){ /* x2 is converted to left shift
rtarget(p, 0, ireg[1]); /* kids[0] must deliver multiplicand in R1 */
setreg(p, ireg[1]); /* product in R1 */
case DIV+I: case DIV+U:
32-bit divide:
DIV src,Re
[dst always even register]
divisor (src) is 16 dividend (Re,o) is 32 bits
Note hi word of dividend is R0, lo word R1
Re <- 16 bit quotient of Re,o / src
Ro kids[1],2)){ /* /2 is converted to right shift
if(op == DIV+U) /* special case for unsigned divide, put divisor in R0 */
rtarget(p,1,ireg[0]);
rtarget(p, 0, ireg[1]); /* kids[0] must deliver dividend into R1 */
setreg(p, ireg[0]);
/* produces quotient in R0 */
case MOD+I: case MOD+U:
if(op == MOD+U) /* special case for unsigned divide, put divisor in R0 */
rtarget(p,1,ireg[0]);
rtarget(p, 0, ireg[1]); /* kids[0] must deliver dividend into R1 */
setreg(p, ireg[1]);
/* produces remainder in R1 */
case CALL+I: case CALL+U: case CALL+P: case CALL+V:
setreg(p, opsize(p->op) == 4 ? lreg[0] : ireg[0]);
case CALL+F:
setreg(p, freg[0]);
case RET+I: case RET+U: case RET+P:
rtarget(p, 0, ireg[0]);
case RET+F:
rtarget(p, 0, freg[0]);
There may be opportunities to use clobber on this target
(for instance DIV and MUL, or where support macros trash registers)
but I don't und my attempts at using it produce lcc errors.
As a workaround, the support macros conservatively preserve registers,
and I seem to have got away with not telling lcc that DIV trashes anything.
The use of the special "divd" register, which covers R0 and R1, may be enough
for lcc to realise that both registers are trashed.
I won't pretend to know more about clobber than I&do.&:)
static void clobber(Node p) {
Helper routines for emit2().
static void emitband(char *dstname,unsigned andmask){
print("bic #^o%o,% BANDx 0%o\n", ~andmask & 0xffff, dstname, andmask);
static void emitrshu(Node p,char *dstname,unsigned andmask){
unsigned shift,
assert(p->kids[1]);
assert(p->kids[1]->syms[0]);
assert(generic(p->kids[1]->op) == CNST);
src = ireg[getregnum(p->kids[0])];
shift = p->kids[1]->syms[0]->u.c.v.u;
//dumptree(p);fputc('\n',stderr);
//fprintf(stderr,"emit2: emitrshu: emitted=%d src=%s dst=%s shift=%d &mask=0%o\n",
p->x.emitted, src->x.name, dstname,shift,andmask);
if(shift > 15 || !andmask) /* this would not normally occur */
print("clr % RSHU\n",dstname);
if (dstname != src->x.name)
print("mov %s,% RSHU\n", src->x.name, dstname);
if(shift > 0){
if(shift == 1)
print(" RSHU\nror % RSHU\n", dstname);
print("ash #%d,% RSHU\n", -shift, dstname);
//mask = ~(andmask & (0xffff >> shift)) & 0
emitband(dstname,andmask & (0xffff >> shift));
//print("bic #^o%o,% RSHU %d, &mask 0%o\n", mask, dstname, shift, andmask);
The big ball of wax: emit2().
Let's discuss each operator singly.
static void emit2(Node p) {
int op = generic(p->op);
unsigned shift,mask,size,align,i;
Symbol src,
char *m,*d,*s;
if(optype(p->op) == F){
/* Floating point operations not impl yet */
switch( op ){
//default: dumptree(p); fputc('\n',stderr); /* debugging only*/
Emit ASGNB: block copies.
There are three basic ways we can do block copies:
inline (unrolled) moves
macro which expands to a loop
subroutine call
For copies not larger than 16 bytes, we decide to unroll into moves.
(This threshold is chosen because it's about the same length
as the slower macro loop, so for shorter copies, we save neither
code space or time by using the macro.)
For larger copies, we could either expand a loop inline (macro),
or call a subroutine... the macro is 9-10 instructions
but, being inline, allows flexibility of source and destination registers.
The subroutine call would be fewer inline instructions,
but would constrain register targeting.
Since we're register starved, prefer macro for now:
$CPYB&src,dst,n
for unaligned copies,
$CPYW&src,dst,n
for aligned copies (can be odd number of bytes).
case ASGN:
//dumptree(p); fputc('\n',stderr);
if(specific(p->op) == ASGN+B){ /* copy struct, or data block */
d = ireg[getregnum(p->x.kids[0])]->x.
s = ireg[getregnum(p->x.kids[1])]->x.
size = p->syms[0]->u.c.v.u;
align = p->syms[1]->u.c.v.u;
if(align==1 || size 8)
print("$CPYB %s,%s,% ASGNB\n",s,d,size);
/* first byte is just register-indirect, not indexed */
print("movb
(%s), (%s) ; ASGNB\n",s,d);
for(i=1;i 16)
print("$CPYW %s,%s,% ASGNB\n",s,d,size);
/* first word is just register-indirect, not indexed */
print("mov
(%s), (%s) ; ASGNB\n",s,d);
for(i=2;i<=size-2;i+=2)
print("mov %d(%s),%d(%s) ; ASGNB\n",i,s,i,d);
if(size&1) /* if odd number, copy last byte */
print("movb %d(%s),%d(%s) ; ASGNB\n",i,s,i,d);
Emit ARG(BAND(X,CNST)). This is a hard-won optimisation like the examples earlier:
we want the opportunity to BIC directly on top of stack rather than on a temporary location:
if(optype(p->op) == F)
fpld(p, freg[getregnum(p->x.kids[0])]->x.name, "-(sp)");
/* stmt: ARGx2(BANDx2(reg,con)) */
//dumptree(p); fputc('\n',stderr);
assert( generic(p->kids[0]->op) == BAND
&& generic(p->kids[0]->kids[1]->op) == CNST );
src = ireg[getregnum(p->kids[0]->kids[0])];
print("mov %s,-(sp) ; ARGx(BANDx)\n",src->x.name);
emitband("(sp)", p->kids[0]->kids[1]->syms[0]->u.c.v.u);
Emit BAND(X,CNST). The idea here is simple enough. We complement the constant operand ourselves
and emit a BIC instruction. We notice BAND(CVU(X),CNST) operations as well,
because an unsigned zero-extend involves a bit mask. By combining both masks,
we save one BIC.
Ditto in the case of BAND(RSHU(X,CNST),CNST). The unsigned right shift
involves masking off sign-extended bits, so we notice this, and combine the masks.
(These unsigned operations occur in bit-field operations, so these type of optimisations
are rather important.)
Simple idea but a fair chunk of code:
case BAND:
//dumptree(p);fputc('\n',stderr);
/* also note possible optimisation of low byte extract, where source and dest differ:
MOV src, BIC ^O-400,dst = CLR BISB src,dst */
assert( generic(p->kids[1]->op) == CNST );
dst = ireg[getregnum(p)];
mask = p->kids[1]->syms[0]->u.c.v.u;
if(generic(p->kids[0]->op) == CVU ){
/* BANDx2(CVUx2(reg),con) */
/* combine AND mask with shift-clear mask */
src = ireg[getregnum(p->kids[0]->kids[0])];
//fprintf(stderr,"emit2: %s x.name,src->x.name,mask);
/* always move (to do sign extension) */
print("movb %s,% BANDx(CVUx)\n", src->x.name, dst->x.name);
emitband(dst->x.name,mask & 0377);
/* if LHS is an RSHU, we combine the two */
if(specific(p->kids[0]->op) == RSH+U){
BANDU2(RSHU2(reg,consh),con) */
//fprintf(stderr,"emit2: %s x.name,src->x.name,mask);
emitrshu(p->kids[0], dst->x.name, mask);
/* reg: BANDx(reg,con) */
src = ireg[getregnum(p->x.kids[0])];
//fprintf(stderr,"emit2: %s x.name,src->x.name,mask);
if (dst != src)
print("mov %s,% BANDx\n", src->x.name, dst->x.name);
emitband(dst->x.name,mask);
(I know of one bug in the machine description, relating to BAND(X,CNST), which I have not
flushed out yet, hence that diagnostic.)
Emit CALL(ADDRG). This is the minor optimisation of popping stacked arguments
after a CALL, mentioned earlier. If no arguments, we emit JSR if one word,
we can pop it with a compact TST&(SP)+; otherwise ADD a constant to the stack
case CALL:
/* CALLxx(ADDRGP2) */
//dumptree(p);fputc('\n',stderr);
//fprintf(stderr,"## emit2: CALL: p->syms[0]->u.c.v.u = %d\n", p->syms[0]->u.c.v.u);
assert(generic(p->kids[0]->op) == ADDRG);
print("jsr pc,% emit2: CALL\n",p->kids[0]->syms[0]->x.name);
i = p->syms[0]->u.c.v.u;
if(i == 2) print("tst (sp)+ ; pop arg\n");
else if(i == 4) print("cmp (sp)+,(sp)+ ; pop args\n");
else if(i>0) print("add #%d, pop args\n",i);
Emit RSHU(X,CNST). Emit ASH (arithmetic shift) followed by a bit mask
to zero the sign extended bits, as we mentioned earlier.
Note the following possible further optimisations (but these would be simple templates):
left shift by 8 bits = SWAB CLRB dst
right shift by 8 bits = CLRB SWAB dst
/* note the following possible optimisations:
left shift by 8 bits = SWAB CLRB dst
right shft by 8 bits = CLRB SWAB dst
these would be implemented by templates, for instance:
LSHx2(reg,con8) "swab %c\nclrb %c\n"
RSHU2(reg,con8) "clrb %c\nswab %c\n"
//dumptree(p);
/* reg: RSHx(reg,con) */
/* for unsigned, emit arithmetic shift, then mask out extended sign
assert(generic(p->kids[1]->op) == CNST);
shift = p->kids[1]->syms[0]->u.c.v.u;
dst = ireg[getregnum(p)];
if(optype(p->op) == U)
emitrshu(p, dst->x.name, ~0);
print("ash #%d,% emit2: RSHI\n",-shift,dst->x.name);
static void doarg(Node p) {
assert(p && p->syms[0]);
mkactual(2, p->syms[0]->u.c.v.i);
// Block operators not needed
static void blkfetch(int k, int off, int reg, int tmp) {}
static void blkstore(int k, int off, int reg, int tmp) {}
static void blkloop(int dreg, int doff, int sreg, int soff,int size, int tmps[]) {}
The local function gives lcc the chance to put a local variable in a register.
If this doesn't succeed, then we assign a stack offset and set its textual value
for use in instruction templates.
static void local(Symbol p) {
/* always put long locals on stack frame */
(isint(p->type) && p->type->size==4 && !p->temporary) || askregvar(p, rmap(ttob(p->type))) == 0)
assert(p->sclass == AUTO);
offset = roundup(offset + p->type->size,2);
p->x.offset = -
/* note, all offsets must have sign in front, so we can pull dirty tricks
with assembler address arithmetic later. */
p->x.name = stringf("-%d",offset);
//fprintf(stderr,"local(\"%s\") offset=%d\n",p->name,offset);
This is a biggie. Fairly self-explanatory, through wonderful ASCII art.
We work out the stack frame offsets of all formal parameters here, then emit:
function prologue: set up frame pointer and save all used registers
function body, and
epilogue to restore registers and frame pointer value.
We are not concerned w the function body
will have emitted any necessary instructions.
The only point of note here is that R0 is hardwired as unused, since it's
used for function result anyway and obviously must not be preserved to caller.
static void function(Symbol f, Symbol caller[], Symbol callee[], int n) {
int i,nreg,nfreg,
usedmask[IREG] = usedmask[FREG] = 0;
freemask[IREG] = INTTMP|INTVAR;
freemask[FREG] = FLTTMP|FLTVAR;
maxoffset = offset = maxargoffset = 0;
/* Determine offsets of parameters relative
to frame pointer (set up in prologue).
This assumes right-to-left pushing of parameters,
to facilitate variable-argument functions like printf.
With right-to-left, we always know the address
of the first parameter, because it is the last pushed,
so has highest address on stack and a fixed offset (4)
relative to frame pointer. */
offset = 4;
/* allow for link register word @ 0(SP), and saved FP, after JSR */
for (nargs = 0; callee[nargs]; nargs++) {
Symbol p = callee[nargs];
Symbol q = caller[nargs];
p->x.offset = q->x.offset =
p->x.name = q->x.name = stringf("%d", p->x.offset);
p->sclass = q->sclass = AUTO;
//fprintf(stderr,"callee/caller[%d]: %s offset=%d\n", nargs,p->name,offset);
offset += roundup(q->type->size,2);
// PDP-11 system stack is always word aligned
assert(caller[nargs] == 0);
offset = maxoffset = 0;
//fputs("...calling gencode()\n",stderr);
gencode(caller, callee);
framesize = roundup(maxoffset,2);
// prologue...
layout of system stack frame:
higher addresses
+---------------+
+---------------+
<- FP+2 = SP after JSR
+===============+
<- FP after prologue
+---------------+
framesize \ |
+---------------+
saved regs
print("%s:\n",f->x.name);
if(nargs||framesize){
print("mov fp,-(sp) ; save frame pointer\n");
print("mov sp,fp\n"); /* setup frame pointer */
if(framesize == 2)
print("clr -(sp) ; alloc stack frame\n");
else if(framesize > 2)
print("sub #%d, alloc stack frame\n",framesize);
/* don't bother saving scratch registers,
which are assumed to include return register (R0) */
usedmask[IREG] &= ~INTTMP;
fused = usedmask[FREG];
usedmask[FREG] &= ~FLTTMP;
/* save used registers on stack, below stack frame */
fprintf(stderr,"%16s used=",f->name);
for( i=usedmask[IREG],nreg=0 ; i>>=1,++nreg )
fputc(' ',stderr);
fputs(ireg[nreg]->x.name,stderr);
print("mov %s,-(sp) ; save\n",ireg[nreg]->x.name);
",stderr);
/* note we also need to save and restore FPP state,
if any floating point is done in the function */
// call front end to emit function body
//fputs("...calling emitcode()\n",stderr);
print(";\n");
emitcode();
// epilogue...
print(";\n");
/* restore used registers, in reverse order from save */
for(i=usedmask[IREG] ; nreg-- ; )
if(i & (1<x.name);
if(nargs||framesize){
if(framesize) print("mov fp, pop stack frame\n");
print("mov (sp)+, restore frame pointer\n");
print("rts pc\n\n");
Simple hack here: MACRO-11 restricts identifiers to A-Z, 0-9, '.' and '$'. C obviously
allows underscores, so we do a simple-minded transliteration. It should be noted
that MACRO-11 (and moreso the linker) will complain about identifiers that are not
distinct in the first six characters. This will likely mandate changes to code
that is ported to this target (abbreviation is good for the soul...)
static void defsymbol(Symbol p) {
if(p->scope >= LOCAL && p->sclass == STATIC)
p->x.name = stringf("%d$", genlabel(1));
else if (p->generated)
p->x.name = stringf("%s$", p->name);
else if(p->scope == CONSTANTS && isint(p->type)){
/* make sure constants are literal decimal (not hex or octal) */
p->x.name = isunsigned(p->type) ? stringf("%u",p->u.c.v.u) : stringf("%d",p->u.c.v.i);
}else if(p->scope == GLOBAL || p->sclass == EXTERN){
/* underscores not allowed in MACRO-11 replace with $ */
q = p->x.name = strdup(p->name);
for( ; * ++q )
if(*q == '_')
//fprintf(stderr,"defsymbol(%s = %s)\n",p->x.name,p->name);
I can think of nothing better than to quote the lcc bible here.
initialize p to a symbol that represents an address of the form x+n,
where x is the address represented by q and n is positive or negative.
Like defsymbol, address initializes p->x, but it does so based on
the values of q->x and n. A typical address adds q's stack offset to n
for locals and parameters, and sets p->x.name to q->x.name
concatenated with +n or -n for other variables.
... address accepts globals, parameters, and locals, and is called
only after these symbols have been initialized by defsymbol,
function, or local. Also, address is optional:
If the function pointer
in the interface record is null, the front end will not call address.
Is that clear?
static void address(Symbol q, Symbol p, long n) {
q->x.offset = p->x.offset +
q->x.name = stringf("%s%s%d",p->x.name,nx.name = stringf("%s%d%s%s", nx.name[0]=='+' || p->x.name[0]=='-')?"":"+",p->x.name);
//fprintf(stderr,"address(%s, %s, %d)\n",q->x.name,p->x.name,n);
Nothing too tricky about the remaining functions.
static void defconst(int suffix, int size, Value v) {
long val = suffix==I ? v.i : v.u;
char s[32];
switch(suffix){
sprintf(s,"%.18e",v.d);
print(".FLT%d %s\n",size,s);
if(size==1)
print(".BYTE %d\n.EVEN\n",v.i);
else if(size==2)
print(".WORD %d\n",v.i);
else if(size==4)
print(".WORD % lo\n.WORD % hi\n",v.u & 0xffff,v.u>>16);
print(".WORD ^o% ptr\n",v.p);
static void defaddress(Symbol p) {
print(".WORD %s\n",p->x.name);
Don't be tempted to change this to a "string" directive (such as .ASCII):
it can cause serious problems. For instance, some PDP-11 console environments
fold case from the terminal. If we emit a string, the contents of the string get folded.
If we emit numeric constants (as below), any such insidious data modification is prevented.
This could be a crucial point for strings where changing case is not merely cosmetic
(very easy to imagine examples).
Only put 8 bytes per line because that fits MACRO-11 listings nicely.
static void defstring(int n, char *str) {
for(i=0;ix.name);
static void import(Symbol p) {
Simply using a double colon (e.g. MAIN::) makes an identifier global, but then we lose the ability
to distinguish identifiers that are not supposed to be global (e.g.&statics).
So we assume all identifiers are local to the module, and allow lcc to call global
above for those that are not static.
static void global(Symbol p) {
print("%s:\n", p->x.name);
static void space(int n) {
print(".REPT %d\n.BYTE 0\n.ENDR\n",n);
print(".EVEN\n");
The mother ship: The Interface Record. This defines the sizes and alignments of our types,
and other machine specific parameters. In particular we specify that function arguments
are pushed right-to-left, which facilitates variable-argument calling (such as printf).
for details
of the interface record.
Interface pdp11IR = {
/* char */
/* short */
/* long */
/* long long */
/* float */
/* double */
/* long double */
/* struct */
/* little_endian */
/* mulops_calls */
/* wants_callb */
/* wants_argb */
/* left_to_right */
/* wants_dag */
/* unsigned_char */
defaddress,
defstring,
defsymbol,
0, 0, 0, 0, 0, 0, 0,
1, /* max_unaligned_load */
blkfetch, blkstore, blkloop,
_templates,
_isinstruction,
That's all folks, until I get floating point and long arithmetic sorted out.
I have a couple more targets in the wings too.
How to integrate with lcc
(These instructions are not needed if you simply check out the complete .)
Download and extract .
All instructions assume current directory is the unpacked lcc-4.2 directory.
Place pdp11.md in src subdirectory.
Some changes must be made to the makefile:
Optionally, you can change the compiler flags to add optimisation, and suppress a lot of build warnings:
CFLAGS=-O2 -Wno-long-double
To the RCCOBJS list, add pdp11 object like so:
RCCOBJS=$Balloc$O \
$Bbind$O \
... many lines omitted ...
$Bx86linux$O \
(All lines in the list should end with '\' except the last.)
After the $Bx86linux$O: line, add a line to compile pdp11.c:
$Bx86linux$O:
$Bx86linux.c;
$(CC) $(CFLAGS) -c -Isrc -o $@ $Bx86linux.c
$Bpdp11$O: $Bpdp11.c; $(CC) $(CFLAGS) -c -Isrc -o $@ $Bpdp11.c
And similarly to process pdp11.md:
$Bx86linux.c:
$Blburg$E src/x86linux. $Blburg src/x86linux.md $@
$Bpdp11.c: $Blburg$E src/pdp11. $Blburg src/pdp11.md $@
Modify src/bind.c after the line for bytecode:
xx(bytecode,
bytecodeIR) \
pdp11IR) \
It may be necessary to remove keyword "static" from line 80 of
src/bytecode.c,
because pdp11.md uses the dumptree call to aid debugging.
void dumptree(Node p) {
Also, a custom "HOSTFILE" for the platform is required. My etc/pdp11.c is unfinished,
but sufficient to allow cross-compile testing of the back-end. It belongs in the etc subdirectory
of the lcc-4.2 directory. I also create a pdp11 directory where the objects and executables live.
A build session looks like this:
mkdir pdp11
export LCCDIR=`pwd`/pdp11
export BUILDDIR=$LCCDIR
HOSTFILE=etc/pdp11.c make lcc
pdp11/lcc -S test.c -target=pdp11
(If you build the compiler under OS&X and get "Bad system call" when you try to run it,
I can't help you. Workaround: use Linux.)
COUNTBITS:
MOV FP,-(SP) ; SAVE FRAME POINTER
MOV R2,-(SP) ; SAVE
MOV R3,-(SP) ; SAVE
MOV 4(FP),R3 ; REG <- OPD
MOV R2,R0 ; LOADU2
MOV (SP)+,R3 ; RESTORE
MOV (SP)+,R2 ; RESTORE
MOV (SP)+,FP ; RESTORE FRAME POINTER
( a longer example, including a test run under RT-11.)}

叫阿莫西中心