HOWTO Compile BREW applets with WinARM 4.1.0
(and other applet startup topics)

Original: 11 May 06

Update: 30 Jul 06 - fixed an alignment bug in the armelf.brew script

Update: 01 Aug 06:
See the folks at Qualcomm just released a brand-new super-duper elf2mod utility that looks to be the future. Unfortunately, I don't have time to investigate this right now, but I would like to think this web-site shamed them into doing something! (Along with the need to support BREW 4, control their executable format, etc.)

Anyway, I will leave this info here for a while, but remember, what is described here only works with the OLD BREWelf2mod.

Update: 29 Apr 07:
Hey, global variables are nice, but global static C++ objects are nicer. Richard Willis over at Openwave Systems was generous enough to contribute a short C++ module and modified armelf.brew file that arranges to get all the constructors of your global objects called before your GCC compiled BREW applet starts running (and the destructors fired on the way out). The three files (and a zip of the three files) are here. The comments in the CPP file have a nice description of how it all works. Many thanks to Richard and Openwave.

Update: 11 Feb 2010: I am working with BREW 4 using RVCT 3, and have just put up an article about how to fix an armlink error about a missing main() function (along with other RVCT notes).


This post is a HOWTO describing how I was able to build (several) working BREW MOD files using the WinARM 4.1.0 toolchain. It provides a step-by-step guide, a custom BREW-specific linker script for the GNU ld linker, and other tools.

I hope this information saves you headaches, but unfortunately I can't support you and the standard disclaimers apply. I've tried my best to be accurate, but a lot of this area is poorly documented or undocumented, so sometimes I guess and sometimes guess wrong (or only half-right).

Caveat developer!

We do C++ work in the native Win32 environment, so my steps are skewed to that environment. I have compiled a project of over 400 source files resulting in a MOD of over 1 MB.

So, onward, through the fog...

WHY WOULD YOU WANT TO DO THIS?

The gnude toolchain most folks seem to be using (as BREWelf2mod was originally developed against it) is now 3 years old, and its ARM code generation is, by today's standards, poor. This (and the way BREWelf2mod packages up the output MOD file) means that GNU mod files have gotten a reputation "for being at least 20% larger than ADS!" Yeah, but this situation changes if you use a recent compiler, plus:

At the end of this post I'll review how module loading works and the current state of BREW C++ toolchains (as I understand it).

WHAT DOES NOT WORK?

When I originally wrote this, THUMB and -O2 and -Os optimization did not work. These problems now seem to be solved. So now the only things that don't work are the usual suspects: RTTI, GNU exceptions, maybe floating point depending on your set up. Although I've heard talk some folks are using exceptions -- and have modified the linker script available here to support it...but I don't know the details.

SIZE BENCHMARKS

On 10May06 I grabbed every Win32 GNU ARM cross-compiler I could find and used it to compile our monster load of legacy C++. This is a big project that consists of about 480 source files and makes a MOD that is around 1MB in size. A big program seemed like a good test of compiler performance.

The compilers used were the gnude 3.3.1 that everyone knows and loves, the 2005q5 3.4.4 sourcery release and the (new) 2006q1 4.1.0 sourcery release, and WinARM. I linked against the libraries that came with each package, and linked with the armelf.brew custom linker script.

The winner (for me at least) is WinARM 4.1.0 thumb, and this is probably your best bet right now if you want to use the GNU toolchain. Using a 4.x series GCC also gives you the possibility of greatest improvements to the code generator and compiler down the road, since these compilers use the new parse-tree machinery that provides a much richer intermediate program representation to the compiler back-end.

I can't use ADS without making code changes since key parts of this program use function pointer tables (that are not -ropi compliant). Thus I have never been able to get an ADS 1.2 thumb version of this program to link. I would love to have that number to see how GNU 4.1.0 is comparing, but it would be a side-trip now since I've decided to go with GNU for our work here.

This just (11Jul06) in from one of the folks I've corresponded with privately about this:

Hi Ward,

Long time no talk. I've been busy with lots of build system-related
tasks around here. Yesterday my lead and a team member were saying how
some problem they needed to fix with our build could be done with global
vars, but since they were on the old build system, they were stuck with
ADS. So I decided to bump up getting your linker script working with
WinARM (even downloaded the new 20060606 distro).

Happy to say that it works great... and our .mod file is actually
smaller than it was with ADS (165k vs 170k). The only thing you may be
missing on your Web page is mention of the '-Os' flag for compiling
(without it, the mod size was 310k). Got it running on a Kyocera KX18
phone, even. So I know it's good!

This is great news and the first confirmation I have that 1) the linker script works with the latest WinARM, and 2) THE OUTPUT MOD FILE CAN BE SMALLER THAN THE EQUIVALENT IN ADS.

But, back to my benchmark: gnude won't compile our program in thumb mode, so I can't compare gnude thumb to WinARM thumb, which is disappointing. The best I can do is compare gnude 3.3.1 ARM optimized (1,283,688) to WinARM ARM 4.1.0 optimized (1,280,064): a whopping 3,624 byte saving (about 0.3%).

Here are the results. The columns are toolchain name, GCC version, instruction set used, optimization settings, size of exe/elf file output, size of mod file created, number of text relocations, number of data relocations, and the total number of relocations. (Relocations are worth looking at because GNU mod files carry a table of them around and I wanted to know about any radical deltas here.)

By MOD size:

gnude    GCC 3.3.1, THM,   -Os           can't build because of GAS errors  [1]

sourcery GCC 4.1.0, ARM,   no optimize   exe: 3,660,618  mod: 2,267,416  trel: 66,319 drel: 7,892 total 74,211
WinARM   GCC 4.1.0, ARM,   no optimize   exe: 3,565,434  mod: 2,175,344  trel: 66,329 drel: 7,892 total 74,221
gnude    GCC 3.3.1, ARM,   no optimize   exe: 3,241,496  mod: 1,956,168  trel: 87,533 drel: 7,488 total 95,021
sourcery GCC 3.4.4, ARM,   no optimize   exe: 3,180,438  mod: 1,802,540  trel: 70,388 drel: 7,520 total 77,908 [2] [3] [4]
sourcery GCC 4.1.0, ARM,   -Os           exe: 2,662,110  mod: 1,373,984  trel: 58,508 drel: 7,891 total 66,399
sourcery GCC 3.4.4, ARM,   -Os           exe: 2,565,530  mod: 1,356,432  trel: 62,201 drel: 7,526 total 69,727 [2] [3] [4]
gnude    GCC 3.3.1, ARM,   -Os           exe: 2,373,336  mod: 1,283,688  trel: 73,233 drel: 7,405 total 80,638
WinARM   GCC 4.1.0, ARM,   -Os           exe: 2,566,422  mod: 1,280,064  trel: 58,571 drel: 7,890 total 66,461
sourcery GCC 3.4.4, THM,   -Os           exe: 2,250,203  mod: 1,031,804  trel: 61,586 drel: 8,116 total 69,702 [2] [3] [4] 
sourcery GCC 4.1.0, THM,   -Os --use-blx exe: 2,349,073  mod: 1,031,524  trel: 58,510 drel: 8,477 total 66,987
WinARM   GCC 4.1.0, THM,   -Os --use-blx exe: 2,303,839  mod:   969,699  trel: 58,170 drel: 8,476 total 66,646

By Toolchain:

gnude    GCC 3.3.1, ARM,   no optimize   exe: 3,241,496  mod: 1,956,168  trel: 87,533 drel: 7,488 total 95,021
gnude    GCC 3.3.1, ARM,   -Os           exe: 2,373,336  mod: 1,283,688  trel: 73,233 drel: 7,405 total 80,638
gnude    GCC 3.3.1, THM,   -Os           can't build because of GAS errors  [1]
sourcery GCC 3.4.4, ARM,   no optimize   exe: 3,180,438  mod: 1,802,540  trel: 70,388 drel: 7,520 total 77,908 [2] [3] [4]
sourcery GCC 3.4.4, ARM,   -Os           exe: 2,565,530  mod: 1,356,432  trel: 62,201 drel: 7,526 total 69,727 [2] [3] [4]
sourcery GCC 3.4.4, THM,   -Os           exe: 2,250,203  mod: 1,031,804  trel: 61,586 drel: 8,116 total 69,702 [2] [3] [4] 
WinARM   GCC 4.1.0, ARM,   no optimize   exe: 3,565,434  mod: 2,175,344  trel: 66,329 drel: 7,892 total 74,221 
WinARM   GCC 4.1.0, ARM,   -Os           exe: 2,566,422  mod: 1,280,064  trel: 58,571 drel: 7,890 total 66,461
WinARM   GCC 4.1.0, THM,   -Os --use-blx exe: 2,303,839  mod:   969,699  trel: 58,170 drel: 8,476 total 66,646
sourcery GCC 4.1.0, ARM,   no optimize   exe: 3,660,618  mod: 2,267,416  trel: 66,319 drel: 7,892 total 74,211
sourcery GCC 4.1.0, ARM,   -Os           exe: 2,662,110  mod: 1,373,984  trel: 58,508 drel: 7,891 total 66,399
sourcery GCC 4.1.0, THM,   -Os --use-blx exe: 2,349,073  mod: 1,031,524  trel: 58,510 drel: 8,477 total 66,987


1. Compiling for ARM with GNU C++: .\src\jobs\CXmlReaderSinkJob.cpp
   /Temp/cc7YEQ2s.s: Assembler messages:
   /Temp/cc7YEQ2s.s:681: Error: immediate value out of range -- `sub r0,r0,#588'
   /Temp/cc7YEQ2s.s:732: Error: immediate value out of range -- `sub r0,r0,#588'

2. Used experimental BREWe2modFilter to re-map R_ARM_JUMP24 and R_ARM_CALL to something
   BREWelf2mod could eat. Have _not_ verified image actually runs.

3. Not all versions of GNU like an include path that ends in \ (i.e. -I \myincs\).
   Solution is to drop the trailing backslash (-I \myincs) or append a dot (-I \myincs\.).

4. Compiling seemed slow for long include paths

Some observations:

  1. When you turn off optimization on a 4.x series GCC, you REALLY turn it off. The resulting output is huge.
  2. The sourcery chains are still an also-ran, though I want to like them better because I think those folks are doing great work. The problem is they track the EABI closely and their tools emit all kinds of fancy new relocations BREWelf2mod does not understand. Using my objdump filter (that I updated yesterday) I can re-map these to (what are probably) equivalent relocations for BREWelf2mod, but I HAVEN'T ACTUALLY RUN ANY OF THE SOURCERY-PRODUCED MODULES so I don't know if they really work. Since the sourcery output was (slightly) larger too, I did not pursue it further.
    It is interesting to speculate why the sourcery 4.1.0 output is larger than the WinARM 4.1.0 output. If I had to speculate I would assume it has to do with the way the runtime libraries are compiled and any unique EABI support they put in their back-end.
  3. When you turn on thumb, the number of data relocations spikes, but the overall total is about the same and the module size overall is still smaller than any ARM version.

So that's it. The smallest module that I have verified actually seems to run OK is WinARM 4.1.0 thumb.

Now it's off to keep working and developing with this compiler and discover its other bugs and idiosyncrasies!

STEP-BY-STEP

1. Download the latest WinARM (4.1.0 at this writing) from http://gandalf.arubi.uni-kl.de/avr_projects/arm_projects/#winarm. I used this because it was the most recent one I could find that ran natively under Win32 (no CygWin) and was pre-built.

2. Unpack.

3. Compile your C++ sources with:

   \winarm\bin\arm-elf-g++.exe 
       -mlittle-endian       // or -mbig-endian if that's your target
       -mcpu=arm7tdmi        // or the right CPU for you, this is common
       -mapcs-frame          // use the "standard" arm procedure call standard
                             // (linked calling frames on the stack)
       -fno-builtin          // Don't generate inline versions of memcpy(), memset()
                             // etc. Both because they may make the code larger, and
                             // because you need to call the BREW versions MEMCPY()
                             // and so forth.

       -ffunction-sections   // put each function in a separate .text._foo section
                             // this lets the linker throw away ones that are not
                             // called when it "garbage collects" (drops empty)
                             // sections. The default behavior for GCC ld 4.x seems 
                             // to be to garbage collect, so the old --gc-sections switch
                             // is no longer necessary or recognized. (But, the
                             // new --no-gc-sections switch is, if you need to 
                             // keep an uncalled section around for some reason.
                             // But for this, changing the linker script is 
                             // probably a better choice.)
                             // Having each function in a separate section also
                             // lets the linker move callers and callees closer
                             // together (locality improvement) so shorter
                             // instructions can be used. (I don't know how much
                             // of an improvement this is in non -fpic code, tho)

       -fno-exceptions       // nope, you can't have C++ exceptions. And if you
                             // try it you'll suck in a bunch of library stuff
                             // that won't link.

       -fno-unwind-tables    // When exceptions are thrown the stack needs to be
                             // "unwound" and destructors called. These tables
                             // are not supported.

       -fno-rtti             // Nope, no runtime type identification either, so
                             // no new-fangled dynamic_cast<>()-ing either.

        // All of these -fno-xxx things stop the compiler from pulling in library stuff and 
        // using globals. It may be possible to support them at a later time with open-source
        // loaders and a good understanding of run-time internals.


       -DDEBUG               // If you need it

       -DDYNAMIC_APP         // Everyone includes this in applets, but I never
                             // see it used in the BREW headers (I think). Is it
                             // vestigial? Or does it do something?

       -I "C:\Program Files\BREW 3.1.5\sdk\inc"  // and other include paths...
                                                 // repeat the -I for each one

       -o applet.o           // Your output (EABI ELF) file

       -c applet.c           // Your input source

If you want the smallest executable, optimize for size and turn on thumb. (More details on thumb below.)

       -Os                   // Optimize for size (may make debugging harder)
       
          // And turn on thumb and compile everything as thumb, including AEEAppGen
          // and AEEModGen.

       -mthumb                         // enable thumb instruction set
       -mtpcs-frame                    // this sets the thumb stack frame, use this INSTEAD of
                                       // -mapcs-frame above
       -mcallee-super-interworking     // this is magic that makes thumb work

4. Compile your C sources with:

   \winarm\bin\arm-elf-gcc.exe 
       -mlittle-endian
       -mcpu=arm7tdmi
       -mapcs-frame
       -fno-builtin
       -ffunction-sections
       -DDEBUG 
       -DDYNAMIC_APP 
       -I "C:\Program Files\BREW 3.1.5\sdk\inc"  
       -o AEEModGen.o -c AEEModGen.c

You should use the C compiler for both AEEModGen.o and AEEAppGen.o. Throw in the same -Os, -mthumb, -mcallee-super-interworking and -mtpcs-frame instead of -mapcs-frame if you are "going small."

If you are compiling thumb, my current recommendation is to compile everything as thumb (including AEEModGen.c and AEEAppGen.c) and use the -mcallee-super-interworking switch to transition to thumb mode. (I could not get the traditional method of interwork building and compiling AEEModGen and AEEAppGen in ARM mode to work.)

To expand a bit, most folks seem to recommend doing something like this:

      AEEAppGen.c       -   -mthumb-interwork -mapcs-frame
      AEEModGen.c       -   -mthumb-interwork -mapcs-frame

      RestOfApplet.cpp  -   -mthumb-interwork -mtpcs-frame -mthumb

      Link with:  4.1.0/thumb/interwork/gcc

...the thinking being that AEE would call the ARM code in AEEModGen and we'd flip over to thumb on the call to (the C++) AEEClsCreateInstance(). The return from AEEClsCreateInstance would then be smart enough to return to ARM mode on exit.

Well, it seems that call (first switch to thumb) was failing and nothing I did could make it go.

What wound up working was to compile EVERYTHING in thumb:

      AEEAppGen.c       -   -mthumb -mtpcs-frame -mcallee-super-interworking 
      AEEModGen.c       -   -mthumb -mtpcs-frame -mcallee-super-interworking

      RestOfApplet.cpp  -   -mthumb -mtpcs-frame

      Link with:  4.1.0/thumb/gcc

The "callee-super-interworking" switch inserts some preamble code to switch into thumb mode at the start of the public functions AEE calls, so you don't have to compile AEEAppGen or AEEModGen in ARM mode. In fact, everything in the applet can be in thumb.

Callbacks from BREW seem to be working too (we do some socket stuff). I was worried about this.

There is one trick to make a mod. As I will explain, BREWelf2mod does not understand R_ARM_THM_CALL. To get around this, I used the gnude objdump utility (that emits the old names):

      BREWelf2mod  test.elf test.mod \gnude\bin\arm-elf-objdump.exe
                                      ^^^^^

Or, you can use my wrapper program, which will also make your mod file a little smaller as well.

So....I'm not an expert, and I don't have a debugger for my target. I don't understand why the more "traditional" approach doesn't work, but this workaround seems to go OK at the moment.

5. Link.

This is where things get tricky. The idea is to make an ELF file that can be fed to BREWelf2mod. Now, BREWelf2mod needs to see some things in the ELF file, or it will gack. Specifically:

So...if you do something mad like trying to use templates in your C++, this might bite you. Be careful.

Anyway, back to linking. To get the relocations to be placed in the ELF output, use --emit-relocs.

Now, traditionally, to get the .text section loaded at zero you had to use "-Ttext 0". And to quiet a linker warning you had to provide an entry point, AEEMod_Load(), with "-entry AEEMod_Load". And to get all the relocations together, sorted and in the right place in the output, you either had to use the right (.xc) linker script -- which, thankfully, gnude used by default -- OR force it to use the (.xc) script with a -zcombreloc switch when using other toolchains (Sourcery).

And to get AEEModGen.o first you had to stick it first on the command line with something like this:

AEEModLoad.o AEEAppGen.o  -( onelib twolib threelib -)

Note: unlike a lot of other linkers, GNU ld only searches libraries once unless you "group" them -- that's what the -( and -) switches above do to the example libraries "onelib", "twolib" and "threelib".

AND...if you turned on -Os or -O2 optimization, GNU would reorder your sections (if you compiled with -ffunction-sections, as you should) to improve "locality." So folks wound up putting AEEMod_Load() in a separate file and linking it first to get around this.

Finally, to be sure a .bss and .data section were in your output, you didn't have to do ANYTHING -- if you were a good BREW programmer they would both be empty, but the old gnude chain would keep them around anyway. Not so with 4.1.0, it drops them faster than you can say "empty section" -- so BREWelf2mod would get confused and make ENORMOUS output files.....
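As a rough illustration of what makes those sections non-empty, here is a minimal sketch of which kinds of declarations typically land in .rodata, .data and .bss. The variable and function names are invented for the example, and exact placement can vary with compiler version and flags.

```cpp
#include <cassert>
#include <cstring>

// Constant data: typically placed in .rodata, which BREW is happy with.
static const char kGreeting[] = "hello";

// Initialized, writable global: typically placed in .data.
int g_launchCount = 3;

// Zero-initialized, writable global: typically placed in .bss.
int g_scratch[4];

// Hypothetical helper that touches all three kinds of data.
int touchAllSections() {
    assert(std::strlen(kGreeting) == 5);
    g_scratch[0] = g_launchCount;   // reads .data, writes .bss
    return g_scratch[0] + (kGreeting[0] == 'h' ? 1 : 0);
}
```

A "good" BREW applet only has declarations of the first kind, which is why both .data and .bss come out empty at link time.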

The way to address this is with a "linker script." This is a small program that tells ld how to take the input files and arrange them in the output. (It is the equivalent of ADS scatter files.) In that script, you can do everything you can do on the command line PLUS keep the .data and .bss sections around no matter what. Very handy.

And...as long as I was writing a linker script, I figured I'd automatically load AEEModGen.o first, set the text section offset to zero, and provide an entry point so you don't have to.

So, given we have a custom linker script, the link command becomes:

\winarm\bin\arm-elf-ld.exe 
    --script armelf.brew                   // req: use custom script
    --emit-relocs                          // req: BREWelf2mod needs the relocation info
    --verbose                              // opt: be verbose, handy
    --no-warn-mismatch                     // opt: you shouldn't have mismatch warnings, esp
                                           // if not generating thumb code. Usually this warns
                                           // when you try to glue an interworking-supporting object
                                           // to one that does not, or that ld _thinks_ does not...
    -Map "wardtest.map" --cref             // opt: make a cross referenced map if you need it
    -L \winARM\lib\gcc\arm-elf\4.1.0\      // this is where to look if building ARM. If thumb
                                           // you'll have mixed ARM and thumb calling the library
                                           // and need to go down into the interworking and 
                                           // interworking/thumb directories.
    -lgcc                                  // This is the only one I've needed to link everything
    -o "wardtest.elf"                      // output file
    AEEAppGen.o                            // input file
    applet.o                               // input file

Note that I did not specify AEEModGen.o. The linker script takes care of this. See next section.

About __cxa_pure_virtual, memcpy(), memset(): The compiler generates references to these by itself and they need to be satisfied at link time. Qualcomm provides a module called GCCResolver.o to do this, or you can just define them in your code. __cxa_pure_virtual needs to do nothing. memcpy(), memset(), strlen() (or any others) need to call their BREW AEEStdLib.h equivalents.
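If you define these yourself, the shape is roughly as follows. This is only a sketch, and it makes two loud assumptions so that it can be compiled on a desktop: the functions carry a made-up shim_ prefix (in the applet they would be named memcpy, memset, strlen and __cxa_pure_virtual), and the bodies are plain loops where the applet versions would forward to MEMCPY(), MEMSET() and STRLEN() from AEEStdLib.h.

```cpp
#include <cassert>
#include <cstddef>

// ASSUMPTION: the shim_ prefix and the loop bodies are stand-ins so this
// compiles without the BREW SDK and without clashing with the C library.

extern "C" void* shim_memcpy(void* dst, const void* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) d[i] = s[i];  // applet: return MEMCPY(dst, src, n);
    return dst;
}

extern "C" void* shim_memset(void* dst, int value, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = static_cast<unsigned char>(value);      // applet: return MEMSET(dst, value, n);
    }
    return dst;
}

extern "C" std::size_t shim_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;                          // applet: return STRLEN(s);
    return n;
}

// Referenced from the vtables of abstract classes; it can simply do nothing.
extern "C" void shim_cxa_pure_virtual() {}
```

The extern "C" linkage matters: the compiler-generated references use the unmangled C names, so C++-mangled definitions would not satisfy them.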

That's right, GCCResolver is not from GNU and has nothing to do with DNS!

Here is a more complicated input file scheme:

\winARM\bin\arm-elf-ld.exe
    --verbose
    --no-warn-mismatch 
    -Map "myapp.map" --cref 
    --emit-relocs 
    -L "mylibpath1" -L "mylibpath2" ...   // quotes optional 
    --script armelf.brew 
    -o "myapp.elf"  AEEAppGen.o -( -l ONE -l TWO -l THREE -l FOUR -l gcc  -)

This says that somewhere in "mylibpaths" I have 4 libraries (libONE.a, libTWO.a, libTHREE.a, libFOUR.a) plus libgcc.a.

Note again I did not specify AEEModGen.o -- don't have to. The armelf.brew linker script brings it in for me.

6. Run BREWelf2mod and pray.

I talk about BREWelf2mod below. The key thing is that it strips all the ELF stuff off your image, squirts out your code and a table of relocations and slaps a 0x9C byte startup routine onto your file before AEEMod_Load().

To get the relocations, it exec()s arm-elf-objdump and processes the ASCII list of relocations that come out.

Now, BREWelf2mod and the loader only understand a couple of relocation types, specifically R_ARM_PC24 and R_ARM_ABS32. If you have compiled straight ARM code all the way, that should be all that is in your objdump output. So you can use the objdump that comes with the WinARM toolchain:

     BREWelf2mod  myapp.elf myapp.mod \winARM\bin\arm-elf-objdump.exe 

If you build thumb, the new toolchain's objdump emits an R_ARM_THM_CALL relocation instead of the old name for this, R_ARM_THM_PC22. You can get around this by specifying an old version of objdump -- like the one from gnude:

     BREWelf2mod  myapp.elf myapp.mod \gnude\bin\arm-elf-objdump.exe 

Or I have just written a filter program to 1) transform the output of WinARM's objdump so that BREWelf2mod can understand it and 2) drop the __cxa_pure_virtual relocations. The (simple) source code is here. Just stick it in Visual Studio and compile as a Win32 console app. Instructions for how to deploy it are in the comment at the start of the source. (If you trust me not to be giving you malware, my pre-compiled debug version (from VS Express 2005) is here. Just save this link to disk.)

Other options are writing a wrapper program, or maybe just patching the string in BREWelf2mod.exe directly.
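For a feel of what such a filter involves, here is a minimal sketch of the per-line logic. It is not the filter program mentioned above; the function names are invented, and it assumes objdump prints one relocation per line with the type name and symbol as plain text.

```cpp
#include <cassert>
#include <string>

// Replace every occurrence of `from` in `line` with `to`.
static void replaceAll(std::string& line, const std::string& from,
                       const std::string& to) {
    assert(!from.empty());
    std::string::size_type pos = line.find(from);
    while (pos != std::string::npos) {
        line.replace(pos, from.size(), to);
        pos = line.find(from, pos + to.size());
    }
}

// Hypothetical per-line filter: returns false if the line should be
// dropped, otherwise rewrites new relocation names to the old ones.
bool filterRelocLine(std::string& line) {
    if (line.find("__cxa_pure_virtual") != std::string::npos) {
        return false;  // drop relocations against __cxa_pure_virtual
    }
    replaceAll(line, "R_ARM_THM_CALL", "R_ARM_THM_PC22");
    return true;
}
```

The old and new names happen to be the same length, so the column layout of the objdump listing is left untouched by the rename.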

BTW, I am using BREWelf2mod of 15 March 2004, at 52,248 bytes.

7. Download your module with Apploader and start debugging.

THE LINKER SCRIPT

This is my first linker script, so I may have made stupid mistakes, but so far it is working. I used http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/gnu-linker/script-format.html to learn about this. Hopefully it is not too out of date.

My philosophy was to start with nothing and add stuff I needed or seemed "nice to have" -- this way I understand the reason for everything in there. The default GNU linker scripts are so terrifying because they include everything for every platform. We don't need that.

Here is the linker script. Save this link to disk in a file called "armelf.brew" and call it with the --script switch to ld as shown above.

DEVELOPMENT TOOLS OVERVIEW

I already posted this elsewhere, but I've updated it, and here is my read as of early May 2006:

GNU Gnude:
  Pro:
    Works
  Cons:
    Thumb broken
    Big executable
    Old compiler

GNU WinARM:
   Pro:
     Works
     Recent compiler (the June 06 release works)
     Smaller image (even than ADS it appears, maybe...)
     Thumb Works  (finally)
   Cons:
     None specific

All GNU in general:
     Pro:
        Free, open source
        Relocation fixup of mod by startup code means
          tables of function pointers and string
          constant addresses supported, fast
          vtable dispatching, and uninitialized
          global variables (possibly) supported in 
          .BSS. Also C++ vtable mapping schemes like
          "lightblue" supported.
     Cons:
         Half-hearted Qualcomm support
         Crummy documentation

ADS 1.2:
    Pro:
      thumb works, small mod
    Cons:
      -ropi model (no load-time fixup) means no tables of function ptrs,
      no tables of string constants, inefficient
      calling to far targets, no global variables.
      Slow win32 compiles (maybe local phenomenon)
      Maybe makes bigger mods than GNU!

    Note: lightblue has plans to fix this by pursuing a scheme
    similar to that used for gnu: that is, using a special
    program to process the output and slap run-time fixup code on
    the start.

RVDS 2.x:
    Pro:
       recent, "the future"
    Cons:
       vtable implementation not -ropi compliant so
       (pure) virtuals don't work ??? Under guise of
       Green Hills "embedded C++" standard?

TUTORIAL ON LINKING, LOCATING AND LOADING IN BREW AND ELSEWHERE

When you compile a program with GCC ARM the resulting file is in a format called ELF or ARM-ELF. This stands for "Executable and Linkable Format." It is the Unix-world counterpart of the "Portable Executable" (PE) format used on Win32. ELF replaced older Unix formats such as a.out and COFF, much as PE superseded earlier formats such as the New Executable (NE), Linear Executable (LE), and old MS-DOS (MZ -- "Mark Zbikowski") files.

ELF files are structured as a header and then a series of logical sections that the linker makes up. Some of these are common and the operating system loader relies on being able to find them to get a program into memory, set it up for execution, and run it. For example, the .text section contains the program code (and is generally marked read-only). The .rodata section contains "read-only" or rommable data -- string constants, tables, "const" stuff. This is the only kind of data we like in BREW. On a desktop machine the .data section contains initialized variables, and the .bss section ("block started by symbol," an anachronistic Unix name) contains a bunch of zeros for uninitialized global variables. (Actually, it only records how big such a section should be; the bytes aren't in the file. The OS loader allocs that memory and zeros it when it is preparing the program to run.)
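To make the "header first" structure concrete, here is a minimal sketch that checks whether a buffer looks like a 32-bit ARM ELF file by reading a few fixed header fields. The function name is invented; the offsets and constants (magic bytes, EI_CLASS, EI_DATA, e_machine, EM_ARM = 40) come from the generic ELF layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if the buffer starts with the ELF magic and its header says
// "32-bit, ARM". Only the first 20 bytes of the header are examined.
bool looksLikeArmElf32(const std::uint8_t* p, std::size_t size) {
    assert(p != nullptr || size == 0);
    if (size < 20) return false;               // e_ident(16) + e_type(2) + e_machine(2)
    if (p[0] != 0x7F || p[1] != 'E' || p[2] != 'L' || p[3] != 'F') return false;
    if (p[4] != 1) return false;               // EI_CLASS: 1 = 32-bit
    const bool little = (p[5] == 1);           // EI_DATA: 1 = little-endian
    if (!little && p[5] != 2) return false;    //          2 = big-endian
    const unsigned machine = little ? (p[18] | (p[19] << 8))
                                    : ((p[18] << 8) | p[19]);
    return machine == 40;                      // EM_ARM
}
```

The section table, symbol table and relocations all hang off further fields in this same header, which is why the file cannot simply be loaded and jumped into.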

The idea of ELF as a "standard" is that many operating systems can load and run those files (Linux, Symbian, BREW). ELF is part of a larger notion of an Application Binary Interface (ABI) that specifies how executables, object files and libraries/archive files should be structured so compilers/linkers/debuggers (collectively: "toolchains") can interoperate with each other. ARM Limited blessed an ABI around 1998. This is called the "Extended/Embedded ABI" or EABI. Tool makers (including GNU) have been modifying their tools to track this. (It appears even this spec is slightly revised over time, since the latest EABI paper I have from ARM is dated January 2006.) Early GNU and early ARM compilers (including ADS 1.2) did not follow the EABI spec and thus do not interoperate. The latest RVDS 2.x series of ARM compilers and recent GNU compilers (late v3 and v4) do support it and, in theory, interoperate.

On an operating system with a Memory Management Unit (MMU) (most Linux, all Win32) an application can be run at what seems (to the application) to be a fixed address. Thanks to the miracle of virtual address space mapping to physical hardware, each application can believe it is running at the same address in memory (0x80000000 for example). This makes a linker's job very easy: all it has to do is start the program counter at that number and count up from there. Further, it can figure out the absolute addresses of every symbol in the program (both code and data) and stomp absolute values into the code where required. This process is called "locating" the executable.

Now with libraries (DLL, dynamic shared objects, etc.) and on BREW it gets more complicated. In the case of BREW, there is no MMU and the application will run in real memory at possibly any address. The BREW applet loader would like to be as simple and stupid as possible and just allocate some heap memory, read the applet file into it, and call the first byte. This is, in fact, what it does. The problem is, if the linker does not know the run address, how does the executable image get located -- or as we say "fixed up" -- so it can run without crashing?

We now enter the thrilling realm of relocations. When the linker slams together all the object modules it also outputs one or more relocation sections (.reloc) that contain a list of places that need to have their address adjusted in some way. A simple approach (and one kind, the ABS32 reloc) is to simply add the 32 bit base load address of the applet to the location. Other relocations involve calculating offsets and adding them. It all depends on the addressing mode of the instruction being fixed up and what is required to let it find the data or code it is after.
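The ABS32 case is simple enough to sketch. The code below is an illustration only, not the BREW startup code: the function name is made up, the image is held in a std::vector, and it assumes the host byte order matches the image's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Add `loadBase` to the 32-bit word at each listed offset in the image.
void applyAbs32Fixups(std::vector<std::uint8_t>& image,
                      const std::vector<std::uint32_t>& offsets,
                      std::uint32_t loadBase) {
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        const std::uint32_t off = offsets[i];
        assert(off <= image.size() && image.size() - off >= 4);
        std::uint32_t word;
        std::memcpy(&word, &image[off], sizeof word);  // read link-time value
        word += loadBase;                              // relocate to the real address
        std::memcpy(&image[off], &word, sizeof word);  // write it back
    }
}
```

For example, a word that held 0x10 at link time (an address relative to a load address of zero) becomes 0x1010 once the image has been loaded at 0x1000.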

Recall that an ELF file has a header and all the "good stuff" is tucked away in sections. This is not something you can just suck into memory and jump to. To solve this, Qualcomm (QC) provides a utility in the VSAddins directory of the SDK called BREWelf2mod. This utility does several things:

  1. It strips off the ELF header of the input file
  2. It secretly runs the GNU objdump utility to get an ASCII list of relocations in the ELF file.
  3. It runs through the relocations and "cooks them" so they are easy to apply. In fact, it gets it down to a sorted list of places in the image that need to have the actual loaded address added to them (the addend).
    Note that in a big program, this data can be of fair size and is one of the reasons GNU gets a rap for big images on ARM. (There are tricks that could fix this with more development.)
  4. It slaps a short (0x9C byte) startup routine on the front of the output image. This is the code that first gets control when BREW jumps to location zero in an applet. It fixes up the relocations in the image and after that, winds up executing AEEMod_Load().
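Steps 2 and 3 can be pictured with a short sketch. What BREWelf2mod does internally is not documented, so this is only an illustration of the idea: it assumes the usual `objdump -r` text layout of "OFFSET TYPE VALUE" per line, and the function name is invented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Collect the offsets of all R_ARM_ABS32 entries from objdump-style
// text and return them sorted in ascending order.
std::vector<std::uint32_t> collectAbs32Offsets(const std::string& objdumpText) {
    std::vector<std::uint32_t> offsets;
    std::istringstream lines(objdumpText);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string offsetHex, type;
        if (!(fields >> offsetHex >> type) || type != "R_ARM_ABS32") {
            continue;  // headers, blank lines, other relocation types
        }
        std::size_t used = 0;
        const unsigned long value = std::stoul(offsetHex, &used, 16);
        assert(used == offsetHex.size());  // whole field should be hex digits
        offsets.push_back(static_cast<std::uint32_t>(value));
    }
    std::sort(offsets.begin(), offsets.end());
    return offsets;
}
```

With one entry per fixup site, it is easy to see how a program with tens of thousands of relocations ends up carrying a sizeable table around in its MOD file.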

If you are curious, the layout of this startup code looks like this (from my limited poking around):

   
+----------------------------------+
| jump to relocation fixup code    |
+----------------------------------+
| offset of data                   |
| size of image minus this startup |
| offset of data                   |
| offset of AEEMod_Load or place   |
|    to make jump to it            |
+----------------------------------+
|                                  |
|                                  |
|     relocation fixup code        |
|                                  |
|                                  |
+----------------------------------+
|    mystery pointer               |
|    mystery pointer               |
|  version of BREW stdlib          |
| ptr to stdlib interface          |
+----------------------------------+
| AEEMod_Load()                    |
+\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/+
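
If you want to poke at the first few words of a MOD file yourself, something like the sketch below may help. The struct and its field names are hypothetical, made up to mirror the diagram above, and are not a documented Qualcomm format. The branch decoder is plain ARM: the low 24 bits of a B instruction are a signed word offset, and the PC reads as the instruction address plus 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the five words at the very front of the image.
 * Field names are guesses that follow the diagram, nothing official. */
struct mod_startup_header {
    uint32_t branch;         /* ARM "b" to the relocation fixup code    */
    uint32_t data_offset_a;  /* "offset of data"                        */
    uint32_t image_size;     /* "size of image minus this startup"      */
    uint32_t data_offset_b;  /* "offset of data" (same value again)     */
    uint32_t entry_offset;   /* "offset of AEEMod_Load or place ..."    */
};

/* Compute the target of an ARM-mode B/BL instruction located at `addr`. */
uint32_t arm_branch_target(uint32_t insn, uint32_t addr)
{
    int32_t imm24 = (int32_t)(insn & 0x00FFFFFFu);

    /* only condition-coded B/BL encodings (bits 27..25 == 101) make sense */
    assert(((insn >> 25) & 0x7u) == 0x5u);

    if (imm24 & 0x00800000)
        imm24 -= 0x01000000;                 /* sign-extend 24 bits     */

    return addr + 8u + (uint32_t)(imm24 * 4); /* PC is addr + 8 on ARM  */
}
```

Feeding it the word 0xea000003 at address 0 gives 0x14, which is the first instruction after the four offset words.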

Note there is no need to explicitly compile position independent code, because this scheme is itself position independent. In fact, if you do compile position independent (-fPIC) it won't work, because GNU implements this by creating some "thunk sections" such as the global offset table (.got), and these are not supported by the elf2mod startup code. (Thumb "interworking" calls can generate thunks too -- to switch from 16 to 32 bit instructions and back -- but that's a different thing, and something I have not gotten to work yet anyway.)

By contrast, the ADS toolchain only compiles position independent (-ropi). This means no startup code is required to fix it up at runtime, AEE can just call the first byte and off it goes, but you can't have tables of function pointers, or mutable global variables. It also means vtable dispatching and long jumps are inefficient, having to go through "jump veneers" if the target is too far away.
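
For anyone wondering what kind of code trips over this, here is a small sketch of a function pointer table. The names (op_fn, op_table, dispatch) are invented for illustration. The point is that every entry in the table is an absolute code address stored as data, which is exactly the sort of word that has to be patched once the real load address is known.

```c
#include <assert.h>

typedef int (*op_fn)(int, int);

int op_add(int a, int b) { return a + b; }
int op_sub(int a, int b) { return a - b; }

/* Each initializer here is the address of a function. The linker can
 * only fill in a link-time address, so without a load-time fixup these
 * entries would point at the wrong place when the image is loaded
 * somewhere else. */
op_fn const op_table[] = { op_add, op_sub };

/* Call through the table; idx selects the operation. */
int dispatch(unsigned idx, int a, int b)
{
    assert(idx < sizeof op_table / sizeof op_table[0]);
    return op_table[idx](a, b);
}
```

With the GNU route described here the startup code patches such entries before AEEMod_Load runs, so a table like this just works.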

Note that the LightBlue project intends to make their own elf2mod utility that will allow compiling applets that are not -ropi and fixing them up at load time as is done with GNU. This will bring LightBlue vtable support, global variables, function pointer tables, string tables and efficient vtable dispatching to ADS just as is now available with GNU.

For the really curious, here is my disassembled and annotated version of the BREW gnu startup code:

\winarm\bin\arm-elf-objdump -D -m arm -b binary myApplet.mod

Module startup Code:

       0:   ea000003    b   0x14                          ; jump past 4 pointers that follow

       4:   0000e258    offset of .data section
       8:   0000eca0    size of image minus this header
       c:   0000e258    offset of .data section
      10:   00000000    offset of AEEMod_Load or place to build a jump to it

      14:   e92d00f0    stmdb   sp!, {r4, r5, r6, r7}   ; save r4-r7 on the stack
      18:   e24f4020    sub r4, pc, #32 ; 0x20          ; set r4 to the first instruction
      1c:   e284509c    add r5, r4, #156    ; 0x9c      ; set r5 to pointer storage at end of startup code
      20:   e5143008    ldr r3, [r4, #-8]               ; get stdlib version number into r3?
      24:   e5053008    str r3, [r5, #-8]               ; store stdlib version at end of startup code
      28:   e5143004    ldr r3, [r4, #-4]               ; get pointer to stdlib into r3?
      2c:   e5053004    str r3, [r5, #-4]               ; store stdlib ptr at end of startup code
      30:   e51f3034    ldr r3, [pc, #-52]  ; 0x4       ; get reloc table base offset into r3?
      34:   e0833005    add r3, r3, r5                  ; add offset to reloc table to end of startup ptr to get ptr in r3
      38:   e51f4038    ldr r4, [pc, #-56]  ; 0x8       ; get end of reloc offset into r4?
      3c:   e0844005    add r4, r4, r5                  ; get pointer to end of table in r4
apply_relocations:
      40:   e1530004    cmp r3, r4                      ; cur < end ?
      44:   b4936004    ldrlt   r6, [r3], #4            ; if less than, get reloc offset into r6, then cur++
      48:   b7967005    ldrlt   r7, [r6, r5]            ; if less than, get addr to fixup into r7 [ offset/r6 + image base/r5 ]
      4c:   b0877005    addlt   r7, r7, r5              ; if less than, add base to reloc
      50:   b7867005    strlt   r7, [r6, r5]            ; if less than, put fixed-up (absolute address) back
      54:   bafffff9    blt 0x40                        ; if not done all relocs, loop

      58:   e51f305c    ldr r3, [pc, #-92]  ; 0x4       ; reset r3 to offset to base of relocs
      5c:   e0833005    add r3, r3, r5                  ; point r3 at base of relocs [ offset/r3 + image base/r5 ]
      60:   e51f4060    ldr r4, [pc, #-96]  ; 0x8       ;  get other offset  in r4 ?? end ptr? (0xeca0)
      64:   e51f6060    ldr r6, [pc, #-96]  ; 0xc       ; get end? offset to (secondary?) reloc table in r6 ? (0xe258)
      68:   e1540006    cmp r4, r6                      ; r4 > r6 ?
      6c:   c1a04006    movgt   r4, r6                  ;  if so, put the smaller offset in r4
      70:   e0844005    add r4, r4, r5                  ; turn into pointer, add image base
      74:   e3a06000    mov r6, #0  ; 0x0               ; zero r6
zero_reloc_table_and_mebee_bss:                         ; ??? 
      78:   e1530004    cmp r3, r4                      ; is cur < end ?
      7c:   b4836004    strlt   r6, [r3], #4            ; if less than, *cur++ = 0
      80:   bafffffc    blt 0x78                        ; while !done, loop

      84:   e51f307c    ldr r3, [pc, #-124] ; 0x10      ; load offset of AEEMod_Load (the word at 0x10) into r3
      88:   e0833005    add r3, r3, r5                  ; add image base to make pointer
      8c:   e8bd00f0    ldmia   sp!, {r4, r5, r6, r7}   ; pop stack
      90:   e12fff13    bx  r3                          ; jump forward to AEEMod_Load, (mebee thumb OK?)
    ...
 AEEMod_Load:
      9c:   e1a0c00d    mov ip, sp
      a0:   e92dd800    stmdb   sp!, {fp, ip, lr, pc}
      a4:   e24cb004    sub fp, ip, #4  ; 0x4
      a8:   e24dd008    sub sp, sp, #8  ; 0x8
      ac:   e1a0c001    mov ip, r1
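
If ARM assembly is not your thing, the two loops above can be sketched in C roughly as follows. This is my reading of the disassembly, written to run on a desktop against a byte buffer, not anything from Qualcomm. The names (startup_fixup, word4, word8, wordc) are made up, load_addr stands in for the value in r5, and the sketch uses unsigned compares where the ARM code uses signed ones (lt/gt), which only matters for very large offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read/write a 32-bit word at a byte offset (host byte order). */
uint32_t get32(const uint8_t *image, uint32_t off)
{
    uint32_t w;
    memcpy(&w, image + off, sizeof w);
    return w;
}

void put32(uint8_t *image, uint32_t off, uint32_t w)
{
    memcpy(image + off, &w, sizeof w);
}

/*
 * image     : the bytes that follow the 0x9c-byte startup code (what r5 points at)
 * load_addr : pretend run address of `image` (the value held in r5)
 * word4/8/c : the header words found at 0x4, 0x8 and 0xc (hypothetical names)
 */
void startup_fixup(uint8_t *image, uint32_t load_addr,
                   uint32_t word4, uint32_t word8, uint32_t wordc)
{
    assert(word4 % 4 == 0 && word8 % 4 == 0);

    /* 0x40..0x54: each table entry is the offset of a word that needs
     * the base address added to it */
    for (uint32_t cur = word4; cur < word8; cur += 4) {
        uint32_t where = get32(image, cur);
        put32(image, where, get32(image, where) + load_addr);
    }

    /* 0x58..0x80: zero from word4 up to the smaller of word8 and wordc */
    uint32_t end = (word8 > wordc) ? wordc : word8;
    for (uint32_t cur = word4; cur < end; cur += 4)
        put32(image, cur, 0);
}
```

One thing this makes obvious: with the header values in the listing above (0xe258, 0xeca0, 0xe258) the second loop starts and ends at the same offset, so for that particular image it zeroes nothing. It only does work when the word at 0xc is larger than the word at 0x4.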

CONCLUSION

Hope this helps and does not mislead. Good luck with your projects.

Ward Willats
Fullpower / MotionX
Makers of MotionX-GPS and MotionX-GPS-Drive for iPhone

If you'd like to e-mail me, please use <brew@wardco.com>
