Intel HEX for AVR explained

Ughhh! It really took me two days to understand the whole picture here.  We all know that a HEX file is generated by the “compiler” or the Arduino IDE (allow me to use loose terms for now; I explain a little more below), that this HEX file is uploaded into the controller, and that it works like a charm.

Let’s take an example. I open Arduino, or any other studio or text editor, and write the code for an LED blink.  After some ABC-process I get this HEX file. In the case of Arduino it is generated in the /tmp directory, which is obviously temporary, so the majority of users never get to see it or use it.  The uploader in the Arduino IDE picks it up from that location and burns it into the controller’s flash.  A lot happens behind the scenes that we don’t know about, so I decided to find out.  Here is how it goes.

(Read on only if you like machine-level / architecture-level knowledge.)

My colossally wrong assumption was: the HEX file so generated is transferred as it is to the controller’s flash, and it just works.

Epiphany of truth: “The Intel hex file is a file format that conveys binary information in ASCII form.”

That simply means that the HEX file is a container that represents the data to be uploaded in a well-structured format.

Obviously I read about it on Wikipedia, but that alone was not enough to make me understand it.
http://en.wikipedia.org/wiki/Intel_HEX

I’ll try to explain things here using Arduino’s LED blink example, which everyone runs on one board (bootloader type) or another.

Let’s first use the ATmega8/328 with the Optiboot bootloader, and choose the board accordingly.

Compile the LED blink program and go to the /tmp directory in your file system.  There will be a lot of folders, depending on your system.  One of them is named build*********, where the * stand for text that depends on the sketch you compile.  Go inside that directory.  You will find quite a number of files, including Blink’s .cpp file, .o file, .elf file, .hex file, etc.

Our .hex file is in that folder, so open it in a text editor such as vim or gedit.  The simple Blink.cpp.hex file will look something like this:

:100000003FC04EC04DC04CC04BC04AC049C048C0A4
:1000100047C0C8C045C044C043C042C041C040C042
:100020003FC03EC03DC0000000003700340031003A
:1000300000000000380035003200000000003600EB
:100040003300300004040404040404040202020225
:10005000020203030303030301020408102040808B
:100060000102040810200102040810200000000012
:100070000000000000030405000000000000000074
:1000800011241FBECFE5D4E0DEBFCDBF10E0A0E657
:10009000B0E001C01D92A936B107E1F737D141C1E7
:1000A000AFCF8DE061E023D008958DE061E043D0D3
:1000B00068EE73E080E090E0E4D08DE060E03BD05B
:1000C00068EE73E080E090E0DCD00895843039F091
:1000D000853049F0833051F48FB58F7702C08FB5EA
:1000E0008F7D8FBD089585B58F7D85BD089590E086
:1000F000FC01E85AFF4F2491FC01EC5BFF4FE491B7
:10010000EE23C1F0F0E0EE0FFF1FEA5DFF4F859197
:100110009491DC01662341F49FB7F8948C9120956B
:1001200082238C939FBF08959FB7F8948C91822B64
:100130008C939FBF08950F931F93DF93CF930F92DC
:10014000CDB7DEB7282F30E0F901E459FF4F849195
:10015000F901E85AFF4F14912C5B3F4FF9010491CC
:100160000023D1F0882319F06983B0DF6981E02F83
:10017000F0E0EE0FFF1FE05DFF4F85919491DC01F1
:100180009FB7F894662321F48C911095812302C0C7
:100190008C91812B8C939FBF0F90CF91DF911F91FA
:1001A0000F9108951F920F920FB60F9211242F9363
:1001B0003F938F939F93AF93BF93809164009091EF
:1001C0006500A0916600B0916700309168000196CB
:1001D000A11DB11D232F2D5F2D3720F02D57019626
:1001E000A11DB11D20936800809364009093650069
:1001F000A0936600B09367008091600090916100C9
:10020000A0916200B09163000196A11DB11D809381
:10021000600090936100A0936200B0936300BF916F
:10022000AF919F918F913F912F910F900FBE0F90A3
:100230001F9018959FB7F89420916000309161004D
:10024000409162005091630082B708B600FE06C07C
:100250008F3F21F02F5F3F4F4F4F5F4F9FBF542F76
:10026000432F322F2227280F311D411D511D82E0BF
:10027000220F331F441F551F8A95D1F7B901CA01B8
:100280000895EF92FF920F931F93CF93DF937B011B
:100290008C01D0DFEB010EC0CDDF6C1B7D0B83E04A
:1002A000683E780738F00894E108F108010911095F
:1002B000C851DC4FE114F1040105110569F7DF9124
:1002C000CF911F910F91FF90EF900895789483B78D
:1002D000826083BF83B7816083BF89B7816089BF34
:1002E0001EBC8EB582608EBD8EB581608EBD8FB511
:1002F00081608FBD85B5846085BD85B5806485BD11
:10030000329A319A309A379A1AB80895CF93DF9378
:10031000DDDFC7DEC0E0D0E0C8DE2097E9F370DEA5
:06032000FBCFF894FFCFB3
:00000001FF

Now, let’s take the first line and see what it says:
:100000003FC04EC04DC04CC04BC04AC049C048C0A4

Now, as the wiki says,

A line of text consists of six fields (parts) that appear in order from left to right:

  1. Start code, one character, an ASCII colon ‘:’.

  2. Byte count, two hex digits, indicating the number of bytes (hex digit pairs) in the data field. The maximum byte count is 255 (0xFF). 16 (0x10) and 32 (0x20) are commonly used byte counts.

  3. Address, four hex digits, representing the 16-bit beginning memory address offset of the data. The physical address of the data is computed by adding this offset to a previously established base address, thus allowing memory addressing beyond the 64 kilobyte limit of 16-bit addresses. The base address, which defaults to zero, can be changed by various types of records. Base addresses and address offsets are always expressed as big endian values.

  4. Record type (see record types below), two hex digits, 00 to 05, defining the meaning of the data field.

  5. Data, a sequence of n bytes of data, represented by 2n hex digits. Some records omit this field (n equals zero). The meaning and interpretation of data bytes depends on the application.

  6. Checksum, two hex digits, a computed value that can be used to verify the record has no errors.
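The field layout above can be sketched in a few lines of Python. This is only an illustration of the six fields, not how AVRDUDE or the Arduino IDE actually does it; the names HexRecord and parse_record are made up for this example, and the checksum is only extracted here, not verified.

```python
from typing import NamedTuple


class HexRecord(NamedTuple):
    """Fields of one Intel HEX line (the ':' start code is dropped)."""
    byte_count: int
    address: int
    record_type: int
    data: bytes
    checksum: int


def parse_record(line: str) -> HexRecord:
    """Split one Intel HEX line into its fields (hypothetical helper)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])              # each pair of hex digits is one byte
    if len(raw) < 5 or len(raw) != raw[0] + 5:
        raise ValueError("length does not match the byte count field")
    byte_count = raw[0]
    address = int.from_bytes(raw[1:3], "big")  # 16-bit offset, big endian
    record_type = raw[3]
    data = raw[4:4 + byte_count]
    checksum = raw[4 + byte_count]
    return HexRecord(byte_count, address, record_type, data, checksum)


rec = parse_record(":100000003FC04EC04DC04CC04BC04AC049C048C0A4")
print(rec.byte_count, hex(rec.address), rec.record_type, hex(rec.checksum))
print(rec.data.hex())
```

Feeding it the last line of the file, :00000001FF, gives a byte count of 0, an empty data field and record type 01, which is the end-of-file record.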

=> Our line looks like this; I am using a space as the separator.

: 10 0000 00 (3FC0 4EC0 4DC0 4CC0 4BC0 4AC0 49C0 48C0) A4

The six parts are just separated, and I have put the data part in brackets.   It is AVRDUDE, together with the bootloader, that actually plays the real trick when we burn the program.
A4 is the checksum, which comes from adding up all the bytes before it (in hex), i.e.
10+00+00+00+3F+C0+4E+C0+4D+C0+4C+C0+4B+C0+4A+C0+49+C0+48+C0 = 85C

So discard the 8 (keep only the low byte, 5C) and take the two’s complement of 5C,
i.e. 5C = 0101 1100
one’s complement = 1010 0011
two’s complement = one’s complement + 1 = 1010 0100 = A4
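The same arithmetic can be checked with a small Python sketch. The function name intel_hex_checksum is my own; it just sums every byte of the record except the last one and takes the two’s complement of the low byte.

```python
def intel_hex_checksum(line: str) -> int:
    """Checksum expected for a record, computed from all bytes except the last."""
    raw = bytes.fromhex(line.strip()[1:])   # drop the ':' and decode the hex pairs
    total = sum(raw[:-1])                   # byte count + address + type + data
    return (-total) & 0xFF                  # two's complement of the low byte


line = ":100000003FC04EC04DC04CC04BC04AC049C048C0A4"
raw = bytes.fromhex(line[1:])
print(hex(sum(raw[:-1])))              # 0x85c
print(hex(intel_hex_checksum(line)))   # 0xa4
print(sum(raw) & 0xFF)                 # 0
```

The last print shows a handy shortcut: if you add up all the bytes of a record including the checksum, the low byte of the sum comes out as zero for a valid line.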

So, the fact is that the whole HEX file is read, but only the real data part is written to the controller’s flash.

I am saying this because I have tested it by reading the controller’s flash with an external controller, which I am going to show in subsequent posts.

Okay, so far this is nothing special, as it is quite understandable from reading the HEX file and Wikipedia.  What bothered me was this: what is this data, and how is it my LED blink program?

So I started asking questions in a lot of places online, and finally @westfw and Sir @Nick Gammon made me understand the whole concept.
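To make “only the data part goes to flash” concrete, here is a rough Python sketch that walks over HEX records and collects just the data bytes into one image. It is my own simplification, not AVRDUDE’s code: the function name hex_to_image is hypothetical, it handles only data (00) and end-of-file (01) records, and it fills any gap with 0xFF on the assumption that erased flash reads as 0xFF.

```python
def hex_to_image(lines):
    """Collect the data bytes of all type-00 records into one flash image."""
    image = bytearray()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        raw = bytes.fromhex(line[1:])              # drop ':' and decode hex pairs
        count = raw[0]
        address = int.from_bytes(raw[1:3], "big")
        rtype = raw[3]
        if sum(raw) & 0xFF != 0:                   # all bytes incl. checksum sum to 0
            raise ValueError(f"bad checksum in {line}")
        if rtype == 0x01:                          # end-of-file record
            break
        if rtype != 0x00:
            raise NotImplementedError("only data and EOF records in this sketch")
        end = address + count
        if end > len(image):
            image.extend(b"\xFF" * (end - len(image)))   # assumed erased-flash filler
        image[address:end] = raw[4:4 + count]
    return bytes(image)


sample = [
    ":100000003FC04EC04DC04CC04BC04AC049C048C0A4",
    ":1000100047C0C8C045C044C043C042C041C040C042",
    ":00000001FF",
]
image = hex_to_image(sample)
print(len(image), image[:4].hex())
```

With the first two lines of the Blink file plus the end-of-file record, the image is 32 bytes long and starts with 3f c0 4e c0: the colons, byte counts, addresses, record types and checksums are all gone.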

The cpp file is compiled and assembled into a .o file, the .o files are linked into an .elf file, and avr-objcopy then turns the .elf into the .hex file; that is the whole toolchain (compiler, assembler, linker and avr-objcopy).
The actual step-by-step process of how a program is compiled will be covered in another post.  One can see all the intermediate files in the /tmp/build** folder.

The ELF file also carries all the symbol definitions and other metadata, so a hexdump of it is not much use directly, as it contains a hell of a lot of other data too.  It is better to convert the ELF file and extract the main binary data inside it.

avr-objcopy from the AVR toolchain is our friend for this.  Issue a command like this:

beyond@beyond-HP-Pavilion:~/Downloads/Burner_AVR/build_opti_8$ avr-objcopy -O binary -R .eeprom Blink.cpp.elf Blink.bin

which will produce a binary file containing the exact data that the flash holds for running the program.

// Hexdump of Blink.bin for the ATmega8 says:

beyond@beyond-HP-Pavilion:~/Downloads/Burner_AVR/build_opti_8$ hexdump -vx Blink.bin
Address : ——————DATA————————
0000000 c03f c04e c04d c04c c04b c04a c049 c048
0000010 c047 c0c8 c045 c044 c043 c042 c041 c040
0000020 c03f c03e c03d 0000 0000 0037 0034 0031
0000030 0000 0000 0038 0035 0032 0000 0000 0036
0000040 0033 0030 0404 0404 0404 0404 0202 0202
0000050 0202 0303 0303 0303 0201 0804 2010 8040
0000060 0201 0804 2010 0201 0804 2010 0000 0000
0000070 0000 0000 0300 0504 0000 0000 0000 0000
0000080 2411 be1f e5cf e0d4 bfde bfcd e010 e6a0
0000090 e0b0 c001 921d 36a9 07b1 f7e1 d137 c141
00000a0 cfaf e08d e061 d023 9508 e08d e061 d043
00000b0 ee68 e073 e080 e090 d0e4 e08d e060 d03b
00000c0 ee68 e073 e080 e090 d0dc 9508 3084 f039
00000d0 3085 f049 3083 f451 b58f 778f c002 b58f
00000e0 7d8f bd8f 9508 b585 7d8f bd85 9508 e090
00000f0 01fc 5ae8 4fff 9124 01fc 5bec 4fff 91e4
0000100 23ee f0c1 e0f0 0fee 1fff 5dea 4fff 9185
0000110 9194 01dc 2366 f441 b79f 94f8 918c 9520
0000120 2382 938c bf9f 9508 b79f 94f8 918c 2b82
0000130 938c bf9f 9508 930f 931f 93df 93cf 920f
0000140 b7cd b7de 2f28 e030 01f9 59e4 4fff 9184
0000150 01f9 5ae8 4fff 9114 5b2c 4f3f 01f9 9104
0000160 2300 f0d1 2388 f019 8369 dfb0 8169 2fe0
0000170 e0f0 0fee 1fff 5de0 4fff 9185 9194 01dc
0000180 b79f 94f8 2366 f421 918c 9510 2381 c002
0000190 918c 2b81 938c bf9f 900f 91cf 91df 911f
00001a0 910f 9508 921f 920f b60f 920f 2411 932f
00001b0 933f 938f 939f 93af 93bf 9180 0064 9190
00001c0 0065 91a0 0066 91b0 0067 9130 0068 9601
00001d0 1da1 1db1 2f23 5f2d 372d f020 572d 9601
00001e0 1da1 1db1 9320 0068 9380 0064 9390 0065
00001f0 93a0 0066 93b0 0067 9180 0060 9190 0061
0000200 91a0 0062 91b0 0063 9601 1da1 1db1 9380
0000210 0060 9390 0061 93a0 0062 93b0 0063 91bf
0000220 91af 919f 918f 913f 912f 900f be0f 900f
0000230 901f 9518 b79f 94f8 9120 0060 9130 0061
0000240 9140 0062 9150 0063 b782 b608 fe00 c006
0000250 3f8f f021 5f2f 4f3f 4f4f 4f5f bf9f 2f54
0000260 2f43 2f32 2722 0f28 1d31 1d41 1d51 e082
0000270 0f22 1f33 1f44 1f55 958a f7d1 01b9 01ca
0000280 9508 92ef 92ff 930f 931f 93cf 93df 017b
0000290 018c dfd0 01eb c00e dfcd 1b6c 0b7d e083
00002a0 3e68 0778 f038 9408 08e1 08f1 0901 0911
00002b0 51c8 4fdc 14e1 04f1 0501 0511 f769 91df
00002c0 91cf 911f 910f 90ff 90ef 9508 9478 b783
00002d0 6082 bf83 b783 6081 bf83 b789 6081 bf89
00002e0 bc1e b58e 6082 bd8e b58e 6081 bd8e b58f
00002f0 6081 bd8f b585 6084 bd85 b585 6480 bd85
0000300 9a32 9a31 9a30 9a37 b81a 9508 93cf 93df
0000310 dfdd dec7 e0c0 e0d0 dec8 9720 f3e9 de70
0000320 cffb 94f8 cfff
0000326

And the real contents of the flash are exactly the same.  (Note that hexdump -x prints 16-bit words in the host’s byte order, so on a little-endian PC the bytes 3F C0 from the HEX file show up as c03f.)

Now, what is this DATA that I talk about? It is the REAL language that the machine understands.  Just keep the following in mind, and recall it if you don’t remember.
(Even I don’t remember the details; it is highly specific to the architecture.)

In the end, whatever language we use to write the code, it breaks down to machine language.  So your C code breaks down to simpler assembly, depending on the architecture.
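If the dump looks byte-swapped compared to the HEX file, a quick Python check may help. This sketch assumes hexdump was run on a little-endian PC, where the -x option groups the bytes into 16-bit words and prints each word with its second byte first.

```python
import struct

# First 8 data bytes of the HEX file, in the order they sit in flash.
first_bytes = bytes.fromhex("3FC04EC04DC04CC0")

# Read them as four little-endian 16-bit words, the way hexdump -x shows them.
words = struct.unpack("<4H", first_bytes)
print(" ".join(f"{w:04x}" for w in words))   # c03f c04e c04d c04c
```

So c03f in the dump and 3FC0 in the HEX file are the same two bytes; only the way they are printed differs.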

For example, AVR is much like an ideal RISC (obviously not perfectly so) with a LOAD/STORE architecture, so our whole program is converted to simple RISC instructions.

A little look at the objdump of our ELF file helps us understand the machine DATA which we have seen above in both the HEX file and the BIN file (obviously).

I am going to colour the DATA (the pertinent bytes that you saw above in the BIN/HEX file) along with its machine interpretation.
Here comes the best part:

Blink.cpp.elf: file format elf32-avr

Disassembly of section .text:

#### Watch the bytes here: 3f c0, 4e c0, etc. are the same as in our DATA or bin file. Being at the very start, they are the interrupt vectors: each entry is a 2-byte rjmp instruction that jumps to the corresponding ISR. ####

 

00000000 <__vectors>:
0: 3f c0 rjmp .+126 ; 0x80 <__ctors_end>
2: 4e c0 rjmp .+156 ; 0xa0 <__bad_interrupt>
4: 4d c0 rjmp .+154 ; 0xa0 <__bad_interrupt>
6: 4c c0 rjmp .+152 ; 0xa0 <__bad_interrupt>
8: 4b c0 rjmp .+150 ; 0xa0 <__bad_interrupt>
a: 4a c0 rjmp .+148 ; 0xa0 <__bad_interrupt>
c: 49 c0 rjmp .+146 ; 0xa0 <__bad_interrupt>
e: 48 c0 rjmp .+144 ; 0xa0 <__bad_interrupt>
10: 47 c0 rjmp .+142 ; 0xa0 <__bad_interrupt>
12: c8 c0 rjmp .+400 ; 0x1a4 <__vector_9>
14: 45 c0 rjmp .+138 ; 0xa0 <__bad_interrupt>
16: 44 c0 rjmp .+136 ; 0xa0 <__bad_interrupt>
18: 43 c0 rjmp .+134 ; 0xa0 <__bad_interrupt>
1a: 42 c0 rjmp .+132 ; 0xa0 <__bad_interrupt>
1c: 41 c0 rjmp .+130 ; 0xa0 <__bad_interrupt>
1e: 40 c0 rjmp .+128 ; 0xa0 <__bad_interrupt>
20: 3f c0 rjmp .+126 ; 0xa0 <__bad_interrupt>
22: 3e c0 rjmp .+124 ; 0xa0 <__bad_interrupt>
24: 3d c0 rjmp .+122 ; 0xa0 <__bad_interrupt>

00000026 <port_to_mode_PGM>:
26: 00 00 00 00 37 00 34 00 31 00 ….7.4.1.

00000030 <port_to_output_PGM>:
30: 00 00 00 00 38 00 35 00 32 00 ….8.5.2.

0000003a <port_to_input_PGM>:
3a: 00 00 00 00 36 00 33 00 30 00 ….6.3.0.

00000044 <digital_pin_to_port_PGM>:
44: 04 04 04 04 04 04 04 04 02 02 02 02 02 02 03 03 …………….
54: 03 03 03 03 ….

00000058 <digital_pin_to_bit_mask_PGM>:
58: 01 02 04 08 10 20 40 80 01 02 04 08 10 20 01 02 ….. @…… ..
68: 04 08 10 20

0000006c <digital_pin_to_timer_PGM>:

74: 00 03 04 05 00 00 00 00 00 00 00 00 …………

00000080 <__ctors_end>:
80: 11 24 eor r1, r1
82: 1f be out 0x3f, r1 ; 63
84: cf e5 ldi r28, 0x5F ; 95
86: d4 e0 ldi r29, 0x04 ; 4
88: de bf out 0x3e, r29 ; 62
8a: cd bf out 0x3d, r28 ; 61

0000008c <__do_clear_bss>:
8c: 10 e0 ldi r17, 0x00 ; 0
8e: a0 e6 ldi r26, 0x60 ; 96
90: b0 e0 ldi r27, 0x00 ; 0
92: 01 c0 rjmp .+2 ; 0x96 <.do_clear_bss_start>

00000094 <.do_clear_bss_loop>:
94: 1d 92 st X+, r1

00000096 <.do_clear_bss_start>:
96: a9 36 cpi r26, 0x69 ; 105
98: b1 07 cpc r27, r17
9a: e1 f7 brne .-8 ; 0x94 <.do_clear_bss_loop>
9c: 37 d1 rcall .+622 ; 0x30c <main>
9e: 41 c1 rjmp .+642 ; 0x322 <_exit>

000000a0 <__bad_interrupt>:
a0: af cf rjmp .-162 ; 0x0 <__vectors>

### This is your actual setup() function, and this is how it looks to the machine

000000a2 <setup>:
a2: 8d e0 ldi r24, 0x0D ; 13
a4: 61 e0 ldi r22, 0x01 ; 1
a6: 23 d0 rcall .+70 ; 0xee <pinMode>
a8: 08 95 ret

### This is your actual loop() function, and this is how it looks to the machine

000000aa <loop>:

aa: 8d e0 ldi r24, 0x0D ; 13
ac: 61 e0 ldi r22, 0x01 ; 1
ae: 43 d0 rcall .+134 ; 0x136 <digitalWrite>
b0: 68 ee ldi r22, 0xE8 ; 232
b2: 73 e0 ldi r23, 0x03 ; 3
b4: 80 e0 ldi r24, 0x00 ; 0
b6: 90 e0 ldi r25, 0x00 ; 0
b8: e4 d0 rcall .+456 ; 0x282 <delay>
ba: 8d e0 ldi r24, 0x0D ; 13
bc: 60 e0 ldi r22, 0x00 ; 0
be: 3b d0 rcall .+118 ; 0x136 <digitalWrite>
c0: 68 ee ldi r22, 0xE8 ; 232
c2: 73 e0 ldi r23, 0x03 ; 3
c4: 80 e0 ldi r24, 0x00 ; 0
c6: 90 e0 ldi r25, 0x00 ; 0
c8: dc d0 rcall .+440 ; 0x282 <delay>
ca: 08 95 ret
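To see how two bytes from the HEX file turn into one line of the listing above, here is a tiny Python sketch that decodes just three AVR instructions: rjmp, rcall and ldi. The function decode is my own toy, nowhere near a full disassembler like avr-objdump; the bit layouts in the comments follow the AVR instruction set (rjmp is 1100 kkkk kkkk kkkk, rcall is 1101 kkkk kkkk kkkk, ldi is 1110 KKKK dddd KKKK).

```python
def decode(address: int, lo: int, hi: int) -> str:
    """Decode one 16-bit AVR opcode; only RJMP, RCALL and LDI are handled."""
    word = (hi << 8) | lo                  # flash stores the low byte first
    top = word >> 12
    if top in (0xC, 0xD):                  # rjmp / rcall with a 12-bit signed offset
        k = word & 0x0FFF
        if k & 0x800:                      # sign-extend the word offset
            k -= 0x1000
        target = address + 2 + 2 * k       # offset is in words, from the next instruction
        name = "rjmp" if top == 0xC else "rcall"
        return f"{name} .{2 * k:+d} ; 0x{target:x}"
    if top == 0xE:                         # ldi Rd, K with d in 16..31
        reg = 16 + ((word >> 4) & 0xF)
        value = ((word >> 4) & 0xF0) | (word & 0x0F)
        return f"ldi r{reg}, 0x{value:02X}"
    return f".word 0x{word:04x}"           # anything else is left undecoded


print(decode(0x00, 0x3F, 0xC0))   # rjmp .+126 ; 0x80
print(decode(0xA0, 0xAF, 0xCF))   # rjmp .-162 ; 0x0
print(decode(0xA6, 0x23, 0xD0))   # rcall .+70 ; 0xee
print(decode(0xB0, 0x68, 0xEE))   # ldi r22, 0xE8

# The four ldi instructions before "rcall delay" load r22..r25, low byte in r22.
delay_ms = int.from_bytes(bytes([0xE8, 0x03, 0x00, 0x00]), "little")
print(delay_ms)                   # 1000
```

The last two lines show where delay(1000) went: the value 1000 is 0x000003E8, and avr-gcc passes this 32-bit argument in r22 to r25, which is exactly what the ldi instructions in loop() load. In the same way, ldi r24, 0x0D is pin 13 and ldi r22, 0x01 or 0x00 is HIGH or LOW.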