20
1/20 Andrea Michelotti - Atmel Toolchain Overview & Demo hArtes 2010 Final Review Toolchain Overview & Demo Andrea Michelotti, Atmel WP2 Leader

hArtes 2010 Final Review Toolchain Overview & Demo

  • Upload
    angeni

  • View
    37

  • Download
    0

Embed Size (px)

DESCRIPTION

hArtes 2010 Final Review Toolchain Overview & Demo. Andrea Michelotti, Atmel WP2 Leader. Agenda. Key Achievements Developing an Application for a Heterogeneous Platform Initialization Sharing resources Calling a DSP function Targeting Execution Expressing parallelism - PowerPoint PPT Presentation

Citation preview

Page 1: hArtes  2010 Final Review  Toolchain Overview & Demo

1/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

hArtes 2010 Final Review

Toolchain Overview & Demo

Andrea Michelotti, Atmel

WP2 Leader

Page 2: hArtes  2010 Final Review  Toolchain Overview & Demo

2/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Agenda• Key Achievements

• Developing an Application for a Heterogeneous Platform

– Initialization

– Sharing resources

– Calling a DSP function

– Targeting Execution

– Expressing parallelism

• Conclusions

Page 3: hArtes  2010 Final Review  Toolchain Overview & Demo

3/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Key Achievements (1)

The hArtes toolchain…

• dramatically minimizes learning curve

• hides heterogeneous complexity: no expert knowledge of the platform,

tools or new programming languagesuse ‘C’ or tools that produce ‘C’

(Scilab/Nutech)

• automatically speeds-up application (in respect to GPP-only execution) by exploiting Processor Element capabilities

Page 4: hArtes  2010 Final Review  Toolchain Overview & Demo

4/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Key Achievements (2)

Easily retarget new platforms:basic OMAP porting took ~1 monthother popular platforms like GPUs can be

targeted as wellmust conform to Molen machine

Quickly evaluate how an application behaves on different architectures

important for time to market

Page 5: hArtes  2010 Final Review  Toolchain Overview & Demo

5/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Targeted Platforms/Architectures

Heterogeneous Platforms:Atmel DEB (Diopsis Evaluation Board):

ARM9 GPP + mAgicV DSP

UniFE’s hArtes HW Platform: ARM9 GPP + mAgicV DSP + Xilinx FPGA

Scaleo’s hArtes Emulation Platform: ARM9 GPP + mAgicV DSP + Altera FPGA

TI Experimenter OMAPL138: ARM9 GPP + C67 TI DSP

Page 6: hArtes  2010 Final Review  Toolchain Overview & Demo

6/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Developing an Application for a Heterogeneous Platform in a nutshell

Before hArtes, toolchains for heterogeneous hardware (Diopsis and OMAP) consist of separate subchains: one for GPP and one for DSP.

MAIN PROBLEMS:

• High learning curve: target tools, architecture, APIs….

• Without knowledge of the underlying software/hardware it’s difficult to use and to benefit from the existence of PEs;

• Code maintainability: two distinct projects must be kept aligned;

• Code portability: usually GPP code contains specific APIs to load, execute and access PEs resources; An identical C-code cannot be produce correct result across all the platforms;

• Debugging: not an unified image, not an unified debugger.

Page 7: hArtes  2010 Final Review  Toolchain Overview & Demo

7/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Writing an application for a heterogeneous platform in a nutshell

Suppose we want to port our legacy code on a powerful but heterogeneous architecture like OMAP or DIOPSIS…

void main(){

unsigned *shared_array=malloc(SIZE);

my_fft(int param,shared_array…);

another_kernel(…);

}

C code running on my host PC

Seems easy, but, after a while, becomes a nightmare! …

Page 8: hArtes  2010 Final Review  Toolchain Overview & Demo

8/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

InitializationOMAP toolchain (through dsplink)OMAP toolchain (through dsplink)

int main(int argc,char**argv){… if (DSP_SUCCEEDED (status)) { status = PROC_setup (NULL) ; }if (DSP_SUCCEEDED (status)) { status = PROC_attach (processorId, NULL) ; if (DSP_FAILED (status)) { RDWR_1Print ("PROC_attach failed. Status: [0x%x]\n", status) ; } } else { RDWR_1Print ("PROC_setup failed. Status: [0x%x]\n", status) ; } if (DSP_SUCCEEDED (status)) { args [0] = strNumIterations ; { status = PROC_load (processorId, dspExecutable_myfft, NUM_ARGS, args) ; } if (DSP_FAILED (status)) { RDWR_1Print ("PROC_load failed. Status: [0x%x]\n", status) ; } } if (DSP_SUCCEEDED (status)) { status = PROC_start (processorId) ; if (DSP_FAILED (status)) { RDWR_1Print ("PROC_start failed. Status: [0x%x]\n", status) ; } }

Use of specific API to access DSP.

The DSP image is loaded as a file.

The DSP has its own main

GPP code

void myfft();int main(int argc,char**argv){

myfft();}

DSP code

Page 9: hArtes  2010 Final Review  Toolchain Overview & Demo

9/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Initialization Diopsis toolchain caseDiopsis toolchain case

int main (int argc, char *argv[]){

....

ret = mAgicV_load_PM("myfft.bin",_m_fd_extm);if(ret!=0) return ret;

ret = mAgicV_load_DM(“myfft_datamem.bin");if(ret!=0) return ret;

ret = mAgicV_load_XM(“myfft_extmem.bin",(0x365890));if(ret!=0) return ret;

ret = mAgicV_init_PMU();if(ret!=0) return ret; magicV_start();..magicV_wait();.... }

GPP code

void myfft();

int main(int argc,char**argv){…myfft();…}

Very low API to access DSP.

The DSP image is binary loaded as a file. The

DSP has its own main

DSP code

Page 10: hArtes  2010 Final Review  Toolchain Overview & Demo

10/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Initialization hArtes toolchain casehArtes toolchain case

int main (int argc, char *argv[]){

....}

GPP/Application codeThe hArtes Toolchain

and hArtes Runtime take care of hiding all the initialization details.

JUST one code.

Page 11: hArtes  2010 Final Review  Toolchain Overview & Demo

11/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Sharing resources OMAP toolchain caseOMAP toolchain case Use of specific API

to create a shared area, translate the address for DSP and then pass the translated address

to DSP via messages.

Memory Layout must be configured

by recompiling drivers!!

int main(int argc,char**argv){ SMAPOOL_Attrs poolAttrs poolAttrs.bufSizes = (Uint32 *) &size ; poolAttrs.numBuffers = (Uint32 *) &numBufs ; poolAttrs.numBufPools = NUM_BUF_SIZES ; poolAttrs.exactMatchReq = TRUE ; volatile unsigned* my_shared_array; status = POOL_open (POOL_makePoolId(processorId, SAMPLE_POOL_ID), &poolAttrs) ; if (DSP_FAILED (status)) { MPCSXFER_1Print ("POOL_open () failed. Status = [0x%x]\n", status) ; } } if (DSP_SUCCEEDED (status)) { status = POOL_alloc (POOL_makePoolId(processorId, SAMPLE_POOL_ID), &my_shared_array, SIZE, DSPLINK_BUF_ALIGN)) ; /* Get the translated DSP address to be sent to the DSP. */ if (DSP_SUCCEEDED (status)) { status = POOL_translateAddr ( POOL_makePoolId(processorId, SAMPLE_POOL_ID), &dspCtrlBuf, AddrType_Dsp, (Void *) &my_shared_array_from_dsp, AddrType_Usr) ;

GPP code

int main(int argc,char**argv){

DSPlink_init();….}

DSP code

Page 12: hArtes  2010 Final Review  Toolchain Overview & Demo

12/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Sharing resources Diopsis toolchain caseDiopsis toolchain case

Very raw access, sharing directly

addresses that are manually mapped,

using local copy and write back.

Use of specific APIs and

Specific compiler directives.

NO LINKER used, many problems of source alignment,

debug

#define MY_SHARED_ARRAY_ADDR 2

int main(int argc,char**argv){....unsigned local_my_shared_array[];mAgicV_read_buff(local_my_shared_array,MY_SHARED_ARRAY_ADDR,sizeof(my_shared_array));....// modify local copy, write backmAgicV_write_buff(local_my_shared_array,MY_SHARED_ARRAY_ADDR,sizeof(my_shared_array));…

GPP code

#define MY_SHARED_ARRAY_ADDR 2volatile long chess_storage(DATA:MY_SHARED_ARRAY_ADDR) my_shared_array;volatile long chess_storage(DATA:MY_SHARED_ARRAY_ADDR+SIZEOF(my_shared_array) my_other_variable;

int main(int argc,char**argv){ // access the variable}

DSP code

Page 13: hArtes  2010 Final Review  Toolchain Overview & Demo

13/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Sharing resources hArtes toolchain casehArtes toolchain case

Very natural access

int main(int argc,char**argv){

....

Unsigned* my_shared_array;my_shared_array = malloc(MYSIZE);#pragma map call_hw dsp0 dsp_func(my_shared_array);

GPP codeIntuitive and portable.

DSE turns automatically malloc into hmalloc (hArtes API), that allocates and traces memory in a shared physical space of the target platform.

Page 14: hArtes  2010 Final Review  Toolchain Overview & Demo

14/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Calling a DSP routineOMAP/Diopsis toolchain casesOMAP/Diopsis toolchain cases

int main(int argc,char**argv){… // initialization, see main

if (DSP_SUCCEEDED (status)) { status = PROC_start (processorId) ; if (DSP_FAILED (status)) { RDWR_1Print ("PROC_start failed. Status: [0x%x]\n", status) ; } }…

GPP code

DSP code

void my_fft(int pp, float*…);

int main(int argc,char**argv){

… // initialization, see mainmy_fft();…}

int main(int argc,char**argv){… // initialization, see main

mAgicV_start()…

GPP code

There is not concept of DSP call from GPP, the GPP can start a DSP process that executes the desired function.

The call can be inefficient or maybe cannot be executed correctly on the DSP. The programmer must know the underlying architecture!

For example the DSP in the DIOPSIS architecture the type int is 16 bit wide. Typically is 32 bit.

Page 15: hArtes  2010 Final Review  Toolchain Overview & Demo

15/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Calling a DSP routinehArtes toolchain casehArtes toolchain case

GPP code

void my_fft(…){…}int main(int argc,char**argv){

..#pragma call_hw dsp 0my_fft();..

}

Intuitive and portable.

DSE checks if the my_fft function can be executed on the target DSP (checks parameters, used stack memory). It also estimates the cost of the call to decide if it’s convenient to move the execution on the DSP.

To call the DSP function, the DSE adds a pragma to the function call and generate a C-source that can be compiled by the DSP toolchain.

Page 16: hArtes  2010 Final Review  Toolchain Overview & Demo

16/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Expressing Parallelism OMAP/Diopsis toolchainOMAP/Diopsis toolchain

NOT KNOWN/NOT IMPLEMENTED

Page 17: hArtes  2010 Final Review  Toolchain Overview & Demo

17/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Expressing Parallelism hArtes toolchain casehArtes toolchain case

Void main(){

#pragma omp parallel sections

{

#pragma omp section

{

#pragma call_hw dsp 0

my_fft();

}

#pragma omp section

{

another_kernel(…);

}

}

}

Intuitive and portable.

hArtes supports some openMP construct to express parallelism.

DSE in some case automatically detects kernels that can go in parallel and adds openMP annotations to the c-source.

The parallelism can be also explicited via POSIX threads

#pragma call_hw dsp 0void my_fft(…);

int main(int argc,char**argv){… // initialization, see hthread_create(my_fft()…);Another_kernel();hthread_join();}

Page 18: hArtes  2010 Final Review  Toolchain Overview & Demo

18/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Target Execution (under Linux) OMAP/Diopsis toolchainOMAP/Diopsis toolchain

Two separate binaries, not common symbols, not unified debugger, I/O

messages (printf) often relies on jtag connection.

RUN and HOPE!

bash$./my_fft_arm.elf <my_fft_dsp.bin>

Page 19: hArtes  2010 Final Review  Toolchain Overview & Demo

19/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Target Execution (under Linux) hArtes toolchainhArtes toolchain

Single ELF binary, common symbols, unified debugger, I/O messages (printf)

on the target.

RUN!

$bash ./my_fft.elf

Page 20: hArtes  2010 Final Review  Toolchain Overview & Demo

20/20

An

dre

a M

ich

elo

tti -

Atm

elT

oo

lch

ain

Ove

rvie

w &

Dem

o

Conclusions

• Although the original “Brain to Bit” (B2B) objective was very ambitious, the hArtes toolchain fulfilled its original promise: to support software development of heterogeneous hardware without expert knowledge of the target platform, and therefore allowing developers to achieve high-performance applications through complete automated solutions and by abstracting low-level hardware details.

• Areas of improvement regards mainly data flow analysis, automatic parallelization, AET integration in Eclipse, debugging capabilities.