Tuesday 26 July 2016

Talend Job Design - Performance Optimization Tips

1. Remove unnecessary fields/columns ASAP using the tFilterColumns component.


It is very important to remove data that the Job does not need from the Job flow as early as possible. For example, suppose we have a huge lookup file with more than 20 fields, but we only need two of them (Key, Value) to perform the lookup. If we do not filter the columns before the join, every field of the file is read into memory for the lookup, occupying unnecessary space. If we filter the fields and keep only the two required columns, the lookup data occupies much less memory: in this example, roughly 10 times less (20 fields down to 2), assuming the fields are of similar size.
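To make the memory argument concrete, here is a minimal Java sketch (Talend Jobs compile to Java). It is not Talend's generated code: the row layout, the column positions KEY_COL/VALUE_COL and all helper names are hypothetical. It compares a lookup that keeps every field of each row with one that keeps only Key and Value, which is roughly the effect of filtering columns before the join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupColumnFilterDemo {

    // Hypothetical positions of the two columns the lookup actually needs.
    static final int KEY_COL = 0;
    static final int VALUE_COL = 1;

    // Unfiltered: the lookup keeps every field of every row in memory.
    static Map<String, String[]> buildFullLookup(List<String[]> rows) {
        Map<String, String[]> lookup = new HashMap<>();
        for (String[] row : rows) {
            lookup.put(row[KEY_COL], row);
        }
        return lookup;
    }

    // Filtered (the column-filter idea): keep only Key and Value per row.
    static Map<String, String> buildFilteredLookup(List<String[]> rows) {
        Map<String, String> lookup = new HashMap<>();
        for (String[] row : rows) {
            lookup.put(row[KEY_COL], row[VALUE_COL]);
        }
        return lookup;
    }

    // Generates hypothetical sample rows, each with fieldCount fields.
    static List<String[]> sampleRows(int rowCount, int fieldCount) {
        List<String[]> rows = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            String[] row = new String[fieldCount];
            for (int f = 0; f < fieldCount; f++) {
                row[f] = "r" + r + "f" + f;
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = sampleRows(1000, 20);
        Map<String, String[]> full = buildFullLookup(rows);
        Map<String, String> filtered = buildFilteredLookup(rows);

        // Count the field values retained by each lookup structure.
        long fullFields = full.values().stream().mapToLong(r -> r.length).sum();
        long filteredFields = filtered.size() * 2L; // key + value per entry
        System.out.println("Fields held (unfiltered): " + fullFields);
        System.out.println("Fields held (filtered):   " + filteredFields);
    }
}
```

With 1,000 rows of 20 fields, the unfiltered lookup holds 20,000 field values versus 2,000 for the filtered one, which is where the "10 times less" figure comes from.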



2. Remove unnecessary data/records ASAP using the tFilterRow component.

Similarly, it is important to remove records that the Job does not need from the Job flow. The less data flowing through your Job, the better your Talend Job will perform.
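The benefit of filtering rows early can be sketched in plain Java. This is an analogy, not Talend's generated code: the Order class, the ACTIVE/CLOSED statuses and expensiveTransform are hypothetical stand-ins. Placing a row filter before a costly step means that step only runs on the rows that survive.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class EarlyRowFilterDemo {

    // Hypothetical input row: an order with an id and a status.
    static final class Order {
        final int id;
        final String status;

        Order(int id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    // Counts how many rows reach the costly step.
    static final AtomicInteger transformCalls = new AtomicInteger();

    // Hypothetical stand-in for a costly downstream step (lookup, expression, write).
    static String expensiveTransform(Order order) {
        transformCalls.incrementAndGet();
        return "order-" + order.id;
    }

    // Drop unneeded rows first, then run the costly step only on the survivors.
    static List<String> filterThenTransform(List<Order> orders) {
        return orders.stream()
                .filter(o -> "ACTIVE".equals(o.status))
                .map(EarlyRowFilterDemo::expensiveTransform)
                .collect(Collectors.toList());
    }

    // Builds sample data where every 10th order is ACTIVE.
    static List<Order> sampleOrders(int count) {
        return IntStream.range(0, count)
                .mapToObj(i -> new Order(i, i % 10 == 0 ? "ACTIVE" : "CLOSED"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = sampleOrders(10_000);
        List<String> result = filterThenTransform(orders);
        System.out.println(result.size() + " rows kept, "
                + transformCalls.get() + " calls to the costly step");
    }
}
```

Here the costly step runs 1,000 times; had it run before the filter, it would have been called once per input row (10,000 times).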

3. Use a SELECT query to retrieve data from the database


4. Use database bulk components

5. Use the Store on disk option

6. Allocate more memory to the Jobs

7. Parallelism
  • Using the tParallelize component of Talend (only available in Talend Integration Suite).
  • Running subjobs in parallel using Multithreaded Execution. This option is also available in Talend Open Studio, but it is disabled by default; you can enable it from the Job view. See the article “Parallel Execution Sub Jobs in Talend Open Studio” for more details and a demonstration of parallel execution of subjobs in Talend Open Studio.
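Conceptually, multithreaded execution runs independent subjobs on separate threads. The Java sketch below is only an analogy built on java.util.concurrent, not the code Talend generates; the subjob names (LoadCustomers, LoadProducts) and the summing workload are hypothetical stand-ins for subjobs that share no data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSubjobsDemo {

    // Hypothetical stand-in for an independent subjob: sums the range [from, to).
    static Callable<Long> subjob(String name, long from, long to) {
        return () -> {
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            System.out.println(name + " finished on " + Thread.currentThread().getName());
            return sum;
        };
    }

    // Runs all subjobs concurrently and waits for every one to finish.
    // Results come back in the same order as the input list.
    static List<Long> runInParallel(List<Callable<Long>> subjobs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subjobs.size()));
        try {
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : pool.invokeAll(subjobs)) {
                results.add(future.get()); // rethrows a subjob failure as ExecutionException
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for subjobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A subjob failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Long>> subjobs = List.of(
                subjob("LoadCustomers", 0, 1_000_000),
                subjob("LoadProducts", 0, 2_000_000));
        System.out.println(runInParallel(subjobs));
    }
}
```

This only pays off when the subjobs are truly independent; subjobs that read each other's output still need to run in sequence.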

8. Use Talend ELT components when required


9. Use the SAX parser over Dom4J whenever possible
When parsing huge XML files, try using the SAX parser by selecting it as the Generation mode in the Advanced settings of the tFileInputXML component. However, the SAX parser comes with a few downsides: for example, it supports only basic XPath expressions and cannot handle expressions such as last() or array-style selection of data ([ ]). But if your requirement can be accomplished with the SAX parser, you should prefer it over Dom4J.
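SAX uses less memory because it streams parse events instead of building the whole document tree in memory, as Dom4J does. Below is a minimal sketch with the JDK's built-in SAX API (javax.xml.parsers); it is not the code tFileInputXML generates, and the records/record/amount element layout is a hypothetical example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamingDemo {

    // Receives parse events one element at a time and keeps only running totals,
    // so memory use does not grow with the size of the file.
    static class AmountHandler extends DefaultHandler {
        long recordCount = 0;
        long amountTotal = 0;
        private final StringBuilder text = new StringBuilder();
        private boolean inAmount = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (qName.equals("record")) {
                recordCount++;
            } else if (qName.equals("amount")) {
                inAmount = true;
                text.setLength(0); // reset the buffer for this element's text
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Element text may arrive in several chunks, so accumulate it.
            if (inAmount) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("amount")) {
                amountTotal += Long.parseLong(text.toString().trim());
                inAmount = false;
            }
        }
    }

    // Parses the stream with the JDK's default SAX parser.
    static AmountHandler parse(InputStream in) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            AmountHandler handler = new AmountHandler();
            parser.parse(in, handler);
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<records><record><amount>10</amount></record>"
                + "<record><amount>32</amount></record></records>";
        AmountHandler result = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(result.recordCount + " records, total amount " + result.amountTotal);
    }
}
```

Running it prints "2 records, total amount 42". Because the handler sees each element only once, in document order, something like last() would need extra bookkeeping, which mirrors the XPath limitations mentioned above.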

10. Index database table columns


11. Split the Talend Job into smaller subjobs

Whenever possible, split a complex Talend Job into smaller subjobs. Talend uses pipeline parallelism: after processing a few records, a component passes them on to downstream components even if it has not yet finished processing all records. Hence, if we design a Job that performs a large number of complex operations in a single subjob, the performance of the Job will suffer. It is advisable to break the complex Talend Job into smaller subjobs and then control the flow of the Job using triggers in Talend.

Thanks, guys, for reading this post. I am looking forward to your expert comments.

