Monday 20 July 2015

IMPORTANT CONCEPTS OF BODI/BODS

1) Is it necessary to specify the input primary key column in the Table Comparison transform? A) Yes, it is mandatory to select the primary key column.

2) Parallel Processing

3) Source table partitioning in Data Integrator: how will you load the data?

4) Performance tuning of a job

5) Parameters in Dataflow

6) SCD Type 2 flow

7) Audit Strategy Implementation in the project

8) Different logs available in the Data Integrator

9) Different types of embedded dataflows? Are they reusable?

10) Passing the parameters of one dataflow to other dataflows.

11) Degree of parallelism. Where do we set this?

12) Bulk Loading

Bulk Loading :

You can bulk load to Oracle using an API or a staging file:
• If you select the API method, Data Integrator accesses the direct-path engine of the Oracle database server associated with the target table. Using Oracle’s Direct-Path Load API, input data feeds directly into the database files. To use this option, you must have Oracle version 8.1 or later.
• If you select the File method, Data Integrator writes the data to a staging file on disk and then invokes Oracle’s SQL*Loader to load it into the target table.
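The performance benefit of bulk loading comes from avoiding per-row statement executions. The sketch below is illustrative only: it uses Python’s built-in sqlite3 module as a stand-in database to contrast row-by-row inserts with an array/bulk insert. Oracle’s Direct-Path Load API works at the database-engine level and is a different mechanism, but the underlying idea of pushing rows in bulk rather than one at a time is the same, and the gap is usually much larger against a networked database.

# Illustrative only: contrasts row-by-row inserts with a bulk/array insert
# using sqlite3 as a stand-in target database.
import sqlite3
import time

rows = [(i, f"customer_{i}") for i in range(50_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")

# Row-by-row: one statement execution per input row.
start = time.perf_counter()
for row in rows:
    conn.execute("INSERT INTO target VALUES (?, ?)", row)
conn.commit()
print("row-by-row:", round(time.perf_counter() - start, 3), "s")

conn.execute("DELETE FROM target")

# Bulk/array insert: one prepared statement executed over the whole row set.
start = time.perf_counter()
conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
conn.commit()
print("bulk:", round(time.perf_counter() - start, 3), "s")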


Logs :

Using Data Integrator logs
• Examining trace logs
• Examining statistics logs
• Examining error logs

• Use the trace logs to determine where an execution failed, whether the execution steps occur in the order you expect, and which parts of the execution are the most time consuming.

• The statistics log (also known as the monitor log) quantifies the activities of the components of the job. It lists the time spent in a given component of a job and the number of data rows that streamed through the component.

• Data Integrator produces an error log for every job execution. Use the error logs to determine how an execution failed. If the execution completed without error, the error log is blank.
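As a rough illustration of the kind of question the statistics/monitor log answers (which step took longest, how many rows flowed through it), here is a minimal Python sketch over a made-up, simplified log export with component, row count and seconds columns. The real Data Integrator log layout is different, so treat the format and the component names as hypothetical.

# A minimal sketch, assuming a hypothetical, simplified monitor-log export
# in CSV form: component, rows, seconds. Illustrative only.
import csv
import io

sample_log = io.StringIO(
    "component,rows,seconds\n"
    "Source_CUSTOMER,120000,4.2\n"
    "Query_Lookup,120000,18.7\n"
    "Table_Comparison,120000,35.1\n"
    "Target_DIM_CUSTOMER,118500,9.8\n"
)

stats = list(csv.DictReader(sample_log))

# Which component consumed the most time?
slowest = max(stats, key=lambda r: float(r["seconds"]))
print("slowest component:", slowest["component"], slowest["seconds"], "s")

# How many rows streamed through each component?
for r in stats:
    print(f'{r["component"]:<22}{r["rows"]:>8} rows {r["seconds"]:>7} s')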

Examining target data

The best measure of the success of a job is the state of the target data.

Always examine your data to make sure the data movement operation
produced the results you expect. Be sure that:

• Data was not converted to incompatible types or truncated.
• Data was not duplicated in the target.
• Data was not lost between updates of the target.
• Generated keys have been properly incremented.
• Updated values were handled properly.
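A minimal Python sketch of how such post-load checks might look, mirroring the checklist above. The table and column names are hypothetical; in a real project these checks are more commonly written as SQL against the target table or implemented with the built-in audit feature.

# A minimal sketch of post-load sanity checks on target rows.
# Column names and the 40-character length limit are assumptions.
target_rows = [
    {"cust_key": 1, "cust_id": "C001", "name": "Acme Ltd"},
    {"cust_key": 2, "cust_id": "C002", "name": "Beta GmbH"},
    {"cust_key": 3, "cust_id": "C003", "name": "Gamma Inc"},
]

# Data was not duplicated in the target (each natural key occurs once).
ids = [r["cust_id"] for r in target_rows]
assert len(ids) == len(set(ids)), "duplicate natural keys in target"

# Generated keys have been properly incremented (strictly ascending, no gaps).
keys = sorted(r["cust_key"] for r in target_rows)
assert keys == list(range(keys[0], keys[0] + len(keys))), "gap in generated keys"

# Data was not truncated (here: names fit the assumed column length of 40).
assert all(len(r["name"]) <= 40 for r in target_rows), "possible truncation"

print("target checks passed:", len(target_rows), "rows")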

Embedded Data Flow

An embedded data flow is a data flow that is called from inside another data flow. Data passes into or out of the embedded data flow from the parent flow through a single source or target. The embedded data flow can contain any number of sources or targets, but only one input or one output can pass data
to or from the parent data flow.

There are four types of embedded data flows:

• One input: add an embedded data flow at the end of a data flow.

• One input and one output: add an embedded data flow in the middle of a data flow.

• One output: add an embedded data flow at the beginning of a data flow.

• No input or output: replicate an existing data flow.
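As a loose analogy (not Data Services syntax), the "one input and one output" case behaves like a reusable function that the parent flow calls with a single input stream and reads back as a single output stream. The names below are made up for illustration.

# Analogy only: an embedded data flow modelled as a reusable Python function
# with exactly one input port and one output port, called from a parent flow.
from typing import Iterable, Iterator

def embedded_cleanse(rows: Iterable[dict]) -> Iterator[dict]:
    """One input, one output: the 'middle of a data flow' case."""
    for row in rows:
        row = dict(row)
        row["name"] = row["name"].strip().upper()
        yield row

def parent_flow() -> list:
    source = [{"id": 1, "name": " acme "}, {"id": 2, "name": "beta"}]
    # Data passes into and out of the embedded flow through a single port.
    cleansed = embedded_cleanse(source)
    target = list(cleansed)  # the parent flow's own target step
    return target

print(parent_flow())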

Partitioning :

You can set Data Integrator to perform data extraction, transformation, and loads in parallel by setting parallel options for sources, transforms, and targets.

In addition, you can set individual data flows and work flows to run in parallel by simply not connecting them in the workspace.

If the Data Integrator Job Server is running on a multi-processor computer, it takes full advantage of available CPUs.
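To make the "unconnected objects run in parallel" idea concrete, here is a minimal Python sketch in which two independent loads, modelled as plain functions, run concurrently because nothing forces one to wait for the other. The function names and sleep times are placeholders.

# A minimal sketch, assuming two independent "data flows" modelled as
# plain functions; since neither depends on the other, they run in parallel.
from concurrent.futures import ThreadPoolExecutor
import time

def load_customers():
    time.sleep(1)  # stand-in for extract/transform/load work
    return "customers done"

def load_products():
    time.sleep(1)
    return "products done"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda f: f(), [load_customers, load_products]))
print(results, "in", round(time.perf_counter() - start, 2), "s")  # ~1 s, not ~2 s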

Parallel Execution in data flows

• Table partitioning
• Degree of parallelism
• Combining table partitioning and a degree of parallelism
• File multi-threading


Table Partitioning

• Data flow with source partitions only
• Data flow with target partitions only
• Data flow with source and target partitions

Data Integrator instantiates a source thread for each partition, and these threads run in parallel. The data from these threads is later merged into a single stream by an internal Merge transform before the query is processed.

Data Integrator inserts an internal Round Robin Splitter (RRS) transform after the Query transform, which routes incoming rows in a round-robin fashion to internal Case transforms. The Case transforms evaluate the rows to determine the partition ranges. Finally, an internal Merge transform collects the incoming rows from different Case transforms and outputs a single stream of rows to the target threads. The Case, Merge, and the target threads execute in parallel.
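The sketch below simulates the internal pipeline just described: a round-robin splitter distributes rows across parallel workers, each worker's "Case" step decides which target partition range a row belongs to, and a final merge collects the streams back into one. This is a conceptual Python illustration with made-up partition ranges, not the engine's actual implementation.

# Conceptual simulation of Round Robin Splitter -> Case -> Merge.
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

rows = [{"order_id": i, "amount": i * 10} for i in range(1, 101)]

N_WORKERS = 4
# Round Robin Splitter: row i goes to worker i % N_WORKERS.
slices = [rows[i::N_WORKERS] for i in range(N_WORKERS)]

def case_partition(batch):
    """Case step: tag each row with its target partition range (made-up ranges)."""
    out = []
    for row in batch:
        row = dict(row)
        row["partition"] = "P_LOW" if row["amount"] < 500 else "P_HIGH"
        out.append(row)
    return out

with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    partitioned = pool.map(case_partition, slices)

# Merge: collect the parallel streams into a single stream for the target threads.
merged = list(chain.from_iterable(partitioned))
print(len(merged), "rows,", sum(r["partition"] == "P_HIGH" for r in merged), "in the high range")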

Degree Of Parallelism :

You can run transforms in parallel by entering a number in the Degree of Parallelism box on a data flow’s Properties window. The number is used to replicate transforms in the data flow, and the replicated transforms run as separate threads when the Job Server processes the data flow.
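Conceptually, a degree of parallelism of 4 means the same transform logic runs as four parallel copies over splits of the input, as in the hypothetical Python sketch below (the transform and column names are made up).

# A minimal sketch of what a degree of parallelism of 4 means conceptually:
# the same transform is replicated and run as separate threads over input splits.
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

DEGREE_OF_PARALLELISM = 4
rows = [{"id": i, "city": f" city_{i} "} for i in range(10_000)]

def transform(batch):
    """The replicated transform: trim and upper-case a column."""
    return [{**r, "city": r["city"].strip().upper()} for r in batch]

splits = [rows[i::DEGREE_OF_PARALLELISM] for i in range(DEGREE_OF_PARALLELISM)]
with ThreadPoolExecutor(max_workers=DEGREE_OF_PARALLELISM) as pool:
    out = list(chain.from_iterable(pool.map(transform, splits)))
print(len(out), "rows transformed by", DEGREE_OF_PARALLELISM, "parallel copies")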


