
Friday 1 February 2019

What are the Array Fetch and Batch Update size parameters in ODI?


Answer:-



There are two parameters in the Oracle Data Integrator topology that can be tuned to improve loading performance. These two parameters are: 






  • Array Fetch size 

  • Batch Update size




The Array Fetch size parameter is set on the source data server. It defines the number of records read from the source in one fetch; these records are held in the agent's memory. 



The Batch Update size parameter is set on the target data server. It defines the number of records written to the target in one batch from the agent. 



When you increase the number of records per batch, fewer batches are needed to read and write the same total number of records, so the overall load takes less time. Since more data is held in the agent's memory, however, adequate RAM becomes a necessity. 
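The effect of batch size on the number of round trips is simple arithmetic. The sketch below uses a hypothetical row count (not from the ODI documentation) to show how a larger batch size shrinks the number of client/server exchanges:

```python
import math

# Hypothetical workload for illustration: one million source rows.
total_rows = 1_000_000

# Each round trip moves at most batch_size rows between server and agent.
for batch_size in (30, 100, 500):
    round_trips = math.ceil(total_rows / batch_size)
    print(f"batch size {batch_size:>3}: {round_trips} round trips")
```

Going from a batch size of 30 to 500 cuts the round trips from 33,334 to 2,000, at the cost of holding roughly 17 times more rows in the agent's memory at once.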



Oracle recommends the following: 



For good networks with low latency, values less than 30 can be kept. With a poor network, however, the parameters should be set to 100 or greater. 

Typical values, per Oracle, are between 30 and 500. 









Explanation :-



When data flows through the agent, it is usually read on a source connection (typically via JDBC) by batches of records, and written to the target by batches. The following parameters can help you tune this data flow:






  • Array Fetch in the source data server definition. This parameter defines the size of the batches of records read from the source at a time and stored within the agent.






  • Batch Update in the target data server definition. This parameter defines the size of the batches of records written to the target.






Big Array Fetch/Batch Update values have the following consequences:



The same number of rows will be read in fewer batches, thus reducing the network overhead caused by the client/server communication. This is useful when the client/server protocols are very verbose. Another consequence of this approach is that the agent stores a bigger amount of data, and requires more RAM.



The Batch Update/Array Fetch configuration is a compromise between network and agent overhead. With a highly available network, you can keep low values (less than 30, for example). With a poor network, you can use larger values (100 and more), or you can consider using another method, for example using unloaders/loaders and a low-bandwidth protocol such as FTP. Array Fetch/Batch Update typical values are between 30 and 500.
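The read-by-batches, write-by-batches data flow through the agent can be sketched in plain DB-API terms. This is an illustration only, not ODI's implementation: it uses Python's sqlite3 as a stand-in for the JDBC source and target connections, with `fetchmany` playing the Array Fetch role and `executemany` the Batch Update role. The table name and sizes are made up for the example:

```python
import sqlite3

ARRAY_FETCH = 250   # rows read from the source per fetch (Array Fetch size)
BATCH_UPDATE = 250  # rows written to the target per batch (Batch Update size)

# Stand-in source and target; ODI would hold two JDBC connections instead.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, i * 1.5) for i in range(1000)])
tgt.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

# The agent's loop: fetch a batch from the source, write it to the target.
cur = src.execute("SELECT id, amount FROM orders")
while True:
    batch = cur.fetchmany(ARRAY_FETCH)      # batch held in agent memory
    if not batch:
        break
    tgt.executemany("INSERT INTO orders VALUES (?, ?)", batch)
tgt.commit()
print(tgt.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1000
```

At any moment the agent only holds one batch of rows, which is why the two sizes bound its memory use.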





Standard definition :-



Array Fetch Size (Row prefetching):- 

What ODI calls Array Fetch Size is commonly known as row prefetching:



"The concept of row prefetching is straightforward. Every time an application asks the driver to retrieve a row from the database, several rows are prefetched with it and stored in client-side memory. In this way, several subsequent requests do not have to execute database calls to fetch data. They can be served from the client-side memory".



For ODI this means that the higher you set this value, the more rows are retrieved per request from the source and then stored in the memory of the agent.



The main advantage is that you don't have to re-read the same block multiple times from the source, and as a result you reduce the number of consistent gets. A database block typically holds more than one row; if you set your Array Fetch Size to a low number, e.g. 1, and you have 5 rows in a block, the database needs to read that block five times to retrieve all the rows. Another advantage is that you need fewer round trips to the server.
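Row prefetching is exposed in most client APIs. As a minimal sketch, Python's DB-API `arraysize` attribute (here on sqlite3, standing in for a JDBC fetch size) controls how many rows a single `fetchmany()` call brings back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

cur = conn.execute("SELECT x FROM t ORDER BY x")
cur.arraysize = 4        # analogue of the Array Fetch size
first = cur.fetchmany()  # with no argument, fetches up to arraysize rows
print(len(first))        # 4
```

Subsequent rows are then served from the already-fetched batch until it is exhausted, instead of each row costing its own call to the database.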



Batch Update size  (Array Interface):- 

What ODI calls Batch Update size is commonly known as the Array Interface:



“The array interface allows you to bind arrays instead of scalar values. This is very useful when a specific DML statement needs to insert or modify numerous rows. Instead of executing the DML statement separately for each row, you can bind all necessary values as arrays and execute it only once, or if the number of rows is high, you can split the execution into smaller batches.”



The advantage is that you need fewer round trips to the database, which decreases network traffic.
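The array interface also maps directly onto the DB-API. The sketch below (sqlite3 again standing in for a JDBC target, with made-up table and row counts) contrasts executing one INSERT per row with binding the whole list of values and executing once via `executemany`:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
rows = [(i,) for i in range(10000)]

# Row-by-row: one statement execution per row.
t0 = time.perf_counter()
for r in rows:
    conn.execute("INSERT INTO t VALUES (?)", r)
one_by_one = time.perf_counter() - t0

conn.execute("DELETE FROM t")

# Array interface: bind all values and execute the statement once.
t0 = time.perf_counter()
conn.executemany("INSERT INTO t VALUES (?)", rows)
batched = time.perf_counter() - t0

print(f"row-by-row: {one_by_one:.4f}s, batched: {batched:.4f}s")
```

Over a real network the gap is far larger than in this in-process example, because each per-row execution would also pay a round-trip latency cost.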



This is applicable for any of the Knowledge Modules that use binding between source and target, e.g. LKM SQL to Oracle.
