Thursday, 7 April 2011

Working With Change Data Capture

Changed Data Capture

Purpose of CDC is to enable applications to process changed data only. CDC enables ODI to track changes in source data caused by other applications. When running integration interfaces, ODI can avoid processing unchanged data in the flow. Loads will process only changes since the last load. The volume of data to be processed is dramatically reduced.

Reducing the source data flow to only changed data is useful in many contexts, such as data synchronization and replication. It is essential when setting up an event-oriented architecture for integration. In such architecture, applications make changes in the data ("Customer Deletion", "New Purchase Order") during a business process. These changes are captured by Oracle Data Integrator and transformed into events that are propagated throughout the information system.

CDC Techniques

1)Trigger based:
ODI will create and maintain triggers to keep track of the changes.

2) Logs based: ODI retrieves changes from the database logs (Oracle, AS/400).

3) Time stamp based: Processes written with ODI can filter the data by comparing the time stamp value with the last load time (cannot process deletes)

4) Sequence number: If the records are numbered in sequence, ODI can filter the data based on the last value loaded (cannot process updates and deletes).

Changed Data Capture is performed by journalizing models. Journalizing a model consists of setting up the infrastructure to capture the changes (inserts, updates and deletes) made to the records of this model's datastores.
Oracle Data Integrator supports two journalizing modes:

•Simple Journalizing tracks changes in individual datastores in a model.
•Consistent Set Journalizing tracks changes to a group of the model's datastores, taking into account the referential integrity between these datastores. The group of datastores journalized in this mode is called a Consistent Set.

The Journalizing Components

The journalizing components are:
•Journals: Where changes are recorded. Journals only contain references to the changed records along with the type of changes (insert/update, delete).

•Capture processes: Journalizing captures the changes in the source datastores either by creating triggers on the data tables, or by using database-specific programs to retrieve log data from data server log files.

•Subscribers: CDC uses a publish/subscribe model. Subscribers are entities (applications, integration processes, etc) that use the changes tracked on a datastore or on a consistent set. They subscribe to a model's CDC to have the changes tracked for them. Changes are captured only if there is at least one subscriber to the changes. When all subscribers have consumed the captured changes, these changes are discarded from the journals.

•Journalizing views: Provide access to the changes and the changed data captured. They are used by the user to view the changes captured, and by integration processes to retrieve the changed data.
These components are implemented in the journalizing infrastructure

Setting up Journalizing:

This is the basic process for setting up CDC on an Oracle Data Integrator data model. Each of these steps is described in more detail below.

1.Set the CDC parameters
2.Add the datastores to the CDC
3.For consistent set journalizing, arrange the datastores in order
4.Add subscribers
5.Start the journals

Journalizing Tools:

Oracle Data Integrator provides a set of tools that can be used in journalizing to refresh information on the captured changes or trigger other processes:

•SnpsWaitForData waits for a number of rows in a table or a set of tables.

•SnpsWaitForLogData
waits for a certain number of modifications to occur on a journalized table or a list of journalized tables. This tool calls SnpsRefreshJournalCount to perform the count of new changes captured.

•SnpsWaitForTable waits for a table to be created and populated with a pre-determined number of rows.

•SnpsRetrieveJournalData retrieves the journalized events for a given table list or CDC set for a specified journalizing subscriber. Calling this tool is required if using Database-Specific Processes to load journalizing tables. This tool needs to be used with specific knowledge modules.

•SnpsRefreshJournalCount refreshes the number of rows to consume for a given table list or CDC set for a specified journalizing subscriber.

Implementing Changed Data Capture:

Step:1) Import the appropriate JKM in the project.
Click the Projects tab. Expand the Procedure-Demo > Knowledge Modules node, right-click Journalization (JKM), and select Import Knowledge Modules.



Step:2) In the Models tab, create a new model named Oracle_relational_01. For Technology, enter: Oracle. Select the logical schema Sales_Order. Click the Reverse Engineer tab and set Context to development. Verify the setting, as shown in the following screen. Click the Journalizing tab.



Step: 3) In the Knowledge Module menu, select JKM Oracle Simple. Procedure-Demo, as shown in the following screen. Click the Save to save your model and then close the tab.


Step: 4) Reverse-engineer the model Oracle_Relational_01. Expand this model and verify its structure as follows.


Step: 5) Set up the CDC Infrastructure. You will start the CDC on the EMPLOYEE table in the Oracle_Relational_01 model.

To add the table to CDC, expand the Oracle_Relational_01 model, right-click the EMPLOYEE table, and select Change Data Capture > Add to CDC. Click Yes to confirm.


Step: 6) Click the Refresh icon. The small yellow clock icon is added to the table.


Step: 7)
Right-click the EMPLOYEE table again and select Changed Data Capture > Start Journal.



Step: 8) you use the default subscriber SUNOPSIS. For that reason, you do not have to add another subscriber. Click OK to confirm that your subscriber is SUNOPSIS. In the Information window, click OK again. Wait several seconds, then click Refresh and verify that the tiny clock icon at the EMPLOYEE table is green now. This means that your journal has started properly.


Step: 9) Click the ODI Operator icon to open the Operator. Click Refresh. Select All Executions and verify that the EMPLOYEE session executed successfully.


Step: 10) View the data and the changed data.
In the Designer window, open the Models tab. Right-click the EMPLOYEE datastore and select Data.


Step: 11) Select the row with Employee_Key = 10. Change the value of the NAME2 column to “Symond”. Similarly, select the row with Employee_Key = 15, and then change the value to “jacob”. Save your changes and close the tab.


Step: 12) Right-click the table again and select View Data. Scroll down, and verify that the rows are modified. Close the tab.


To verify that your changed data is captured, right-click EMPLOYEE, and select Change Data Capture > Journal Data. Find the captured changed records in the journal data. Close the tab.



Done !

17 comments:

  1. Hi Gurus,

    My requirement is,

    1. Whenever a new row inserted into table( Oracle Application R12), immediately concurrent program has to run automatically. Is it possible by ODi. Please let me know. If yes, Please let me know the process / steps.

    2. Whenever a new row inserted into TABLE_1, immediately we have to extract that row and we have to insert into TABLE_2. Please provide me process / steps in details.

    I am beginner in ODI. Please help me by providing steps in details.

    Regards,
    Sreehari.

    ReplyDelete
    Replies
    1. Hi Sree you can perform this by using package. There is one ODI tool ODIwaitfordata that will check for the update in source table by the time interval you define and execute the interface link to this ODI Tool.

      that interface need to have source with JKM
      let me know if still not clear

      Thanks And Regards

      Delete
  2. Hi Sree,
    It is possible through ODI Even there are different way available to perform this operation.
    You can create a package and set up OdiWaitForData API where you can set up time interval after which package will execute automatically and load update row to target.

    for more info. please check essbase data laod automation

    http://www.scribd.com/aloo_a2/d/60359501-Automating-Essbase-Data-load-through-ODI-%E2%80%93-Part-I

    Thanks
    With Regards
    Gourav Atalkar

    ReplyDelete
  3. Hi Gourav,

    Thanks a Lot. Its really helpful.

    You mentioned, instead of using ODI, other methods also available to this.

    Please let me know ,how i will call UTL File package automatically ( we are using Oracle Apps R12 ) to extract newly inserted row without creating data base triggers and without using ODI.

    Regards,
    Sreehari.

    ReplyDelete
  4. Hi,
    I have followed exactly same steps shown in this post. However, only updates work and for some reason, deletes are not working. Could you please throw some light on it?
    Thanks,
    ~Naveen

    ReplyDelete
  5. Hi,
    I found out the issue and it is resolved now. I was wondering, if you have anything for my next step: When a change is captured on a table, instead of deleting the data on target table, i want to update a column (say isDeleted='y') on target table. Please share your thoughts, whether it can be done by JKM or should we build a custom KM?

    ~Thanks,
    Naveen

    ReplyDelete
  6. Hi,

    Gourav, Can u suggest how to implement the cdc techniques in odi.

    Regards,
    Ravi

    ReplyDelete
  7. what is the use of diagrams in models in odi

    ReplyDelete
    Replies
    1. Users can for example use Common Format Designer to create operational datastores, datamarts, or master data canonical format by assembling heterogeneous sources.

      CFD enables a user to modify an existing model and automatically generate the DDL scripts for synchronizing differences between a data model described in Oracle Data Integrator and its implementation in the data server.

      You will be able to retrieve and convert table structures from other technologies, ODI will automatically convert the data types as required.

      Once you have designed your new tables, ODI can generate a DDL that is context independent.

      Delete
  8. Any one work on CDC consistent.Can anyone has screenshots?

    ReplyDelete
  9. Hi,

    Can anyone provide the screnshots for "Log based" -Change data capture

    Regards,
    Hemamalini

    ReplyDelete
  10. can anyone please provide the steps for log based Change data capture ???

    Thanks,
    Hemamalini

    ReplyDelete
  11. Hi Gourav,

    I am following the same process as shown above , but I notice that the updated data ( change data ) is not captured . The journal data remain empty even after journalising. Due to this the incremental load is not working however the full load works fine . Could you please help ??

    ReplyDelete
  12. Hi Ashwini,

    Can please let me know what IKM you are using for the process and did you add some data to source after executing the journalising first time ?

    Thanks
    Regards

    ReplyDelete
  13. Hi ,
    I have a small question..My source is coming from Turboimage technology(HP files).
    Here which JKM have to import coz im not able to find any JKM related to this technology.
    please can you suggest me..
    Thanks

    ReplyDelete
  14. Hi Experts,

    How to create customized trigger(insert, update and delete) for the table instead of CDC trigger.

    Thanks in advance...:)

    ReplyDelete
  15. Hello there,
    Thanks for the post!
    I have a question:
    How can I write the mapping once (ODI 12.1.3), and reuse it for first population and for reading the changes without need of checking – unchecking “Journalized Data only” within the mapping itself?
    Is there a central way in an ODI Package (for instance) to tell ODI to use the Jv$ tables instead?

    ReplyDelete