Ab Initio Interview Questions

lella keerthi
6 min read · Aug 13, 2021


1. Explain what de-partitioning is in Ab Initio.
Answer: De-partitioning reads data from multiple flows or operations and re-joins data records from different flows into a single flow. Several de-partition components are available, including Gather, Merge, Interleave, and Concatenate.

2. Explain what a SANDBOX is.
Answer: A sandbox is a collection of graphs and related files saved in a single directory tree that behaves as a group for purposes of navigation, version control, and migration.

3. What do you mean by overflow errors?
Answer: While processing data, bulky calculations often arise, and they do not always fit in the memory allocated for them. For example, if a value wider than 8 bits is stored in a one-byte field, an overflow error results.
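
A quick SQL illustration of the same idea (the table is hypothetical): a value wider than its declared field overflows.

CREATE TABLE t (n NUMBER(2));  -- holds at most two decimal digits
INSERT INTO t VALUES (340);    -- fails with ORA-01438: value larger than
                               -- specified precision allowed for this column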

4. What is data encoding?
Answer: Data often needs to be kept confidential, and encoding is one way to achieve that. It ensures that information remains in a form that no one other than the sender and the receiver can understand.

5. The Rollup component in Ab Initio summarizes a group of data records, so what is the use of Aggregate?
Answer: Aggregate and Rollup can both summarize data, but Rollup is more convenient to use and makes it much clearer how a particular summarization is performed. Rollup also offers extra functionality, such as input and output filtering of records, and it keeps intermediate results in main memory, which Aggregate does not support.
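
A rough SQL analogue of that input/output filtering (not Ab Initio code; the sales table and its columns are hypothetical):

SELECT region, SUM(amount) AS total_amount
FROM sales
WHERE amount > 0            -- input filter: drop records before summarizing
GROUP BY region
HAVING SUM(amount) > 1000;  -- output filter: drop summary rows afterwards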

6. What does dependency analysis mean in Ab Initio?
Answer: Dependency analysis answers questions about data lineage: where the data comes from, and which applications produce and depend on it.
For surrogate keys, we can retrieve the current maximum surrogate key from the existing data, then use scan or next_in_sequence in a reformat to generate the continuing sequence for new records.
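
A hedged SQL sketch of that surrogate-key idea (dim_customer, stg_customer, and the key names are hypothetical):

-- Retrieve the current maximum surrogate key from the existing data.
SELECT COALESCE(MAX(customer_key), 0) AS max_key FROM dim_customer;

-- Number the new records from max_key + 1 onward, mirroring
-- next_in_sequence() (Oracle ROWNUM shown for illustration).
SELECT :max_key + ROWNUM AS customer_key, s.* FROM stg_customer s;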

7. Describe the elements you would review to ensure multiple scheduled batch jobs do not collide with each other.
Answer: Review the dependencies between the jobs, because each job typically depends on another: for example, the second job should execute only if the first job completes successfully; otherwise it should not run.

8. How do you create a repository in Ab Initio for a stand-alone system (local NT)?
Answer: If you are installing Ab Initio on a stand-alone machine, it is not necessary to create the repository yourself; the installer creates it automatically under the abinitio folder (wherever you install Ab Initio).

9. Describe the process steps you would perform when defragmenting a data table. Does this table contain mission-critical data?
Answer:
There are several ways to do this:
1) Move the table within the same or another tablespace and rebuild all the indexes on the table. The alter table ... move statement reclaims the fragmented space in the table; then run analyze table table_name compute statistics to capture the updated statistics.

2) Reorganize by taking a dump of the table, truncating the table, and importing the dump back into the table.
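
A hedged Oracle sketch of the first approach (the table, tablespace, and index names are illustrative):

ALTER TABLE orders MOVE TABLESPACE users;   -- reclaims the fragmented space
ALTER INDEX orders_pk REBUILD;              -- the move leaves indexes unusable
ANALYZE TABLE orders COMPUTE STATISTICS;    -- capture updated statistics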

10. Why might you create a stored procedure with the with recompile option?
Answer: Recompile is useful when the tables referenced by the stored procedure undergo a lot of modification, deletion, or addition of data. Due to the heavy modification activity, the execution plan becomes outdated, and the stored procedure's performance goes down. If we create the stored procedure with the recompile option, SQL Server will not cache a plan for it, and it will be recompiled every time it runs.
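
A minimal T-SQL sketch (the procedure, table, and column names are hypothetical):

CREATE PROCEDURE dbo.get_recent_rows
    @cutoff DATETIME
WITH RECOMPILE
AS
BEGIN
    -- Because of WITH RECOMPILE, no plan is cached for this procedure;
    -- a fresh execution plan is built on every call.
    SELECT id, payload
    FROM dbo.big_table
    WHERE created_at >= @cutoff;
END;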

11. State the first_defined function with an example.
Answer: This function is similar to the NVL() function in the Oracle database.
It returns the first value that is not NULL among the values passed to it and assigns that value to the variable.
Example: a set of variables, say v1, v2, v3, v4, v5, v6, are assigned NULL.
Another variable num is assigned the value 340 (num = 340).
num = first_defined(NULL, v1, v2, v3, v4, v5, v6, num)
The result of num is 340.
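
For comparison, the Oracle analogues behave the same way:

SELECT NVL(NULL, 340) AS num FROM dual;                  -- returns 340
SELECT COALESCE(NULL, NULL, NULL, 340) AS num FROM dual; -- returns 340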

12. Explain PDL with an example.
Answer: PDL (Parameter Definition Language) is used to make a graph behave dynamically.
Suppose a dynamic field needs to be added to a predefined DML while the graph executes.
A graph-level parameter can then be defined, and that parameter is used when embedding the DML in the output port.
For example: define a parameter named field with the value string(" | ") name;
Use ${field} when embedding the DML in the out port.
Use $substitution as the interpretation option.

13. Describe the Evaluation of Parameters order?
Answer:
Following is the order of evaluation:

  • The host setup script will be executed first
  • All common (that is, included) parameters are evaluated
  • All Sandbox parameters are evaluated
  • The project script, project-start.ksh, is executed
  • All form parameters are evaluated
  • Graph parameters are evaluated
  • The Start Script of the graph is executed

14. Explain what the Sort component is in Ab Initio.
Answer: The Sort component in Ab Initio re-orders the data. It has two parameters, "Key" and "Max-core".
Key: this parameter determines the collation order for the sort.
Max-core: this parameter controls how often the Sort component dumps data from memory to disk.

15. Explain the methods to improve the performance of a graph?
Answer:
The following are the ways to improve the performance of a graph:

• Make sure that a limited number of components are used in a particular phase
• Use the optimum value of max-core for the Sort and Join components
• Use the minimum number of Sort components
• Use the minimum number of sorted Join components, replacing them with in-memory/hash Joins where needed and possible
• Restrict the fields in Sort, Reformat, and Join components to only those needed
• Use phasing or flow buffers with merged or sorted joins
• Use a sorted Join when the two inputs are huge; otherwise use a hash Join

16. What are the types of data processing you are familiar with?
Answer: The first is the manual data approach, in which data is processed without depending on a machine; it is therefore error-prone, and in the present time this technique is rarely followed, or only limited data is processed this way. The second type is mechanical data processing, in which mechanical devices play the important roles; this approach is adopted when the data is a combination of different formats. The next approach is electronic data processing, which is regarded as the fastest and is widely adopted in the current scenario, offering top accuracy and reliability.

17. Explain the difference between the truncate and delete commands?
Answer:
Truncate:
It is a DDL command, used to remove all rows from tables or clusters. Since it is a DDL command, it auto-commits and a rollback can't be performed. It is faster than delete.

Delete:
It is a DML command, generally used to delete a record from tables or clusters. A rollback can be performed to retrieve the deleted rows. To make the deletions permanent, the "commit" command should be used.
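
An illustrative Oracle session (the orders table is hypothetical):

DELETE FROM orders WHERE status = 'CANCELLED'; -- DML: can still be undone
ROLLBACK;                                      -- the deleted rows are restored

TRUNCATE TABLE orders;                         -- DDL: auto-commits, no rollback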

18. What are BROADCAST and REPLICATE?
Answer: Broadcast takes data from multiple inputs, combines it, and sends it to all the output ports.
E.g., you have 2 incoming flows on a Broadcast component (this can be data parallelism or component parallelism), one with 10 records and the other with 20 records. Then every outgoing flow (there can be any number of them) will have 10 + 20 = 30 records.
Replicate copies the data of each partition and sends it out to multiple out ports of the component, but maintains the partition integrity.

E.g., your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20. Now suppose you have 3 output flows from Replicate. Then each flow will have 2 data partitions with 10 and 20 records respectively.

19. We know the Rollup component in Ab Initio is used to summarize a group of data records, so why do we use Aggregate?
Answer:
• Aggregate and Rollup are both used to summarize data.
• Rollup is better and more convenient to use.
• Rollup can perform additional functionality, like input filtering and output filtering of records.
• Aggregate does not maintain intermediate results in main memory, whereas Rollup does.
• Analyzing a particular summarization is much simpler with Rollup than with Aggregate.

20. How is data processed, and what are the fundamentals of this approach?
Answer: Certain activities require the collection of data, and in many cases processing largely depends on that collection. Data needs to be stored and analyzed before it is processed. This task depends on some major factors, which are:

1. Collection of Data
2. Presentation
3. Final Outcomes
4. Analysis
5. Sorting

These are regarded as the fundamentals of the data processing approach.
