SSIS or in other words, SQL Server Integration Services, is a significant part of Microsoft SQL Server, which can be utilized to achieve an extensive range of data transformation and migration jobs.
The significant parts in the SSIS package are: Data flow, Package Explorer, Event handler and Control flow.
Solution Explorer in SSIS stands for a screen where you can take a look and access all the data sources, projects, and other sorts of related files.
Data flow in SSIS stands for the flow of data from the matching sources to the anticipated destinations.
A task in SSIS is nothing but the technique of any programming language that stands for an individual component of work.
An SSIS package is a prearranged set of connections like data flow elements, event handlers, variables, control events, and configurations. You collect them by the tools provided by SSIS.
Different types of connection that support SSIS are: ODBC, Flat File, XML, Excel, OLEDB, .net SQLClient.
In SSIS, a container represents a reasonable grouping of tasks, and it allows dealing with the scope of a task collectively. The kinds of containers in SSIS are Sequence container and Task host container.
Precedence Constraint in SSIS allows you to identify the reasonable sequence of tasks, and in the arrangement, they should be implemented.
Variable in SSIS are utilized to store values as the system variable and user variable.
Checkpoint in SSIS permits the project to start again from the point of breakdown. Checkpoint stores the details about the package execution; if the package runs effectively the checkpoint file is removed.
While collecting data from different sources and writing it for a purpose, connection managers are useful. Connection manager eases the connection to the system that incorporates informations like data provider, server name, etc.
SSIS breakpoint allows you to suspend the implementation of the package in the business intelligence expansion stage.
Event logging in SSIS permits you to choose any particular event of a job and it is advantageous at what time you are getting trouble in your package.
SSIS packages and all the connected jobs have a property known as logging mode. This property allows three possible values in the shape of Enabled, Disabled and Parent Setting.
SSIS operates by means of buffers; it is a sort of an in-memory practical table to hold data.
Checkpoint data is not saved for Loop containers.
Conditional split transformation in SSIS verifies the prearranged condition, on the basis of the condition assessment.
Different types of data viewers are Grid, Column Chart, Histogram, and Scatter Plot.
What are the possible locations for saving the SSIS package?
With the aid of the Event Handlers tab in SSIS, workflows can be easily configured to act in response to package events.
You can add a Send Mail Task in the event handlers. Also, you can set the warning in the SQL Agent at what time the package fails to run.
Logging in SSIS can be done by means of events like on Warning, on Error, etc. to the manifold options like a SQL server table, XML, etc.
In order to deploy the SSIS package, we have to implement the manifest files and need to settle on whether to deploy this onto SQL Server of into File System.
Late Arriving Dimension is inevitable; to manage these we need to build a model dimension with business key and maintain the remaining of the attributes as default.
The greatest and top way to perform incremental load is by utilizing a Timestamp column in the supply table and storing the final ETL timestamp.
Three data flow constituents are SSIS are Source, Transformation, and Destination.
Checkpoint utilized in SSIS permits a package to start again at the point of breakdown.
Different alternatives for dynamic configuration are Customer variables, XML file, and Database per set with the variables
Data conversion is the most excellent method to transmit the data from one category to another. On the other hand, you need to ensure that you have well-matched data in the column.
Some of the other tasks that SSIS handles are data profiling, effectively streamline batch operations and system filing.
This factor consists of using the logical connectors between the tasks and thus graphically control the flow of data in the system.
When a series of rules or functions are applied to the extracted data from the source so that it can be loaded onto the end result, it is known as data transformation.
Yes, variables can be created in SSIS.
The two variables that can be created in SSIS are task level variables and global variables.
The types of containers that can be used are sequence containers and loop containers.
When a package is transformed from the development mode into the executables mode, it is known as the Deployment process.
When the package is transferred using wizard on the file system and the SQL server, it is known as the Manifiest file.
When the data is moved from its source to its destination, this mechanism is known as the data flow task.
When source data is analyzed to understand it better in terms of hygiene, patterns, numbers, etc it is known as data profiling task.
The task of data profiling is generally carried out at the start of the development cycle so that any teething concerns can be immediately addressed.
In order to maintain data integrity, packages use processes called transactions to bind the database actions.
This is mainly used to provide different input to the connection manager about the different properties.
Control flow is what determines the flow or process of the package. Data flow is a subset of the control flow. Without a control flow, the data flow cannot function.
The functionality or property in SSIS which allows the project (in case of failure) to restart from that point itself, is known as Checkpoint.
This is an upgraded version that allows the seamless merging of data from numerous sources. It is also used to clear up and analyze the existing data.
The four important components of the SSIS package are… Data Flow, Control Flow, Package Explorer, Event Handler
The solution explorer in SSIS is a screen in which all the data, data sources, projects, etc. can be viewed.
Just as in any other programming language, a Task is what executes an individual unit of work
There are two units of Tasks in SSIS… Control Flow tasks and Database Maintenance Tasks
This process enables users to define the logical sequence of tasks to be executed.
As the name suggests connection managers help facilitate the seamless connection of data from its source to the destination.
When a package is getting uploaded in the business intelligence system and is suddenly stopped midway, that point is known as the breakpoint.
This is the term given when a specific task or event needs to be logged or selected. This is especially useful during any troubleshooting issues.
In SSIS, this is a virtual, in-memory table that holds data.
The four data viewers in SSIS include Grid, Histogram, Scatter Plot, Column Chart.
The three locations where the SSIS package cane be potentially saved are SQL Server, Package Store, File System.
The three data flow components in SSIS are Source, Transformation and Destination
The three options for dynamic configuration in SSIS are… XML File, Customer Variables and one Database per environment in sync with the variables.
Some disadvantages of SSIS are that the vision and strategy are sometimes unclear. It can create problems in a non-Windows environment. It also does not yet provide support for alternative data integration methods.
There are 3 types of debugging Execute Package partially By using breakpoints By using Data Viewers
Using checkpoint we can restart the package from a failure of the task
The configuration provides user interaction to the package, so a user can control the package without editing package from solution explorer.
XML file configuration SQL server DB configuration Parent package variable configuration Environment variable Registry entry.
DTEXEC DTEXECUI DTUTIL
Merge will combine only two sorted dataset values. Union ALL will combine data from multiple data sets
For loop container – we can loop only for a specific number of times For each loop container – Loop will be repeated based on the number of files in the folder and we can traverse to subfolders with this container. Sequence container – to group the certain the tasks How to implement transaction control? By setting the Transaction Option property to Required or Supported
By using SQL Server Agent.
FULL cache – gets the complete reference data set from the database and places in memory and compares it with source data. Partial Cache – Here we can specify how much amount of memory that the reference dataset has to occupy. No Cache – Here data will not be cached. It is not recommended.
Everything is different except both are a product of Microsoft.
Using the MaxConcurrent Executable property of the package. Its default is -1 and it is calculated as a number of processors +2
Flat file SQL Server table XML SQL Profiler
After data has read out of a table with CDC enable, this transform send data that should be deleted inserted and updated down a different path.
Execute an OLEDB command for each row in the data flow.
it is a table which contains details data.
there are two different types of transformations Synchronous transformation’s Asynchronous transformation’s
Using CDC we can Maintains and interacts with the change capture feature form SQL Server.
In Merge join, we can implement left outer join, right outer join and full outer. Whereas in lookup, we can perform inner join only.
An event handler is like a trigger in our SQL databases which will be executed automatically when an action takes place. Onerror Onwarning OnPreexecute OnPostExecute OnVariableValueChanged OnQueryCancled
Control flow is process-oriented and Data flow is information oriented. Data will not be moved from one task to the other task in control flow whereas data will be moved from one transformation to another transformation in the data flow. In control flow, we have Precedence constraints to set the task execution whereas in data flow we have data flow paths.
File system MSDB
Flat file Excel Ole DB source ADO.NET
Logging configuration provides log the details of various events an inbuilt feature.
It is a platform including packages that provide extract, building high-performance data integration solutions transformation and load processing for DWH.
Merge: can take only 2 data sets, it requires the data to be sorted Union: can take multiple inputs and doesn’t need the data to be sorted