Data Factory Interview Questions and Answers for Freshers & Experienced

How do you handle this scenario?

You need to make sure your Virtual Machines are able to communicate securely with each other.

Solution - Azure Virtual Network enables Azure resources to communicate with each other, the internet, or on-premises networks securely.

* Users can design and create their own private networks
* It provides users with an isolated and highly secure environment for applications
* All traffic stays within the Azure network

Posted Date:- 2021-10-29 06:17:24

What kind of storage is best suited to handle unstructured data?

Blob Storage is best suited for unstructured data. It provides storage capacity for large amounts of unstructured data and places that data into different access tiers based on how often it is accessed.

* Any type of unstructured data can be stored
* Data integrity is maintained every time an object is changed
* It helps to increase app performance and reduces bandwidth consumption
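
As an illustration, unstructured data can be written to Blob Storage with a few lines of Python. This is a minimal sketch assuming the azure-storage-blob package; the connection string, container name, and file path are placeholders.

```python
# Minimal sketch: upload an unstructured file to Azure Blob Storage.
# The connection string, container name, and file path are placeholders.
from azure.storage.blob import BlobServiceClient

connection_string = "<storage-account-connection-string>"
service = BlobServiceClient.from_connection_string(connection_string)
blob = service.get_blob_client(container="raw-data", blob="images/photo1.jpg")

with open("photo1.jpg", "rb") as data:
    blob.upload_blob(data, overwrite=True)  # any unstructured payload works
```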

Posted Date:- 2021-10-29 06:16:22

How can Azure handle this situation?

A client wants the front end of his/her application to be hosted on Azure, but wants the database to be hosted on-premises.
Solution - The ideal solution in this scenario is an Azure VNet-based Point-to-Site VPN. It is best suited for scenarios where only a limited number of resources need to be connected.

Posted Date:- 2021-10-29 06:15:24

What are the two kinds of Azure Web Service roles?

A cloud service role is a set of managed and load-balanced virtual machines that work together to perform tasks. The two kinds of Azure Web Service roles are:

Web Roles

* It is a cloud service role that is used to run web applications developed in programming languages supported by IIS (Internet Information Services) like ASP.NET, PHP, etc.

* It automatically deploys and hosts applications through the user's IIS

Worker Roles

* It runs applications and other tasks that don't require IIS. It performs supporting background tasks along with web roles.

* It doesn’t use IIS and runs user applications standalone

Posted Date:- 2021-10-29 06:14:01

How has integrating hybrid cloud been useful for Azure?

The hybrid cloud boosts productivity by using Azure and Azure Stack for building and deploying applications for the cloud and on-premises. Integrating the hybrid cloud has been useful for Azure in the following ways:

* It achieves greater efficiency with a combination of Azure services and DevOps processes and tools.

* Users can take advantage of constantly updated Azure services and other Azure Marketplace applications.

* Applications can be deployed regardless of their location, whether in the cloud or on-premises.

* This enables applications to be created at a higher speed.

Posted Date:- 2021-10-29 06:10:40

What are the advantages of the Azure Resource Manager?

Azure Resource Manager enables users to manage their usage of application resources. A few of the advantages of Azure Resource Manager are:

* ARM helps deploy, manage and monitor all the resources for an application, a solution or a group.

* Users can be granted access to resources they require.

* It provides comprehensive billing information for all the resources in the group.

* Provisioning resources is made much easier with the help of templates.

Posted Date:- 2021-10-29 06:09:20

What are the advantages of Scaling in Azure?

Azure performs scaling with the help of a feature known as Autoscaling. Autoscaling helps to deal with changing demands in Cloud Services, Mobile Services, Virtual Machines, and Websites. Below are a few of its advantages:

1. Maximizes application performance
2. Scale up or down based on demand
3. Schedule scaling to particular time periods
4. Highly cost-effective

Posted Date:- 2021-10-29 06:05:01

Which one amongst Microsoft Azure ML Studio and GCP Cloud AutoML is better?

When we compare the two in terms of services, Azure ML Studio has the edge, since it offers Classification, Regression, Anomaly Detection, Clustering, Recommendation, and Ranking features.

GCP Cloud AutoML, on the other hand, offers Clustering, Regression, and Recommendation features. Moreover, Azure ML Studio has a drag-and-drop interface that makes the process easier to carry out.

Posted Date:- 2021-10-29 06:03:14

How Does Microsoft Azure Compare to AWS?

This might be a matter of opinion for you, so answer as you see fit. In general, people say Azure is a better choice because it’s a Microsoft product, making it easier for organizations already using Windows Server, SQL Server, and Exchange to move to the cloud. In addition, because of Microsoft’s deep knowledge of developer tools, Azure offers multiple app deployment options for developers, which makes it stand out against AWS.

Posted Date:- 2021-10-29 06:01:34

How do I access data by using the other 80 dataset types in Data Factory?

* The Mapping Data Flow feature currently allows Azure SQL Database, Azure SQL Data Warehouse, delimited text files from Azure Blob storage or Azure Data Lake Storage Gen2, and Parquet files from Blob storage or Data Lake Storage Gen2 natively for source and sink.

* Use the Copy activity to stage data from any of the other connectors, and then execute a Data Flow activity to transform data after it’s been staged. For example, your pipeline will first copy into Blob storage, and then a Data Flow activity will use a dataset in source to transform that data.
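
A minimal sketch of that staging pattern is shown below as a Python dict mirroring the pipeline JSON; the connector, dataset, and data flow names are hypothetical and only illustrate how the Copy activity feeds the Data Flow activity.

```python
# Sketch only: dataset, linked service, and data flow names are hypothetical.
staged_pipeline = {
    "name": "StageThenTransform",
    "properties": {
        "activities": [
            {
                "name": "CopyToStaging",
                "type": "Copy",
                "typeProperties": {
                    "source": {"type": "SalesforceSource"},   # any of the other connectors
                    "sink": {"type": "DelimitedTextSink"},    # staged as text in Blob storage
                },
                "inputs": [{"referenceName": "SalesforceDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "StagingBlobDataset", "type": "DatasetReference"}],
            },
            {
                "name": "TransformStagedData",
                "type": "ExecuteDataFlow",
                "dependsOn": [{"activity": "CopyToStaging", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "dataFlow": {"referenceName": "CleanseSalesData", "type": "DataFlowReference"}
                },
            },
        ]
    },
}
```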

Posted Date:- 2021-10-29 05:56:49

How does ADF Pipeline execution work?

A pipeline run is an instance of a pipeline execution. For example, you may have a data factory pipeline that copies data from Blob storage to a file share and runs on an Event Grid trigger. Each pipeline run has a unique ID, known as the pipeline run ID; this is a GUID that uniquely identifies each run. You can run a pipeline either with a trigger or manually. For more details, see Pipeline Execution.
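
For example, a manual run can be started and monitored with the azure-mgmt-datafactory Python SDK. This is a minimal sketch; the subscription, resource group, factory, and pipeline names are placeholders.

```python
# Minimal sketch of a manual pipeline run and its run ID.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run_response = adf_client.pipelines.create_run(
    "my-resource-group", "my-data-factory", "CopyBlobToFileShare"
)
print("pipeline run id:", run_response.run_id)  # the GUID that identifies this run

run = adf_client.pipeline_runs.get("my-resource-group", "my-data-factory", run_response.run_id)
print("status:", run.status)  # e.g. InProgress, Succeeded, Failed
```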

Posted Date:- 2021-10-29 05:55:16

Any Data Factory pipeline can be executed using three methods. Mention these methods

* Under Debug mode.

* Manual execution using Trigger now.

* Using an added schedule, tumbling window, or event trigger.

Posted Date:- 2021-10-29 05:53:31

Which Data Factory version do I use to create data flows?

Use the Data Factory V2 version to create data flows.

Posted Date:- 2021-10-29 05:52:04

How do I gracefully handle null values in an activity output?

You can use the @coalesce construct in the expressions to handle the null values gracefully.
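
As a sketch, the expression below sits inside a hypothetical Set Variable activity (written as a Python dict mirroring the activity JSON) and falls back to a literal when the looked-up value is null; the Lookup activity name and fallback value are made up for illustration.

```python
# Illustrative use of @coalesce in an activity expression; names are hypothetical.
set_variable_activity = {
    "name": "SetCustomerEmail",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "customerEmail",
        # falls back to the literal when the looked-up value is null
        "value": "@coalesce(activity('LookupCustomer').output.firstRow.email, 'unknown')",
    },
}
```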

Posted Date:- 2021-10-29 05:51:24

Which Data Factory activities can be used to iterate through all files stored in a specific storage account, making sure that the files smaller than 1KB will be deleted from the source storage account?

* ForEach activity for iteration
* Get Metadata to get the size of all files in the source storage
* If Condition to check the size of the files
* Delete activity to delete all files smaller than 1KB
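
A condensed sketch of how these activities fit together is shown below as Python dicts mirroring the pipeline JSON; the dataset and activity names are hypothetical, and the dataset parameters that pass the current file name into the per-file activities are omitted for brevity.

```python
# Sketch only: dataset and activity names are hypothetical; the dataset
# parameters that pass the current file into the inner activities are omitted.
get_file_list = {
    "name": "GetFileList",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {"referenceName": "SourceFolderDataset", "type": "DatasetReference"},
        "fieldList": ["childItems"],  # lists the files in the source folder
    },
}

for_each_file = {
    "name": "ForEachFile",
    "type": "ForEach",
    "dependsOn": [{"activity": "GetFileList", "dependencyConditions": ["Succeeded"]}],
    "typeProperties": {
        "items": {"value": "@activity('GetFileList').output.childItems", "type": "Expression"},
        "activities": [
            {
                "name": "GetFileSize",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": {"referenceName": "SourceFileDataset", "type": "DatasetReference"},
                    "fieldList": ["size"],
                },
            },
            {
                "name": "DeleteIfSmall",
                "type": "IfCondition",
                "dependsOn": [{"activity": "GetFileSize", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    # 1KB = 1024 bytes
                    "expression": {
                        "value": "@less(activity('GetFileSize').output.size, 1024)",
                        "type": "Expression",
                    },
                    "ifTrueActivities": [
                        {
                            "name": "DeleteFile",
                            "type": "Delete",
                            "typeProperties": {
                                "dataset": {
                                    "referenceName": "SourceFileDataset",
                                    "type": "DatasetReference",
                                }
                            },
                        }
                    ],
                },
            },
        ],
    },
}
```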

Posted Date:- 2021-10-29 05:49:51

Data Factory supports three types of triggers. Mention these types briefly

* The Schedule trigger that is used to execute the ADF pipeline on a wall-clock schedule.

* The Tumbling window trigger that is used to execute the ADF pipeline on a periodic interval, and retains the pipeline state.

* The Event-based trigger that responds to a blob-related event, such as adding or deleting a blob from an Azure storage account.
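
As an example of the first type, a Schedule trigger definition looks roughly like the sketch below (a Python dict mirroring the trigger JSON); the pipeline name, start time, and recurrence values are placeholders.

```python
# Sketch of a Schedule trigger definition; names and times are placeholders.
daily_trigger = {
    "name": "DailyCopyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2021-11-01T06:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {"pipelineReference": {"referenceName": "CopyBlobToFileShare", "type": "PipelineReference"}}
        ],
    },
}
```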

Posted Date:- 2021-10-29 05:47:57

What is the difference between mapping data flow and the Power Query activity (data wrangling)?

Mapping data flows provide a way to transform data at scale without any coding required. You can design a data transformation job in the data flow canvas by constructing a series of transformations. Start with any number of source transformations followed by data transformation steps. Complete your data flow with a sink to land your results in a destination. Mapping data flow is great at mapping and transforming data with both known and unknown schemas in the sinks and sources.

Power Query data wrangling allows you to do agile data preparation and exploration using the Power Query Online mashup editor at scale via Spark execution. With the rise of data lakes, sometimes you just need to explore a data set or create a dataset in the lake; you aren't mapping to a known target.

Posted Date:- 2021-10-29 05:44:52

Data Factory supports two types of compute environments to execute the transform activities. Mention these two types briefly

* On-demand compute environment: a computing environment fully managed by ADF. In this compute type, a cluster is created to execute the transform activity and is removed automatically when the activity completes.

* Bring-your-own compute environment: an existing compute environment (for example, an HDInsight cluster) that is managed by the user and registered in ADF as a linked service for running the transform activities.

Posted Date:- 2021-10-29 05:44:07

What is the difference between the Mapping data flow and Wrangling data flow transformation activities in Data Factory?

Mapping data flow activity is a visually designed data transformation activity that allows us to design a graphical data transformation logic without the need to be an expert developer, and executed as an activity within the ADF pipeline on an ADF fully managed scaled-out Spark cluster.

Wrangling data flow activity is a code-free data preparation activity that integrates with Power Query Online in order to make the Power Query M functions available for data wrangling via Spark execution.

Posted Date:- 2021-10-29 05:43:27

What is Data Factory Integration Runtime?

Integration Runtime is a secure compute infrastructure that is used by Data Factory to provide the data integration capabilities across the different network environments and make sure that these activities will be executed in the closest possible region to the data store.

Posted Date:- 2021-10-29 05:42:21

Why is the mapping data flow preview failing with a gateway timeout?

Try using a larger cluster, and lower the row limit in the debug settings to a smaller value to reduce the size of the debug output.

Posted Date:- 2021-10-29 05:38:08

In ADF, can I calculate a value for a new column from an existing column in a mapping data flow?

You can use the derived column transformation in mapping data flow to create a new column based on the logic you want. When creating a derived column, you can either generate a new column or update an existing one. In the Column textbox, enter the name of the column you are creating. To override an existing column in your schema, use the column dropdown. To build the derived column's expression, click the Enter expression textbox. You can either start typing your expression or open the expression builder to construct your logic.

Posted Date:- 2021-10-29 05:37:03

Is there a way to write attributes in Cosmos DB in the same order as specified in the sink in ADF data flow?

For Cosmos DB, the underlying format of each document is a JSON object, which is an unordered set of name/value pairs, so the order cannot be preserved.

Posted Date:- 2021-10-29 05:36:10

Does the data flow compute engine serve multiple tenants?

Clusters are never shared; we guarantee isolation for each job run in production. In the debug scenario, each user gets their own cluster, and all debug runs initiated by that user go to that cluster.

Posted Date:- 2021-10-29 05:35:10

Is the self-hosted integration runtime available for data flows?

Self-hosted IR is an ADF pipeline construct that you can use with the Copy Activity to acquire or move data to and from on-prem or VM-based data sources and sinks. The virtual machines that you use for a self-hosted IR can also be placed inside of the same VNET as your protected data stores for access to those data stores from ADF. With data flows, you'll achieve these same end-results using the Azure IR with managed VNET instead.

Posted Date:- 2021-10-29 05:33:48

How do I access data by using the other 90 dataset types in Data Factory?

The mapping data flow feature currently allows Azure SQL Database, Azure Synapse Analytics, delimited text files from Azure Blob storage or Azure Data Lake Storage Gen2, and Parquet files from Blob storage or Data Lake Storage Gen2 natively for source and sink.

Use the Copy activity to stage data from any of the other connectors, and then execute a Data Flow activity to transform data after it's been staged. For example, your pipeline will first copy into Blob storage, and then a Data Flow activity will use a dataset in source to transform that data.

Posted Date:- 2021-10-29 05:33:09

Can an activity in a pipeline consume arguments that are passed to a pipeline run?

Yes. Each activity within the pipeline can consume the parameter value that's passed to the pipeline and run with the @parameter construct.
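
In practice the expression takes the form @pipeline().parameters.<name>. The sketch below shows a hypothetical Copy activity (as a Python dict mirroring the activity JSON) passing a pipeline parameter called sourceFolder into a parameterized dataset; all names are made up for illustration.

```python
# Sketch of an activity consuming a pipeline parameter; names are hypothetical.
copy_with_parameter = {
    "name": "CopyFromParameterisedFolder",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "SourceBlobDataset",
            "type": "DatasetReference",
            # the dataset exposes a folderPath parameter; the pipeline parameter
            # sourceFolder is passed through with the @pipeline() expression
            "parameters": {"folderPath": "@pipeline().parameters.sourceFolder"},
        }
    ],
    "outputs": [{"referenceName": "SinkBlobDataset", "type": "DatasetReference"}],
    "typeProperties": {"source": {"type": "BlobSource"}, "sink": {"type": "BlobSink"}},
}
```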

Posted Date:- 2021-10-29 05:32:14

State some of the features & roles not supported by Azure VM?

Following are some of the roles & features not supported by Azure VM:

* Wireless LAN Service
* Network Load Balancing
* Dynamic Host Configuration Protocol
* BitLocker Drive Encryption

Posted Date:- 2021-10-29 05:28:25

What is Cloud Environment?

A cloud environment is the on-demand infrastructure (compute, storage, and networking) and services offered by a cloud provider. Customers can opt for a suitable cloud environment and start running their software applications on that sophisticated infrastructure. Some examples of cloud environment providers are Amazon Web Services, Google Cloud Platform, and Microsoft Azure.

Posted Date:- 2021-10-29 05:25:02

What do you know about Fault Domains?

A Fault Domain is defined as a logical group of underlying hardware that shares a common power source and network switch, similar to a rack in an on-premises data center. Azure automatically distributes the VMs you create in an availability set across these fault domains. This limits the impact of potential physical hardware failures, network outages, or power interruptions.

Posted Date:- 2021-10-29 05:24:11

Define the Availability Set in Azure

The Availability Set is a logical grouping of virtual machines that helps Azure understand the architecture of your application. The recommended number of VMs to create in an availability set is two or more; this provides high availability of applications and meets the Azure SLA. When a single VM is used with Azure Premium Storage, the Azure SLA applies for unplanned maintenance events.

Posted Date:- 2021-10-29 05:22:44

What are the different types of roles available in Azure?

Following are the three different types of roles in Microsoft Azure:

Worker Role
VM Role
Web Role

Worker Role: It supports the web role and is used to execute background processes.

VM Role: It allows users to schedule tasks and various Windows services. Using the VM role, we can also customize the machines on which the worker and web roles are running.

Web Role: A web role is typically used to deploy a website by making use of languages such as PHP, .NET, etc. You can configure and customize it to run web applications.

Posted Date:- 2021-10-29 05:22:00

List some of the web applications that can be deployed with Azure

The web applications that are deployed along with Azure are ASP.NET, WCF, and PHP.

Posted Date:- 2021-10-29 05:20:32

Name the services used in Azure to manage resources?

Following are the four different services used in Azure to manage resources:

* Application Insights
* Azure Portal
* Azure Resource Manager
* Log Analytics

Posted Date:- 2021-10-29 05:20:11

What is Azure Advisor Service?

The Azure Advisor service provides you with a complete overview of your Azure landscape. It helps you identify your system needs and guides you toward cost efficiency. It offers the following features:

* High Availability: It guides you with possible solutions to improve the continuity of critical business applications.
* Security: It helps you detect a wide range of threats in advance and protects you from data breaches.
* Performance: It helps you find ways to speed up your application performance.
* Cost: It provides tips to help minimize spending.

Posted Date:- 2021-10-29 05:19:26

What are the different storage types available in Azure?

Azure offers a suite of storage services which are as follows:

* Azure Blobs
* Azure Queues
* Azure Files
* Azure Tables

Posted Date:- 2021-10-29 05:18:07

What is the procedure to migrate a SQL Server database to Azure SQL?

Migrating a SQL Server database to Azure SQL is a typical task. To execute this process, we can use the SQL Server Management Studio (SSMS) import and export features.

Posted Date:- 2021-10-29 05:17:15

What is Azure Database Migration?

Azure Database Migration is an advanced tool that eliminates the roadblocks associated with traditional systems and creates a streamlined way using which you can simplify, guide, and automate any database migration to Azure. It allows you to migrate data, objects, and schema from a variety of sources to the cloud.

Posted Date:- 2021-10-29 05:16:54

How can you use SDKs in a Data Factory?

If you are a well-experienced candidate and wish to develop a programmatic interface, Data Factory provides a rich set of software development kits (SDKs) that let you author, manage, or monitor pipelines using languages such as .NET and Python, as well as PowerShell and the REST API.
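
As a minimal sketch of the Python SDK, the snippet below authenticates with azure-identity, creates (or updates) a factory, and lists its pipelines; the subscription, resource group, and factory names are placeholders.

```python
# Minimal authoring sketch with the azure-mgmt-datafactory Python SDK.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# create (or update) a data factory, then list its pipelines
adf_client.factories.create_or_update(
    "my-resource-group", "my-data-factory", Factory(location="eastus")
)
for pipeline in adf_client.pipelines.list_by_factory("my-resource-group", "my-data-factory"):
    print(pipeline.name)
```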

Posted Date:- 2021-10-29 05:16:29

How many levels of security do we have in ADLS Gen2 and what are they?

We have two levels of security in ADLS Gen2, and they are as follows:

* Role-Based Access Control (RBAC)
* Access Control Lists (ACLs)

Role-Based Access Control (RBAC): RBAC comes with built-in roles such as Contributor, Reader, Owner, or custom roles. There are two typical reasons for assigning RBAC: one is to allow the use of built-in data explorer tools, and the other is to specify who can manage the service.

Access Control Lists (ACLs): This security level defines which data objects a user is allowed to read, write, or execute within the directory structure. ACLs are POSIX-compliant, so they will be familiar to those with a Linux or Unix background.
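
For illustration, the sketch below sets a POSIX-style ACL on an ADLS Gen2 directory with the azure-storage-file-datalake package; the account, filesystem, directory, and object ID values are placeholders, and the exact permissions depend on your own design.

```python
# Sketch: grant a service principal read/execute on one directory via an ACL.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
directory = service.get_file_system_client("raw").get_directory_client("sales/2021")

# rwx for the owner, r-x for the named principal, no access for others
directory.set_access_control(acl="user::rwx,group::r-x,other::---,user:<object-id>:r-x")
```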

Posted Date:- 2021-10-29 05:15:34

Is it possible to define default values for pipeline parameters?

Yes, it is possible. You can define the default values for the parameters in the pipelines.

Posted Date:- 2021-10-29 05:14:32

Is it possible to pass parameters to a pipeline run?

Yes, we can pass parameters to a pipeline run. Parameters are top-level and first-class concepts in Data Factory. At the pipeline level, you can define the parameters and pass arguments as you execute the pipeline.
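
A minimal sketch of passing arguments with the Python SDK is shown below; the parameter name and value are placeholders, and any parameter you omit falls back to the default value defined on the pipeline (see the previous question).

```python
# Sketch: pass arguments when triggering a run; names and values are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run_response = adf_client.pipelines.create_run(
    "my-resource-group",
    "my-data-factory",
    "CopyFromParameterisedFolder",
    parameters={"sourceFolder": "landing/2021/10/29"},
)
print("run id:", run_response.run_id)
```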

Posted Date:- 2021-10-29 05:13:57

What are the top-level concepts associated with Azure ADF?

Below mentioned are the top-level concepts in Azure Data Factory:

* Pipeline: A Pipeline is defined as a logical group of activities that execute a task together.

* Activities: Activities are the individual steps that take place within a pipeline, such as copying data between data stores or querying a dataset.

* Datasets: A dataset represents the data structure within a data store and points to the data you want to use as an input or output, held in a predefined way.

* Linked services: A linked service is the connection information that Data Factory needs to connect to external resources.

Posted Date:- 2021-10-29 05:00:28

What is the procedure to set up a Pipeline in ADF?

Following is the simple procedure to set up a pipeline:

* Create linked services for the source and destination data stores, and define the datasets that point to the data.

* Add the activities (for example, a Copy activity or a Data Flow activity) that the pipeline should execute.

* You can make use of tumbling window triggers or schedule triggers to schedule the pipeline.

* The Schedule trigger uses a wall-clock calendar schedule to run pipelines periodically.

Posted Date:- 2021-10-29 04:59:27

What is a pipeline in an Azure Data Factory?

A Pipeline is defined as a logical group of activities that execute a task together. It helps you to manage all the tasks as a group instead of each task separately. You can develop and deploy a pipeline to accomplish a bunch of tasks.

Posted Date:- 2021-10-29 04:58:48

What is Azure Data Lake?

Azure Data Lake is an advanced mechanism that simplifies data storage and processing tasks for developers, analysts, and data scientists. It also supports data processing and analytics across multiple languages and platforms. It eliminates the roadblocks associated with data storage and makes it easier to perform batch, stream, and interactive analytics. Azure Data Lake comes with features that solve the challenges of scalability and productivity and meet ever-growing business needs.

Posted Date:- 2021-10-29 04:58:30

Define Blob Storage in Azure?

Azure Blob Storage is a powerful feature from Azure that helps you build data lakes for your analytics needs and provides storage for designing and building advanced cloud-native and mobile applications. It offers high flexibility and easy scaling for high computational needs and machine learning workloads. Using Azure Blob Storage, you can store application data privately or make the data available to the general public.

Posted Date:- 2021-10-29 04:57:55

Are there any limitations on the number of integration runtimes in Azure Data Factory?

There is no limit on the number of integration runtime instances you can have in a data factory. There is, however, a limit on the number of VM cores that the integration runtime can use per subscription for SSIS package execution.

Posted Date:- 2021-10-29 04:57:35

Explain each Integration Runtime types in detail

Following are the 3 types of Integration Runtime types we have in ADF:

Azure Integration Runtime: It performs data copy tasks between cloud data stores and dispatches activities to a wide range of computing services, such as SQL Server or Azure HDInsight, where the data transformation happens.

Self-Hosted Integration Runtime: The self-hosted runtime copies data between a data store in a private network and a cloud data store, and it dispatches transform activities against compute resources in an on-premises network or an Azure virtual network. You need a virtual machine or an on-premises machine inside the private network in order to install the self-hosted integration runtime.

Azure-SSIS Integration Runtime: It allows the execution of SSIS packages in a fully managed Azure compute environment. If you wish to lift and shift an existing SQL Server Integration Services workload, you can natively execute SSIS packages by creating an Azure-SSIS IR.
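
As an illustrative sketch (assuming the azure-mgmt-datafactory package), a self-hosted integration runtime can be registered from the Python SDK and its authentication keys retrieved for the on-premises installer; the resource names are placeholders.

```python
# Sketch: register a self-hosted IR and fetch the key the installer asks for.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

adf_client.integration_runtimes.create_or_update(
    "my-resource-group",
    "my-data-factory",
    "OnPremIR",
    IntegrationRuntimeResource(properties=SelfHostedIntegrationRuntime(description="on-prem IR")),
)

keys = adf_client.integration_runtimes.list_auth_keys(
    "my-resource-group", "my-data-factory", "OnPremIR"
)
print(keys.auth_key1)  # used when installing the runtime on the on-premises machine
```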

Posted Date:- 2021-10-29 04:57:12

What is Integration runtime in Azure Data Factory?

The Integration Runtime (IR) is defined as a computational infrastructure utilized by Azure Data Factory and supports various data integration capabilities across multiple network environments. Azure Data Factory supports 3 types of Integration Runtime (IR), which are Azure, Self-hosted, and Azure-SSIS.

Posted Date:- 2021-10-29 04:56:45

Why do we need Azure Data Factory?

Following are the reasons for using this ADF:

* To move the huge data sets to the cloud.

* To channel data into the cloud, remove unnecessary data, and store it in the desired format.

* To eliminate the issues associated with data transformation and to automate the data flow process.

* To make the entire data orchestration process more manageable and well organized.

Posted Date:- 2021-10-29 04:56:19

What is Azure Data Factory?

Azure Data Factory is an advanced, cloud-based, data-integration ETL tool that streamlines and automates the data extraction & transformation process. This tool simplifies the process to create data-driven workflows that help you to transfer the data between on-premises and cloud data stores easily. Using Data Flows in the Data factory, we can process and transform data.

Azure Data Factory is a highly flexible tool and supports multiple external computational engines for hand-coded data transformations by deploying compute services such as Azure HDInsight, Azure Databricks, and SQL Server Integration Services. You can use this Azure Data Factory either with Azure-based cloud service or on self-hosted compute environments such as SQL Server, SSIS, or Oracle.

Posted Date:- 2021-10-29 04:55:06
