What Is ETL?
If you’ve ever discussed data warehousing, you’ve probably heard the term “ETL.”
ETL stands for extract, transform, and load. The term is an acronym for the actions an ETL tool performs on a given set of data in order to accomplish a specific business goal.
- Extract: The ETL tool reads data directly from wherever it lives. This is the first step in processing the data, and it requires the tool to know how the data is stored. With that knowledge, the tool can issue the appropriate queries to read the data and determine whether any of it has changed since the last extract.
- Transform: Here’s where the ETL tool does something to the data it just extracted. Maybe it changes information inside some table cells. Or perhaps it merges the data from one source with another source. In some cases, the tool might add a new entry in the same format as all the other data.
- Load: Now that the data has been transformed, the ETL tool loads it into the destination system. More often than not, this means storing it in a data warehouse or data lake, and the tool will optimize the load to make storage as efficient as possible. Destination systems are designed for analytical workloads, and bulk or parallel loads can reduce the total time required to load data.
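The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular ETL tool works: it uses two in-memory SQLite databases from Python's standard library as a stand-in source and warehouse, and the table names (`products`, `dim_product`), the columns, and the sample rows are all made up for the example. The `since` argument is a simple "watermark" that lets the extract step pick up only rows changed after the last run.

```python
import sqlite3


def extract(source, since):
    """Extract: read only rows changed after the last extract (the watermark)."""
    cur = source.execute(
        "SELECT id, name, price_cents, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY id",
        (since,),
    )
    return cur.fetchall()


def transform(rows):
    """Transform: tidy up product names and convert cents to dollars."""
    return [
        (id_, name.strip().title(), price_cents / 100)
        for id_, name, price_cents, _updated_at in rows
    ]


def load(warehouse, rows):
    """Load: bulk-insert the transformed rows in a single transaction."""
    with warehouse:  # commits on success, rolls back on error
        warehouse.executemany(
            "INSERT OR REPLACE INTO dim_product (id, name, price_usd) "
            "VALUES (?, ?, ?)",
            rows,
        )


def run_demo():
    # Hypothetical source system with three products.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE products "
        "(id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER, updated_at TEXT)"
    )
    source.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [
            (1, "  baseball cap ", 1999, "2024-01-01"),
            (2, "wool scarf", 2500, "2024-02-15"),
            (3, "GIFT bag", 350, "2024-03-10"),
        ],
    )

    # Hypothetical destination standing in for a data warehouse.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT, price_usd REAL)"
    )

    # Only rows updated after 2024-02-01 are extracted, transformed, and loaded.
    load(warehouse, transform(extract(source, since="2024-02-01")))
    return warehouse.execute(
        "SELECT id, name, price_usd FROM dim_product ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

Keeping each step in its own function mirrors the acronym: the extract step knows only about the source, the transform step knows only about the rows, and the load step knows only about the destination. A real ETL tool would add scheduling, error handling, and far more efficient bulk loading than this sketch attempts.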
To better understand how ETL tools work, imagine working at a large department store during the holidays; your job is to take merchandise from under the counter, wrap it in green and red wrapping paper, and put it into a customer’s shopping bag.
First, you extract the merchandise from its original source: a shelf below the checkout counter. Let’s say it’s a baseball cap. Then you transform the baseball cap by wrapping it in sparkly paper covered with a candy cane pattern. After that, you load the wrapped present into a bag that the customer takes home.
ETL tools do the same thing with data: they extract it from its source, transform it, and load it into its destination.
Thank you for reading to the end!