Parquet File format - Storage details

Columnar storage has revolutionized big data processing since its inception. Its power is evident from the fact that Google BigQuery, HBase, Amazon Redshift, Azure SQL Data Warehouse and many other systems all use columnar storage. Since you are reading this article, you are probably curious about how columnar storage works internally. Stay with me for the next few minutes, and I'll explain the details of the most popular columnar file format, Parquet. We'll learn these details by actually opening a Parquet file and digging into how things work under the hood at the disk level. We'll also see why Parquet can save cost when it is used as the underlying file format for AWS services such as Athena. When to use Parquet format: just check out this under-30-second conversation between a Data Engineer (male) and a BI Analyst (female). When to us...