Posts

Showing posts from 2025

Understanding Data File Formats: CSV, Avro, Parquet & ORC Explained

When building data pipelines and analytics platforms (Snowflake, Databricks, Athena, Hive), the file format you choose affects performance, cost, and reliability. This guide explains the four common formats (CSV, Avro, Parquet, and ORC), when to use each, and why concepts like schema and streaming matter.

Row-Based vs Columnar Formats (Quick)

Type | How it stores data | Examples | Best for
Row-based | Stores each record (row) consecutively | CSV, Avro | Streaming, single-row operations, event pipelines
Columnar | Stores values of each column together | Parquet, ORC | Analytical queries, compression, large-scale reads

File Format Comparison (Detailed)

Format | What it is | Schema | Serializat...
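The row-based versus columnar distinction above can be sketched in a few lines of plain Python. This is only an in-memory illustration, not how CSV, Avro, Parquet, or ORC files are actually encoded on disk; the sample records and the helper names `to_row_layout` and `to_columnar_layout` are made up for the example.

```python
# Hypothetical sample records, invented for this illustration.
records = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "DE", "amount": 20.5},
    {"id": 3, "country": "US", "amount": 7.25},
]


def to_row_layout(rows):
    """Row-based: the values of each record are kept together, one record after another."""
    return [tuple(r.values()) for r in rows]


def to_columnar_layout(rows):
    """Columnar: all values of one column are kept together."""
    columns = {}
    for r in rows:
        for name, value in r.items():
            columns.setdefault(name, []).append(value)
    return columns


row_layout = to_row_layout(records)
col_layout = to_columnar_layout(records)

# An analytical query such as SUM(amount) touches a single list in the columnar layout...
total_columnar = sum(col_layout["amount"])

# ...but has to walk every full record in the row layout.
total_row = sum(row[2] for row in row_layout)

print(row_layout[0])         # one complete record, handy for single-row operations
print(col_layout["amount"])  # one complete column, handy for aggregates
print(total_columnar, total_row)
```

Both layouts hold the same data; the difference is which access pattern is cheap. Appending one event is a single append in the row layout, while summing one field reads only one list in the columnar layout.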

What is a Stage in Snowflake?

A stage in Snowflake is a temporary or permanent storage location for data files before they are loaded into Snowflake tables or after they are unloaded. Think of it as a data landing zone or buffer area.

Stages can store various file types, such as CSV, JSON, Parquet, Avro, or ORC.

Types of Stages in Snowflake

Snowflake provides three main types of stages:

Stage Type | Category | Description | Example
User Stage | Internal | Automatically created for each user. Accessible only by that user. | @~
Table Stage | Internal | Automatically created for each table. Used to load/unload data specific to that table. | @%my_table
Named Stage | Internal / External | Explicitly created by users. Can point to internal Snowflake storage or external cloud storage....
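As a rough sketch, the stage reference prefixes listed above can be captured in a small Python helper. The function `stage_ref` is hypothetical and not part of any Snowflake library, and the stage name `my_stage` is an assumed placeholder; only `@~` and `@%my_table` come from the table above, while the `@name` form for named stages is standard Snowflake notation.

```python
def stage_ref(kind, name=None):
    """Build a Snowflake stage reference string (hypothetical helper).

    kind: "user", "table", or "named"
    name: table name for a table stage, stage name for a named stage
    """
    if kind == "user":
        # User stage: one per user, always referenced as @~
        return "@~"
    if kind in ("table", "named"):
        if not name:
            raise ValueError(f"a name is required for a {kind} stage")
        # Table stage: @%<table_name>; named stage: @<stage_name>
        return f"@%{name}" if kind == "table" else f"@{name}"
    raise ValueError(f"unknown stage kind: {kind!r}")


print(stage_ref("user"))               # user stage
print(stage_ref("table", "my_table"))  # table stage for my_table
print(stage_ref("named", "my_stage"))  # named stage; my_stage is an assumed name
```

The helper only formats strings; it does not connect to Snowflake or check that the stage exists.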