Published on

Big Data Pipeline Cheatsheet for AWS, Azure, and Google Cloud

Authors

Each platform offers a comprehensive suite of services that cover the entire lifecycle. The Cheatsheet is a convenient tool for you in your daily work.

Image

1 - Ingestion: Collecting data from various sources

2 - Data Lake: Storing raw data

3 - Computation: Processing and analyzing data

4 - Data Warehouse: Storing structured data

5 - Presentation: Visualizing and reporting insights

AWS uses services like Kinesis for data streaming, S3 for storage, EMR for processing, RedShift for warehousing, and QuickSight for visualization.

Azure's pipeline includes Event Hubs for ingestion, Data Lake Store for storage, Databricks for processing, Cosmos DB for warehousing, and Power BI for presentation.

GCP offers PubSub for data streaming, Cloud Storage for data lakes, DataProc and DataFlow for processing, BigQuery for warehousing, and Data Studio for visualization.

Author

AiUTOMATING PEOPLE, ABN ASIA was founded by people with deep roots in academia, with work experience in the US, Holland, Hungary, Japan, South Korea, Singapore, and Vietnam. ABN Asia is where academia and technology meet opportunity. With our cutting-edge solutions and competent software development services, we're helping businesses level up and take on the global scene. Our commitment: Faster. Better. More reliable. In most cases: Cheaper as well.

Feel free to reach out to us whenever you require IT services, digital consulting, off-the-shelf software solutions, or if you'd like to send us requests for proposals (RFPs). You can contact us at [email protected]. We're ready to assist you with all your technology needs.

ABNAsia.org

© ABN ASIA

AbnAsia.org Software