Big Data Frameworks
- neovijayk
- Jul 6, 2020
Big Data
Difference between structured, unstructured and semi-structured data (coming soon)
What is Big Data?
What is the Internet of Things?
Hadoop Framework:
Distributed systems, parallel computing.
About the Hadoop framework. Hadoop 1 vs. Hadoop 2. Good sources to get started with Hadoop. (coming soon)
What is the Hadoop Distributed File System (HDFS)?
Hive, Pig
Spark Framework:
About the Spark framework. Good sources and courses to get started with Spark. (coming soon)
Installing Spark on Google Cloud Platform and using it from a Jupyter notebook in Anaconda

Cheatsheet for PySpark basics
Cloudera and Hortonworks:
Introduction to Cloudera and Hortonworks. (coming soon)
Google BigQuery:
About Google BigQuery (BQ). Steps to create a dataset and a table in Google BQ
Basic BQ queries, and table and dataset creation using Python. (coming soon)
Exporting data from BQ to Google Cloud Storage using the Google API in Python. (coming soon)
Good sources and courses to get started with BQ. (coming soon)
NoSQL:
HBase, MongoDB (coming soon)
Data Ingestion
Sqoop
Introduction to Sqoop (coming soon)
Simple example implementation using Sqoop (coming soon)
Talend Open Studio
Example Data ingestion using Talend (coming soon)
Apache Kafka
Some interesting articles I want to share:
Massively parallel computations using Dataproc (Dataproc is Google Cloud's managed Apache Hadoop service).