
Cloud Computing in Data Science

  • neovijayk
  • Jul 6, 2020
  • 3 min read

From my experience in Data Science projects, Cloud Computing platforms provide virtual machines for computing, storage, etc. that are easy to create, use and delete on demand. This lets data analysts and data scientists focus on developing and deploying the project's models and spend less time on infrastructure maintenance. This is especially important for small data science start-ups with limited resources (technical experts, finance and infrastructure).

What is Cloud Computing? And what are its benefits?

  1. Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user.

  2. The term is generally used to describe data centers available to many users over the Internet.

  3. Large clouds, predominant today, often have functions distributed over multiple locations from central servers.

  4. Cloud computing relies on sharing of resources to achieve coherence and economies of scale.

  5. Cloud computing allows companies to avoid or minimise up-front IT infrastructure costs.

  6. Cloud computing allows enterprises to get their applications up and running faster, with improved manageability and less maintenance, and it enables IT teams to adjust resources more rapidly to meet fluctuating and unpredictable demand.

  7. Cloud providers typically use a “pay-as-you-go” model, which can lead to unexpected operating expenses if administrators are not familiar with cloud-pricing models.

  8. The availability of high-capacity networks, low-cost computers and storage devices, as well as the widespread adoption of hardware virtualisation, service-oriented architecture and autonomic and utility computing, have led to growth in cloud computing.

Source: Wikipedia
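Point 7 is easiest to see with a little arithmetic. The sketch below compares a VM that is stopped after each working day with one left running all month. The hourly rate and working hours are made-up placeholders, not real prices from any provider, and the sketch ignores disks, network traffic and discounts.

```python
# Hypothetical numbers for illustration only; real cloud prices vary by
# provider, region, machine type and discounts.
HOURS_PER_MONTH = 730  # common monthly approximation: 24 * 365 / 12


def monthly_compute_cost(hourly_rate: float, hours_used: float) -> float:
    """Pay-as-you-go: compute is billed only for the hours the VM is running."""
    if hourly_rate < 0 or hours_used < 0:
        raise ValueError("rate and hours must be non-negative")
    return round(hourly_rate * hours_used, 2)


if __name__ == "__main__":
    rate = 2.50  # hypothetical USD per hour for a GPU machine
    stopped_after_use = monthly_compute_cost(rate, hours_used=8 * 22)  # 8 h/day, 22 days
    left_running = monthly_compute_cost(rate, hours_used=HOURS_PER_MONTH)
    print(f"Stopped after use: ${stopped_after_use}")
    print(f"Left running:      ${left_running}")
```

With these made-up inputs it prints $440.0 for the VM stopped after use and $1825.0 for the one left running; the difference is exactly the kind of unexpected expense point 7 warns about.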

In Machine Learning or Deep Learning projects we need high-performance processors or servers while developing the model. But maintaining such servers on premises is not easy and requires resources that many start-ups do not have. The solution is to use a cloud computing platform, which provides high-performance servers whenever we need them; after use, we can stop or delete the virtual machine instances so that we are charged mainly for what we actually use (attached storage such as disks is still billed while a VM is stopped). The maintenance is handled by the cloud provider. There are many cloud computing platforms to choose from, for example Google Cloud, AWS (Amazon Web Services), DigitalOcean and Microsoft Azure.

Fig: Why cloud computing is important in data science: it makes storage and computing platforms accessible on an ad hoc basis.
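Stopping instances after use is what keeps the bill tied to actual usage. Below is a minimal sketch of that housekeeping logic, assuming we track each VM's last activity time ourselves: `VMInstance`, `instances_to_stop` and the one-hour idle threshold are hypothetical, not part of any cloud SDK. The status strings follow Compute Engine, which reports a stopped VM as `TERMINATED`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VMInstance:
    """Hypothetical minimal VM record (not a cloud SDK type)."""
    name: str
    status: str          # e.g. "RUNNING" or "TERMINATED" (stopped)
    last_active: datetime


def instances_to_stop(instances, now, max_idle=timedelta(hours=1)):
    """Return names of running VMs that have been idle longer than max_idle."""
    return [
        vm.name
        for vm in instances
        if vm.status == "RUNNING" and now - vm.last_active > max_idle
    ]


if __name__ == "__main__":
    now = datetime(2020, 7, 6, 18, 0)
    fleet = [
        VMInstance("train-gpu", "RUNNING", now - timedelta(hours=3)),
        VMInstance("notebook", "RUNNING", now - timedelta(minutes=10)),
        VMInstance("old-etl", "TERMINATED", now - timedelta(days=2)),
    ]
    print(instances_to_stop(fleet, now))  # ['train-gpu']
```

The sketch makes no cloud calls itself; on GCP the returned names could then be stopped with `gcloud compute instances stop NAME --zone=ZONE`.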


I have briefly compared some of the top companies that provide cloud-based services here: Comparison of AWS Vs Azure Vs Google Cloud (coming soon)

As a data analyst in a start-up, I had to set up the virtual machine instances and install packages and software (a Data Engineer's job) by myself ;D before using them in Data Science projects. But this gave me the opportunity to learn many cloud-based services (GCP, GCS, BQ, Google Cloud SQL for MySQL): how to enable them, how to set up hardware and software, how to configure networking, how to change firewall rules to allow communication from a remote machine to the server, how to use Google Cloud service APIs from Python programs, and much more. I find all of this very useful when deciding on and setting up the hardware and software for a new Data Science project. Therefore, on this page I share some useful tools, techniques and concepts related to Google Cloud Platform.
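The firewall step mentioned above boils down to a simple check: traffic reaches the VM only if some rule allows its source address range and destination port. The sketch below models that check in plain Python with the standard `ipaddress` module. The rule format is a hypothetical simplification of a GCP ingress rule (source ranges plus allowed TCP ports), not the Compute Engine API, and the addresses are reserved documentation ranges.

```python
import ipaddress


def is_ingress_allowed(source_ip: str, port: int, rules) -> bool:
    """Return True if any rule allows `port` from `source_ip`.

    Each rule is a hypothetical dict:
    {"source_ranges": ["CIDR", ...], "ports": {port, ...}}
    """
    ip = ipaddress.ip_address(source_ip)
    for rule in rules:
        in_range = any(ip in ipaddress.ip_network(cidr) for cidr in rule["source_ranges"])
        if in_range and port in rule["ports"]:
            return True
    # Nothing matched: traffic is dropped, like GCP's implied deny-ingress rule.
    return False


if __name__ == "__main__":
    # Allow SSH (22) and Jupyter's default port (8888) from one office range only.
    rules = [{"source_ranges": ["203.0.113.0/24"], "ports": {22, 8888}}]
    print(is_ingress_allowed("203.0.113.45", 8888, rules))  # True
    print(is_ingress_allowed("198.51.100.7", 8888, rules))  # False: outside range
    print(is_ingress_allowed("203.0.113.45", 3306, rules))  # False: port not allowed
```

Allowing a new remote machine therefore means adding its address (or range) to a rule's source ranges, rather than opening the port to `0.0.0.0/0`.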

Google Cloud:

Google Cloud Platform (GCP) Virtual Machine Instances:

Desktop GUI Screen for an Ubuntu VM instance:

Attach External IP address to the VM instance:

  1. To learn more about static external IP addresses and how to create one, please refer to this article: Attaching Static External IP address to the Ubuntu VM instance on GCP

Google Cloud Storage (GCS): object storage service on Google Cloud
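As an example of using Google Cloud services from Python, here is a minimal sketch of uploading a file to Cloud Storage with the `google-cloud-storage` client library. To keep it testable without credentials, the client is passed in as a parameter; `upload_to_gcs` is a hypothetical helper name, and the bucket and file names are placeholders.

```python
def upload_to_gcs(client, bucket_name: str, source_path: str, blob_name: str) -> str:
    """Upload a local file to Cloud Storage and return its gs:// URI.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    bucket = client.bucket(bucket_name)     # local reference, no API call yet
    blob = bucket.blob(blob_name)           # object path inside the bucket
    blob.upload_from_filename(source_path)  # performs the actual upload
    return f"gs://{bucket_name}/{blob_name}"


# Real usage (requires `pip install google-cloud-storage` and GCP credentials):
# from google.cloud import storage
# uri = upload_to_gcs(storage.Client(), "my-bucket", "data/train.csv", "raw/train.csv")
```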

Google BigQuery (BQ): cloud data warehouse with an in-memory BI Engine and built-in machine learning (you can get insights from data using SQL)
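Likewise, here is a minimal sketch of running SQL in BigQuery from Python with the `google-cloud-bigquery` library, again with the client passed in so it can be tried without credentials. `run_query` is a hypothetical helper; the query reads the public `bigquery-public-data.samples.shakespeare` sample table, and running queries in your own project may incur charges.

```python
SHAKESPEARE_SQL = """
SELECT corpus, SUM(word_count) AS total_words
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus
ORDER BY total_words DESC
LIMIT 5
"""


def run_query(client, sql: str) -> list:
    """Run `sql` and return the result rows as plain dicts.

    `client` is expected to behave like google.cloud.bigquery.Client:
    query() starts a job, and job.result() waits for it and yields rows.
    """
    job = client.query(sql)
    return [dict(row.items()) for row in job.result()]


# Real usage (requires `pip install google-cloud-bigquery` and GCP credentials):
# from google.cloud import bigquery
# for row in run_query(bigquery.Client(), SHAKESPEARE_SQL):
#     print(row["corpus"], row["total_words"])
```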

Some good articles and blogs worth sharing:

In a Data Science project, once we have identified the hardware requirements based on the data, its size and the computing requirements, and have set up the necessary services on the cloud computing platform, next comes data fetching/extraction, which we will look at on the Data Pipeline page.
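Identifying hardware requirements from data size can start with back-of-the-envelope arithmetic. Below is a minimal sketch, assuming a numeric table held in memory at 8 bytes per value with an assumed 2x overhead for intermediate copies. The helper names, machine names and RAM sizes are hypothetical placeholders, not real GCP machine types.

```python
def estimate_memory_gb(n_rows: int, n_cols: int,
                       bytes_per_value: int = 8, overhead: float = 2.0) -> float:
    """Rough RAM needed for a numeric table, in GB (1 GB = 1024**3 bytes)."""
    raw_bytes = n_rows * n_cols * bytes_per_value  # 8 bytes = float64/int64
    return raw_bytes * overhead / 1024**3          # overhead: assumed working copies


def pick_machine(required_gb: float, machines: list[tuple[str, float]]):
    """Return the name of the smallest machine whose RAM covers required_gb, else None."""
    fits = [m for m in machines if m[1] >= required_gb]
    return min(fits, key=lambda m: m[1])[0] if fits else None


if __name__ == "__main__":
    # Hypothetical machine catalogue: (name, RAM in GB)
    catalogue = [("small-8gb", 8), ("medium-16gb", 16), ("large-32gb", 32)]
    need = estimate_memory_gb(50_000_000, 20)
    print(f"Estimated RAM: {need:.1f} GB -> {pick_machine(need, catalogue)}")
```

For 50 million rows and 20 numeric columns the estimate is about 14.9 GB, so the sketch picks the hypothetical 16 GB machine. Real workloads (text columns, model training, GPUs) need more careful profiling, but this gives a starting point before creating the VM.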


© 2023 by my-learnings. Copyright Vijay@my-learnings.com
