Six definite ways to improve efficiency and reduce load times

Sqoop is an Apache Software Foundation tool commonly used in the Big Data world to import and export millions of records between heterogeneous relational databases (RDBMS) and the Hadoop Distributed File System (HDFS). This data transfer can lead to varying load times ranging from a couple of minutes to…

Regression and EDA on personal health data to determine factors contributing to treatment


Linear regression is one of the most important algorithms in the supervised learning category of Machine Learning. It is also one of the simplest and most commonly used models for predictive analysis. Using it, we explore the personal health dataset and predict treatment and insurance costs.

What is Linear Regression?

In the simplest terms, when a relationship…

A Comparative Analytics Study Benchmarking Popular Programming Languages and Execution Engines.


Have you ever wondered which programming languages and execution engines are the quickest or the slowest at processing files? Are you unsure which programming language to code in to solve your business problem efficiently? Well, look no further: here’s your answer.

We take a look…


Clustering Neighborhoods of London and Paris using Machine Learning


A Tale of Two Cities, a novel by Charles Dickens, is set in London and Paris during the French Revolution. Both cities were happening then and still are now. …

Thomas George Thomas

Data Analytics Engineering Graduate Student at Northeastern. Ex Senior Data Engineer & IBM Certified Data Scientist.
