Published in The Startup · Pinned · Member-only
Performance Tuning Apache Sqoop
Six definite ways to improve efficiency and reduce load times. Sqoop is a tool offered by the Apache foundation that is used commonly in the Big Data world to import-export millions of records between heterogeneous relational databases (RDBMS) and Hadoop Distributed File System (HDFS). This data transfer can lead to varying load times ranging from a couple of minutes to…
Big Data · 5 min read
Published in Analytics Vidhya · Mar 21, 2021
Predicting Patient Treatment Costs Using Machine Learning
Regression and EDA on personal health data to determine factors contributing to treatment. Introduction: Linear regression is one of the most important algorithms in the supervised learning category of Machine Learning. It is also one of the simplest and most commonly used models for predictive analysis. Using it, we explore the personal health dataset and predict treatment and insurance costs. What is a Linear Regression?
Machine Learning · 6 min read
Published in Geek Culture · Feb 6, 2021
File Processing in Big Data Systems: Which is Quicker? Which is better?
A Comparative Analytics Study Benchmarking Popular Programming Languages and Execution Engines. Introduction: Have you ever wondered which programming languages and execution engines are the quickest or the slowest at processing files? Are you in a dilemma as to which programming language you should code in to solve your business problem efficiently? Well, look no further, here's your answer. We take a look…
Data Science · 6 min read
Published in Analytics Vidhya · Aug 19, 2020
A Tale of Two Cities
Clustering Neighborhoods of London and Paris using Machine Learning. Introduction: A Tale of Two Cities, a novel written by Charles Dickens, is set in London and Paris and takes place during the French Revolution. Both cities were happening then and are happening now. …
Data Science · 13 min read