...

SQL for Marketers: Dominate Data Analytics, Data Science, and Big Data

This is an announcement, along with free and discount coupons, for my new course, SQL for Marketers: Dominate data analytics, data science, and big...

...

How to run distributed machine learning jobs using Apache Spark and EC2 (and Python)

This is the age of big data. Sometimes scikit-learn doesn’t cut it. In order to make your operations and data-driven decisions scalable -...

...

Automation: For loops in bash (for loops on the command line)

If you have to run a script that processes the data file for a particular day, e.g. your file is on Hadoop with the date in the path,...
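
As a rough illustration of the idea in this teaser, here is a minimal sketch of a bash for loop over dates. The base path `/data/logs`, the `part-00000` file name, and the helper name `paths_for_days` are all hypothetical, made up for this example; the loop only prints the path it would process, so you would swap the `echo` for your real command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print one input path per day.
# $1 is the base directory; the remaining arguments are dates.
paths_for_days() {
  base_path="$1"
  shift
  for day in "$@"; do
    # Assumed layout: <base>/<date>/part-00000 (illustrative only).
    # Replace echo with the real per-day command, e.g. a processing script.
    echo "${base_path}/${day}/part-00000"
  done
}

# Example run over three days.
paths_for_days /data/logs 2015-01-01 2015-01-02 2015-01-03
```

The same loop works as a one-liner on the command line: `for day in 2015-01-01 2015-01-02; do echo "/data/logs/${day}/part-00000"; done`.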

...

Pig Error -> ERROR 1070: Could not resolve count using imports

Or: ERROR 1070: Could not resolve sum using imports. COUNT() and SUM() are case sensitive; you need to capitalize...

...

How to kill a hadoop job

hadoop job -list: list running Hadoop jobs and their job IDs. hadoop job -kill...