SQL for Marketers: Dominate Data Analytics, Data Science, and Big Data
This is an announcement, along with free and discount coupons, for my new course, SQL for Marketers: Dominate data analytics, data science, and big...
How to run distributed machine learning jobs using Apache Spark and EC2 (and Python)
This is the age of big data. Sometimes scikit-learn doesn’t cut it. In order to make your operations and data-driven decisions scalable -...
Automation: For loops in bash (for loops on the command line)
If you have to run a script that processes the data in a particular file for a particular day, e.g. your file is on Hadoop with the date in the path,...
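As a rough sketch of the idea, here is a for loop that runs one command per day. The function name `process_day`, the `/data/logs/` path layout, and the dates are all hypothetical placeholders, not taken from the post; swap in your own script and paths.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real per-day processing script.
# It only prints the date-stamped path it would work on.
process_day() {
  local day="$1"
  echo "processing /data/logs/${day}"
}

# Run the command once for each date in the list.
for day in 2015-01-01 2015-01-02 2015-01-03; do
  process_day "$day"
done
```

The same loop can be typed on one line at the prompt by separating the parts with semicolons: `for day in 2015-01-01 2015-01-02; do process_day "$day"; done`.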
Pig Error -> ERROR 1070: Could not resolve count using imports
Or: ERROR 1070: Could not resolve sum using imports. COUNT() and SUM() are case-sensitive, so you need to capitalize...
How to kill a hadoop job
hadoop job -list: List running Hadoop jobs and their job IDs. hadoop job -kill...