## Posts

Showing posts from 2020

I wanted to use the command line to download some course material on RL that the author shared via Google Drive. I got a lot of it with wget, but downloading a whole Google Drive folder was a challenge. I looked it up on Stack Overflow (SO), which gave me a hint but no solution. I installed gdown using pip and then ran: `gdown --folder --continue https://drive.google.com/drive/folders/1V9jAShWpccLvByv5S1DuOzo6GVvzd4LV`. If the folder holds more than 50 files, you need to add `--remaining-ok`, and gdown will only get the first 50. In that case it's best to download the folder through the Google Drive web UI and decompress it locally. Decompressing from the command line produced Unicode-related errors, but the macOS UI decompressed it without a glitch.
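One common cause of Unicode errors when unzipping is an archive that stores UTF-8 file names without setting the zip format's UTF-8 flag, so tools fall back to the legacy cp437 code page. This is only an assumption about what went wrong here. If that was the cause, Python's standard `zipfile` module offers a way to extract without relying on the command-line tool. The sketch below is minimal, and the helper names `repaired_name` and `extract_all` are my own, not part of any library:

```python
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general-purpose bit 11 of a zip entry: "file name is UTF-8"


def repaired_name(info: zipfile.ZipInfo) -> str:
    """Return the entry name, re-decoding it as UTF-8 when the archive stored
    UTF-8 bytes without the UTF-8 flag (Python then decodes them as cp437)."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already decoded correctly
    try:
        return info.filename.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return info.filename  # not UTF-8 after all; keep the name as-is


def extract_all(zip_path, dest) -> list[str]:
    """Extract every entry of zip_path into dest using repaired names."""
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            info.filename = repaired_name(info)
            # extract() still guards against absolute paths and ".." components
            zf.extract(info, Path(dest))
            extracted.append(info.filename)
    return extracted
```

Usage would look like `extract_all("course.zip", "course")`, where both paths are placeholders.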

### Random Thoughts on Linear Regressions

TL;DR: Regression analysis is one of the oldest and most powerful tools in a data scientist's toolbox. Under ideal conditions, multiple linear regression would be the best and only tool a data scientist would want to use... In reality, you would use a modern nonparametric variant or a different algorithm. Still, the main ideas I discuss here will pop up in many other models and algorithms. This post is my brain dump on regression; I'll update it as time allows to cover the many aspects of this technique.
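Under ideal conditions, fitting a multiple linear regression comes down to ordinary least squares (OLS), which picks the coefficients that minimize the sum of squared residuals. Here is a minimal NumPy sketch on synthetic data. The data, the true coefficients 1, 2 and -3, and the helper name `fit_ols` are illustrative choices of mine:

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares for y ≈ b0 + b1*x1 + ... + bp*xp.

    X is an (n, p) array of features and y an (n,) array of targets.
    Returns the coefficients [b0, b1, ..., bp], intercept first.
    """
    design = np.column_stack([np.ones(len(X)), X])  # prepend a column of 1s for the intercept
    # lstsq minimizes ||design @ coef - y||^2 without explicitly inverting X^T X
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic data that meets the "ideal conditions": a truly linear
# relationship plus independent Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

print(np.round(fit_ols(X, y), 2))  # close to [1, 2, -3]
```

On real data the conditions rarely hold exactly, and that gap is where the rest of the post's topics come in.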

### SQL - Selection vs. Projection

Selection and projection are two high-level operations that take place when an SQL query is executed. Selection chooses some records (rows) from a table and leaves the others out, e.g. the rows having name = 'oren'. Projection chooses some columns from each record and leaves the others out, e.g. only name. So the SELECT keyword performs projection, while the WHERE keyword performs selection. Clearly, using the keyword SELECT for projection (choosing columns) rather than for selection (choosing rows) is an unfortunate flaw in the design of SQL, but this naming is too well established to be fixed.
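A small sketch with Python's built-in `sqlite3` module makes the two operations concrete. The `people` table and its rows are hypothetical, and the `ORDER BY id` clauses are there only to make the row order deterministic:

```python
import sqlite3

# Hypothetical toy table, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO people (name, team) VALUES (?, ?)",
    [("oren", "data"), ("dana", "web"), ("oren", "ops")],
)

# Projection only: SELECT keeps one column, and every row survives.
projection = conn.execute("SELECT name FROM people ORDER BY id").fetchall()

# Selection only: WHERE keeps the matching rows, and every column survives.
selection = conn.execute("SELECT * FROM people WHERE name = 'oren' ORDER BY id").fetchall()

# Both at once: project the team column of the selected rows.
both = conn.execute("SELECT team FROM people WHERE name = 'oren' ORDER BY id").fetchall()

print(projection)  # [('oren',), ('dana',), ('oren',)]
print(selection)   # [(1, 'oren', 'data'), (3, 'oren', 'ops')]
print(both)        # [('data',), ('ops',)]
```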

### Polyglot Data Science - GraalVM installation

GraalVM enables polyglot data science, for example switching between R and Python within a single Jupyter kernel. "GraalVM is a universal virtual machine for running applications written in JavaScript, Python, Ruby, R, JVM-based languages like Java, Scala, Groovy, Kotlin, Clojure, and LLVM-based languages such as C and C++."

### Avoid cross site scripting errors with a Jupyter local runtime

The trick is to use `--NotebookApp.allow_origin` and `--no-browser`, then take the token printed on the command line when connecting from Google Colab: `jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=9090 --no-browser`. References: the docs.
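If you start this runtime often, the same three options can live in a Jupyter config file instead of on the command line. This sketch assumes the classic Notebook server, the one whose `NotebookApp` options appear in the command. `jupyter notebook --generate-config` creates the file, typically at `~/.jupyter/jupyter_notebook_config.py`. Newer Notebook 7 releases run on Jupyter Server, where I believe the equivalent section is `ServerApp`, so check your version:

```python
# jupyter_notebook_config.py (classic Notebook; location assumed as above)
c = get_config()  # noqa: F821  get_config is injected by Jupyter when it loads this file

# Accept requests coming from the Colab front end (same as --NotebookApp.allow_origin).
c.NotebookApp.allow_origin = "https://colab.research.google.com"
# Same port as the command-line example (same as --port=9090).
c.NotebookApp.port = 9090
# Don't open a local browser tab (same as --no-browser).
c.NotebookApp.open_browser = False
```

With this file in place, a plain `jupyter notebook` should start with those settings, and the token still appears in the terminal output.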

### SQL Dojo

TL;DR: Imagine it's just before your DS interview: you are Neo, your coach is Morpheus, and you practice SQL in rapidly changing schemas. Here is a little project I thought up. Despite having created a number of excellent SQL-based projects, I tend to get rusty in SQL because I don't use it on a regular basis. I decided it might be worthwhile to set up a virtual space to practice, hence the dojo. The dojo lets a student practice analytical SQL, primarily the kinds of queries analysts use. Ultimately I'd like to use it in an agile manner as an LMS with a minimal UI. This would require creating a story for each query and a test that checks the query returns a good answer. To make things interesting, the tasks should also be related, proceed from easy to more challenging, and cover a number of techniques such as filtering, aggregation, and subqueries. Initially, however, I want to get things up and running quickly and to collect questions and answers that reflect how to create views
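The "story plus test" idea can be prototyped in a few lines with Python's built-in `sqlite3` module. This is a minimal sketch built on my own assumptions: the `orders` table, the exercise, and the `check` helper are hypothetical. A student's answer counts as good when it returns exactly the same rows, in the same order, as a reference query:

```python
import sqlite3

# Hypothetical practice schema and data; a real dojo would swap schemas often.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders (customer, amount) VALUES
  ('ann', 10.0), ('ann', 15.0), ('bob', 7.5), ('cy', 30.0);
"""

# One "story": a question plus a reference query whose result defines a good answer.
EXERCISE = {
    "story": "Total spend per customer, largest first.",
    "reference": "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY 2 DESC",
}


def check(student_sql: str, exercise: dict, schema: str = SCHEMA) -> bool:
    """Run the reference and the student query on a fresh in-memory database
    and report whether they return identical rows in identical order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        expected = conn.execute(exercise["reference"]).fetchall()
        try:
            got = conn.execute(student_sql).fetchall()
        except sqlite3.Error:
            return False  # a query that does not run cannot pass
        return got == expected
    finally:
        conn.close()


print(check("SELECT customer, SUM(amount) AS total FROM orders "
            "GROUP BY customer ORDER BY total DESC", EXERCISE))  # True
print(check("SELECT customer, amount FROM orders", EXERCISE))   # False
```

Comparing against a reference query instead of a hard-coded answer keeps each exercise valid when the practice data changes, which fits the idea of rapidly changing schemas.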