SQL Dojo

January 15, 2020

TLDR:
Imagine it is just before your DS interview: you are Neo, your coach is Morpheus, and you will practice SQL on rapidly changing schemas.

Now here is a little project I thought up:

Despite the number of excellent SQL-based projects I have created, I tend to get rusty in SQL because I don't use it on a regular basis. I decided it might be worthwhile to set up a virtual space to practice, hence the dojo.

The dojo lets a student practice analytical SQL, primarily the queries analysts use.
Ultimately I'd like to use it in an agile manner as an LMS with a minimal UI. This would require creating a story for each query and a test that the query returns a good answer. Also, to make things interesting, the tasks should be related, proceed from easy to more challenging, and cover a number of techniques such as filtering, aggregation, and subqueries.
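A minimal sketch of what "a story plus a test" could look like, assuming an in-memory SQLite database stands in for the real server. The `orders` table, the three stories, and the helper names `fresh_db`, `run`, and `check` are all hypothetical; the idea is just to compare the student's result set with a reference query's result set.

```python
import sqlite3

# Hypothetical toy schema; the post does not name its practice databases.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders (customer, amount) VALUES
    ('ann', 20.0), ('bob', 35.0), ('ann', 50.0), ('cho', 5.0);
"""

# Each story: (technique, task text, reference query), ordered easy to hard.
STORIES = [
    ("filtering", "List the ids of orders above 30.",
     "SELECT id FROM orders WHERE amount > 30"),
    ("aggregation", "Show the total spend per customer.",
     "SELECT customer, SUM(amount) FROM orders GROUP BY customer"),
    ("subqueries", "Find customers whose total spend beats the average order.",
     "SELECT customer FROM orders GROUP BY customer "
     "HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)"),
]


def fresh_db():
    """Build a new in-memory database so every query starts from the same state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def run(sql):
    """Run one query on a fresh database and return its rows in a stable order."""
    conn = fresh_db()
    try:
        # Sorting by repr makes the comparison independent of row order.
        return sorted(conn.execute(sql).fetchall(), key=repr)
    finally:
        conn.close()


def check(student_sql, reference_sql):
    """Return True when the student's query yields the same rows as the reference."""
    try:
        return run(student_sql) == run(reference_sql)
    except (sqlite3.Error, sqlite3.Warning):
        # Invalid SQL counts as a wrong answer rather than crashing the dojo.
        return False


if __name__ == "__main__":
    technique, task, reference = STORIES[0]
    print(technique, "-", task)
    print(check("SELECT id FROM orders WHERE NOT amount <= 30", reference))
```

Comparing result sets rather than query text lets a student pass with any equivalent query. Queries where row order matters (ORDER BY tasks) would need a stricter comparison than this sketch does.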

However, initially I want to get things up and running quickly and to collect questions and answers that show how to create views on a small number of databases taken from courses or books. The system can also be used to see how well things work on different DBMSs, with the goal of doing things in a portable fashion.

I thought I might share some specifics. The POC features should be:

Run server in docker - easy to install/restart/migrate (done)
Agile access - e.g. using Visual Studio Code + plugin. (done)
Rich clients - MySQL Workbench (done)
SQuirreL SQL - supports more RDBMSs. (done)
Access from Jupyter (done - but less agile)

Beyond the POC

Migrate db to AWS (more & bigger databases).
Create a web interface to

Switch RDBMS.
Log in.
Enter and run queries.
Show the output log and the query output.
Store query history.
Keep score.
Indicate progress in units.
Feedback and discussion.
Allow users to add stories and queries.
support non-sql dbs as well like

Develop small learning units to practice techniques.

[OK] basics
[OK] filtering
[OK] aggregation
[OK] subqueries
[] cleaning data & SQL wrangling
[] OLAP
[] design and ddl
[OK] CRUD + stored procedures python.
[] CRUD + stored procedures R.
[] CRUD + stored procedures Java.
[] transaction

[] create queries for a bi dashboard.
[] create queries for a marketing automation project.

Migrate queries to database

Show schema for the database.
Make things secure.
Isolated user.
reset DB.
Use serverless backends too

AWS Athena.
Google BigQuery.

Use NoSQL databases - MongoDB, Neo4j.
Connect to a dedicated environment like MySQL Workbench.
Connect to a BI environment such as Tableau or Power BI.
Use a freemium-hosted database like BigQuery.

First snag:

Accessing MySQL 8.0 and later requires a new authentication protocol. I had to either re-enable the old one with some obscure command, so that user + password connections work again, or switch to the mysql.connector.connect connector instead.
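A hedged sketch of the second workaround, connecting through `mysql.connector.connect`. The helper names `connection_kwargs` and `connect`, the credentials, and the database name are hypothetical. It assumes the snag is MySQL 8.0's switch of the default authentication plugin to `caching_sha2_password`; the post does not record the exact command used, so the server-side statement in the comment is only a likely candidate, not a confirmed fix.

```python
def connection_kwargs(user, password, database,
                      host="127.0.0.1", port=3306, legacy_auth=False):
    """Build keyword arguments for mysql.connector.connect.

    legacy_auth=True asks the client to use the pre-8.0 plugin. That only
    helps if the server account was itself created or altered to use it,
    for example with something like (assumption, not taken from the post):
        ALTER USER 'dojo'@'%' IDENTIFIED WITH mysql_native_password BY '...';
    """
    kwargs = {
        "host": host,
        "port": port,
        "user": user,
        "password": password,
        "database": database,
    }
    if legacy_auth:
        kwargs["auth_plugin"] = "mysql_native_password"
    return kwargs


def connect(**kwargs):
    """Open a connection; needs the third-party mysql-connector-python package."""
    import mysql.connector  # imported lazily so the helper above works without it
    return mysql.connector.connect(**kwargs)


if __name__ == "__main__":
    # Hypothetical credentials for a local Docker container.
    params = connection_kwargs("dojo", "secret", "dojo")
    print({k: v for k, v in params.items() if k != "password"})
    # conn = connect(**params)  # uncomment once a MySQL server is running
```

Keeping the argument building separate from the actual connection makes it easy to reuse the same settings from a notebook or a script.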

TODO: find this, snag it, and record.
TODO: add this hack to the MySQL docker image.
TODO: automate the docker image to run a script to create and load data from a folder.
TODO: add a docker image for Postgres with equivalent capabilities.
TODO: put the docker images @ AWS
TODO: Get a docker image with the MySQL sample database, which is used in many tutorials.
TODO: migrate the project to Trello.

Updates:

I installed SQuirreL SQL to access multiple DBMSs via a rich client.
I installed GraalVM to do polyglot data science in a notebook.
I created a Jupyter Notebook to access the MySQL database.

This is good for accessing a local database.

I plan to update this to practice polyglot data wrangling, i.e., get data from the db into R and Python data frames and do some quick explorations.
Spidering & Indexing.

Spinoff DOJOs

ETL

ELK stack

BIG DATA
