Posts

Downloading folders from Google Drive

I wanted to download some course material on RL, shared by the author via Google Drive, using the command line. I got a bunch of stuff using wget, but a folder in Google Drive was a challenge. I looked it up on SO, which gave me a hint but no solution. I installed gdown using pip and then used:
gdown --folder --continue https://drive.google.com/drive/folders/1V9jAShWpccLvByv5S1DuOzo6GVvzd4LV
If there are more than 50 files you need to add --remaining-ok, and you only get the first 50. In such a case it's best to download the folder using the web UI and decompress it locally. Decompressing from the command line created errors related to Unicode, but using the Mac UI I decompressed without a glitch.
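The command above can also be assembled from a script. The following is a minimal sketch, assuming gdown is already installed and on the PATH; the helper name build_gdown_command is hypothetical, and only the flags mentioned in this post are used.

```python
import shlex


def build_gdown_command(folder_url, remaining_ok=False):
    """Build the gdown call from this post (hypothetical helper, not part of gdown)."""
    cmd = ["gdown", "--folder", "--continue", folder_url]
    if remaining_ok:
        # Needed when the folder holds more than 50 files; only the first 50 arrive.
        cmd.append("--remaining-ok")
    return cmd


FOLDER_URL = "https://drive.google.com/drive/folders/1V9jAShWpccLvByv5S1DuOzo6GVvzd4LV"
command = build_gdown_command(FOLDER_URL, remaining_ok=True)
print(shlex.join(command))  # show the shell command without running it
```

To actually run the download, the list can be handed to subprocess.run(command, check=True).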

Random Thoughts on Linear Regressions

Regression Analysis TLDR: Regression is the oldest and most powerful tool in a data scientist's toolbox. Under ideal conditions multiple linear regression would be the best and only tool a data scientist would want to use... In reality you would use a modern nonparametric variant or a different algorithm. Still, the main ideas I discuss here will pop up in many other models and algorithms. This post is my brain dump on regression - I'll update it as time allows to cover the many aspects of this technique.

SQL - Selection vs. Projection

Selection and projection are two high-level processes that take place when SQL queries are executed. Selection is choosing some records (rows) from a table and leaving others out, e.g. rows having name='oren'. Projection is choosing some columns from each record and leaving others out, e.g. only name. So the select keyword performs projection while the where keyword performs selection. Using the keyword select for projection (choosing columns) rather than for choosing rows is an unfortunate flaw in the design of SQL, but this oversight is too well established to be fixed.
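The distinction can be seen in a small example. This is only a sketch using Python's built-in sqlite3 module; the people table, its columns and its rows are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("oren", "Haifa", 40), ("dana", "Tel Aviv", 35), ("oren", "Eilat", 28)],
)

# Selection: WHERE keeps some rows and leaves others out (all columns kept).
selection = conn.execute(
    "SELECT * FROM people WHERE name = 'oren' ORDER BY age"
).fetchall()

# Projection: the SELECT list keeps some columns and leaves others out (all rows kept).
projection = conn.execute("SELECT name FROM people ORDER BY rowid").fetchall()

# Both together: one column from the matching rows only.
both = conn.execute(
    "SELECT city FROM people WHERE name = 'oren' ORDER BY age"
).fetchall()

print(selection)
print(projection)
print(both)
```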

Polyglot Data Science - GraalVM installation

GraalVM allows polyglot data science, for example switching between R and Python within a single Jupyter kernel. "GraalVM is a universal virtual machine for running applications written in JavaScript, Python, Ruby, R, JVM-based languages like Java, Scala, Groovy, Kotlin, Clojure, and LLVM-based languages such as C and C++."

Avoid cross site scripting errors with a Jupyter local runtime

So the trick is to use --NotebookApp.allow_origin and --no-browser, and to get the token from the command line when connecting to Google Colab:
jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' \
  --port=9090 --no-browser
References: the docs.

SQL Dojo

TLDR: Imagine it is just before your DS interview - you are Neo, your coach is Morpheus, and you will practice SQL in rapidly changing schemas. Now here is a little project I thought up: despite any number of excellent SQL-based projects I have created, I tend to get rusty in SQL as I don't use it on a regular basis. I decided it might be worthwhile to set up a virtual space to practice, hence the dojo. The dojo lets a student practice analytical SQL, primarily the queries analysts use. Ultimately I'd like to use it in an agile manner as an LMS with a minimal UI. This would require creating a story for each query and a test that the query returns a good answer. Also, to make things interesting, the tasks should be related, proceed from easy to more challenging, and cover a number of techniques like filtering, aggregation and subqueries. However, initially I want to have things up and running quickly and to collect questions and answers that reflect how to do create view...

AWS CloudFormation Pros and Cons

So I'm building a PaaS product that does ML-based optimisations, and that means doing work in the cloud. The ML is a neat feature, but without the basic product nothing will happen, and to bootstrap this project on AWS I tried to make use of CloudFormation, a service that automates creation and destruction of service stacks. Based on a week's worth of experimenting with CloudFormation I will try to answer the question: "Is learning CloudFormation worth the effort?" Despite the rant, CloudFormation supports creation, updating and deletion of entire stacks of services. SAM is built on top of CloudFormation and it has a visual editor. The way CloudFormation is described is that you can copy-paste snippets to create resources and build a library of reusable components. This is a simplistic point of view. In reality you need to bring properties, specify dependencies, and introduce signalling mechanisms to ensure your template works. T...

Android Coding Conundrums 1 Fragment Constructors

While researching the factory design pattern for fragment creation I couldn't help but notice that fragment creation is a long-term source of bugs. Why is fragment creation error prone? Perhaps because the API for fragments has changed so frequently that much of the advice is dated. In the real world Fragment is deprecated in favour of a descendant in the app support library, but that has been deprecated as well in favour of the AndroidX support libraries. Perhaps it is because many newcomers to Android are Java developers who follow the Java idiom of constructor overloading to pass parameters at creation for use in Activity.onCreate(). However, this is not a good idea; it is an example of a bug pattern. Using a constructor will usually appear to work fine until Android destroys the activity and looks for a default constructor. If there isn't one, the app crashes with a runtime exception. This is because behind the scenes the default constructor is called r...

Big Data Analytics Israel - New Year, New Data Scientist Job: 5 Things To Think About

Data science interviews can be overwhelming. My notes on "New Year, New Data Scientist Job: 5 Things To Think About": https://www.meetup.com/Big-Data-Analytics-Israel/events/253124286/ The first talk was by Raya Belinsky - "New job - yes or no?" The talk was about finding your next job or reinventing your current job. Ms Belinsky's humour and background as an executive life-coach made this talk both pleasant and worthwhile. She covered: her operational definition of job burnout; the LinkedIn profile - complete the profile (it tells you what to do); the CV - ask 2 people to prepare it; the interview - e.g. prepare 3 questions. Each had at least a couple of points worth taking care of in your next round of job search. Check out the talk and slides when they go online. The second talk was by Nathaniel Shimoni - "Life story". Mr Shimoni is an experienced storyteller and had a compelling story to tell about his own twisting path to becoming a da...

Paratroopers Puzzle

Puzzle: Two paratroopers are dropped onto a practically infinite railway track. Both were given a note with identical instructions... They both follow the instructions and eventually meet up. What did the note tell them to do? Answer: To drop their parachutes on the track. Then they should run north 10 steps, then switch and run 3 times as far to the south, then switch again and triple again, and not stop until they meet or reach the other parachute... The fun answer: The standard random walk has properties related to the normal distribution (which the binomial distribution approaches as N approaches infinity). For the random walk the mean position of the random walker is his or her starting point. The standard deviation, however, grows with the square root of the time. So pretty much any random walk would work as a rendezvous strategy - whenever you run past a pub, pop in and do not leave until you are punch drunk, is probably as good a randomising strategy as the above answer. For more details you can lo...
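The zigzag answer can be checked with a small simulation. This is a sketch of one reading of the note, not a definitive solution: it assumes both paratroopers take one unit step per tick, that legs are 10 north, 30 south, 90 north and so on, and that whoever reaches the other's parachute stops and waits there. The function names and the gap parameter are hypothetical.

```python
def zigzag_steps(first_leg=10, factor=3):
    """Yield unit steps: +1 is north, -1 is south; each leg is `factor` times longer."""
    leg, direction = first_leg, 1
    while True:
        for _ in range(leg):
            yield direction
        leg *= factor
        direction = -direction


def simulate(gap, max_steps=1_000_000):
    """Return (step, position) where the two meet; `gap` is the nonzero landing offset."""
    start = [0, gap]            # parachutes stay where each paratrooper landed
    pos = [0, gap]
    waiting = [False, False]    # True once a paratrooper has found the other parachute
    walks = [zigzag_steps(), zigzag_steps()]
    for step in range(1, max_steps + 1):
        for i in (0, 1):
            if not waiting[i]:
                pos[i] += next(walks[i])
                if pos[i] == start[1 - i]:
                    waiting[i] = True   # assumption: stop and wait at the other parachute
        if pos[0] == pos[1]:
            return step, pos[0]
    raise RuntimeError("no meeting within max_steps")


print(simulate(7))
```

Until one of them finds a parachute the two move in lockstep, so the distance between them never changes; the parachute is what breaks the symmetry.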

PyData 13

1st speaker: Continuous Delivery in Python on a Massive Scale, by Or Been-Zeev (JP Morgan). Abstract: J.P. Morgan has one of the largest Python codebases in the world. We will discuss the challenges of working with millions of lines of Python and how one can deal with those. We will also show you how Python makes it easy to achieve continuous delivery and "push to production" approaches regardless of scale. My notes: CD = CI + push to production. 20 million lines of code - they use a monolithic code base... Time to market is the KPI, but how do you avoid breaking the code many times a day? Python simplifies the typical CI pipeline as there is no compile or build step. They have a single head, but it was not clear how they merge changes - they have shared staging layers to handle this issue. Speaker separation in the wild, and the industry's view - Rapahel Cohen (Chorus.ai) Abstract: Audio recordings are a data source of g...

Insight into progressive web apps

Some notes from a Meetup on PWAs in January 2016. I feel quite knowledgeable about PWAs but I wanted to learn more about implementing a service worker. I ended up adding some research and collecting some great resources; the more detailed material on the service worker is based on Google's developer docs, and the resources have been expanded. https://www.meetup.com/The-Future-is-Javascript/events/246346861/
Service Workers
A service worker is just a simple JavaScript file that sits between you and the network.
– It runs in another thread
– It has no access to the DOM
– It intercepts every network request (including cross domain)
Entry point: self.caches (in the service worker) or window.caches (on the page)
Registering a Service Worker
• Works with promises
• Re-registration works fine
In main.js: navigator.serviceWorker.register('/sw.js').then(function(reg){ console.log('registered'); }.cat...
