Downloading folders from Google Drive

I wanted to download some course material on RL that the author shared via Google Drive, using only the command line. I got a bunch of stuff using wget, but a folder in Google Drive was a challenge. I looked it up on Stack Overflow, which gave me a hint but no solution. I installed gdown using pip and then used:

    gdown --folder --continue https://drive.google.com/drive/folders/1V9jAShWpccLvByv5S1DuOzo6GVvzd4LV

If there are more than 50 files you need to add --remaining-ok, and you will only get the first 50. In such a case it's best to download the folder using the web UI, which delivers it as a compressed archive, and decompress it locally. Decompressing from the command line produced errors related to Unicode, but using the Mac UI I decompressed it without a glitch.
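For repeated downloads, the same command can be assembled from a script. The following is a minimal sketch, assuming gdown is already installed with pip and available on the PATH; the helper name build_gdown_command is hypothetical, and only the flags mentioned above are used.

```python
import shlex


def build_gdown_command(folder_url, resume=True, remaining_ok=False):
    """Build the argument list for a gdown folder download.

    Hypothetical helper: it only assembles the flags discussed in the post.
    """
    command = ["gdown", "--folder"]
    if resume:
        # Resume partially downloaded files instead of starting over.
        command.append("--continue")
    if remaining_ok:
        # Accept getting only the first 50 files of a larger folder.
        command.append("--remaining-ok")
    command.append(folder_url)
    return command


if __name__ == "__main__":
    url = "https://drive.google.com/drive/folders/1V9jAShWpccLvByv5S1DuOzo6GVvzd4LV"
    # Print the command so it can be checked before running it.
    print(shlex.join(build_gdown_command(url, remaining_ok=True)))
```

To actually start the download, the returned list can be passed to subprocess.run(command, check=True).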

Business analytics - Direct marketing data


Working with Direct marketing data (Section 2.2)

This section discusses exploring direct marketing data using data transformation, trellis graphs and a scatter-plot matrix. My own experience running direct marketing campaigns for real estate was that we did not have such data sets to work with, just lists of leads from internet-based research, which yielded extensive data sets but of a sparser nature.

When we contacted the postal authority, they offered us a service in which, based on a confidential dataset, clients could send direct mail to recipients selected by a number of criteria. However, they could not provide sample sizes in real time, nor did they provide the addresses, only the ability to send letters for a fee. Their dataset was also stale (3+ years old at best, and they were going to renew it only in six months). Since the data was old and there would be no way to check the reliability of their service, we passed on it.

Anyhow, working with this sample does provide some insights on how to set up analytics for a direct marketing campaign.

The data set has information on 10 variables: (Age, Gender, Home, Married, Location, Salary, Children, History, Catalogs and Amount Spent).

Of these, only Salary, Catalogs (catalogs received) and Amount Spent (in USD) are numerical; the rest are categorical. The book then specifies a number of goals for maximizing sales.

The analysis uses scatterplot matrices for the three numeric variables. I had seen this type of graphic in a number of papers and discovered that there are many different techniques for creating these. The main difficulty was color coding of the correlations and appending the asterisks to indicate the significance level.
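The asterisk labels are the part that can be worked out separately from the plotting. Below is a minimal sketch using only the Python standard library. It is not the book's method: the function names are hypothetical, the star thresholds (0.05, 0.01, 0.001) are the common convention rather than something the book states, and the p-value uses the Fisher z normal approximation instead of the exact t-test.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def approx_p_value(r, n):
    """Two-sided p-value for r from n pairs (Fisher z approximation)."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation; atanh would be undefined
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def significance_stars(p_value):
    """Map a p-value to asterisks (assumed thresholds: .05, .01, .001)."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


def correlation_labels(columns):
    """Return {(name_a, name_b): 'r plus stars'} for every pair of columns."""
    names = list(columns)
    labels = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            p = approx_p_value(r, len(columns[a]))
            labels[(a, b)] = f"{r:.2f}{significance_stars(p)}"
    return labels
```

Calling correlation_labels with a dict that maps Salary, Catalogs and Amount Spent to their value lists gives one label per pair, which can then be written into the matching panel of the matrix with whatever plotting library is in use.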

Next, the data is studied by segmenting it according to different variables using trellis graphs. This is a type of chart that was formalized by researchers at Bell Labs in the 1990s.
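The segmentation behind a trellis graph amounts to splitting the rows by a conditioning variable and drawing the same plot once per group. Here is a minimal sketch of that splitting step in plain Python; the helper name split_into_panels is hypothetical and the rows are made-up values for illustration, not taken from the book's data set.

```python
from collections import defaultdict


def split_into_panels(rows, by):
    """Group rows (dicts) by the value of the conditioning variable `by`."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)


# Made-up rows for illustration only; the real data set has 10 variables.
rows = [
    {"Gender": "Female", "Salary": 47500, "Amount Spent": 755},
    {"Gender": "Male", "Salary": 63600, "Amount Spent": 1318},
    {"Gender": "Female", "Salary": 13500, "Amount Spent": 296},
    {"Gender": "Male", "Salary": 85600, "Amount Spent": 2436},
]

panels = split_into_panels(rows, by="Gender")
for level, members in sorted(panels.items()):
    # Each panel would get its own Salary vs. Amount Spent scatter plot.
    points = [(r["Salary"], r["Amount Spent"]) for r in members]
    print(level, points)
```

Each group would then be drawn in its own panel with shared axes, so the panels can be compared directly.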


What might be useful at this point is to add a bit more detail by showing another segment using color coding or a different point shape. But these are introduced a little later.

Further reading

  • Trellis Plots presentation
  • Old (1996) S-Plus manual on Trellis graphics by Richard A. Becker and William S. Cleveland
