Bayesian.Ninja

Posts

Showing posts from December, 2017

Automating web app development with Polymer and Yeoman

December 30, 2017

Yeoman lets you configure and stamp out sophisticated boilerplate projects from the command line. In a digital marketing agency, the data science team may be asked to provide fairly similar media reports and explanatory analytics dashboards for each client's campaign, for both external and internal clients. For longer-term projects, we may also be asked to provide predictive analytics. The data comes from advertisers (Google, Facebook, Taboola, Outbrain) and from a phone-tracking metrics API. Outcomes are channeled via Segment into, say, Google Analytics, which has both an API and Polymer components. Usually there will be additional data science products (predictions, special segment data, funnels, market research, attribution charts) and long-term data in BigQuery, which has an API as well. Some vendors, e.g. phone tracking, don't have an API, so their data is exported to CSV and placed into Google Sheets, which has an API and a Polymer component. It takes too...
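Yeoman generators are written in JavaScript, and the snippet below does not use Yeoman. It is a minimal Python sketch of the same idea: stamping out a near-identical project skeleton for each client from shared templates. The template contents, file names and the stamp_project function are hypothetical.

```python
import os
from string import Template

# Hypothetical per-file templates; $placeholders are filled in per client.
TEMPLATES = {
    "README.md": "# $client media report\n\nData sources: $sources\n",
    "config/sources.txt": "$sources_lines\n",
    "dashboard/index.html": "<title>$client dashboard</title>\n",
}


def stamp_project(root, client, sources):
    """Render every template for one client under root/<client-slug>/."""
    slug = client.lower().replace(" ", "-")
    values = {
        "client": client,
        "sources": ", ".join(sources),
        "sources_lines": "\n".join(sources),
    }
    written = []
    for rel_path, text in TEMPLATES.items():
        path = os.path.join(root, slug, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            # substitute() raises KeyError on a missing value, so a broken
            # template fails loudly instead of shipping a literal "$client".
            f.write(Template(text).substitute(values))
        written.append(path)
    return written
```

Calling stamp_project("projects", "Acme Shoes", ["Google", "Facebook", "Taboola"]) with a hypothetical client writes projects/acme-shoes/README.md plus two more files. Template.substitute() is used rather than safe_substitute() so that a missing value fails immediately.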

Serverless Big Data

December 27, 2017

I was at the first meeting of the Big Data Analytics meetup. The first speaker was Avi Zloof, CEO of EvaluateX, who gave a talk titled "Serverless Big Data: The Good, and the Great". EvaluateX, located at "The Junction" (Rothschild 9, Tel Aviv), makes a Chrome plugin that optimizes Google BigQuery SQL queries in the web interface. My last BigQuery project, however, had abandoned the web interface and switched to 100% automation via the API. Also, despite having massive queries, there was little need to optimize them; I had been more concerned with comparing different editions of the project to detect data discrepancies. Connecting big data to a GUI is often the primary challenge; however, this was not the subject of the talk. The talk introduced me to EvaluateX and their activity. Mr. Zloof shared many interesting professional insights, as well as his point of view regarding serverless database platfor...

How to kill processes by name from the command line - Ubuntu 17.10

December 27, 2017

Ubuntu tip

The pain: I am coding a React and Redux component, and I have a tight loop spinning in Chrome. Chrome becomes unresponsive and won't stop. Soon it will eat up all the system memory and bring my machine to a halt. For some reason, Chrome rarely detects the rapid resource growth.

I used to open a terminal and run
$ ps -A
to look up Chrome's PID, but Chrome has many PIDs: one for each window and one per extension. My machine is slowing. Next I try:
$ ps -A | grep chrome
This is better. I choose the first PID (I might have to scroll) and run
$ kill -9 <pid>
and things go back to normal. But I still haven't fixed the bug, and there has to be a better way...

The remedy:
$ killall -9 chrome
This kills all Chrome processes: one command, with no lookups, copy-pasting, etc.

Note: there is probably nothing specific to Ubuntu 17.10 here...
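To see why killall needs no PID lookups, here is a minimal, Linux-only Python sketch of roughly what it does (an illustration, not killall's actual implementation): list the numeric directories under /proc, read each process name from /proc/<pid>/comm, and signal every match. The helper names find_pids and kill_by_name are hypothetical. The kernel truncates comm to 15 characters, so long process names must be matched in their truncated form.

```python
import os
import signal


def find_pids(name, proc_root="/proc"):
    """Return the sorted PIDs whose name in /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


def kill_by_name(name, sig=signal.SIGKILL, proc_root="/proc"):
    """Send `sig` to every process named `name`; return the PIDs signalled."""
    killed = []
    for pid in find_pids(name, proc_root):
        if pid == os.getpid():
            continue  # never signal this script itself
        try:
            os.kill(pid, sig)
            killed.append(pid)
        except ProcessLookupError:
            pass  # already gone
        except PermissionError:
            pass  # owned by another user
    return killed
```

kill_by_name("chrome") corresponds to $ killall -9 chrome. Plain $ killall chrome (or $ pkill chrome) sends SIGTERM instead, which gives Chrome a chance to exit cleanly; SIGKILL is the fallback when it will not.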

Simpla goes open source

December 06, 2017

Simpla, the headless content management system, has recently announced that it is closing down and making its project open source. Simpla allows a developer to rapidly prototype a website and editors to manage the content from the page's UI itself. The big change is that you will now be able to move your content from the Simpla database and host it on GitHub. Headless means that the CMS doesn't come with a large front end, as WordPress does. Instead, its backend is exposed as a simple API, allowing developers to use whatever integration best suits each user story. Headless CMSs are better suited to serving multiple channels, such as Android and iOS apps alongside a website. Setting up a new project using Simpla is easier said than done. Once I'm up and running I'll add more updates in this space.

Building an OCR using NLI Ephemera and manuscripts

December 03, 2017

The OCR task can be broken down as follows:
1. Acquire the image.
2. Segment it into regions according to the following labels: image; text area (with optional rotation); tabular data (with optional rotation).
3. Scale down very large text to suitably sized glyphs.
4. Improve results by adding terms that better model the noise present on the page.
5. Improve results by incorporating lexical and grammatical knowledge into the classifier.

Ideally, all this should be done by an end-to-end system.

Hebrew sacred texts commonly have a complex layout, with non-rectangular columns holding related but independent sequences of text. Once text areas are detected, a page segmentation algorithm is required to break them down into lines and glyphs. Looking at some samples from the NLI Ephemera database, one would wish to add steps to clean up and rescale elements whose fonts are too small. Also, given a suitable model, one could perhaps add detail to text that is too small. The challenges are...
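The step of breaking text areas into lines and glyphs can be illustrated with the classic projection-profile method, sketched minimally below in Python with NumPy. Ink is summed along rows to find text lines, then along columns inside each line to find glyphs. Each glyph is then padded and resampled to a fixed size, one simple way to rescale text that is too large or too small. The sketch assumes an already binarized, deskewed, rectangular text area (1 = ink) with blank columns between glyphs, so it will not cope with non-rectangular columns or touching letters; runs, segment and normalize_glyph are hypothetical helper names.

```python
import numpy as np


def runs(profile, min_len=1):
    """Return (start, end) pairs (end exclusive) of consecutive non-zero entries."""
    mask = np.concatenate(([0], (np.asarray(profile) > 0).astype(np.int8), [0]))
    diff = np.diff(mask)
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]


def segment(binary, min_len=1):
    """Split a binary text area into lines, then each line into glyph boxes.

    Returns one list per line of (top, bottom, left, right) boxes,
    with exclusive bottom/right bounds.
    """
    lines = []
    for top, bottom in runs(binary.sum(axis=1), min_len):  # row profile -> lines
        band = binary[top:bottom]
        lines.append([(top, bottom, left, right)
                      for left, right in runs(band.sum(axis=0), min_len)])  # column profile -> glyphs
    return lines


def normalize_glyph(glyph, size=16):
    """Pad a glyph to a square, then resample it to size x size (nearest neighbour)."""
    h, w = glyph.shape
    side = max(h, w)
    square = np.zeros((side, side), dtype=glyph.dtype)
    y0, x0 = (side - h) // 2, (side - w) // 2
    square[y0:y0 + h, x0:x0 + w] = glyph
    idx = np.arange(size) * side // size  # source row/column for each target pixel
    return square[np.ix_(idx, idx)]
```

Setting min_len above 1 is a cheap way to drop specks of noise before they are mistaken for lines or glyphs.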

