SQL – Becoming A Data Scientist
https://www.becomingadatascientist.com
Documenting my path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist"

Playing With Google Cloud Datalab
https://www.becomingadatascientist.com/2015/10/18/google-datalab/
Mon, 19 Oct 2015 02:59:47 +0000

This weekend, I played around with the newly released Google Cloud Datalab. I learned how to use BigQuery and also compared Google Charts with Pandas+Matplotlib plots, since you can do both in Datalab.

[Image: datalab]

I had a few frustrations with it: the documentation isn’t great, and sometimes it would silently time out with no indication of why nothing was running. Stopping all of the services, closing and restarting Datalab, and reopening the notebook got everything working again. It’s clearly in Beta, but I had fun learning how to get it up and running, and it was cool to be able to write SQL in a Jupyter notebook.

I tried to connect to my Google Analytics account, but apparently you need a paid Pro account to do that, so I just connected to one of the built-in public datasets. If you view the notebooks, you will see I clearly wasn’t trying to do any in-depth analysis. I was just playing around and getting the queries, dataframes, and charts to work.

I hadn’t planned to go into much detail here, but I wanted to share the results. I did jot down notes for myself as I set it up, which I’ll link to below, and you can see the two notebooks I made as I explored Datalab.

Exploring BigQuery and Google Charts
Version Using Pandas and Matplotlib
(These aren’t tidied up to look professional – please forgive any typos or messy approaches!)

Google Cloud Datalab Setup Notes (These are notes I jotted down for myself as I went through the setup steps. Sorry if they’re not intelligible!)

Relative Year SQL
https://www.becomingadatascientist.com/2015/05/25/relative-year-sql/
Mon, 25 May 2015 17:28:41 +0000

I wrote this SQL code recently and wanted to share it here (in a modified, simplified form). It isn’t a “typical” SQL SELECT statement, because each row checks the rest of the table relative to its own fiscal year value.

-- One row per donor per fiscal year, labeled by the donor's giving history
-- relative to that row's fiscal year.
select d.fiscal_year, d.donorid,
    (case
         -- gave in the immediately preceding fiscal year
         when d.donorid in (select d1.donorid from Donations d1 where d1.fiscal_year = d.fiscal_year - 1 and d1.HardCredit > 0) then 'Retained'
         -- gave 2 to 5 fiscal years earlier
         when d.donorid in (select d2.donorid from Donations d2 where d2.fiscal_year >= d.fiscal_year - 5 and d2.fiscal_year <= d.fiscal_year - 2 and d2.HardCredit > 0) then 'Reactivated 2-5'
         -- gave more than 5 fiscal years earlier
         when d.donorid in (select d3.donorid from Donations d3 where d3.HardCredit > 0 and d3.fiscal_year < d.fiscal_year - 5) then 'Reactivated Lapsed'
         -- no earlier gift on record
         else 'Acquired Donor'
         end)  FY_DonorType
from Donations d
    where d.Donor_Record_Type = 'A' and d.HardCredit > 0
    -- the grouping repeats the CASE expression so multiple gifts in a year collapse to one row
    group by d.fiscal_year, d.donorid,
        (case
         when d.donorid in (select d1.donorid from Donations d1 where d1.fiscal_year = d.fiscal_year - 1 and d1.HardCredit > 0) then 'Retained'
         when d.donorid in (select d2.donorid from Donations d2 where d2.fiscal_year >= d.fiscal_year - 5 and d2.fiscal_year <= d.fiscal_year - 2 and d2.HardCredit > 0) then 'Reactivated 2-5'
         when d.donorid in (select d3.donorid from Donations d3 where d3.HardCredit > 0 and d3.fiscal_year < d.fiscal_year - 5) then 'Reactivated Lapsed'
         else 'Acquired Donor'
         end)
         ;

In the Donations table, there is one row per donor per gift. There can be multiple gifts in a fiscal year, but as soon as the first gift is made, the donor can be given a FY_DonorType using a CASE expression. If the donor also gave in the previous fiscal year, the donor is “Retained”. If they did not give in the previous year but gave at some point before that, they’re “Reactivated”. I separated those who had given within the previous 2-5 years from those who hadn’t given for longer, since once a donor has not given for 5 years, they are considered “Long Lapsed” and are much harder to reactivate. If the donor has never appeared in the table before, he or she is a new donor and is marked as “Acquired Donor”. We are not looking at how many people gave last year but not this year (“Lost”), only at the breakdown of who this year’s donors are.

Since the code groups by fiscal year and donor ID, and the CASE expression contains subqueries that look at previous years relative to each gift’s fiscal year, you can see how the “current year” donors break down in every year. Each year is evaluated relative to the years before it.
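To make the relative-year logic concrete, here is a minimal sketch that runs an adaptation of the query on a tiny in-memory SQLite table from Python. The sample gifts, the function name classify_donors, and the column types are made up for illustration and are not from the original database. The adaptation labels each gift in a derived table and then groups on the computed FY_DonorType column instead of repeating the CASE expression, which should give the same rows.

```python
import sqlite3

# Hypothetical sample gifts: (donorid, fiscal_year, HardCredit, Donor_Record_Type)
SAMPLE_GIFTS = [
    (1, 2008, 50.0, "A"),
    (1, 2015, 25.0, "A"),  # donor 1 last gave 7 years before 2015
    (2, 2012, 10.0, "A"),
    (2, 2015, 10.0, "A"),  # donor 2 gave 3 years before 2015
    (3, 2014, 75.0, "A"),
    (3, 2015, 80.0, "A"),  # donor 3 gave the year before 2015
    (3, 2015, 20.0, "A"),  # second gift in the same year
    (4, 2015, 5.0, "A"),   # donor 4 has no earlier gifts
]

RELATIVE_YEAR_SQL = """
select fiscal_year, donorid, FY_DonorType
from (
    select d.fiscal_year, d.donorid,
        case
            when d.donorid in (select d1.donorid from Donations d1
                               where d1.fiscal_year = d.fiscal_year - 1
                                 and d1.HardCredit > 0) then 'Retained'
            when d.donorid in (select d2.donorid from Donations d2
                               where d2.fiscal_year >= d.fiscal_year - 5
                                 and d2.fiscal_year <= d.fiscal_year - 2
                                 and d2.HardCredit > 0) then 'Reactivated 2-5'
            when d.donorid in (select d3.donorid from Donations d3
                               where d3.fiscal_year < d.fiscal_year - 5
                                 and d3.HardCredit > 0) then 'Reactivated Lapsed'
            else 'Acquired Donor'
        end as FY_DonorType
    from Donations d
    where d.Donor_Record_Type = 'A' and d.HardCredit > 0
) as labeled
group by fiscal_year, donorid, FY_DonorType
order by fiscal_year, donorid
"""


def classify_donors(gifts):
    """Load gifts into an in-memory table and return (fiscal_year, donorid, type) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table Donations ("
            "donorid integer, fiscal_year integer, "
            "HardCredit real, Donor_Record_Type text)"
        )
        conn.executemany("insert into Donations values (?, ?, ?, ?)", gifts)
        return conn.execute(RELATIVE_YEAR_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in classify_donors(SAMPLE_GIFTS):
        print(row)
```

Donor 3 has two gifts in 2015 but comes back as a single row for that year, which is the effect of the grouping.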

This allows us to make visualizations that give some insight into how the fundraising team performed each year. Some years, the fundraising organization was particularly good at reactivating lost donors. Some years they retained donors from the previous fiscal year at a high rate, but the ones that hadn’t given for more than a year must not have been targeted well and fell off more than usual. This is likely a result of how often they solicited each group, and what type of solicitations were used. (I purposely hid the Y-axis and other info since I’m using this for illustrative purposes and not trying to give away details related to the fundraiser.)
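The numbers behind a chart like this are just donor counts per fiscal year and donor type. Below is a rough sketch of that aggregation, again using Python's built-in sqlite3 module with made-up sample gifts; the function name donor_type_counts and the count(distinct ...) roll-up are my own illustration here, not the query used for the real chart.

```python
import sqlite3

# Hypothetical sample gifts: (donorid, fiscal_year, HardCredit, Donor_Record_Type)
SAMPLE_GIFTS = [
    (1, 2013, 10.0, "A"), (1, 2014, 10.0, "A"), (1, 2015, 10.0, "A"),
    (2, 2013, 20.0, "A"), (2, 2015, 20.0, "A"),
    (3, 2014, 30.0, "A"), (3, 2015, 30.0, "A"),
    (4, 2015, 40.0, "A"),
    (5, 2008, 50.0, "A"), (5, 2015, 50.0, "A"),
]

COUNTS_SQL = """
select fiscal_year, FY_DonorType, count(distinct donorid) as donors
from (
    select d.fiscal_year, d.donorid,
        case
            when d.donorid in (select d1.donorid from Donations d1
                               where d1.fiscal_year = d.fiscal_year - 1
                                 and d1.HardCredit > 0) then 'Retained'
            when d.donorid in (select d2.donorid from Donations d2
                               where d2.fiscal_year >= d.fiscal_year - 5
                                 and d2.fiscal_year <= d.fiscal_year - 2
                                 and d2.HardCredit > 0) then 'Reactivated 2-5'
            when d.donorid in (select d3.donorid from Donations d3
                               where d3.fiscal_year < d.fiscal_year - 5
                                 and d3.HardCredit > 0) then 'Reactivated Lapsed'
            else 'Acquired Donor'
        end as FY_DonorType
    from Donations d
    where d.Donor_Record_Type = 'A' and d.HardCredit > 0
) as labeled
group by fiscal_year, FY_DonorType
order by fiscal_year, FY_DonorType
"""


def donor_type_counts(gifts):
    """Return {fiscal_year: {donor_type: number_of_donors}} for the given gifts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table Donations ("
            "donorid integer, fiscal_year integer, "
            "HardCredit real, Donor_Record_Type text)"
        )
        conn.executemany("insert into Donations values (?, ?, ?, ?)", gifts)
        counts = {}
        for fiscal_year, donor_type, donors in conn.execute(COUNTS_SQL):
            counts.setdefault(fiscal_year, {})[donor_type] = donors
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    for year, breakdown in sorted(donor_type_counts(SAMPLE_GIFTS).items()):
        print(year, breakdown)
```

Each inner dictionary is one year's stack in a stacked bar chart, so the result can be handed to whatever plotting tool you prefer.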

[Figure: donor retention illustration]

What database tables do you have that could be analyzed in this way?
