Building Data Driven Orgs
My organizers love their Google Forms. And I can't blame them. Our whole organization is already embedded in the Google ecosphere; you can create a new form in a matter of a few simple clicks; and the product is versatile enough to serve as a survey tool, application form, or event RSVP. When a user submits a response to the form, the organizer can see that response immediately exactly in the way they expect - one row per response with each field corresponding to the answer to their questions. They can also add columns to the right of the automatic spreadsheet that Google sets up where they can interact with their responses: checking people off as accepted to a program, adding notes on people, moving RSVPs into discussion group rooms. It is a brilliant, intuitive piece of software that empowers anyone to collect data.
But when our organization transitioned to a new CRM (EveryAction), the golden age of Google Forms came to an end. From the stand point of the Data Department, Google Forms did not sync to our data warehouse and it limited our ability to enforce data collection standards. We did not want to engineer a sync every time someone created a new Google Form, and we had just come out the end of a massive data cleaning project. Not wanting to ever face a situation again where we had 7 different variables for age data, my department brought the hammer down on Google Form creation.
And this was great! For the Data Department.
Our organizers, on the other hand, were struggling to adapt to the change. Whereas before they could freeform write any question they desired, organizers now had to deal with built in questions with locked response options and custom questions that mapped to other fields in EveryAction. In the beginning my team was able to engineer a solution, or at least some kind of hack, for the functionality our CRM did not natively provide. But there was one feature from Google Forms that eluded us...
Organizers need to be able to see and work with their form responses in as close to real time as possible. In Google Forms, users can generate a spreadsheet and watch their responses roll in. In our new CRM, beyond our comprehension, the standard fields like name and email could be viewed in one manually generated report while all the responses to any custom questions—which encompasses the main part of most forms that we create—were stored in a separate report. Organizers were unable to see all their responses in one place, they had to manually generate the report every time they wanted to check in on their form, and they could not interact with their data.
"Okay, we'll just engineer a sync."
Doing Things the Hard Way
The idea was this: take the backend data from EveryAction that syncs to our warehouse, wrangle it together into something that looks like the Google Form response spreadsheet, and push it to a Google Form. Sounds easy enough.
Let's take a look at how EveryAction stores the questions and responses to Online Forms in their commercial sync (which is called capital 'P' Pipeline). The Online Forms and the standard fields associated with the responses are in two, easy enough to join, tables. However, the responses to custom questions are stored in a table corresponding to the type of custom question.
Some sleuthing, a lot of bad left joins, and one big refactor later, we had our solution. But it wasn't pretty. For every EveryAction form that we want to engineer a sync for, we needed to write a block of SQL that looked like this, give or take a few tweaks depending on the type of custom question.
-- get responses to all custom questions type date
date_responses AS (
SELECT
responses.contactsonlineformid,
questions.onlineformid,
responses.onlineformquestionid AS questionid_date,
questions.onlineformquestionname AS question_name,
responses.responsevalue AS response
FROM sunrise_ea.tsm_tmc_contactsonlineformsresponses_date_sm AS responses
LEFT JOIN sunrise_ea.tsm_tmc_onlineformquestions_sm AS questions
ON responses.onlineformquestionid = questions.onlineformquestionid
WHERE questions.onlineformid = {{onlineformid}}
),
-- pivot date responses
pivot_date AS (
SELECT
contactsonlineformid,
MAX(CASE
WHEN question_name ilike '%{{internal_question_name}}%' THEN response
ELSE NULL
END) AS {{alias}}
FROM date_responses
GROUP BY 1
)
The first CTE takes the question name and response value for the custom question type, in this case "date", on a specific onlineformid. The second CTE pivots that data into something more pleasing to the eye. Each row in the second CTE corresponds to one response from a user. We would then join together the pivoted date responses to the pivoted shorttext responses to the pivoted multiple choice responses... you get the idea. The final CTE created the view that mimicked what you would see in Google Forms: one row per person, with their name email and other information all the responses to the forms custom questions across the columns.
You can see the full unwieldy code here. We launched this workflow to our organizing staff and were soon fielding requests from all sorts of teams who wanted EveryAction syncs to Google Sheets.
We quickly realized this was not going to work: First, writing the SQL required the engineer to know the custom question aliases set up in EveryAction. There was no organizational standard for creating aliases at the time, meaning a lot of the aliases were long and difficult to work with in SQL (and sometimes down right hilarious). To set up the sync, the engineer would need to log in to EveryAction, find the form on the front end, and then manually copy over the question names from the awkward interface. Second, the engineer had to assemble the sync from the template by hand. It was easy to break a join, insert a typo that could never be found, and screw up matching the question name to the question type.
In short, this solution was
cumbersome
slow
and prone to failure
And when you cannot deliver accurate results quick to organizers, you lose their buy in to your system. And when your organizers do not use your systems, your organizations becomes less data driven.
Our optimism slowly faded and eventually we had to face the music. Our solution was not going to cut it.
The Magic of Macros
I was very lucky to meet Claire Carroll, Community Manager at Fishtown Analytics, in the summer of 2020. She generously agreed to mentor me and along the way introduced me to dbt, a powertool for the T part of ELT.
Claire was able to refactor the original manual EveryAction/Google Sheet workflow as a macro in dbt. In dbt, macros are written in a Python templating language called Jinja. Jinja allows us to get funky with SQL: set variables, create lists and dictionaries, write for loops, and more.
What was once a several hour process to write, set up, and troubleshoot a sync was cut down to a few minutes with this command:
-- replace 1175 with the correct onlineformid
{{ get_form_sql(1175) }}
This commands calls upon the macro "get_form_sql" and passes the onlineformid in the parentheses. For every custom question in the Online Form corresponding to the onlineformid, the macro generates the SQL to get and pivot the responses for each custom question type. And, since we're in dbt, running this macro automatically generates a view in our data warehouse.
The only manual part of this workflow is finding the onlineformid and setting up the push to Google Sheets in Civis (a progressive warehousing tool). The Data Team no longer had to write out custom question names by hand or fiddle with joining together the correct CTEs. Our workflow was finally robust.
We did not think the rest of the organization would be as excited as we were about some refactored code, but we did announce to the staff again that the EveryAction sync had been improved with magic (and honestly, dbt is kind of like magic) and that we were expecting to process to go much smoother this time around. The requests for the syncs rolled in, we were able to set them all up with ease. We called the macro, passed along the onlineformid, and then used Civis's built in export to Google Sheets to move the data wherever our organizers needed it. And so far (*fingers crossed*), none of the syncs have failed.
So Really? No Google Forms?
I do not presume that I have transformed my entire organization into a data-driven one in just one year. My team still finds the rogue Google Form floating around, created by an organizer in a hurry or who has yet to hear our spiel about age data collection standards. I do think, however, that my team has been successful in finding small ways to intervene that give big rewards to our organizers.
Syncing form responses from our CRM to a Google Sheet may sound trivial, but at the end of the day my work is about serving our organizers. When we realized how important it was for organizers to see and interact with the responses to their forms, we were eager to do the heavy lifting in the backend to meet their needs. Our organizers may not understand the lengths we went through to get the form responses sync running, but when our systems work they reward us by using them. And when your organization is fully bought in to your systems, whether it's about using your CRM or naming your survey question aliases a certain way, you get one step closer to a data driven organization.
I would love to hear more about what your department "[bringing] the hammer down on Google Form creation" looked like.
My org uses FormAssembly which syncs data to Salesforce automatically, so the issue in this article isn't one we face. The issue we run into is that folks want to be able to create their own forms without waiting for the data team to support this form creation. In an ideal world, I'd just disable everyone at my org's permission to create Google Forms (I have talked to ops about the possibility of this), but short of that I'm struggling to get folks away from Google Forms
I love that you did this. And I'd like to figure out how to push this out to our team as well. It addresses an interesting problem which is growth and innovation. As an org progresses from multiple phases, which I'll loosely name as: 1 Everyone knows each other (10-20) 2 We don't know everyone, but we can probably get in touch if needed (30-150) 3 The org is larger than a small village (150+) These are important milestones. And once you get past 3 you really need a sizeable team that just manages internal communication and organization which by the way, fucking no one actually does in practice. Franchised volunteer orgs have a very interesting growth problem. What McDonalds has over The Sierra Club is that…