Online Methods (e.g., an FAQ)

There’s a wealth of incredibly interesting questions about questions, as you can imagine! We figured we’d take some of the most common ones we get, and condense them down into one big FAQ.

Do you record/account for question seniority?

The principle underlying this question is as follows: “Who’s in the room” varies along many axes besides gender, including academic seniority. Perhaps the population of question-askers is actually a smaller subset of the people literally in the room along such an axis. For instance, maybe only faculty ask questions.
This is challenging for us to evaluate directly at ASHG on a per-question basis, as it would require identifying individual question-askers.
However, in smaller study environments, we’ve been able to approximate this by stratifying “who’s in the room” along the axis of seniority. For instance, at the Biology of Genomes meeting, the abstract booklet records PhD / non-PhD status. This makes it possible to separate PhD-holding attendees (faculty and postdocs) from the rest and look at each group’s gender proportions. As you can see, they are different (PhD holders are less female), but not different enough to explain the observed effect.
Figure (bog_15_demog_and_intervention_comparison-01.png): Question-askers (left) at Biology of Genomes 2015 (147 questions total), compared to proportions of attendees (right). BoG 2015 is chosen because it predates any publicization of our data collection. Note, at right, that non-PhD-holding attendees are somewhat more female than PhD-holding attendees; however, this difference is substantially smaller than what would be required to explain the proportion of female questioners.
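To make the stratified comparison concrete, here is a minimal sketch in Python. The per-stratum female fractions below are invented stand-ins for illustration (the real BoG 2015 numbers come from the abstract booklet); the observed questioner fraction of 16/147 is from our data.

```python
# Hypothetical female fractions per seniority stratum -- NOT the real
# BoG 2015 attendee numbers, which come from the abstract booklet.
female_frac = {"PhD": 0.30, "non-PhD": 0.42}

# Observed fraction of questions asked by women at BoG 2015 (16 of 147).
observed = 16 / 147

# Even the less-female stratum sits far above the observed questioner
# fraction, so seniority composition alone cannot explain the gap.
gaps = {stratum: frac - observed for stratum, frac in female_frac.items()}
```

The point of the comparison is simply that the gap to the observed questioner fraction remains large within every stratum.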

Is your gender classifier accurate for names from other countries?

In short, yes, as much as possible. We use genderizer, available for both Python and R, which draws on hundreds of thousands of names from almost 100 countries. As a result, our classification is as complete as possible given this information, and we achieve a classification rate of about 70% (see below), which we use to estimate the proportions of women and men present.
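As a rough sketch of how the classification rate and the gender proportions are computed: the `NAME_TO_GENDER` table below is a tiny hypothetical stand-in for a name-frequency database like the one genderizer draws on, not its actual API.

```python
# Hypothetical stand-in for a name -> gender database; None marks names
# that are ambiguous or absent, and therefore go unclassified.
NAME_TO_GENDER = {
    "maria": "F", "james": "M", "susan": "F", "david": "M",
    "wei": None, "ariel": None,
}

def estimate_proportions(first_names):
    calls = [NAME_TO_GENDER.get(name.lower()) for name in first_names]
    classified = [g for g in calls if g is not None]
    rate = len(classified) / len(first_names)          # classification rate (~70% for ASHG)
    frac_female = classified.count("F") / len(classified)  # estimated proportion of women
    return rate, frac_female

rate, frac_female = estimate_proportions(["Maria", "James", "Susan", "Wei"])
```

The estimate of who is present rests on the classified subset, which is why the classification rate matters alongside the proportions themselves.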

How can you be sure your estimated proportions are correct?

Of course, we couldn’t be certain unless we had a perfect ground truth. But luckily, we’re close! Since 2016, ASHG has internally allowed people to report gender on registration. We compared inferred versus reported genders for 2016 and 2017, and our pipeline’s estimates are extremely similar to the reported proportions.

How are the people who ask questions chosen? Could the people choosing them be biased?

This question is undoubtedly informed by the large body of literature confirming that teachers in the classroom spend more time speaking to and interacting with male students. Correspondingly, they also call on female students less and interrupt them more. (This is mostly the work of Sadker and Sadker, and is described well in David Sadker’s book or this broader textbook.)

However, ASHG’s setup is remarkably egalitarian, as question-askers queue at self-selecting microphone lines. Admittedly, not every session offers every person the opportunity to ask all the questions they want. (However, we record these sessions.) We also record the positioning of microphones and of speakers at microphones. Since the lines are self-selecting, there’s no need for a moderator or any other potentially biased figure to choose hands amongst a crowd.

At the Biology of Genomes meeting, where we also collect data, the microphones are hand-held and move through the audience. Under this different scheme (which might be slightly more biased, as the individual with the microphone has to move towards someone soliciting it), we still record an effect of similar magnitude [binom(16, 147, 0.35), p=2e-11] prior to any intervention on our part.
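The bracketed test above is a binomial tail probability: the chance of seeing 16 or fewer female questioners out of 147 questions if each question independently came from a woman with probability 0.35 (the attendee proportion). A self-contained sketch using only the standard library:

```python
from math import comb

def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the exact lower-tail probability."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# 16 female questioners out of 147 questions, expected proportion 0.35
p_val = binom_lower_tail(16, 147, 0.35)
```

In practice one would call a binomial-test routine from a stats package, but the tail sum above is the whole computation.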

How do you figure out that men ask men, and women ask women, if not all speakers and audiences are in the same room at the same time?

In essence, this question gets at the following idea: what if most women are in rooms with mostly female speakers (such as ELSI sessions), and most men are in rooms with mostly male speakers (such as Bioinformatics sessions)? Wouldn’t this create a (not perfectly symmetric) bias for women to ask questions of women, and men to ask questions of men?
Yes, that’s absolutely right, it would! To account for this, we wanted to test for consistent, within-category bias. In essence, imagine a contingency table for each category with frequencies, set up like this:
speaker \ asker    Male     Female
Male               p        1-p
Female             1-q      q

What you’d expect, regardless of the session, is to see p and q (the male-to-male and female-to-female question frequencies) carrying a little more weight than (1-p) and (1-q).

In particular, you can measure this by looking at the difference pq - (1-p)(1-q). This is the difference between the product of the same-to-same question frequencies and the product of the cross-gender question frequencies. Under the null, the expectation of this difference is 0; if the difference were greater than zero, it would suggest there are more same-to-same questions.

To test this, we take the questions within each invited-session subcategory and re-assign them at random. Say a subcategory has 20 questions: we assign each one to come from a female asker to a female speaker, or from a female asker to a male speaker, and so on, and we do 10,000 such permutations for each subcategory. From each permutation we calculate a mean statistic, and we look at the distribution of those means.
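A minimal sketch of this permutation scheme in Python. The session data structure here is invented for illustration: each session is a list of (speaker gender, asker gender) pairs, with `"M"`/`"F"` labels.

```python
import random

def same_gender_stat(pairs):
    # pairs: (speaker_gender, asker_gender) tuples for one session
    to_male = [a for s, a in pairs if s == "M"]
    to_female = [a for s, a in pairs if s == "F"]
    if not to_male or not to_female:
        return None  # statistic undefined if a speaker gender is absent
    p = to_male.count("M") / len(to_male)        # male-to-male frequency
    q = to_female.count("F") / len(to_female)    # female-to-female frequency
    return p * q - (1 - p) * (1 - q)

def permuted_mean(sessions, rng):
    # Shuffle asker genders within each session, keeping speaker genders fixed,
    # then average the statistic across sessions.
    stats = []
    for pairs in sessions:
        speakers = [s for s, _ in pairs]
        askers = [a for _, a in pairs]
        rng.shuffle(askers)
        stat = same_gender_stat(list(zip(speakers, askers)))
        if stat is not None:
            stats.append(stat)
    return sum(stats) / len(stats)
```

The observed mean of the statistic is then compared to, say, 10,000 draws of `permuted_mean`; the p-value is the fraction of permuted means at least as large as the observed one.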

Of course, we calculate the same statistic for our own dataset, and as you can see, there’s a significant skew in our data (pink) relative to the permuted sets (black).

Figure: the observed statistic in our data (pink) against the distribution over permuted sets (black).

So we conclude there’s a session-stratification-controlled significant bias towards female-to-female and male-to-male (same-gender) questions, as opposed to female-to-male and male-to-female (cross-gender) questions (p=8.1e-5).

We verify the robustness of this statistic by performing a similar test, not on the frequencies but on the raw contingency tables of question counts in each category. We use the Mantel-Haenszel test (say that one three times, fast, out loud!) to look at the combined odds ratio, and again, across sessions, we see the same consistent trend (p=0.004).
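For intuition, the Mantel-Haenszel estimator pools the per-session 2×2 tables into one common odds ratio. A sketch with invented counts (a stats package’s implementation would also give the p-value):

```python
def mh_common_odds_ratio(tables):
    # tables: per-session 2x2 counts [[a, b], [c, d]], with rows = speaker
    # gender and columns = asker gender, so a and d sit on the
    # same-gender diagonal.
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / den

# Two invented sessions, each with an excess of same-gender questions
tables = [[[10, 5], [5, 10]], [[20, 10], [10, 20]]]
# mh_common_odds_ratio(tables) -> 4.0
```

An odds ratio above 1 across sessions corresponds to the same-gender excess described above.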

Since you’re crowdsourcing ASHG2017 collection, how do you know whether people are recording the same talks?

Great question! (Note: this answer pertains ONLY to the new crowdsourcing dataset). Participate at our crowdsourcing portal!

Each device that logs data into our database is anonymized and recorded (and controlled by a human, via CAPTCHA). This is how we build our question-entry leaderboard.

But wait. How do you match all the different recordings together for one talk?

(Note: this answer pertains ONLY to the new crowdsourcing dataset). Our entry-tracking means we can actually do a kind of string-alignment — something many of us geneticists should be familiar with — to ensure we’re matching questions correctly.

For example, imagine that user 1 records the whole question session, user 2 stops recording partway through and leaves for another talk, and user 3 arrives midway and records from there onward. As a result, you have something like this:

True String  M M F M M F F M M M
User 1       M M F M M F F M M M
User 2       M M F M
User 3                 F F M M M

You can even see that User 2 and User 3 don’t overlap at all!

However, in computational biology, we’ve developed a lot of methods to align strings and derive a consensus. And in fact, that’s actually what we do! We borrow standard Bioconductor packages to do a multiple sequence alignment and derive a “consensus” question string. As we continue
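A toy version of the consensus step in Python (the real pipeline uses Bioconductor alignment packages in R; here the strings are assumed already aligned, with `-` marking positions a user did not record):

```python
from collections import Counter

def consensus(aligned):
    # aligned: equal-length strings; '-' marks unrecorded positions
    out = []
    for column in zip(*aligned):
        votes = Counter(c for c in column if c != "-")
        out.append(votes.most_common(1)[0][0])  # majority vote per position
    return "".join(out)

records = [
    "MMFMMFFMMM",   # user 1: full session
    "MMFM------",   # user 2: stopped early
    "-----FFMMM",   # user 3: arrived midway
]
# consensus(records) -> "MMFMMFFMMM"
```

Majority voting per aligned position also gives some protection against a single mis-entered question in any one user’s record.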