Email stats – Jeffrey Pomerantz

Inspired by Paul’s #noemail project, I’ve been wondering what the cost-benefit of the various media I employ is. Despite being solidly in the “I wish I could do that” group, I’m not quite ready to divest from email. For me, the most important consideration is that, as Paul points out, our shared institution relies heavily on email, and there is no real institutional solution for other forms of communication and data sharing. As tempting as it is, some days, to cut myself off from many of my colleagues’ conversations, I’m not quite ready, or perhaps not quite brave enough, to do that. But the #noemail project got me thinking about how much time I spend on email, and other media, and what I get out of it. So I decided to do what any good academic would do: I decided to collect some data.

Following Paul’s lead, I started with email. Plus, email seemed like the easiest medium on which to collect data.

In the interest of ease of data collection, I decided to care about only 3 categories of email:

Received: Number of email messages I receive per day. This is individual messages, not threads: if I get 3 messages from the same person as part of one conversation, then I count 3 messages.
Read: Number of messages I read per day. I delete many messages unread: the Subject line and 2-line preview are my friends. I’m also going to be liberal about this one: if I start to read an email that turns out to be spam or otherwise irrelevant, I’m going to count it as read, even if I didn’t read the whole thing.
Replied: Number of messages to which I reply per day. This is not equal to the number of messages I send per day, because I initiate some email conversations… though as few as I can get away with. This is the number of received and read messages to which I reply.

These categories are, naturally, flawed. Received email includes spam. Spam is always an issue, but especially so since I was migrated to Exchange: I get far more spam now, in Exchange, than I did when I was forwarding my UNC email to Gmail. (Read: Exchange is a radical downgrade from Gmail.) I’ve been setting up Inbox Rules to filter out as much spam as I can, but plenty still gets through. So the Received category includes spam. The Read category includes only those messages that I opened, even though I read the Subject line and 2-line preview of probably every email I receive, to decide if I want to read it in more depth. So that’s cheating a little, maybe. But opening an email and reading the whole thing takes more time than reading just the Subject line and preview, and that time spent is what I’m interested in. Finally, Replied includes only email that I received and replied to, not email threads that I initiated. I try to initiate threads as little as possible, but sometimes it’s unavoidable. So anyway, the usual limitations apply: These categories are flawed. It’s only one person’s data, and it’s only 2 weeks’ worth of data. This data is only counts; I didn’t collect any data on the content of these emails. Et cetera.

How did I collect this data? Unfortunately, manually. I created a simple 3-field Google spreadsheet form. Every morning, I look through my Inbox and Trash folders, count up the 3 categories of emails above for the previous day, and fill in the form. Now that it’s summer, I’m sitting in front of a computer far less than during the semester, so most of my email is read & written on my iPhone. (I love the Dragon Dictation app.) I’ve read about iPhone usage forensic software, the name of which I no longer recall, which I’d love to use. But when I read about it, I looked into it, & I recall it being about $700, which is approximately 2 orders of magnitude more than I’m willing to spend, given that this data collection is a total whim on my part. I indulge my whims, perhaps too often, but not that much. (If anyone has a less expensive iPhone or Mac usage forensics solution, I’d love to hear about it.)

So: Obviously Replied will be a subset of Read, which will be a subset of Received. The question is, what are the percentages of these subsets?
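
For anyone who wants to run the same arithmetic on their own tallies, here is a minimal sketch in Python of how those subset percentages could be computed. It assumes one row of three counts per day; the names `DailyCount` and `subset_percentages` are made up for illustration, and the numbers at the bottom are placeholders, not the counts from the actual spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class DailyCount:
    """One day's tallies. Field names are illustrative, not from the original form."""
    received: int
    read: int
    replied: int

    def __post_init__(self):
        # Replied is a subset of Read, which is a subset of Received,
        # so a row that violates that ordering is a counting mistake.
        if not (0 <= self.replied <= self.read <= self.received):
            raise ValueError("expected replied <= read <= received")


def subset_percentages(days):
    """Return the three percentages, computed over the totals for all days."""
    received = sum(d.received for d in days)
    read = sum(d.read for d in days)
    replied = sum(d.replied for d in days)

    def pct(part, whole):
        # Guard against an empty log or a day range with no email at all.
        return 100.0 * part / whole if whole else 0.0

    return {
        "read_of_received": pct(read, received),
        "replied_of_received": pct(replied, received),
        "replied_of_read": pct(replied, read),
    }


# Placeholder numbers only, NOT the counts from the actual spreadsheet.
sample = [DailyCount(30, 15, 3), DailyCount(20, 10, 1)]
print(subset_percentages(sample))
```

Computing the percentages over the totals, rather than averaging each day's percentage, keeps a quiet weekend day from counting as much as a busy Thursday.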

I’ve collected data on this for 2 weeks now. Admittedly, 2 weeks isn’t much time… and anyway it’s summer, so email traffic is down in the halls of academia (though not as much as I’d wish). But I wasn’t going to wait until August to collect this data. So anyway, here it is. As my colleague Chuck says: A little mediocre data is better than no data at all.

The first thing to notice is that Thursdays are the big email day. I would have expected Monday, to be honest, but apparently I’d have been wrong. I never could get the hang of Thursdays.

The second thing to notice is that, predictably, weekend email volume is way down. Also, 5/30 was Memorial Day, which is why that day, though a Monday, had weekend-level numbers.

This data indicates that I receive approximately 30 email messages per weekday, ± approximately 10. I read approximately 50% of those, which honestly is a higher percentage than I had expected. I reply to less than 10% of the messages I receive, less than 20% of the messages I read.

As an aside, Paul reports that he gets “about 100+ new messages, post-spam filtering, hit my inbox every 12 hours.” So clearly I shouldn’t complain… I’m at a third of his volume. But then, I’m not as much of a network hub as Paul. Obscurity has its compensations, clearly.

Looking at this data the other way ’round, I delete unread approximately 50% of the emails I receive. So clearly I need better spam filters and a better killfile (or the Exchange equivalent). I’m not spending a lot of time deleting all those emails, but even a little time is too much. Also, more than 80% of the emails I read do not require a reply. Clearly much of the material in the email I receive could just as easily be handled by RSS feeds. I’ve moved as many of my mailing lists and table of contents subscriptions to RSS feeds as I can, over the past few years, so those are off the table. What’s left is, I’ll be honest, mostly institutional announcements, discussions among my colleagues (other than Paul, as of 3 days ago), and conversations with students. I don’t have much hope that the first 2 will move off email any time soon. I can do better with the third.

So there you have it: 2 weeks of email data. Problematic though it may be. This is perhaps just the start: I’m interested in how I use other media as well. Paul writes that “Twitter always returns more value for time spent for me than email.” I’m not actually sure that’s the case for me. But that’s fine: the point here is not to emulate Paul (worthy goal though that may be), but to look critically at my own media use. The problem is, I’m not sure how to collect data on my Twitter (or Facebook or whatever) use: not just what data to collect, but logistically, how to collect it… cf. previous comment about iPhone usage forensic software. Suggestions, anyone?

1 Comment

Mr. Gunn
13 March 2012
Hi Jeffrey, I was just reading your copyfight story and saw this link. If you’re still looking for something to self-archive and analyze your Twitter, FB, and Google+ usage, try http://thinkupapp.com/ I have been using it for a little over a year and I think it’s really useful. You do have to install it on a server, but I’d be happy to host an installation on my server since I’ve already gone through setting it up for myself, if you don’t have the time/inclination/ability to do it yourself.

Thanks for sharing your copyfight story.
