Why computational methods are better than self-report
- cbihlmeyer
- Mar 2
- 3 min read
I’ve been comfort-watching a lot of House lately, and if you have not watched it, this is going to spoil every episode: House’s mantra, “Everybody lies,” is the “twist” that solves every case in this early-aughts medical drama.
Most often, when researchers want to know about something, they use surveys, or self-report measures – how often did you do this? How do you feel about that? To what extent do you agree that this other thing is reasonable?
This methodology of asking people questions is a pretty good way to understand how people feel about a topic. Especially because surveys are (usually) anonymous per protocol, we expect people to be (mostly) honest about how they feel. However, people are not as honest about the things that they do – their behaviors.
Some issues with accuracy in self-reported behaviors include:
- Inaccurate recollection (just don’t remember)
- Social desirability (answering in a way that makes you look better)
- Salience (we remember things better when they carry special importance or meaning)
Let’s apply this to a research example: how to define a behavior that would represent the “success” of recipients of competitive awards or professional development programs.
Which of those accuracy factors listed above could most impact how people respond? And how can I overcome this survey bias?
People might remember only the connections that were important or personally significant to them. You might not remember the connection request you got from someone you hardly talked to, but you will probably remember trying to connect with the CEO of a company or a famous researcher. You will probably also remember a friend you made and how much you wanted to stay in touch.

During a given period, such as an internship at a prestigious firm or a project funded by some agency, you might accept or extend many LinkedIn connection requests. Particularly if you are in a temporary, high-status position, like a professional development program or grant award, you will make a lot of new connections. Not all of them will feel significant at the time, and some may lose significance after the award ends.
To overcome the error of inaccurate memory, we can think about computational methods to measure observational data.
It would be impossible to literally observe each person’s LinkedIn behavior in real time over an entire award period, let alone for an entire pool of research participants. However, with user-controlled data, each user can download a record of their own digital behavior on the platform.
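As a rough sketch of what working with that download can look like, here is a small Python parser for the connections file. My own analysis is in R, so this is only an illustration, not my actual code. It assumes the file is the `Connections.csv` from LinkedIn’s data download, that a few lines of notes may sit above the real header row, and that the `Connected On` column uses dates like `02 Mar 2023`. Check your own export, since the layout may differ. The sample rows are made up.

```python
import csv
import io
from datetime import datetime

# Assumed date format of the "Connected On" column, e.g. "02 Mar 2023".
DATE_FORMAT = "%d %b %Y"


def load_connections(text: str) -> list[dict]:
    """Parse the text of a LinkedIn Connections.csv export into row dicts.

    Assumption: the real header row is the first line containing "Connected On";
    anything above it (such as a "Notes:" preamble) is skipped.
    """
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if "Connected On" in line)
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    rows = []
    for row in reader:
        # Convert the connection date string into a datetime.date.
        row["Connected On"] = datetime.strptime(
            row["Connected On"].strip(), DATE_FORMAT
        ).date()
        rows.append(row)
    return rows


# Made-up sample in the assumed export layout.
SAMPLE = """Notes:
This preamble line is an assumption about the export layout.

First Name,Last Name,URL,Email Address,Company,Position,Connected On
Jane,Doe,,,Example University,Research Assistant,15 Sep 2021
John,Roe,,,Example Corp,Analyst,02 Mar 2023
"""

connections = load_connections(SAMPLE)
print(len(connections), connections[0]["Connected On"])
```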
So. I used myself as a research guinea pig.
As a Fulbright Foundation alum, I am a recipient of a highly competitive research grant. Using my own LinkedIn data, I asked, “How many connections with professionals in my field did I add on LinkedIn during my Fulbright?”
| Behavior | Action Element | Target | Context | Time |
| --- | --- | --- | --- | --- |
| Networking | Add connections | With professionals in your field | On LinkedIn | During the time of your award period? |
Here, you can see my weekly new connections on LinkedIn across my entire time on the platform:

Above, you can see that my new connections rarely exceed 5 per week. There is a noticeable spike in late 2022, with almost 30 connections in one week. This was not during my grant year!
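A weekly count like the one in the figure can be sketched by bucketing each connection date into the week it falls in. This Python sketch assumes weeks start on Monday (my R code may define weeks differently) and fills in weeks with zero connections, so that quiet weeks are not silently dropped when you later look at averages or peaks. The dates would come from the `Connected On` column of the export; the ones below are illustrative only.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week that contains d."""
    return d - timedelta(days=d.weekday())


def weekly_counts(dates: list[date]) -> dict[date, int]:
    """Count connections per week, keyed by week start, including empty weeks."""
    if not dates:
        return {}
    counts = Counter(week_start(d) for d in dates)
    first, last = min(counts), max(counts)
    result = {}
    week = first
    while week <= last:
        result[week] = counts.get(week, 0)
        week += timedelta(weeks=1)
    return result


# Illustrative dates, not real data.
example = weekly_counts([date(2021, 8, 2), date(2021, 8, 4), date(2021, 8, 16)])
for week, n in example.items():
    print(week, n)
```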
My Fulbright grant period was August 1, 2021 – August 1, 2022.

The connections I added during this period do not deviate from my non-award years: I add about 2 connections per week, and my busiest networking week added 4 new connections.
To answer the “action element” part of the behavioral analysis model (how many connections did I add during the grant award period?), the total was 28.
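Restricting the data to the grant period and summarizing it can be sketched like this in Python. The window uses the grant dates given above; treating both August 1 dates as inclusive is my assumption, as is counting weeks in 7-day blocks from the start date. The same function can be pointed at a non-award year to check whether the weekly pace differs. Running it on the made-up dates below will of course not reproduce my total of 28.

```python
from collections import Counter
from datetime import date

# Grant period from the post; inclusive endpoints are an assumption.
GRANT_START = date(2021, 8, 1)
GRANT_END = date(2022, 8, 1)


def summarize_window(dates: list[date], start: date = GRANT_START,
                     end: date = GRANT_END) -> dict:
    """Total, mean per week, and busiest week for connections in [start, end]."""
    inside = [d for d in dates if start <= d <= end]
    # Number of 7-day blocks needed to cover the window (the last may be partial).
    n_weeks = (end - start).days // 7 + 1
    # Bucket each date by which 7-day block (counted from start) it falls in.
    per_week = Counter((d - start).days // 7 for d in inside)
    return {
        "total": len(inside),
        "mean_per_week": len(inside) / n_weeks,
        "max_in_a_week": max(per_week.values(), default=0),
    }


# Illustrative dates only.
sample_dates = [date(2020, 1, 1), date(2021, 8, 1), date(2021, 8, 3),
                date(2022, 8, 1), date(2022, 8, 2)]
print(summarize_window(sample_dates))
```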

The figure above considers the “target” part of the action-target-context-time behavioral analysis model. Did I make connections with professionals in my field?
That is hard to determine with the data provided: LinkedIn’s export lists each connection’s position as it currently appears on their profile when I download the data.
We can see the name of the companies where the connections I made during my award period work today, but not where they worked during the time I am trying to investigate.
For example, “Educational Service Center of Central Ohio” is my friend Claire – but that was not her company at the time we connected on LinkedIn, during our grant year together.
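A company tally like the one behind the “target” figure can be sketched by counting connections from the grant window by their `Company` value. This Python sketch assumes rows shaped like the parsed export (a `Company` string and a `Connected On` date), and the rows below are made up. The docstring repeats the key caveat: the company is whatever the person lists at download time, not where they worked when we connected.

```python
from collections import Counter
from datetime import date


def companies_in_window(rows: list[dict], start: date,
                        end: date) -> list[tuple[str, int]]:
    """Count connections made between start and end (inclusive) by Company.

    Caveat: in the export, Company reflects each person's profile at download
    time, so this shows where connections work now, not where they worked then.
    """
    counts = Counter(
        (row.get("Company") or "").strip() or "(blank)"
        for row in rows
        if start <= row["Connected On"] <= end
    )
    return counts.most_common()


# Made-up rows for illustration.
rows = [
    {"Company": "Example School District", "Connected On": date(2021, 9, 1)},
    {"Company": "Example School District", "Connected On": date(2022, 1, 10)},
    {"Company": "", "Connected On": date(2021, 10, 5)},
    {"Company": "Example Corp", "Connected On": date(2023, 2, 1)},
]
print(companies_in_window(rows, date(2021, 8, 1), date(2022, 8, 1)))
```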
Computational methods can account for some gaps in memory, like inaccurate recollection or simply not remembering all of my connection behaviors.
However, there are methodological trade-offs, such as a digital footprint that changes over time: a connection’s current job replaces the one they held when we connected.
How can these limitations be overcome? Can multiple variables be combined (a process called feature engineering) to give a better picture of my network’s professional fields?
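One simple form of feature engineering here would be to combine the `Company` and `Position` text into a single derived “field” variable. This Python sketch uses hypothetical keyword lists as placeholders, not a validated coding scheme, and matches them as whole words. Because both columns come from the download-time profile, the derived feature still inherits the limitation described above.

```python
import re

# Hypothetical keyword lists; replace with terms that fit your own field.
FIELD_KEYWORDS = {
    "education": {"education", "educational", "school", "teacher", "university"},
    "research": {"research", "researcher", "scientist", "lab", "phd"},
}


def infer_field(company: str, position: str) -> str:
    """Derive a rough field label from the combined Company and Position text."""
    words = set(re.findall(r"[a-z]+", f"{company} {position}".lower()))
    for field, keywords in FIELD_KEYWORDS.items():
        if words & keywords:  # any keyword appears as a whole word
            return field
    return "other"


print(infer_field("Educational Service Center of Central Ohio", "Program Manager"))
print(infer_field("Example Corp", "Research Scientist"))
print(infer_field("Example Corp", "Sales Associate"))
```

Keyword rules are crude, but they are transparent and easy to audit by hand, which matters when the underlying data is already noisy.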
See my full analysis and code on GitHub to re-create this analysis in R using your own user-controlled LinkedIn data!