In my view, the most pernicious form of inequality in the developed world is data inequality.
Large corporations and financial institutions have the resources to identify, purchase, and exploit datasets on every imaginable topic. This isn’t news.
At the same time, private individuals, non-profits, and scrappy start-ups often don’t have the money to purchase data they need, the knowledge to know where to find the data they need, the network to access specialized datasets through partnerships, or the technology to collect, process, store, and operationalize publicly available data.
Public interest organizations like NYC Open Data, Sunlight Foundation, and Open Corporates have attempted to enhance government and corporate transparency by launching open-access web portals, which is a step in the right direction. But the fact remains: big business can purchase data about you and me that we don’t even know they have, and which we couldn’t afford to purchase even if we knew where to find it. This is wrong.
That’s why my fund was excited to back Randall Smith’s vision of democratizing access to data by participating in Lago Largo’s seed round.
Our interview with Randall, lightly edited for length and readability, follows.
As usual, if you would like to follow Special Situations please sign up for our email list or follow me on Twitter, @thechadlin.
Who is Randall?
Chad Lin: Thanks for joining us on Special Situations, Randall. Data inequality is an important topic, and it deserves more attention.
Randall Smith: Thank you, Chad. Thank you for inviting me. Data inequality is often misunderstood by stakeholders in government, industry, and media, and it’s important for stakeholders to understand that democratizing access to data generates opportunities for everyone — it’s not a zero sum game.
CL: I love your passion Randall! But can you tell us a bit about yourself before we dive into the weeds?
RS: So at Lago Largo we want to create an environment where consumers have equal access to data. It’ll lower the cost of developing information services products, accelerate innovation, and reduce data inequality.
CL: I know, we all want to be in that lake Randall. And I know you don’t like to talk about yourself, but —
RS: Sure Chad. So, I did my undergraduate at Georgetown where I studied Chinese and Security Studies. After that, I worked at SeeingStone as a Deployment Strategist for two years.
CL: What did you do at SeeingStone?
RS: My team implemented on-site deployments of SeeingStone software for federal government clients.
CL: Federal government clients?
RS: Federal government clients.
CL: So I’m guessing you weren’t working with USPS.
RS: I really can’t comment. It’s kind of a Pete Buttigieg situation, where it sounds kind of more interesting than it actually is.
CL: Got it. So what did you do after that?
RS: I worked in various roles at several US federal agencies for eight years.
CL: Can you tell us more about that?
RS: I helped design, develop, and implement programs for collecting, processing, and operationalizing novel sources of data for US federal agencies and select international partners.
CL: Sounds very secret squirrel. Can you give us any more details?
RS: I was tasked with objectives, I determined how to accomplish them, and then I did what needed to be done.
CL: I love that attitude. When I’m representing our fund on a portfolio company’s board, that’s exactly the dynamic I like to establish.
CL: So turning back to Lago Largo — Data inequality, data equality, what do these terms actually mean? Can you help our readers understand what’s at stake here?
RS: Sure Chad. Let me think on that. Well, your audience is finance, so I’ll use an analogy from that space.
Today, the data sector is pretty similar to the US financial services sector in the 80s. There was a lot of innovation. Private equity funds were making a killing off LBOs. Investment banks and hedge funds were making a killing off of junk bonds and derivatives.
But market participants who represented the little guy, like Citibank and JP Morgan, were left out because of over-regulation. What that meant, Chad, is that elites with the resources to invest in hedge funds, private equity funds, and investment banks generated outsized returns, but hard-working people like our parents were locked out.
CL: And that was unfair.
RS: Exactly. And we’re seeing the same situation today: Silicon Valley is making money off of our data. Hedge funds and private equity funds pay millions for specialized datasets based on ostensibly public information to inform their decision making. But the small seed-stage start-up can’t afford that data. Incumbents don’t have an incentive to sell that data to them even if they could afford it. And you and I, we have no chance at buying that data.
Everyone deserves an equal opportunity to generate value from data that can be legally obtained.
CL: Very interesting Randall. So what’s your plan?
RS: We need to democratize access to information Chad. It’s really that simple. We need to democratize access to innovation in the data and information services space the same way we democratized access to innovation in the finance space.
We did that in the financial services sector through the Financial Services Modernization Act (FSMA) and the Commodity Futures Modernization Act (CFMA). They allowed mom and pop commercial banks to engage in the same sort of lucrative activities as investment banks, hedge funds, etc.
And you know, it wasn’t just retail investors who benefited. Financial innovation in the form of securitized sub-prime mortgages and credit-default swaps drove down the price of housing. Everyone benefited.
CL: That’s a great analogy Randall. Our readers will appreciate that. You know, when I’m listening to a pitch it’s a red flag if the founders can’t weave in a good historical analogy. Good historical analogies demonstrate that you really understand the market and understand the product.
RS: Thanks Chad.
CL: So tell me about Lago Largo.
RS: So after we got our seed funding from In-Q-Tel in 2015, we -
CL: In-Q-Tel — that’s the CIA’s investment arm right? Very secret squirrel.
RS: I mean, it sounds much more interesting than it actually is. They invested in Gitlab, MongoDB…I wouldn’t make any inferences on our relationship to the government based on that.
CL: Sure, okay. Sorry to cut you off there, what were you saying?
RS: So what we want to do is set up a three-sided open data marketplace: (a) sellers of specialized datasets, (b) buyers of specialized datasets, and (c) organizations that specialize in generating insights through the synthesis of disparate datasets.
CL: So like, an AWS for data?
RS: I like to think of it as a bustling port town in a fantasy novel, filled with memorable characters. Where anything can be had for the right price. Like the market in King’s Landing.
CL: That’s familiar and makes me feel comfortable with this concept. Tell me more about who’s going to be participating in this market?
RS: Sure. First, you have the sellers. We have a prominent banking-as-a-service platform supplying anonymized transaction data. We have sellers who crawl social media and search engines to collect data from open sources: social media, public records, etc. We have crypto exchanges sending us transaction data. We have sellers who monitor dark web forums for data leaks and run persona ops to get data about stolen datasets. We have partners who run AI chat-bots with users involved in certain sectors in certain geographies. So on and so forth.
CL: Wow. Wait, so you’re saying some of your sellers actually engage in conversations online with targeted individuals, and then resell that data? You can do that?
RS: The short answer is yes, for now.
You know Chad, when you’re an organization operating at the cutting edge, it’s your responsibility to set norms for activities like persona ops.
Earlier this year the Department of Justice released guidance on dark web intelligence collection. They basically said, don’t use stolen credentials, don’t break the law, and yeah, there’s a risk that your persona might end up being investigated if it’s interacting with people engaged in illicit activity.
CL: So definitely not illegal.
RS: Exactly. We chose In-Q-Tel to lead our seed round because of their experience in guiding innovative organizations through complex regulatory environments.
CL: Okay, so now you have all of this data. What’s a sample use case?
RS: There are two use cases: entity reconciliation and predictive analytics. First, we are gathering enough “anonymous” data that organizations on the data synthesis side of the marketplace are able to reconcile identities, which is very valuable.
Putting a name to the browser cookie, a social security number to the iPhone, etc. Right now, big organizations are able to say “hey, such and such IP address visited my website. I’ll take that IP address, and hey, this iPhone and this IP address are co-located 8 hours a day. Whoa! This iPhone belongs to Jim! Let’s target Jim!”
But we can’t do that Chad. If I’m a blogger, I can’t take an IP address of someone who visited my blog, link it to a geographic location, and then send that person an unsolicited email telling them to sign up for my blog. And that’s just not fair. It’s not Chad.
CL: It’s not Randall. So this is a bit of a shift for me. But it makes sense. What you’re trying to do is replace data privacy with data equality.
RS: Exactly. Privacy is un-democratic Chad. Big companies have to disclose their activities to the SEC and everyone gets that information to make decisions off of. Politicians and lobbyists have to disclose activities that might be of interest to the public.
Or capitalism, really. What if restaurants kept the price of a meal private until you ordered?
We’re trying to democratize access to data.
CL: That makes sense. Who are the buyers?
RS: First, financial services. A lot of people are trying to make money off of buying life insurance policies, you know. We can inform those trades by offering investors of all stripes access to boutique data. For instance, according to our partners in the analytics space, owning a GoPro increases the likelihood you die in the next five years by 5.58%. That’s material to an investor.
RS: And then you have underwriting. FICO is flawed. Next-generation lenders are trying to find alternative data that they can use to more accurately assess risk. They’ve actually been able to widen the pool of eligible borrowers without reducing risk-adjusted returns.
They’re coming to us asking stuff like hey, can you tell us how old the primary vehicle of the person with this social security number is? Hey, can you tell us whether English is the first language of the person with this IP address?
And we say of course!
CL: What are they doing with that?
RS: Two things. Payday lenders are optimizing their ad spend — if you know how old someone’s car is, you can spend more on people with older cars, as those cars are more likely to break down, and one of their big acquisition channels is people who need cash for car repairs. On the other hand, you have underwriters who are trying to price loans, and can use that information to price risk more efficiently.
CL: So you’re saying you can charge people who don’t speak English as a first language more for loans? Doesn’t that violate the Equal Credit Opportunity Act, for discriminating on the basis of national origin?
RS: No, because it doesn’t discriminate on a protected characteristic. Believe it or not, there are Americans of all shades and creeds for whom English is not a first language.
CL: Great observation Randall.
RS: And then you have corporates. My favorite use case with corporates is helping them identify and mitigate data breaches.
CL: So you inform them about breaches?
RS: Our partners in persona ops find brokers advertising stolen datasets every day.
CL: And they interact with them? Do they ever solicit them to steal datasets?
RS: Our partners do whatever is necessary to help our clients understand their threat environments.
CL: But they’re not, say, going to brokers, asking them to hack someone, and then telling the company “hey someone hacked you.”
RS: I don’t want to speak for our partners, but look Chad. Some people might consider that blackmail. I look at it as marketing.
Remember Binoculars Advanced Protection?
CL: Of course! (jangles keychain with FOB dongle).
RS: I wonder how many people bought those dongles because they received an alert from Binoculars warning that they were “targeted by government sponsored attackers.” Did they release any evidence?
I’m not making any accusations. They probably had the data they said they had. All I’m saying is that a small, bootstrapped start-up should be able to access the data they need to engage in the same type of marketing practices as big tech. So your bootstrapped cybersecurity company should also have access to the same data on the activities of state-sponsored attackers as the threat intelligence teams at big tech companies.
CL: That totally tracks for me.
RS: And then you have government. Higher education faces unprecedented challenges, especially in the age of the coronavirus. Why shouldn’t they have access to specialized datasets to help them make decisions about admissions and financial aid? Or identify potential school shooters?
RS: Our biggest growth area right now is actually foreign governments.
RS: Yes. We have international partners who are very interested in answering questions like, “Out of the users of this VPN service, how many signed this petition on Facebook? Or watched this politically charged video on YouTube? Or, based on geolocation data, attended this rally?”
CL: Very interesting. If big tech can do it, why not everyone else?
CL: So, you know, I’m pretty comfortable with post-moral markets Randall. I have to ask though, since some of our readers are probably wondering — is this legal?
RS: Chad, look. I’m a patriot. I believe in the rule of law. I believe in America. And Lago Largo will comply with all applicable federal, state, and local rules, regulations, and laws.
CL: Right, but with PII —
RS: I’m a patriot Chad, and Lago Largo will comply with all applicable federal, state, and local rules, regulations, and laws.
CL: That settles that!
RS: Look Chad. It’s easy to get distracted by all of the things that Lago Largo could do. We need to focus on what we are doing. I’ll leave you with this example.
Chase Mitchell is a 16 year old in Lansing, Michigan. His house suffered several package thefts in less than three weeks.
RS: But Chase didn’t take it lying down Chad. Chase went on GitHub. Chase went on Stack Overflow. Chase learned about machine vision. And Chase built his own doorbell camera. It used facial recognition and racial recognition to identify potential threats to his property while they were still on the sidewalk. It sent SMS alerts to neighbors when individuals who met targeting criteria were close to the house. Without our marketplace, he wouldn’t have had the data he needed to build that tool.
CL: Wow, what an inspirational story. Did the tool work?
RS: So it actually resulted in somewhat of a confrontation, between a postman who met the targeting criteria and some neighbors, but look — we surface wrinkles like that more quickly when we put this technology in the hands of a wider group of users. It’s better to iron out those wrinkles on the block level, than the city level, or the state level, or the federal level. That’s federalism.
CL: That’s true. Democracy is supposed to be a marketplace of ideas, right?
RS: Right. And that ideas marketplace needs an open data marketplace.
CL: Well Randall, thank you. This was a great conversation. I look forward to hearing what happens next with Lago Largo.
RS: Of course! I look forward to querying Lago Largo for what happens next with Special Situations.
Thanks for reading. As always, please sign up for our email list if you enjoyed the interview. If you don’t want another newsletter, follow me on Twitter @thechadlin to get details on future interviews.