WSJ Data Transparency Code-a-Thon

Problem statement:

Alessandro Acquisti has discussed the tension between two frames users have of the internet: that it is “free,” and that it is not free at all, because users “pay” by subjecting their behavior to ubiquitous, redundant, and robust tracking and advertising.

The “free” internet is an example of “deceptive framing,” the presentation of an “incomplete and biased representation of a decision problem that misleads their perception and analysis of the problem, and thereby misleads their entire decision-making process.” Deceptive framing enables one to “present a narrow way of thinking that focuses on only one or a few aspects of a more complex decision problem…” [1]

To help shift this frame, we developed a simple counter for the browser that calculates the value of ad impressions that the user experiences. In Acquisti's terms, we wish to change the "frame" from "free" to one where the user better understands the value exchanges behind their use of the web.

Our approach to estimating cost per thousand impressions (CPM) is based on the website visited, the type of advertisement, the availability of demographic information about the user, and behavioral data.
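
Because CPM is priced per thousand impressions, the dollar value of a single impression is the CPM divided by 1,000; this is why a $11.50 CPM appears in the code as 0.0115. A minimal sketch of that conversion follows; the helper name `cpmToPerImpression` is hypothetical and not taken from priceoffree.js.

```javascript
// Hypothetical helper (not from priceoffree.js): convert a CPM, quoted in
// dollars per 1,000 impressions, into dollars per single impression.
function cpmToPerImpression(cpmDollars) {
  return cpmDollars / 1000;
}

// Tier 1 base CPM of $11.50 becomes 0.0115 dollars per impression.
console.log(cpmToPerImpression(11.5)); // 0.0115
```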

What is significant:

Ultimately, we hope that as users understand how much value their browsing generates, this will lead to a choice in which individuals can opt to pay sites directly instead of having an ad-supported experience.


Our method for estimating CPM considers (in order) the following factors:

1) We start with the relevance and popularity of the website (pubTier). Different websites can command different CPMs, so we have divided the top 10,000 websites into three tiers. Tier 1 sites command the highest CPMs (between $7 and $15) because of their relevance and professionally developed, high-quality content. Generally speaking, Tier 1 sites are the most popular sites on the internet (roughly correlating with the Quantcast top 50). We assign Tier 1 sites a base CPM of $11.50 (expressed in the code as 0.0115, the value of a single impression). Tier 2 sites are high-quality sites that sell their ad inventory through a combination of direct sales and ad networks, and thus receive lower, blended CPMs (between $2 and $7). Tier 2 roughly correlates with the Quantcast 51-250. We assign Tier 2 sites a $4.50 base CPM. Tier 3 sites are either less popular or populated with user-generated content. These sites carry remnant advertising, commanding the lowest CPMs (between $0.20 and $2). We assign Tier 3 sites a $0.40 base CPM. [ COMPLETED ]

2) Whether the website is likely to have demographic information about the user (DomainInferences). The more the website “knows” about the user, the more it can charge for targeted advertising. Thus, knowing the user’s age adds $0.50 to the CPM, as does knowing the user’s gender ($0.50) or household income (HHI) ($0.50). [ COMPLETED ]

3) Whether the website is likely to know that the user is “in-market,” that is, ready to purchase an item (adds $2.50). [ COMPLETED ]

4) Whether the website knows the user’s “interests,” such as golf or snowboarding (adds $1.00), based on prior browsing behavior (visiting or [ TBD ]

5) Whether the user sees high-quality, pre-roll video advertising (adType). The CPM for this advertising is multiplied by 2.
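
The five steps above can be combined into a single estimate. The sketch below is one reading of them, not the project's actual code. The input field names (`hasAge`, `inMarket`, `isPreRollVideo`, and so on) are hypothetical. Step 4 is included even though it is still marked TBD. The video multiplier is assumed to apply to the whole CPM after the additive bumps, not to the base CPM alone.

```javascript
// Sketch of the CPM estimate in steps 1-5. Field names are hypothetical;
// priceoffree.js may structure its inputs differently.

// Step 1: base CPM (dollars per 1,000 impressions) by publisher tier.
const BASE_CPM = { 1: 11.5, 2: 4.5, 3: 0.4 };

// Steps 2-4: additive bumps for what the site is likely to know about the user.
const DEMOGRAPHIC_BUMP = 0.5; // applied separately for age, gender, and HHI
const IN_MARKET_BUMP = 2.5;
const INTEREST_BUMP = 1.0;    // step 4 is still TBD in the project

// Step 5: pre-roll video multiplier (assumed to apply to the full total).
const VIDEO_MULTIPLIER = 2;

function estimateCpm(ad) {
  const base = BASE_CPM[ad.pubTier];
  if (base === undefined) {
    throw new Error(`unknown pubTier: ${ad.pubTier}`);
  }
  let cpm = base;
  // Each known demographic attribute adds the same bump.
  for (const known of [ad.hasAge, ad.hasGender, ad.hasIncome]) {
    if (known) cpm += DEMOGRAPHIC_BUMP;
  }
  if (ad.inMarket) cpm += IN_MARKET_BUMP;
  if (ad.hasInterest) cpm += INTEREST_BUMP;
  if (ad.isPreRollVideo) cpm *= VIDEO_MULTIPLIER;
  return cpm;
}

const example = {
  pubTier: 2, hasAge: true, hasGender: true, hasIncome: false,
  inMarket: true, hasInterest: false, isPreRollVideo: false,
};
console.log(estimateCpm(example)); // 8  (4.5 + 0.5 + 0.5 + 2.5)
```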

Users can adjust these variables by editing the priceoffree.js file, or substitute their own method entirely by modifying adRevenueModel.
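
To illustrate swapping in a different method, the sketch below assumes that `adRevenueModel` is a function mapping one impression to its dollar value. The real shape of `adRevenueModel` in priceoffree.js may differ, and `tierModel`, `flatModel`, and `totalValue` are hypothetical names. The tally code never changes; only the model passed in does.

```javascript
// Assumption: adRevenueModel is a function (impression) => dollars.

// Simplified default-style model: tier base CPM only, converted per impression.
const tierModel = (ad) => ({ 1: 11.5, 2: 4.5, 3: 0.4 }[ad.pubTier] ?? 0) / 1000;

// A user's alternative assumption: every impression is worth a flat $2.00 CPM.
const flatModel = (_ad) => 2.0 / 1000;

// Total value of a list of impressions under whichever model is plugged in.
function totalValue(impressions, adRevenueModel) {
  return impressions.reduce((sum, ad) => sum + adRevenueModel(ad), 0);
}

const session = [{ pubTier: 1 }, { pubTier: 3 }, { pubTier: 3 }];
console.log(totalValue(session, tierModel).toFixed(4)); // "0.0123"
console.log(totalValue(session, flatModel).toFixed(4)); // "0.0060"
```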

Who benefits:

Internet users who are trying to understand the value of their attention and time on the internet.

What is your sustainability model:

We think that a number of current browser plugins would welcome The Price of Free as a feature. Furthermore, users can customize the methods described above and change the assumptions of the revenue models analyzed.

What license is it available under:

We are using the same license as PrivacyBucket, as much of our project is based upon their infrastructure.


Contact person

Chris Hoofnagle (

Your team:

Chris Petro
Michael Pisula (Xaxis)
Andrew Sudsbury (Abine)
Chris Conley (ACLU of Northern California)
Chris Hoofnagle (UC Berkeley Law)

Whom is it for:

People who use the internet

Your goal for this weekend:

Something that goes “cha-ching” when the counter adds up to $1 in ad revenue :)
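
One way to implement this goal is a running total that fires a callback each time it crosses another whole dollar. This is a hypothetical sketch (`PriceOfFreeCounter` and `addImpression` are illustrative names) that assumes each impression arrives with an already-estimated CPM. It stores the total as integer millionths of a dollar to avoid floating-point drift when summing many tiny per-impression amounts.

```javascript
// Hypothetical running counter: calls onChaChing(n) each time the cumulative
// ad value crosses the n-th whole dollar.
class PriceOfFreeCounter {
  constructor(onChaChing) {
    this.microDollars = 0;        // total in millionths of a dollar (integer)
    this.onChaChing = onChaChing; // receives the whole-dollar mark just crossed
  }

  addImpression(cpmDollars) {
    const before = Math.floor(this.microDollars / 1e6);
    // CPM / 1,000 = dollars per impression; * 1e6 = micro-dollars, rounded.
    this.microDollars += Math.round((cpmDollars / 1000) * 1e6);
    const after = Math.floor(this.microDollars / 1e6);
    // One impression could in principle cross several dollar marks at once.
    for (let d = before + 1; d <= after; d++) {
      this.onChaChing(d);
    }
  }

  get dollars() {
    return this.microDollars / 1e6;
  }
}

const dings = [];
const counter = new PriceOfFreeCounter((d) => dings.push(d));
// 87 Tier 1 impressions at the $11.50 base CPM: 87 * $0.0115 = $1.0005.
for (let i = 0; i < 87; i++) counter.addImpression(11.5);
console.log(dings); // [ 1 ]
```

In a browser extension, the callback is where the cha-ching sound would be played.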

Your starting point:

We think many groups here have aspects of their project that could be integrated with this feature.

Limitations of our methods:

There are important limitations to our approach. We can only approximate the value that flows from users’ interactions with websites, and here we focus on the narrow issue of CPM. Of course, different websites have different revenue models, or a mix of revenue models including cost-per-click (CPC) and cost-per-acquisition (CPA). Some websites run house advertising, or no advertising at all. Others sell the entire site or the homepage to a single advertiser for a premium. Our model cannot address these variations.

A number of factors make it very difficult to determine the full value of these interactions. At the most basic level, the value of user information is entirely in the control of the businesses that capture it from the consumer.[2] A website can decide to use user data only for its own purposes or to sell it to third parties.

Websites and their associates also control the security of the data: they can choose to overinvest or underinvest in protections. If stolen, the data can take on an entirely different value and can harm the consumer directly.

Furthermore, businesses sometimes buy data about their own users, a practice known as “appending” or “enhancing” data. Once a consumer provides any information, even just a name and ZIP code, the merchant can buy other data about the consumer that she chose not to reveal.[3] Thus, a merchant can add value to the information provided, out of the consumer’s view and to the detriment of the consumer’s privacy.

[1] David M. Boush, Marian Friestad & Peter Wright, Deception in the Marketplace: The Psychology of Deceptive Persuasion and Consumer Self-Protection 62-64 (Routledge, New York 2009).
[2] Oscar H. Gandy Jr., Coming to Terms with the Panoptic Sort, in D. … 132–155 (Univ. of Minnesota Press 1996) (“It is not the use to which the data subject might put the information that generates meaningful consequences, but the use to which it might be put by others.”).
[3] See, e.g., Pineda v. Williams-Sonoma Stores, Inc., 246 P.3d 612, 615 (Cal. 2011) (“Plaintiff visited one of defendant's California stores and selected an item for purchase. She then went to the cashier to pay for the item with her credit card. The cashier asked plaintiff for her ZIP code and, believing she was required to provide the requested information to complete the transaction, plaintiff provided it. The cashier entered plaintiff's ZIP code into the electronic cash register and then completed the transaction. At the end of the transaction, defendant had plaintiff's credit card number, name, and ZIP code recorded in its database. Defendant subsequently used customized computer software to perform reverse searches from databases that contain millions of names, e-mail addresses, telephone numbers, and street addresses, and that are indexed in a manner resembling a reverse telephone book. The software matched plaintiff's name and ZIP code with plaintiff's previously undisclosed address, giving defendant the information, which it now maintains in its own database. Defendant uses its database to market products to customers and may also sell the information it has compiled to other businesses.”).
