Publishers need to stop using insecure HTTP

The https-checker project hopes to open a dialogue with some of the largest scholarly publishers.

Thomas Hawk via Flickr. Modified by Jen Wike Huger. CC BY-NC 2.0

Academic publishers play a major role in the dissemination of scholarly information. As a society, we need to be able to rely on these publishers to provide information securely, accurately, and with content integrity. We also want to ensure that our personal information (e.g., a site password) is secure, and scholarly publishers have a responsibility to the community to protect our data.

I've been surprised how often scholarly publishers' pages are published as HTTP, which (unlike HTTPS) doesn't encrypt data in transit. Implementing HTTPS has become much easier with initiatives such as Let's Encrypt and Certbot (but I recognize legacy systems can make it more difficult).

As a scholar, I am concerned with content integrity. This is essential when conducting systematic reviews, meta-analyses, or simply reading research and planning new studies. I am also concerned about the security of my and my colleagues' login credentials. Given how often passwords are reused, HTTP-based published pages threaten the security credentials of people visiting scholarly publishers' websites.

In order to hold the disseminators of scholarly information accountable, we need to be able to recognize whether this is a widespread issue and where improvements can be made. For example, Science magazine, one of the most acclaimed journals, apparently considers HTTP good enough and makes no statement about why it has not upgraded. Many other publishers are forgoing the same responsibility towards their users.

Publishers that take a negligent or dismissive position to the situation belittle the security of users and their role in accurate content presentation. In the long run, it will hurt the publishers too: Chrome is starting to label pages as not secure if they use HTTP. Given that users have no choice but to use these sites if the articles are copyrighted, and there is no other way to share the materials, publishers have a significant responsibility to the extended scholarly community.

The https-checker project

The https-checker project aims to address this problem by checking the websites of publishers indexed by CrossRef, the main metadata store for scholarly publications, to get a sense of the overall scope. There are approximately 10,000 members (i.e., publishers) in CrossRef, of which ~7,500 are actively publishing (meaning they published in 2017).

This project began by canvassing the scholarly publishing landscape for publishers that use (and don't use) HTTPS. By identifying the publishers that publish the largest bodies of work in an insecure way, we can start a dialogue with them to improve the situation. Previously, I had a constructive dialogue with Collabra, which upgraded its webpage to default to HTTPS after I contacted them.

The https-checker project has just completed its initial data collection phase. By using pshtt, an open source HTTPS testing tool, and a set of calls to the CrossRef API, it was relatively easy to script an initial canvass of the publishers' security practices.
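A rough sketch of how such a canvass could be scripted is shown below. It is not the project's actual code. It assumes the CrossRef REST API's /members route with rows and offset paging, and it swaps pshtt for a much cruder standard-library check that only looks at where a plain-HTTP request ends up after redirects. The function names are hypothetical, and the publisher's domain is assumed to be known already.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_MEMBERS = "https://api.crossref.org/members"


def members_page_url(offset, rows=100):
    """Build the URL for one page of the CrossRef member list."""
    query = urllib.parse.urlencode({"rows": rows, "offset": offset})
    return f"{CROSSREF_MEMBERS}?{query}"


def fetch_members(offset, rows=100):
    """Fetch one page of members (needs network access)."""
    with urllib.request.urlopen(members_page_url(offset, rows), timeout=30) as resp:
        payload = json.load(resp)
    # Assumption: member records sit under message -> items.
    return payload["message"]["items"]


def defaults_to_https(final_url):
    """True if the URL reached after redirects uses the https scheme."""
    return urllib.parse.urlsplit(final_url).scheme == "https"


def check_domain(domain):
    """Request the plain-HTTP address and report whether we end up on HTTPS.

    A crude stand-in for pshtt, which runs far more thorough checks.
    Needs network access.
    """
    with urllib.request.urlopen(f"http://{domain}/", timeout=30) as resp:
        return defaults_to_https(resp.geturl())
```

Only fetch_members and check_domain touch the network. The other two helpers are pure functions, so the paging and classification logic can be checked offline.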

Publisher status             Publishers
Active, default HTTPS        1,923
Active, not default HTTPS    5,575
Inactive                     2,513

At first glance, 26% of all 7,498 active publishers default to HTTPS. In general, estimates of websites that default to HTTPS range between 10% and 44%. In other words, scholarly publishers seem to be securing their web pages at similar rates when compared to the overall population of websites. Even so, this does not waive their responsibility to improve the situation.
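The 26% figure follows directly from the counts reported above. A minimal check, with variable names chosen here only for illustration:

```python
# Counts as reported in the table above.
active_default_https = 1923
active_not_default_https = 5575
inactive = 2513

# Only active publishers are included in the denominator.
active_total = active_default_https + active_not_default_https
share_default = active_default_https / active_total

print(active_total)            # 7498
print(f"{share_default:.1%}")  # 25.6%
```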

Running a basic logistic regression to try to predict whether publishers default their pages to HTTPS shows that large publishers are more likely to do so. Publishers' publication counts range from one to 1,104,607. Our analysis shows a publisher with only 100 publications since 2017 is estimated to have a 27% chance of using HTTPS by default, whereas one with 1,000 publications since 2017 is estimated to have a 32% chance of using HTTPS by default. Given the average number of publications, the estimated probability of a publisher using HTTPS by default is 31% (median: 25%).
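To make the shape of such a model concrete, here is a small illustration of how a logistic curve turns a publication count into a probability. The article does not state how publication count entered the regression, so the log10 predictor below is an assumption. The intercept and slope are simply backed out from the two estimates quoted above (27% at 100 publications, 32% at 1,000). They are not the project's fitted coefficients.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Probability corresponding to a log-odds value."""
    return 1 / (1 + math.exp(-z))


# Two estimates reported in the text: (log10 publications, P(default HTTPS)).
x1, p1 = math.log10(100), 0.27
x2, p2 = math.log10(1000), 0.32

# Assumed model: logit(p) = intercept + slope * log10(publications).
slope = (logit(p2) - logit(p1)) / (x2 - x1)
intercept = logit(p1) - slope * x1


def predicted_probability(publications):
    """Illustrative probability of defaulting to HTTPS under the assumed model."""
    return inv_logit(intercept + slope * math.log10(publications))


print(round(predicted_probability(100), 2))   # 0.27
print(round(predicted_probability(1000), 2))  # 0.32
```

Under this assumed form, each tenfold increase in publications adds the same amount to the log-odds, which is why the gain from 100 to 1,000 publications is only a few percentage points.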

Moving forward

The https-checker project's next step is opening a dialogue with some of the largest publishers that do not provide HTTPS by default. These conversations will be tracked to get a sense of how active and willing they are to improve the security of their users and the content they serve.

The HTTPS scan can go much deeper than just checking whether a page defaults to HTTPS and uncover other practices that can further improve security. For example, sending an HTTP Strict Transport Security (HSTS) header, and having the domain added to browsers' HSTS preload lists, can help mitigate man-in-the-middle attacks. By using in-depth assessments, we can identify how websites that already default to HTTPS can further improve their security.
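As an illustration of one such deeper check, the sketch below parses the value of a Strict-Transport-Security response header into its directives. The function name and the example header value are hypothetical, and a real assessment tool such as pshtt looks at much more than this.

```python
def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value into a small dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header_value.split(";"):
        directive = directive.strip()
        if not directive:
            continue
        name, _, value = directive.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


example = "max-age=31536000; includeSubDomains; preload"
print(parse_hsts(example))
# {'max_age': 31536000, 'include_subdomains': True, 'preload': True}
```

A site that sends no such header, or one with a very short max-age, still leaves a visitor's first plain-HTTP request open to interception even if it redirects to HTTPS afterward.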

Increased use of secure practices in content transfer on web pages is key to a secure web and truthful information, which ultimately affects users who rely on information published on the internet. Given that misinformation is spreading, it seems like this is low-hanging fruit for 2018.

Chris Hartgerink
I am a Mozilla Science Fellow working to increase the openness of science and the Web.

2 Comments

My immediate question is "Why?"
HTTPS isn't a magic solution, it won't prevent your service provider from being hacked, and if a hacker gets their hands on the server's data, HTTPS won't prevent them from abusing it.

"As a scholar, I am concerned with content integrity."

HTTPS won't help you with that.

"Given how often passwords are reused, HTTP-based published pages threaten the security credentials of people visiting scholarly publishers' websites."

Again, if someone compromises a server where your credentials are stored insecurely, HTTPS won't help you there. And reusing your password is a recipe for failure. Don't demand that others save you from your own faulty practices.

"Publishers that take a negligent or dismissive position to the situation belittle the security of users and their role in accurate content presentation"

That's true regardless of HTTPS. Twitter uses HTTPS. Didn't help them much in protecting their millions of users' passwords, did it?

HTTPS isn't a magic formula for instant security. Running an HTTPS witch hunt will only create a false sense of security and misrepresent sites that have failed to comply but have excellent overall security models.

I'm not in favor of HTTPS-all-the-things but I believe you have some misconceptions there.

"HTTPS isn't a magic solution, it won't prevent your service provider from being hacked, and if a hacker gets their hands on the server's data, HTTPS won't prevent them from abusing it."

True, but it's another layer of defense, which is what security is all about. You shouldn't enable HTTPS and call it a day; you do it as part of your overall security, alongside hashing passwords, setting secure cookies, etc.

"HTTPS won't help you with [content integrity]"

Yes it will! It will help prevent man-in-the-middle attacks.
You can have the most secure website on earth, but if you transmit information in plain text, anyone can see it and possibly modify it in transit.

The rest is more of the same. You seem to think that HTTPS is being proposed as the whole solution, when it's only part of the solution, and it's actually a very important part that shouldn't be neglected.

In reply to Erez Schatz

This work is licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license.