Cyber Security

Decoding Web3: Navigating the Crucial Role of Infrastructure - A Case Study

Uncover the Billion Dollar Web3 Exploit: Learn how overlooking Web2 security in blockchain infrastructure led to massive losses. Explore the case study with insights on vulnerabilities, server practices, and preventive measures.

Written by Albert Heinle
“Blockchain security is usually associated with smart contracts or cryptography attacks. In most cases, people do not consider the security of the actual servers of the network as part of blockchain security. In fact, even bug bounty programs tend to ignore a major component of the network — the validators.” - Ernst, E. (2023, November 21). The Billion Dollar Exploit: Collecting Validators Private Keys via Web2 Attacks 

Web3 = Web2 plus smart contracts. At CoGuard, we have long held the belief that Web3 relies on Web2 cloud services, which allow blockchain nodes to operate without their operators creating or investing in their own storage and computing infrastructure. Nodes can be spun up and run at low cost on an as-needed basis. But if you are running Web3 workloads on Web2 infrastructure and forget about securing the Web2 portion, you are leaving your systems vulnerable.

There are numerous examples of providers, DApps, and companies where breaches and thefts were the result of security failures in the underlying servers and infrastructure. To drive the point home, here is how the researchers summarize the impact of what they found:

“[The vulnerabilities that] we discovered and exploited during our research allowed us to gain full control, run code and extract private keys of hundreds of validators on multiple major networks, potentially leading to direct losses equivalent to over one billion dollars in cryptocurrencies such as ETH, BNB, SUI, APT and many others”

What are we going to do about it?

The authors described the different ways in which they identified vulnerabilities and exploited the weaknesses they discovered on the servers. It reads like a case study in how not to do infrastructure, and that is how we are going to treat it.

Start by reading the article

We are going to go through the steps taken by the researchers and analyze how the exploit could have been avoided by following proper infrastructure practices. 

Part 1: A Server and a Port

“InfStones’ website at the time claimed that their validators are 100% secure”

InfStones proves that a guarantee of 100% secure systems is almost impossible to keep. Such claims are often an example of over-promising in marketing materials and under-delivering in reality. Good security comes from understanding the known risks and being prepared for the unknown ones. The unknown risks stem from the uncertainty introduced by changing software, fallible humans, and bad luck, which makes it near impossible to guarantee 100% security.

Started from the Bottom

In the article, it all started with a server that had a port open to the public, namely 55555. So far nothing unusual, except that normally people would put it behind a load balancer of some kind.

It appears that the team was able to access it directly without authentication, and that the service answering on it, tailon, was running as root.

Generally, any service that you use should be vetted for security, especially when you expose it to the public. A simple CoGuard scan of the Tailon repository discovers that the user in the Dockerfile is root (see our previous article on the exploitability of that), along with other mishaps, such as not pinning the versions of installed packages for better security tracking.

That scan should have led the developers to create a custom Docker image of tailon, where these shortcomings are fixed.
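A custom image of that kind could look like the following minimal sketch. The base image tag, the package version pin, the exposed port, and the log path are all illustrative placeholders, not InfStones' or tailon's actual configuration:

```dockerfile
# Hypothetical hardened tailon image; versions and paths are placeholders.
FROM python:3.11-slim

# Pin the package version so security tracking knows exactly what is deployed
# (1.4.3 is a placeholder; pin whatever version you have actually vetted).
RUN pip install --no-cache-dir tailon==1.4.3

# Create and switch to an unprivileged user instead of running as root
RUN useradd --create-home --shell /usr/sbin/nologin tailon
USER tailon

EXPOSE 8080
# Bind address and log file are placeholders for the operator's setup
CMD ["tailon", "-b", "0.0.0.0:8080", "/var/log/app/app.log"]
```

The two comments mark exactly the findings from the scan: an explicit non-root `USER`, and a pinned package version.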

During that investigation, the principle of “nothing accessible from the internet should run without authentication” should have applied. A simple read of the tailon manual reveals:

“By default, tailon is accessible to anyone who knows the server address and port. Basic and digest authentication are under development.”

This should make infrastructure engineers put an API layer in front of it, if not disregard it entirely as an option for a production system. After all, the functionality it provides is rather simple.

Warning: Exposing Linux commands directly through an API call, with caller-provided input parameters, is asking for remote code execution. Despite the Linux philosophy of “make each program do one thing well”, many basic commands allow command execution: sed, used here, is one example; find is another.
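To see why handing user-controlled arguments to such commands is dangerous, consider these two harmless-looking invocations. Both echo strings here, but an attacker who controls the arguments can substitute any command:

```shell
# find: the -exec option runs an arbitrary command for every match
find /tmp -maxdepth 0 -exec echo "arbitrary code ran here" \;

# GNU sed: the 'e' command executes its argument in a shell
echo "some log line" | sed '1e echo arbitrary code ran here too'
```

So even a "read-only log viewer" built on these commands is one crafted parameter away from running attacker code on the host.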

Let us assume that the engineers have, for some reason, not given up on tailon yet, and went the API route after fixing all the Docker issues first. Say they closed the server off from the outside world and made access possible only via Amazon's API Gateway (there are many other approaches, of course, but let us stick with that one as an example). While setting up the gateway, they scan their settings again (using e.g. CoGuard), and it would recommend, among other things, two items:

  1. The terraform_aws_api_gateway_authorization_method_not_none rule in CoGuard would ensure that they have authentication enabled.
  2. The terraform_aws_api_gateway_enable_waf rule in CoGuard would ensure that they install a web application firewall (WAF) in front of it, which in turn makes them specify the exact types of requests they would let through (WAF features list).
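In Terraform, the two recommendations boil down to a few lines. The following is a hypothetical sketch; all resource names and references are placeholders, not InfStones' actual configuration:

```hcl
# 1. An API Gateway method whose authorization is anything but "NONE"
resource "aws_api_gateway_method" "tail_logs" {
  rest_api_id   = aws_api_gateway_rest_api.ops.id
  resource_id   = aws_api_gateway_resource.logs.id
  http_method   = "GET"
  authorization = "AWS_IAM"
}

# 2. A WAF web ACL attached to the deployed stage, so requests are
#    filtered before they ever reach the backend
resource "aws_wafv2_web_acl_association" "ops_stage" {
  resource_arn = aws_api_gateway_stage.prod.arn
  web_acl_arn  = aws_wafv2_web_acl.ops.arn
}
```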

In addition, CoGuard would flag that they may not have a CI/CD pipeline configured in their repository (cluster_no_ci_cd_tool_used). A pipeline ensures automation, and the expectation is that scanners are activated and change and deployment management is handled there.

All in all, this would ensure that:

  • the service is not accessible via the internet directly, 
  • authentication is enabled, and 
  • requests are filtered so that only the intended ones are let through.

And this is not magic. This is (or should be) normal Web2 infrastructure knowledge.

Part 2: The Proxy

When the authors looked at the other servers that had port 55555 open, they noticed that most of them required Basic Auth.

Usually, when we hear Basic Auth, we think: shoot, likely no credential rotation. It appears that in this case, it was even worse: infstones:infstones was the credential. Basically, a fancier admin:admin.

This is also an indicator that they were not using something proven, such as an API gateway, but rather a custom-made solution.

First, they did not have auth enabled on all servers. This means that they were likely not deploying via an automated method, but rather something manual. The cluster_no_ci_cd_tool_used check in CoGuard would have indicated that.

The fact that they had an endpoint that allowed any NODE_IP to be entered without checks is something no checking tool out there could have protected them from. This is just poor design. There should be a whitelist of some kind that checks the input, as one could configure when using API Gateway. If the settings were scanned, one would not even be allowed to use basic authentication at all, and would be forced into proper key creation and rotation.
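One way to enforce such a whitelist at the edge is an API Gateway resource policy that denies every request whose source IP falls outside an allowed range. The sketch below is hypothetical; the resource names and the CIDR range are placeholders:

```hcl
# Hypothetical resource policy: only requests from a whitelisted
# address range ever reach the API.
resource "aws_api_gateway_rest_api_policy" "ip_whitelist" {
  rest_api_id = aws_api_gateway_rest_api.ops.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Deny"
      Principal = "*"
      Action    = "execute-api:Invoke"
      Resource  = "arn:aws:execute-api:*:*:*"
      Condition = {
        # 203.0.113.0/24 is a documentation range; substitute the
        # real set of allowed node addresses
        NotIpAddress = { "aws:SourceIp" = ["203.0.113.0/24"] }
      }
    }]
  })
}
```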

Part 3: The Additional Fun

According to the dWallet Labs team, they found AWS credentials files on the servers, used to access S3 buckets.

This is an architectural flaw, but access to those credentials would have been prevented if the default user had not been root.

The code samples are something that the developers at InfStones have to be held accountable for, and nothing automated (to date) would catch those mistakes. However, if they were already accustomed to seeing some rules for other services, they would know that:

  • Allowing localhost to bypass auth is never a good idea (example CoGuard rule for Postgres: postgres_do_not_allow_trust_auth.)
  • Never allow connections from anywhere; instead, whitelist a specific set of IPs (example rule: terraform_gcp_compute_firewall_source_ranges).
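The second rule translates directly into configuration. Here is a hypothetical GCP firewall rule in Terraform; the rule name, network, port, and CIDR range are placeholders:

```hcl
# Hypothetical GCP firewall rule: instead of allowing 0.0.0.0/0,
# restrict ingress to a known set of address ranges.
resource "google_compute_firewall" "validator_api" {
  name    = "validator-api-ingress"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["55555"]
  }

  # This is the field that terraform_gcp_compute_firewall_source_ranges
  # inspects; 203.0.113.0/24 is a documentation range used as a placeholder
  source_ranges = ["203.0.113.0/24"]
}
```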

But then again, even here, they should have put the API behind a load balancer so that they could use the benefits of AWS WAF.

Parting Comments

Did you notice that the exploits did not involve a single Common Vulnerabilities and Exposures (CVE) entry?

Yes, someone should file a CVE for tailon based on the fact that code execution is possible, but since it is such a small project, no one has bothered. This is a sign that good security practices should not rely blindly on CVE reports in isolation.

The issues that the dWallet Labs team exploited could have been made harder (or even almost impossible) to exploit if the proper limits described in this article had been implemented in the infrastructure.

Reviewing your infrastructure is as important as reviewing your code. Reviewing the infrastructure, deployment processes, and configurations needs to be part of an improved security process. At CoGuard, we believe that this begins by building an inventory of infrastructure and automating the scans for misconfigurations and security best practices. Scan your IaC code and cloud infrastructure in 2 quick commands:

pip install coguard-cli
coguard folder ./

It is a challenge to be 100% secure, but the goal is to do your best in an automated process at each of the layers. This can help reduce your attack surface. And we’re happy to chat with you about what CoGuard tools find. 

InfStones was lucky to have white-hat hackers on its systems. This could have ended much worse.

Explore a test environment


Check out and explore a test environment to run infrastructure audits on sample repositories of web applications and view select reports on CoGuard's interactive dashboard today.