Writing C for Curl

daniel.haxx.se

124 points by TangerineDream 3 months ago

kpcyrd 3 months ago

> We count about 40% of our security vulnerabilities to date to have been the direct result of us using C instead of a memory-safe language alternative. This is however a much lower number than the 60-70% that are commonly repeated, originating from a few big companies and projects.

There has been discussion in an Arch Linux internal channel about the accuracy of these classifications. We noticed many advisories contain a "This bug is not considered a C mistake. It is not likely to have been avoided had we not been using C."-disclaimer, but it was unclear what the agenda was and how "C mistake" is defined.

It was brought up because this disclaimer was also present in the CVE-2025-0665 advisory[0], which is essentially a double-free but on file descriptor level. The impact is extremely low (it's more "libcurl causing unsoundness in your process rather than can-be-exploited-into-RCE"), but it's a direct result of how C manages resources. This kind of bug can also occur in Python, but you're unlikely to find this kind of bug in Rust.

Could this bug have occurred with a programming language that isn't C? Yes. Could this bug have been avoided by using a programming language that isn't C? Also yes.

[0]: https://curl.se/docs/CVE-2025-0665.html
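
To make the "double-free but on file descriptor level" idea concrete, here is a minimal C sketch. It is not curl's code, and the helper name `fd_close_once` is hypothetical. It assumes a POSIX system. The idea is to reset the variable to -1 after closing, so that a second call is a no-op instead of closing whatever unrelated descriptor the kernel may have handed out under the same number in the meantime.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from curl): close *fd at most once.
 * Returns 0 if the descriptor was closed by this call,
 *         1 if it was already closed (or never valid),
 *        -1 if close() itself reported an error. */
int fd_close_once(int *fd)
{
    int rc;

    assert(fd != NULL);
    if (*fd < 0)
        return 1;       /* already closed: do nothing */
    rc = close(*fd);
    *fd = -1;           /* poison the variable so a repeated call is harmless */
    return rc == 0 ? 0 : -1;
}
```

This only helps when every close goes through the same variable. A copy of the integer made elsewhere can still be closed twice, which is the part the C type system does not track.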

uecker 3 months ago

The question is: Could such bugs be avoided in C using the right tools and strategies? And the answer is also often: yes.
This is why a large component of the argument for switching to other languages is usually that it is impossible to avoid such bugs in C even for experts. But I think this argument, while having some small amount of truth to it, is also partially deceptive. One cannot simply look at the number of CVEs and conclude this; one needs to compare apples to apples, and then I find the reality looks different. For example, if a simple mitigation for the bug in C could not be used for some reason, but this reason would also prevent the use of another language in the first place, then using this as an argument is misleading.
- pornel 3 months ago
  
  > Could such bugs be avoided in C using the right tools and strategies
  "right tools and strategies" is very open-ended, almost tautological: if you didn't catch the bug, then obviously you haven't used the right tools and the right strategies! In reality, the tools and strategies have flaws and limitations that turn such problems into "yes but actually no".
  Static analysis of C code has fundamental limits, so there are bugs it can't find, and there are non-trivial bugs that it can't find without also finding false positives. False positives make developers needlessly tweak code that was correct, and lead to fatigue that makes them downplay and ignore the reports. The more reliable tools require catching problems at run-time, but problems like double-free often happen only in rare code paths that are hard to test for, and fuzzers can't reach all code either.
  - uecker 3 months ago
    
    Static analysis of arbitrary legacy code is limited. But I do not find it difficult to structure my code in a way that I can reasonably exclude most errors. The discussion of false positives in C is interesting. In some sense, 99% of what the Rust compiler would complain about would be considered false positives in C. So if you want to have safety in C, you cannot approach this from this angle. But this relates to my point. If it is acceptable to structure the code in specific ways to make the Rust compiler happy, but you do not accept that you may have to write code in specific ways to avoid false positives in C, then you are already not comparing apples to apples.
    
    pornel 3 months ago
    
    Even if you rewrite C code to the same "shape" as Rust, it won't become equally easy to statically analyze. The C language doesn't give the same guarantees, so it doesn't benefit from the same restrictions. For example, pointers don't guarantee their data is always initialized, `const` doesn't make the data behind it truly immutable, and there's no Send/Sync to describe thread safety. You have nothing to express that a piece of memory has a single owner. C's type system also loses information whenever you need to use void* instead of generics, unions with a DIY tag, and have pointers with a mixed provenance/ownership.
    Rust checks that you adhere to the analyzable structure, and keeps you from accidentally breaking it at every step. In C you don't get any compiler help for that. I'm not aware of any tools that guide such structure for C beyond local tweaks. It'd be theoretically possible with enough non-standard C language extensions (add borrowing, exclusive ownership with move semantics), but that would get pretty close to rewriting the code in Rust, except using a bolted-on syntax, a dumb compiler that doesn't understand it, and none of the benefits of the rest of Rust's language, tooling, and ecosystem.
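
A small C sketch of the `const` point above, with a hypothetical function name `observe_twice`. It shows that a pointer-to-const only forbids writes through that particular pointer: the value it points at can still change between two reads if a non-const alias exists, so an analyzer cannot treat the data as immutable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: reads *view, writes through alias, then
 * reads *view again and returns the difference between the two reads.
 * Returns 1 when both pointers refer to the same int, 0 otherwise. */
int observe_twice(const int *view, int *alias)
{
    int before;

    assert(view != NULL && alias != NULL);
    before = *view;
    *alias += 1;            /* legal: the write goes through the non-const alias */
    return *view - before;  /* the "const" view may have changed underneath us */
}
```

Calling `observe_twice(&x, &x)` on an ordinary `int x` compiles without a diagnostic and returns 1, which is the situation a shared reference in Rust rules out.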
    
    uecker 3 months ago
    
    You do not get the same automatic guarantees when not using external tools. But you get almost the same results when using tools readily available and having a good strategy for organizing. I do not have problems with double frees, void type unsafety, or tagged unions in my code. I occasionally have memory leaks, which tooling then tends to find. I certainly have exploitable integer overflows, but those are easily and comprehensively mitigated by UBsan.
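
Alongside run-time sanitizers such as UBSan (enabled in GCC and Clang with `-fsanitize=undefined`), overflow can also be checked explicitly in the code. The following is a sketch under assumptions: the helper name `checked_alloc_size` is hypothetical, and `__builtin_mul_overflow` is a GCC/Clang extension rather than standard C. Unsigned size computations wrap around silently instead of being undefined, so an explicit check is one way to cover them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: computes count * elem_size for an allocation.
 * Returns true and stores the product in *out when it fits in size_t,
 * returns false when the multiplication would wrap around.
 * Relies on __builtin_mul_overflow (GCC/Clang extension). */
bool checked_alloc_size(size_t count, size_t elem_size, size_t *out)
{
    assert(out != NULL);
    return !__builtin_mul_overflow(count, elem_size, out);
}
```

A caller would test the return value before passing `*out` to `malloc`, and treat `false` as an allocation failure.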
    
    GTP 3 months ago
    
    > If it is acceptable to structure the code in specific ways to make the Rust compiler happy
    I think this is a misleading way of presenting the issue. In Rust, there are classes of bugs that no valid Rust code can have (unless maybe when using the "unsafe" keyword), while there is valid C code that has said bugs. And here's the difference: in Rust the compiler prevents some mistakes for you, while in C it is you who has to exercise discipline to make sure, every single time, that you structure the code in such a way as to make said bugs unlikely. From this, it follows that Rust code will have fewer (memory-related) bugs.
    It is not an apples to apples comparison because Rust is designed to be a memory safe fruit, while C isn't.
    
    uecker 3 months ago
    
    The point was that you can not reject warnings as part of a solution for C because "it annoys programmers because it has false positives" while accepting Rust's borrow checker as a solution.
    
    GTP 3 months ago
    
    I think the general point of my comment still stands, as you and everyone else working on the project would need to be disciplined and only release binaries that were compiled with no warnings. And even using -Werror doesn't fully solve the problem in C, as not having warnings/errors is still not enough to get memory safety.
    
    uecker 3 months ago
    
    I don't really disagree with you, but I was also making a slightly different point. If you need absolute memory safety, you can use Rust (without ever using unsafe).
    But you get 99% this way with C, a bit of discipline, and tooling and this also means maintaining a super clean code base with many warnings activated even though they cause false positives. My point is that these false positives are not a valid argument why this strategy does not work or is inferior.
    Your claim is that it is inferior because you only get 99% of safety and not 100%. But one can question whether you actually get 100% in Rust in practice in a project of relevant size and complexity, due to FFI and unsafe. One can also question whether 100% memory safety is all that important when there are also many other issues to look out for, but this is a different argument.
- kpcyrd 3 months ago
  
  The blogpost claims they are already running "all the tools", can you please be more specific which one they are missing? Maybe the tool to avoid this kind of lifetime issue just happens to be rustc?
  - uecker 3 months ago
    
    Rust is one tool which can be used to avoid lifetime issues. It is not the only tool. It also works perfectly only when you limit yourself exclusively to safe Rust and do not use C libraries, unsafe Rust, or APIs that directly use integers (I assume in this example, Rust may have special safe wrappers, but in general the language also does not prevent this error). Resource management is something model checkers could verify in C. One could also design a safe API around it in C. Possibly GCC's analyzer could find such issues. In any case, the question is how much effort one wants to invest and what tradeoffs the solutions have. A small risk of missing such things may also be an entirely reasonable choice, even though Rust proponents irrationally claim otherwise. For example, curl uses C89, which is certainly not the best choice for safety. It is the best choice for portability to obscure platforms, but this requirement would also rule out Rust.
    
    kpcyrd 3 months ago
    
    > It is not the only tool
    The burden of proof is on you. I asked which tool specifically would have detected CVE-2025-0665 and I don't mean to be mean, but your reply is essentially a very confident "I don't know but I'm sure somebody could build one", while also handwaving the security benefits of programming in Rust.
    When building a http client library, unsafe Rust and C FFI are not really a problem I'm having.
    
    uecker 3 months ago
    
    I already mentioned model checkers as one option. There are also other memory safe languages besides Rust. Whether an existing tool finds a bug in existing C code is a different question though. I would have expected a double close to be found by GCC's analyzer, but I haven't tried for this particular bug (in simple cases it certainly finds it: https://godbolt.org/z/rzr9zT619). In general, these tools find bugs more easily in well written C code, but do not help much when code is convoluted. But not writing convoluted code is more important than the choice of programming language anyway.
- pjmlp 3 months ago
  
  It would help if since lint was created in 1979, the large majority of C developers actually used the right tools and strategies.
  In practice, people seem to care only when forced into MISRA-like processes, in contrast to how relevant secure programming has been considered in other programming language communities since the 1960s.
  Secure programming was part of Burroughs and Multics design, so why is the answer from a systems language community designed a decade later, and after 40+ years since the Morris worm, "we don't use right tools and strategies over here"?
  - uecker 3 months ago
    
    I think this is probably the reason why Rust is attractive to companies. They can say, "use Rust and never use unsafe without sign-off from some senior programmer" and then can be relatively sure there is no buffer overflow. In C you would say, you need to follow these guidelines with tools X, Y, Z and then you can be relatively sure there is no buffer overflow. The problem is that this is a striking argument only when memory safety issues are the only thing you care about. As soon as you care about other forms of security / correctness, you need "guidelines with tools X, Y, Z" anyway. And when you have a mixed code base, you also need this anyway.
im3w1l 3 months ago

For all its many faults, even C++ fstreams is not vulnerable to double freeing (and as a partial reply to @Galanwe, the way they avoid issues is runtime checking).
- kllrnohj 3 months ago
  
  In C++ you can also make the FD-equivalent of std::unique_ptr, like Android does with unique_fd: https://cs.android.com/android/platform/superproject/main/+/...
  It doesn't guarantee the issue never happens, like Rust would, but it does make it dramatically less likely to occur.
  Also I think in general people vastly under-appreciate how severe an issue EBADF actually is. Outside of extremely specific, single-thread-only scenarios, that error is essentially the kernel telling you that heap corruption occurred, but almost nobody treats it with that level of severity
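
For comparison, plain C can approximate a scoped descriptor with the `cleanup` variable attribute. This is a sketch under assumptions: `__attribute__((cleanup))` is a GCC/Clang extension (not standard C, and not available in C89-only toolchains), and the names `fd_cleanup`, `SCOPED_FD` and `scoped_dup_is_closed` are hypothetical. The demo function duplicates a descriptor into a scoped variable and then checks that the duplicate was closed when the scope ended.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Cleanup handler: the compiler calls it with a pointer to the variable
 * when that variable goes out of scope. */
static void fd_cleanup(int *fd)
{
    if (*fd >= 0) {
        close(*fd);
        *fd = -1;
    }
}

/* Hypothetical convenience macro for declaring a scoped descriptor. */
#define SCOPED_FD __attribute__((cleanup(fd_cleanup))) int

/* Returns 1 if the duplicate of src was closed automatically at scope
 * exit, 0 if it is still open, and -1 if dup() failed. */
int scoped_dup_is_closed(int src)
{
    int seen;

    assert(src >= 0);
    {
        SCOPED_FD tmp = dup(src);
        if (tmp < 0)
            return -1;
        seen = tmp;
    }   /* fd_cleanup(&tmp) runs here */

    /* fcntl(F_GETFD) fails on a descriptor that is no longer open. */
    return fcntl(seen, F_GETFD) == -1 ? 1 : 0;
}
```

As with unique_fd, this makes the mistake less likely rather than impossible: nothing stops code from copying the integer out of the scoped variable and closing it again.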
- pjmlp 3 months ago
  
  Already in early 1990s, with C++ARM as the first standard, there were plenty of advantages using C++ instead of plain C.
  RAII, streams instead of stdio patterns, compilers had collection classes for common types (string, array, ...) with bounds checking configuration, ...
Galanwe 3 months ago

> It was brought up because this disclaimer was also present in the CVE-2025-0665 advisory[0], which is essentially a double-free but on file descriptor level
I don't see how Rust would have prevented calling close() two times with the same eventfd.
- Munksgaard 3 months ago
  
  The same way Rust prevents calling close() two times on a file.
- technion 3 months ago
  
  The standard for rust is that close() gets called automatically when the file descriptor goes out of scope. I believe you could choose to do it manually but that's unusual coding.
  - Munksgaard 3 months ago
    
    There isn't actually any close() function in the std::fs::File: https://doc.rust-lang.org/std/fs/struct.File.html
    
    ibotty 3 months ago
    
    And if there were one (`drop` kind-of is that function though), it will consume the file handle so you cannot call it again.
coliveira 3 months ago

The problem is not as much C, but coding practices that make it seem like we're still in the 1970s. Codebases like curl use C at a very low level. But C has functions, has structures, has a lot of functionality to allow you to write at a higher level, instead of chasing pointers at each while statement. Code that handles pointers could be abstracted in the same way people will have to do in other languages.
- pjmlp 3 months ago
  
  Bell Labs 1970s, I advise learning about what already existed elsewhere in systems programming languages.
- sgarland 3 months ago
  
  Pointers are not a difficult concept.
- zxilly 3 months ago
  
  What about performance overheads? I think the reason a lot of people write C is that they have direct control over the generated assembly to maximise performance.
  - aaronmdjones 3 months ago
    
    Even direct control over the generated assembly (C does not provide this; only writing assembler does, and only if you don't do things like LTO) is not sufficient.
    Modern CPUs do all sorts of weird things. Assembly instructions can be executed out of order. Your conditional jump instruction can be speculatively executed before the condition's truth is known. Fetches from main memory can be reordered.
    Even more wildly, copying the contents of one register to another is often a no-op. Yes, that's right; the following code:
    mov edx, eax
    ... does next to nothing on some modern CPUs. All it does sometimes is set an internal note to the effect of "later references to edx should read/write eax instead", until that note is cleared by some other operation.
    You can write your assembler with the best of intentions as to how it should behave, only to discover that the CPU does things entirely differently. You still end up getting the same observable result out of it, but any timing and order of operation guarantees went out the window decades ago.
  - pjc50 3 months ago
    
    > they have direct control over the generated assembly
    They do not. As seen by all the "optimizer has done weird stuff" bugs.

sebstefan 3 months ago

The guidelines feel out of sync with the directions I've seen people push coding styles over the years

"Identifiers should be short" when I've mostly seen people decry how annoying it is to find yourself in a codebase where everything is abbreviated C-style (htons, strstr, printf, wchar_t, _wfopen, fgetws, wcslen)

There's a case for more verbosity and if you look at modern Curl code it reflects that as well, new identifiers aren't short

https://github.com/curl/curl/blob/master/lib/vquic/vquic.c

"Functions should be short" where I've mostly seen very negative feedback on codebases written following the trend of Uncle Bob's short functions. Complaints that hiding code in 10 levels of function calls isn't helpful, and that following rabbit holes is tedious even with modern editors

"Code should be narrow", "we enforce a strict 80 column maximum line length" I don't think I've seen that take lately. I remember seeing a few posts fly by about the number 80 specifically

You want to prevent dragging your eyes. For my IDE on default settings on a 1080p monitor, half of a 15" screen fits 100 characters

If you take away 20 columns to fit your text on less of the screen do you really get any benefits

What about the cascading effects on the code, like worse names, split lines, ...

In the end it's semi-interesting but we're all building sheds and these are mostly debates on what color the shed should be

tuetuopay 3 months ago

Everything is a balance. IMHO, the "identifiers should be short", "functions should be short" and such are knee-jerk reactions to overly long things that are common in some other languages (looking at you, Java). Like the practice of indicating the type, pointer, etc. Stuff like `pWcharInputBuffer` and such.
There is a balance between `*p` and `inputPointerToMiddleOfBufferThatFrobnicates`.
- dahauns 3 months ago
  
  >Everything is a balance.
  Very true, or as I like to put it: everything is a tradeoff.
  Over decades of programming, I'm fairly certain my preferences for things like function/identifier length could be plotted along a damped oscillation curve. :)
dfox 3 months ago

> https://github.com/curl/curl/blob/master/lib/vquic/vquic.c
When somebody says "short identifiers" in relation to C, this is exactly the style meant by that, not the cryptic style of C standard library.
creatonez 3 months ago

> "Code should be narrow", "we enforce a strict 80 column maximum line length" I don't think I've seen that take lately. I remember seeing a few posts fly by about the number 80 specifically
To be fair, this is more doable in C than most other languages. No namespacing, no generics, etc. means you're not using as many columns.
I'm still not convinced, though. It's a crunch. Would rather just set a 120 to 160 column limit and make identifiers as descriptive as they should be. And I'd use prefix namespacing all over the place anyways -- fuzzy autocomplete can make it convenient.
GuB-42 3 months ago

For short identifiers, I think you missed an important detail.
> Also related: (in particular local) identifiers and names should be short.
The general idea is that the more distant the identifier is, the more descriptive it should be. Because you don't have as much context, and it is also a hint: if you see a long, descriptive name, it is more likely to be global.
And descriptive doesn't mean long. You still need to try making your descriptive names as short as possible. For example "timeSinceMidnightInSeconds" can be shortened to "secondsSinceMidnight" without loss of information: seconds are a unit of time, no need to repeat it.
stzsch 3 months ago

Seems similar to the linux kernel coding style: https://www.kernel.org/doc/html/latest/process/coding-style....
Almondsetat 3 months ago

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.
I don't think good names will be solved soon

timhh 3 months ago

When I counted I got about 55% which is pretty close to the standard 2/3.

https://blog.timhutt.co.uk/curl-vulnerabilities-rust/

johnisgood 3 months ago

> Code should be easy to read. It should be clear. No hiding code under clever constructs, fancy macros or overloading.

I highly agree with this. I do not always want highly abstracted code, and some programming languages aiming to replace C are much more difficult to read, that said, Rust is supposed to replace C++, not C, right?

Thank you for the article!

Zambyte 3 months ago

I have been playing around a lot with Zig lately, and though it's still in beta, it really feels like it has the best chance at being a true C successor. While Rust feels like they started with C++ and worked on making it harder to write incorrectly, Zig feels like they started with C and worked on making it easier to write correctly.
They also have a few pillars that they call the "Zen" of Zig[0], of which three out of the first five are directly related to readability.
[0] https://ziglang.org/documentation/0.14.0/#Zen
- Arch-TK 3 months ago
  
  I used to think rust was like C++ but harder to write badly but I don't think it's anything like C++ now that I've spent a couple of years writing it.
  Rust is its own thing, it has none of the extensive baggage of C++, and doesn't appear to be set to reach that level of baggage at any point in time soon if ever. It's a much cleaner, clearer, and easier to reason about programming language.
  - Zambyte 3 months ago
    
    The lack of baggage is definitely a huge improvement, but I believe Zig also has this advantage over C. Rust and C++ also seem to encourage a similar style of programming (particularly newer C++), and so do Zig and C. The former encourages creating a library of types with inheritance, using syntactic sugar where applicable (ie operator overloading), and a functional style, whereas the latter limits the programmer to simple compound types without inheritance (manual, explicit dispatching), obvious syntax, and an imperative style.
    Both seem to have their place in the ecosystem to me, but I'm really excited to see Zig mature.
    
    shepmaster 3 months ago
    
    > [Rust and C++] encourages creating a library of types with inheritance
    You'll want to expand on and clarify your meaning here, as Rust does not have inheritance.
    
    Zambyte 3 months ago
    
    Traits encourage the same programming style, regardless of if it's technically inheritance or not.
    
    Arch-TK 3 months ago
    
    I disagree completely.
    It's not just a technicality, traits are not the same as multiple inheritance. They allow you to solve the same kinds of problems but traits don't introduce the mountain-loads of confusion, complexity and headache that multiple inheritance does.
    Moreover, nothing forces you to use traits. You're just strictly better off using them in many cases. But you can always defer to writing overtly repetitive C style code if you want.
    
    pjmlp 3 months ago
    
    Yet, I have zero issues mapping COM approach to OOP into Rust, because it turns out as per CS type systems theory, there are many ways to achieve the same goal.
    Practical example, Raytracing in one weekend port from C++ to Rust, keeping the overall architecture.
    Second example, Microsoft bindings to WinRT for Rust.
    
    Arch-TK 3 months ago
    
    Yes, I'd argue COM is specifically very limited in such a way that traits can be used to implement it mostly seamlessly. But regardless, the dispute is over "[Rust and C++] encourages creating a library of types with inheritance".
    Anyone who has at least read and understood both of these languages can see that this is not true, even if you replace inheritance with traits in rust. Not only that, traits do not support the same range of things that inheritance does. They have overlap, but it's not a small difference. When you actually get down to real rust codebases, they end up pretty different even if they're heavy on traits. But again, there's nothing requiring you to use traits. You see them often in libraries because they are very useful for generic programming. But that doesn't mean you are forced to use them in your application code.
    
    pjmlp 3 months ago
    
    You are making the mistake to confuse inheritance with class inheritance.
    There is also interface inheritance, which Rust allows for with trait bounds composition, now made even easier with trait upcasting.
    With some macro help, one can even do COM style of multiple interface implementation with aggregation and delegation, to simulate class inheritance, like ATL allows for with template metaprogramming.
    
    Arch-TK 3 months ago
    
    Rust doesn't have interface inheritance either.
    Rust only has composition in this regard. Calling it inheritance in any way is just confusing terminology.
    
    Zambyte 3 months ago
    
    I never said multiple inheritance.
    
    Arch-TK 3 months ago
    
    Multiple inheritance and SFINAE through templates are the only non-concept way of doing traits in C++
    The way single inheritance is used in C++ is nothing like how traits are used in Rust.
- voidUpdate 3 months ago
  
  I really want to start trying to learn zig, but right now I feel like it's not quite finished enough. When it hits 1.0, I'll probably give it a more serious look
  - woodrowbarlow 3 months ago
    
    Zig is friendly for soft-transitions because the compiler can compile C code. you can use Zig tooling for a C codebase, and then slowly add Zig code where it makes the most sense.
    
    voidUpdate 3 months ago
    
    Well I was debating learning C this year, but Zig seems like the more attractive option. I'd be starting new projects from scratch
    
    woodrowbarlow 3 months ago
    
    ahh, ideally i'd say learning them in any order is fine but unfortunately since the zig ecosystem is not mature yet that means you'll find yourself using C libraries from zig fairly often.
    so, yes, i agree learning zig is a great idea for new independent projects but (at the current stage) be prepared to also learn at least enough about C to use C libraries (return type conventions, raw enums, c-style strings, inflexible allocators).
    
    johnisgood 3 months ago
    
    I think it is still useful to learn C, however. For personal projects you could use Zig, but it is still good to know C. C is like the lingua franca of programming languages, and some assembly (if you care about optimization, for example).
masklinn 3 months ago

> Rust is supposed to replace C++, not C, right?
Rust is intended as a systems language. To the extent that it’s “supposed” to replace anything, it’s both.
MrMcCall 3 months ago

The reality is that we spend FAR more time reading code than writing it. That is why readability is far more important than clever, line saving constructs.
The key to further minimizing the mental load of reacquainting yourself with older existing code is to decide on a set of code patterns and then be fastidious in using them.
And then, if you want to be able to easily write a parser for your own code (without every detail in the spec), it's even more important.
And now that I have read TFA, I see he wrote:
> We have tooling that verify basic code style compliance.
His experience and diligence have led him to the mountaintop, that being we must make ourselves mere cogs in a larger machine, self-limiting ourselves for the greater good of our future workload and production quality.
- baumschubser 3 months ago
  
  > The reality is that we spend FAR more time reading code than writing it. That is why readability is far more important than clever, line saving constructs.
  In JS I sometimes chain two or three inline-arrow-functions specifically for readability. When you read code, you often search for the needle of "the real thing" in a haystack of data formatting, API response prepping, localization, exception handling etc.
  Sometimes those shorthand constructs help me to skip the not-so-relevant parts instead of mentally climbing down and up every sort and rename function.
  That being said, I would not want this sentiment formalized in code guidelines :) And JS is not C except both have curly braces.
  - MrMcCall 3 months ago
    
    > That being said, I would not want this sentiment formalized in code guidelines :)
    Surely. I'm all for code formatting standards as long as they're MY code formatting standards :-)
    Ideally, I'd like the IDE to format the code to the user/programmer's style on open, but save the series of tokens to the code database in a formatting-agnostic fashion.
    Then we could each have our own style but still have a consistent codebase.
    And, I should add that my formatting conventions have gotten more extreme and persnickety over the years, and I now put spaces on both sides of my commas, because they're a separate token and are not a part of the expression on either side of it. I did this purely for readability, but I have NEVER seen anyone do that in all my decades on the internet reading code and working on large codebases. But I really like how spacing it out separates the expression information from the structural information.
    It also helps me deal with my jettisoning code color formatting, as, as useful as I've found it in the past, I don't want to deal with having to import/set all that environmental stuff in new environments. So, I just use bland vi with no intelligence, pushing those UI bells and whistles out of it into my code formatting.
    And, I fully endorse whatever it takes for you to deal with JS, as I have loathed it since it appeared on the scene, but that's just me being an old-school C guy.
- johnisgood 3 months ago
  
  > That is why readability is far more important than clever, line saving constructs.
  Yes, I agree, that is why I am put off by some supposed C replacements that are trying to be clever with their abstractions or constructs.
  - pjc50 3 months ago
    
    Could you give an example of "clever" (bad) vs "simple" (good)?
    In my experience C has a lot of simple grammar, a commonly-held simple (wrong) execution model, and a lot more complexity lurking underneath where it can't be so easily seen.
    (One of my formative learning books was https://en.wikipedia.org/wiki/C_Traps_and_Pitfalls , valid in the 90s and mostly still valid today)
  - MrMcCall 3 months ago
    
    Simplicity is essential to achieving manageable complexity over time.
    
    kryptiskt 3 months ago
    
    Abstraction is necessary to handle scale. If you have painstakingly arrived at a working solution for a complex problem like say locking, you want to be able to package it up and use it throughout your codebase. C lacks mechanisms to do this apart from using its incredibly brittle macro facility.
    
    johnisgood 3 months ago
    
    Ada has built-in constructs for concurrency, with contracts, and there is formal verification in a subset of Ada named SPARK, so Ada / SPARK is pretty good.
    
    MrMcCall 3 months ago
    
    > C lacks mechanisms to do this apart from using its incredibly brittle macro facility.
    We programmers are the ultimate abstraction mechanism, and refining our techniques in pattern design and implementation in a codebase is our highest form of art. The list of patterns in the Gang-of-Four's "Design Patterns" are not as interesting as its first 50 pages, which are seminal.
    From the organization of files in a project, to organization of projects, to class structure and use, to function design, to debug output, to variable naming as per scope, to commandline argument specification, to parsing, it's nothing but patterns upon patterns.
    You're either doing patterns or you're doing one-offs, and one-offs are more brittle than C macros, are hard to comprehend later, and when you fix a bug in one, you've only fixed one bug, not an entire class of bugs.
    Abstraction is the essence of programming, and abstraction is just pattern design and implementation in a codebase, the design of a functional block and how it's consumed over time.
    The layering of abstractions is the most fundamental perspective on a codebase. They not only handle scale, they make or break correctness, ease of malleability, bug triage, performance, and comprehensibility -- I'm sure I could find more.
    The design of the layering of abstractions is the everything of a codebase.
    The success of C's ability to let programmers create layers of abstractions is why C is the foundational language of the OS I'm using, as well as the browser I'm typing this message in. I'm guessing you are, too, and, while I could be wrong, it's not likely. And not a segfault in sight. The scale of Unix is unmatched.
    
    kllrnohj 3 months ago
    
    > The success of C's ability to let programmers create layers of abstractions is why C is the foundational language of the OS I'm using, as well as the browser I'm typing this message in.
    What browser are you using that has any appreciable amount of C in it? They all went C++ ages ago because it has much better abstraction and organization capabilities.
    
    MrMcCall 3 months ago
    
    That's a fair point that I hadn't considered. I was developing C+objects as C++ was first being released in the mid-90s, and then using Borland's C++ compiler in the early 2000s, but never really thought about it as anything more than what its name implies: "C with some more abstractions on top of it".
    Thank you for the correction, but I consider C++ to be just a set of abstractions built upon C, and, if you think about it, none of those structures are separate from C, but merely overlaid upon it. I mean it is still just ints, floats, and pointers grouped using fancier abstractions. Yes, they're often nicer and much easier to use than what I had to do to write a GUI on top of extended DOS, but it's all just wrappers around C, IMO.
    
    kllrnohj 3 months ago
    
    C++ is very definitely not just wrappers around C and it's pretty ridiculous to frame it like that. Or if you want to insist on that, then C doesn't exist, either, as it's just a few small abstractions over assembly.
    
    pjc50 3 months ago
    
    > The success of C's ability to let programmers create layers of abstractions
    You wrote several entirely valid paragraphs about how important abstractions are and then put this at the end, when C has been eclipsed by 40+ years of better abstractions.
    
    MrMcCall 3 months ago
    
    Because programmers are creating the abstractions, not the programming language.
    And there is no OS I'm aware of that will threaten Unix's dominance any time soon.
    I'm not against it, but C's being so close to what microprocessors actually do seems to be the story of its success, now that I think about it.
    I personally haven't written in C for more than a half-decade, preferring Python, but everything I do in Python could be done in C, with enough scaffolding. In fact, Python is written in C, which makes sense because C++ would introduce too many byproducts to the tightness required of it.
    I was programming C using my own object structuring abstractions as C++ was being developed and released. It can be done, and done well (as evidenced by curl), but it just requires more care, which comes down to the abstractions we choose.
    So, I would say "eclipsed" is a bit strong a sentiment, especially given our new favorite programming languages are running on OSes written in C.
    If I had my druthers, I'd like everything to be F# with native compilation (i.e. not running using the .NET JIT), or OCaml with a more C-ish style of variable instantiation and no GC. But the impedance mismatch likely makes F# a poor choice for producing the kinds of precise abstractions needed for an OS, but that's just my opinion. Regardless, the code that runs runs via the microprocessor so the question really is, "What kinds of programming abstractions produce code that runs well on a microprocessor."
    I've never thought of this before, thanks for the great question.
    
    pjmlp 3 months ago
    
    > And there is no OS I'm aware of that will threaten Unix's dominance any time soon.
    Depends on the point of view, and what computing models we are talking about.
    While iDevices and Android have a UNIX-like bottom layer, the userspace has nothing to do with UNIX, being developed in a mix of Objective-C, Swift, Java, Kotlin and C++.
    There is no UNIX per se on game consoles, and even on Orbis OS, there is little of it left.
    The famous Arduino sketches are written in C++ not C.
    Windows, dominant in the games industry to the point that Valve failed to attract developers to write GNU/Linux games and had to come up with Proton instead, is not UNIX. The old-style Win32 C code has been practically frozen since Windows XP, with very few additions, as since Windows Vista it became heavily based on C++ and .NET code.
    macOS is UNIX certified, but the userspace that Apple cares about (or NeXT before the acquisition) has very little to do with UNIX and C, rather Objective-C, C++ and Swift.
    On the cloud native space, with managed runtimes on application containers or serverless, the exact nature of the underlying kernel or type 1 hypervisor is mostly irrelevant for application developers.
    
    neonsunset 3 months ago
    
    > I'd like everything to be F# with native compilation
    This already works today (even with GUI applications) - just define replacements for printfn that avoid unbound reflection (2 LOC) and you're good to go: dotnet publish /p:PublishAot=true
    To be clear, in .NET, both JIT runtime and ILC (IL AOT Compiler) drive the same back-end. The compiler itself is called RyuJIT but it really serves all kinds of scenarios today.
    > makes F# a poor choice for producing the kinds of precise abstractions needed for an OS
    You can do this in F# since it has access to all the same attributes for fine-grained memory layout and marshalling control C# does, but the experience of using C# for this is better (it is also, in general, better than using C). There are a couple areas where F# is less convenient to use than C# - it lacks C#'s lifetime analysis for refs and ref structs and its pattern matching does not work on spans and, again, is problematic with ref structs.
    
    pjc50 3 months ago
    
    > there is no OS I'm aware of that will threaten Unix's dominance any time soon
    True, but irrelevant?
    > What kinds of programming abstractions produce code that runs well on a microprocessor
    .. securely. Yes, this can be done in C-with-proofs (seL4), but the cost is rather high.
    To a certain extent microprocessors have co-evolved with C because of the need to run the same code that already exists. And existing systems force new work to be done with C linkage. But the ongoing CVE pressure is never going to go away.
    
    MrMcCall 3 months ago
    
    I'm not at all against a new model providing a more solid foundation for a new OS, but it's not going to be garbage collected, so the most popular of the newer languages make the pickings slim indeed.
    > But the ongoing CVE pressure is never going to go away.
    I think there are other ways to deflect or defeat that pressure, but I have no proof or work in that direction, so I really have nothing but admittedly wild ideas.
    However, one potentially promising possibility in that direction is the dawn of immutable kernels, but once again, that's just an intuition on my part, and they can likely be eventually defeated, if only by weaknesses in the underlying hardware architecture, even though newer techniques such as timing attacks should be more easily detected because they rely on being massively brute force.
    The question, to me, is "Can whittling away at the inherent weaknesses reduce the vulns to a level of practical invulnerability?" I'm not hopeful that that can occur but seeing the amount of work a complete reimplementation would require, it may simply be the best approach to choose from a cost-benefit analysis perspective where having far fewer bugs and vulns is more feasible than guaranteed perfection. And, once again, such perfection would require the hardware architecture be co-developed with the OS and its language to really create a bulletproof system, IMO.

veltas 3 months ago

> So many people will now joke and say something about wide screens being available

And this is a silly point because I want to be able to put 2-3 files side-by-side, on that big monitor. Who are all these people asking for long code that means I don't get more than one file on screen at a time?

Arch-TK 3 months ago

It's not even just that. The reason newspapers have multiple columns rather than lengthy lines is because it's strictly easier to read shorter lines.
- timhh 3 months ago
  
  I don't think anyone disagrees with that, but 80 characters is clearly waaay too restrictive. I think 120 is much more reasonable.
  - johnisgood 3 months ago
    
    I use either 2 spaces or tab for indentation for most languages, and I never go beyond 80 (actually, 79). It works well for XTerm, and most utilities that I use.
    For git commits, I do not go beyond ~69 characters per line, so it looks neat when I am viewing the commit history.
    120 characters may be fine if I only care about coding in VSCodium, for example, so sometimes I might go above the 80 column width when I am programming Go using VSCodium, but I try to not do that, because I still use "less" and whatnot, plus I have an old 17" monitor. I do not like wide monitors, I want to be able to look at the whole screen all at once, with wide monitors I would either have to be too far away, or move my head / neck / eyes too often.
    So... my fonts are small, I limit to 80 column width, and I am quite happy with it. :P
    To each their own, although I would have issues with Java code, which not only requires me to have many files open, but also makes me switch back and forth between files and scroll horizontally a lot.
    I hope you realize what I am trying to say, if not, I will elaborate.
  - Arch-TK 3 months ago
    
    Clearly to whom? I think it works fine in C with 8 spaces per indent level. It works fine in python and rust with 4 spaces per indent level. For some languages I think it's worth going down to 3 spaces per indent level. But I've not hit that many languages where it's worth going much past 80 characters.
  - dspillett 3 months ago
    
    As the article states:
    > The question could possibly be exactly where to draw the limit, and that’s a debate for every project to have.
    It is subjective, and does not live in a vacuum because along with purely subjective preference regarding it on its own, it affects, and is affected by, other choices like naming and indentation conventions.
    They like 80 in their project. Feel free to choose something else for your project.
dspillett 3 months ago

There are many, usually non-technical people though some devs & such too, who maximise everything then complain about how much space is wasted on the right of their fancy screen.
I have a 32" screen running at "standard" pixel pitch (matching the 24" 1080p screen I have in portrait next to it) which I sometimes use full-screen but usually have split 50/50, 33/66, 25/75, or 33/33/33, depending on what I'm doing. One of our testers doesn't understand, and can't see the benefit I get from the flexibility ("why not just have two monitors?" has been asked several times). It seems to actively annoy her that such a wide screen exists. If she ever saw the ultra-wide my friend uses for gaming I think she'd have a seizure.
Admittedly, when I am seated, this monitor plus the other in portrait is a bit wide in total (so the other screen is usually relegated to just being mail/chat windows that I only interact with when something pings for my attention) and a touch too tall. It is much more comfortable when I use the desk raised so I stand, which is how I work >⅔ of the time.

kwon-young 3 months ago

Curl is one of the very few projects I managed to contribute to with a very simple PR.

At the time, I was a bit lost with their custom testing framework, but was very impressed by the ease of contributing to one of the most successful open-source projects out there.

I now understand why. It is because of their rules around testing and readability (and the friendly attitude of Daniel Stenberg) that a novice like me managed to do it.

kobzol 3 months ago

Great post!

I have some random guesses as to why curl's share of memory-safety issues is 40% rather than the commonly cited 60-70%:

- 180k is not that much code. The 60-70% number comes from Google and Microsoft, and they are dealing with way larger codebases. Of course, the size of the codebase in theory shouldn't affect the percentage, but I suspect in practice it does, as the larger the codebase is, the harder it is to enforce invariants and watch for all possible edge cases.

- A related aspect to that is that curl is primarily maintained by one person (you), or at most a handful of contributors. Of course many more people contribute to it, but there is a single maintainer who knows the whole codebase perfectly and can see behind all (or most) corners. For larger codebases with hundreds of people working on them, that is probably not the case.

- Curl is used by clients a lot (probably it's used more by clients than servers, for whatever definition of these words) over which you have no control and monitoring. That means that some UB or vulnerabilities that were triggered "in the wild", on the client side, might not ever be found. For Google/Microsoft, if we're talking about Chrome, Windows, web services etc., which are much more controlled and monitored by their companies, I suspect that they are able to detect a larger fraction of vulnerabilities and issues than we are able to detect in curl.

- You write great code, love what you're doing and take pride in a job done well (again, if we scale this to a large codebase with hundreds of developers, it's quite hard to achieve the same level of quality and dedication there).

(sent this as a comment directly on the post, but it seems like it wasn't approved)

janoelze 3 months ago

This is remarkably clear writing — you sense how it was formed by thousands upon thousands of hours spent communicating, really cool.

bitwize 3 months ago

> how do we write C in curl to make it safe and secure for billions of installations?

"That's the neat thing -- you don't."

Curl should do what fish did: bite the bullet and rewrite the damn thing in Rust.

guappa 3 months ago

That would mean no longer running on a lot of devices it currently runs on. Which would mean those devices would just use extremely out of date curl written in C.
Any other brilliant idea?

dcminter 3 months ago

> "Wider code is harder to read. Period. "

That's stated as if it were proven, and I can believe that it has enough basis in fact that one might choose to enforce it, but I don't believe it's universally true.

I do often see code subject to a line-length linting enforcement that I think would have been clearer not broken up across multiple lines.

Personally I prefer a linter with escape hatches so that you can declare "this line exempt from such and such a rule" if you have enough reason for it and are willing to take the fight to the pull request :D
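
To make the "escape hatch" idea concrete, here is a minimal sketch. It assumes clang-format as the tool (a formatter rather than a linter, and not what curl itself uses); its `clang-format off` / `clang-format on` comments exempt a region from reformatting. The table and the `status_hint` function are hypothetical, made up for illustration only.

```c
#include <stddef.h>

struct status_name {
    int code;
    const char *name;
    const char *hint;
};

/* clang-format off */
/* Illustrative table (not from curl): each row stays on one long line so the fields line up in columns. */
static const struct status_name status_names[] = {
    { 200, "OK",                    "the request succeeded"                             },
    { 301, "Moved Permanently",     "the resource has a new permanent URL"              },
    { 404, "Not Found",             "the server has no resource at this URL"            },
    { 503, "Service Unavailable",   "the server cannot handle the request right now"    },
};
/* clang-format on */

/* Returns the hint for a status code, or NULL if the code is not in the table. */
const char *status_hint(int code)
{
    size_t i;
    for (i = 0; i < sizeof status_names / sizeof status_names[0]; i++) {
        if (status_names[i].code == code)
            return status_names[i].hint;
    }
    return NULL;
}
```

The exemption is visible in the diff, so a reviewer can still push back on it in the pull request, which is the trade-off described above.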

acmj 3 months ago

Parts of this article are opinionated. Curl may be well written but this is more likely to be the result of the overall structure than the number of characters per line. Actually I don't know whether curl is well written. Popularity doesn't always equate to code quality. I have used curl APIs before. I don't like them.

MrMcCall 3 months ago

All his ideas are fantastic, and are obviously the result of long experience in a seasoned and highly successful project. He is sharing techniques that simply work for large, complex codebases. Ignore them at your peril!

Specifically, though, these sections are related, in my experience:

> Avoid "bad" functions

> Buffer functions

> Parsing functions

> Monitor memory function use

These related aspects are why I tend to wrap many library functions that I use (in any language environment) with my own wrapper function, even if it's to just localize their use into one single entry/use point. That allows me to have one way that I use the function, thereby giving my code a place to not only place all best practices for its use, but to allow me to update those best practices in one single place for the entire codebase. And it is especially helpful if I want to simply rewrite the code itself to, for example, never use scanf, which I determined was a necessary strategy many, many moons ago.

Now, when a single function needs to accommodate different use cases and doing such separate kinds of logic would incur too much logical or runtime cost, a separate wrapper can be added, but if the additional wrappers can utilize the cornerstone wrapper, that is the best, if feasible. Of course, all these wrappers should be located in the same chunk of code.

For C, especially, wrapper functions also allow me to have my own naming convention over top of the standard library's terse names (without using macros, because they're to be avoided). That makes it easier for me to remember its name, thereby further reducing cognitive load.
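
The wrapper approach described above could look roughly like the sketch below. The names `parse_int` and `parse_port` are hypothetical, not from curl or from the comment; the assumption is that `strtol` is the one standard call the project allows for number parsing in place of `scanf`.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cornerstone wrapper: the single place in the codebase where
   text becomes an int. Returns true and stores the value in *out on success;
   returns false and leaves *out untouched on malformed or out-of-range input.
   Note: strtol itself accepts leading whitespace and an optional sign. */
bool parse_int(const char *text, int *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || out == NULL || *text == '\0')
        return false;

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0')   /* no digits at all, or trailing junk */
        return false;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return false;

    *out = (int)value;
    return true;
}

/* Hypothetical second wrapper for a narrower use case, built on the
   cornerstone wrapper so the parsing rules still live in one place. */
bool parse_port(const char *text, int *port)
{
    int value;

    if (!parse_int(text, &value) || value < 1 || value > 65535)
        return false;

    *port = value;
    return true;
}
```

Callers only ever see a success flag and a value, so tightening the rules later (for example rejecting leading whitespace) means changing `parse_int` once rather than auditing every call site.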