masleeds 10 hours ago

Riak has been maintained through the post-basho years by engineers at some of its larger customers (disclaimer - including myself).

The focus has been on trying to improve the stability of the database when subject to complex failure scenarios under stressful load, with minimal need for urgent operator intervention. The focus has been on keeping those existing operators happy rather than seeking out new users. Evolution of the product since basho has been slow but significant.

The project now has support from Erlang Ecosystem Foundation, and we're looking to invest some effort over the next few months explaining what we've done, and to start to articulate what we see as the future for Riak. So if you're interested watch this space.

It is expected to remain a niche product though. However, it may still find a home for those demanding specific non-functional requirements, with an acceptance of some functional constraints.

  • jamesblonde 9 hours ago

    Metastability is an under-rated system property for databases and systems software, in general.

    • remram 4 hours ago

      What does metastability mean in this context? I've only seen it used to mean "appears stable but not actually stable", eg systems that resist small perturbations but never return to nominal after bigger disturbance (like cold boot).

      Did you mean "stability"?

  • pton_xd 8 hours ago

    > we're looking to invest some effort over the next few months explaining what we've done, and to start to articulate what we see as the future for Riak. So if you're interested watch this space.

    Where's that going to be posted? I'm not a Riak user but I am interested in hearing what others are doing in regards to improving failure scenarios in distributed systems.

jtuple 4 hours ago

This really hits home and makes me happy to see on the HN front page.

Nearly 10 years later and I still consider my time working on Riak at Basho the highlight of my career.

After leaving, my original plan was to found "Basho 2.0" after my non-compete expired. But, unexpected personal/family hardships in 2015-2018 made big-tech money the better choice for a while, and Cloud/competitors continued to chip away at the market.

Often still regret not taking that path.

But, happy to see technology I'm very fond of still living on and providing value to the world.

freerobby 10 hours ago

I led a migration from Mongo to Riak at Shareaholic about 12 years ago: https://www.slideshare.net/slideshow/migrating-to-riak-at-sh...

It was successful at first, but ultimately we traded one set of problems for another (how novel, I know).

In particular, I underestimated the pain of troubleshooting the database itself. Riak was a new product, we were a small team that had never run anything on BEAM, and ultimately we lost too many days debugging and trying to make sense of Erlang stacktraces.

The Basho folks were great, and to this day I appreciate how quickly they fixed a number of bugs for us. But ultimately it wasn't enough -- we found problems faster than they could be patched.

isoos 11 hours ago

Basho team was very kind to open source contributions in ~2011-12: I've written an open source Riak client in Dart, and they had sent me t-shirts (the quality ones that are rare today). Nice treats for a fun project :)

tibbar 11 hours ago

I've never met an engineering team that used Riak, but it is used heavily as an example technology in Kleppmann's 'Designing Data Intensive Applications'. (I would say, informally, it's usually the example of the "other way" as opposed to other more well-known databases.) This does make me wonder what became of it, why it didn't take off.

  • rmetzler 5 minutes ago

    > I've never met an engineering team that used Riak

    I was part of a recent cloud migration. Part of the on-prem setup (though unfortunately not migrated by my team) was the very first Riak cluster I saw in production.

    The engineering team used it as "kind of S3" for images, with 3 to 5 PHP scripts providing an interface to Riak and ImageMagick. It seemed to me like a good abstraction and I think the migration to S3 was mostly painless.

    Other than that I only had contact with Riak at university around 15 years ago, when we tested cluster setups of several NoSQL databases and tried to manually introduce faults to see if they could heal. Riak passed our test at that time, MongoDB didn't.

  • macintux 11 hours ago

    Speaking as a former tech evangelist/engineer at Basho, there were a few significant challenges.

    Riak is horribly unfriendly as a database: no SQL, it exposes eventual consistency directly to the developer, it’s relatively slow, and Erlang is a fairly unusual language.

    While you can run Riak on a single server, you’d have to really want to.

    Its strength is the ability to scale massively, but not many projects need that scale, and by the time you do, you’re probably already using some friendlier database and you’d rather make that one work.

    • mrweasel 3 minutes ago

      I joined a company that had some investment in Basho and managed to sell Riak as the data store for a large client. It never really worked out: not enough of the SREs had been properly trained on Riak, and the developers hated it because getting help and support was somewhat difficult, especially in an emergency.

      In the end more and more data was offloaded to MariaDB, until one day the last remaining data couldn't justify the cost of the Riak cluster. I think we swapped out an eight node Riak cluster for two largish MariaDB databases (one being a hot-standby).

      For one of the other clients it was the exact same scenario, only we had been contracted in to help run the Riak cluster, which we didn't do well. Once they had migrated off it, to Oracle I think, the client left.

      To me it always felt like it was just the wrong tool for that particular job. Someone really wanted to be able to jump on the NoSQL hype and sell something. They picked Riak, because it honestly looked really good, and probably was, compared to MongoDB, CouchDB or whatever else happened to float around at the time. It just wasn't the right tool for the problems it was applied to.

    • btilly 10 hours ago

      Back in 2011 I was working on a project that involved Riak. The difficulty and slowness for doing stuff corresponding to basic SQL operations was certainly a giant strike against it, and helped sink that project before it was released.

      • p_l 9 hours ago

        > corresponding to basic SQL operations

        Ohhh, this brings memories of developers hitting the wall... Between different SQL databases!

        Back in 2016 I was delegated at work to do ops on a project that had big data ambitions in Threat Intelligence space.

        Part of how they intended to support that was Apache Phoenix, an SQL database backed by HBase, running on top of Hadoop that also provided object storage (annoyingly through WebHDFS gateway).

        Constant problems with hung Phoenix queries and instability of Hadoop in entirety led me to propose moving over to PostgreSQL, which generally went quite well... Except several cases of "basic SQL operations" that turned out to have wildly different performance compared to Phoenix and most importantly, to MySQL in MyISAM mode, like doing SELECT (*) on huge tables.

        Fun times, got to meet a postgres core team member thanks to it.

    • anotherjesse 10 hours ago

      https://howfuckedismydatabase.com/nosql/ this infamous comic is about riak

      • rubiquity 8 hours ago

        That could also very well be about CouchDB which implemented indexes/views as MapReduce functions.

        • senderista 6 hours ago

          Back in the day we had a CouchDB MapReduce view (on Cloudant) which took a full month to rebuild (while an angry customer was waiting). The I/O inefficiency was absolutely off the charts.

    • binary132 10 hours ago

      I wonder if some of these issues could be addressed sanely in an extension to the functionality

      • macintux 8 hours ago

        We were working on ways of making it easier (such as CRDTs to reduce the amount of work developers had to do to leverage eventual consistency), but these were pretty challenging problems to solve.

        One of our biggest disappointments: we had plans to add a way to enforce strong consistency leveraging (IIRC) something akin to multi-paxos, but couldn't get it to work.

        • jtuple 5 hours ago

          TBH, we shipped fully working strong consistency in 2014. It just had a limited feature set, was disabled by default, and was never promoted/marketed since it didn't fit the direction the new CEO/CTO was pushing.

          The engineering exodus around that time sorta killed the project though, and we never were able to do the big follow-up work to make it really shine.

          (Disclaimer: Former Basho Principal Engineer, primary author of strong consistency work, lead riak_core dev from 2011-2015)

          I think another 18 months would have been enough too. But it just wasn't the right environment after the hostile take-over / leadership transition.

          • masleeds 6 minutes ago

            I'm not sure, though, for how much longer it will make sense for the project as-is to continue to roll riak_ensemble forward as part of future releases. There are no contributors who have direct experience or knowledge of using it in production, so it is hard to claim it as being a supported part of the product in any real sense.

            I apologise if we do eventually cut it. Having worked through the code when chasing unstable tests, I developed an appreciation for the quality of the work.

          • NickM 3 hours ago

            There were customers happily using strong consistency in production, but somehow the idea that it wasn’t “finished” kept getting repeated over and over by management. I was well on my way to solving the biggest rough edge (tombstone reaping in SC buckets) but then I got pulled off to work on the infamous “data platform” and never got to finish that work :-(

          • macintux 4 hours ago

            My apologies for misremembering. I’m glad you chimed in to correct the record.

        • binary132 5 hours ago

          that sounds like a painful session

    • cmrdporcupine 10 hours ago

      Thing is, Cassandra became and remained popular, with similar aspects (though in JVM instead of Erlang, so).

      Though it had a couple years head start when there were really no other options for people wanting that kind of kit.

      • wbl 5 hours ago

        I feel building a threat intelligence product on Cassandra is a bit on the nose. What's next, calling the TCB Palladium?

  • 0xbadcafebee 10 hours ago

    I worked on a team that built a massive, high-performance internal service based on Riak. There are many things I learned from that system. Here is the best takeaway I can offer:

    It does not matter what your technology is, or how theoretically superior it is. Getting it to actually work well "in production" is a whole separate thing than simply designing it and writing code. When it's a very small system, it will look like it's doing great. As it gets bigger, the seams will start to burst, and you will find out that promises and theory don't always match reality.

    In the end, while its aims are great, it takes a whoooooole lot of work to smooth out the bumps in such a system. You need experts in that technology to address bugs in a timely manner. You need developers versed in the system to properly build apps utilizing it. You need competent operators to build, orchestrate, operate and maintain the whole thing.

    All of that is made easier by using simple technology that everybody knows, that there's a huge support community for, professional services for, etc. A technology like MySQL or Postgres etc, has the corporate backing, development, support, etc to make it easy to work with at any scale. A little janky at times, limited, but dependable, predictable, controllable.

    A small bespoke system with a small support community and virtually no corporate support is, comparatively, a hell of a lot more difficult/costly to support and harder to make work reliably.

  • red_hare 10 hours ago

    My old team used Riak in production for time series data in a real-time system.

    Our code was in Clojure, and we just wrapped the Java client. The conflict resolution was a steep learning curve, but overall, it was kind of nice (coming from Mongo).

    But man, Clojure stack traces wrapping Java stack traces wrapping Erlang stack traces in a Kafka consumer... I wish that hell on no one.

  • m00x 40 minutes ago

    Companies would rather use something like dynamodb than self-host riak. You get an army of Amazon code monkeys to help you if something goes wrong, and it's a click away.

  • encoderer 11 hours ago

    Inscrutable erlang stack traces definitely played a part. They were horrible.

  • bojo 10 hours ago

    I'm pretty sure Stripe was a heavy user of this for a while. They used it due to their write-heavy system, if I recall.

    I fondly remember writing a Go driver for it. Was a good experience: https://github.com/riaken/riaken-core

  • veyh 10 hours ago

    We used Riak at $dayjob at around 2014-2017 (iirc). I don't exactly remember it fondly. It was slow and unreliable. You could make it freeze/crash with the wrong SOLR query. (I was pretty good at that...)

    • masleeds 10 hours ago

      The SOLR part has now been retired from the last few releases.

      Current development has been focused on improving the flexibility of secondary indexes. There was some funky stuff achieved by some users using overloaded 2i terms and distributed processing of regular expressions against those terms - the aim is now to make this more flexible to the modern developer using the language of projected attributes and filter expressions (ala DynamoDB). There's also some active work to both replicate-to and full-sync (i.e. reconcile with) external OpenSearch clusters.

      The primary goal for OpenRiak is stability under load/failure as a K/V store - so the ultra-flexibility of in-built SOLR querying has been sacrificed in the move towards that aim. Anything that can do harm is to be offloaded or constrained.

carterschonwald 11 hours ago

Cool! I never used it but really liked the engineers I met who worked at basho on Riak. AFAIK, they basically had an engineering dream team until their last ceo had them go hard in certain directions that didn’t pan out.

tptacek 9 hours ago

As someone who has used Riak in anger once in his career and who has a blossoming interest in FoundationDB I'd love someone to contrast the two systems. My knee-jerk reaction --- which I'm calling out as such! --- is that FDB has decreased the relevance of systems like Riak.

  • masleeds 8 hours ago

    I would tend to agree, perhaps a decade ago it was easier to define the uniqueness of Riak, and now there are alternatives that offer similar guarantees. So the relevance of Riak is not as obvious.

    Also as we focus on stability on OpenRiak going forward, that means reducing some of the capability that may have made Riak stand-out in the scale-out space. The preference going forward is to do fewer things, but do those things predictably well.

    There will be differences between Riak and FoundationDB, and I hope those differences are sufficient to make Riak interesting, and allow it to continue to occupy a small niche in the world of databases.

chadd 11 hours ago

I used Riak for a project back in 2012, the app that became the Whisper App, and as a huge Erlang fanboy, I was so excited about it.

But it was incredibly unreliable at scale, and my colleague and I spent a week of sleepless nights under incredible personal and business pressure - as the servers got busier and busier - ripping it out.

Still love vector clocks, though, and have fond memories of the Basho team presenting at Erlang Factory

  • amanj41 11 hours ago

    Vector clocks are very cool. Having read through how they were initially used in Riak, I was blown away that such an implementation could scale. I guess this is why Cassandra took a different approach?

    • tibbar 11 hours ago

      Vector clocks are certainly cool but fundamentally premised on the idea of having multiple 'live' versions of a value at once. Amazon's original Dynamo paper required conflict resolution at the application level, which is a very strange framework to build applications on. (Notably DynamoDB has moved away from this, I believe to Last Write Wins.) Cassandra takes the latter approach by default as well, I believe.

      • amanj41 10 hours ago

        yes there's that idiosyncrasy, as well as client ideally needing to read the previous clock from the DB before writing an update for that key unless it's ok with the write being viewed as concurrent. Plus the extra memory overhead to store the clocks in the client.
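
The read-before-write point above can be illustrated with a minimal vector clock sketch. This is a generic textbook version, not Riak's actual implementation: the function names (`increment`, `descends`, `compare`, `merge`) and the client ids are hypothetical, chosen only for illustration. It shows that a write built on the clock the client just read is ordered after the stored value, while a "blind" write with a fresh clock is seen as concurrent and would have to be kept as an extra live version.

```python
from typing import Dict

# A vector clock maps an actor id to the number of updates seen from that actor.
VClock = Dict[str, int]


def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of `clock` with `actor`'s counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: VClock, b: VClock) -> bool:
    """True if `a` has seen every update that `b` has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    """Classify `a` relative to `b`: equal, after, before, or concurrent."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"


def merge(a: VClock, b: VClock) -> VClock:
    """Pointwise maximum: the clock of a value that resolves both versions."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


# Value currently stored, written once by client_a.
stored = increment({}, "client_a")

# client_b reads the stored clock first, then writes: a causal successor.
informed = increment(stored, "client_b")

# client_c writes without reading first: its clock knows nothing of client_a.
blind = increment({}, "client_c")

print(compare(informed, stored))  # after
print(compare(blind, stored))     # concurrent
print(merge(blind, stored))       # clock covering both versions
```

A Last Write Wins store skips the "concurrent" case entirely by picking one version on timestamp, which is simpler for the application but silently discards the other write.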

amerine 5 hours ago

I still use my RICON pint glass and wear my RICON jacket Basho gave everyone at RICON almost every week. My favorite conf swag ever

p_l 4 hours ago

A question possibly answered elsewhere, but did openriak include only Riak KV, or also other projects like CS & TS?

vosper 11 hours ago

Who would this be for in 2024?

I remember evaluating Riak back in 2011 or so for an analytics solution, but ended up going with a more traditional OLAP database that was a much better option.

It's hard for me to imagine where Riak would be a good option given how many choices we have today for various data stores.

  • EwanToo 10 hours ago

    It's realistically for the handful (dozens at most?) of very large Riak implementations where it would be enormously expensive to rewrite the application running on top of it.

    For example, the UK NHS Spine messaging system which has been building on Riak for 10 years

    https://riak.com/posts/press/nhs-launches-upgraded-it-backbo...

  • nicholas-adams 5 hours ago

    I think that in the time since 2011, things have changed more than a bit. As my employer provides Enterprise Grade Riak Support (and, of course, OpenRiak support), I'm under NDA and cannot really share names. However, I can share that there are quite a few places that use Riak.

    Here are a few off the top of my head:

    - the biggest online betting company in the world

    - one of Japan's largest e-commerce sites

    - a large Hungarian bank

    - one of China's largest electronic manufacturers

    - arguably Asia's largest or second largest messaging platform

    - a significant Indian online-documentation provider

    - one of the largest US insurance providers

    - an Australian app analytics provider

    - a European telephone services provider

    - one of the world's largest travel sites

    - an Asian-based credit-card fraud detection service

    - a number of start ups in various industries

    - me - I do my crypto taxes using a 5 node Riak cluster running on Raspberry Pi's

    In the Basho era (up to early 2017), Riak may have only been targeted to larger players but now, when it comes to areas such as in-house data sovereignty, compliance (e.g. GDPR), the flexibility, speed and reliability Riak now provides plus being free to run, people from individuals to corporates are starting to wake up and see the advantages.

    (edited in an attempt to improve list formatting)

  • sitkack 10 hours ago

    It is a fault tolerant massively scalable key value store capable of handling hundreds of terabytes of data.

    What are these options you are thinking of?

    The only thing that comes to my mind is Aerospike and possibly ScyllaDB.

    • tptacek 6 hours ago

      FoundationDB seems like the obvious example? But they're not strictly comparable in anything but scale, right? FDB is ACID.

    • senderista 6 hours ago

      If cloud is an option, DynamoDB?

  • felixgallo 11 hours ago

    Riak isn’t remotely like OLAP. What was your use case?

    • ramon156 11 hours ago

      Think they meant OLTP

binary132 10 hours ago

It’s actually kinda silly how exciting this is to me