@enumerator4829

enumerator4829@sh.itjust.works · 4 days ago

You mean a transparency log? Just sign and publish. Or if it’s confidential, have a timestamp authority sign it, but what’s the point of a confidential blockchain? Sure, we can have a string of hashes chained together à la git, but that’s just an implementation detail. Where does the trust come from, and who does the audit? That’s the interesting part.
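A git-style hash chain really is just a few lines. Below is a minimal sketch, assuming GNU coreutils (`sha256sum`). `chain_hash` and `build_chain` are made-up names for illustration. Each hash covers the previous hash plus the new entry, so editing any earlier entry changes every hash after it. That gives you tamper evidence, not trust: someone still has to sign or publish the head.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; assumes GNU coreutils sha256sum.

# Hash the previous link together with a new entry.
# $1 = previous hash (empty for the first entry), $2 = entry text
chain_hash() {
  printf '%s\n%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# Read one entry per line on stdin and print "<hash>  <entry>" per line.
# The last hash printed is the head of the chain.
build_chain() {
  local prev="" h line
  while IFS= read -r line || [ -n "$line" ]; do
    h=$(chain_hash "$prev" "$line")
    printf '%s  %s\n' "$h" "$line"
    prev=$h
  done
}

printf 'entry one\nentry two\nentry three\n' | build_chain
```

Publishing or timestamping the head hash is what commits you to the whole history. The chaining itself is the implementation detail.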

enumerator4829@sh.itjust.works · 4 days ago

If your blockchain isn’t distributed, it doesn’t need to be a blockchain, because then you already have trust established.

enumerator4829@sh.itjust.works · 5 days ago

Git gud

/s

enumerator4829@sh.itjust.works · 5 days ago

You assume a uniform distribution. I’m guessing that it’s not. The question isn’t “Does the model contain compressed representations of all works it was trained on?”; retaining enough information about any single image is already a copyright issue.

Besides, the situation isn’t as obviously flawed with image models as it is with LLMs. LLMs are just broken in this regard, because only a handful of retained bytes is needed to violate copyright.

I think there will be a “find out” stage fairly soon. Currently, the US projects lots and lots of soft power on the rest of the world to enforce copyright terms favourable to Disney and friends. Accepting copyright violations for AI will erode that power internationally over time.

Personally, I do think we need to rework copyright anyway, so I’m not complaining that much. Change the law, go ahead and make the high seas legal. But set against current copyright laws, most large datasets and most models constitute copyright violations. Just imagine the shitshow if OpenAI were a European company training on material from Disney.

enumerator4829@sh.itjust.works · 6 days ago

Stability and standardisation within the kernel for kernel modules. There are plenty of commercial products that use proprietary kernel modules that basically only work on a very specific kernel version, preventing upgrades.

Or they could just open source and inline their garbage kernel modules…

enumerator4829@sh.itjust.works · 6 days ago

Or you know, trusted timestamps and cryptographic signatures via normal PKI. A Merkle tree isn’t worth shit legally if you can’t verify it against a trust outside of the tree.

All of the blockchain bullshit misses that part: you can create a cryptographic representation of money or contracts, but you can’t actually enforce, verify or trust anything in the real world without intermediaries. On the other hand, I can trust a certificate from a CA because there are verifiable, actual real-world consequences for someone if that CA breaks legal agreements.

I’ll use a folder of actual papers, signed using a pen. Have some witnesses, make sure they have a legal stake and consequences, and you are golden.
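For reference, computing a Merkle root is the easy part. A minimal sketch, assuming GNU coreutils; `merkle_root` is a hypothetical helper, and pairing the last hash with itself on odd levels is one common convention, not a standard. The root is only worth something once it is tied to a trust outside the tree, e.g. signed with a CA-issued certificate or timestamped by a timestamp authority.

```shell
#!/usr/bin/env bash
# Hypothetical Merkle root helper; assumes GNU coreutils sha256sum.

sha() { printf '%s' "$1" | sha256sum | cut -d' ' -f1; }

# $@ = leaf strings; prints the root hash (fails if no leaves are given).
merkle_root() {
  [ "$#" -gt 0 ] || return 1
  local -a level=() next=()
  local leaf i n
  for leaf in "$@"; do level+=("$(sha "$leaf")"); done
  while [ "${#level[@]}" -gt 1 ]; do
    next=()
    n=${#level[@]}
    for ((i = 0; i < n; i += 2)); do
      if ((i + 1 < n)); then
        next+=("$(sha "${level[i]}${level[i+1]}")")
      else
        # Odd number of nodes: pair the last hash with itself.
        next+=("$(sha "${level[i]}${level[i]}")")
      fi
    done
    level=("${next[@]}")
  done
  printf '%s\n' "${level[0]}"
}

merkle_root "doc-a" "doc-b" "doc-c"
```

With OpenSSL, an RFC 3161 timestamp request for that root can be built with something like `openssl ts -query -digest <root> -sha256 -cert -out root.tsq`, which you would then send to whichever TSA you trust.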

enumerator4829@sh.itjust.works · 6 days ago

Distributed blockchains are useful when all of the below are fulfilled:

- Need for distributed ledger
- Peers are adversarial w.r.t. contents of transactions in the ledger
- Enough peers exist so that no group can become a majority and thus assume control
- No trusted central authority exists

Here, we have a single peer creating entries in a ledger. We can get away with a copy of the ledger and one or more trusted timestamping authorities.
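The checklist above fits in a small shell function. It’s a sketch: `blockchain_fits` and its argument order are made up for illustration, and “no majority” is reduced to a plain head count (largest colluding group strictly less than half of all peers), ignoring how hash power or stake is actually distributed.

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the four conditions; returns 0 only if all hold.
# $1 need a distributed ledger (yes/no)
# $2 peers adversarial w.r.t. ledger contents (yes/no)
# $3 total number of independent peers
# $4 size of the largest group that could collude
# $5 trusted central authority exists (yes/no)
blockchain_fits() {
  [ "$1" = yes ] || return 1
  [ "$2" = yes ] || return 1
  (( 2 * $4 < $3 )) || return 1   # no group can become a majority
  [ "$5" = no ]
}

# The case discussed here: a single peer writing its own ledger.
if blockchain_fits no no 1 1 no; then
  echo "blockchain"
else
  echo "ledger copy + trusted timestamping authority"
fi
```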

enumerator4829@sh.itjust.works · 6 days ago

There is an argument that training is actually a type of (lossy) compression. You can even build (bad) language models by using standard compression algorithms to “train”.
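A toy version of that idea: predict the next character as whichever candidate makes the “training” text plus context compress smallest. This is a deliberately crude sketch, assuming `gzip` is installed; `compressed_size` and `predict_next` are hypothetical names, and gzip’s byte-granular output makes ties common (the first candidate wins a tie). The underlying equivalence is that a good predictor paired with arithmetic coding makes a good compressor and vice versa; gzip is just a very weak predictor.

```shell
#!/usr/bin/env bash
# Toy "compression as language model" sketch; assumes gzip is installed.

# Size in bytes of gzip-compressed $1 (-n omits name/timestamp from the header).
compressed_size() {
  echo $(( $(printf '%s' "$1" | gzip -9 -n -c | wc -c) ))
}

# $1 = training corpus, $2 = context, $3 = candidate next characters.
# Prints the candidate whose appended text compresses smallest.
predict_next() {
  local corpus=$1 ctx=$2 cands=$3 best="" best_size="" c size i
  for ((i = 0; i < ${#cands}; i++)); do
    c=${cands:i:1}
    size=$(compressed_size "$corpus$ctx$c")
    if [ -z "$best_size" ] || [ "$size" -lt "$best_size" ]; then
      best=$c
      best_size=$size
    fi
  done
  printf '%s\n' "$best"
}

corpus="the cat sat on the mat. the cat sat on the mat. "
predict_next "$corpus" "the cat sat on the m" "abcdefghijklmnopqrstuvwxyz "
```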

By that argument, any model contains lossy and unstructured copies of all data it was trained on. If you download a low-quality 480p h264-encoded Blu-ray rip of a Ghibli movie, it’s not legal, even though you aren’t downloading the same bits that were on the Blu-ray.

Besides, even if we consider the model itself to be fine, they did not buy all the media they trained the model on. The action of downloading media, regardless of purpose, is piracy. At least, that has been the interpretation for normal people sailing the seas; large companies are of course exempt from filthy things like laws.

enumerator4829@sh.itjust.works · 11 days ago

I’ve seen ZFS in production on pools with hundreds of TBs, combined into clusters of many PBs. The people running that don’t even think about BTRFS, and certainly won’t actively consider it for anything.

- BTRFS once had data corruption bugs. ZFS had some too, but only in very specific edge cases, and those were taken very seriously. Basically, ZFS has a far better reputation than BTRFS for not fucking up your bits.
- ZFS was built and tested for use on large pools from the beginning, by Sun engineers from back when Sun was fucking amazing and not a pile of Oracle-managed garbage. BTRFS still doesn’t have stable RAID5/6.
- ZFS send/recv is amazing for remote replication of large filesystems.
- DRAID is kind of the only sane thing to do with today’s disk sizes, speeds and pool sizes.
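For anyone who hasn’t used it, incremental replication with send/recv boils down to “snapshot, then send the delta since the last snapshot over SSH”. A sketch, assuming OpenZFS on both ends and an earlier snapshot that has already been received on the target; the dataset, snapshot and host names are hypothetical, and `DRY_RUN=1` just prints the pipeline instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical incremental replication helper for OpenZFS.
# $1 source dataset, $2 previous snapshot name (already on the target),
# $3 new snapshot name, $4 SSH host, $5 target dataset on that host.
replicate() {
  local src=$1 prev=$2 cur=$3 host=$4 dst=$5
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "zfs snapshot $src@$cur"
    echo "zfs send -i @$prev $src@$cur | ssh $host zfs receive $dst"
    return 0
  fi
  # The pipe binds tighter than &&, so send|receive runs only if the snapshot succeeded.
  zfs snapshot "$src@$cur" &&
    zfs send -i "@$prev" "$src@$cur" | ssh "$host" zfs receive "$dst"
}

DRY_RUN=1 replicate tank/data snap1 snap2 backuphost backup/data
```

Only the blocks changed since `snap1` cross the wire, which is what keeps this practical for large filesystems.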

But those are ancillary reasons. I’ll quote the big reason from the ArchWiki:

The RAID 5 and RAID 6 modes of Btrfs are fatally flawed, and should not be used for “anything but testing with throw-away data”.

For economic reasons, you need erasure coding for bigger pools, either classic RAID5/6 or DRAID. BTRFS will either melt your data in RAID5/6 or not support DRAID at all.
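The economics come down to usable capacity per redundancy group. A rough sketch that ignores metadata, padding and dRAID’s distributed spare space; `usable_pct` is a hypothetical helper. A 3-way mirror and an 8+2 parity layout both survive two failed drives per group, but the parity layout gives you far more usable space.

```shell
#!/usr/bin/env bash
# Usable share of raw capacity for one redundancy group, rounded down.
# $1 = data drives (1 for a mirror), $2 = parity drives (or extra copies)
usable_pct() {
  echo $(( 100 * $1 / ($1 + $2) ))
}

echo "2-way mirror:          $(usable_pct 1 1)%"   # survives 1 failure
echo "3-way mirror:          $(usable_pct 1 2)%"   # survives 2 failures
echo "RAID6/draid2, 8d+2p:   $(usable_pct 8 2)%"   # survives 2 failures
```

This prints 50%, 33% and 80% respectively.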

enumerator4829@sh.itjust.works · 14 days ago

I have a Mac I use for some specific tasks. I’ll agree that Apple is, ehh, Apple.

But mounting network fileshares is dead simple. My SMB share pops right up, authentication works fine, the user interface for it is fine. If I wanted to use it remotely, I’d just export it over my tailnet.

‘sshfs’ is good for brief, occasional use, but ultimately it breaks at the protocol level as soon as your socket dies, on any OS.

enumerator4829@sh.itjust.works · 18 days ago

To be fair, most higher density areas in Sweden have fairly good infrastructure for public transit. The national railways are a disgrace, but that mostly affects long distance travel. Mostly. Short to medium distance commute works fairly well everywhere I’ve tried it.

enumerator4829@sh.itjust.works · 19 days ago

But why use money to innovate when there is profit to be made and laws are just made up?

AI is the new kid on the block, trying to make a dent in our society. So far, we don’t really have that many useful or productive deployments. It’s on AI to prove its worth, and it’s kinda worthless until proven otherwise. (Name one interaction with a commercially deployed AI model you didn’t hate?)

So far, Apple is failing with consumer products, Microsoft is backing off on GPU orders, research is showing that commercial GenAI isn’t increasing productivity, NVDA seems to be cooling off, and you expect the benevolent commercial health care industry to come to the rescue?

Yeah, I’ll keep my knee jerk reaction and keep living with my current socialised health care.

enumerator4829@sh.itjust.works · 19 days ago

LLM training is expensive, and so are prompt “engineers”. This will be the cheapest off-the-shelf LLM they can find, prompted by someone’s nephew. People will be eating glue.

enumerator4829@sh.itjust.works · 19 days ago

Used to? It’s standard practice pretty much everywhere.

enumerator4829@sh.itjust.works · 24 days ago

I’ll extend your RHEL-as-corpo-parents analogy with the other children in the family. The majority of their revenue comes from completely legal oxycodone sales; any (alleged) trafficking is just a side hustle.

Rocky: The rich corpo parents’ least favorite child. Chill dude. Gives hugs to his parents’ victims. Still intends to take over the family business and run an oxycodone empire - but ethically.

Alma: The other reasonable, estranged child. Wants to take over the family business, but considers high-quality “herbal remedies” the only pain medication anyone would ever need.

Oracle: Wants to pivot the family business into more potent opioids and possibly world domination. While it’s obvious he has access to “stuff”, you suspect he has ties to multiple cartels and possibly the yakuza. For some reason he has direct numbers to several heads of state in his phone.

enumerator4829@sh.itjust.works · 25 days ago

Exactly, if you need shelf life, you use tape. Shelf life isn’t really a consideration for hard drives or SSDs in real life scenarios.

enumerator4829@sh.itjust.works · 27 days ago

The flaw with hard drives comes with large pools. The recovery speed is simply too slow when a drive fails, unless you build huge pools. So you need additional drives for more parity.
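Back-of-the-envelope numbers for why a classic single-drive rebuild is slow. The figures are assumptions for illustration (a 20 TB drive, 200 MB/s sustained write to the replacement, no competing load, decimal units); `rebuild_hours` is a hypothetical helper. Rebuilds under production load are typically slower than this bound. Spreading the rebuild across many drives, as dRAID does with distributed spares, is what shortens it, and that only pays off in large pools.

```shell
#!/usr/bin/env bash
# Lower bound on rebuild time, rounded down to whole hours.
# $1 = drive capacity in TB (decimal), $2 = sustained rebuild rate in MB/s
rebuild_hours() {
  echo $(( $1 * 1000000 / $2 / 3600 ))
}

echo "20 TB at 200 MB/s: at least $(rebuild_hours 20 200) hours"
```

That is more than a full day (about 27.8 hours before rounding) during which the group runs with reduced redundancy.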

I don’t know who cares about shelf life. Drives spin all their lives, which is 5-10 years. Use M-DISC or something if you want shelf life.

enumerator4829@sh.itjust.works · 27 days ago

Tape will survive, SSDs will survive. Spinning rust will die

enumerator4829@sh.itjust.works · 1 month ago

# echo "SELINUX=enforcing" > /etc/selinux/config
# echo "SELINUXTYPE=mls" >> /etc/selinux/config
# reboot

Come on, it will be fun!