:9front: Okay, here's webvac, which is of very limited use: https://git.freespeechextremist.com/gitweb/?p=webvac;a=blob_plain;f=README;hb=31f6b1e9fca63584c466e646bbf0af7faf855fc2

:bwk: webvac serves static files using venti. It solves the "big directory" problem Pleroma has if you have a lot of uploads, it serves HEAD requests *faster* than static files on-disk, it automatically compresses/dedups, and you can use the same venti server for as many instances as you have, so you get cross-instance dedup. It has been in production on FSE for a few days, serving the hell out of files, and (given that serving uploads is an I/O-bound process) no latency increase for the most part. Almost all of this is thanks to venti, which is great software.

:ken: FSE uses venti for its backups as well, by running vac against the media directory, then using venti/copy to replicate this to a venti server offsite. After the initial backup of the full 100GB (which took most of a day), replication (three times a day) with full history takes under ten minutes, usually.

:mcilroy: You will need: venti (for storing the blocks; a Plan 9 venti server or a P9P one running on Linux/BSD/whatever works fine), redis (for storing the map of filenames to blocks), vac/unvac from P9P ( https://github.com/9fans/plan9port , also available over IPFS: QmW7ytEymcMpw1KAsqQh73gTiP5iCQFNZeD1DxRVgMpFDA), Ruby, and the ability to tweak nginx without getting an aneurysm from the rage.

:pike: There are two main pieces: webvac-server (which serves the files) and webvac-sweep (which sweeps files into venti). webvac-sweep pulls a file into venti and then records the pathname, score, and metadata to redis. If all of that is successful, it can (optionally) delete the file. The file is now ready to be served! What FSE does is sweep everything and then remove files smaller than 4MB from disk (above 4MB it takes about a second to retrieve the file from venti; I plan to fix the source of this problem and then serve *everything* from venti). Then we have nginx check if the file is present in the FS and, if not, forward the request to webvac-server. (There is an example nginx config file for testing.)
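
As a rough illustration of the sweep-then-serve flow described above, here is a minimal Python sketch. It is not webvac's code: `BlockStore`, `sweep`, and `serve` are hypothetical names, a plain dict stands in for redis, and an in-memory store keyed by SHA-1 stands in for venti. It also stores each file as a single block, whereas vac splits files into blocks and builds a tree of scores.

```python
import hashlib
import os


class BlockStore:
    """Toy stand-in for venti: data is keyed by the SHA-1 of its contents."""

    def __init__(self):
        self.blocks = {}

    def write(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # Identical content hashes to the same score, so it is stored once.
        self.blocks.setdefault(score, data)
        return score

    def read(self, score: str) -> bytes:
        return self.blocks[score]


def sweep(path, store, index, delete=False):
    """Store the file, record path -> score and metadata, optionally delete it."""
    with open(path, "rb") as f:
        data = f.read()
    score = store.write(data)
    index[path] = {"score": score, "size": len(data)}  # dict stands in for redis
    if delete:
        os.remove(path)  # only reached if the steps above succeeded
    return score


def serve(path, store, index):
    """Prefer the file on disk; otherwise fall back to the block store."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    entry = index.get(path)
    if entry is None:
        return None  # a real server would answer 404 here
    return store.read(entry["score"])
```

Because the index and the store are separate, several instances could in principle share one store and keep their own path maps, which is where the cross-instance dedup would come from.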

You can browse the tree at https://git.freespeechextremist.com/gitweb/?p=webvac . You can clone it using:

$ git clone git://git.freespeechextremist.com/webvac

Or in a completely decentralized manner by grabbing a checkout from IPFS:

$ ipfs get -o webvac QmXhsk7s87NSh5fQmhwK3HNizt69iWUDn2bAYASEhpjMAF

There are installation instructions in the README at the top of the source tree and linked above.

There are more announcements coming today.

@p You made claims this was faster. Do you have benchmarks between this and the oldway to see?

@freemo I eyeballed it locally and I ran awk against the server logs. The uploads dir for FSE was 9.6MB, so just a full readdir was a 9.6MB pointer-chase across non-consecutive 4kB blocks of the disk, whereas this is a lookup in a hash table in memory. The worst-case scenario for the old way was obscene: it was something like ten seconds to do an 'ls' when the cache was cold. Across the board, HEAD requests are now pretty uniform and stay under 1ms of backend time.

If you need something more scientific than that, then feel free to not believe me. I'm already hacking on the next thing.

@p
I don't need anything. A proper benchmark would be nice to back up your claim, but if it's just a guesstimate then it is what it is. I just wanted to know where the claim came from and if it had any weight.

@freemo

> A proper benchmark would be nice to back up your claim

If I managed to make a hash table lookup in RAM slower than a pointer chase across the disk, I would publish that. This is just a property of the data structures and their respective storage media. It's an in-memory hash table versus an on-disk linked list, that'd be like devising and running a benchmark to determine that a search index is faster than a full scan of the disk. The only reason I bothered to eyeball it was to make sure I hadn't completely botched it by doing something stupid.

> I just wanted to know where the claim came from and if it had any weight.

Okay, the FS's block size is 4kB. dirents are not contiguous because they are appended piecemeal, so between two blocks that represent a directory, you will find thousands of blocks representing the files that have been written to disk in the interim. You can't predict it because it's a linked list. (Every clever thought you're having right now about putting an index in has to account for the physical medium and the failure modes and space overhead and performance. If you have any intuition about this, it is almost certainly tuned for RAM rather than disk, and most of the clever ideas have been tried already, and they did not result in a reliable, performant filesystem.) Here is a diagram. A pointer-chase across *random-access* memory is already bad, but even on SATA or NVMe disks, a seek is costlier.

Here is a hasty diagram, this is CS 101 stuff, dude.
filesystem_pointer_chase.png
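
To make the data-structure argument concrete, here is a small Python model of it. This is a sketch under stated assumptions, not a model of any particular filesystem: `DIRENT_SIZE` (64 bytes per entry) is a guess, the function names are hypothetical, and the "disk" is just a dict of block addresses. It counts how many scattered 4kB blocks a lookup has to read when a directory is a chain of blocks, compared with a single probe of an in-memory hash table.

```python
import random

BLOCK_SIZE = 4096    # 4kB filesystem blocks, as in the post
DIRENT_SIZE = 64     # assumed average bytes per directory entry (hypothetical)
ENTRIES_PER_BLOCK = BLOCK_SIZE // DIRENT_SIZE


def build_directory(names, disk_blocks=1_000_000, seed=0):
    """Place directory blocks at scattered addresses, chained like a linked list."""
    rng = random.Random(seed)
    chunks = [names[i:i + ENTRIES_PER_BLOCK]
              for i in range(0, len(names), ENTRIES_PER_BLOCK)]
    addrs = rng.sample(range(disk_blocks), len(chunks))
    disk = {}
    for i, (addr, chunk) in enumerate(zip(addrs, chunks)):
        next_addr = addrs[i + 1] if i + 1 < len(addrs) else None
        disk[addr] = (chunk, next_addr)
    head = addrs[0] if addrs else None
    return disk, head


def lookup_by_chase(disk, head, name):
    """Walk the chain block by block; return (found, number of block reads)."""
    reads = 0
    addr = head
    while addr is not None:
        entries, next_addr = disk[addr]
        reads += 1  # each step is a read at an address we could not predict
        if name in entries:
            return True, reads
        addr = next_addr
    return False, reads


def lookup_by_hash(index, name):
    """The in-memory alternative: one hash-table probe, no chain to follow."""
    return name in index
```

With these assumed numbers, 150,000 entries come to about 9.6MB of directory spread over 2,344 blocks, so a miss (or a hit on the last entry) reads all 2,344 of them, while `lookup_by_hash(set(names), name)` does not depend on the directory's layout at all.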

@p

Ya know what is also CS 101? Writing unit tests and benchmarks to go alongside code that is written, even when it is known to be an improvement... Why? Simple: it helps us track the performance improvement, and it also helps us tweak future modifications to the code and know when we have introduced mistakes we did not intend.

No one is saying your intent isn't justified. This is just how you write good code, and that includes good tests, not just to prove out your current code but, more importantly, as a measure for future tweaks to the code.

Good job and thanks for the hard work.

The thing is, optimisation is a tricky beast.

> that'd be like devising and running a benchmark to determine that a search index is faster than a full scan of the disk

I have seen many times where search indexes can be slower than full scans, just as sometimes hash tables can be slower than linked lists under the right circumstances.

I am not saying that applies here, I am not saying you are under any obligation to do a benchmark, I am not saying this is bad work in any way.

All I'm saying is a benchmark would have been interesting, and I don't rule out the possibility that under certain conditions it might show a slowdown and in others a speedup. Either of those might be marginal, and it would be interesting to see where the crossover occurs and just how much of an improvement you get as various conditions grow.

Again not saying you need to do this to determine it was a good move. Just saying it would have been interesting to see, and the benchmarks in general useful for future diagnostics.

On the projects I run, I like to create extensive benchmarks alongside my unit tests. As features or fixes are added we watch the benchmarks change alongside them, and it provides a CI tool similar to what unit tests provide. So I generally find it a worthwhile effort even if it may not be critical to knowing that the current feature set makes sense performance-wise.
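
For illustration, a benchmark of the kind described here might start out like the sketch below. It is an assumption-laden toy, not something from the webvac repository: `bench_lookup` is a hypothetical name, and it times in-memory structures only (a worst-case list scan versus a dict lookup), so it says nothing about cold-cache disk behaviour, which is the case being argued about in this thread.

```python
import timeit


def bench_lookup(sizes=(10, 1_000, 100_000), repeats=100):
    """Time a worst-case membership test in a list versus a dict, per size."""
    results = {}
    for n in sizes:
        names = [f"file{i}" for i in range(n)]
        table = dict.fromkeys(names)
        target = names[-1]  # last element: worst case for the linear scan
        scan = timeit.timeit(lambda: target in names, number=repeats)
        hashed = timeit.timeit(lambda: target in table, number=repeats)
        results[n] = {"scan_s": scan, "hash_s": hashed}
    return results


if __name__ == "__main__":
    for n, r in bench_lookup().items():
        print(f"n={n:>7}  scan={r['scan_s']:.6f}s  hash={r['hash_s']:.6f}s")
```

Run over a range of sizes, a table like this is what would show where any crossover sits, and keeping the numbers from each run would let later changes be compared against them.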

@freemo @p I think you both are over-estimating the level of content taught in CS 101.

@tmy @freemo Hyperbole, maybe. Read "CS 101" like "It is the merest elementary, my dear Watson" but where "Watson" is the IBM AI rather than the doctor/roommate/reader-proxy.

@p

I wasn't headed anywhere. I asked you if you had a benchmark. My intent was for you to either say yes, and share it, or no and I'd say "ok".

However you decided, as you tend to, to get insulting and childish and be like "this is just CS 101, you should know this, dumbass"... so here we are.

Personally I don't care, I'm not developing on the project so it doesn't affect me. At some point someone might come in and write some benchmarks, that would be great. It would be a very useful tool.

I just laugh at the fact that you get your undies in a knot over someone asking a simple question, to the point you feel the need to start spewing rude quips rather than just saying "no sorry, don't have one" and calling it a day.

At some point I just hope you grow up and join the rest of the adults in being able to have a normal conversation with someone.

@tmy

@freemo @tmy Dude, next time I want to know what the HN hivemind has to say about best practices, I'm glad you're available as a resource, but I'd rather be dead than bored.

@p

lol good coding isn't always fun, sometimes it's boring. But hey, if you don't care about doing a good job, more power to you. Best of luck.

@tmy

@freemo @tmy Work tedium is completely different from hearing a guy read you the development methodology four-color glossies. At least something's getting accomplished. Just assume the person you're talking to has heard everything you've heard and mention the name instead of paraphrasing the bullet points under the beaker/gear/light bulb/person icons on the Bootstrap page about the thing. That way you don't come across as tedious or patronizing and if they don't know what the thing is, they can ask instead of trying to scan a sermon to see if there is anything in it that is worth addressing.

> if you don't care about doing a good job, more power to you.

We've reached the stage where you downgrade your post quality from "Hacker News" to "Reddit", I see.

@p

Says the man who, when all I wanted to know was what benchmarks you had, if any, no judgement, went right to telling me my question was "CS 101"... It was a yes-or-no question and you've gone on moaning about even daring to be asked it for over an hour now...

We call this projection

Seriously, grow up at this point. You've wasted far too much time trying to flex your ego. It's toxic and you don't do yourself any favors.

@tmy

@realcaseyrollins

Exactly! That's what I'm fricking saying... Asked the dude a question and every response is some condescending bullshit and he wonders why I keep calling him out on it... It's getting old.

@p @tmy

@freemo @p @tmy Well IDK about that you just seem mad that he won't make a benchmark. A benchmark isn't the only way to test efficiency, as P is proving here, although it's probably better.

Berating P over not making a benchmark just seems childish to me imho...y'all have different opinions on how this sort of thing should be handled, just agree to disagree man

@realcaseyrollins

I've said the exact opposite several times, several of my quotes in this thread:

> if it's just a guesstimate then it is what it is. I just wanted to know where the claim came from

> I am not saying that applies here, I am not saying you are under any obligation to do a benchmark

> Again not saying you need to do this to determine it was a good move.

> I wasn't headed anywhere. I asked you if you had a benchmark. My intent was for you to either say yes, and share it, or no and I'd say "ok".

I am **not** berating P over not doing the benchmark. I am berating P over being condescending and rude, and therefore am giving the same in kind and defending against his attacks (like telling me it's CS 101, that I should know this and not dare even ask).

@p @tmy

@realcaseyrollins

The rudeness started in his very first response..

"A proper benchmark would be nice to back up your claim, but if it's just a guesstimate then it is what it is"

He started when he said "this is CS 101 stuff, dude.", which I took as rude and condescending

then went on to tell me I was lecturing for just asking the question:

"I bleed tree sap and I cannot think of anything more boring than being lectured"

and other such condescending or rude comments as:

"next time I want to know what the HN hivemind has to say about best practices, I'm glad you're available as a resource, but I'd rather be dead"

"hearing a guy read you the development methodology four-color glossies"

No one lectured him, I asked a freaking question. He challenged the absurdity of even asking for such a thing, I told him it was good practice, and he went on for over an hour with this crap.

@p @tmy

@freemo @p @tmy You might be right but I can see how one might view "this is just how you write good code, that includes good tests not just to prove out your current code, but more importantly as a measure for future tweaks to the code." as a sort of lecturing

@realcaseyrollins

Yes I could see that too, but it also was after he already started in with the condescending remarks I quoted. Thus why I turned off the delicate touch.

I will say this, P seems to be getting a little better, so maybe he is on the right path, and this childish phase, where he gets defensive over a simple question and then I get lost in a back-and-forth with him for hours, might (I hope) not be the norm for much longer.

@p @tmy

@realcaseyrollins

LOL FYI this comment triggered him so much he blocked me. I'll call QED on that one.

@p @tmy

@freemo @p @tmy I mean it's better than y'all arguing and fighting nonstop imho

@realcaseyrollins

Anything is better than that. I just hoped he could have grown the fuck up instead. But he has some combination of ego issues (insecure) and a bit of a personality disorder mixed in, it seems.

Which is fine, no one is perfect. It's probably just years of him being fired from programming jobs for refusing to get along with the group that has led him to be insecure, I imagine.

If he acted this way in any group where he wasn't the one in control he wouldn't last a day, which is probably why he needed to start his own server. It was the only way he could be in enough control and feel secure he wouldn't get kicked out of some other server.

@p @tmy

@realcaseyrollins @freemo @tmy I ain't been triggered, I just regret every interaction I've ever had with this guy about a third of the way through the interaction.