/top /all /jobs
Topics: #Alcohol #DrugFree #Education #Hobbies #LawFirm #Movies #Music #News #Politics #Programming #PublicFigures #Romance #Technology

(PCRE-compatible)
Email administrator

Read Post
Sun, 10 Nov 2024 16:38:11 -0800

Andy from private IP, post #16994410 /all A deep dive into the Boyer-Moore string search algorithm, which is critical for this site When I was writing the backend for the new JDU, I had a choice between a database-driven web application like the vast majority of bulletin boards, or a novel filesystem-based approach that leverages the power of command line tools and file storage on Linux. Initially, I was concerned that the amount of uses of the Linux command line tool "grep" would be processor-intensive and result in long load times. I was wrong. The Boyer-Moore string search algorithm, which forms the basis of grep, is BLINDINGLY FAST. As in, your mind cannot conceive how fast the algorithm is, even on millions of lines of text. It does this by, in part, pre-processing the pattern to search for so that it only needs to perform operations that would actually find the pattern. For example, let's say you have the pattern "i-am-awesome" and you're looking for it in a 100 MB text file. Rather than slowly read the file character by character, grep first determines that you're looking for a 12-character sequence. Thus, at most it only needs to read 12-character blocks of the file. Then, rather than searching through every 12 characters, grep looks at the *12th* character in every 12-character block, sees whether it matches the 12th character of the pattern, and proceeds accordingly. If the 12th character matches, it moves backward one position and determines whether the 11th character matches, and so on. If the 11th character doesn't match, grep determines that block doesn't match, and moves on. This is an unbelievably fast way to match patterns and is one reason why every page on this site loads in between 20 and 200 ms, which is better performance than I would get from a large SQL database. I've also written a few optimizations, such as caching the number of replies to each post and updating that number anytime a reply is posted, rather than searching through the files to determine how many replies there are every time someone loads the index. I also came up with the technique of reading the last 500 log lines from the web server log and searching registered usernames in the log to see who is recently active on the site. That was freaking genius. The load on the server is virtually zero at all times thanks to my techniques and the design of Mosaic, which is what I am calling this system. There are many improvements and new features coming down the pipe. I'm sorry for the boast, but I really am the greatest lawyer-programmer in the world. https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm #Programming #Technology Mon, 11 Nov 2024 08:20:13 -0800
Andy from private IP Reply #19466453 The most important command line tools I use on here are grep, cut, and tail. That's most of what it takes to run this site. Cut is an interesting tool because it can slice text files based on a delimiter and just give you the fields you specify. Tail is great because you can take a huge text file, seek to the end, and get the last 500 log lines or whatever other number you want. All of these tools are highly optimized and ideal for my use case of a filesystem-based BBS. When combined, they are powerful tools even though individually they don't do very much. I also use the find command, which is the fastest way of searching a filesystem and executing grep, cut, and/or tail on the files found. Linux is the best platform I have ever used in nearly 40 years of using computers. When I switched over to Red Hat Enterprise Linux for my desktop workstation, it was confirmed that RHEL 9 makes Windows look like a toy. RHEL 9 also destroys macOS in the area of customization, although macOS is arguably easier to live with. It's an ongoing process for me, and I learn something new almost every day.
Replies require login.

Telemetry: page generated in 27.8 milliseconds for user at 18.97.9.170 on 2024-12-10 09:40:29

© 2024 Andrew G. Watters, Esq.

Test