MinIO

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	504e52b45e	protect bpool from buffer pollution by invalid buffers (#20342)	11 months ago
Anis Eleuch	bf1769d3e0	xl: Avoid marking a drive offline after one part read failure (#19779) This commit fixes a rare case where a multipart object that can in theory be read makes the GetObject API return an error. It turned out that six-year-old code was marking a drive offline whenever bitrot streaming failed to read a part from a disk, whatever the error. This can affect reading a subsequent part: enough shards exist, but the part cannot be reconstructed because one drive was marked offline earlier. This commit removes the code that marks the drive offline. It also closes the bitrot streaming reader before setting it to nil.	1 year ago
Klaus Post	d4b391de1b	Add PutObject Ring Buffer (#19605) Replace the `io.Pipe` from streamingBitrotWriter -> CreateFile with a fixed size ring buffer. This will add an output buffer for encoded shards to be written to disk - potentially via RPC. This will remove blocking when `(*streamingBitrotWriter).Write` is called, and it writes hashes and data. With current settings, the write looks like this:
```
                                                                                           Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │   Parr.     │                │ (http body)   │               │                      │                │
│ Bitrot Hash       │   Write     │      Pipe      │     Read      │  HTTP buffer  │   Write (syscall)    │   TCP Buffer   │
│ Erasure Shard     │ ──────────► │  (unbuffered)  │ ────────────► │   (64K Max)   │ ───────────────────► │     (4MB)      │
│                   │             │                │               │   (io.Copy)   │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```
We write a Hash (32 bytes). Since the pipe is unbuffered, it will block until the 32 bytes have been delivered to the TCP buffer, and the next Read hits the Pipe. Then we write the shard data. This will typically be bigger than 64KB, so it will block until two blocks have been read from the pipe. When we insert a ring buffer:
```
                                                                                           Outbound
┌───────────────────┐             ┌────────────────┐               ┌───────────────┐                      ┌────────────────┐
│                   │             │                │ (http body)   │               │                      │                │
│ Bitrot Hash       │   Write     │  Ring Buffer   │     Read      │  HTTP buffer  │   Write (syscall)    │   TCP Buffer   │
│ Erasure Shard     │ ──────────► │     (2MB)      │ ────────────► │   (64K Max)   │ ───────────────────► │     (4MB)      │
│                   │             │                │               │   (io.Copy)   │                      │                │
└───────────────────┘             └────────────────┘               └───────────────┘                      └────────────────┘
```
The hash+shard will fit within the ring buffer, so writes will not block - but will complete after a memcopy. Reads can fill the 64KB buffer if there is data for it. If the network is congested, the ring buffer will become filled, and all syscalls will be on full buffers. Only when the ring buffer is filled will erasure coding start blocking. Since there is always "space" to write output data, we remove the parallel writing since we are always writing to memory now, and the goroutine synchronization overhead is probably not worth taking. If the output were blocked in the existing code, we would still wait for it to unblock in the parallel write, so it would make no difference there - except now the ring buffer smooths out the load. There are some micro-optimizations we could look at later. The biggest is that, in most cases, we could encode directly to the ring buffer - if we are not at a boundary. Also, "force filling" the Read requests (i.e., blocking until a full read can be completed) could be investigated and maybe allow concurrent memory on read and write.	1 year ago
Klaus Post	ec816f3840	Reduce parallelReader allocs (#19558)	1 year ago
Harshavardhana	caac9d216e	remove all the frivolous logs that may or may not be actionable (#18922) For actionable inspections we have `mc support inspect`; we do not need double logging. Healing will report relevant errors, if any, in terms of quorum lost etc.	2 years ago
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890) Brings a generic implementation that prints a stack trace when close() is called on a 'nil' channel, and otherwise closes the channel safely.	2 years ago
Harshavardhana	dd2542e96c	add codespell action (#18818) Original work here, #18474, refixed and updated.	2 years ago
Harshavardhana	45fb375c41	allow healing to prefer local disks over remote (#17788)	2 years ago
Kaan Kabalak	21fbe88e1f	Print certain log messages once per error (#17484)	2 years ago
Anis Eleuch	54c5c88fe6	Add number of offline disks in quorum errors (#16822)	2 years ago
Harshavardhana	38ccc4f672	fix: make sure to avoid calling RenameData() on disconnected disks. (#14094) Large clusters with multiple sets, or multi-pool setups, might at times fail and report unexpected "file not found" errors. This can become a problem during the startup sequence, when some files need to be created at multiple locations. - This PR ensures that we nil the erasure writers so that they are skipped in the RenameData() call. - RenameData() doesn't need "Access()" calls for `.minio.sys` folders; they always exist. - Make sure PutObject() never returns ObjectNotFound{} for any errors; make sure it always returns "WriteQuorum" when renameData() fails with ObjectNotFound{}. Return appropriate errors for all other cases.	4 years ago
jiangfucheng	7460fb8349	fix padding error and compatible with uploaded objects (#13803)	4 years ago
Harshavardhana	ec8d93f756	fix: add missing readTriggerCh close (#12593)	4 years ago
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	4 years ago
Harshavardhana	d84261aa6d	fix: ensure proper usage of DataDir (#12300) - GetObject() should always use a common dataDir to read from when it starts reading; this allows the code in erasure decoding to have sane expectations. - Healing should always heal on the common dataDir; this allows the code in dangling object detection to purge dangling content. Both of these situations can happen under certain types of retries during PUT when the server is restarting etc.; some namespace entries might be left over.	4 years ago
Harshavardhana	091845df39	fix: return quorum error upon decode failures (#12184)	4 years ago
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	4 years ago
Harshavardhana	6160188bf3	fix: erasure index based reading based on actual ParityBlocks (#11792) in some setups with ordering issues in drive configuration, we should rely on expected parityBlocks instead of `len(disks)/2`	4 years ago
Harshavardhana	e019f21bda	fix: trigger heal if one of the parts are not found (#11358) Previously we added a heal trigger when bit-rot checks failed; now extend that to support heal when parts are not found either. This healing only gets triggered if we can successfully decode the object, i.e. read quorum is still satisfied for the object.	5 years ago
Harshavardhana	c4b1d394d6	erasure: avoid io.Copy in hotpaths to reduce allocation (#11213)	5 years ago
Anis Elleuch	677e80c0f8	xl: Remove check-dir in ReadVersion (#11200) The only purpose of the check-dir flag in ReadVersion is to return 404 when an object has xl.meta but no data. This causes an extra call to the disk, which can be costly on a busy system where disks receive many concurrent accesses.	5 years ago
Poorna Krishnamoorthy	1ebf6f146a	Add support for ILM transition (#10565) This PR adds transition support for ILM to transition data to another MinIO target represented by a storage class ARN. Subsequent GET or HEAD for that object will be streamed from the transition tier. If the PostRestoreObject API is invoked, the transitioned object can be restored to the source cluster for the specified duration.	5 years ago
Harshavardhana	2f681bed57	fix: pop entries from each drives in parallel (#9918)	5 years ago
Harshavardhana	4915433bd2	Support bucket versioning (#9377) - Implement a new xl.json 2.0.0 format to support this; it moves the entire marshaling logic to the POSIX layer, and the top layer always consumes a common FileInfo construct, which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	5 years ago
Klaus Post	4a007e3767	Prefer local disks when fetching data blocks (#9563) If the requested server is part of the set this will always read from the local disk, even if the disk contains a parity shard. In the default setup there is a 50% chance that at least one shard that otherwise would have been fetched remotely will be read locally instead. It basically trades RPC call overhead for reed-solomon. On distributed localhost this seems to be fairly break-even, with a very small gain in throughput and latency. However, on networked servers this should be a bigger gain. 1MB objects, before:
```
Operation: GET. Concurrency: 32. Hosts: 4.

Requests considered: 76257:
 * Avg: 25ms 50%: 24ms 90%: 32ms 99%: 42ms Fastest: 7ms Slowest: 67ms
 * First Byte: Average: 23ms, Median: 22ms, Best: 5ms, Worst: 65ms

Throughput:
 * Average: 1213.68 MiB/s, 1272.63 obj/s (59.948s, starting 14:45:44 CEST)
```
After:
```
Operation: GET. Concurrency: 32. Hosts: 4.

Requests considered: 78845:
 * Avg: 24ms 50%: 24ms 90%: 31ms 99%: 39ms Fastest: 8ms Slowest: 62ms
 * First Byte: Average: 22ms, Median: 21ms, Best: 6ms, Worst: 57ms

Throughput:
 * Average: 1255.11 MiB/s, 1316.08 obj/s (59.938s, starting 14:43:58 CEST)
```
Bonus fix: Only ask for heal once on an object.	5 years ago
Bala FA	95e89f1712	proactive deep heal object when a bitrot is detected (#9192)	5 years ago
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494)	6 years ago
Krishna Srinivas	730ac5381c	Simplify parallelReader.Read() (#7109) Simplify parallelReader.Read(), which also fixes the previous implementation, where it was returning before all the parallel reading goroutines had terminated, which caused race conditions.	7 years ago
Krishna Srinivas	98c950aacd	Streaming bitrot verification support (#7004)	7 years ago
Krishna Srinivas	52f6d5aafc	Rename of structs and methods (#6230) Rename of ErasureStorage to Erasure (and rename of related variables and methods)	7 years ago
Krishna Srinivas	ce02ab613d	Simplify erasure code by separating bitrot from erasure code (#5959)	7 years ago
kannappanr	f8a3fd0c2a	Create logger package and rename errorIf to LogIf (#5678) Removing message from error logging Replace errors.Trace with LogIf	7 years ago
Aditya Manthramurthy	ea8973b7d7	Return bit-rot verified data instead of re-reading from disk (#5568) - Data from disk was being read after bitrot verification to return data for GetObject. Strictly speaking this does not guarantee bitrot protection, as disks may return bad data even temporarily. - This fix reads data from disk, verifies data for bitrot and then returns data to the client directly.	8 years ago
Harshavardhana	8efa82126b	Convert errors tracer into a separate package (#5221)	8 years ago
Andreas Auernhammer	02af37a394	optimize memory allocs during reconstruct (#4964) The reedsolomon library now avoids allocations during reconstruction. This change exploits that to reduce memory allocs and GC pressure during healing and reading.	8 years ago
Andreas Auernhammer	7e6b5bdbb7	remove ReadFileWithVerify from StorageAPI (#4947) This change removes the ReadFileWithVerify function from the StorageAPI. The ReadFile was basically a redirection to ReadFileWithVerify. This change removes the redirection and moves the logic of ReadFileWithVerify directly into ReadFile. This removes a lot of unnecessary code in all StorageAPI implementations. Fixes #4946 * review: fix doc and typos	8 years ago
Harshavardhana	2e6ee68409	fix: [minor] Avoid unnecessary typecasting. (#4828) We don't need to typecast identifiers from their base type to the same type again. This is not a bug and the compiler is fine to skip it, but it is better to avoid it if not needed.	8 years ago
Andreas Auernhammer	85fcee1919	erasure: simplify XL backend operations (#4649) (#4758) This change provides new implementations of the XL backend operations: - create file - read file - heal file Further this change adds table based tests for all three operations. This affects also the bitrot algorithm integration. Algorithms are now integrated in an idiomatic way (like crypto.Hash). Fixes #4696 Fixes #4649 Fixes #4359	8 years ago
Frank Wessels	fffe4ac7e6	Prevent unnecessary verification of parity blocks while reading (#4683) * Prevent unnecessary verification of parity blocks while reading erasure coded file. * Update klauspost/reedsolomon and just only reconstruct data blocks while reading (prevent unnecessary parity block reconstruction) * Remove Verification of (all) reconstructed Data and Parity blocks since in our case we are protected by bit rot protection. And even if the verification would fail (essentially impossible) there is no way to definitively say whether the data is still correct or not, so this call makes no sense for our use case.	8 years ago
Aditya Manthramurthy	8975da4e84	Add new ReadFileWithVerify storage-layer API (#4349) This is an enhancement to the XL/distributed-XL mode. FS mode is unaffected. The ReadFileWithVerify storage-layer call is similar to ReadFile with the additional functionality of performing bit-rot checking. It accepts additional parameters for a hashing algorithm to use and the expected hex-encoded hash string. This patch provides significant performance improvement because: 1. combines the step of reading the file (during erasure-decoding/reconstruction) with bit-rot verification; 2. limits the number of file-reads; and 3. avoids transferring the file over the network for bit-rot verification. ReadFile API is implemented as ReadFileWithVerify with empty hashing arguments. Credits to AB and Harsha for the algorithmic improvement. Fixes #4236.	8 years ago
Harshavardhana	bcc5b6e1ef	xl: Rename getOrderedDisks as shuffleDisks appropriately. (#3796) This PR is for readability cleanup - getOrderedDisks as shuffleDisks - getOrderedPartsMetadata as shufflePartsMetadata Distribution is now a second argument instead of being the primary input argument for brevity. Also change the usage of type casted int64(0), instead rely on direct type reference as `var variable int64` everywhere.	9 years ago
Harshavardhana	1b85302161	Fix spelling and golint errors. (#3266) Fixes #3263	9 years ago
Krishna Srinivas	9358ee011b	logging: Print stack trace in case of errors. fixes #1827	9 years ago
Harshavardhana	bccf549463	server: Move all the top level files into cmd folder. (#2490) This change brings a change which was done for the 'mc' package to allow for clean repo and have a cleaner github drop in experience.	9 years ago
karthic rao	a3592228f5	bug-fix: fix for tests failure when cache is disabled (#2439)	9 years ago
Harshavardhana	f503ac3db8	XL/Erasure: Make bit-rot verification based on xl.json algo. (#2299) Currently `xl.json` saves algorithm information for bit-rot verification. Since the bit-rot algo's can change in the future make sure the erasureReadFile doesn't default to a particular algo. Instead use the checkSumInfo.	9 years ago
Krishna Srinivas	043ddbd834	optimize memory allocation during erasure-read by using temporary buffer pool. (#2259) * XL/erasure-read: optimize memory allocation during erasure-read by using temporary buffer pool. With the change the buffer needed during GetObject by erasureReadFile is allocated only once.	9 years ago
Krishna Srinivas	18728a0b59	XL/erasure-read: refactor erasure read and add tests (#2232)	9 years ago
Krishna Srinivas	897d78d113	erasureReadFile and erasureCreateFile testcases. (#2229) * unit-tests: Unit tests for erasureCreateFile and erasureReadFile. * appendFile() should return errXLWriteQuorum. * TestErasureReadFileOffsetLength() tests erasureReadFile() for different offset and lengths. * Fix for the failure seen in the erasure read unit test case. Issue #2227 * Move common erasure setup code to newErasureTestSetup() * Review fixes. Add few more test cases for erasureReadFile.	9 years ago
Krishna Srinivas	8cc163e51a	Refactor xl.GetObject and erasureReadFile. (#2211) * XL: Refactor xl.GetObject and erasureReadFile. erasureReadFile() is responsible for just erasure coding; it takes ordered disks and a checkSum slice. * move getOrderedPartsMetadata and getOrderedDisks to xl-v1-utils.go * Review fixes.	9 years ago

30 Commits (b9f0e8c7124ddd65f65f08d2c603f76f39a7ea09)