This updates Bolt to v1.1.0, which adds ARM64, ppc64le, s390x, and Solaris
support, and a bunch of bugfixes.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
| ... | ... |
@@ -36,7 +36,7 @@ clone git github.com/coreos/etcd v2.2.0 |
| 36 | 36 |
fix_rewritten_imports github.com/coreos/etcd |
| 37 | 37 |
clone git github.com/ugorji/go 5abd4e96a45c386928ed2ca2a7ef63e2533e18ec |
| 38 | 38 |
clone git github.com/hashicorp/consul v0.5.2 |
| 39 |
-clone git github.com/boltdb/bolt v1.0 |
|
| 39 |
+clone git github.com/boltdb/bolt v1.1.0 |
|
| 40 | 40 |
|
| 41 | 41 |
# get graph and distribution packages |
| 42 | 42 |
clone git github.com/docker/distribution 20c4b7a1805a52753dfd593ee1cc35558722a0ce # docker/1.9 branch |
| ... | ... |
@@ -16,7 +16,7 @@ and setting values. That's it. |
| 16 | 16 |
|
| 17 | 17 |
## Project Status |
| 18 | 18 |
|
| 19 |
-Bolt is stable and the API is fixed. Full unit test coverage and randomized |
|
| 19 |
+Bolt is stable and the API is fixed. Full unit test coverage and randomized |
|
| 20 | 20 |
black box testing are used to ensure database consistency and thread safety. |
| 21 | 21 |
Bolt is currently in high-load production environments serving databases as |
| 22 | 22 |
large as 1TB. Many companies such as Shopify and Heroku use Bolt-backed |
| ... | ... |
@@ -87,6 +87,11 @@ are not thread safe. To work with data in multiple goroutines you must start |
| 87 | 87 |
a transaction for each one or use locking to ensure only one goroutine accesses |
| 88 | 88 |
a transaction at a time. Creating transaction from the `DB` is thread safe. |
| 89 | 89 |
|
| 90 |
+Read-only transactions and read-write transactions should not depend on one |
|
| 91 |
+another and generally shouldn't be opened simultaneously in the same goroutine. |
|
| 92 |
+This can cause a deadlock as the read-write transaction needs to periodically |
|
| 93 |
+re-map the data file but it cannot do so while a read-only transaction is open. |
|
| 94 |
+ |
|
| 90 | 95 |
|
| 91 | 96 |
#### Read-write transactions |
| 92 | 97 |
|
| ... | ... |
@@ -120,12 +125,88 @@ err := db.View(func(tx *bolt.Tx) error {
|
| 120 | 120 |
}) |
| 121 | 121 |
``` |
| 122 | 122 |
|
| 123 |
-You also get a consistent view of the database within this closure, however, |
|
| 123 |
+You also get a consistent view of the database within this closure, however, |
|
| 124 | 124 |
no mutating operations are allowed within a read-only transaction. You can only |
| 125 | 125 |
retrieve buckets, retrieve values, and copy the database within a read-only |
| 126 | 126 |
transaction. |
| 127 | 127 |
|
| 128 | 128 |
|
| 129 |
+#### Batch read-write transactions |
|
| 130 |
+ |
|
| 131 |
+Each `DB.Update()` waits for disk to commit the writes. This overhead |
|
| 132 |
+can be minimized by combining multiple updates with the `DB.Batch()` |
|
| 133 |
+function: |
|
| 134 |
+ |
|
| 135 |
+```go |
|
| 136 |
+err := db.Batch(func(tx *bolt.Tx) error {
|
|
| 137 |
+ ... |
|
| 138 |
+ return nil |
|
| 139 |
+}) |
|
| 140 |
+``` |
|
| 141 |
+ |
|
| 142 |
+Concurrent Batch calls are opportunistically combined into larger |
|
| 143 |
+transactions. Batch is only useful when there are multiple goroutines |
|
| 144 |
+calling it. |
|
| 145 |
+ |
|
| 146 |
+The trade-off is that `Batch` can call the given |
|
| 147 |
+function multiple times if parts of the transaction fail. The |
|
| 148 |
+function must be idempotent and side effects must take effect only |
|
| 149 |
+after a successful return from `DB.Batch()`. |
|
| 150 |
+ |
|
| 151 |
+For example: don't display messages from inside the function; instead, |
|
| 152 |
+set variables in the enclosing scope: |
|
| 153 |
+ |
|
| 154 |
+```go |
|
| 155 |
+var id uint64 |
|
| 156 |
+err := db.Batch(func(tx *bolt.Tx) error {
|
|
| 157 |
+ // Find last key in bucket, decode as bigendian uint64, increment |
|
| 158 |
+ // by one, encode back to []byte, and add new key. |
|
| 159 |
+ ... |
|
| 160 |
+ id = newValue |
|
| 161 |
+ return nil |
|
| 162 |
+}) |
|
| 163 |
+if err != nil {
|
|
| 164 |
+ return ... |
|
| 165 |
+} |
|
| 166 |
+fmt.Printf("Allocated ID %d\n", id)
|
|
| 167 |
+``` |
|
| 168 |
+ |
|
| 169 |
+ |
|
| 170 |
+#### Managing transactions manually |
|
| 171 |
+ |
|
| 172 |
+The `DB.View()` and `DB.Update()` functions are wrappers around the `DB.Begin()` |
|
| 173 |
+function. These helper functions will start the transaction, execute a function, |
|
| 174 |
+and then safely close your transaction if an error is returned. This is the |
|
| 175 |
+recommended way to use Bolt transactions. |
|
| 176 |
+ |
|
| 177 |
+However, sometimes you may want to manually start and end your transactions. |
|
| 178 |
+You can use the `DB.Begin()` function directly but _please_ be sure to close the |
|
| 179 |
+transaction. |
|
| 180 |
+ |
|
| 181 |
+```go |
|
| 182 |
+// Start a writable transaction. |
|
| 183 |
+tx, err := db.Begin(true) |
|
| 184 |
+if err != nil {
|
|
| 185 |
+ return err |
|
| 186 |
+} |
|
| 187 |
+defer tx.Rollback() |
|
| 188 |
+ |
|
| 189 |
+// Use the transaction... |
|
| 190 |
+_, err = tx.CreateBucket([]byte("MyBucket"))
|
|
| 191 |
+if err != nil {
|
|
| 192 |
+ return err |
|
| 193 |
+} |
|
| 194 |
+ |
|
| 195 |
+// Commit the transaction and check for error. |
|
| 196 |
+if err := tx.Commit(); err != nil {
|
|
| 197 |
+ return err |
|
| 198 |
+} |
|
| 199 |
+``` |
|
| 200 |
+ |
|
| 201 |
+The first argument to `DB.Begin()` is a boolean stating whether the transaction |
|
| 202 |
+should be writable. |
|
| 203 |
+ |
|
| 204 |
+ |
|
| 129 | 205 |
### Using buckets |
| 130 | 206 |
|
| 131 | 207 |
Buckets are collections of key/value pairs within the database. All keys in a |
| ... | ... |
@@ -175,13 +256,61 @@ db.View(func(tx *bolt.Tx) error {
|
| 175 | 175 |
``` |
| 176 | 176 |
|
| 177 | 177 |
The `Get()` function does not return an error because its operation is |
| 178 |
-guarenteed to work (unless there is some kind of system failure). If the key |
|
| 178 |
+guaranteed to work (unless there is some kind of system failure). If the key |
|
| 179 | 179 |
exists then it will return its byte slice value. If it doesn't exist then it |
| 180 | 180 |
will return `nil`. It's important to note that you can have a zero-length value |
| 181 | 181 |
set to a key which is different than the key not existing. |
| 182 | 182 |
|
| 183 | 183 |
Use the `Bucket.Delete()` function to delete a key from the bucket. |
| 184 | 184 |
|
| 185 |
+Please note that values returned from `Get()` are only valid while the |
|
| 186 |
+transaction is open. If you need to use a value outside of the transaction |
|
| 187 |
+then you must use `copy()` to copy it to another byte slice. |
|
| 188 |
+ |
|
| 189 |
+ |
|
| 190 |
+### Autoincrementing integer for the bucket |
|
| 191 |
+By using the NextSequence() function, you can let Bolt determine a sequence |
|
| 192 |
+which can be used as the unique identifier for your key/value pairs. See the |
|
| 193 |
+example below. |
|
| 194 |
+ |
|
| 195 |
+```go |
|
| 196 |
+// CreateUser saves u to the store. The new user ID is set on u once the data is persisted. |
|
| 197 |
+func (s *Store) CreateUser(u *User) error {
|
|
| 198 |
+ return s.db.Update(func(tx *bolt.Tx) error {
|
|
| 199 |
+ // Retrieve the users bucket. |
|
| 200 |
+ // This should be created when the DB is first opened. |
|
| 201 |
+ b := tx.Bucket([]byte("users"))
|
|
| 202 |
+ |
|
| 203 |
+ // Generate ID for the user. |
|
| 204 |
+ // This returns an error only if the Tx is closed or not writeable. |
|
| 205 |
+ // That can't happen in an Update() call so I ignore the error check. |
|
| 206 |
+ id, _ := b.NextSequence() |
|
| 207 |
+ u.ID = int(id) |
|
| 208 |
+ |
|
| 209 |
+ // Marshal user data into bytes. |
|
| 210 |
+ buf, err := json.Marshal(u) |
|
| 211 |
+ if err != nil {
|
|
| 212 |
+ return err |
|
| 213 |
+ } |
|
| 214 |
+ |
|
| 215 |
+ // Persist bytes to users bucket. |
|
| 216 |
+ return b.Put(itob(u.ID), buf) |
|
| 217 |
+ }) |
|
| 218 |
+} |
|
| 219 |
+ |
|
| 220 |
+// itob returns an 8-byte big endian representation of v. |
|
| 221 |
+func itob(v int) []byte {
|
|
| 222 |
+ b := make([]byte, 8) |
|
| 223 |
+ binary.BigEndian.PutUint64(b, uint64(v)) |
|
| 224 |
+ return b |
|
| 225 |
+} |
|
| 226 |
+ |
|
| 227 |
+type User struct {
|
|
| 228 |
+ ID int |
|
| 229 |
+ ... |
|
| 230 |
+} |
|
| 231 |
+ |
|
| 232 |
+``` |
|
| 185 | 233 |
|
| 186 | 234 |
### Iterating over keys |
| 187 | 235 |
|
| ... | ... |
@@ -254,7 +383,7 @@ db.View(func(tx *bolt.Tx) error {
|
| 254 | 254 |
max := []byte("2000-01-01T00:00:00Z")
|
| 255 | 255 |
|
| 256 | 256 |
// Iterate over the 90's. |
| 257 |
- for k, v := c.Seek(min); k != nil && bytes.Compare(k, max) != -1; k, v = c.Next() {
|
|
| 257 |
+ for k, v := c.Seek(min); k != nil && bytes.Compare(k, max) <= 0; k, v = c.Next() {
|
|
| 258 | 258 |
fmt.Printf("%s: %s\n", k, v)
|
| 259 | 259 |
} |
| 260 | 260 |
|
| ... | ... |
@@ -294,7 +423,7 @@ func (*Bucket) DeleteBucket(key []byte) error |
| 294 | 294 |
|
| 295 | 295 |
### Database backups |
| 296 | 296 |
|
| 297 |
-Bolt is a single file so it's easy to backup. You can use the `Tx.Copy()` |
|
| 297 |
+Bolt is a single file so it's easy to back up. You can use the `Tx.WriteTo()` |
|
| 298 | 298 |
function to write a consistent view of the database to a writer. If you call |
| 299 | 299 |
this from a read-only transaction, it will perform a hot backup and not block |
| 300 | 300 |
your other database reads and writes. It will also use `O_DIRECT` when available |
| ... | ... |
@@ -305,11 +434,12 @@ do database backups: |
| 305 | 305 |
|
| 306 | 306 |
```go |
| 307 | 307 |
func BackupHandleFunc(w http.ResponseWriter, req *http.Request) {
|
| 308 |
- err := db.View(func(tx bolt.Tx) error {
|
|
| 308 |
+ err := db.View(func(tx *bolt.Tx) error {
|
|
| 309 | 309 |
w.Header().Set("Content-Type", "application/octet-stream")
|
| 310 | 310 |
w.Header().Set("Content-Disposition", `attachment; filename="my.db"`)
|
| 311 | 311 |
w.Header().Set("Content-Length", strconv.Itoa(int(tx.Size())))
|
| 312 |
- return tx.Copy(w) |
|
| 312 |
+ _, err := tx.WriteTo(w) |
|
| 313 |
+ return err |
|
| 313 | 314 |
}) |
| 314 | 315 |
if err != nil {
|
| 315 | 316 |
http.Error(w, err.Error(), http.StatusInternalServerError) |
| ... | ... |
@@ -351,14 +481,13 @@ go func() {
|
| 351 | 351 |
// Grab the current stats and diff them. |
| 352 | 352 |
stats := db.Stats() |
| 353 | 353 |
diff := stats.Sub(&prev) |
| 354 |
- |
|
| 354 |
+ |
|
| 355 | 355 |
// Encode stats to JSON and print to STDERR. |
| 356 | 356 |
json.NewEncoder(os.Stderr).Encode(diff) |
| 357 | 357 |
|
| 358 | 358 |
// Save stats for the next loop. |
| 359 | 359 |
prev = stats |
| 360 | 360 |
} |
| 361 |
-} |
|
| 362 | 361 |
}() |
| 363 | 362 |
``` |
| 364 | 363 |
|
| ... | ... |
@@ -366,25 +495,83 @@ It's also useful to pipe these stats to a service such as statsd for monitoring |
| 366 | 366 |
or to provide an HTTP endpoint that will perform a fixed-length sample. |
| 367 | 367 |
|
| 368 | 368 |
|
| 369 |
+### Read-Only Mode |
|
| 370 |
+ |
|
| 371 |
+Sometimes it is useful to create a shared, read-only Bolt database. To do this, |
|
| 372 |
+set the `Options.ReadOnly` flag when opening your database. Read-only mode |
|
| 373 |
+uses a shared lock to allow multiple processes to read from the database but |
|
| 374 |
+it will block any processes from opening the database in read-write mode. |
|
| 375 |
+ |
|
| 376 |
+```go |
|
| 377 |
+db, err := bolt.Open("my.db", 0666, &bolt.Options{ReadOnly: true})
|
|
| 378 |
+if err != nil {
|
|
| 379 |
+ log.Fatal(err) |
|
| 380 |
+} |
|
| 381 |
+``` |
|
| 382 |
+ |
|
| 383 |
+ |
|
| 369 | 384 |
## Resources |
| 370 | 385 |
|
| 371 | 386 |
For more information on getting started with Bolt, check out the following articles: |
| 372 | 387 |
|
| 373 | 388 |
* [Intro to BoltDB: Painless Performant Persistence](http://npf.io/2014/07/intro-to-boltdb-painless-performant-persistence/) by [Nate Finch](https://github.com/natefinch). |
| 389 |
+* [Bolt -- an embedded key/value database for Go](https://www.progville.com/go/bolt-embedded-db-golang/) by Progville |
|
| 390 |
+ |
|
| 391 |
+ |
|
| 392 |
+## Comparison with other databases |
|
| 393 |
+ |
|
| 394 |
+### Postgres, MySQL, & other relational databases |
|
| 395 |
+ |
|
| 396 |
+Relational databases structure data into rows and are only accessible through |
|
| 397 |
+the use of SQL. This approach provides flexibility in how you store and query |
|
| 398 |
+your data but also incurs overhead in parsing and planning SQL statements. Bolt |
|
| 399 |
+accesses all data by a byte slice key. This makes Bolt fast to read and write |
|
| 400 |
+data by key but provides no built-in support for joining values together. |
|
| 401 |
+ |
|
| 402 |
+Most relational databases (with the exception of SQLite) are standalone servers |
|
| 403 |
+that run separately from your application. This gives your systems |
|
| 404 |
+flexibility to connect multiple application servers to a single database |
|
| 405 |
+server but also adds overhead in serializing and transporting data over the |
|
| 406 |
+network. Bolt runs as a library included in your application so all data access |
|
| 407 |
+has to go through your application's process. This brings data closer to your |
|
| 408 |
+application but limits multi-process access to the data. |
|
| 409 |
+ |
|
| 410 |
+ |
|
| 411 |
+### LevelDB, RocksDB |
|
| 374 | 412 |
|
| 413 |
+LevelDB and its derivatives (RocksDB, HyperLevelDB) are similar to Bolt in that |
|
| 414 |
+they are libraries bundled into the application, however, their underlying |
|
| 415 |
+structure is a log-structured merge-tree (LSM tree). An LSM tree optimizes |
|
| 416 |
+random writes by using a write ahead log and multi-tiered, sorted files called |
|
| 417 |
+SSTables. Bolt uses a B+tree internally and only a single file. Both approaches |
|
| 418 |
+have trade-offs. |
|
| 375 | 419 |
|
| 420 |
+If you require a high random write throughput (>10,000 w/sec) or you need to use |
|
| 421 |
+spinning disks then LevelDB could be a good choice. If your application is |
|
| 422 |
+read-heavy or does a lot of range scans then Bolt could be a good choice. |
|
| 376 | 423 |
|
| 377 |
-## Comparing Bolt to LMDB |
|
| 424 |
+One other important consideration is that LevelDB does not have transactions. |
|
| 425 |
+It supports batch writing of key/value pairs and it supports read snapshots |
|
| 426 |
+but it will not give you the ability to do a compare-and-swap operation safely. |
|
| 427 |
+Bolt supports fully serializable ACID transactions. |
|
| 428 |
+ |
|
| 429 |
+ |
|
| 430 |
+### LMDB |
|
| 378 | 431 |
|
| 379 | 432 |
Bolt was originally a port of LMDB so it is architecturally similar. Both use |
| 380 |
-a B+tree, have ACID semanetics with fully serializable transactions, and support |
|
| 433 |
+a B+tree, have ACID semantics with fully serializable transactions, and support |
|
| 381 | 434 |
lock-free MVCC using a single writer and multiple readers. |
| 382 | 435 |
|
| 383 | 436 |
The two projects have somewhat diverged. LMDB heavily focuses on raw performance |
| 384 | 437 |
while Bolt has focused on simplicity and ease of use. For example, LMDB allows |
| 385 |
-several unsafe actions such as direct writes and append writes for the sake of |
|
| 386 |
-performance. Bolt opts to disallow actions which can leave the database in a |
|
| 387 |
-corrupted state. The only exception to this in Bolt is `DB.NoSync`. |
|
| 438 |
+several unsafe actions such as direct writes for the sake of performance. Bolt |
|
| 439 |
+opts to disallow actions which can leave the database in a corrupted state. The |
|
| 440 |
+only exception to this in Bolt is `DB.NoSync`. |
|
| 441 |
+ |
|
| 442 |
+There are also a few differences in API. LMDB requires a maximum mmap size when |
|
| 443 |
+opening an `mdb_env` whereas Bolt will handle incremental mmap resizing |
|
| 444 |
+automatically. LMDB overloads the getter and setter functions with multiple |
|
| 445 |
+flags whereas Bolt splits these specialized cases into their own functions. |
|
| 388 | 446 |
|
| 389 | 447 |
|
| 390 | 448 |
## Caveats & Limitations |
| ... | ... |
@@ -425,14 +612,33 @@ Here are a few things to note when evaluating and using Bolt: |
| 425 | 425 |
can in memory and will release memory as needed to other processes. This means |
| 426 | 426 |
that Bolt can show very high memory usage when working with large databases. |
| 427 | 427 |
However, this is expected and the OS will release memory as needed. Bolt can |
| 428 |
- handle databases much larger than the available physical RAM. |
|
| 428 |
+ handle databases much larger than the available physical RAM, provided its |
|
| 429 |
+ memory-map fits in the process virtual address space. It may be problematic |
|
| 430 |
+ on 32-bit systems. |
|
| 431 |
+ |
|
| 432 |
+* The data structures in the Bolt database are memory mapped so the data file |
|
| 433 |
+ will be endian specific. This means that you cannot copy a Bolt file from a |
|
| 434 |
+ little endian machine to a big endian machine and have it work. For most |
|
| 435 |
+ users this is not a concern since most modern CPUs are little endian. |
|
| 436 |
+ |
|
| 437 |
+* Because of the way pages are laid out on disk, Bolt cannot truncate data files |
|
| 438 |
+ and return free pages back to the disk. Instead, Bolt maintains a free list |
|
| 439 |
+ of unused pages within its data file. These free pages can be reused by later |
|
| 440 |
+ transactions. This works well for many use cases as databases generally tend |
|
| 441 |
+ to grow. However, it's important to note that deleting large chunks of data |
|
| 442 |
+ will not allow you to reclaim that space on disk. |
|
| 443 |
+ |
|
| 444 |
+ For more information on page allocation, [see this comment][page-allocation]. |
|
| 445 |
+ |
|
| 446 |
+[page-allocation]: https://github.com/boltdb/bolt/issues/308#issuecomment-74811638 |
|
| 429 | 447 |
|
| 430 | 448 |
|
| 431 | 449 |
## Other Projects Using Bolt |
| 432 | 450 |
|
| 433 | 451 |
Below is a list of public, open source projects that use Bolt: |
| 434 | 452 |
|
| 435 |
-* [Bazil](https://github.com/bazillion/bazil) - A file system that lets your data reside where it is most convenient for it to reside. |
|
| 453 |
+* [Operation Go: A Routine Mission](http://gocode.io) - An online programming game for Golang using Bolt for user accounts and a leaderboard. |
|
| 454 |
+* [Bazil](https://bazil.org/) - A file system that lets your data reside where it is most convenient for it to reside. |
|
| 436 | 455 |
* [DVID](https://github.com/janelia-flyem/dvid) - Added Bolt as optional storage engine and testing it against Basho-tuned leveldb. |
| 437 | 456 |
* [Skybox Analytics](https://github.com/skybox/skybox) - A standalone funnel analysis tool for web analytics. |
| 438 | 457 |
* [Scuttlebutt](https://github.com/benbjohnson/scuttlebutt) - Uses Bolt to store and process all Twitter mentions of GitHub projects. |
| ... | ... |
@@ -450,6 +656,16 @@ Below is a list of public, open source projects that use Bolt: |
| 450 | 450 |
* [bleve](http://www.blevesearch.com/) - A pure Go search engine similar to ElasticSearch that uses Bolt as the default storage backend. |
| 451 | 451 |
* [tentacool](https://github.com/optiflows/tentacool) - REST api server to manage system stuff (IP, DNS, Gateway...) on a linux server. |
| 452 | 452 |
* [SkyDB](https://github.com/skydb/sky) - Behavioral analytics database. |
| 453 |
+* [Seaweed File System](https://github.com/chrislusf/weed-fs) - Highly scalable distributed key~file system with O(1) disk read. |
|
| 454 |
+* [InfluxDB](http://influxdb.com) - Scalable datastore for metrics, events, and real-time analytics. |
|
| 455 |
+* [Freehold](http://tshannon.bitbucket.org/freehold/) - An open, secure, and lightweight platform for your files and data. |
|
| 456 |
+* [Prometheus Annotation Server](https://github.com/oliver006/prom_annotation_server) - Annotation server for PromDash & Prometheus service monitoring system. |
|
| 457 |
+* [Consul](https://github.com/hashicorp/consul) - Consul is service discovery and configuration made easy. Distributed, highly available, and datacenter-aware. |
|
| 458 |
+* [Kala](https://github.com/ajvb/kala) - Kala is a modern job scheduler optimized to run on a single node. It is persistent, JSON over HTTP API, ISO 8601 duration notation, and dependent jobs. |
|
| 459 |
+* [drive](https://github.com/odeke-em/drive) - drive is an unofficial Google Drive command line client for \*NIX operating systems. |
|
| 460 |
+* [stow](https://github.com/djherbis/stow) - a persistence manager for objects |
|
| 461 |
+ backed by boltdb. |
|
| 462 |
+* [buckets](https://github.com/joyrexus/buckets) - a bolt wrapper streamlining |
|
| 463 |
+ simple tx and key scans. |
|
| 453 | 464 |
|
| 454 | 465 |
If you are using Bolt in a project please send a pull request to add it to the list. |
| 455 |
- |
| 456 | 466 |
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,138 @@ |
| 0 |
+package bolt |
|
| 1 |
+ |
|
| 2 |
+import ( |
|
| 3 |
+ "errors" |
|
| 4 |
+ "fmt" |
|
| 5 |
+ "sync" |
|
| 6 |
+ "time" |
|
| 7 |
+) |
|
| 8 |
+ |
|
| 9 |
+// Batch calls fn as part of a batch. It behaves similar to Update, |
|
| 10 |
+// except: |
|
| 11 |
+// |
|
| 12 |
+// 1. concurrent Batch calls can be combined into a single Bolt |
|
| 13 |
+// transaction. |
|
| 14 |
+// |
|
| 15 |
+// 2. the function passed to Batch may be called multiple times, |
|
| 16 |
+// regardless of whether it returns error or not. |
|
| 17 |
+// |
|
| 18 |
+// This means that Batch function side effects must be idempotent and |
|
| 19 |
+// take permanent effect only after a successful return is seen in |
|
| 20 |
+// caller. |
|
| 21 |
+// |
|
| 22 |
+// The maximum batch size and delay can be adjusted with DB.MaxBatchSize |
|
| 23 |
+// and DB.MaxBatchDelay, respectively. |
|
| 24 |
+// |
|
| 25 |
+// Batch is only useful when there are multiple goroutines calling it. |
|
| 26 |
+func (db *DB) Batch(fn func(*Tx) error) error {
|
|
| 27 |
+ errCh := make(chan error, 1) |
|
| 28 |
+ |
|
| 29 |
+ db.batchMu.Lock() |
|
| 30 |
+ if (db.batch == nil) || (db.batch != nil && len(db.batch.calls) >= db.MaxBatchSize) {
|
|
| 31 |
+ // There is no existing batch, or the existing batch is full; start a new one. |
|
| 32 |
+ db.batch = &batch{
|
|
| 33 |
+ db: db, |
|
| 34 |
+ } |
|
| 35 |
+ db.batch.timer = time.AfterFunc(db.MaxBatchDelay, db.batch.trigger) |
|
| 36 |
+ } |
|
| 37 |
+ db.batch.calls = append(db.batch.calls, call{fn: fn, err: errCh})
|
|
| 38 |
+ if len(db.batch.calls) >= db.MaxBatchSize {
|
|
| 39 |
+ // wake up batch, it's ready to run |
|
| 40 |
+ go db.batch.trigger() |
|
| 41 |
+ } |
|
| 42 |
+ db.batchMu.Unlock() |
|
| 43 |
+ |
|
| 44 |
+ err := <-errCh |
|
| 45 |
+ if err == trySolo {
|
|
| 46 |
+ err = db.Update(fn) |
|
| 47 |
+ } |
|
| 48 |
+ return err |
|
| 49 |
+} |
|
| 50 |
+ |
|
| 51 |
+type call struct {
|
|
| 52 |
+ fn func(*Tx) error |
|
| 53 |
+ err chan<- error |
|
| 54 |
+} |
|
| 55 |
+ |
|
| 56 |
+type batch struct {
|
|
| 57 |
+ db *DB |
|
| 58 |
+ timer *time.Timer |
|
| 59 |
+ start sync.Once |
|
| 60 |
+ calls []call |
|
| 61 |
+} |
|
| 62 |
+ |
|
| 63 |
+// trigger runs the batch if it hasn't already been run. |
|
| 64 |
+func (b *batch) trigger() {
|
|
| 65 |
+ b.start.Do(b.run) |
|
| 66 |
+} |
|
| 67 |
+ |
|
| 68 |
+// run performs the transactions in the batch and communicates results |
|
| 69 |
+// back to DB.Batch. |
|
| 70 |
+func (b *batch) run() {
|
|
| 71 |
+ b.db.batchMu.Lock() |
|
| 72 |
+ b.timer.Stop() |
|
| 73 |
+ // Make sure no new work is added to this batch, but don't break |
|
| 74 |
+ // other batches. |
|
| 75 |
+ if b.db.batch == b {
|
|
| 76 |
+ b.db.batch = nil |
|
| 77 |
+ } |
|
| 78 |
+ b.db.batchMu.Unlock() |
|
| 79 |
+ |
|
| 80 |
+retry: |
|
| 81 |
+ for len(b.calls) > 0 {
|
|
| 82 |
+ var failIdx = -1 |
|
| 83 |
+ err := b.db.Update(func(tx *Tx) error {
|
|
| 84 |
+ for i, c := range b.calls {
|
|
| 85 |
+ if err := safelyCall(c.fn, tx); err != nil {
|
|
| 86 |
+ failIdx = i |
|
| 87 |
+ return err |
|
| 88 |
+ } |
|
| 89 |
+ } |
|
| 90 |
+ return nil |
|
| 91 |
+ }) |
|
| 92 |
+ |
|
| 93 |
+ if failIdx >= 0 {
|
|
| 94 |
+ // take the failing transaction out of the batch. it's |
|
| 95 |
+ // safe to shorten b.calls here because db.batch no longer |
|
| 96 |
+ // points to us, and we hold the mutex anyway. |
|
| 97 |
+ c := b.calls[failIdx] |
|
| 98 |
+ b.calls[failIdx], b.calls = b.calls[len(b.calls)-1], b.calls[:len(b.calls)-1] |
|
| 99 |
+ // tell the submitter re-run it solo, continue with the rest of the batch |
|
| 100 |
+ c.err <- trySolo |
|
| 101 |
+ continue retry |
|
| 102 |
+ } |
|
| 103 |
+ |
|
| 104 |
+ // pass success, or bolt internal errors, to all callers |
|
| 105 |
+ for _, c := range b.calls {
|
|
| 106 |
+ if c.err != nil {
|
|
| 107 |
+ c.err <- err |
|
| 108 |
+ } |
|
| 109 |
+ } |
|
| 110 |
+ break retry |
|
| 111 |
+ } |
|
| 112 |
+} |
|
| 113 |
+ |
|
| 114 |
+// trySolo is a special sentinel error value used for signaling that a |
|
| 115 |
+// transaction function should be re-run. It should never be seen by |
|
| 116 |
+// callers. |
|
| 117 |
+var trySolo = errors.New("batch function returned an error and should be re-run solo")
|
|
| 118 |
+ |
|
| 119 |
+type panicked struct {
|
|
| 120 |
+ reason interface{}
|
|
| 121 |
+} |
|
| 122 |
+ |
|
| 123 |
+func (p panicked) Error() string {
|
|
| 124 |
+ if err, ok := p.reason.(error); ok {
|
|
| 125 |
+ return err.Error() |
|
| 126 |
+ } |
|
| 127 |
+ return fmt.Sprintf("panic: %v", p.reason)
|
|
| 128 |
+} |
|
| 129 |
+ |
|
| 130 |
+func safelyCall(fn func(*Tx) error, tx *Tx) (err error) {
|
|
| 131 |
+ defer func() {
|
|
| 132 |
+ if p := recover(); p != nil {
|
|
| 133 |
+ err = panicked{p}
|
|
| 134 |
+ } |
|
| 135 |
+ }() |
|
| 136 |
+ return fn(tx) |
|
| 137 |
+} |
| 5 | 8 |
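The `safelyCall`/`panicked` pair at the end of `batch.go` converts a panic inside a batched function into an ordinary error so one bad caller cannot take down the whole batch. The recover mechanics can be exercised standalone (same pattern, with the `*Tx` argument dropped since no database is involved):

```go
package main

import "fmt"

// panicked wraps a recovered panic value as an error,
// as in batch.go above.
type panicked struct {
	reason interface{}
}

func (p panicked) Error() string {
	if err, ok := p.reason.(error); ok {
		return err.Error()
	}
	return fmt.Sprintf("panic: %v", p.reason)
}

// safelyCall runs fn and converts any panic into a panicked error.
// The named return value lets the deferred recover overwrite err.
func safelyCall(fn func() error) (err error) {
	defer func() {
		if p := recover(); p != nil {
			err = panicked{p}
		}
	}()
	return fn()
}

func main() {
	err := safelyCall(func() error { panic("boom") })
	fmt.Println(err) // panic: boom

	err = safelyCall(func() error { return nil })
	fmt.Println(err) // <nil>
}
```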
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,9 @@ |
| 0 |
+// +build arm64 |
|
| 1 |
+ |
|
| 2 |
+package bolt |
|
| 3 |
+ |
|
| 4 |
+// maxMapSize represents the largest mmap size supported by Bolt. |
|
| 5 |
+const maxMapSize = 0xFFFFFFFFFFFF // 256TB |
|
| 6 |
+ |
|
| 7 |
+// maxAllocSize is the size used when creating array pointers. |
|
| 8 |
+const maxAllocSize = 0x7FFFFFFF |
| 0 | 9 |
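A quick sanity check that the constant in these new 64-bit arch files matches its `256TB` comment: `0xFFFFFFFFFFFF` is 2^48 − 1, and 2^48 bytes is 256 TiB.

```go
package main

import "fmt"

func main() {
	const maxMapSize = 0xFFFFFFFFFFFF // 2^48 - 1, as in bolt_arm64.go etc.
	// The addressable range is maxMapSize+1 = 2^48 bytes; express it in TiB.
	fmt.Println((maxMapSize + 1) / (1 << 40)) // 256
}
```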
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,9 @@ |
| 0 |
+// +build ppc64le |
|
| 1 |
+ |
|
| 2 |
+package bolt |
|
| 3 |
+ |
|
| 4 |
+// maxMapSize represents the largest mmap size supported by Bolt. |
|
| 5 |
+const maxMapSize = 0xFFFFFFFFFFFF // 256TB |
|
| 6 |
+ |
|
| 7 |
+// maxAllocSize is the size used when creating array pointers. |
|
| 8 |
+const maxAllocSize = 0x7FFFFFFF |
| 0 | 9 |
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,9 @@ |
| 0 |
+// +build s390x |
|
| 1 |
+ |
|
| 2 |
+package bolt |
|
| 3 |
+ |
|
| 4 |
+// maxMapSize represents the largest mmap size supported by Bolt. |
|
| 5 |
+const maxMapSize = 0xFFFFFFFFFFFF // 256TB |
|
| 6 |
+ |
|
| 7 |
+// maxAllocSize is the size used when creating array pointers. |
|
| 8 |
+const maxAllocSize = 0x7FFFFFFF |
| ... | ... |
@@ -1,8 +1,9 @@ |
| 1 |
-// +build !windows,!plan9 |
|
| 1 |
+// +build !windows,!plan9,!solaris |
|
| 2 | 2 |
|
| 3 | 3 |
package bolt |
| 4 | 4 |
|
| 5 | 5 |
import ( |
| 6 |
+ "fmt" |
|
| 6 | 7 |
"os" |
| 7 | 8 |
"syscall" |
| 8 | 9 |
"time" |
| ... | ... |
@@ -10,7 +11,7 @@ import ( |
| 10 | 10 |
) |
| 11 | 11 |
|
| 12 | 12 |
// flock acquires an advisory lock on a file descriptor. |
| 13 |
-func flock(f *os.File, timeout time.Duration) error {
|
|
| 13 |
+func flock(f *os.File, exclusive bool, timeout time.Duration) error {
|
|
| 14 | 14 |
var t time.Time |
| 15 | 15 |
for {
|
| 16 | 16 |
// If we're beyond our timeout then return an error. |
| ... | ... |
@@ -20,9 +21,13 @@ func flock(f *os.File, timeout time.Duration) error {
|
| 20 | 20 |
} else if timeout > 0 && time.Since(t) > timeout {
|
| 21 | 21 |
return ErrTimeout |
| 22 | 22 |
} |
| 23 |
+ flag := syscall.LOCK_SH |
|
| 24 |
+ if exclusive {
|
|
| 25 |
+ flag = syscall.LOCK_EX |
|
| 26 |
+ } |
|
| 23 | 27 |
|
| 24 | 28 |
// Otherwise attempt to obtain an exclusive lock. |
| 25 |
- err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB) |
|
| 29 |
+ err := syscall.Flock(int(f.Fd()), flag|syscall.LOCK_NB) |
|
| 26 | 30 |
if err == nil {
|
| 27 | 31 |
return nil |
| 28 | 32 |
} else if err != syscall.EWOULDBLOCK {
|
| ... | ... |
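The change above threads an `exclusive` flag through `flock` so read-only opens take `LOCK_SH` while writers take `LOCK_EX`. The resulting behavior — many readers coexist, a writer is blocked while any reader holds the lock — can be observed with two descriptors on a scratch file. This sketch assumes a platform with BSD `flock(2)` semantics (e.g. Linux); `sharedThenExclusive` is an illustrative helper, not part of Bolt:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// sharedThenExclusive takes a shared lock on one descriptor, then tries
// a non-blocking exclusive lock on a second descriptor for the same
// file, and reports whether the exclusive attempt was refused.
func sharedThenExclusive() (bool, error) {
	f1, err := os.CreateTemp("", "flockdemo")
	if err != nil {
		return false, err
	}
	defer os.Remove(f1.Name())
	defer f1.Close()

	f2, err := os.Open(f1.Name())
	if err != nil {
		return false, err
	}
	defer f2.Close()

	// A shared (read) lock, as a read-only open would take.
	if err := syscall.Flock(int(f1.Fd()), syscall.LOCK_SH|syscall.LOCK_NB); err != nil {
		return false, err
	}
	// An exclusive (write) lock must wait while any shared lock is held.
	err = syscall.Flock(int(f2.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
	return err == syscall.EWOULDBLOCK, nil
}

func main() {
	blocked, err := sharedThenExclusive()
	if err != nil {
		panic(err)
	}
	fmt.Println(blocked) // true: the writer would have to wait
}
```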
@@ -41,11 +46,28 @@ func funlock(f *os.File) error {
|
| 41 | 41 |
|
| 42 | 42 |
// mmap memory maps a DB's data file. |
| 43 | 43 |
func mmap(db *DB, sz int) error {
|
| 44 |
+ // Truncate and fsync to ensure file size metadata is flushed. |
|
| 45 |
+ // https://github.com/boltdb/bolt/issues/284 |
|
| 46 |
+ if !db.NoGrowSync && !db.readOnly {
|
|
| 47 |
+ if err := db.file.Truncate(int64(sz)); err != nil {
|
|
| 48 |
+ return fmt.Errorf("file resize error: %s", err)
|
|
| 49 |
+ } |
|
| 50 |
+ if err := db.file.Sync(); err != nil {
|
|
| 51 |
+ return fmt.Errorf("file sync error: %s", err)
|
|
| 52 |
+ } |
|
| 53 |
+ } |
|
| 54 |
+ |
|
| 55 |
+ // Map the data file to memory. |
|
| 44 | 56 |
b, err := syscall.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED) |
| 45 | 57 |
if err != nil {
|
| 46 | 58 |
return err |
| 47 | 59 |
} |
| 48 | 60 |
|
| 61 |
+ // Advise the kernel that the mmap is accessed randomly. |
|
| 62 |
+ if err := madvise(b, syscall.MADV_RANDOM); err != nil {
|
|
| 63 |
+ return fmt.Errorf("madvise: %s", err)
|
|
| 64 |
+ } |
|
| 65 |
+ |
|
| 49 | 66 |
// Save the original byte slice and convert to a byte array pointer. |
| 50 | 67 |
db.dataref = b |
| 51 | 68 |
db.data = (*[maxMapSize]byte)(unsafe.Pointer(&b[0])) |
| ... | ... |
@@ -67,3 +89,12 @@ func munmap(db *DB) error {
|
| 67 | 67 |
db.datasz = 0 |
| 68 | 68 |
return err |
| 69 | 69 |
} |
| 70 |
+ |
|
| 71 |
+// NOTE: This function is copied from stdlib because it is not available on darwin. |
|
| 72 |
+func madvise(b []byte, advice int) (err error) {
|
|
| 73 |
+ _, _, e1 := syscall.Syscall(syscall.SYS_MADVISE, uintptr(unsafe.Pointer(&b[0])), uintptr(len(b)), uintptr(advice)) |
|
| 74 |
+ if e1 != 0 {
|
|
| 75 |
+ err = e1 |
|
| 76 |
+ } |
|
| 77 |
+ return |
|
| 78 |
+} |
| 70 | 79 |
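The new `mmap` path above follows a truncate-then-map-then-advise sequence: grow the file and fsync so size metadata is durable, map it read-only and shared, then hint `MADV_RANDOM` since B+tree page access is not sequential. The same sequence can be sketched against a plain scratch file using raw `syscall` calls rather than Bolt's helpers (assumes Linux or another platform where `syscall.Mmap`/`syscall.Madvise` are available; `mapFile` is an illustrative helper):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mapFile grows a scratch file, memory-maps it read-only, applies the
// MADV_RANDOM hint, and reads data back through the mapping.
func mapFile() (string, error) {
	f, err := os.CreateTemp("", "mmapdemo")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// Truncate first, as the patch does: touching pages past EOF faults.
	const sz = 4096
	if err := f.Truncate(sz); err != nil {
		return "", err
	}
	if _, err := f.WriteAt([]byte("hello"), 0); err != nil {
		return "", err
	}
	if err := f.Sync(); err != nil {
		return "", err
	}

	// Map read-only and shared, as Bolt does for its data file.
	b, err := syscall.Mmap(int(f.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(b)

	// Advise the kernel that access will be random, not sequential.
	if err := syscall.Madvise(b, syscall.MADV_RANDOM); err != nil {
		return "", err
	}
	return string(b[:5]), nil
}

func main() {
	s, err := mapFile()
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // hello
}
```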
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,101 @@ |
| 0 |
+ |
|
| 1 |
+package bolt |
|
| 2 |
+ |
|
| 3 |
+import ( |
|
| 4 |
+ "fmt" |
|
| 5 |
+ "os" |
|
| 6 |
+ "syscall" |
|
| 7 |
+ "time" |
|
| 8 |
+ "unsafe" |
|
| 9 |
+ "golang.org/x/sys/unix" |
|
| 10 |
+) |
|
| 11 |
+ |
|
| 12 |
+// flock acquires an advisory lock on a file descriptor. |
|
| 13 |
+func flock(f *os.File, exclusive bool, timeout time.Duration) error {
|
|
| 14 |
+ var t time.Time |
|
| 15 |
+ for {
|
|
| 16 |
+ // If we're beyond our timeout then return an error. |
|
| 17 |
+ // This can only occur after we've attempted a flock once. |
|
| 18 |
+ if t.IsZero() {
|
|
| 19 |
+ t = time.Now() |
|
| 20 |
+ } else if timeout > 0 && time.Since(t) > timeout {
|
|
| 21 |
+ return ErrTimeout |
|
| 22 |
+ } |
|
| 23 |
+ var lock syscall.Flock_t |
|
| 24 |
+ lock.Start = 0 |
|
| 25 |
+ lock.Len = 0 |
|
| 26 |
+ lock.Pid = 0 |
|
| 27 |
+ lock.Whence = 0 |
|
| 29 |
+ if exclusive {
|
|
| 30 |
+ lock.Type = syscall.F_WRLCK |
|
| 31 |
+ } else {
|
|
| 32 |
+ lock.Type = syscall.F_RDLCK |
|
| 33 |
+ } |
|
| 34 |
+ err := syscall.FcntlFlock(f.Fd(), syscall.F_SETLK, &lock) |
|
| 35 |
+ if err == nil {
|
|
| 36 |
+ return nil |
|
| 37 |
+ } else if err != syscall.EAGAIN {
|
|
| 38 |
+ return err |
|
| 39 |
+ } |
|
| 40 |
+ |
|
| 41 |
+ // Wait for a bit and try again. |
|
| 42 |
+ time.Sleep(50 * time.Millisecond) |
|
| 43 |
+ } |
|
| 44 |
+} |
|
| 45 |
+ |
|
| 46 |
+// funlock releases an advisory lock on a file descriptor. |
|
| 47 |
+func funlock(f *os.File) error {
|
|
| 48 |
+ var lock syscall.Flock_t |
|
| 49 |
+ lock.Start = 0 |
|
| 50 |
+ lock.Len = 0 |
|
| 51 |
+ lock.Type = syscall.F_UNLCK |
|
| 52 |
+ lock.Whence = 0 |
|
| 53 |
+ return syscall.FcntlFlock(uintptr(f.Fd()), syscall.F_SETLK, &lock) |
|
| 54 |
+} |
|
| 55 |
+ |
|
| 56 |
+// mmap memory maps a DB's data file. |
|
| 57 |
+func mmap(db *DB, sz int) error {
|
|
| 58 |
+ // Truncate and fsync to ensure file size metadata is flushed. |
|
| 59 |
+ // https://github.com/boltdb/bolt/issues/284 |
|
| 60 |
+ if !db.NoGrowSync && !db.readOnly {
|
|
| 61 |
+ if err := db.file.Truncate(int64(sz)); err != nil {
|
|
| 62 |
+ return fmt.Errorf("file resize error: %s", err)
|
|
| 63 |
+ } |
|
| 64 |
+ if err := db.file.Sync(); err != nil {
|
|
| 65 |
+ return fmt.Errorf("file sync error: %s", err)
|
|
| 66 |
+ } |
|
| 67 |
+ } |
|
| 68 |
+ |
|
| 69 |
+ // Map the data file to memory. |
|
| 70 |
+ b, err := unix.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED) |
|
| 71 |
+ if err != nil {
|
|
| 72 |
+ return err |
|
| 73 |
+ } |
|
| 74 |
+ |
|
| 75 |
+ // Advise the kernel that the mmap is accessed randomly. |
|
| 76 |
+ if err := unix.Madvise(b, syscall.MADV_RANDOM); err != nil {
|
|
| 77 |
+ return fmt.Errorf("madvise: %s", err)
|
|
| 78 |
+ } |
|
| 79 |
+ |
|
| 80 |
+ // Save the original byte slice and convert to a byte array pointer. |
|
| 81 |
+ db.dataref = b |
|
| 82 |
+ db.data = (*[maxMapSize]byte)(unsafe.Pointer(&b[0])) |
|
| 83 |
+ db.datasz = sz |
|
| 84 |
+ return nil |
|
| 85 |
+} |
|
| 86 |
+ |
|
| 87 |
+// munmap unmaps a DB's data file from memory. |
|
| 88 |
+func munmap(db *DB) error {
|
|
| 89 |
+ // Ignore the unmap if we have no mapped data. |
|
| 90 |
+ if db.dataref == nil {
|
|
| 91 |
+ return nil |
|
| 92 |
+ } |
|
| 93 |
+ |
|
| 94 |
+ // Unmap using the original byte slice. |
|
| 95 |
+ err := unix.Munmap(db.dataref) |
|
| 96 |
+ db.dataref = nil |
|
| 97 |
+ db.data = nil |
|
| 98 |
+ db.datasz = 0 |
|
| 99 |
+ return err |
|
| 100 |
+} |
| ... | ... |
@@ -16,7 +16,7 @@ func fdatasync(db *DB) error {
 }
 
 // flock acquires an advisory lock on a file descriptor.
-func flock(f *os.File, _ time.Duration) error {
+func flock(f *os.File, _ bool, _ time.Duration) error {
 	return nil
 }
 
@@ -28,9 +28,11 @@ func funlock(f *os.File) error {
 // mmap memory maps a DB's data file.
 // Based on: https://github.com/edsrzf/mmap-go
 func mmap(db *DB, sz int) error {
-	// Truncate the database to the size of the mmap.
-	if err := db.file.Truncate(int64(sz)); err != nil {
-		return fmt.Errorf("truncate: %s", err)
+	if !db.readOnly {
+		// Truncate the database to the size of the mmap.
+		if err := db.file.Truncate(int64(sz)); err != nil {
+			return fmt.Errorf("truncate: %s", err)
+		}
 	}
 
 	// Open a file mapping handle.
@@ -99,6 +99,7 @@ func (b *Bucket) Cursor() *Cursor {
 
 // Bucket retrieves a nested bucket by name.
 // Returns nil if the bucket does not exist.
+// The bucket instance is only valid for the lifetime of the transaction.
 func (b *Bucket) Bucket(name []byte) *Bucket {
 	if b.buckets != nil {
 		if child := b.buckets[string(name)]; child != nil {
@@ -148,6 +149,7 @@ func (b *Bucket) openBucket(value []byte) *Bucket {
 
 // CreateBucket creates a new bucket at the given key and returns the new bucket.
 // Returns an error if the key already exists, if the bucket name is blank, or if the bucket name is too long.
+// The bucket instance is only valid for the lifetime of the transaction.
 func (b *Bucket) CreateBucket(key []byte) (*Bucket, error) {
 	if b.tx.db == nil {
 		return nil, ErrTxClosed
@@ -192,6 +194,7 @@ func (b *Bucket) CreateBucket(key []byte) (*Bucket, error) {
 
 // CreateBucketIfNotExists creates a new bucket if it doesn't already exist and returns a reference to it.
 // Returns an error if the bucket name is blank, or if the bucket name is too long.
+// The bucket instance is only valid for the lifetime of the transaction.
 func (b *Bucket) CreateBucketIfNotExists(key []byte) (*Bucket, error) {
 	child, err := b.CreateBucket(key)
 	if err == ErrBucketExists {
@@ -252,6 +255,7 @@ func (b *Bucket) DeleteBucket(key []byte) error {
 
 // Get retrieves the value for a key in the bucket.
 // Returns a nil value if the key does not exist or if the key is a nested bucket.
+// The returned value is only valid for the life of the transaction.
 func (b *Bucket) Get(key []byte) []byte {
 	k, v, flags := b.Cursor().seek(key)
 
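The lifetime comments added above matter because `Get` returns a slice pointing directly into the memory-mapped file, which may be remapped after the transaction closes. A sketch of the standard workaround, copying the bytes before the transaction ends (the bolt calls themselves are elided; only the copy idiom is shown, and `copyValue` is an illustrative name):

```go
package main

import "fmt"

// copyValue detaches a value from its backing array (e.g. a slice into
// bolt's mmap) so it remains valid after the transaction is closed.
func copyValue(v []byte) []byte {
	if v == nil {
		return nil
	}
	out := make([]byte, len(v))
	copy(out, v)
	return out
}

func main() {
	backing := []byte("hello") // stands in for a slice into the mmap
	saved := copyValue(backing)
	backing[0] = 'X' // simulate the page being remapped or reused
	fmt.Println(string(saved))
}
```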
@@ -332,6 +336,12 @@ func (b *Bucket) NextSequence() (uint64, error) {
 		return 0, ErrTxNotWritable
 	}
 
+	// Materialize the root node if it hasn't been already so that the
+	// bucket will be saved during commit.
+	if b.rootNode == nil {
+		_ = b.node(b.root, nil)
+	}
+
 	// Increment and return the sequence.
 	b.bucket.sequence++
 	return b.bucket.sequence, nil
@@ -339,7 +349,8 @@ func (b *Bucket) NextSequence() (uint64, error) {
 
 // ForEach executes a function for each key/value pair in a bucket.
 // If the provided function returns an error then the iteration is stopped and
-// the error is returned to the caller.
+// the error is returned to the caller. The provided function must not modify
+// the bucket; this will result in undefined behavior.
 func (b *Bucket) ForEach(fn func(k, v []byte) error) error {
 	if b.tx.db == nil {
 		return ErrTxClosed
@@ -511,8 +522,12 @@ func (b *Bucket) spill() error {
 		// Update parent node.
 		var c = b.Cursor()
 		k, _, flags := c.seek([]byte(name))
-		_assert(bytes.Equal([]byte(name), k), "misplaced bucket header: %x -> %x", []byte(name), k)
-		_assert(flags&bucketLeafFlag != 0, "unexpected bucket header flag: %x", flags)
+		if !bytes.Equal([]byte(name), k) {
+			panic(fmt.Sprintf("misplaced bucket header: %x -> %x", []byte(name), k))
+		}
+		if flags&bucketLeafFlag == 0 {
+			panic(fmt.Sprintf("unexpected bucket header flag: %x", flags))
+		}
 		c.node().put([]byte(name), []byte(name), value, 0, bucketLeafFlag)
 	}
 
@@ -528,7 +543,9 @@ func (b *Bucket) spill() error {
 	b.rootNode = b.rootNode.root()
 
 	// Update the root node for this bucket.
-	_assert(b.rootNode.pgid < b.tx.meta.pgid, "pgid (%d) above high water mark (%d)", b.rootNode.pgid, b.tx.meta.pgid)
+	if b.rootNode.pgid >= b.tx.meta.pgid {
+		panic(fmt.Sprintf("pgid (%d) above high water mark (%d)", b.rootNode.pgid, b.tx.meta.pgid))
+	}
 	b.root = b.rootNode.pgid
 
 	return nil
@@ -659,7 +676,9 @@ func (b *Bucket) pageNode(id pgid) (*page, *node) {
 	// Inline buckets have a fake page embedded in their value so treat them
 	// differently. We'll return the rootNode (if available) or the fake page.
 	if b.root == 0 {
-		_assert(id == 0, "inline bucket non-zero page access(2): %d != 0", id)
+		if id != 0 {
+			panic(fmt.Sprintf("inline bucket non-zero page access(2): %d != 0", id))
+		}
 		if b.rootNode != nil {
 			return nil, b.rootNode
 		}
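A recurring change in this diff is replacing the variadic `_assert` helper with an inline condition plus `panic(fmt.Sprintf(...))`, so the message formatting only happens on the failure path. The pattern in isolation (the `checkPgid` name and `uint64` types are illustrative stand-ins for bolt's internals):

```go
package main

import "fmt"

// checkPgid panics if a page id is at or above the high water mark.
// The Sprintf only runs when the check fails, unlike a naive variadic
// assert helper whose arguments are evaluated on every call.
func checkPgid(pgid, highWater uint64) {
	if pgid >= highWater {
		panic(fmt.Sprintf("pgid (%d) above high water mark (%d)", pgid, highWater))
	}
}

func main() {
	checkPgid(3, 10) // fine, no output
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	checkPgid(12, 10) // panics with a formatted message
}
```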
@@ -2,6 +2,7 @@ package bolt
 
 import (
 	"bytes"
+	"fmt"
 	"sort"
 )
 
@@ -9,6 +10,8 @@ import (
 // Cursors see nested buckets with value == nil.
 // Cursors can be obtained from a transaction and are valid as long as the transaction is open.
 //
+// Keys and values returned from the cursor are only valid for the life of the transaction.
+//
 // Changing data while traversing with a cursor may cause it to be invalidated
 // and return unexpected keys and/or values. You must reposition your cursor
 // after mutating data.
@@ -24,6 +27,7 @@ func (c *Cursor) Bucket() *Bucket {
 
 // First moves the cursor to the first item in the bucket and returns its key and value.
 // If the bucket is empty then a nil key and value are returned.
+// The returned key and value are only valid for the life of the transaction.
 func (c *Cursor) First() (key []byte, value []byte) {
 	_assert(c.bucket.tx.db != nil, "tx closed")
 	c.stack = c.stack[:0]
@@ -40,6 +44,7 @@ func (c *Cursor) First() (key []byte, value []byte) {
 
 // Last moves the cursor to the last item in the bucket and returns its key and value.
 // If the bucket is empty then a nil key and value are returned.
+// The returned key and value are only valid for the life of the transaction.
 func (c *Cursor) Last() (key []byte, value []byte) {
 	_assert(c.bucket.tx.db != nil, "tx closed")
 	c.stack = c.stack[:0]
@@ -57,6 +62,7 @@ func (c *Cursor) Last() (key []byte, value []byte) {
 
 // Next moves the cursor to the next item in the bucket and returns its key and value.
 // If the cursor is at the end of the bucket then a nil key and value are returned.
+// The returned key and value are only valid for the life of the transaction.
 func (c *Cursor) Next() (key []byte, value []byte) {
 	_assert(c.bucket.tx.db != nil, "tx closed")
 	k, v, flags := c.next()
@@ -68,6 +74,7 @@ func (c *Cursor) Next() (key []byte, value []byte) {
 
 // Prev moves the cursor to the previous item in the bucket and returns its key and value.
 // If the cursor is at the beginning of the bucket then a nil key and value are returned.
+// The returned key and value are only valid for the life of the transaction.
 func (c *Cursor) Prev() (key []byte, value []byte) {
 	_assert(c.bucket.tx.db != nil, "tx closed")
 
@@ -99,6 +106,7 @@ func (c *Cursor) Prev() (key []byte, value []byte) {
 // Seek moves the cursor to a given key and returns it.
 // If the key does not exist then the next key is used. If no keys
 // follow, a nil key is returned.
+// The returned key and value are only valid for the life of the transaction.
 func (c *Cursor) Seek(seek []byte) (key []byte, value []byte) {
 	k, v, flags := c.seek(seek)
 
@@ -228,8 +236,8 @@ func (c *Cursor) next() (key []byte, value []byte, flags uint32) {
 // search recursively performs a binary search against a given page/node until it finds a given key.
 func (c *Cursor) search(key []byte, pgid pgid) {
 	p, n := c.bucket.pageNode(pgid)
-	if p != nil {
-		_assert((p.flags&(branchPageFlag|leafPageFlag)) != 0, "invalid page type: %d: %x", p.id, p.flags)
+	if p != nil && (p.flags&(branchPageFlag|leafPageFlag)) == 0 {
+		panic(fmt.Sprintf("invalid page type: %d: %x", p.id, p.flags))
 	}
 	e := elemRef{page: p, node: n}
 	c.stack = append(c.stack, e)
@@ -12,9 +12,6 @@ import (
 	"unsafe"
 )
 
-// The smallest size that the mmap can be.
-const minMmapSize = 1 << 22 // 4MB
-
 // The largest step that can be taken when remapping the mmap.
 const maxMmapStep = 1 << 30 // 1GB
 
@@ -30,6 +27,12 @@ const magic uint32 = 0xED0CDAED
 // must be synchronzied using the msync(2) syscall.
 const IgnoreNoSync = runtime.GOOS == "openbsd"
 
+// Default values if not set in a DB instance.
+const (
+	DefaultMaxBatchSize  int = 1000
+	DefaultMaxBatchDelay     = 10 * time.Millisecond
+)
+
 // DB represents a collection of buckets persisted to a file on disk.
 // All data access is performed through transactions which can be obtained through the DB.
 // All the functions on DB will return a ErrDatabaseNotOpen if accessed before Open() is called.
@@ -52,9 +55,33 @@ type DB struct {
 	// THIS IS UNSAFE. PLEASE USE WITH CAUTION.
 	NoSync bool
 
+	// When true, skips the truncate call when growing the database.
+	// Setting this to true is only safe on non-ext3/ext4 systems.
+	// Skipping truncation avoids preallocation of hard drive space and
+	// bypasses a truncate() and fsync() syscall on remapping.
+	//
+	// https://github.com/boltdb/bolt/issues/284
+	NoGrowSync bool
+
+	// MaxBatchSize is the maximum size of a batch. Default value is
+	// copied from DefaultMaxBatchSize in Open.
+	//
+	// If <=0, disables batching.
+	//
+	// Do not change concurrently with calls to Batch.
+	MaxBatchSize int
+
+	// MaxBatchDelay is the maximum delay before a batch starts.
+	// Default value is copied from DefaultMaxBatchDelay in Open.
+	//
+	// If <=0, effectively disables batching.
+	//
+	// Do not change concurrently with calls to Batch.
+	MaxBatchDelay time.Duration
+
 	path     string
 	file     *os.File
-	dataref  []byte
+	dataref  []byte // mmap'ed readonly, write throws SEGV
 	data     *[maxMapSize]byte
 	datasz   int
 	meta0    *meta
@@ -66,6 +93,9 @@ type DB struct {
 	freelist *freelist
 	stats    Stats
 
+	batchMu sync.Mutex
+	batch   *batch
+
 	rwlock   sync.Mutex   // Allows only one writer at a time.
 	metalock sync.Mutex   // Protects meta page access.
 	mmaplock sync.RWMutex // Protects mmap access during remapping.
@@ -74,6 +104,10 @@ type DB struct {
 	ops struct {
 		writeAt func(b []byte, off int64) (n int, err error)
 	}
+
+	// Read only mode.
+	// When true, Update() and Begin(true) return ErrDatabaseReadOnly immediately.
+	readOnly bool
 }
 
 // Path returns the path to currently open database file.
@@ -101,20 +135,34 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
 	if options == nil {
 		options = DefaultOptions
 	}
+	db.NoGrowSync = options.NoGrowSync
+
+	// Set default values for later DB operations.
+	db.MaxBatchSize = DefaultMaxBatchSize
+	db.MaxBatchDelay = DefaultMaxBatchDelay
+
+	flag := os.O_RDWR
+	if options.ReadOnly {
+		flag = os.O_RDONLY
+		db.readOnly = true
+	}
 
 	// Open data file and separate sync handler for metadata writes.
 	db.path = path
-
 	var err error
-	if db.file, err = os.OpenFile(db.path, os.O_RDWR|os.O_CREATE, mode); err != nil {
+	if db.file, err = os.OpenFile(db.path, flag|os.O_CREATE, mode); err != nil {
 		_ = db.close()
 		return nil, err
 	}
 
-	// Lock file so that other processes using Bolt cannot use the database
-	// at the same time. This would cause corruption since the two processes
-	// would write meta pages and free pages separately.
-	if err := flock(db.file, options.Timeout); err != nil {
+	// Lock file so that other processes using Bolt in read-write mode cannot
+	// use the database at the same time. This would cause corruption since
+	// the two processes would write meta pages and free pages separately.
+	// The database file is locked exclusively (only one process can grab the lock)
+	// if !options.ReadOnly.
+	// The database file is locked using the shared lock (more than one process may
+	// hold a lock at the same time) otherwise (options.ReadOnly is set).
+	if err := flock(db.file, !db.readOnly, options.Timeout); err != nil {
 		_ = db.close()
 		return nil, err
 	}
@@ -162,16 +210,6 @@ func (db *DB) mmap(minsz int) error {
 	db.mmaplock.Lock()
 	defer db.mmaplock.Unlock()
 
-	// Dereference all mmap references before unmapping.
-	if db.rwtx != nil {
-		db.rwtx.root.dereference()
-	}
-
-	// Unmap existing data before continuing.
-	if err := db.munmap(); err != nil {
-		return err
-	}
-
 	info, err := db.file.Stat()
 	if err != nil {
 		return fmt.Errorf("mmap stat error: %s", err)
@@ -184,7 +222,20 @@ func (db *DB) mmap(minsz int) error {
 	if size < minsz {
 		size = minsz
 	}
-	size = db.mmapSize(size)
+	size, err = db.mmapSize(size)
+	if err != nil {
+		return err
+	}
+
+	// Dereference all mmap references before unmapping.
+	if db.rwtx != nil {
+		db.rwtx.root.dereference()
+	}
+
+	// Unmap existing data before continuing.
+	if err := db.munmap(); err != nil {
+		return err
+	}
 
 	// Memory-map the data file as a byte slice.
 	if err := mmap(db, size); err != nil {
@@ -215,22 +266,40 @@ func (db *DB) munmap() error {
 }
 
 // mmapSize determines the appropriate size for the mmap given the current size
-// of the database. The minimum size is 4MB and doubles until it reaches 1GB.
-func (db *DB) mmapSize(size int) int {
-	if size <= minMmapSize {
-		return minMmapSize
-	} else if size < maxMmapStep {
-		size *= 2
-	} else {
-		size += maxMmapStep
+// of the database. The minimum size is 1MB and doubles until it reaches 1GB.
+// Returns an error if the new mmap size is greater than the max allowed.
+func (db *DB) mmapSize(size int) (int, error) {
+	// Double the size from 32KB until 1GB.
+	for i := uint(15); i <= 30; i++ {
+		if size <= 1<<i {
+			return 1 << i, nil
+		}
+	}
+
+	// Verify the requested size is not above the maximum allowed.
+	if size > maxMapSize {
+		return 0, fmt.Errorf("mmap too large")
+	}
+
+	// If larger than 1GB then grow by 1GB at a time.
+	sz := int64(size)
+	if remainder := sz % int64(maxMmapStep); remainder > 0 {
+		sz += int64(maxMmapStep) - remainder
 	}
 
 	// Ensure that the mmap size is a multiple of the page size.
-	if (size % db.pageSize) != 0 {
-		size = ((size / db.pageSize) + 1) * db.pageSize
+	// This should always be true since we're incrementing in MBs.
+	pageSize := int64(db.pageSize)
+	if (sz % pageSize) != 0 {
+		sz = ((sz / pageSize) + 1) * pageSize
+	}
+
+	// If we've exceeded the max size then only grow up to the max size.
+	if sz > maxMapSize {
+		sz = maxMapSize
 	}
 
-	return size
+	return int(sz), nil
 }
 
 // init creates a new database file and initializes its meta pages.
@@ -250,7 +319,6 @@ func (db *DB) init() error {
 	m.magic = magic
 	m.version = version
 	m.pageSize = uint32(db.pageSize)
-	m.version = version
 	m.freelist = 2
 	m.root = bucket{root: 3}
 	m.pgid = 4
@@ -283,8 +351,15 @@ func (db *DB) init() error {
 // Close releases all database resources.
 // All transactions must be closed before closing the database.
 func (db *DB) Close() error {
+	db.rwlock.Lock()
+	defer db.rwlock.Unlock()
+
 	db.metalock.Lock()
 	defer db.metalock.Unlock()
+
+	db.mmaplock.RLock()
+	defer db.mmaplock.RUnlock()
+
 	return db.close()
 }
 
@@ -304,8 +379,11 @@ func (db *DB) close() error {
 
 	// Close file handles.
 	if db.file != nil {
-		// Unlock the file.
-		_ = funlock(db.file)
+		// No need to unlock read-only file.
+		if !db.readOnly {
+			// Unlock the file.
+			_ = funlock(db.file)
+		}
 
 		// Close the file descriptor.
 		if err := db.file.Close(); err != nil {
@@ -323,6 +401,11 @@ func (db *DB) close() error {
 // will cause the calls to block and be serialized until the current write
 // transaction finishes.
 //
+// Transactions should not be dependent on one another. Opening a read
+// transaction and a write transaction in the same goroutine can cause the
+// writer to deadlock because the database periodically needs to re-mmap itself
+// as it grows and it cannot do that while a read transaction is open.
+//
 // IMPORTANT: You must close read-only transactions after you are finished or
 // else the database will not reclaim old pages.
 func (db *DB) Begin(writable bool) (*Tx, error) {
@@ -371,6 +454,11 @@ func (db *DB) beginTx() (*Tx, error) {
 }
 
 func (db *DB) beginRWTx() (*Tx, error) {
+	// If the database was opened with Options.ReadOnly, return an error.
+	if db.readOnly {
+		return nil, ErrDatabaseReadOnly
+	}
+
 	// Obtain writer lock. This is released by the transaction when it closes.
 	// This enforces only one writer transaction at a time.
 	db.rwlock.Lock()
@@ -501,6 +589,12 @@ func (db *DB) View(fn func(*Tx) error) error {
 	return nil
 }
 
+// Sync executes fdatasync() against the database file handle.
+//
+// This is not necessary under normal operation, however, if you use NoSync
+// then it allows you to force the database file to sync against the disk.
+func (db *DB) Sync() error { return fdatasync(db) }
+
 // Stats retrieves ongoing performance stats for the database.
 // This is only updated when a transaction closes.
 func (db *DB) Stats() Stats {
@@ -561,18 +655,30 @@ func (db *DB) allocate(count int) (*page, error) {
 	return p, nil
 }
 
+func (db *DB) IsReadOnly() bool {
+	return db.readOnly
+}
+
 // Options represents the options that can be set when opening a database.
 type Options struct {
 	// Timeout is the amount of time to wait to obtain a file lock.
 	// When set to zero it will wait indefinitely. This option is only
 	// available on Darwin and Linux.
 	Timeout time.Duration
+
+	// Sets the DB.NoGrowSync flag before memory mapping the file.
+	NoGrowSync bool
+
+	// Open database in read-only mode. Uses flock(..., LOCK_SH|LOCK_NB) to
+	// grab a shared lock (UNIX).
+	ReadOnly bool
 }
 
 // DefaultOptions represent the options used if nil options are passed into Open().
 // No timeout is used which will cause Bolt to wait indefinitely for a lock.
 var DefaultOptions = &Options{
-	Timeout: 0,
+	Timeout:    0,
+	NoGrowSync: false,
 }
 
 // Stats represents statistics about the database.
@@ -647,9 +753,11 @@ func (m *meta) copy(dest *meta) {
 
 // write writes the meta onto a page.
 func (m *meta) write(p *page) {
-
-	_assert(m.root.root < m.pgid, "root bucket pgid (%d) above high water mark (%d)", m.root.root, m.pgid)
-	_assert(m.freelist < m.pgid, "freelist pgid (%d) above high water mark (%d)", m.freelist, m.pgid)
+	if m.root.root >= m.pgid {
+		panic(fmt.Sprintf("root bucket pgid (%d) above high water mark (%d)", m.root.root, m.pgid))
+	} else if m.freelist >= m.pgid {
+		panic(fmt.Sprintf("freelist pgid (%d) above high water mark (%d)", m.freelist, m.pgid))
+	}
 
 	// Page id is either going to be 0 or 1 which we can determine by the transaction ID.
 	p.id = pgid(m.txid % 2)
@@ -675,13 +783,8 @@ func _assert(condition bool, msg string, v ...interface{}) {
 	}
 }
 
-func warn(v ...interface{}) {
-	fmt.Fprintln(os.Stderr, v...)
-}
-
-func warnf(msg string, v ...interface{}) {
-	fmt.Fprintf(os.Stderr, msg+"\n", v...)
-}
+func warn(v ...interface{})              { fmt.Fprintln(os.Stderr, v...) }
+func warnf(msg string, v ...interface{}) { fmt.Fprintf(os.Stderr, msg+"\n", v...) }
 
 func printstack() {
 	stack := strings.Join(strings.Split(string(debug.Stack()), "\n")[2:], "\n")
@@ -36,6 +36,10 @@ var (
 	// ErrTxClosed is returned when committing or rolling back a transaction
 	// that has already been committed or rolled back.
 	ErrTxClosed = errors.New("tx closed")
+
+	// ErrDatabaseReadOnly is returned when a mutating transaction is started on a
+	// read-only database.
+	ErrDatabaseReadOnly = errors.New("database is in read-only mode")
 )
 
 // These errors can occur when putting or deleting a value or a bucket.
@@ -1,6 +1,7 @@
 package bolt
 
 import (
+	"fmt"
 	"sort"
 	"unsafe"
 )
@@ -47,15 +48,14 @@ func (f *freelist) pending_count() int {
 
 // all returns a list of all free ids and all pending ids in one sorted list.
 func (f *freelist) all() []pgid {
-	ids := make([]pgid, len(f.ids))
-	copy(ids, f.ids)
+	m := make(pgids, 0)
 
 	for _, list := range f.pending {
-		ids = append(ids, list...)
+		m = append(m, list...)
 	}
 
-	sort.Sort(pgids(ids))
-	return ids
+	sort.Sort(m)
+	return pgids(f.ids).merge(m)
}

 // allocate returns the starting page id of a contiguous list of pages of a given size.
| ... | ... |
@@ -67,7 +67,9 @@ func (f *freelist) allocate(n int) pgid {
|
| 67 | 67 |
|
| 68 | 68 |
var initial, previd pgid |
| 69 | 69 |
for i, id := range f.ids {
|
| 70 |
- _assert(id > 1, "invalid page allocation: %d", id) |
|
| 70 |
+ if id <= 1 {
|
|
| 71 |
+ panic(fmt.Sprintf("invalid page allocation: %d", id))
|
|
| 72 |
+ } |
|
| 71 | 73 |
|
| 72 | 74 |
// Reset initial page if this is not contiguous. |
| 73 | 75 |
if previd == 0 || id-previd != 1 {
|
| ... | ... |
@@ -103,13 +105,17 @@ func (f *freelist) allocate(n int) pgid {
|
| 103 | 103 |
// free releases a page and its overflow for a given transaction id. |
| 104 | 104 |
// If the page is already free then a panic will occur. |
| 105 | 105 |
func (f *freelist) free(txid txid, p *page) {
|
| 106 |
- _assert(p.id > 1, "cannot free page 0 or 1: %d", p.id) |
|
| 106 |
+ if p.id <= 1 {
|
|
| 107 |
+ panic(fmt.Sprintf("cannot free page 0 or 1: %d", p.id))
|
|
| 108 |
+ } |
|
| 107 | 109 |
|
| 108 | 110 |
// Free page and all its overflow pages. |
| 109 | 111 |
var ids = f.pending[txid] |
| 110 | 112 |
for id := p.id; id <= p.id+pgid(p.overflow); id++ {
|
| 111 | 113 |
// Verify that page is not already free. |
| 112 |
- _assert(!f.cache[id], "page %d already freed", id) |
|
| 114 |
+ if f.cache[id] {
|
|
| 115 |
+ panic(fmt.Sprintf("page %d already freed", id))
|
|
| 116 |
+ } |
|
| 113 | 117 |
|
| 114 | 118 |
// Add to the freelist and cache. |
| 115 | 119 |
ids = append(ids, id) |
| ... | ... |
@@ -120,15 +126,17 @@ func (f *freelist) free(txid txid, p *page) {
 
 // release moves all page ids for a transaction id (or older) to the freelist.
 func (f *freelist) release(txid txid) {
+	m := make(pgids, 0)
 	for tid, ids := range f.pending {
 		if tid <= txid {
 			// Move transaction's pending pages to the available freelist.
 			// Don't remove from the cache since the page is still free.
-			f.ids = append(f.ids, ids...)
+			m = append(m, ids...)
 			delete(f.pending, tid)
 		}
 	}
-	sort.Sort(pgids(f.ids))
+	sort.Sort(m)
+	f.ids = pgids(f.ids).merge(m)
 }
 
 // rollback removes the pages from a given pending tx.
@@ -2,6 +2,7 @@ package bolt
 
 import (
 	"bytes"
+	"fmt"
 	"sort"
 	"unsafe"
 )
@@ -70,7 +71,9 @@ func (n *node) pageElementSize() int {
 
 // childAt returns the child node at a given index.
 func (n *node) childAt(index int) *node {
-	_assert(!n.isLeaf, "invalid childAt(%d) on a leaf node", index)
+	if n.isLeaf {
+		panic(fmt.Sprintf("invalid childAt(%d) on a leaf node", index))
+	}
 	return n.bucket.node(n.inodes[index].pgid, n)
 }
 
@@ -111,9 +114,13 @@ func (n *node) prevSibling() *node {

// put inserts a key/value.
func (n *node) put(oldKey, newKey, value []byte, pgid pgid, flags uint32) {
-	_assert(pgid < n.bucket.tx.meta.pgid, "pgid (%d) above high water mark (%d)", pgid, n.bucket.tx.meta.pgid)
-	_assert(len(oldKey) > 0, "put: zero-length old key")
-	_assert(len(newKey) > 0, "put: zero-length new key")
+	if pgid >= n.bucket.tx.meta.pgid {
+		panic(fmt.Sprintf("pgid (%d) above high water mark (%d)", pgid, n.bucket.tx.meta.pgid))
+	} else if len(oldKey) <= 0 {
+		panic("put: zero-length old key")
+	} else if len(newKey) <= 0 {
+		panic("put: zero-length new key")
+	}

	// Find insertion index.
	index := sort.Search(len(n.inodes), func(i int) bool { return bytes.Compare(n.inodes[i].key, oldKey) != -1 })
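Reviewer note: the predicate passed to `sort.Search` above returns true at the first inode whose key is greater than or equal to `oldKey`, which is exactly the insertion point in a sorted list. A minimal standalone sketch of that idiom (hypothetical helper, not part of the patch):

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// insertIndex returns the position at which target would be inserted
// into the sorted key list, using the same sort.Search/bytes.Compare
// predicate as node.put in this patch.
func insertIndex(keys [][]byte, target []byte) int {
	return sort.Search(len(keys), func(i int) bool {
		// true at the first key that is >= target
		return bytes.Compare(keys[i], target) != -1
	})
}

func main() {
	keys := [][]byte{[]byte("apple"), []byte("cherry"), []byte("plum")}
	fmt.Println(insertIndex(keys, []byte("banana"))) // prints 1
}
```

`sort.Search` requires the predicate to be false for a prefix and true for the rest, which holds here because the inode keys are kept sorted.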
@@ -189,7 +196,9 @@ func (n *node) write(p *page) {
		p.flags |= branchPageFlag
	}

-	_assert(len(n.inodes) < 0xFFFF, "inode overflow: %d (pgid=%d)", len(n.inodes), p.id)
+	if len(n.inodes) >= 0xFFFF {
+		panic(fmt.Sprintf("inode overflow: %d (pgid=%d)", len(n.inodes), p.id))
+	}
	p.count = uint16(len(n.inodes))

	// Loop over each item and write it to the page.
@@ -212,11 +221,20 @@ func (n *node) write(p *page) {
		_assert(elem.pgid != p.id, "write: circular dependency occurred")
	}

+		// If the length of key+value is larger than the max allocation size
+		// then we need to reallocate the byte array pointer.
+		//
+		// See: https://github.com/boltdb/bolt/pull/335
+		klen, vlen := len(item.key), len(item.value)
+		if len(b) < klen+vlen {
+			b = (*[maxAllocSize]byte)(unsafe.Pointer(&b[0]))[:]
+		}
+
		// Write data for the element to the end of the page.
		copy(b[0:], item.key)
-		b = b[len(item.key):]
+		b = b[klen:]
		copy(b[0:], item.value)
-		b = b[len(item.value):]
+		b = b[vlen:]
	}

	// DEBUG ONLY: n.dump()
@@ -348,7 +366,9 @@ func (n *node) spill() error {
	}

	// Write the node.
-	_assert(p.id < tx.meta.pgid, "pgid (%d) above high water mark (%d)", p.id, tx.meta.pgid)
+	if p.id >= tx.meta.pgid {
+		panic(fmt.Sprintf("pgid (%d) above high water mark (%d)", p.id, tx.meta.pgid))
+	}
	node.pgid = p.id
	node.write(p)
	node.spilled = true
@@ -3,12 +3,12 @@ package bolt
import (
	"fmt"
	"os"
+	"sort"
	"unsafe"
)

const pageHeaderSize = int(unsafe.Offsetof(((*page)(nil)).ptr))

-const maxAllocSize = 0xFFFFFFF
const minKeysPerPage = 2

const branchPageElementSize = int(unsafe.Sizeof(branchPageElement{}))
@@ -97,7 +97,7 @@ type branchPageElement struct {
// key returns a byte slice of the node key.
func (n *branchPageElement) key() []byte {
	buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
-	return buf[n.pos : n.pos+n.ksize]
+	return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize]
}

// leafPageElement represents a node on a leaf page.
@@ -111,13 +111,13 @@ type leafPageElement struct {
// key returns a byte slice of the node key.
func (n *leafPageElement) key() []byte {
	buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
-	return buf[n.pos : n.pos+n.ksize]
+	return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize]
}

// value returns a byte slice of the node value.
func (n *leafPageElement) value() []byte {
	buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
-	return buf[n.pos+n.ksize : n.pos+n.ksize+n.vsize]
+	return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos+n.ksize]))[:n.vsize]
}

// PageInfo represents human readable information about a page.
@@ -133,3 +133,40 @@ type pgids []pgid
func (s pgids) Len() int           { return len(s) }
func (s pgids) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s pgids) Less(i, j int) bool { return s[i] < s[j] }
+
+// merge returns the sorted union of a and b.
+func (a pgids) merge(b pgids) pgids {
+	// Return the opposite slice if one is nil.
+	if len(a) == 0 {
+		return b
+	} else if len(b) == 0 {
+		return a
+	}
+
+	// Create a list to hold all elements from both lists.
+	merged := make(pgids, 0, len(a)+len(b))
+
+	// Assign lead to the slice with a lower starting value, follow to the higher value.
+	lead, follow := a, b
+	if b[0] < a[0] {
+		lead, follow = b, a
+	}
+
+	// Continue while there are elements in the lead.
+	for len(lead) > 0 {
+		// Merge largest prefix of lead that is ahead of follow[0].
+		n := sort.Search(len(lead), func(i int) bool { return lead[i] > follow[0] })
+		merged = append(merged, lead[:n]...)
+		if n >= len(lead) {
+			break
+		}
+
+		// Swap lead and follow.
+		lead, follow = follow, lead[n:]
+	}
+
+	// Append what's left in follow.
+	merged = append(merged, follow...)
+
+	return merged
+}
@@ -87,18 +87,21 @@ func (tx *Tx) Stats() TxStats {

// Bucket retrieves a bucket by name.
// Returns nil if the bucket does not exist.
+// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) Bucket(name []byte) *Bucket {
	return tx.root.Bucket(name)
}

// CreateBucket creates a new bucket.
// Returns an error if the bucket already exists, if the bucket name is blank, or if the bucket name is too long.
+// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) CreateBucket(name []byte) (*Bucket, error) {
	return tx.root.CreateBucket(name)
}

// CreateBucketIfNotExists creates a new bucket if it doesn't already exist.
// Returns an error if the bucket name is blank, or if the bucket name is too long.
+// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) CreateBucketIfNotExists(name []byte) (*Bucket, error) {
	return tx.root.CreateBucketIfNotExists(name)
}
@@ -127,7 +130,8 @@ func (tx *Tx) OnCommit(fn func()) {
}

// Commit writes all changes to disk and updates the meta page.
-// Returns an error if a disk write error occurs.
+// Returns an error if a disk write error occurs, or if Commit is
+// called on a read-only transaction.
func (tx *Tx) Commit() error {
	_assert(!tx.managed, "managed tx commit not allowed")
	if tx.db == nil {
@@ -203,7 +207,8 @@ func (tx *Tx) Commit() error {
	return nil
}

-// Rollback closes the transaction and ignores all previous updates.
+// Rollback closes the transaction and ignores all previous updates. Read-only
+// transactions must be rolled back and not committed.
func (tx *Tx) Rollback() error {
	_assert(!tx.managed, "managed tx rollback not allowed")
	if tx.db == nil {
@@ -234,7 +239,8 @@ func (tx *Tx) close() {
	var freelistPendingN = tx.db.freelist.pending_count()
	var freelistAlloc = tx.db.freelist.size()

-	// Remove writer lock.
+	// Remove transaction ref & writer lock.
+	tx.db.rwtx = nil
	tx.db.rwlock.Unlock()

	// Merge statistics.
@@ -248,41 +254,51 @@ func (tx *Tx) close() {
	} else {
		tx.db.removeTx(tx)
	}
+
+	// Clear all references.
	tx.db = nil
+	tx.meta = nil
+	tx.root = Bucket{tx: tx}
+	tx.pages = nil
}

// Copy writes the entire database to a writer.
-// A reader transaction is maintained during the copy so it is safe to continue
-// using the database while a copy is in progress.
-// Copy will write exactly tx.Size() bytes into the writer.
+// This function exists for backwards compatibility. Use WriteTo() instead.
func (tx *Tx) Copy(w io.Writer) error {
-	var f *os.File
-	var err error
+	_, err := tx.WriteTo(w)
+	return err
+}

+// WriteTo writes the entire database to a writer.
+// If err == nil then exactly tx.Size() bytes will be written into the writer.
+func (tx *Tx) WriteTo(w io.Writer) (n int64, err error) {
	// Attempt to open reader directly.
+	var f *os.File
	if f, err = os.OpenFile(tx.db.path, os.O_RDONLY|odirect, 0); err != nil {
		// Fallback to a regular open if that doesn't work.
		if f, err = os.OpenFile(tx.db.path, os.O_RDONLY, 0); err != nil {
-			return err
+			return 0, err
		}
	}

	// Copy the meta pages.
	tx.db.metalock.Lock()
-	_, err = io.CopyN(w, f, int64(tx.db.pageSize*2))
+	n, err = io.CopyN(w, f, int64(tx.db.pageSize*2))
	tx.db.metalock.Unlock()
	if err != nil {
		_ = f.Close()
-		return fmt.Errorf("meta copy: %s", err)
+		return n, fmt.Errorf("meta copy: %s", err)
	}

	// Copy data pages.
-	if _, err := io.CopyN(w, f, tx.Size()-int64(tx.db.pageSize*2)); err != nil {
+	wn, err := io.CopyN(w, f, tx.Size()-int64(tx.db.pageSize*2))
+	n += wn
+	if err != nil {
		_ = f.Close()
-		return err
+		return n, err
	}

-	return f.Close()
+	return n, f.Close()
}

// CopyFile copies the entire database to file at the given path.
@@ -416,15 +432,39 @@ func (tx *Tx) write() error {
	// Write pages to disk in order.
	for _, p := range pages {
		size := (int(p.overflow) + 1) * tx.db.pageSize
-		buf := (*[maxAllocSize]byte)(unsafe.Pointer(p))[:size]
		offset := int64(p.id) * int64(tx.db.pageSize)
-		if _, err := tx.db.ops.writeAt(buf, offset); err != nil {
-			return err
-		}

-		// Update statistics.
-		tx.stats.Write++
+		// Write out page in "max allocation" sized chunks.
+		ptr := (*[maxAllocSize]byte)(unsafe.Pointer(p))
+		for {
+			// Limit our write to our max allocation size.
+			sz := size
+			if sz > maxAllocSize-1 {
+				sz = maxAllocSize - 1
+			}
+
+			// Write chunk to disk.
+			buf := ptr[:sz]
+			if _, err := tx.db.ops.writeAt(buf, offset); err != nil {
+				return err
+			}
+
+			// Update statistics.
+			tx.stats.Write++
+
+			// Exit inner for loop if we've written all the chunks.
+			size -= sz
+			if size == 0 {
+				break
+			}
+
+			// Otherwise move offset forward and move pointer to next chunk.
+			offset += int64(sz)
+			ptr = (*[maxAllocSize]byte)(unsafe.Pointer(&ptr[sz]))
+		}
	}
+
+	// Ignore file sync if flag is set on DB.
	if !tx.db.NoSync || IgnoreNoSync {
		if err := fdatasync(tx.db); err != nil {
			return err