Databases
- http://coding-geek.com/how-databases-work/
- Highly rec'd by soham
- https://www.quora.com/Why-doesnt-Quora-use-PostgreSQL
- https://www.quora.com/Why-does-Quora-use-MySQL-as-the-data-store-instead-of-NoSQLs-such-as-Cassandra-MongoDB-or-CouchDB
- Typical Programmer - Why Programmers Don’t Like Relational Databases
- Typical Programmer - Abject-Oriented Databases
- 2014.02.21 - DigitalOcean - SQLite vs MySQL vs PostgreSQL: A Comparison Of Relational Database Management Systems - I thought this was a helpful overview.
Tools
- SQLiteBrowser
- I've used this and it works fine.
Best practices
How/when to use particular data types
- Enum data types
- Use reference tables
- http://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/
- IIRC this is what was done at ServiceFusion
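A minimal sketch of the reference-table approach, using SQLite's stdlib bindings for brevity (the table and column names here are hypothetical). Unlike an ENUM column, adding a new allowed value is a plain INSERT rather than an ALTER TABLE:

```python
import sqlite3

# Hypothetical schema illustrating a reference table in place of an ENUM column.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
conn.executescript("""
    CREATE TABLE order_statuses (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id        INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES order_statuses(id)
    );
    INSERT INTO order_statuses (name) VALUES ('pending'), ('shipped'), ('delivered');
    INSERT INTO orders (status_id) VALUES (1), (2);
""")

# Adding a new status is a plain INSERT, not a schema change as with ENUM.
conn.execute("INSERT INTO order_statuses (name) VALUES ('returned')")

rows = conn.execute("""
    SELECT o.id, s.name
    FROM orders o JOIN order_statuses s ON s.id = o.status_id
""").fetchall()
print(rows)  # [(1, 'pending'), (2, 'shipped')]
```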
How/when to use ER Diagrams
- https://news.ycombinator.com/item?id=31255800
- There are some good comments in there from people who have done a lot of ER diagramming, discussing the best ways to use an ER diagram and the best software to make one with.
Specific DBs
SQLite
When to use SQLite
- https://www.sqlite.org/whentouse.html
Websites
SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.
The SQLite website (https://www.sqlite.org/) uses SQLite itself, of course, and as of this writing (2015) it handles about 400K to 500K HTTP requests per day, about 15-20% of which are dynamic pages touching the database. Dynamic content uses about 200 SQL statements per webpage. This setup runs on a single VM that shares a physical server with 23 others and yet still keeps the load average below 0.1 most of the time.
See also: Hacker News discussion from 2022-12-13.
Also:
- Client/server database engines are designed to live inside a lovingly-attended datacenter at the core of the network.
A good rule of thumb is to avoid using SQLite in situations where the same database will be accessed directly (without an intervening application server) and simultaneously from many computers over a network.
- Ask HN: Have you used SQLite as a primary database?
- The sqlite docs page has a nice article on when to use an embedded database such as sqlite and when to go with a client/server model (postgres, mysql or others)
When not to use sqlite:
- Is the data separated from the application by a network?
- Many concurrent writers?
- Data size > 280 TB
It is exceptionally good if you don't need parallel writes and don't have many terabytes of data, i.e. for most services out there.
When embedded natively, as in a Rust app, performance is better than any other RDBMS's, because there is no network/serialization overhead and you can use in-process pointers if needed. The DevOps story is also a dream: typically it's just a single file (optionally plus a few more for journaling), and setup is automated away (most language libs bundle it already). Plus, it is widely known, since smartphone SDKs and all web browsers include/expose it.
A subtle advantage: the supported SQL subset is so small that "if it works in SQLite, it will also work with $RDBMS" in most cases, but not the other way around. I always use it when getting started and in need of relational data, and I've only had to swap it out for Postgres once, and not for technical/scaling reasons (an IT policy change & stuff).
Having said that, it is mind-boggling what kind of load you can handle with a small VPS running a Rust microservice that embeds its own SQLite natively... matching that would take an expensive cluster of your typical Rails/Django servers, and it would still have worse performance.
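The "single file" point above is easy to see with Python's stdlib `sqlite3` module (which bundles SQLite, so there is nothing to install or run): the entire database, engine aside, is one file on disk. The path and table here are made up for illustration.

```python
import os
import sqlite3
import tempfile

# The whole "deployment" is one file on disk; no server process, no network hop.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)  # creates the file, runs in-process
conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO hits (url) VALUES (?)", [("/a",), ("/b",)])
conn.commit()
conn.close()

print(os.path.exists(path))  # True: the entire database is this single file
```

(With WAL journaling enabled you'd also see `app.db-wal` and `app.db-shm` alongside it, which is the "optionally + some more for journaling" caveat.)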
Misc tips
- Installation
- Go to SQLite3 download page
- Go down to the “Precompiled Binaries For Windows” section.
- Download “sqlite-shell” and “sqlite-dll” archive files.
- sqlite-dll-win64-x64-3160200.zip
- Download the sqlite tools .zip
- sqlite-tools-win32-x86-3160200.zip
- Unpack everything into a C:\sqlite folder.
- Add that folder to your PATH.
- Creating the empty .db file:
- Hit the Windows key, type 'cmd', then right-click the 'Command Prompt' result and select 'Run as Administrator'.
- Navigate to the folder where you want the db to be.
- Type 'sqlite3 name_of_the_db.db' and hit 'Enter'.
- Simple example of a schema file:
CREATE TABLE salespeople ( id INTEGER PRIMARY KEY, first_name TEXT NOT NULL, last_name TEXT NOT NULL, commission_rate REAL NOT NULL );
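The schema above can be applied without the command-line shell at all, since Python's stdlib bundles SQLite (equivalent to feeding a schema file to `sqlite3 name_of_the_db.db` at the prompt). Using `:memory:` here to avoid creating a file; pass a filename like `salespeople.db` to get the .db file:

```python
import sqlite3

# Apply the example schema and do a round-trip insert/select.
conn = sqlite3.connect(":memory:")  # use a filename to create a real .db file
conn.executescript("""
    CREATE TABLE salespeople (
        id              INTEGER PRIMARY KEY,
        first_name      TEXT NOT NULL,
        last_name       TEXT NOT NULL,
        commission_rate REAL NOT NULL
    );
""")
conn.execute(
    "INSERT INTO salespeople (first_name, last_name, commission_rate) VALUES (?, ?, ?)",
    ("Ada", "Lovelace", 0.1),
)
row = conn.execute("SELECT first_name, commission_rate FROM salespeople").fetchone()
print(row)  # ('Ada', 0.1)
```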
Good to know
- SQLite will allow you to insert a string into an integer field. (Source)
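This is SQLite's "type affinity" behavior and is easy to demonstrate with the stdlib bindings; a non-numeric string inserted into an INTEGER column is stored and returned unchanged, while numeric-looking strings get coerced. (Recent SQLite versions offer STRICT tables that reject this.)

```python
import sqlite3

# SQLite uses type affinity rather than strict column types.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.execute("INSERT INTO t VALUES ('not a number')")  # no error raised
value = conn.execute("SELECT n FROM t").fetchone()[0]
print(repr(value))  # 'not a number' — stored and returned as a string

# A numeric-looking string, by contrast, is coerced to an integer:
conn.execute("INSERT INTO t VALUES ('42')")
coerced = conn.execute("SELECT n FROM t WHERE n = 42").fetchone()[0]
print(repr(coerced))  # 42
```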
MySQL
- How to get your database to allow emojis and other unusual characters:
ALTER DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
- Note that this only changes the default for newly-created tables; existing tables need ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci.
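The reason utf8mb4 is needed: MySQL's legacy 'utf8' charset (utf8mb3) stores at most 3 bytes per character, but emoji sit outside the Basic Multilingual Plane and encode to 4 bytes in UTF-8, which is quick to check:

```python
# MySQL's legacy 'utf8' (utf8mb3) caps characters at 3 bytes; emoji need 4.
emoji = "\N{GRINNING FACE}"        # 😀, U+1F600
n_bytes = len(emoji.encode("utf-8"))
print(n_bytes)  # 4 — too wide for utf8mb3, fine for utf8mb4
```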
PostgreSQL
Websites
Tutorials
- PostgreSQL Exercises (pgexercises.com)
- This looks good.
Extensions
- PostGIS
- This was one of the major reasons Krieger and Systrom went with Postgres over MySQL. (Source)
Books
- Lists
- Individual books
- PostgreSQL.org - PostgreSQL 9.6.1 Documentation ← This is basically a book.
- Beginner-level
- 2001 - Beginning Databases with PostgreSQL (4/5, only 11 reviews)
- 2005 - Beginning Databases with PostgreSQL: From Novice to Professional (5/5)
- 2002 - Practical PostgreSQL (2.5/5, 17 reviews)
- Intermediate-level
- 2001 - PostgreSQL (Linux) (3/5, 5 reviews)
- 2001 - PostgreSQL Developer's Handbook (4/5)
- 2005 - PostgreSQL (2nd Edition) (4/5, 22 reviews)
- 2010 - PostgreSQL 9.0 High Performance (4.5/5)
- 2014 - PostgreSQL: Up and Running (2nd ed) (3.5/5, 15 reviews)
- 2014 - PostgreSQL 9 High Availability Cookbook (5/5, 6 reviews)
- 2015 - PostgreSQL Server Programming - Second Edition (4.5/5, 5 reviews)
- 2015 - PostgreSQL Developer's Guide (3.5/5, 4 reviews)
- 2015 - PostgreSQL 9 Administration Cookbook (2nd ed) (4.5/5, 2 reviews)
- 2015 - PostgreSQL for Data Architects (5/5, 2 reviews)
- 2015 - Troubleshooting PostgreSQL (5/5, 4 reviews)
- 2015 - Learning PostgreSQL (4/5, 5 reviews)
Snippets
- Get the approximate number of rows in all tables in a database (n_live_tup is an estimate maintained by the statistics collector, not an exact count):
SELECT schemaname,relname,n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
MongoDB
Using MongoDB:
- Import a bson file into your Mongo /data/ folder: mongorestore -d db_name -c collection_name "C:\path\to\export.bson"
- Make sure to add your MongoDB's 'bin' folder to the PATH to get mongorestore to work from a command prompt.
- Run Mongod: "C:\Program Files\MongoDB\Server\4.0\bin\mongod.exe" --dbpath="c:\data\db"
- Run MongoDB shell: "C:\Program Files\MongoDB\Server\4.0\bin\mongo.exe"
- Show list of dbs: 'show dbs'
- Switch to using a particular db: 'use <db>'
- Show list of collections for the active db: 'show collections'
Queries
- Get all records from a particular collection once you've switched to a particular db
- 'db.<collection_name>.find()'
- Check if a field contains a string
- db.getCollection('collection_name').find({"Party1": {$regex : ".*similarly situated.*"}})
- Find records by non-existence of a field:
- db.mycollection.find({ "price" : { "$exists" : false } })
- Find a record by its id:
- db.collection_name.find({"_id": ObjectId("587862a88362593254464c69")})
- So you can find it by the id's string value, but you need to wrap the string with ObjectId()
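One detail worth knowing about the $regex query above: like Python's re.search, $regex matches anywhere in the string unless anchored with ^ or $, so the wrapping .* is redundant. A quick sketch with Python's re module (the document value is made up):

```python
import re

# MongoDB's $regex matches anywhere in the string unless anchored,
# so ".*similarly situated.*" and "similarly situated" behave the same.
party = "John Doe, on behalf of all others similarly situated"  # hypothetical value
assert re.search(r".*similarly situated.*", party)  # the note's pattern
assert re.search(r"similarly situated", party)      # equivalent and simpler
print("both patterns match")
```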
Neo4j
General
Tutorials
- Neo4j GraphAcademy
- There's a lot of stuff here
- Neo4j - The Definitive Guide to Graph Databases for RDBMS Developers
- 2015.09 - O'Reilly - Nicole White - Building Web Apps Using Flask and Neo4j
Tools
Pros and cons of graph databases
- Summary
- It seems like Neo4j is helpful for making join-heavy queries orders-of-magnitude faster.
- Links
- http://stackoverflow.com/questions/19570654/do-graph-databases-deprecate-relational-databases
- https://groups.google.com/forum/#!topic/neo4j/mts6H9Py-2I
- https://neo4j.com/developer/graph-db-vs-rdbms/
- "If you use many-to-many relationships, you have to introduce a JOIN table (or junction table) that holds foreign keys of both participating tables which further increases join operation costs. Those costly join operations are usually addressed by denormalising data to reduce the number of joins necessary."
- "This ability of pre-materializing relationships into database structures allows Neo4j to provide performances of several orders of magnitude, especially for join heavy queries, the minutes to milliseconds advantage that many users leverage."
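A minimal sketch of the junction-table pattern the first quote describes, using SQLite for brevity (tables and data are made up): answering "which courses is Alice enrolled in?" takes two joins through the junction table, and it is this per-query join cost that a graph database avoids by materializing the relationship directly (e.g. a single `(s)-[:ENROLLED_IN]->(c)` traversal in Cypher).

```python
import sqlite3

# Many-to-many via a junction table: students <-> courses through enrollments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrollments (              -- the junction (JOIN) table
        student_id INTEGER REFERENCES students(id),
        course_id  INTEGER REFERENCES courses(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO courses  VALUES (1, 'Databases'), (2, 'Graphs');
    INSERT INTO enrollments VALUES (1, 1), (1, 2), (2, 2);
""")
titles = [r[0] for r in conn.execute("""
    SELECT c.title
    FROM students s
    JOIN enrollments e ON e.student_id = s.id   -- join #1
    JOIN courses c     ON c.id = e.course_id    -- join #2
    WHERE s.name = 'Alice'
    ORDER BY c.title
""")]
print(titles)  # ['Databases', 'Graphs']
```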