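API documentation

Connecting

dataset.connect(url=None, schema=None, engine_kwargs=None, ensure_schema=True, row_type=<class 'collections.OrderedDict'>, sqlite_wal_mode=True, on_connect_statements=None)

Opens a new connection to a database.

url can be any valid SQLAlchemy engine URL. If url is not defined, connect() tries to use DATABASE_URL from the environment. Returns an instance of Database. Additionally, engine_kwargs is passed directly to SQLAlchemy; for example, engine_kwargs={'pool_recycle': 3600} avoids DB connection timeouts. Set row_type to an alternate dict-like class to change the type of container rows are stored in.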

db = dataset.connect('sqlite:///factbook.db')

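One of the main features of dataset is that it automatically creates tables and columns as data is inserted. This behaviour can optionally be disabled via the ensure_schema argument. It can also be overridden in many of the data manipulation methods using the ensure flag.

If you want to run custom SQLite pragmas on database connect, you can add them to on_connect_statements as a set of strings. The full list of PRAGMAs is available in the SQLite documentation.

Note

dataset uses SQLAlchemy connection pooling when connecting to the database. There is no way of explicitly clearing or shutting down the connections, other than having the dataset instance garbage collected.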

Database

class dataset.Database(url, schema=None, engine_kwargs=None, ensure_schema=True, row_type=<class 'collections.OrderedDict'>, sqlite_wal_mode=True, on_connect_statements=None)

A database object represents a SQL database with multiple tables.

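begin()

Enter a transaction explicitly.

No data will be written until the transaction has been committed.

commit()

Commit the current transaction.

Makes all statements executed since the transaction began permanent.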

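create_table(table_name, primary_id=None, primary_type=None, primary_increment=None)

Create a new table.

Either loads a table or creates it if it does not exist yet. If a new table is created, you can define the name and type of its primary key field. The default is an auto-incrementing integer named id. You can also set the primary key to be a string or big integer. If primary_id is defined as a text type, the caller is responsible for its uniqueness. You can disable auto-increment behaviour for numeric primary keys by setting primary_increment to False.

Returns a Table instance.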

table = db.create_table('population')

# custom id and type
table2 = db.create_table('population2', 'age')
table3 = db.create_table('population3',
                         primary_id='city',
                         primary_type=db.types.text)
# custom length of String
table4 = db.create_table('population4',
                         primary_id='city',
                         primary_type=db.types.string(25))
# no primary key
table5 = db.create_table('population5',
                         primary_id=False)

get_table(table_name, primary_id=None, primary_type=None, primary_increment=None)

Load or create a table.

This is now the same as create_table.

table = db.get_table('population')
# you can also use the short-hand syntax:
table = db['population']

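load_table(table_name)

Load a table.

This will fail if the table does not already exist in the database. If the table exists, its columns will be reflected and are available on the Table object.

Returns a Table instance.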

table = db.load_table('population')

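query(query, *args, **kwargs)

Run a statement on the database directly.

Allows for the execution of arbitrary read/write queries. A query can either be a plain text string or a SQLAlchemy expression. If a plain string is passed in, it will be converted to an expression automatically.

Further positional and keyword arguments will be used for parameter binding. To include a positional argument in your query, use question marks in the query (i.e. SELECT * FROM tbl WHERE a = ?). For keyword arguments, use a bind parameter (i.e. SELECT * FROM tbl WHERE a = :foo).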

statement = 'SELECT user, COUNT(*) c FROM photos GROUP BY user'
for row in db.query(statement):
    print(row['user'], row['c'])

The returned iterator will yield each result sequentially.

rollback()

Roll back the current transaction.

Discards all statements executed since the transaction began.

property tables: Get a listing of all tables that exist in the database.

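Table

class dataset.Table(database, table_name, primary_id=None, primary_type=None, primary_increment=None, auto_create=False)

Represents a table in a database and exposes common operations.

__iter__()

Return all rows of the table as simple dictionaries.

Allows for iterating over all rows in the table without explicitly calling find().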

for row in table:
    print(row)

__len__(): Return the number of rows in the table.

all(*_clauses, **kwargs)

Perform a simple search on the table.

Simply pass keyword arguments as filters.

results = table.find(country='France')
results = table.find(country='France', year=1980)

Using _limit:

# just return the first 10 rows
results = table.find(country='France', _limit=10)

You can sort the results by one or more columns. Prefix a column name with a minus sign to sort in descending order:

# sort results by a column 'year'
results = table.find(country='France', order_by='year')
# return all rows sorted by multiple columns (descending by year)
results = table.find(order_by=['country', '-year'])

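You can also filter on criteria other than equality; see Advanced filters for details.

To run more complex queries with JOINs, or to perform GROUP BY-style aggregation, you can also use db.query() to run raw SQL queries.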

property columns: Get a listing of all columns that exist in the table.

count(*_clauses, **kwargs): Return the count of results for the given filter set.

create_column(name, type, **kwargs)

Create a new column name of the specified type.

table.create_column('created_at', db.types.datetime)

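type corresponds to an SQLAlchemy type as described by dataset.db.Types. Additional keyword arguments are passed to the constructor of Column, so that default values and options like nullable and unique can be set.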

# the column type is a required argument
table.create_column('key', db.types.string, unique=True, nullable=False)
table.create_column('food', db.types.string, default='banana')

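create_column_by_example(name, value)

Explicitly create a new column name with a type that is appropriate to store the given example value. The type is guessed in the same way as by the insert methods when ensure=True.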

table.create_column_by_example('length', 4.2)

If a column of the same name already exists, no action is taken, even if it is not of the type that would have been created.

create_index(columns, name=None, **kw)

Create an index to speed up queries on a table.

If no name is given, a random name is created.

table.create_index(['name', 'country'])

delete(*clauses, **filters)

Delete rows from the table.

Keyword arguments can be used to add column-based filters. The filter criterion is always equality:

table.delete(place='Berlin')

If no arguments are given, all records are deleted.

distinct(*args, **_filter)

Return all the unique (distinct) values for the given columns.

# returns only one row per year, ignoring the rest
table.distinct('year')
# works with multiple columns, too
table.distinct('year', 'country')
# you can also combine this with a filter
table.distinct('year', country='China')

drop()

Drop the table from the database.

Deletes both the schema and all the contents within it.

drop_column(name)

Drop the column name.

table.drop_column('created_at')

find(*_clauses, **kwargs)

Perform a simple search on the table.

Simply pass keyword arguments as filters.

results = table.find(country='France')
results = table.find(country='France', year=1980)

Using _limit:

# just return the first 10 rows
results = table.find(country='France', _limit=10)

You can sort the results by one or more columns. Prefix a column name with a minus sign to sort in descending order:

# sort results by a column 'year'
results = table.find(country='France', order_by='year')
# return all rows sorted by multiple columns (descending by year)
results = table.find(order_by=['country', '-year'])

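You can also filter on criteria other than equality; see Advanced filters for details.

To run more complex queries with JOINs, or to perform GROUP BY-style aggregation, you can also use db.query() to run raw SQL queries.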

find_one(*args, **kwargs)

Get a single result from the table.

Works just like find() but returns one result, or None.

row = table.find_one(country='United States')

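has_column(column): Check if a column with the given name exists on this table.

has_index(columns): Check if an index exists to cover the given columns.

insert(row, ensure=None, types=None)

Add a row dict by inserting it into the table.

If ensure is set, any keys of the row that are not table columns will be created automatically.

During column creation, types is checked for a key matching the name of the column to be created, and the given SQLAlchemy column type is used. Otherwise, the type is guessed from the row value, defaulting to a simple unicode field.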

data = dict(title='I am a banana!')
table.insert(data)

Returns the inserted row's primary key.

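insert_ignore(row, keys, ensure=None, types=None)

Add a row dict into the table if the row does not exist.

If rows with matching keys exist, no change is made.

Setting ensure results in missing columns being created automatically, i.e. columns for keys of the row that are not yet table columns.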

data = dict(id=10, title='I am a banana!')
table.insert_ignore(data, ['id'])

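insert_many(rows, chunk_size=1000, ensure=None, types=None)

Add many rows at a time.

This is significantly faster than adding them one by one. By default the rows are processed in chunks of 1000 per commit, unless you specify a different chunk_size.

See insert() for details on the other parameters.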

rows = [dict(name='Dolly')] * 10000
table.insert_many(rows)

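update(row, keys, ensure=None, types=None, return_count=False)

Update a row in the table.

The update is managed via the set of column names stated in keys: they are used as filters for the data to be updated, using the values in row.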

# update all entries with id matching 10, setting their title
# columns
data = dict(id=10, title='I am a banana!')
table.update(data, ['id'])

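If keys in row refer to columns not present in the table, those columns will be created based on the settings of ensure and types, matching the behaviour of insert().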

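update_many(rows, keys, chunk_size=1000, ensure=None, types=None)

Update many rows in the table at a time.

This is significantly faster than updating them one by one. By default the rows are processed in chunks of 1000 per commit, unless you specify a different chunk_size.

See update() for details on the other parameters.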

upsert(row, keys, ensure=None, types=None)

An UPSERT is a smart combination of insert and update.

If rows with matching keys exist they will be updated, otherwise a new row is inserted in the table.

data = dict(id=10, title='I am a banana!')
table.upsert(data, ['id'])

upsert_many(rows, keys, chunk_size=1000, ensure=None, types=None)

Sorts multiple input rows into upserts and inserts. Inserts are passed to insert, and upserts are applied as updates.

See upsert() and insert_many().

Data Export

Note: Data exporting has been extracted into a stand-alone package, datafreeze. See the datafreeze repository for details.