在rPi上选择您的毒药:实体框架,Linq2DB或ADO.NET（译文）

By robot-v1.0

本文链接 https://www.kyfws.com/pi/choose-your-poison-on-the-rpi-entity-framework-lin-zh/

01月01日, 0001 - 19 分钟阅读 - 9059 个词 阅读量 0

在rPi上选择您的毒药:实体框架,Linq2DB或ADO.NET（译文）

原文地址：https://www.codeproject.com/Articles/1274903/Choose-Your-Poison-on-the-rPi-Entity-Framework-Lin

原文作者：Marc Clifton

译文由本站 robot-v1.0 翻译

前言

Performance differences on the rPi

rPi的性能差异

下载程序-4.6 KB(Download program - 4.6 KB)

内容(Contents)

介绍(Introduction)

在(In) 我以前的文章(my earlier article) 在建立包括Postgres的rPi时,我很沮丧地发现Entity Framework(EF)Core平均花费大约8秒钟来建立初始连接并从rPi上运行的Postgres数据库获取一些数据.因此,我决定进一步研究该问题,测试EF Core,(on setting up an rPi including Postgres, I was dismayed to discover that Entity Framework (EF) Core took on an average around 8 seconds to establish an initial connection and obtain some data from the Postgres database running on the rPi. So I decided to dig into the issue further, testing EF Core,) Linq2DB(Linq2DB) 和ADO.NET(旧(, and ADO.NET (the old) DataTable /(/) DataSet /(/) DataAdapter 自.NET 1.0起).(since .NET 1.0).) 注意事项:我的rPi设置了通过USB 2.0的外部便携式硬盘(不是SSD),这可能会或可能不会影响性能.(Something to keep in mind: my rPi is set up with an external spinny hard drive (not SSD) over USB 2.0, which may or may not be affecting performance.) 关于Linq2DB,不要认为这是Linq2SQL的替代品. Linq2DB非常棒,但是不包含更改跟踪之类的功能,因此您现有的Linq2SQL代码只能在模型中更新属性,而" SaveChanges"将无法使用. Linq2DB更精简,并使用(Regarding Linq2DB, do not think this is a replacement for Linq2SQL. Linq2DB is awesome but does not include features like change tracking, so your existing Linq2SQL code, where you simply update the property in the model and “SaveChanges” will not work. Linq2DB is leaner and does partial updates using the) Set 方法.(method.)

NuGet(NuGet)

Linq2DB软件包:(The Linq2DB package:)

有关在rPi(或通常的.NET Core)上设置Postgres的信息,请参阅(For setting up Postgres on the rPi (or .NET Core in general), please see) 我以前的文章(my earlier article) .(.) 安装必要的软件包后,您的NuGet引用应类似于以下内容:(After installing the necessary packages, your NuGet references should look similar to this:)

控制台(The Console)

我编写了控制台以设置各种测试方案,因此您将看到一个选项菜单:(I wrote the console to set up a variety of test scenarios, so you’ll see a menu of options:)

创建100,000条记录(Creating 100,000 records)需要很长的时间.(takes a long time.)

测试代码(The Test Code)

以下是我用于ADO.NET,Linq2DB和EF Core的测试例程.它们都很相似.(Below are the test routines I used for ADO.NET, Linq2DB, and EF Core. They are all very similar.)

架构图(Schema)

由pdgadmin4生成,这是我正在使用的两个测试表-除了表名相同,它们是相同的:(As generated by pdgadmin4, these are the two test tables I’m using – they are identical except for the table names:)

CREATE TABLE public."TestTable"
(
  "ID" integer NOT NULL DEFAULT nextval('"TestTable_ID_seq"'::regclass),
  "FirstName" text COLLATE pg_catalog."default",
  "LastName" text COLLATE pg_catalog."default",
  CONSTRAINT "TestTable_pkey" PRIMARY KEY ("ID")
)
WITH (
  OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE public."TestTable"
  OWNER to pi;

CREATE TABLE public."TestTable2"
(
  "ID" integer NOT NULL DEFAULT nextval('"TestTable_ID_seq"'::regclass),
  "FirstName" text COLLATE pg_catalog."default",
  "LastName" text COLLATE pg_catalog."default",
  CONSTRAINT "TestTable2_pkey" PRIMARY KEY ("ID")
)
WITH (
  OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE public."TestTable2"
  OWNER to pi;

模型(The Models)

public class TestTable
{
  [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
  public int ID { get; set; }
  public string FirstName { get; set; }
  public string LastName { get; set; }
}

public class TestTable2
{
  [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
  public int ID { get; set; }
  public string FirstName { get; set; }
  public string LastName { get; set; }
}

public class Context : DbContext
{
  public Context(DbContextOptions options) : base(options) { }

  public DbSet<TestTable> TestTable { get; set; }
  public DbSet<TestTable2> TestTable2 { get; set; }
}

ADO.NET(ADO.NET)

static void TestPostgresAdoNet()
{
  string connStr = Configuration.GetValue<string>("ConnectionStrings:rpidb");
  DataTable dt = new DataTable();
  long total = 0;
  long ms;

  Console.WriteLine("Connection opening...");
  Stopwatch stopwatch = new Stopwatch();
  stopwatch.Start();
  var conn = new NpgsqlConnection(connStr);
  conn.Open();
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine(ms + "ms");

  var cmd = new NpgsqlCommand
            ("select \"ID\", \"FirstName\", \"LastName\" from public.\"TestTable\"", conn);
  stopwatch.Restart();
  NpgsqlDataAdapter da = new NpgsqlDataAdapter(cmd);
  da.Fill(dt);
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("First Query: " + ms + "ms");
  Console.WriteLine($"Subtotal: {total} ms");
  Console.WriteLine($"Count: {dt.Rows.Count}");
  conn.Close();

  stopwatch.Restart();
  conn = new NpgsqlConnection(connStr);
  conn.Open();
  dt = new DataTable();
  da.Fill(dt);
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("Second Query: " + ms + "ms");
  conn.Close();

  stopwatch.Restart();
  conn = new NpgsqlConnection(connStr);
  conn.Open();
  dt = new DataTable();
  da.Fill(dt);
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("Third Query: " + ms + "ms");
  conn.Close();

  cmd = new NpgsqlCommand
        ("select \"ID\", \"FirstName\", \"LastName\" from public.\"TestTable2\"", conn);
  stopwatch.Restart();
  conn = new NpgsqlConnection(connStr);
  conn.Open();
  da = new NpgsqlDataAdapter(cmd);
  dt = new DataTable();
  da.Fill(dt);
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("Test Table 2: " + ms + "ms");
  Console.WriteLine($"Total: {total} ms");
  conn.Close();
}

ADO.NET +模型(ADO.NET + Model)

static void TestPostgresAdoNetPopulateModel<T>() where T : new()
{
  string connStr = Configuration.GetValue<string>("ConnectionStrings:rpidb");
  DataTable dt = new DataTable();
  long total = 0;
  long ms;

  Console.WriteLine("Connection open...");
  Stopwatch stopwatch = new Stopwatch();
  stopwatch.Start();
  var conn = new NpgsqlConnection(connStr);
  conn.Open();
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine(ms + "ms");

  var cmd = new NpgsqlCommand
            ("select \"ID\", \"FirstName\", \"LastName\" from public.\"TestTable\"", conn);
  stopwatch.Restart();
  NpgsqlDataAdapter da = new NpgsqlDataAdapter(cmd);
  da.Fill(dt);
  PopulateModel<T>(dt);
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("First Query: " + ms + "ms");
  Console.WriteLine($"Subtotal: {total} ms");
  Console.WriteLine($"Count: {dt.Rows.Count}");
  conn.Close();

  stopwatch.Restart();
  conn = new NpgsqlConnection(connStr);
  conn.Open();
  dt = new DataTable();
  da.Fill(dt);
  PopulateModel<T>(dt);
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("Second Query: " + ms + "ms");
  conn.Close();

  stopwatch.Restart();
  conn = new NpgsqlConnection(connStr);
  conn.Open();
  dt = new DataTable();
  da.Fill(dt);
  PopulateModel<T>(dt);
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("Third Query: " + ms + "ms");
  Console.WriteLine($"Total: {total} ms");
  conn.Close();
}

static List<T> PopulateModel<T>(DataTable dt) where T : new()
{
  Type t = typeof(T);
  var props = t.GetProperties(BindingFlags.Instance | BindingFlags.Public);
  List<T> items = new List<T>();

  foreach (DataRow row in dt.Rows)
  {
    var item = new T();

    foreach (DataColumn col in dt.Columns)
    {
      props.Single(p => p.Name == col.ColumnName).SetValue(item, row[col]);
    }

    items.Add(item);
  }

  return items;
}

Linq2DB(Linq2DB)

static void TestPostgresLinq2Db()
{
  string connStr = Configuration.GetValue<string>("ConnectionStrings:rpidb");
  long total = 0;
  long ms;

  Console.WriteLine("Create Data Connection...");
  Stopwatch stopwatch = new Stopwatch();
  stopwatch.Start();
  var db = PostgreSQLTools.CreateDataConnection(connStr);
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine(ms + "ms");
  db.Close();

  stopwatch.Restart();
  db = PostgreSQLTools.CreateDataConnection(connStr);
  var recs = db.GetTable<TestTable>().ToList();
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("First Query: " + ms + "ms");
  Console.WriteLine($"Count: {recs.Count}");
  Console.WriteLine($"Subtotal: {total} ms");
  db.Close();

  stopwatch.Restart();
  db = PostgreSQLTools.CreateDataConnection(connStr);
  recs = db.GetTable<TestTable>().ToList();
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("Second Query: " + ms + "ms");
  Console.WriteLine($"Count: {recs.Count}");
  db.Close();

  stopwatch.Restart();
  db = PostgreSQLTools.CreateDataConnection(connStr);
  recs = db.GetTable<TestTable>().ToList();
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("Third Query: " + ms + "ms");
  Console.WriteLine($"Count: {recs.Count}");
  Console.WriteLine($"Total: {total} ms");
  db.Close();

  stopwatch.Restart();
  db = PostgreSQLTools.CreateDataConnection(connStr);
  var recs2 = db.GetTable<TestTable2>().ToList();
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine("Table 2 Query: " + ms + "ms");
  Console.WriteLine($"Count: {recs2.Count}");
  Console.WriteLine($"Total: {total} ms");
  db.Close();
}

EF核心(EF Core)

static long TestPostgresEfCore()
{
  long firstTotal = 0;
  long total = 0;
  long ms;

  var contextBuilder = new DbContextOptionsBuilder();
  // Database name is case-sensitive
  string connStr = Configuration.GetValue<string>("ConnectionStrings:rpidb");
  contextBuilder.UseNpgsql(connStr);

  Stopwatch stopwatch = new Stopwatch();
  stopwatch.Restart();
  using (var context = new Context(contextBuilder.Options))
  {
    var items = context.TestTable.ToList();
    ms = stopwatch.ElapsedMilliseconds;
    total += ms;
    firstTotal = ms;
    Console.WriteLine("First query: " + ms + "ms");
    Console.WriteLine($"Count: {items.Count}");
    Console.WriteLine($"Subtotal = {total} ms");
  }

  // Query again to see how long a second query takes.
  stopwatch.Restart();
  using (var context = new Context(contextBuilder.Options))
  {
    var items2 = context.TestTable.ToList();
    ms = stopwatch.ElapsedMilliseconds;
    total += ms;
    Console.WriteLine("Second query: " + ms + "ms");
    Console.WriteLine($"Count: {items2.Count}");
}

  // Query again to see how long a third query takes.
  stopwatch.Restart();
  using (var context = new Context(contextBuilder.Options))
  {
    var items2 = context.TestTable.ToList();
    ms = stopwatch.ElapsedMilliseconds;
    total += ms;
    Console.WriteLine("Third query: " + ms + "ms");
    Console.WriteLine($"Count: {items2.Count}");
    Console.WriteLine($"Total: {total} ms");
  }

  // Query again to see how long a different table query takes.
  total = 0;
  stopwatch.Restart();
  using (var context = new Context(contextBuilder.Options))
  {
    var items2 = context.TestTable2.ToList();
    ms = stopwatch.ElapsedMilliseconds;
    total += ms;
    Console.WriteLine("Table 2 query: " + ms + "ms");
    Console.WriteLine($"Count: {items2.Count}");
    Console.WriteLine($"Total: {total} ms");
  }

  return firstTotal;
}

检测结果(Test Results)

在进入代码之前,让我们看一下测试结果.这里的重点是执行第一个查询所花费的时间-查询本身本质上是不相关的,它是由ID,名和姓组成的10行.每个测试都可以直接或通过.NET的连接池使用该连接有效地执行三个查询.所有时间均为毫秒.(Before we get into the code, let’s look at the test results. The focus here is on the time it takes to perform the first query – the query itself is essentially irrelevant, it’s 10 rows consisting of an ID, first name, and last name. Each test performs three queries effectively using the connection, either directly or via .NET’s connection pooling. All times are in milliseconds.)

rPi计时结果(rPi Timing Results)

测试" ADO.NET +模型"可实现反序列化(The test “ADO.NET + Model” implements deserializing the) DataTable 记录到主要出于好奇而实现的.NET对象模型中.(records into a .NET object model that I implemented mainly out of curiosity.)

Windows计时结果(Windows Timing Results)

现在,为了比较,这是我的Windows笔记本电脑上运行相同四个测试的第一个查询的结果(Now, just for comparison, here’s the result for the first query running the same four tests from my Windows laptop)*在rPi上连接到Postgres(connecting to Postgres on the rPi)*通过我的家庭WIFI(无线网络上的笔记本电脑和rPi):(over my home WIFI (both laptop and rPi on the wireless network):)

假设EF查询在rPi上运行客户端代码需要8000毫秒,而在Windows机器上(1500ms)运行相同的客户端代码,则rPi肯定存在性能瓶颈.第二和第三次查询大约需要3毫秒.(Given that the EF query running client code on the rPi takes 8000ms vs. running the same client code on a Windows box (1500ms), there is definitely a performance bottleneck on the rPi. Second and third queries take about 3ms.)

实体框架在做什么？(What is Entity Framework Doing?)

无论采用哪种方法,在rPi上执行第一个查询所花费的时间都是非常糟糕的.但是,我发现了有关实体框架阅读的一些有趣信息(Regardless of the approach you take, the time it takes to perform the first query on the rPi is abysmal. None-the-less, I discovered something interesting about Entity Framework reading) 这样的帖子(this SO post) :(:) “(")使用EF Core,您可以提早作弊并加载模型…这将在另一个线程中创建dbcontext的模型,同时完成应用程序的其余初始化(以及其他预热)和请求的开始.这样,它将早日准备就绪.如果需要,EFCore将等待尚未完成的模型创建.该模型在所有DbContext实例之间共享,因此可以触发并忽略此虚拟dbcontext.(With EF Core, you can cheat and load the model early…This will create the model of the dbcontext in another thread while the rest of the initialization of the application is done (and maybe other warmups) and the beginning of a request. This way, it will be ready sooner. When you need it, EFCore will wait for the Model to be created if it hasn’t finished already. The Model is shared across all DbContext instances so it is ok to fire and forget this dummy dbcontext.)"(") 看起来好像模型不是在另一个线程中创建的.首先这样做:(It doesn’t really seem like the model is created in another thread. Doing this first:)

Console.WriteLine("Forcing model creating...");
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();

using (var dbContext = new Context(contextBuilder.Options))
{
  var model = dbContext.Model; //force the model creation
  ms = stopwatch.ElapsedMilliseconds;
  total += ms;
  Console.WriteLine(ms + "ms");
}

我注意到在rPi上运行的这些结果:(I note these results, running on the rPi:)

=== Testing Postgres EF ===
Forcing model creating...
4265ms
First query: 4847ms
Count: 10
Subtotal = 9112 ms

请注意,现在如何在"模型创建"和实际执行首次查询之间分配时间(允许的时间,这将使整个过程增加一整秒!).我不愿发现如果有一百个需要"创建"的模型会发生什么.(Notice how the time (granted, this added one whole second to the entire process!) is now split between “model creating” and actually performing the first-time query. I’m loathe to find out what happens if I have a hundred models that need to be “created.") 顺便说一句,在SO帖子的后面,有人评论了设置(Incidentally, further on in the SO post, someone makes a comment about setting) AutoDetectChangesEnabled 和(and) LazyLoadingEnabled 至(to) false .这对性能没有影响(或至少在EF Core中没有影响).(. This has no effect on the performance (or lack thereof, at least in EF Core.))

是每个表还是每个连接有延迟？(Is This Delay Per Table or Per Connection?)

修改代码以查询其他表(Modifying the code to query a different table) TestTable2 ,我们看到查询时间类似于的第二和第三次查询(, we see that the query time is similar to the second and third queries of) TestTable :(:)

=== Testing Postgres EF ===
First query: 4774ms
Count: 10
Subtotal = 4774 ms
Second query: 20ms
Count: 10
Third query: 5ms
Count: 10
Total: 4799 ms

Table 2 query: 4ms
Count: 10
Total: 4 ms

连接持续多长时间？(How Long Does the Connection Persist?)

根据(According to) 这个文章(this writeup) 在ADO.NET 2.0中的SQL Server连接池上:“连接池在空闲大约4至8分钟后会从池中删除连接…“我们可以通过重复测试查询并确定(使用分钟粒度),连接池将丢弃该连接.(on SQL Server connection pooling in ADO.NET 2.0: “The connection pooler removes a connection from the pool after it has been idle for approximately 4-8 minutes…” We can test this out by repeatedly testing a query and determining (with minute granularity) when the connection pool discards the connection.)

int min = 1;
int maxWait = 10;

while (min <= maxWait)
{
  Console.WriteLine("=============================");
  Console.WriteLine($"Waiting {min} minute...");
  int sleepFor = min * 60 * 1000;
  Thread.Sleep(sleepFor);
  long q = TestPostgresEfCore();

  if (q > 1000)
  {
    break;
  }

  ++min;
}

if (min <= maxWait)
{
  Console.WriteLine
     ($"Connection pool released all connections in approximately {min} minutes.");
}

结论是,即使等待了10分钟,连接仍然处于"活动"状态.我只能假定,在.NET Core或Postgres适配器中实现的连接池与在文档中实现的连接是不同的.或者,第三种可能性是(The verdict is that even after waiting 10 minutes, the connection was still “live.” I can only assume that the connection pooling, either as implemented in .NET Core or in Postgres adapter, is implemented differently than from the docs. Or, a third possibility is that there) Thread.Sleep 当所有内容都在同一线程上运行时,这不是测试此方法的正确方法.(is not the right way to test this when everything is running on the same thread.) 我还从上面提到的Microsoft页面注意到:” ADO.NET 2.0引入了两种新方法来清除池:(I also note, from the Microsoft page mentioned above: “ADO.NET 2.0 introduced two new methods to clear the pool:) ClearAllPools 和(and) ClearPool .“执行以下方法:(.” that executing these methods:)

NpgsqlConnection.ClearAllPools();
NpgsqlConnection.ClearPool(conn);

对2没有影响(has no affect on the 2)nd(nd)和3(and 3)rd(rd)查询时间.(query timing.)

查询大数据集(Querying Large Sets of Data)

让我们看一下第二和第三个查询的平均值的性能,读取了100,000条记录:(Let’s look at the performance of the average of the second and third queries, reading 100,000 records:)

实际上,我对Linq2DB的速度感到惊讶.(I’m actually quite surprised as to how fast Linq2DB is.)

多线程查询(Multithreaded Queries)

对于此测试,我正在执行您通常不会在"客户端"上执行的操作,该客户端正在查询以返回测试表中的所有100,000条记录.请记住,数据库正在rPi上运行.(For this test, I’m doing something you normally wouldn’t be doing on the “client”, which is querying to return all 100,000 records in the test table. Remember, the database is running on the rPi.)

同时EF核心查询(Simultaneous EF Core Queries)

请注意,在此代码中,我确保任务数量不会超过逻辑处理器的数量.我还将通过"模型构建-连接-查询"过程来启动EF和一个连接(稍后我们将对此进行处理.)(Note that in this code, I’m ensuring that the number of tasks do not exceed the number of logical processors. I’m also priming EF and one connection with the “model build - connect - query” process (we’ll play with this a bit later on.))

static void EfCoreMultithreading()
{
  var contextBuilder = new DbContextOptionsBuilder();
  string connStr = Configuration.GetValue<string>("ConnectionStrings:rpidb");
  contextBuilder.UseNpgsql(connStr);
  int numProcs = Environment.ProcessorCount;
  Console.WriteLine($"Logical processors = {numProcs}");
  var trial1 = new Task<(DateTime start, DateTime end)>[numProcs];
  var trial2 = new Task<(DateTime start, DateTime end)>[numProcs];

  var stopwatch = new Stopwatch();
  stopwatch.Start();

  // Priming: run a query to deal with how long it takes for the initial connection+query.
  using (var context = new Context(contextBuilder.Options))
  {
    var items = context.TestTable.Where(t => t.ID == -1).ToList();
  }

  long priming = stopwatch.ElapsedMilliseconds;
  stopwatch.Stop();
  Console.WriteLine($"connect + query took {priming}ms");

  var now = DateTime.Now;

  numProcs.ForEachWithIndex(idx => trial1[idx] = StartEfCoreTask());
  Task.WaitAll(trial1);

  numProcs.ForEachWithIndex(idx => trial2[idx] = StartEfCoreTask());
  Task.WaitAll(trial2);

  Console.WriteLine("Trial 1:");
  trial1.ForEach(t =>
  {
    Console.WriteLine($" Start: {(t.Result.start - now).TotalMilliseconds}ms Duration: 
                                {(t.Result.end - t.Result.start).TotalMilliseconds}ms");
  });

  Console.WriteLine("Trial 2:");
  trial2.ForEach(t =>
  {
    Console.WriteLine($" Start: {(t.Result.start - now).TotalMilliseconds}ms Duration: 
                                {(t.Result.end - t.Result.start).TotalMilliseconds}ms");
  });
}

static Task<(DateTime start, DateTime end)> StartEfCoreTask()
{
  return Task.Run(() =>
  {
    var contextBuilder = new DbContextOptionsBuilder();
    string connStr = Configuration.GetValue<string>("ConnectionStrings:rpidb");
    contextBuilder.UseNpgsql(connStr);
    var start = DateTime.Now;

    using (var context = new Context(contextBuilder.Options))
    {
      var items = context.TestTable.ToList();
    }

    var end = DateTime.Now;

    return (start, end);
  });
}

在Windows上运行(Running on Windows)

回想一下,一个"准备好的"查询100,000条记录需要大约2200毫秒才能在单个线程上运行.(Recall that it takes about 2200ms for one “primed” query of 100,000 records to run on a single thread.)

Logical processors = 4
Priming took 1589ms
Trial 1:
Start: 3.3745ms Duration: 3269.5232ms
Start: 3.3745ms Duration: 5263.6895ms
Start: 3.3745ms Duration: 2403.0416ms          <=== re-use of connection?
Start: 3.3745ms Duration: 5459.1072ms
Trial 2:
Start: 5462.846ms Duration: 5404.7242ms
Start: 5462.8351ms Duration: 5399.7877ms
Start: 5462.8351ms Duration: 5403.8799ms
Start: 5462.846ms Duration: 5411.174ms

在rPI上运行(Running on rPI)

回想一下,一次"预备"查询100,000条记录需要大约4000毫秒才能在单个线程上运行.(Recall that it takes about 4000ms for one “primed” query of 100,000 records to run on a single thread.)

Logical processors = 4
Priming took 9221ms
Trial 1:
Start: 24.6442ms Duration: 9613.1905ms
Start: 24.8993ms Duration: 9636.8815ms
Start: 24.6441ms Duration: 9673.5425ms
Start: 24.8986ms Duration: 9655.8864ms
Trial 2:
Start: 9699.3142ms Duration: 9103.8668ms
Start: 9699.3141ms Duration: 8743.3299ms
Start: 9699.314ms Duration: 8969.2488ms
Start: 9699.3142ms Duration: 9024.2545ms

如果我们不使用单个连接"启动"连接池怎么办？(What if we don’t “prime” the connection pool with a single connection?)

Windows进行查询:(Windows making the queries:)
rPi进行查询:(rPi making the queries:)

结果分析(An Analysis of the Results)

要了解这些结果,我们需要记住,我们可以通过减少"连接+查询"时间的模型构建步骤来为EF Core装箱.我本质上是在"启动"测试中通过在主线程上进行初始查询来实现的.(To understand these results, we need to remember that we can prime EF Core with a model building step that reduces the “connect + query” time. I’m essentially doing that in the “priming” tests by making an initial query on the main thread.)

从Windows运行时,似乎一个线程可以重新使用此连接.(It seems that this connection is re-used by one thread when running from Windows.)
当省略模型"构建"启动时,从Windows运行的所有四个线程似乎都需要构建模型,但是在第二次试用中,这种模型构建不是必需的,但是我们再次看到,只有一个线程在利用池化的优势.即使从理论上讲我们应该从第一次试验开始就拥有四个合并的连接.(When the model “building” priming is omitted, all four threads running from Windows appear to require building the model, but on the second trial, this model building isn’t necessary but again we see that only one thread is taking advantage of a pooled connection even though we should, theoretically, have four pooled connections from the first trial.)
除了在这种情况下,在rPi上运行测试会产生相似的结果,而且我不确定关于连接池的结论.我们在第二个试验中看到的时序改进似乎是EF Core构建了该模型的结果,但似乎表明它不够聪明,无法知道它是由其他线程构建的.(Running the test on the rPi yields similar results except that in this case, and I’m not sure what to conclude about connection pooling. The timing improvement we’re seeing in the second trial appears to be the result of EF Core having built the model but seems to indicate that it isn’t smart enough to know that it’s being built by other threads.) 可以通过查看在rPi上仅执行模型构建的时间来证明#3中的假设.为此,我们用以下代码替换完整的connect +查询启动:(The supposition in #3 can be proved by looking at the timing with performing just model building on the rPi. For this, we replace the full connect + query priming with this code:)

using (var dbContext = new Context(contextBuilder.Options))
{
  dbContext.ChangeTracker.AutoDetectChangesEnabled = false; // <----- trick
  dbContext.ChangeTracker.LazyLoadingEnabled = false; // <----- trick
  var model = dbContext.Model; //force the model creation
}

long priming = stopwatch.ElapsedMilliseconds;
stopwatch.Stop();
Console.WriteLine($"Model building took {priming}ms");

结果:(Results:)

Logical processors = 4
Model building took 4330ms
Trial 1:
Start: 25.583ms Duration: 14884.9089ms
Start: 25.1461ms Duration: 14754.3145ms
Start: 25.1461ms Duration: 14797.9605ms
Start: 25.4297ms Duration: 14854.3302ms
Trial 2:
Start: 14911.8329ms Duration: 11586.8008ms
Start: 14911.6562ms Duration: 10127.6391ms
Start: 14911.6561ms Duration: 11580.0408ms
Start: 14911.6561ms Duration: 11671.2985ms

通过使用模型” build"启动,试验1的时间减少了5000ms,并且试验2的时间可能会或可能不会重新使用试验1的连接池.有趣的是,试验2的时间要长1-2秒比使用” build + connect + query"启动时要好.我不会对此做出任何猜测.(By priming with the model “build”, the trial 1 timing is reduced by 5000ms and the trial 2 timing may or may not be re-using the connection pools from trial 1. Interestingly, the timing of trial 2 is 1 - 2 seconds longer than when the “build+connect+query” priming was used. I’m not going to make any guesses as to why that is.) 同样,在Windows上,当我们仅执行"构建"启动步骤时,试用2会花费更长的时间,但是试用1和试用2之间的差异很小.(Similarly on Windows, trial 2 takes longer when we perform just the “build” priming step, but there’s very little difference between trial 1 and trial 2.)

Logical processors = 4
Model building took 816ms
Trial 1:
Start: 3.9416ms Duration: 7166.4065ms
Start: 4.244ms Duration: 8147.1518ms
Start: 3.9416ms Duration: 4244.7402ms                <== why is this one faster?
Start: 3.9416ms Duration: 7848.992ms
Trial 2:
Start: 8151.7212ms Duration: 7786.576ms
Start: 8151.7212ms Duration: 6795.6511ms
Start: 8151.722ms Duration: 6242.4902ms
Start: 8151.722ms Duration: 8786.9947ms

我们确实看到试验1的时间减少了,但是试验2的时间却没有减少,尽管奇怪的是,在试验1中,我们看到一个线程花费了一半的时间.奇怪的是,我们没有看到试用版2在Windows上运行会花费更少的时间,因为它是相同的代码和程序包,因此肯定还有其他事情在进行.(We do see a reduction in trial 1 timing, but we don’t see a reduction in time in trial 2, though oddly, in trial 1, we see one thread that took half the time. It’s odd that we don’t see trial 2 taking less time when running on Windows as it’s the same code and packages, so there must be something else going on.) 此外,当在所有4个内核上同时运行此100,000条记录查询时,我们发现与单个线程查询相比,每个内核将其花费的时间加倍.在单个线程上,rPi平均为4000毫秒,而4个多线程查询大约花费9000毫秒.如果我们仅使用2个内核,理论上为Postgres内部留出2个内核可用,那会发生什么？(Furthermore, when running this 100,000 record query on all 4 cores simultaneously, we see that each core doubles the amount of time it takes as compared to a single thread query. On a single thread, the rPi averages 4000ms, and 4 multithreaded queries took around 9000ms. What happens if we use only 2 cores, theoretically leaving 2 cores free for maybe the internals of Postgres?)

Logical processors = 2
Model + connect + query priming took 9567ms
Trial 1:
Start: 25.1111ms Duration: 5637.642ms
Start: 25.1111ms Duration: 5612.4357ms
Trial 2:
Start: 5663.9105ms Duration: 5467.9455ms
Start: 5663.9104ms Duration: 5528.9189ms

好多了.我们在2个线程上运行的惩罚是28%,而不是在所有四个线程上运行查询的惩罚是56%.(Much better. We’re suffering a 28% penalty running on 2 threads instead of a 56% penalty running the query on all four threads.)

造成灾难(Creating Disaster)

让我们忽略逻辑处理器的数量,并强制8个线程执行查询.我们将仅关注rPi.(Let’s ignore the number of logical processors and force 8 threads to execute the query. We’ll focus just on the rPi.) 他们离开了!前4个线程已超出其门限,并直接走了出去,而其他4个线程似乎离开了门有问题!(And they’re off! The first 4 threads are out of their gates and heading down the straightaway while there seems to be some problem with the other 4 threads leaving the gate!)

Logical processors = 8
Model + connect + query priming took 9452ms
Trial 1:
Start: 24.623ms Duration: 10444.4752ms
Start: 24.623ms Duration: 11347.9301ms
Start: 24.623ms Duration: 11646.0459ms
Start: 24.623ms Duration: 11261.6374ms
Start: 4130.0397ms Duration: 12144.0146ms
Start: 9234.6732ms Duration: 9702.6407ms
Start: 10479.0248ms Duration: 9170.6971ms
Start: 11313.1258ms Duration: 8702.2047ms
Trial 2:
Start: 20016.4353ms Duration: 17668.6654ms
Start: 20016.4353ms Duration: 17616.3623ms
Start: 20016.4353ms Duration: 15696.1627ms
Start: 20016.4352ms Duration: 17027.2403ms
Start: 20017.9037ms Duration: 15747.4193ms
Start: 20017.9037ms Duration: 17643.9682ms
Start: 26699.7751ms Duration: 12650.1307ms
Start: 34472.6065ms Duration: 6260.0021ms

首先,通过注意到开始时间,我们看到.NET线程池节流开始了,但是在试用2中有些困惑,因为它同时启动了6个线程.更糟糕的是,一个查询大约需要4000毫秒,而这要花费17500毫秒,与单线程查询相比要付出77%的代价,这主要是因为在试用2中,线程池同时启动了六个线程.(First, by noticing the start times, we see the .NET thread pool throttling kicking in but getting a bit confused in trial 2 as it starts off 6 threads simultaneously. At it’s worse, a query that should take around 4000ms is taking 17,500ms, a penalty of 77% as compared to a single thread query, mainly because in trial 2, the thread pool started of six threads simultaneously.)

同时Linq2DB查询(Simultaneous Linq2DB Queries)

启动(Priming)

var db = PostgreSQLTools.CreateDataConnection(connStr);
var recs = db.GetTable<TestTable>().Take(1).ToList();

Linq2DB查询任务(The Linq2DB Query Task)

static Task<(DateTime start, DateTime end)> StartLinq2DBTask()
{
  return Task.Run(() =>
  {
    string connStr = Configuration.GetValue<string>("ConnectionStrings:rpidb");
    var start = DateTime.Now;
    var db = PostgreSQLTools.CreateDataConnection(connStr);
    var recs = db.GetTable<TestTable>().ToList();
    var end = DateTime.Now;

    return (start, end);
  });
}

让我们做数字(Let’s do the Numbers)

回想一下,对100,000个记录工具Linq2DB的单线程查询少于1400ms.运行在4个内核上,我们看到大约65%的罚款.(Recall that a single thread query of 100,000 records tool Linq2DB a little less than 1400ms. Running on 4 cores, we see around a 65% penalty.)

Logical processors = 4
priming took 6943ms
Trial 1:
Start: 14.6684ms Duration: 3829.2075ms
Start: 14.6683ms Duration: 3872.5ms
Start: 14.8168ms Duration: 3850.6253ms
Start: 8.7451ms Duration: 3840.4848ms
Trial 2:
Start: 3888.2774ms Duration: 3646.7527ms
Start: 3888.4093ms Duration: 3609.2581ms
Start: 3888.2774ms Duration: 3623.5665ms
Start: 3888.3238ms Duration: 3134.8659ms

运行在2个内核上,我们在试验1中看不到罚款,而在试验2中看不到33%的罚款.不知道为什么会这样,但是似乎是一致的.(Running on 2 cores, we see no penalty in trial 1 and a 33% penalty in trial 2. No idea why that is but it seems consistent.)

Logical processors = 2
priming took 7095ms
Trial 1:
Start: 8.7998ms Duration: 1458.5924ms
Start: 8.7998ms Duration: 1467.26ms
Trial 2:
Start: 1477.2797ms Duration: 2092.9932ms
Start: 1477.4778ms Duration: 2085.4642ms

结论(Conclusion)

无论我们是从Windows还是从rPi本身连接到rPi上的Postgres,我都认为初始连接和查询时间是不可接受的.我认为这与我将操作系统和应用程序安装在外部USB驱动器上并利用rPi上的USB 2.0接口无关,而与.NET Core中的基础实现无关.此外,在设置初始连接+查询时,EF Core显然会有很多开销,只能通过强制EF Core创建模型来部分缓解.我认为这是.NET Core"问题"的结论还基于以下事实:我从未使用过Postgres遇到过此类延迟(I consider the initial connection and query time to be unacceptable regardless of whether we’re connecting to Postgres on rPi from Windows or the rPi itself. I don’t think this is related to my having installed the OS and apps on an external USB drive and utilizing the USB 2.0 interface on the rPi but rather the underlying implementation in .NET Core. Furthermore, EF Core clearly has a lot of overhead when setting up that initial connection + query, which can only be partially mitigated by forcing EF Core to create the model. My conclusion that it’s a .NET Core “problem” is also based on the fact that I’ve never experienced these kinds of delays using the Postgres) pgadmin 工具.(tools.)

解决方法(Workarounds)

当应用程序启动时,打开连接并查询表-任何查询都可以,即使是不返回任何记录的查询也可以.即使我没有体验到连接池删除连接的经历,即使10秒钟之后,也应该以一定的间隔执行此"打开并查询"活动.(When the application starts up, open the connection and query a table – any query will do, even one that doesn’t return any records. Even though I didn’t experience the connection pool removing connections, even after 10 seconds, it might be wise to perform this “open & query” activity at some regular interval.)
尽可能避免长时间运行的查询.这可能涉及优化查询,例如,让数据库求和或取平均值,而不是在客户端执行.(Avoid long-running queries if at all possible. This might involve optimizing the query, for example, let the database sum or average a value, don’t do it on the client-side.)
在rPi上,查询不要超过2个内核.这意味着您将必须管理自己的线程池,但是想法是我们需要其他2个内核来处理其他进程.(On the rPi, don’t exceed 2 cores for queries. This means you’ll have to manage your own thread pool, but the idea is that we need the other 2 cores for other processes.)
考虑不使用EF Core,而是使用Linq2DB.如果您是重新启动项目而不是移植现有的EF项目,则最容易实现.(Consider not using EF Core and instead using Linq2DB. This is easiest accomplished if you’re starting a project fresh rather than porting an existing EF project.) 可能最重要的解决方法是,不要在进行查询的同一个rPi上运行像Postgres这样的数据库服务器. rPi的价格为35美元,足够便宜,您可以在一个rPi上运行数据库,而在第二个rPi上运行应用程序(网络服务器,无论如何).在某个时候,我会尝试一下,但是我需要另一个rPi,或者也许我会使用我拥有的Beaglebone之一.(Probably the most important workaround is not to run a database server like Postgres on the same rPi as is making the queries. rPi’s are $35, cheap enough that you can run the database on one rPi and the application (web server, whatever) on a second rPi. At some point, I’ll give that a try, but I need another rPi, or maybe I’ll use one of the Beaglebone’s I have.)

历史(History)

20(20)日(th)2019年1月:初始版本(January, 2019: Initial version)

许可

本文以及所有相关的源代码和文件均已获得The Code Project Open License (CPOL)的许可。

C#7 C# Raspberry .NET-Core .NET LINQ ADO.NET Dev VS2017 JSON 新闻翻译