Dataset Generator · holdenk/spark-testing-base Wiki · GitHub
Dataset Generator
Mahmoud Hanafy edited this page Apr 22, 2016 · 2 revisions
DatasetGenerator provides an easy way to generate arbitrary Datasets so that you can check properties against them.
If you don't know ScalaCheck, I suggest reading about it first to understand the concepts of properties and generators.
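For readers new to ScalaCheck, here is a minimal sketch of the two concepts, independent of Spark. A generator (Gen) produces random values, and a property (Prop) is a predicate that is checked against many generated values. The names ScalaCheckBasics, smallInts, and reverseTwice are illustrative only and do not come from spark-testing-base.

```scala
import org.scalacheck.{Gen, Prop, Test}
import org.scalacheck.Prop.forAll

object ScalaCheckBasics {
  // Generator: produces lists of integers between 0 and 100 (inclusive).
  val smallInts: Gen[List[Int]] = Gen.listOf(Gen.choose(0, 100))

  // Property: reversing a list twice yields the original list.
  val reverseTwice: Prop = forAll(smallInts) { xs =>
    xs.reverse.reverse == xs
  }

  def main(args: Array[String]): Unit = {
    // Run the property with ScalaCheck's default test parameters.
    val result = Test.check(Test.Parameters.default, reverseTwice)
    println(s"passed: ${result.passed}")
  }
}
```

The Dataset examples on this page follow the same pattern, except that the generator yields whole Datasets rather than lists.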
You can generate arbitrary Datasets using the genDataset method. Just create a generator for your required Dataset type, or use one of the generators that are supported by default.
Example: (Supported Generator)
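// Imports assume the ScalaTest and ScalaCheck versions current when this page was written.
import com.holdenkarau.spark.testing.{DatasetGenerator, SharedSparkContext}
import org.apache.spark.sql.SQLContext
import org.scalacheck.Arbitrary
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class SampleDatasetGeneratorTest extends FunSuite with SharedSparkContext with Checkers {
  test("test generating Datasets[String]") {
    val sqlContext = new SQLContext(sc)
    // Brings the implicit Encoders used by genDataset and map into scope.
    import sqlContext.implicits._
    val property =
      forAll(DatasetGenerator.genDataset[String](sqlContext)(Arbitrary.arbitrary[String])) {
        // Mapping each element must not change the number of rows.
        dataset => dataset.map(_.length).count() == dataset.count()
      }
    check(property)
  }
}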
You can create a custom generator for your own data type.
Example: (Custom Generator)
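// Imports assume the ScalaTest and ScalaCheck versions current when this page was written.
import com.holdenkarau.spark.testing.{DatasetGenerator, SharedSparkContext}
import org.apache.spark.sql.{Dataset, SQLContext}
import org.scalacheck.{Arbitrary, Gen}
import org.scalacheck.Prop.forAll
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class SampleDatasetGeneratorTest extends FunSuite with SharedSparkContext with Checkers {
  test("test generating Datasets[Custom Class]") {
    val sqlContext = new SQLContext(sc)
    // Brings the implicit Encoders used by genDataset and map into scope.
    import sqlContext.implicits._

    // Build a Dataset generator from a generator of single Car values.
    val carGen: Gen[Dataset[Car]] =
      DatasetGenerator.genDataset[Car](sqlContext) {
        val generator: Gen[Car] = for {
          name <- Arbitrary.arbitrary[String]
          speed <- Arbitrary.arbitrary[Int]
        } yield (Car(name, speed))

        generator
      }

    val property =
      forAll(carGen) {
        // Projecting a field must not change the number of rows.
        dataset => dataset.map(_.speed).count() == dataset.count()
      }

    check(property)
  }
}

// Defined at the top level, outside the test class.
case class Car(name: String, speed: Int)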
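A custom generator does not have to rely on Arbitrary. As a sketch, the element generator can be constrained with ordinary ScalaCheck combinators. The object name CarGenerators, the value boundedCarGen, and the chosen bounds (non-empty alphabetic names, speeds from 0 to 300) are assumptions made for illustration, not part of spark-testing-base.

```scala
import org.scalacheck.Gen

// Same shape as the Car class above, repeated so the sketch compiles on its own.
case class Car(name: String, speed: Int)

object CarGenerators {
  // Hypothetical constraints: non-empty alphabetic names, speeds in [0, 300].
  val boundedCarGen: Gen[Car] = for {
    name  <- Gen.nonEmptyListOf(Gen.alphaChar).map(_.mkString)
    speed <- Gen.choose(0, 300)
  } yield Car(name, speed)
}
```

A generator like this could be passed in place of the inline one in the custom example, for instance DatasetGenerator.genDataset[Car](sqlContext)(CarGenerators.boundedCarGen), which keeps properties from being exercised on values that are not meaningful for the type.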